Database and Warehouse Concepts: July 2014

Thursday, 31 July 2014

Dimensional and Relational: Not Opposites

A common misconception holds that the terms dimensional and relational are opposites. They are not.

The word "dimensional" describes a design method. The word "relational" describes a data storage technology.

A dimensional model is a design approach that describes a process in terms of measurements (known as facts) and their context (dimensions)
A star schema is a dimensional model implemented using relational storage technology—that is, in a relational database (RDBMS)
A cube is a dimensional model implemented using multi-dimensional storage technology—that is, in a multidimensional database (MDB)

This simple diagram illustrates these concepts:

As you can see from the diagram, a star schema is both relational and dimensional. So is a snowflake schema.

By the way, don't let this confuse you: most modern day DBMS products accommodate both kinds of storage.

Monday, 28 July 2014

Is snapshot an aggregate

**This information is gathered from blogs of oaktonsoftware.It's purely for interesting reads and not of any commercial use.

Snapshot Fact Tables

A snapshot is a fact table that periodically samples a measurement. A snapshot always includes at least one measurement of status, such as a balance or level. A transaction fact table, in contrast, captures measurements as they occur.

The snapshot is useful for describing the effect of transactions, rather than transactions themselves. The best example is your bank account. Its all well and good if you have a record of each deposit, check, interest payment, fee, direct debit, and so forth. But what is your balance? Aggregating all your transactions is not the most expedient way to figure that out.

Sometimes Snapshots Summarize Transactions, but not Always

People sometimes think of snapshots as aggregates because they can be constructed from a table that contains transactions. That is certainly true in the case of account balances. If you have the transactions, you can compute the snapshots.

In this sense, you might be justified in thinking of a snapshot as an aggregate. But I don't. I'll explain why in just a second.

First, though, it is important to understand that not all snapshots are constructed from transactions. Sometimes, the number of transactions is too numerous to keep around. But in other cases,there simply aren't any. For example, we can't make use of transactions to describe the ambient temperature on the factory floor, or the level of pressure inside an industrial device. In these cases, the snapshot is clearly not a summary.

It really depends on how you define an aggregate.

The word aggregate is normally reserved for a summary table that does not transform the structure of its corresponding base table. For example, suppose account transactions are recorded in a fact table, with a fact called transaction_amount. A monthly aggregate would have exactly the same fact. Still called transaction_amount, is the same as the fact in the base table. The only difference is in level of detail: it represents a month's worth of transactions, rather than individual transactions. It is the same measurement.

Account balance (or other status measurements like levels) is a not the same as what the base fact table records. It is not an aggregation of transaction that occurred during the snapshot period; it is something more. It describes the net effect of the account transactions. It is a different measurement.

Conclusion

I use the term derived table to describe any table in a dimensional design uses another as its source. I save the word aggregate for non-transformative summaries.

To sum things up, then:

Yes, a snapshot sometimes summarizes a transaction fact table.

But if you accept these definitions for aggregate tables and derived tables, the answer is different:

A snapshot is is sometimes a derived table, but never an aggregate.

Tuesday, 22 July 2014

Set Theory of JOINS

Those who are not familiar with the JOINS concept.Let's go back to the school.We had SET Theory in Mathematics.You might have heard like UNION,INTERSECT,MINUS.If you digest these things,JOINS is not a matter,but a syntax.

Let's Consider two sets,
A={1,2,3,4,5} B={1,3,5,7}

A UNION B ={1,2,3,4,5,7} --Just clubs all the elements without duplicates.

A UNION ALL B={1,1,2,3,3,4,5,5,7} ---Same as UNION,but with duplicates

A INTERSECT B={1,3,5} ----figures out only the common values in the sets.

A MINUS B={2,4} -- figures out the values which are common in A and B but not common in A.

B MINUS A={7} -- figures out the values which are common in A and B but not common in B.

Here,we have deptno as the common column between EMP and DEPT tables.Also,observe the distinct department numbers that the
tables have.
EMP table has {10,20,30,50}
DEPT table has {10,20,30,40}

If we apply the same mathematics here,then it really works.

UNION operator

SELECT DEPTNO FROM EMP
UNION
SELECT DEPTNO FROM DEPT;

Output:
10,20,30,40,50

UNION ALL operator

SELECT DEPTNO FROM EMP
UNION ALL
SELECT DEPTNO FROM DEPT;

Output:Unlike UNION,Combines all the departments without eliminating the duplicates.

INTERSECT operator
SELECT DEPTNO FROM EMP
INTERSECT
SELECT DEPTNO FROM DEPT;

Output:10,20,30

MINUS operator
SELECT DEPTNO FROM EMP
MINUS
SELECT DEPTNO FROM DEPT;
Output:50

SELECT DEPTNO FROM DEPT
MINUS
SELECT DEPTNO FROM EMP;
Output:40

Example:If you want to get the all the details of employees who has common departments between the tables.

INTERSECT vs INNER JOIN

SELECT * FROM EMP WHERE DEPTNO
IN
(
SELECT DEPTNO FROM EMP
INTERSECT
SELECT DEPTNO FROM DEPT
);

This is equivalent to INNER JOIN

SELECT * FROM EMP INNER JOIN DEPT ON EMP.DEPTNO=DEPTNO.DEPTNO;

INNER JOIN plays the role of INTERSECT

Like wise,

A LEFT OUTER JOIN B means

A INTERSECT B UNION A MINUS B

How to calculate Nth highest/lowest salaries without using Analytical functions

Nth Highest Salary:

SELECT EMPNO,ENAME,SAL FROM EMP X WHERE &N =

(SELECT COUNT(DISTINCT SAL) FROM EMP WHERE SAL >=X.SAL);

Nth Lowest Salary:

SELECT EMPNO,ENAME,SAL FROM EMP X WHERE &N =

(SELECT COUNT(DISTINCT SAL) FROM EMP WHERE SAL <=X.SAL);

How to find second highest/lowest salary

Second Highest:
SELECT T1.EMPNO,T1.SAL,T1.RNK
FROM

(SELECT
EMPNO,SAL,
RANK() OVER(ORDER BY SAL DESC) RNK
FROM
EMP) T1

WHERE T1.RNK=2

Second Lowest:
SELECT T1.EMPNO,T1.SAL,T1.RNK
FROM

(SELECT
EMPNO,SAL,
RANK() OVER(ORDER BY SAL ASC) RNK
FROM
EMP) T1

WHERE T1.RNK=2

Display the Total salaries job wise and department wise in a matrix style report

SELECT

JOB,

SUM(DECODE(DEPTNO,10,SAL)) DEPT10,

SUM(DECODE(DEPTNO,20,SAL)) DEPT20,

SUM(DECODE(DEPTNO,30,SAL)) DEPT30,

SUM(SAL) TOTAL

FROM EMP

GROUP BY JOB;

Tuesday, 15 July 2014

FULL OUTER JOIN concept

I have 3 tables with some values.Some of the ID's are present in Table A,Some in Table B and other in Table C.I just want to know,what are the values present in one table are not present in another table.

SELECT A.ID,B.ID,C.ID

FROM A FULL OUTER JOIN B ON A.ID=B.ID

FULL OUTER JOIN C ON A.ID=C.ID OR B.ID=C.ID

ORDER BY A.ID,B.ID,C.ID

It is easy only when you have understanding of JOINS concept in SQL.

Monday, 14 July 2014

Handling NULL's

I have a table with values below and want to delete the entire data except ID=2

SELECT * FROM ID;

DELETE FROM ID WHERE ID<>2 OR ID IS NULL;
(or)
DELETE FROM ID WHERE COALESCE(ID,0)<>2 ;

Most committed mistake:
DELETE FROM ID WHERE ID<>2 ;

Saturday, 12 July 2014

Count of NULL values in each column in a table

Display the count of all NULL values under each column in a table.

I have a table with random values ,but some of the values under each columns as NULL's.

Input: Output:

SELECT 'ID' AS COLNAME ,COUNT(*) FROM ABC WHERE ID IS NULL

UNION ALL

SELECT 'NAME' AS COLNAME ,COUNT(*) FROM ABC WHERE NAME IS NULL

UNION ALL

SELECT 'HIREDATE' AS COLNAME ,COUNT(*) FROM ABC WHERE HIREDATE IS NULL

Note:UNION ALL is used just to club the result fetched from each query.You can use UNION as well,but UNION checks for any duplicate rows.Since we know that there wont be any duplicate rows then we can use straight away UNION ALL in order to avoid the duplicate check.

Swapping the values

I have the data in the table with their gender's reversed as below.We need to regain .

Input Tablename: SWAP Output:SWAP

UPDATE SWAP

SET GENDER=CASE

WHEN GENDER='M' THEN 'F'

WHEN GENDER='F' THEN 'M'

END;

TRUNCATE versus DELETE

AVG() function

Calculate the average commission without using AVG()

In general,we use

SELECT AVG(COMM) FROM EMP;

But here,there are NULL values under the comm.

SELECT SUM(COMM)/COUNT(COMM) FROM EMP;

There is a chance to commit mistake here.Some people write

SELECT SUM(COMM)/COUNT(*) FROM EMP;

Here COUNT(*) gives 14 and the average value changes,which is incorrect.

RANK() versus DENSE RANK()

Both of them are OLAP(Online Analytical Processing) functions.There is a difference between these two functions in assigning the ranks.

RANK function leaves the gaps between ranks when there is a same rank applied for multiple rows.

DENSE RANK functions applies the rank without leaving any gaps,though the same rank is applied for multiple rows.

SELECT EMPNO,ENAME,SAL,

RANK() OVER(ORDER BY SAL DESC) AS RANK,

DENSE_RANK() OVER(ORDER BY SAL DESC) AS DENSE_RANK

FROM EMP;

Note:The difference is precisely highlighted in the adjacent figure.

Same function can be applicable for Teradata database in TD14.

Find the repeated salaries

Display the details of the employees who are earning same salaries.

SELECT * FROM EMP WHERE SAL IN
(
SELECT SAL FROM EMP GROUP BY SAL HAVING COUNT(*)>1
);

--HAVING is used to filter the data only after GROUPing of data.

We cannot use the Aggregate functions in WHERE clause. eg: WHERE COUNT(*)=1 ,WHERE SUM(SAL)=1000 etc.

Multicolumn Subquery

Display the employee details who are earning maximum salaries of each department.

SELECT * FROM EMP WHERE(DEPNO,SAL) IN

(

SELECT DEPTNO,MAX(SAL) FROM EMP GROUP BY DEPTNO

);

Note:Here subquery plays an important role.

The columns in the subquery must be present in the WHERE clause of the outer query.

ACID Properties

ACID properties:

Atomicity: Transaction can be 0 or 1. All or None.

Eg: Suppose you are transferring funds from your account to another account,the transaction should be successful in such a way that the amount should be debited from your account and credited into another account.Then you can say the transaction is complete.It is not like the amount got debited from your account and not credited into another account or vice versa. In that case,the transaction gets rollback and transaction is unsuccessful.

Consistency:All the executed transactions data should be consistently maintained in the database.It's never left in half-finished state.

Isolation:All the transactions executes concurrently without any intervention of other transactions.

Durability: Database keeps track of pending changes in such a way that the server can recover from an abnormal termination.

Using SELF JOIN

Display the Employee Name and his Manager's Name from the below employee table:

Using SELF JOIN:
SELECT

WORKER.ENAME,MANAGER.ENAME
FROM EMP WORKER,EMP MANAGER
WHERE WORKER.MGR=MANAGER.EMPNO;

Let's take an example,empno=7499 is ALLEN.
His Manager Id is 7698.Taking this manager id and check the same in empno column and get the ename of that empid.

Calculate TOP 3 salaries from employee table

Employee Table:

Inline View:
SELECT SAL FROM
(SELECT DISTINCT SAL FROM EMP ORDER BY SAL DESC)
WHERE ROWNUM<=3

DENSE_RANK:

SELECT DISTINCT SAL,RANK FROM
(SELECT SAL,DENSE_RANK() OVER(ORDER BY SAL DESC) RANK FROM EMP )
WHERE RANK<=3

Friday, 11 July 2014

Using COUNT function properly

I have a table with a single column ID with the below values.Here the '-' indicates Null value.
What would be the output of :

SELECT COUNT(*),COUNT(ID),COUNT(1) FROM ID;

Using GROUP BY,HAVING,DISTINCT

Using GROUP BY,HAVING,DISTINCT

I have the data as below:

I need to get the employees who owns multiple cars and multiple distinct cars.

The above data says the cars owned by the employees in a company.An employee has multiple cars.It can be same car or different car.

Multiple Cars:

SELECT ID FROM CARS GROUP BY ID HAVING COUNT(*)>1;

--Gives 2,4,5,6

Multiple Distinct Cars:

SELECT ID FROM CARS GROUP BY ID HAVING COUNT(DISTINCT CAR)>1;

--Gives 2 and 4

Pages