Database and Warehouse Concepts

Monday, 29 June 2015

Pivoting a Result Set into One Row – Columns to row

Problem

You wish to take values from groups of rows and turn those values into columns in a single row per group. For example, you have a result set displaying the number of employees in each department:

DEPTNO	CNT
10	3
20	5
30	6

You would like to reformat the output such the result set looks as follows:

DEPTNO_10	DEPTNO_20	DEPTNO_30
3	5	6

Solution

Transpose the result set using a CASE expression and the aggregate function SUM:

select sum(case when deptno=10 then 1 else 0 end) as deptno_10,
sum(case when deptno=20 then 1 else 0 end) as deptno_20,
sum(case when deptno=30 then 1 else 0 end) as deptno_30
from emp

Discussion

This example is an excellent introduction to pivoting. The concept is simple: for each row returned by the unpivoted query, use a CASE expression to separate the rows into columns. Then, because this particular problem is to count the number of employees per department, use the aggregate function SUM to count the occurrence of each DEPTNO. If you’re having trouble understanding how this works exactly, execute the query with the aggregate function SUM and include DEPTNO for readability:

select deptno,
case when deptno=10 then 1 else 0 end as deptno_10,
case when deptno=20 then 1 else 0 end as deptno_20,
case when deptno=30 then 1 else 0 end as deptno_30
from emp
order by 1

DEPTNO	DEPTNO_10	DEPTNO_20	DEPTNO_30
10	1	0	0
10	1	0	0
10	1	0	0
20	0	1	0
20	0	1	0
20	0	1	0
20	0	1	0
30	0	0	1
30	0	0	1
30	0	0	1
30	0	0	1
30	0	0	1
30	0	0	1

You can think of each CASE expression as a flag to determine which DEPTNO a row belongs to. At this point, the “rows to columns” transformation is already done; the next step is to simply sum the values returned by DEPTNO_10, DEPTNO_20, and DEPTNO_30, and then to group by DEPTNO. Following are the results:

select deptno,
sum(case when deptno=10 then 1 else 0 end) as deptno_10,
sum(case when deptno=20 then 1 else 0 end) as deptno_20,
sum(case when deptno=30 then 1 else 0 end) as deptno_30
from emp
group by deptno

DEPTNO	DEPTNO_10	DEPTNO_20	DEPTNO_30
10	3	0	0
20	0	5	0
30	0	0	6

If you eyeball this result set, you see that logically the output makes sense; for example, DEPTNO 10 has 3 employees in DEPTNO_10 and zero in the other departments. Since the goal is to return one row, the last step is to lose the DEPTNO and GROUP BY, and simply sum the CASE expressions:

DEPTNO_10	DEPTNO_20	DEPTNO_30
3	5	6

Following is another approach that you may sometimes see applied to this same sort of problem:

select max(case when deptno=10 then empcount else null end) as deptno_10
max(case when deptno=20 then empcount else null end) as deptno_20,
max(case when deptno=10 then empcount else null end) as deptno_30
from (
select deptno, count(*) as empcount
from emp
group by deptno
) x

This approach uses an inline view to generate the employee counts per department. CASE expressions in the main query translate rows to columns, getting you to the following results:

DEPTNO_10	DEPTNO_20	DEPTNO_30
3	NULL	NULL
NULL	5	NULL
NULL	NULL	6

Then the MAX functions collapses the columns into one row:

DEPTNO_10	DEPTNO_20	DEPTNO_30
3	5	6

6750 H Model

Today, the Teradata Active Enterprise Data Warehouse platform is the 6750H Model which contains 12 Core 2.7Ghz Xeon processors, combinations of solid state and hard disk drives with a T-Perf rating of 240 per node.

DBC Queries

To find the number of nodes

Select count(distinct nodeid) from dbc.resusagescpu

To find the number of AMP's

Select nodeid,count(distinct Vproc) from dbc.ResCpuUsageByAmp

SELECT HASHAMP()+1

Friday, 26 June 2015

FastLoad Has Some Limits

There are more reasons why FastLoad is so fast. Many of these become restrictions and therefore, cannot slow it down. For instance, can you imagine a sprinter wearing cowboy boots in a race? Of course, not! Because of its speed, FastLoad, too, must travel light! This means that it will have limitations that may or may not apply to other load utilities. Remembering this short list will save you much frustration from failed loads and angry colleagues. It may even foster your reputation as a smooth operator!

Rule #1: No Secondary Indexes are allowed on the Target Table. High performance will only

allow FastLoad to utilize Primary Indexes when loading. The reason for this is that Primary (UPI and NUPI) indexes are used in Teradata to distribute the rows evenly across the AMPs and build only data rows. A secondary index is stored in a subtable block and many times on a different AMP from the data row. This would slow FastLoad down and they would have to call it: get ready now, HalfFastLoad. Therefore, FastLoad does not support them. If Secondary Indexes exist already, just drop them. You may easily recreate them after completing the load.

Rule #2: No Referential Integrity is allowed. FastLoad cannot load data into tables that are

defined with Referential Integrity (RI). This would require too much system checking to prevent

referential constraints to a different table. FastLoad only does one table. In short, RI constraints will need to be dropped from the target table prior to the use of FastLoad.

Rule #3: No Triggers are allowed at load time. FastLoad is much too focused on speed to pay attention to the needs of other tables, which is what Triggers are all about. Additionally, these require more than one AMP and more than one table. FastLoad does one table only. Simply ALTER the Triggers to the DISABLED status prior to using FastLoad.

Rule #4: Duplicate Rows (in Multi-Set Tables) are not supported. Multiset tables are tables that allow duplicate rows — that is when the values in every column are identical. When FastLoad finds duplicate rows, they are discarded. While FastLoad can load data into a multi-set table, FastLoad will not load duplicate rows into a multi-set table because FastLoad discards duplicate rows!

Rule #5: No AMPs may go down (i.e., go offline) while FastLoad is processing. The down AMP must be repaired before the load process can be restarted. Other than this, FastLoad can recover from system glitches and perform restarts.

Rule #6: No more than one data type conversion is allowed per column during a FastLoad.

Why just one? Data type conversion is highly resource intensive job on the system, which requires a "search and replace" effort. And that takes more time. Enough said!

How FastLoad Works

What makes FastLoad perform so well when it is loading millions or even billions of rows? It is because FastLoad assembles data into 64K blocks (64,000 bytes) to load it and can use multiple sessions simultaneously, taking further advantage of Teradata's parallel processing.

This is different from BTEQ and TPump, which load data at the row level.It takes full advantage of Teradata's parallel architecture. In fact, FastLoad will create a Teradata session for each AMP (Access Module Processor — the software processor in Teradata responsible for reading and writing data to the disks) in order to maximize parallel processing. This advantage is passed along to the FastLoad user in terms of awesome performance. Teradata is the only data warehouse loads data, processes data and backs up data in parallel.

Pages