Database and Warehouse Concepts

Friday, 26 June 2015

Space

Space (whether permanent, spool, or temporary) is measured in BYTES. The following table lists the three types of database space.


Type of Space	Description
Permanent (PERM)	Permanent space allocated to a user or database is a uniquely defined, logical repository for database objects, for example, tables, indexes, and journals. When an object is created or data rows are inserted, the system allocates space as needed from the PERM space of the immediate owner. The space is returned automatically when no longer needed. Note: Perm space allocated to a user or database that is currently unused is available for use as spool or temporary space. A database or user with no PERM space can still own views, macros, and triggers but cannot have objects that require space such as tables, UDFs, stored procedures, HIs, JIs, or journals.
Spool	Spool space holds intermediate query results or formatted answer sets to queries and volatile tables. The system can use unassigned PERM space for spool space. When a user creates new users or databases, the amount of spool space for those new objects must not exceed the spool space limit of their immediate owner. If you do not specify a spool space limit, a new user automatically inherits the spool limit of its parent. To more easily manage spool space, define the value for spool in a profile. You can then assign a profile to users and all of the users will inherit the spool space definition.
Temporary (TEMP)	Temp space defines the number of bytes the system will use to store data for global temporary tables. The value you specify must not exceed the value of the immediate parent at the time of creation. If you do not specify a value, the maximum value defaults to that of the parent. Note: Global temporary tables require both PERM and TEMP space. A minimum of 512 bytes of PERM space per AMP is required for the table header for the base global temporary table definition. Temp space is required for storing the actual rows. To more easily manage TEMP space, define the value for it in a profile. You can then assign a group of users to this profile and all of the users will inherit the TEMP space definition.

Problem Statement:

There are 3 users created under database which has PERM space 50 GB. Each user is allocated 5 GB PERM space & 20 GB Spool space.

Now, the PERM Space left in database- 35 GB.

I've created the another user for whom i allocated 20 GB PERM space.

Now, the PERM Space left in database- 15 GB ==> UNUSED space.

Even though i allocated 20 GB spool space to each user, if i submit a sql query which takes more than 15 GB should be allocated since the unused space is only 15 GB ??

Solution:

Spool memory = unused perm + spool assigned.

The particular user will have this much spool for his queries.

SPOOL is the currently not used perm space on system level, but not on user/database level.

The spool on a database level is only used for inheritance:

When there's no spool assigned for a new user he inherits the owner's spool size.

And when there's spool it may not exceed the owner's size.

Let us say if only 20 GB unused PERM space left at SYSTEM level. And there are 2 users in a system. If i run queries in both the users simultaneously and each is expected to take more than 15 GB spool space. In this scenario one query has to be aborted right?

The system will run out of spool and return a 2507 error: "Out of spool space on disk", this probably affects all active queries.

Thursday, 18 June 2015

Tracing Levels

Workflow and Task Status

Transformations that supports Sorted Input

1)Aggregator
2)Joiner
3)Look up

Stop vs Abort

Stop:
- The reading process will immediately be killed by using Stop command
- It does not have time out period

Abort:

- It gives a time out period of 60 seconds to the Informatica server.
- Otherwise it kills the DTM process.

List of output files during a session run

1. Session log
2. Badfile
3. Errors log
4. Workflow log

Mapping, Session, Worklet, Workflow, Mapplet

Mapping : Represents flow of data from source to destination

Session : A set of instructions which describes the data movement from source to destination

Worklet : Set of tasks

Workflow : A set of instructions that specifies the way to execute the tasks to Informatica

Mapplet : Used for creation and configuration of a group of transformations

factless fact table

A fact table with statistical information but without measures

It can view the number of occurring events

Example :
- A number of accidents occurred in a month.

ETL Architect Sample Questions

Depending on who's doing the interview and how serious they are:

-Describe advantages of the CIF architecture versus the bus architecture with conformed dimensions. Which would fit best in our environment given [some parameters they give you] and why
-Describe snowflaking
-Describe factless fact tables.
-Draw a star schema of our business
-Describe common optimization techniques applied at the data model level
-How do you handle data rejects in a warehouse architecture?
-Describe common techniques for loading from the staging area to the warehouse when you only have a small window.
-How do you load type 1 dimensions
-How do you load type 2 dimensions, and how would you load it given our [insert business particularity]
-How would you model unbalanced hierarchies
-How would you model cyclic relations
-What major elements would you include in an audit model?
-How would you implement traceability?

Wednesday, 17 June 2015

Load Duplicates and Unique values in separate targets

If the source contains Duplicates,then we need to separate all the duplicate and unique records

Source			TGT_DUP
col1	col2	col3	col1	col2	col3
a	b	c	a	b	c
a	b	c	m	n	o
d	e	f
m	n	o	TGT_UNIQUE
m	n	o	col1	col2	col3
			d	e	f

Sunday, 14 June 2015

Send Mail From Shell Script

Here we will see simple bash script to send emails using the mail command in linux operating system.

#!/bin/bash

TO_ADDRESS="recipient@domain.com"
FROM_ADDRESS="sender"
SUBJECT="Mail Server Hosting Demo"
BODY="This is a linux mail system. Linux is one of the email operating systems which can be used to send and receive emails."

echo ${BODY}| mail -s ${SUBJECT} ${TO_ADDRESS} -- -r ${FROM_ADDRESS}

We will enhance the above script to attach files, to read body from a file and specifying a list of users in CC. The enhanced mail script is shown below:

#!/bin/bash

TO_ADDRESS="recipient@domain.com"
FROM_ADDRESS="sender"
SUBJECT="linux mail send attachment example"
BODY_FILE="script.dat"
ATTACHMENT_FILE="logfile.txt"
CC_LIST="user1@gmail.com;user2@yahoomail.com;user3@earthlink.com;user4@cheetahmail.com"

uuencode ${ATTACHMENT_FILE} | mail -s ${SUBJECT} -c ${CC_LIST} ${TO_ADDRESS} -- -r ${FROM_ADDRESS} < ${BODY_FILE}

Tar Command

The syntax of tar command is

tar [options] [Archive file] [files list]

The options of tar command are:

c : creates a tar file.
v : verbose. Displays the files information.
f : Specify the tar file name.
r : updates the tar file with new files.
x : Extracts files from the archive (tar file).
t : view contents of tar file.
z : Specify the tar command to create a tar file using gzip in unix.
j : uses bzip2 to create the tar file.

1. Creating a tar file

Let see a sample example by archiving all the files in my current directory. The ls -l command displays the files and directories in the current directory.

> ls -l 
drwxr-xr-x 2 user group 4096 Aug  8 03:23 debian
-rw-r--r-- 1 user group  174 Aug  2 23:39 file
-rw-r--r-- 1 user group    0 Aug  8 03:22 linux_server.bat
-rw-r--r-- 1 user group   76 Aug  2 02:21 test.sh
-rw-r--r-- 1 user group    0 Aug  8 03:22 unix_distro

We see how to tar all these files using the -c option with the tar command. This is shown below:

> tar -cvf archive.tar *
debian/
file
linux_server.bat
test.sh
unix_distro

> ls
archive.tar  debian  file  linux_server.bat  test.sh  unix_distro

2. Printing the contents of tar file

We have created the tar file and we dont know whether it contains the actual files or not. To view the contents of the tar file use the -t option as

> tar -tvf archive.tar
drwxr-xr-x user/group   0 2012-08-08 03:23:07 debian/
-rw-r--r-- user/group 174 2012-08-02 23:39:51 file
-rw-r--r-- user/group   0 2012-08-08 03:22:19 linux_server.bat
-rw-r--r-- user/group  76 2012-08-02 02:21:32 test.sh
-rw-r--r-- user/group   0 2012-08-08 03:22:09 unix_distro

3. Updating the tar file with new contents.

You can add new files to the existing archive (tar) file using the -r option.

>touch red-hat-linux.dat

>tar -rvf archive.tar red-hat-linux.dat
red-hat-linux.dat

>tar -tvf archive.tar
drwxr-xr-x pcenter/pcenter   0 2012-08-08 03:23:07 debian/
-rw-r--r-- pcenter/pcenter 174 2012-08-02 23:39:51 file
-rw-r--r-- pcenter/pcenter   0 2012-08-08 03:22:19 linux_server.bat
-rw-r--r-- pcenter/pcenter  76 2012-08-02 02:21:32 test.sh
-rw-r--r-- pcenter/pcenter   0 2012-08-08 03:22:09 unix_distro
-rw-r--r-- pcenter/pcenter   0 2012-08-08 04:00:00 red-hat-linux.dat

4. Extracting the contents of tar file

In the first example, we have created the archive file. Now we will see how to extract the set of files from the archive. To extract the contents of the tar file use the -x option.

> tar -xvf archive.tar
debian/
file
linux_server.bat
test.sh
unix_distro

5. Creating compressed tar file

So far we have created a uncompressed tar file in the above examples. We can create a compressed tar file using the gzip

Compressing files using gzip

> tar -zcvf new_tar_file.tar.gz *

Find Command

How to find for a file using name?

find -name "sum.txt"
./bkp/sum.txt
./sum.txt

-i can be used for case insensitive file names.

How to remove files which contain the name "java".

find -name "*java*" -exec rm -r {} \;

This will delete all the files which have the word “java" in the file name in the current directory and sub-directories.

How to find for a file in the current directory only?

find -maxdepth 1 -name "sum.txt"

Sed Command

Sed is a Stream Editor used for modifying the files in unix (or linux). Whenever you want to make changes to the file automatically, sed comes in handy to do this. Most people never learn its power; they just simply use sed to replace text. You can do many things apart from replacing text with sed.

Consider the below text file as an input.

>cat file.txt
unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.

Sed Command Examples

1. Replacing or substituting string

Sed command is mostly used to replace the text in a file. The below simple sed command replaces the word "unix" with "linux" in the file.

>sed 's/unix/linux/' file.txt
linux is great os. unix is opensource. unix is free os.
learn operating system.
linuxlinux which one you choose.

Here the "s" specifies the substitution operation. The "/" are delimiters. The "unix" is the search pattern and the "linux" is the replacement string.

By default, the sed command replaces the first occurrence of the pattern in each line and it won't replace the second, third...occurrence in the line.

2. Replacing the nth occurrence of a pattern in a line.

Use the /1, /2 etc flags to replace the first, second occurrence of a pattern in a line. The below command replaces the second occurrence of the word "unix" with "linux" in a line.

>sed 's/unix/linux/2' file.txt
unix is great os. linux is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.

3. Replacing all the occurrence of the pattern in a line.

The substitute flag /g (global replacement) specifies the sed command to replace all the occurrences of the string in the line.

>sed 's/unix/linux/g' file.txt
linux is great os. linux is opensource. linux is free os.
learn operating system.
linuxlinux which one you choose.

4. Replacing from nth occurrence to all occurrences in a line.

Use the combination of /1, /2 etc and /g to replace all the patterns from the nth occurrence of a pattern in a line. The following sed command replaces the third, fourth, fifth... "unix" word with "linux" word in a line.

>sed 's/unix/linux/3g' file.txt
unix is great os. unix is opensource. linux is free os.
learn operating system.
unixlinux which one you choose.

Replacing string on a specific line number.

You can restrict the sed command to replace the string on a specific line number. An example is

>sed '3 s/unix/linux/' file.txt
unix is great os. unix is opensource. unix is free os.
learn operating system.
linuxlinux which one you choose.

Replacing string on a range of lines.

You can specify a range of line numbers to the sed command for replacing a string.

>sed '1,3 s/unix/linux/' file.txt
linux is great os. unix is opensource. unix is free os.
learn operating system.
linuxlinux which one you choose.

Here the sed command replaces the lines with range from 1 to 3.

Deleting lines.

You can delete the lines a file by specifying the line number or a range or numbers.

>sed '2 d' file.txt
>sed '5,$ d' file.txt

courtesy:folkstalk

Target Update Override

One can perform updates in the target in mapping using 2 approaches:

Using a UPDATE STRATEGY transformation

Configuring Session and defining properties to "Treat all rows as UPDATE"

But both these approach works only when there is a Primary Key defined in the target table. The session will fail when there will not be any key defined in the target.

Alternatively, if you want to perform TARGET updates, with putting condition on COLUMNs not included as PRIMARY KEY in target table, it will not be feasible with out changing the Keys on target table. i.e. If PRIMARY KEY is defined on COLUMN_A in TABLE_A and you want to Update TABLE_A with conditions on or including COLUMN_B

In such cases the UPDATE_OVERRIDE property of TARGET Object can be useful.

How to access UPDATE_OVERRIDE property?

Go to Mapping

Double Clink on the concerned TARGET or edit the Target

Click on Properties Tab

The second Transformation Attribute is the property we are looking for

Syntax for UPDATE_OVERIDE SQL:

UDATE <TARGET TABLE>
SET <COLUMN NAME to be updated> = :TU.<TARGET COLUMN PORT NAME (As in Designer)>

                   , [Other columns need to be updated]
WHERE      <COLUMN NAME to be treated as KEY > = :TU.<corresponding TARGET COLUMN PORT NAME (As in Designer)>
AND          [other conditions]

Example:

UPDATE EMPL_POST_HIST

SET POST = :TU.POST

, UPDATE_DATE = :TU.UPDATE_DATE

WHERE EMPL = :TU.EMPL

Following is an example showing Problem statement and step by step Instructions, of two approaches, of suggested solution.

Problem Scenario:

Target table has not defined Primary key

Informatica target object has no Keys - Since the Database table has not Keys the Imported Object in Informatica will not have any Key in the Target definition