Database and Warehouse Concepts

Sunday, 14 June 2015

PMCMD Command Usage in Informatica

Informatica provides four built-in command line programs or utilities to interact with the informatica features. They are:

infacmd
infasetup
pmcmd
pmrep

This article covers only about the pmcmd command. The pmcmd is a command line utility provided by the informatica to perform the following tasks.

Start workflows.
Start workflow from a specific task.
Stop, Abort workflows and Sessions.
Schedule the workflows.

How to use PMCMD Command in Informatica:

1. Scheduling the workflow

The pmcmd command syntax for scheduling the workflow is shown below:

pmcmd scheduleworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

You cannot specify the scheduling options here. This command just schedules the workflow for the next run.

2. Start workflow

The following pmcmd command starts the specified workflow:

pmcmd startworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

3. Stop workflow

Pmcmd command to stop the infromatica workflow is shown below:

pmcmd stopworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

4. Start workflow from a task

You can start the workflow from a specified task. This is shown below:

pmcmd startask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name -startfrom task-name

5. Stopping a task.

The following pmcmd command stops the specified task instance:

pmcmd stoptask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name task-name

courtesy:folkstalk

6. Aborting workflow and task.

The following pmcmd commands are used to abort workflow and task in a workflow:

pmcmd abortworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

pmcmd aborttask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name task-name

Informatica Workflow Successful : No Data in target !

Stop On Errors:

Indicates how many non-fatal errors the Integration Service can encounter before it stops the session. Non-fatal errors include reader, writer, and DTM errors. Enter the number of non-fatal errors you want to allow before stopping the session. The Integration Service maintains an independent error count for each source, target, and transformation. If you specify 0, non-fatal errors do not cause the session to stop.
Optionally use the $PMSessionErrorThreshold service variable to stop on the configured number of errors for the Integration Service.

In Oracle, it is the infamous “when others then null” .

a) Having “1” in your default session config : Fail the session on the first non-fatal error.
b) Over-write the session Configuration details and enter the “Stop On Errors” to “1” or a fixed number.
c) Use the $PMSessionErrorThreshold variable and set it at the integration service level. You can always override the variable in the parameter file.

/

In Depth:Unconnected LookUp in Informatica.

What is the absolute necessity of Unconnected lookup, if same functionality can be achieved by Connected Lookup Transformation?

The main advantage of using an unconnected lookup is the fact that you can use it in an expression/any other transformation like a “function” in other programming languages. The Lookup Cache is calculated only once as part of the session initialization and hence more efficient, since all subsequent look-ups will re-use the Cache.

a)When you are looking up against a Static dimension (or any table) that is rarely ever updated

The most common example for this is the Date Dimension. Usually loaded once when your Data Warehouse goes to Production and very rarely updated.What makes it even more appealing to use unconnected lookup for Date Dimension is the Date Dimension Role Playing.
Example, An online order can have Order Date, Ship Date, Cancelled Date, Recieved Date and so on and the same Date Dimension table plays multiple roles.
Without an unconnected Lookup, here’s how your mapping would look..Also note that the Lookup Is being done on the same table internally (DATE_DIM), but the cache is being calculated one for each lookup.

Using an unconnected Lookup For Date Dimension, this is how it would be transformed..

As you can see, this promotes greater reuse ,a less complex mapping and is more efficient becuase of lesser Caching.

b) When you are looking up against a Base/Conformed Dimension that is loaded before any of the Facts are loaded.

Another example (one which is not static Data) is looking up against any customer Master Data/Conformed Dimension. Your DataWarehouse can have many Confirmed Dimensions, which are loaded before any of the Fact Loads.
One such example is the Employee Dimension. You can lookup the employee key multiple times in a given mapping and this makes a great candidate for Unconnected Lookup.

c) The logic used in the Lookup-override is required at a lot of other places.
This is more of a good coding practise than an Informatica specific “tip”. If your lookup involves a lookup override and you calculate the actual fact using a formula instead of using a direct column value, it would make sense to get the result using an unconnected lookup and use it at multiple places using a :LKP expression.

One such example is when you want to Lookup Revenue based on the Status of the Order instead of a direct Lookup, and you have a look-up override like the one below.

1

2

3

4

5

6

Select order_id,

       (case when order_status = 'Pending' then revenue*0.7

            when order_status = 'COnfirmed' then revenue

            when order_status = 'Cancelled' then -1*revenue

        end) revenue           

  from order_revenue

So if your lookup data is constant during your load and you have a good case for reuse, Unconnected Lookup can be very useful, both in terms of maintainability and efficiency.

Courtesy:ETL Developer

Unconnected Lookup Scenario

Lets say you have 50 million records coming from the source out of which you only want to look up for some 1 million records based on some condition then at this kind of scenario go for unconnected lookup

In unconnected lookup's, its not necessary that we call it for every row. Because its unconnected, and not connected to the data flow pipeline, we can choose to call when it meets certain conditions.

This way we can use the unconnected lookup more efficiently

Active Lookup Transformation

Active Lookup Transformation Restrictions:

We cannot return multiple rows from an unconnected Lookup transformation
We cannot enable dynamic cache for a Active Lookup transformation.
Active Lookup Transformation that returns multiple rows cannot share a cache with a similar Passive Lookup Transformation that returns one matching row for each input row.

What is Degenerate Dimension ?

A Degenerate dimension is a Dimension which has only a single attribute. This dimension is typically represented as a single field in a fact table.The data items that are not facts and data items that do not fit into the existing dimensions are termed as Degenerate Dimensions. Degenerate Dimensions are used when fact tables represent transaction data.They can be used as primary key for the fact table but they cannot act as foreign keys.

Lookup Cache Types in Informatica

To boost the performance of Lookup Transformation , we mention to cache the lookup data , so it can directly look into cache file instead of connecting to source and then reading data from there for each row -

Lookup can generate a cache file which persists for the duration of the session, or even created as a permanent named cache. This means that the data required for the lookup is read from its source only once. Informatica creates an index on the lookup cache minimizing the processing time for calls to the cache.

Static Lookup :

In case of static cache ,the Integration Service does not update the cache while it processes the transformation. This is why it is called as Static. In Static Cache when the Lookup condition is true it return value from lookup table else returns Null or Default value. In Static Cache the important thing is that you cannot insert or update the cache.

Static Cache is normally used while referring to reference Data or when Data source is not a target for the mapping.

Example : Referring to Dept table to get Dept Name based on the DEPT no for Each employee.

Dynamic Lookup:

In Dynamic Cache we can insert or update rows in the cache when we pass the rows. Normally while using Target as the lookup source to identify if new record is already there in Target or not we use this Dynamic cache.When a new record comes and it is not present in Cache (means missing in Target as well) , then it will insert this new record in Cache as well . So the dynamic cache is synchronized with the target with each processed row.

Example : If there are 2 or more entries of the same customer on the same day in Source systems then while loading to the target we want the Lookup Cache to be refreshed dynamically(Insert/Update) and see the latest data in Cache.

Non Persistent Cache:

By default, the Informatica Server uses a non-persistent cache when you enable caching in a Lookup transformation. The Informatica Server deletes the cache files at the end of a session. The next time you run the session, the Informatica Server builds the memory cache from the database

Persistent Cache:

Persistent lookups are used when there is a need to save and reuse the existing cache files. Persistent cache is used when a lookup table does not change between session runs. The first time the Informatica Server runs a session using a persistent lookup cache; it saves the cache files to disk instead of deleting them on session completion. The next time the Informatica Server runs the session which calls the same persistent lookup, it builds the memory cache from the cache files, eliminating the time required to read the lookup table. If the lookup table changes occasionally, you can override session properties to recache the lookup from the database.

Advantages:

Informatica Server uses existing cache files for subsequent lookups, eliminating the time required to built the cache again
Cache file can be shared
Persistent lookup cache can be used for same multiple lookup call with in the same mapping / different mappings

Disadvantages:

Cache needs to be rebuilt if lookup table changes occasionally
Cache needs to be rebuilt if lookup transformation is changed or data movement code is changed or database connection information is changed
Lookup SQL override, has to be same in all lookups using same persistent cache file, else the mapping will fail

Note: In case there are multiple interdependent jobs in a workflow, which uses same persistent lookup cache, it is recommended to have dummy jobs for persistent cache file generation in the beginning of workflow. This eases the support and maintenance and jobs execution time. As a practice, please ensure to delete the existing cache file prior to re-building the cache. Informatica internally handles deletion of existing cache files; it is still preferred to have the UNIX script for deletion. (For e.g.: In case session for building lookup cache is disabled or is not executed, then the subsequent session will use the same cache file available on the server. The addition of UNIX scripts job will ensure files are deleted; this can be done through command task

Courtesy:Techtricks

Pages