ETL Performance optimaization & Aggrigate Cache in Informatica

Ratings:
(4)
Views:0
Banner-Img
  • Share this blog:

ETL Performance optimaization:

Target-Performance optimization:

Use the following techniches to improve the performance of data loading.

(1) Drop the Index beforeloading

(2) Increase commit interval

(3) Use bulk Loading

ETL Performance optimaization & Aggrigate Cache in Informatica

Source -Performance optimization:

Use the following techniques to improve the performance of data extraction.

  1. Create index on the source system
  2. Optimize th equery using hints [SQ override - filter]

Transformation - Performance optimization:

(1) Filter transformation:

Keep the filter T/R as close to the SQ as possible to filter the data carry in the data flow.

(2) Sequence generator transformation:

Configure the sequence generator transformation as close to the target as possible in the mapping.

Other wise mappings will be carrying extra sequence nols through the transformation pross which you not be transfer.

Ex:

ETL Performance optimaization & Aggrigate Cache in Informatica

Sorter transformation:

Keep the sorter transformation before an aggregation may improve the perforation over an order by cause in "SQL override"

Because the source data may not be tone with buffer size need for databse sort.

(4) Joiner trasformation:

Defining the source with the less no of the records as a master which occupy the less amount of memory in the cache.

--> Use sorted input

(5) aggregator T/R:

--> Use sorter input

--> Define incremental aggregation

--> Reduce the no of groups

(6) Expression transformation:

--> Create variable port to simple the complex expression

--> Use pipe operator ll rather than using concatenation function.

--> Use decode function rather than using nested if function.

(7) Router transformation:

Create a one router transformation rather than multiple filter T/R because a row is read once into input groups but evaluate multiple times based on no of groups. Whereas using filter T/R requires the same row data to be duplicated for filter transformation.

 

ETL Performance optimaization & Aggrigate Cache in Informatica

LOOKUP transformation:

Define SQL override to reduce no of records in the LOOKUP cache.

--> Use persistence LOOKUP cache

--> Use shared LOOKUP cache

(9) UPdate Strategy transformation:

Performance Consideration:

  1. The UPdate strategy T/R performance can vary depending on the number of UPdates and inserts. In some cases ther emay be a performance benefit to split a mapping with UPdates and inserts into mapping and sessions, one mapping with inserts and the other with UPdates.
If you want to enrich your career and become a professional in Informatica, then visit Tekslate - a global online training platform: "Informatica Training" This course will help you to achieve excellence in this domain.

Increment Aggregation - Aggregate cache:

How it works:

  1. i) There are two types of cache memory, index and data cache.
  2. ii) All rows are loaded into cache before any aggregation takes place.

iii) The index cache contains group by port values.

  1. iv) The data cache contains all ports values variables and connected output ports.
  2. v) Now group by input port used in non-aggregate output local variable ports.

-> ports containing aggregate function (multiple by three)

  1. vi) One out port rowa will be required for each unique occurrence of the group by ports.

An incremental aggregation is a process of calculating the summary's for only new records that pass through mapping.

At first run the integration service creates index cache which is saved with an extension ".Idx" , a data cahce which is saved with an extension ".det"

At second run the integration service performs following actions.

(a) For each input record the integration service checks historical information in the index cache for a corresponding group.

(b) If it finds the group then the integration service performs the aggregation incrementary.

(c) If it does not find the group , the integration service creates a new group and perform aggregation

An Incremental aggregation with aggregate cache improves the session performance.

Empno                  Ename                  SQL                   Deptno

101                           Sri                        4000                       10

102                          Sree                     3000                       20

103                          Ram                     2500                       30

104                        Shyam                 3500                       10

105                        Prasad                 5000                       20

106                          Ravi                      7000                       20

107                          Srinu                    5000                       10

ETL Performance optimaization & Aggrigate Cache in Informatica

Second Run:

Empno                  Ename                  Sal                 Deptno

101                        Srinu                   3000                    10

102                        Ram                    1500                    20

103                        Ravi                     3500                    30

ETL Performance optimaization & Aggrigate Cache in Informatica

Create source and target definations

Create a mapping with the Name m - Incremental- Agger

Drop the source and target definition

ETL Performance optimaization & Aggrigate Cache in Informatica

Create a session with the name S - m - Incremental - Aggregation

Double click the session select the properties tab

Attribute                             Value

Incrementals Aggregation

Select the mapping tab set the reader connection and writer

Connection click apply and click ok

From repository menu click on save

The following transformation are know castly transformation in the memory usage

  1. Rank transformation
  2. Joiner transformation
  3. Aggregator transformation
  4. Sorter transformation
  5. LOOKUP transformation

The Integration service uses a cache memory to process the transformation, there by it improve the performance.

Rank Cache:

An integration service uses a cache memory to process the rank transformations.

An integration service loads the data into the cache before performing the calculation.

Ex: Identifying top 4 sales

ETL Performance optimaization & Aggrigate Cache in Informatica

Process:

(1) The integration service loads first four records into the data cache.

ETL Performance optimaization & Aggrigate Cache in Informatica

(2) The integration service reads next record from the source, comparist with cache values. If the record is lower in the rank than cached rows it discards the row.

(3) If the row is higher in the rank than one of the cached values then integration service replaced the cached row with the higher ranked input row.

(4) Finally the data cache contains

Sorter Cache:

The integration service uses cache memory to process the sorter transformation.

The integration services pass all of the incoming data into sorter cache before peforming the sort operation.

If the amount of incoming data is greater than the cache size specific then the integration services will temperarly store the data in a sorter transformation work directory.

The integration service create sorter cache to store sort keys and data while the integration services sorts the data.

By default the integration service creates "1 memory cache" and "disk cache" to perform sort operation.

LOOKUP Cache - how it works:

There are 2 types of cache memory , index and data cache

  • All ports values from the LOOKUP table where the port is part of the LOOKUP condition are loaded into index cache.
  • The index cache contains all ports values from the LOOKUP table where the port is specified in the LOOKUP condition.
  • The datacache contains all ports values from the LOOKUP table that are not in LOOKUP condition and that are specifieed as "output" as ports.
  • After the cache of the index cache upon a match the ros from the cache are included in stream.

The following are the types of LOOKUP cache.

(1) static LOOKUP Cache:

It's a read only cache , this is the default cache created by integration service.

The integration service don't UPdate the cache while it process the LOOKUP transformation.

(2) Dynamic LOOKUP Cache:

The integration service UPdates the cache while it process the LOOKUP transformation.

Use dynamic LOOKUP cache when you perform LOOK on "target" The dynamic LOOKUP cache is used in implementing slowly changing dimension type1.

(3) Persistence LOOKUP Cache:

You can save the LOOKUP cache files and reuse them the next time the integration service process the LOOKUP transformation configure to use the cache.

It improves the session performance.

(4) Shared LOOKUP Cache:

You can shared the LOOKUP cache between multiple transformation, It improve the performance of mapping.

 

About Author
Authorlogo
Name
TekSlate
Author Bio

TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills in the market.