ETL Performance optimaization:
Use the following techniches to improve the performance of data loading.
(1) Drop the Index beforeloading
(2) Increase commit interval
(3) Use bulk Loading
Source -Performance optimization:
Use the following techniques to improve the performance of data extraction.
- Create index on the source system
- Optimize th equery using hints [SQ override - filter]
Transformation - Performance optimization:
(1) Filter transformation:
Keep the filter T/R as close to the SQ as possible to filter the data carry in the data flow.
(2) Sequence generator transformation:
Configure the sequence generator transformation as close to the target as possible in the mapping.
Other wise mappings will be carrying extra sequence nols through the transformation pross which you not be transfer.
Keep the sorter transformation before an aggregation may improve the perforation over an order by cause in "SQL override"
Because the source data may not be tone with buffer size need for databse sort.
(4) Joiner trasformation:
Defining the source with the less no of the records as a master which occupy the less amount of memory in the cache.
--> Use sorted input
(5) aggregator T/R:
--> Use sorter input
--> Define incremental aggregation
--> Reduce the no of groups
(6) Expression transformation:
--> Create variable port to simple the complex expression
--> Use pipe operator ll rather than using concatenation function.
--> Use decode function rather than using nested if function.
(7) Router transformation:
Create a one router transformation rather than multiple filter T/R because a row is read once into input groups but evaluate multiple times based on no of groups. Whereas using filter T/R requires the same row data to be duplicated for filter transformation.
Define SQL override to reduce no of records in the LOOKUP cache.
--> Use persistence LOOKUP cache
--> Use shared LOOKUP cache
(9) UPdate Strategy transformation:
- The UPdate strategy T/R performance can vary depending on the number of UPdates and inserts. In some cases ther emay be a performance benefit to split a mapping with UPdates and inserts into mapping and sessions, one mapping with inserts and the other with UPdates.
Increment Aggregation - Aggregate cache:
How it works:
- i) There are two types of cache memory, index and data cache.
- ii) All rows are loaded into cache before any aggregation takes place.
iii) The index cache contains group by port values.
- iv) The data cache contains all ports values variables and connected output ports.
- v) Now group by input port used in non-aggregate output local variable ports.
-> ports containing aggregate function (multiple by three)
- vi) One out port rowa will be required for each unique occurrence of the group by ports.
An incremental aggregation is a process of calculating the summary's for only new records that pass through mapping.
At first run the integration service creates index cache which is saved with an extension ".Idx" , a data cahce which is saved with an extension ".det"
At second run the integration service performs following actions.
(a) For each input record the integration service checks historical information in the index cache for a corresponding group.
(b) If it finds the group then the integration service performs the aggregation incrementary.
(c) If it does not find the group , the integration service creates a new group and perform aggregation
An Incremental aggregation with aggregate cache improves the session performance.
Empno Ename SQL Deptno
101 Sri 4000 10
102 Sree 3000 20
103 Ram 2500 30
104 Shyam 3500 10
105 Prasad 5000 20
106 Ravi 7000 20
107 Srinu 5000 10
Empno Ename Sal Deptno
101 Srinu 3000 10
102 Ram 1500 20
103 Ravi 3500 30
Create source and target definations
Create a mapping with the Name m - Incremental- Agger
Drop the source and target definition
Create a session with the name S - m - Incremental - Aggregation
Double click the session select the properties tab
Select the mapping tab set the reader connection and writer
Connection click apply and click ok
From repository menu click on save
The following transformation are know castly transformation in the memory usage
- Rank transformation
- Joiner transformation
- Aggregator transformation
- Sorter transformation
- LOOKUP transformation
The Integration service uses a cache memory to process the transformation, there by it improve the performance.
An integration service uses a cache memory to process the rank transformations.
An integration service loads the data into the cache before performing the calculation.
Ex: Identifying top 4 sales
(1) The integration service loads first four records into the data cache.
(2) The integration service reads next record from the source, comparist with cache values. If the record is lower in the rank than cached rows it discards the row.
(3) If the row is higher in the rank than one of the cached values then integration service replaced the cached row with the higher ranked input row.
(4) Finally the data cache contains
The integration service uses cache memory to process the sorter transformation.
The integration services pass all of the incoming data into sorter cache before peforming the sort operation.
If the amount of incoming data is greater than the cache size specific then the integration services will temperarly store the data in a sorter transformation work directory.
The integration service create sorter cache to store sort keys and data while the integration services sorts the data.
By default the integration service creates "1 memory cache" and "disk cache" to perform sort operation.
LOOKUP Cache - how it works:
There are 2 types of cache memory , index and data cache
- All ports values from the LOOKUP table where the port is part of the LOOKUP condition are loaded into index cache.
- The index cache contains all ports values from the LOOKUP table where the port is specified in the LOOKUP condition.
- The datacache contains all ports values from the LOOKUP table that are not in LOOKUP condition and that are specifieed as "output" as ports.
- After the cache of the index cache upon a match the ros from the cache are included in stream.
The following are the types of LOOKUP cache.
(1) static LOOKUP Cache:
It's a read only cache , this is the default cache created by integration service.
The integration service don't UPdate the cache while it process the LOOKUP transformation.
(2) Dynamic LOOKUP Cache:
The integration service UPdates the cache while it process the LOOKUP transformation.
Use dynamic LOOKUP cache when you perform LOOK on "target" The dynamic LOOKUP cache is used in implementing slowly changing dimension type1.
(3) Persistence LOOKUP Cache:
You can save the LOOKUP cache files and reuse them the next time the integration service process the LOOKUP transformation configure to use the cache.
It improves the session performance.
(4) Shared LOOKUP Cache:
You can shared the LOOKUP cache between multiple transformation, It improve the performance of mapping.
For indepth understanding of Informatica, click on
- Rank Transformation in Informatica
- Aggregator Transformation in Informatica
- Expression Transformation in Informatica
- Lookup Transformation in Informatica
- ETL Project Architecture
- Interview Questions