
ETL Project Architecture in Informatica

ETL Project Architecture:

There are two stages defined in the current project architecture:

  1. ETL Stage1
  2. ETL Stage2

Data Profiling (ODS):

Data profiling is the process of studying and analyzing source data. We can detect records with NULL values, duplicate records, inconsistent data, and data definition issues.
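As a rough illustration only (the sample rows and field names below are hypothetical, not from the project), the two most common profiling checks — NULL values and duplicate keys — can be sketched in Python:

```python
# Minimal data-profiling sketch: find rows with NULL values and rows
# whose key column duplicates an earlier row.
def profile(rows, key):
    null_rows = [r for r in rows if any(v is None for v in r.values())]
    seen, duplicates = set(), []
    for r in rows:
        if r[key] in seen:
            duplicates.append(r)
        seen.add(r[key])
    return null_rows, duplicates

# Illustrative sample data
rows = [
    {"empno": 1, "ename": "RAM"},
    {"empno": 2, "ename": None},   # NULL value
    {"empno": 1, "ename": "RAM"},  # duplicate key
]
nulls, dups = profile(rows, "empno")
```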


Key Points – Sequence Generator:

1) Start Value — first value of the sequence

2) Increment By — amount to increment at each iteration

3) End Value — highest value to use

4) Cycle — if checked, the Sequence Generator returns to the start value when the end value is reached; otherwise it stops.

5) Number of Cached Values — enables storing multiple values when the same Sequence Generator is used in multiple sessions simultaneously

6) Reset — if checked, each session returns to the start value; otherwise each new session continues from the last stored value.

7) Tracing Level — level of detail to be written to session logs
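The interplay of the properties above can be sketched in a few lines of Python. This is an illustrative model of the semantics, not Informatica's implementation:

```python
# Sketch of Sequence Generator semantics: Start Value, Increment By,
# End Value, and Cycle behave as described in the key points above.
class SequenceGenerator:
    def __init__(self, start=1, increment=1, end=10, cycle=False):
        self.start, self.increment = start, increment
        self.end, self.cycle = end, cycle
        self.current = start  # Start Value

    def next_val(self):
        if self.current > self.end:
            if not self.cycle:
                raise StopIteration("end value reached")
            self.current = self.start  # Cycle: return to start value
        value = self.current
        self.current += self.increment  # Increment By
        return value

seq = SequenceGenerator(start=1, increment=1, end=3, cycle=True)
values = [seq.next_val() for _ in range(5)]  # 1, 2, 3, then cycles: 1, 2
```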

Design a Mapping to perform Round robin Loading:
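Round-robin loading distributes rows evenly across multiple target instances in turn. A minimal Python sketch of the idea, with plain lists standing in for the target instances:

```python
# Round-robin loading sketch: row i goes to target (i mod N),
# so each of the N targets receives roughly the same number of rows.
def round_robin_load(rows, num_targets):
    targets = [[] for _ in range(num_targets)]
    for i, row in enumerate(rows):
        targets[i % num_targets].append(row)
    return targets

# Nine rows spread across three targets
t1, t2, t3 = round_robin_load(list(range(1, 10)), 3)
```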


Scenario 2:

Workflow Scenario:

A workflow has 5 sessions. Run the first 4 sessions in parallel; if all four sessions succeed, then execute the 5th session.

Hint: Use a link condition

Dynamic LOOKUP Cache:

The Integration Service inserts records and updates records in the cache. Use a dynamic LOOKUP cache for eliminating duplicates or for implementing Slowly Changing Dimensions Type 1.

The dynamic LOOKUP cache is used when you perform a LOOKUP on the target table.

The dynamic LOOKUP transformation allows for the synchronization of the dynamic target LOOKUP table image in memory with its physical table in the database.

New LOOKUP row:

NewLookupRow          Description

0                     The Integration Service does not update or insert the row in the cache

1                     The Integration Service inserts the row into the cache

2                     The Integration Service updates the row in the cache

Key Points:

The LOOKUP transformation's Associated Port matches a LOOKUP input port with the corresponding port in the LOOKUP cache.

"Ignore Null inputs for updates" should be checked for any port where a NULL input should not overwrite the cached value.

"Ignore in comparison" should be checked for any port that is not to be compared.

The NewLookupRow flag indicates the type of row manipulation of the cache. If an input row creates an insert in the LOOKUP cache, the flag is set to "1"; if an input row creates an update of the LOOKUP cache, the flag is set to "2"; if no change is detected, the flag is set to "0". A Filter or Router transformation can be used with an Update Strategy to set the proper row flag to update a target table.
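The NewLookupRow logic can be modeled in Python. This is a conceptual sketch, with a dict keyed on the lookup condition (empno) standing in for the cache; the sample rows are illustrative:

```python
# Sketch of dynamic LOOKUP cache behavior and the NewLookupRow flag:
# 0 = no change detected, 1 = row inserted into cache, 2 = row updated.
def lookup_dynamic(cache, row, key="empno"):
    k = row[key]
    if k not in cache:
        cache[k] = dict(row)
        return 1              # insert into cache (new record flow)
    if cache[k] != row:
        cache[k] = dict(row)
        return 2              # update the cache (update flow)
    return 0                  # duplicate; can be filtered out downstream

cache = {}
flags = [lookup_dynamic(cache, r) for r in [
    {"empno": 7369, "sal": 800},
    {"empno": 7369, "sal": 800},   # exact duplicate -> flag 0
    {"empno": 7369, "sal": 950},   # changed row     -> flag 2
]]
```

A Router downstream would then route flag 1 rows to an insert flow and flag 2 rows to an update flow, as the procedure below does.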

Procedure:

Source definition: emp

Create a target definition with the name emp_DC:

Empkey + all 8 columns


Create a mapping with the name m_employee_dynamic_cache.

  •  Drop the source definition emp
  •  Drop the target definition as two instances
  •  Create a transformation of type LOOKUP which performs a LOOKUP on the target table (emp_DC)

From SQ_emp copy all the ports to the LOOKUP transformation.

Double click on the LOOKUP transformation and select the Condition tab:

LOOKUP Table Column       Operator       Transformation Port

empno                     =              EmpNo

Select the properties tab

Transformation Attribute          Value

Connection Information            BSR_Writer

Dynamic LOOKUP Cache              (checked)

Insert Else Update                (checked)

Select the Ports tab.

For the port empkey select the data type Integer.

Port Name                                          Associated Port

Ename                                                  Ename1

Job                                                         Job1

MGR                                                      MGR1

Hiredate                                              Hiredate1

sal                                                           sal1

comm                                                   comm1

Deptno                                                 Deptno1

Click apply and click ok

Create a transformation of type Router.

From the LOOKUP transformation copy the following ports to the Router transformation (NewLookupRow, empkey, empno, …, deptno).

Double click the Router transformation and select the Groups tab:

Group Name                    Group Filter Condition

New                           NewLookupRow = 1

Update                        NewLookupRow = 2

Click apply and click Ok

New Record Flow:

Create a transformation of type Update Strategy.

From the New output group copy the ports (except NewLookupRow) to the Update Strategy transformation and develop the expression DD_INSERT.

From the Update Strategy transformation connect the ports to the target.

Update Flow:

Create a transformation of type Update Strategy.

From the Update output group copy the ports to the Update Strategy transformation and develop the expression DD_UPDATE.

From the Update Strategy transformation connect the ports to the target.

UnConnected stored procedure – Drop and Create Index:

Create the following two stored procedures in the target database account.

Create or Replace procedure Emp_Create_Index
(
  V_Table_Name in varchar2,
  V_Index_Col_Name in varchar2,
  V_Index_Name in varchar2
)
as
begin
  Execute Immediate
    'Create Index ' || V_Index_Name || ' on ' || V_Table_Name ||
    '(' || V_Index_Col_Name || ')';
end;
/

Procedure 2:

Create or Replace procedure Emp_Drop_Index
(
  V_Index_Name in varchar2
)
as
begin
  Execute Immediate 'Drop Index ' || V_Index_Name;
end;
/

Setting Up the Target Table Definition With an Index:

SQL> Create table emp_TGT as select * from scott.emp where 1=2;

Table created.

SQL> Select INDEX_NAME from USER_INDEXES where TABLE_NAME = 'EMP_TGT';

SQL> Create Index EMP_IDX on EMP_TGT(EMPNO);

Mapping Designing:

Create a source definition with the name emp.

Create a target definition with the name emp_TGT [using the Target Designer tool].

Create a mapping with the name m_drop_create_Index.

Drop the source and target definitions.

From the Source Qualifier connect the ports to the target.

Create two Stored Procedure transformations with the names SP_Drop_Index and SP_Create_Index.

Double click on the Stored Procedure transformation with the name SP_Drop_Index.

Select the properties tab

Transformation Attribute          Value

Stored Procedure Type             Target Pre Load

Call Text                         emp_drop_index('EMP_IDX')

Connection Information            BSR_Writer

Click Apply and click Ok

Double click on the Stored Procedure transformation with the name SP_Create_Index.

Select the Properties tab:

Transformation Attribute          Value

Stored Procedure Type             Target Post Load

Call Text                         emp_create_index('EMP_TGT', 'EMPNO', 'EMP_IDX')

Connection Information            BSR_Writer

Click Apply and Click Ok

UnConnected Stored Procedure – Returning Multiple Output Ports:

Execute the following stored procedure in the source database account (username scott):

SQL> Create or Replace procedure Emp_Proc_Var
(
  V_EmpNo IN Number,
  TOTALSAL OUT Number,
  TAX OUT Number,
  HRA OUT Number
)
as
begin
  SELECT SAL + NVL(COMM, 0), SAL * 0.1, SAL * 0.4
  INTO TOTALSAL, TAX, HRA
  FROM EMP WHERE EMPNO = V_EmpNo;
end;
/

Create a source definition with the name emp.

Create a target definition with the name emp_prc_var:

EmpNo, EName, Job, Sal, TOTALSAL, TAX, HRA, Deptno

Create a mapping with the name m_prc_var.

Drop the source and target definitions.

Create transformations of type Stored Procedure and Expression.

From the Source Qualifier copy the required ports to the Expression transformation.

Double click on the Expression transformation and select the Ports tab:

Port Name     Data Type     Precision     Scale     I/O/V     Expression

V_TAX         Decimal       7             2         V

V_HRA         Decimal       7             2         V

TOTALSAL      Decimal       7             2         O         :SP.EMP_PROC_VAR(EmpNo, PROC_RESULT, V_TAX, V_HRA)

TAX           Decimal       7             2         O         V_TAX

HRA           Decimal       7             2         O         V_HRA

Click Apply and Click Ok

From the Expression transformation connect the ports to the targets.

From the Repository menu click on Save.

SHORTCUTS:

–> A shortcut is a reusable component.

–> To create a shortcut, first we have to create a sharable folder.

–> Open the destination folder.

–> Drag the object (source, target, mapping) from the shared folder into the destination folder, i.e., into the Source Analyzer or Mapping Designer.

–> You can create a shortcut to a shared folder in the same repository.

–> When you create a shortcut in the same repository it is known as a local shortcut.

Creation of Sharable Folder:

Open the client Repository Manager; from the Folder menu select Create and enter the folder name AXEDW (any name).

Select Allow Shortcut [✓] and click OK.

Open the client PowerCenter Designer.

Activate the sharable folder (AXEDW).

Create a source definition with the name emp in the Source Analyzer tool.

Activate the destination folder where we want the shortcut.

From the sharable folder drag the source definition emp, drop it on the Source Analyzer workspace which belongs to the destination folder, and click Yes.

Version Control:

–> By using version control we maintain the history of the metadata objects.

–> A versioned repository stores multiple versions of an object.

–> Each version is a separate object with a unique number.

–> You can perform the following change management tasks to create and manage multiple versions of objects in the repository:

  1. Checkin
  2. Check out

Check in:

  •  You must save an object before you can check it in.
  •  When you check in an object the repository creates a new version of the object and assigns it a version number.
  •  The repository increments the version number when you check in an object.

Check Out:

  •  To edit an object you must check out the object.
  •  When you check out an object the repository obtains a write-intent lock on the object.
  •  No other users can edit the object while you have it checked out.

Check in:

Select the mapping m_customer_dimension, right click on the mapping, select Versioning, and click on Check In.

Name:          Date:          Goal:

Click on Apply All.

Check out:

Select the mapping m_customer_dimension, right click on the mapping, select Versioning, and click on Check Out.

Enter comments.

Click on Apply to All.

  • From the Versions menu select the two versions of the mapping, right click, select Compare, then Selected Versions.

Click on Save File, enter the file name 4pm_compare, and click on Save.

Partition Points Overview:

–> Partition points mark the boundaries between threads in a pipeline; the Integration Service redistributes rows of data at partition points.

–> You can edit partition points to increase the number of transformation threads and increase session performance.

Types of Partition:

  1. Key range partition
  2. Pass through partition
  3. Round robin partition
  4. Hash partition
  5. Database partition

1) Key Range Partition:

–> With key range partitioning, the Integration Service distributes rows of data based on a port that you define as the partition key.

–> For each partition you define a range of values.

–> The Integration Service uses the key and ranges to send rows to the appropriate partition.
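The routing rule can be sketched in Python. This is a conceptual model only (the SAL values and ranges are illustrative, not the ones configured later in the procedure):

```python
# Key range partitioning sketch: each partition owns a half-open range
# [low, high) of the partition key; a row goes to the partition whose
# range contains its key value.
def key_range_partition(rows, key, ranges):
    partitions = [[] for _ in ranges]
    for row in rows:
        for i, (low, high) in enumerate(ranges):
            if low <= row[key] < high:
                partitions[i].append(row)
                break  # first matching range wins
    return partitions

rows = [{"sal": 800}, {"sal": 1600}, {"sal": 3000}]
p1, p2, p3 = key_range_partition(
    rows, "sal", [(0, 1000), (1000, 2000), (2000, 5000)]
)
```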

(1) How many repositories are created in real time?

Ans: minimum 3, maximum 5

  1. Development
  2. Testing (QA)
  3. Pre-Production
  4. Production

Procedure:

–> Create a mapping with the name m_emp_partition.

–> Drag and drop your source and target emp_partition to the Mapping Designer.

–> Copy all the columns to SQ_emp and connect to the target.

–> Double click on the session and click on the Mapping tab.

–> From the left pane click on the Partitions tab (below).

–> Select the Source Qualifier (SQ_emp) and click on Edit Partition Point.

–> Select the partition type Key Range.

Name                                             Destination

Partition 1

Partition 2

Partition 3

Click on Ok

Click on Edit Keys, select the SAL column, click on Add, and click OK.

Click on the target emp_partition and click on Edit Partition Point.

Select the partition type Key Range and click on OK.

Select the SAL column and click on Add.

Click OK and save to the repository.

Business Requirements:

Scenario 1:

Client Name                       BSR(customer Name)

MARKET                               HYD

Product                                  IBMshare

Client order                        5000

Account                               RAM(A person from Bank of America)

Client allocation                500

Client execution                 500

Pending Amount              0

Scenario 2:

Client Name

MARKET                             HYD

Product                                IBMshare

Client order                        1000

Account                               RAM(A person from Bank of America)

Client allocation                1000

Client execution               700

Pending Amount              300

Scenario 3:

Multiple client orders and a single market order


Scenario 4:

Designing Database from Requirements:

The datamart is designed with the following dimension tables and fact table.

DM_T_DATE_DIM                                                           * ERwin is a data modeling tool

DM_T_MARKET_DIM                                                    * Toad is a database tool

DM_T_PRODUCT_DIM

DM_T_ACCOUNT_DIM

DM_T_EMPLOYEE_DIM

TRANSACTION_DETAIL_FACT

ETL Stage 1 Implementation:

The source is defined with flat files; the following delimited flat files provide the data for extraction.

Accounts.Txt

Market.Txt

Product.Txt

Client_Allocation.Txt

Client_Execution.Txt

Client_Order.Txt

Employee.Txt


Defining Staging Database:

Define the staging database as a target to perform the data profiling.

Create the following tables in a staging database account:

List of Tables:

T_Product

T_Account

T_Employee

T_Market

Client_Order

Client_Allocation

Client_Execution

T_date

ETL Stage 1 Mappings:

Design simple pass-through mappings which migrate the data from source to staging.

MT_stg_Account_Flatfile_Ora(source to stage)

MT_stg_Client_Allocation_Flatfile_Ora(source to stage)

MT_stg_Client_Execution_Flatfile_Ora(source to stage)

MT_stg_Client_Order_Flatfile_Ora(source to stage)

MT_stg_Employee_Flatfile_Ora(source to stage)

MT_stg_Market_Flatfile_Ora(source to stage)

MT_stg_Product_Flatfile_Ora(source to stage)

Defining Source metadata:

Logon to Oracle with the user account system.

–> Create a user account with the name SRC.

–> Execute the following source table structures:

Product_SRC

Account_SRC

Employee_SRC

Market_SRC

Client_Order_SRC

Client_Allocation_SRC

Client_Execution_SRC

Create an ODBC connection with the name SRC_odbc.

Import the source table definitions from the user account SRC.

Convert the table definitions to flat files.

Defining Staging Metadata:

Logon to Oracle with the user account system.

Create a user account stg.

Execute the following table structures:

T_Product

T_Account

T_Employee

T_Market

Client_Order

Client_Allocation

Client_Execution

T_date

Create the following stored procedure to populate the data into the stage table T_date:

SQL> Create or Replace procedure date_stg_proc
as
  v_start_date date := to_date('01-01-2007', 'dd-mm-yyyy');
begin
  for i in 1..2000
  loop
    insert into t_date values (v_start_date);
    v_start_date := v_start_date + 1;
  end loop;
end;
/

Execute the above stored procedure in the staging database:

SQL> Exec date_stg_proc;

SQL> commit;

ETL Stage 2 Implementation:

Source System:

Define the staging database source system with the following tables:

T_Account

T_Product

T_Market

T_Employee

Client_Order

Client_Execution

T_Date

Defining The Target System:

Logon to Oracle with the user account system.

SQL> Create user CDM identified by CDM;

SQL> Grant DBA to CDM;

SQL> Connect CDM/CDM

Execute the following tables:

DM_T_Account_DIM

DM_T_Account_DIM_EXCEP

DM_T_DATE_DIM

DM_T_EMPLOYEE_DIM

DM_T_EMPLOYEE_DIM_EXCEP

DM_T_Product_DIM

DM_T_Product_DIM_EXCEP

DM_T_MARKET_DIM

DM_T_MARKET_DIM_EXCEP

TRANSACTION_DETAIL_FACT

Create the ODBC connection with the name CDM_ODBC.

Import the target table definitions from the CDM user account.

–> For the product and exception product tables, make the product key and exception key the first column, and generate the SQL.
