
ETL Project Architecture in Informatica

ETL Project Architecture:

There are two stages defined in the current project architecture:

  1. ETL Stage1
  2. ETL Stage2

Data Profiling (ODS):

Data profiling is the process of studying and analyzing source data. We can detect records with NULL values, duplicate records, inconsistent data, and data definition issues.
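As a rough illustration only (the sample rows and field names below are hypothetical, not from the project), the two most common profiling checks — NULL values and duplicate keys — can be sketched in Python:

```python
# Minimal data-profiling sketch: find rows with NULL values and rows
# whose key column duplicates an earlier row.
def profile(rows, key):
    null_rows = [r for r in rows if any(v is None for v in r.values())]
    seen, duplicates = set(), []
    for r in rows:
        if r[key] in seen:
            duplicates.append(r)
        seen.add(r[key])
    return null_rows, duplicates

# Illustrative sample data
rows = [
    {"empno": 1, "ename": "RAM"},
    {"empno": 2, "ename": None},   # NULL value
    {"empno": 1, "ename": "RAM"},  # duplicate key
]
nulls, dups = profile(rows, "empno")
```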


Key Points – Sequence Generator:

1) Start Value — first value of the sequence

2) Increment By — amount to increment at each iteration

3) End Value — highest value to use

4) Cycle — if checked, the Sequence Generator returns to the start value when the end value is reached; otherwise it stops.

5) Number of Cached Values — enables storing multiple values when the same Sequence Generator is used in multiple sessions simultaneously

6) Reset — if checked, each session returns to the start value; otherwise each new session continues from the last stored value.

7) Tracing Level — level of detail to be written to session logs
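The interplay of the properties above can be sketched in a few lines of Python. This is an illustrative model of the semantics, not Informatica's implementation:

```python
# Sketch of Sequence Generator semantics: Start Value, Increment By,
# End Value, and Cycle behave as described in the key points above.
class SequenceGenerator:
    def __init__(self, start=1, increment=1, end=10, cycle=False):
        self.start, self.increment = start, increment
        self.end, self.cycle = end, cycle
        self.current = start  # Start Value

    def next_val(self):
        if self.current > self.end:
            if not self.cycle:
                raise StopIteration("end value reached")
            self.current = self.start  # Cycle: return to start value
        value = self.current
        self.current += self.increment  # Increment By
        return value

seq = SequenceGenerator(start=1, increment=1, end=3, cycle=True)
values = [seq.next_val() for _ in range(5)]  # 1, 2, 3, then cycles: 1, 2
```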

Design a Mapping to perform Round robin Loading:
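Round-robin loading distributes rows evenly across multiple target instances in turn. A minimal Python sketch of the idea, with plain lists standing in for the target instances:

```python
# Round-robin loading sketch: row i goes to target (i mod N),
# so each of the N targets receives roughly the same number of rows.
def round_robin_load(rows, num_targets):
    targets = [[] for _ in range(num_targets)]
    for i, row in enumerate(rows):
        targets[i % num_targets].append(row)
    return targets

# Nine rows spread across three targets
t1, t2, t3 = round_robin_load(list(range(1, 10)), 3)
```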


Scenario 2:

Workflow Scenario:

A workflow has 5 sessions. Run the first 4 sessions in parallel; if all four sessions succeed, then execute the 5th session.

Hint: Use a link condition

Dynamic LOOKUP Cache:

The Integration Service inserts records and updates records in the cache. Use a dynamic LOOKUP cache for eliminating duplicates or for implementing Slowly Changing Dimensions Type 1.

The dynamic LOOKUP cache is used when you perform a LOOKUP on the target table.

The dynamic LOOKUP transformation allows for the synchronization of the dynamic target LOOKUP table image in memory with its physical table in the database.

New LOOKUP row:

NewLookupRow          Description

0                     The Integration Service does not update or insert the row in the cache

1                     The Integration Service inserts the row into the cache

2                     The Integration Service updates the row in the cache

Key Points:

The LOOKUP transformation's Associated Port matches a LOOKUP input port with the corresponding port in the LOOKUP cache.

"Ignore Null inputs for updates" should be checked for any port where a NULL input should not overwrite the cached value.

"Ignore in comparison" should be checked for any port that is not to be compared.

The NewLookupRow flag indicates the type of row manipulation of the cache. If an input row creates an insert in the LOOKUP cache, the flag is set to "1"; if an input row creates an update of the LOOKUP cache, the flag is set to "2"; if no change is detected, the flag is set to "0". A Filter or Router transformation can be used with an Update Strategy to set the proper row flag to update a target table.
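The NewLookupRow logic can be modeled in Python. This is a conceptual sketch, with a dict keyed on the lookup condition (empno) standing in for the cache; the sample rows are illustrative:

```python
# Sketch of dynamic LOOKUP cache behavior and the NewLookupRow flag:
# 0 = no change detected, 1 = row inserted into cache, 2 = row updated.
def lookup_dynamic(cache, row, key="empno"):
    k = row[key]
    if k not in cache:
        cache[k] = dict(row)
        return 1              # insert into cache (new record flow)
    if cache[k] != row:
        cache[k] = dict(row)
        return 2              # update the cache (update flow)
    return 0                  # duplicate; can be filtered out downstream

cache = {}
flags = [lookup_dynamic(cache, r) for r in [
    {"empno": 7369, "sal": 800},
    {"empno": 7369, "sal": 800},   # exact duplicate -> flag 0
    {"empno": 7369, "sal": 950},   # changed row     -> flag 2
]]
```

A Router downstream would then route flag 1 rows to an insert flow and flag 2 rows to an update flow, as the procedure below does.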

Procedure:

Source definition: emp

Create a target definition with the name emp_DC:

Empkey + all 8 columns


Create a mapping with the name m_employee_dynamic_cache.

  •  Drop the source definition emp
  •  Drop the target definition as two instances
  •  Create a transformation of type LOOKUP which performs a LOOKUP on the target table (emp_DC)

From SQ_emp copy all the ports to the LOOKUP transformation.

Double click on the LOOKUP transformation and select the Condition tab:

LOOKUP Table Column       Operator       Transformation Port

empno                     =              EmpNo

Select the properties tab

Transformation Attribute          Value

Connection Information            BSR_Writer

Dynamic LOOKUP Cache              (checked)

Insert Else Update                (checked)

Select the Ports tab.

For the port empkey select the data type Integer.

Port Name                                          Associated Port

Ename                                                  Ename1

Job                                                         Job1

MGR                                                      MGR1

Hiredate                                              Hiredate1

sal                                                           sal1

comm                                                   comm1

Deptno                                                 Deptno1

Click apply and click ok

Create a transformation of type Router.

From the LOOKUP transformation copy the following ports to the Router transformation (NewLookupRow, empkey, empno, …, deptno).

Double click the Router transformation and select the Groups tab:

Group Name                    Group Filter Condition

New                           NewLookupRow = 1

Update                        NewLookupRow = 2

Click apply and click Ok

New Record Flow:

Create a transformation of type Update Strategy.

From the New output group copy the ports (except NewLookupRow) to the Update Strategy transformation and develop the expression DD_INSERT.

From the Update Strategy transformation connect the ports to the target.

Update Flow:

Create a transformation of type Update Strategy.

From the Update output group copy the ports to the Update Strategy transformation and develop the expression DD_UPDATE.

From the Update Strategy transformation connect the ports to the target.

UnConnected stored procedure – Drop and Create Index:

Create the following two stored procedures in the target database account.

Create or Replace procedure Emp_Create_Index
(
  V_Table_Name in varchar2,
  V_Index_Col_Name in varchar2,
  V_Index_Name in varchar2
)
as
begin
  Execute Immediate
    'Create Index ' || V_Index_Name || ' on ' || V_Table_Name ||
    '(' || V_Index_Col_Name || ')';
end;
/

Procedure 2:

Create or Replace procedure Emp_Drop_Index
(
  V_Index_Name in varchar2
)
as
begin
  Execute Immediate 'Drop Index ' || V_Index_Name;
end;
/

Setting Up the Target Table Definition With an Index:

SQL> Create table emp_TGT as select * from scott.emp where 1=2;

Table created.

SQL> Select INDEX_NAME from USER_INDEXES where TABLE_NAME = 'EMP_TGT';

SQL> Create Index EMP_IDX on EMP_TGT(EMPNO);

Mapping Designing:

Create a source definition with the name emp.

Create a target definition with the name emp_TGT [using the Target Designer tool].

Create a mapping with the name m_drop_create_Index.

Drop the source and target definitions.

From the Source Qualifier connect the ports to the target.

Create two Stored Procedure transformations with the names SP_Drop_Index and SP_Create_Index.

Double click on the Stored Procedure transformation with the name SP_Drop_Index.

Select the properties tab

Transformation Attribute          Value

Stored Procedure Type             Target Pre Load

Call Text                         emp_drop_index('EMP_IDX')

Connection Information            BSR_Writer

Click Apply and click Ok

Double click on the Stored Procedure transformation with the name SP_Create_Index.

Select the Properties tab:

Transformation Attribute          Value

Stored Procedure Type             Target Post Load

Call Text                         emp_create_index('EMP_TGT', 'EMPNO', 'EMP_IDX')

Connection Information            BSR_Writer

Click Apply and Click Ok

UnConnected Stored Procedure – Returning Multiple Output Ports:

Execute the following stored procedure in the source database account (username scott):

SQL> Create or Replace procedure Emp_Proc_Var
(
  V_EmpNo IN Number,
  TOTALSAL OUT Number,
  TAX OUT Number,
  HRA OUT Number
)
as
begin
  SELECT SAL + NVL(COMM, 0), SAL * 0.1, SAL * 0.4
  INTO TOTALSAL, TAX, HRA
  FROM EMP WHERE EMPNO = V_EmpNo;
end;
/

Create a source definition with the name emp.

Create a target definition with the name emp_prc_var:

EmpNo, EName, Job, Sal, TOTALSAL, TAX, HRA, Deptno

Create a mapping with the name m_prc_var.

Drop the source and target definitions.

Create transformations of type Stored Procedure and Expression.

From the Source Qualifier copy the required ports to the Expression transformation.

Double click on the Expression transformation and select the Ports tab:

Port Name     Data Type     Precision     Scale     I/O/V     Expression

V_TAX         Decimal       7             2         V

V_HRA         Decimal       7             2         V

TOTALSAL      Decimal       7             2         O         :SP.EMP_PROC_VAR(EmpNo, PROC_RESULT, V_TAX, V_HRA)

TAX           Decimal       7             2         O         V_TAX

HRA           Decimal       7             2         O         V_HRA

Click Apply and Click Ok

From the Expression transformation connect the ports to the targets.

From the Repository menu click on Save.

SHORTCUTS:

–> A shortcut is a reusable component.

–> To create a shortcut, first we have to create a sharable folder.

–> Open the destination folder.

–> Drag the object (source, target, mapping) from the shared folder into the destination folder, i.e., into the Source Analyzer or Mapping Designer.

–> You can create a shortcut to a shared folder in the same repository.

–> When you create a shortcut in the same repository it is known as a local shortcut.

Creation of Sharable Folder:

Open the client Repository Manager; from the Folder menu select Create and enter the folder name AXEDW (any name).

Select Allow Shortcut [✓] and click OK.

Open the client PowerCenter Designer.

Activate the sharable folder (AXEDW).

Create a source definition with the name emp in the Source Analyzer tool.

Activate the destination folder where we want the shortcut.

From the sharable folder drag the source definition emp, drop it on the Source Analyzer workspace which belongs to the destination folder, and click Yes.

Version Control:

–> By using version control we maintain the history of the metadata objects.

–> A versioned repository stores multiple versions of an object.

–> Each version is a separate object with a unique number.

–> You can perform the following change management tasks to create and manage multiple versions of objects in the repository:

  1. Checkin
  2. Check out

Check in:

  •  You must save an object before you can check it in.
  •  When you check in an object the repository creates a new version of the object and assigns it a version number.
  •  The repository increments the version number when you check in an object.

Check Out:

  •  To edit an object you must check out the object.
  •  When you check out an object the repository obtains a write-intent lock on the object.
  •  No other users can edit the object while you have it checked out.

Check in:

Select the mapping m_customer_dimension, right click on the mapping, select Versioning, and click on Check In.

Name:          Date:          Goal:

Click on Apply All.

Check out:

Select the mapping m_customer_dimension, right click on the mapping, select Versioning, and click on Check Out.

Enter comments.

Click on Apply to All.

  • From the Versions menu select the two versions of the mapping, right click, select Compare, then Selected Versions.

Click on Save File, enter the file name 4pm_compare, and click on Save.

Partition Points Overview:

–> Partition points mark the boundaries between threads in a pipeline; the Integration Service redistributes rows of data at partition points.

–> You can edit partition points to increase the number of transformation threads and increase session performance.

Types of Partition:

  1. Key range partition
  2. Pass through partition
  3. Round robin partition
  4. Hash partition
  5. Database partition

1) Key Range Partition:

–> With key range partitioning, the Integration Service distributes rows of data based on a port that you define as the partition key.

–> For each partition you define a range of values.

–> The Integration Service uses the key and ranges to send rows to the appropriate partition.
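The routing rule can be sketched in Python. This is a conceptual model only (the SAL values and ranges are illustrative, not the ones configured later in the procedure):

```python
# Key range partitioning sketch: each partition owns a half-open range
# [low, high) of the partition key; a row goes to the partition whose
# range contains its key value.
def key_range_partition(rows, key, ranges):
    partitions = [[] for _ in ranges]
    for row in rows:
        for i, (low, high) in enumerate(ranges):
            if low <= row[key] < high:
                partitions[i].append(row)
                break  # first matching range wins
    return partitions

rows = [{"sal": 800}, {"sal": 1600}, {"sal": 3000}]
p1, p2, p3 = key_range_partition(
    rows, "sal", [(0, 1000), (1000, 2000), (2000, 5000)]
)
```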

(1) How many repositories are created in real time?

Ans: minimum 3, maximum 5

  1. Development
  2. Testing (QA)
  3. Pre-Production
  4. Production

Procedure:

–> Create a mapping with the name m_emp_partition.

–> Drag and drop your source and target emp_partition to the Mapping Designer.

–> Copy all the columns to SQ_emp and connect to the target.

–> Double click on the session and click on the Mapping tab.

–> From the left pane click on the Partitions tab (below).

–> Select the Source Qualifier (SQ_emp) and click on Edit Partition Point.

–> Select the partition type Key Range.

Name                                             Destination

Partition 1

Partition 2

Partition 3

Click on Ok

Click on Edit Keys, select the SAL column, click on Add, and click OK.

Click on the target emp_partition and click on Edit Partition Point.

Select the partition type Key Range and click on OK.

Select the SAL column and click on Add.

Click OK and save to the repository.

Business Requirements:

Scenario 1:

Client Name                       BSR(customer Name)

MARKET                               HYD

Product                                  IBMshare

Client order                        5000

Account                               RAM(A person from Bank of America)

Client allocation                500

Client execution                 500

Pending Amount              0

Scenario 2:

Client Name

MARKET                             HYD

Product                                IBMshare

Client order                        1000

Account                               RAM(A person from Bank of America)

Client allocation                1000

Client execution               700

Pending Amount              300

Scenario 3:

Multiple client orders and a single market order


Scenario 4:

Designing Database from Requirements:

The datamart is designed with the following dimension tables and fact table.

DM_T_DATE_DIM                                                           * ERwin is a data modeling tool

DM_T_MARKET_DIM                                                    * Toad is a database tool

DM_T_PRODUCT_DIM

DM_T_ACCOUNT_DIM

DM_T_EMPLOYEE_DIM

TRANSACTION_DETAIL_FACT

ETL Stage 1 Implementation:

The source is defined with flat files; the following delimited flat files provide the data for extraction.

Accounts.Txt

Market.Txt

Product.Txt

Client_Allocation.Txt

Client_Execution.Txt

Client_Order.Txt

Employee.Txt


Defining Staging Database:

Define the staging database as a target to perform the data profiling.

Create the following tables in a staging database account:

List of Tables:

T_Product

T_Account

T_Employee

T_Market

Client_Order

Client_Allocation

Client_Execution

T_date

ETL Stage 1 Mappings:

Design simple pass-through mappings which migrate the data from source to staging.

MT_stg_Account_Flatfile_Ora(source to stage)

MT_stg_Client_Allocation_Flatfile_Ora(source to stage)

MT_stg_Client_Execution_Flatfile_Ora(source to stage)

MT_stg_Client_Order_Flatfile_Ora(source to stage)

MT_stg_Employee_Flatfile_Ora(source to stage)

MT_stg_Market_Flatfile_Ora(source to stage)

MT_stg_Product_Flatfile_Ora(source to stage)

Defining Source metadata:

Logon to Oracle with the user account system.

–> Create a user account with the name SRC.

–> Execute the following source table structures:

Product_SRC

Account_SRC

Employee_SRC

Market_SRC

Client_Order_SRC

Client_Allocation_SRC

Client_Execution_SRC

Create an ODBC connection with the name SRC_odbc.

Import the source table definitions from the user account SRC.

Convert the table definitions to flat files.

Defining Staging Metadata:

Logon to Oracle with the user account system.

Create a user account stg.

Execute the following table structures:

T_Product

T_Account

T_Employee

T_Market

Client_Order

Client_Allocation

Client_Execution

T_date

Create the following stored procedure to populate the data into the stage table T_date:

SQL> Create or Replace procedure date_stg_proc
as
  v_start_date date := to_date('01-01-2007', 'dd-mm-yyyy');
begin
  for i in 1..2000
  loop
    insert into t_date values (v_start_date);
    v_start_date := v_start_date + 1;
  end loop;
end;
/

Execute the above stored procedure in the staging database:

SQL> Exec date_stg_proc;

SQL> commit;

ETL Stage 2 Implementation:

Source System:

Define the staging database source system with the following tables:

T_Account

T_Product

T_Market

T_Employee

Client_Order

Client_Execution

T_Date

Defining The Target System:

Logon to Oracle with the user account system.

SQL> Create user CDM identified by CDM;

SQL> Grant DBA to CDM;

SQL> Connect CDM/CDM

Execute the following tables:

DM_T_Account_DIM

DM_T_Account_DIM_EXCEP

DM_T_DATE_DIM

DM_T_EMPLOYEE_DIM

DM_T_EMPLOYEE_DIM_EXCEP

DM_T_Product_DIM

DM_T_Product_DIM_EXCEP

DM_T_MARKET_DIM

DM_T_MARKET_DIM_EXCEP

TRANSACTION_DETAIL_FACT

Create the ODBC connection with the name CDM_ODBC.

Import the target table definitions from the CDM user account.

–> For the product and exception product tables, make the product key and exception key the first column, and generate the SQL.
