- - > ETL project phases

An ETL project has 3 phases.

Phase – I

Data profiling


  • Source system analysis is done in this phase.
  • There are 5 types of analysis:

  • CA - - > Column analysis
  • PA - - > Primary key analysis
  • FA - - > Foreign key analysis
  • BL - - > Baseline analysis
  • CD - - > Cross-domain analysis

  • The output of this phase tells us whether the data is dirty or not.
  • If the data is dirty, proceed to the next phase.
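As a rough illustration of column analysis and primary key analysis, the sketch below profiles a few rows in plain Python. The sample rows, the column names, and the rule that nulls or duplicate key values make the data "dirty" are all assumptions made for this example; a profiling tool would apply far richer checks.

```python
from collections import Counter

# Hypothetical sample rows standing in for a source table.
rows = [
    {"cust_id": 1, "name": "Asha", "city": "Pune"},
    {"cust_id": 2, "name": "Ravi", "city": None},
    {"cust_id": 2, "name": "ravi", "city": "Delhi"},
]


def column_analysis(rows, column):
    """Summarize one column: row count, null count, distinct non-null values."""
    values = [row[column] for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }


def primary_key_analysis(rows, column):
    """Check whether a column could serve as a primary key (unique, no nulls)."""
    values = [row[column] for row in rows]
    counts = Counter(v for v in values if v is not None)
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    has_nulls = any(v is None for v in values)
    return {
        "duplicates": duplicates,
        "has_nulls": has_nulls,
        "is_candidate_key": not duplicates and not has_nulls,
    }


def is_dirty(rows, key_column):
    """Assumed rule: data is dirty if the key is not unique or any column has nulls."""
    key_ok = primary_key_analysis(rows, key_column)["is_candidate_key"]
    any_nulls = any(column_analysis(rows, c)["nulls"] for c in rows[0])
    return (not key_ok) or any_nulls
```

Here `is_dirty(rows, "cust_id")` reports dirty data, because `cust_id` 2 appears twice and one `city` value is missing, so the data would move on to the cleansing phase.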

Phase – II

Data Quality or cleansing

  • There are 5 stages:

  1. Parsing
  2. Correcting
  3. Standardizing
  4. Matching
  5. Consolidating

The golden copy (the cleansed, consolidated data) is sent to the next phase.
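The five stages above can be sketched as a small pipeline. This is only an illustration: the `name|city|phone` record layout, the `CITY_FIXES` correction table, and the choice of name plus city as the match key are assumptions made for the example, not how a data quality tool actually works.

```python
# Hypothetical raw records; the "name|city|phone" layout is an assumption.
raw = [
    "asha  rao|Pune|98450 12345",
    "Asha Rao|Poona|9845012345",
    "Ravi Kumar|Delhi|",
]

CITY_FIXES = {"poona": "pune"}  # assumed table of known wrong values


def parse(line):
    """Stage 1: split a raw line into named fields."""
    name, city, phone = line.split("|")
    return {"name": name, "city": city, "phone": phone}


def correct(rec):
    """Stage 2: replace known wrong values with the right ones."""
    city = rec["city"].strip().lower()
    return {**rec, "city": CITY_FIXES.get(city, city)}


def standardize(rec):
    """Stage 3: bring every field into one consistent format."""
    return {
        "name": " ".join(rec["name"].split()).title(),
        "city": rec["city"].strip().title(),
        "phone": "".join(ch for ch in rec["phone"] if ch.isdigit()),
    }


def match_key(rec):
    """Stage 4: records with the same key are treated as the same customer."""
    return (rec["name"], rec["city"])


def consolidate(records):
    """Stage 5: merge matching records, filling empty fields from duplicates."""
    golden = {}
    for rec in records:
        merged = golden.setdefault(match_key(rec), dict(rec))
        for field, value in rec.items():
            if not merged[field]:
                merged[field] = value
    return list(golden.values())


golden_copy = consolidate(standardize(correct(parse(line))) for line in raw)
```

With this sample input the two "Asha Rao" lines collapse into one record, so `golden_copy` holds two customers instead of three.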

Phase – III 

Data Transformation

ETL Process 


For example, if Hutch wants to introduce a 10/- recharge price, the top-level manager needs some information to make that decision.



There are ETL tools and ETL programming tools.

ETL tools extract source data from heterogeneous sources (i.e., from different sources).

ETL programming tools extract data from only one external source.
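To make "heterogeneous sources" concrete, here is a minimal sketch that extracts rows from two differently formatted sources and brings them into one common shape. The in-memory CSV and JSON strings and the field names are hypothetical; a real job would read files, databases, or applications.

```python
import csv
import io
import json

# Hypothetical in-memory sources standing in for two different systems.
CSV_SOURCE = "cust_id,amount\n1,10\n2,25\n"
JSON_SOURCE = '[{"cust_id": 3, "amount": 40}]'


def to_common_row(record):
    """Convert one source record into the common target shape."""
    return {"cust_id": int(record["cust_id"]), "amount": int(record["amount"])}


def extract_csv(text):
    """Extract rows from a CSV source."""
    return [to_common_row(r) for r in csv.DictReader(io.StringIO(text))]


def extract_json(text):
    """Extract rows from a JSON source."""
    return [to_common_row(r) for r in json.loads(text)]


def extract_all():
    """Combine rows from both sources into one list."""
    return extract_csv(CSV_SOURCE) + extract_json(JSON_SOURCE)
```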

Characteristics of a Data Warehouse

  • Subject-oriented (that is, organized with respect to a subject such as customers or sales)
  • Integrated
  • Non-volatile (read only)
  • Historical data


Active database: OLTP (time sensitive, 30-90 days)

Archive database: historical data

The data collected from the different sources covers only the last 30-90 days. Later
on, it is stored in the archive database, which holds the historical data.
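The split between active and archived data can be sketched as a date cutoff. The 90-day retention value (the upper end of the range above), the `txn_date` field, and the sample rows are assumptions made for this example.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # assumed: upper end of the 30-90 day range


def split_active_and_archive(rows, today):
    """Keep recent rows in the active set and move older rows to the archive."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    active = [r for r in rows if r["txn_date"] >= cutoff]
    archive = [r for r in rows if r["txn_date"] < cutoff]
    return active, archive


# Hypothetical transactions.
rows = [
    {"txn_id": 1, "txn_date": date(2024, 1, 5)},
    {"txn_id": 2, "txn_date": date(2024, 5, 20)},
]
active, archive = split_active_and_archive(rows, today=date(2024, 6, 1))
```

With these dates, transaction 1 is older than 90 days and lands in `archive`, while transaction 2 stays in `active`.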

  • ETL is a multilayer process.
  • A data warehouse is a database that collects data from heterogeneous sources as per the business requirements of a top-level manager.
  • Data warehousing is a process that combines ETL activities and BI (Business Intelligence) activities.




OLTP database            Data warehouse
Multi-user               Fewer users
Smaller in size          Larger in size
Volatile (read/write)    Non-volatile (read only)



  • In ETL, extraction has to be done within a specified time.
  • Loading also has to be done within a specified time.
  • For example, extraction may be scheduled for a late-evening slot starting at 10 pm, and loading between 12:00 am and 2:00 am.

- - > Extract Window:

The specified time given by the client to hit the source and extract the data is known as the extract window.

- - > Load Window:

The specified time given by the client to hit the target and load the data is known as the load window.
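A job scheduler could check these windows before touching the source or the target. The sketch below is one possible way to do it; the `in_window` helper is hypothetical, the extract window end time of 11 pm is an illustrative assumption, and the load window uses the 12:00 am to 2:00 am example.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if `now` is in [start, end); handles windows crossing midnight."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, e.g. 23:00 to 02:00.
    return now >= start or now < end


EXTRACT_WINDOW = (time(22, 0), time(23, 0))  # assumed illustrative values
LOAD_WINDOW = (time(0, 0), time(2, 0))       # 12:00 am to 2:00 am


def may_extract(now):
    """True if the source may be hit at this time of day."""
    return in_window(now, *EXTRACT_WINDOW)


def may_load(now):
    """True if the target may be loaded at this time of day."""
    return in_window(now, *LOAD_WINDOW)
```

For example, `may_extract(time(22, 30))` is true, while `may_load(time(3, 0))` is false because the load window has already closed.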

- - > After extraction, the collected data should be stored in an area called the staging area.

After the data is loaded into the warehouse, the data in the staging area is permanently deleted. This is known as a flush.
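The extract, stage, load, and flush sequence can be sketched with an in-memory SQLite database. The table names `stage_sales` and `dw_sales` and the sample rows are hypothetical, and SQLite merely stands in for whatever staging and warehouse databases a project uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage_sales (cust_id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE dw_sales (cust_id INTEGER, amount INTEGER)")


def extract_to_stage(conn, rows):
    """Store the extracted rows in the staging table."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO stage_sales VALUES (?, ?)", rows)


def load_and_flush(conn):
    """Load staged rows into the warehouse table, then flush the staging table."""
    with conn:  # load and flush are committed together or not at all
        conn.execute(
            "INSERT INTO dw_sales SELECT cust_id, amount FROM stage_sales"
        )
        conn.execute("DELETE FROM stage_sales")


extract_to_stage(conn, [(1, 10), (2, 25)])
load_and_flush(conn)
```

Running the load and the flush in one transaction is a design choice for this sketch: if the load fails, the staged rows are still there for a retry.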
