There are three phases:
Phase – I
Data profiling
- Source system analysis is done in this phase.
- There are five types of analysis:
- CA --> Column Analysis
- PA --> Primary Key Analysis
- FA --> Foreign Key Analysis
- BL --> Baseline Analysis
- CD --> Cross-Domain Analysis
- The output tells us whether the data is dirty or not.
- If the data is dirty, proceed to the next phase.
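As a rough illustration of what column analysis and primary key analysis report, the following Python sketch profiles a tiny in-memory table. The sample rows, the function names `column_analysis`, `primary_key_candidates` and `is_dirty`, and the rule "any null or repeated key means dirty" are assumptions made for this example, not the behaviour of any particular profiling tool.

```python
from collections import Counter

# Hypothetical sample rows standing in for a source table.
ROWS = [
    {"cust_id": 1, "name": "Asha", "city": "Pune"},
    {"cust_id": 2, "name": "Ravi", "city": None},
    {"cust_id": 2, "name": "Ravi", "city": "Delhi"},
]


def column_analysis(rows, column):
    """Return basic statistics for one column: row count, nulls, distinct values."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }


def primary_key_candidates(rows, columns):
    """Return the columns whose values are all non-null and unique."""
    candidates = []
    for column in columns:
        stats = column_analysis(rows, column)
        if stats["nulls"] == 0 and stats["distinct"] == stats["rows"]:
            candidates.append(column)
    return candidates


def is_dirty(rows, key_column):
    """Assumed rule: data is dirty if any column has nulls or the key repeats."""
    columns = rows[0].keys() if rows else []
    has_nulls = any(column_analysis(rows, c)["nulls"] > 0 for c in columns)
    key_counts = Counter(row[key_column] for row in rows)
    has_duplicate_keys = any(count > 1 for count in key_counts.values())
    return has_nulls or has_duplicate_keys
```

With the sample rows, no column qualifies as a primary key (the `cust_id` value 2 repeats and `city` has a null), so under the assumed rule the data would be treated as dirty and passed to the next phase.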
Phase – II
Data Quality or cleansing
- There are five stages:
- Parsing
- Correcting
- Standardizing
- Matching
- Consolidating
The golden copy is sent to the next phase.
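The five stages can be pictured as a small pipeline. The sketch below is only one possible reading: it takes the second stage to be a correction step, and the raw strings, the `CITY_CORRECTIONS` table, and the exact-match rule on name and city are all hypothetical choices made for illustration.

```python
# Hypothetical raw input: "name;city" strings with inconsistent formatting.
RAW = ["  ravi KUMAR ;Bombay", "Ravi Kumar;Mumbai", "asha rao;Pune "]

# Assumed correction table mapping outdated values to current ones.
CITY_CORRECTIONS = {"bombay": "mumbai"}


def parse(raw):
    """Parsing: split one raw string into named fields."""
    name, city = raw.split(";")
    return {"name": name, "city": city}


def correct(record):
    """Correcting: replace known wrong or outdated values."""
    city = record["city"].strip().lower()
    return {**record, "city": CITY_CORRECTIONS.get(city, city)}


def standardize(record):
    """Standardizing: apply one format (single spaces, title case) to every field."""
    return {k: " ".join(v.split()).title() for k, v in record.items()}


def match(records):
    """Matching: group records that share the same name and city."""
    groups = {}
    for record in records:
        groups.setdefault((record["name"], record["city"]), []).append(record)
    return groups


def consolidate(groups):
    """Consolidating: keep one surviving record per matched group."""
    return [members[0] for members in groups.values()]


def golden_copy(raw_rows):
    """Run all five stages in order and return the deduplicated records."""
    cleaned = [standardize(correct(parse(raw))) for raw in raw_rows]
    return consolidate(match(cleaned))
```

Here the two "Ravi Kumar" rows collapse into one record once "Bombay" has been corrected to "Mumbai". Real matching is usually fuzzier than an exact comparison.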
Phase – III
Data Transformation
ETL Process
Example
If Hutch wants to introduce a 10/- recharge price, the top-level manager needs
some information before making that decision.
We have ETL tools and ETL programming tools.
ETL tools extract source data from heterogeneous sources (i.e., from different kinds of source).
ETL programming tools extract data from only one external source.
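To make "heterogeneous sources" concrete, here is a minimal sketch that extracts rows from two differently formatted sources and brings them into one common shape. The inline CSV and JSON payloads and the function names are hypothetical. A real job would read from files, databases, or applications.

```python
import csv
import io
import json

# Hypothetical source payloads standing in for two different source systems.
CSV_SOURCE = "cust_id,city\n1,Pune\n2,Delhi\n"
JSON_SOURCE = '[{"cust_id": 3, "city": "Mumbai"}]'


def extract_csv(text):
    """Read CSV text into a list of dicts, converting the id to int."""
    return [
        {"cust_id": int(row["cust_id"]), "city": row["city"]}
        for row in csv.DictReader(io.StringIO(text))
    ]


def extract_json(text):
    """Read a JSON array of objects into the same row shape."""
    return [{"cust_id": int(r["cust_id"]), "city": r["city"]} for r in json.loads(text)]


def extract_all():
    """Combine rows from both sources into one uniform list."""
    return extract_csv(CSV_SOURCE) + extract_json(JSON_SOURCE)
```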
Characteristics of a Data Warehouse
- Subject-oriented (that is, organized with respect to a subject such as customer or sales)
- Integrated
- Non-volatile (read-only)
- Time-variant (holds historical data)
Active Database and Archive Database (historical data)
OLTP (time sensitive, 30-90 days)
The data collected from the different sources covers only the most recent 30-90 days. Later
on, it is stored in an archive database, which holds the historical data.
- ETL is a multilayer process.
- A data warehouse is a database that collects data from heterogeneous sources according to
the business requirements set by a top-level manager.
- Data warehousing is a process that combines ETL activities and
BI (Business Intelligence) activities.
OLTP | OLAP |
Transaction processing | Analysis |
Many users | Fewer users |
Small in size | Large in size |
Volatile (read/write) | Non-volatile (read-only) |
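The contrast in the table shows up in the shape of the queries. The sketch below uses a single in-memory SQLite database purely for convenience. In practice OLTP and OLAP workloads normally run on separate systems, and the `sales` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")

# OLTP style: small read/write transactions that touch one row at a time.
with conn:
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("north", 10))
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("north", 30))
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("south", 20))

# OLAP style: a read-only query that aggregates over many rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```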
- In ETL, extraction has to be done within a specified time.
- Loading also has to be done within a specified time.
- For example, extraction might have to be done between 10 pm and 11 pm, and loading
between 12:00 am and 2:00 am.
--> Extract window:
The time period specified by the client for hitting the source and extracting the data is known as the extract window.
--> Load window:
The time period specified by the client for hitting the target and loading the data is known as
the load window.
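A job scheduler can enforce these windows with a simple time check. In the sketch below the load window (12:00 am to 2:00 am) follows the example in the text, while the extract window value and the helper name `in_window` are assumptions made for illustration.

```python
from datetime import time

# Load window taken from the example in the text; the extract window is assumed.
LOAD_WINDOW = (time(0, 0), time(2, 0))
EXTRACT_WINDOW = (time(22, 0), time(23, 0))  # hypothetical value


def in_window(now, window):
    """Return True if `now` is in [start, end), including windows that cross midnight."""
    start, end = window
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, e.g. 23:00 to 01:00.
    return now >= start or now < end
```

A job would call `in_window(current_time, EXTRACT_WINDOW)` before hitting the source, and the equivalent check with `LOAD_WINDOW` before writing to the target.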
--> After extraction, the collected data should be stored in an area called the stage area.
After the data is loaded into the warehouse, the data held in the stage area is deleted; this is known as a flush.