Phases in data acquisition:
- Data extraction
- Data transformation
- Data loading
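As a rough orientation, the three phases can be chained as plain functions that run in order: extract, then transform, then load. This is a minimal Python sketch, not the API of Cognos or any ETL tool. The record fields (`cust_no`, `loc`) and the single transformation rule (trimming and title-casing the location) are hypothetical.

```python
# Minimal sketch of the three acquisition phases as plain functions.
# The record layout and the in-memory "source"/"target" stores are hypothetical.

def extract(source_rows):
    """Extraction: read raw records from a source (here, an in-memory list)."""
    return [dict(row) for row in source_rows]

def transform(rows):
    """Transformation: apply a business rule (here, trim and title-case the location)."""
    return [{**row, "loc": row["loc"].strip().title()} for row in rows]

def load(rows, target):
    """Loading: append the transformed rows to the target store."""
    target.extend(rows)
    return len(rows)

source = [{"cust_no": 1, "loc": " newyork "}, {"cust_no": 2, "loc": "boston"}]
warehouse = []
loaded = load(transform(extract(source)), warehouse)
print(loaded, warehouse)
```

Each later section replaces one of these stubs with something more specific; the ordering is the point here.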
Data extraction:
Data is extracted from various types of sources: RDBMS systems (Oracle, SQL Server, Teradata, Sybase, etc.), files (TXT, XML) and ERP systems (SAP, PeopleSoft, etc.).
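Extraction from heterogeneous sources usually means reading each source with its own connector and normalizing the rows into one common shape. The sketch below assumes SQLite (via Python's `sqlite3`) as a stand-in for an RDBMS such as Oracle or Teradata, and an in-memory CSV text as a stand-in for a flat file; the `customer` table and its column names are hypothetical.

```python
import csv
import io
import sqlite3

def extract_from_rdbms(conn):
    """Read customer rows from a relational source (SQLite stands in for Oracle, Teradata, etc.)."""
    conn.row_factory = sqlite3.Row  # rows can then be converted to dicts by column name
    rows = conn.execute("SELECT cust_no, f_name, l_name, amt, loc FROM customer")
    return [dict(r) for r in rows]

def extract_from_flat_file(fileobj):
    """Read customer rows from a delimited text file, converting numeric fields."""
    reader = csv.DictReader(fileobj)
    return [{**r, "cust_no": int(r["cust_no"]), "amt": int(r["amt"])} for r in reader]

# Hypothetical relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_no INTEGER, f_name TEXT, l_name TEXT, amt INTEGER, loc TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [(1, "bill", "gates", 50000, "Newyork"), (2, "bill", "Clinton", 70000, "boston")],
)

# Hypothetical flat-file source (io.StringIO stands in for a .txt/.csv file on disk).
flat_file = io.StringIO("cust_no,f_name,l_name,amt,loc\n3,nara,chandrababu,40000,hyderabad\n")

extracted = extract_from_rdbms(conn) + extract_from_flat_file(flat_file)
print(len(extracted))
```

Because both readers return dicts with the same keys, the later transformation steps do not need to know which source a row came from.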
Data transformation:
It is the process of transforming data from one format into another (the client-required format) by applying business logic.
- Data transformation takes place in the staging area.
- The staging area is a temporary storage area where the transformation activity takes place.
- The following transformation activities take place in the staging area:
- Data cleaning
- Data scrubbing
- Data merging
- Data aggregating
Data cleaning:
It is the process of converting inconsistent (non-uniform) data into consistent (uniform) data and removing unwanted data.
Customer (source):
Cust no | Cust f name | Cust l name | Amt   | Loc
1       | bill        | gates       | 50000 | Newyork
2       | bill        | Clinton     | 70000 | boston
3       | nara        | chandrababu | 40000 | hyderabad

Customer (target):
Cust no | Cust name       | Amt  | Tax | Loc
1       | billgates       | 5000 | 500 | Newyork
2       | billClinton     | 7000 | 700 | boston
3       | narachandrababu | 4000 | 400 | hyd
Staging area:
- Cust name = concat(initcap(cust f name), ' ', initcap(cust l name))
- Tax = actual
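The staging-area rules above can be sketched in Python. `initcap` here only approximates SQL `INITCAP` with `str.title()`. The tax rule in the text is incomplete, so a flat 10% rate is assumed purely for illustration: it is not the document's rule, and the field names `f_name` and `l_name` are hypothetical.

```python
TAX_RATE_PERCENT = 10  # hypothetical: the source text does not state the tax rule

def initcap(value):
    """Rough equivalent of SQL INITCAP: upper-case the first letter of each word."""
    return value.strip().title()

def to_target(row):
    """Derive a target customer row from a source row in the staging area."""
    return {
        "cust_no": row["cust_no"],
        # Cust name = concat(initcap(first name), ' ', initcap(last name))
        "cust_name": initcap(row["f_name"]) + " " + initcap(row["l_name"]),
        "amt": row["amt"],
        # Assumed rule: tax as an integer percentage of amt
        "tax": row["amt"] * TAX_RATE_PERCENT // 100,
        "loc": row["loc"],
    }

source = [
    {"cust_no": 1, "f_name": "bill", "l_name": "gates", "amt": 50000, "loc": "Newyork"},
    {"cust_no": 3, "f_name": "nara", "l_name": "chandrababu", "amt": 40000, "loc": "hyderabad"},
]
target = [to_target(r) for r in source]
print(target[0]["cust_name"], target[0]["tax"])
```

Keeping each derivation in one function makes it easy to swap in the real business rule once it is known.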
Data merging:
It is the process of combining multiple sources into a single target (single output), or equivalently, combining multiple input flows into a single output flow. There are two types of data merging:
- union
- join
Union:
It is the process of combining multiple sources that have the same structure (the same number of columns, and the same data type for each pair of corresponding columns).
- The sources can be homogeneous (the same type of source) or heterogeneous (different types of sources).
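The union rule above can be sketched as a structure check followed by concatenation. In this sketch columns are matched by position, as in SQL, and duplicates are kept, which behaves like SQL `UNION ALL` rather than `UNION`. The `(columns, rows)` representation and the two sample sources are hypothetical.

```python
def union_all(*sources):
    """Combine sources with the same structure into one output flow.

    Each source is (columns, rows), where columns is a list of (name, type) pairs.
    Columns are matched by position; duplicate rows are kept.
    """
    columns = sources[0][0]
    for cols, _ in sources[1:]:
        if len(cols) != len(columns):
            raise ValueError("union requires the same number of columns")
        for (_, t1), (_, t2) in zip(columns, cols):
            if t1 != t2:
                raise TypeError("corresponding columns must have the same data type")
    combined = []
    for _, rows in sources:
        combined.extend(rows)
    return columns, combined

# Heterogeneous sources (e.g. a database table and a flat file) with matching structure.
oracle_src = ([("cust_no", int), ("loc", str)], [(1, "Newyork"), (2, "boston")])
file_src = ([("id", int), ("city", str)], [(3, "hyderabad")])
cols, rows = union_all(oracle_src, file_src)
print(rows)

try:
    union_all(oracle_src, ([("cust_no", int)], [(4,)]))
except ValueError as err:
    print("rejected:", err)
```

Column names may differ between sources; only the count and the types must line up.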
Join:
To perform a join, the sources must share at least one common column. The sources can be homogeneous or heterogeneous.
Ex: source 1
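The join rule above can be sketched as an inner join on the common column, using a hash lookup on one side. The `customers` and `orders` rows below are hypothetical sample data, and only the inner-join variant is shown.

```python
def inner_join(left, right, key):
    """Join two lists of row dicts on a common column (inner join via a hash index)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # Each left row is paired with every right row that has the same key value.
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

customers = [{"cust_no": 1, "cust_name": "Bill Gates"}, {"cust_no": 2, "cust_name": "Bill Clinton"}]
orders = [{"cust_no": 1, "amt": 50000}, {"cust_no": 1, "amt": 1200}, {"cust_no": 3, "amt": 40000}]
result = inner_join(customers, orders, "cust_no")
print(result)
```

Customer 2 has no order and order 3 has no customer, so neither appears in the inner-join output.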
Data aggregation:
It is the process of converting detailed data into summary data by applying group functions such as sum, max, min, avg and count.
- It performs group calculations such as sum, max, min and avg.
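Aggregation can be sketched as grouping detail rows by a column and applying the group functions to each group, much like SQL `GROUP BY`. The `sales` rows and the choice of `loc` as the grouping column are hypothetical.

```python
from collections import defaultdict

def aggregate(rows, group_col, value_col):
    """Summarize detail rows per group with sum, max, min, avg and count."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {
        key: {
            "sum": sum(values),
            "max": max(values),
            "min": min(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for key, values in groups.items()
    }

sales = [
    {"loc": "boston", "amt": 70000},
    {"loc": "boston", "amt": 30000},
    {"loc": "hyderabad", "amt": 40000},
]
summary = aggregate(sales, "loc", "amt")
print(summary)
```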
Data loading:
It is the process of populating (loading) the source data (OLTP) into the target system (DWH). There are two types of data loading:
- initial load (or) full load
- incremental load (or) delta load
Initial load:
Loading the data from source to target for the first time is called an initial load (or) full load.
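A full load can be sketched as "empty the target, then load every source row" inside one transaction. SQLite stands in for the warehouse here; it has no `TRUNCATE`, so `DELETE` is used instead. The `dim_customer` table and its columns are hypothetical.

```python
import sqlite3

def full_load(source_rows, conn):
    """Initial (full) load: clear the target table, then load every source row."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM dim_customer")
        conn.executemany(
            "INSERT INTO dim_customer (cust_no, cust_name, loc) VALUES (?, ?, ?)", source_rows
        )
    return conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (cust_no INTEGER PRIMARY KEY, cust_name TEXT, loc TEXT)")
rows = [(1, "Bill Gates", "Newyork"), (2, "Bill Clinton", "boston")]
count = full_load(rows, conn)
count_again = full_load(rows, conn)  # re-running a full load gives the same result
print(count, count_again)
```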
Incremental load:
It is the process of inserting new records and updating existing records when they arrive with new values.
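An incremental load can be sketched as: for each delta row, insert it if its key is new, update it if its values changed, and otherwise skip it. The lookup-then-write approach below is chosen for clarity over a database-specific upsert; SQLite and the `dim_customer` table are hypothetical stand-ins.

```python
import sqlite3

def incremental_load(delta_rows, conn):
    """Incremental (delta) load: insert new keys, update existing keys whose values changed."""
    inserted = updated = 0
    with conn:  # one transaction for the whole delta
        for cust_no, cust_name, loc in delta_rows:
            current = conn.execute(
                "SELECT cust_name, loc FROM dim_customer WHERE cust_no = ?", (cust_no,)
            ).fetchone()
            if current is None:
                conn.execute("INSERT INTO dim_customer VALUES (?, ?, ?)", (cust_no, cust_name, loc))
                inserted += 1
            elif current != (cust_name, loc):
                conn.execute(
                    "UPDATE dim_customer SET cust_name = ?, loc = ? WHERE cust_no = ?",
                    (cust_name, loc, cust_no),
                )
                updated += 1
    return inserted, updated

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (cust_no INTEGER PRIMARY KEY, cust_name TEXT, loc TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?, ?)",
        [(1, "Bill Gates", "Newyork"), (2, "Bill Clinton", "boston")],
    )

delta = [
    (1, "Bill Gates", "Newyork"),         # unchanged: skipped
    (2, "Bill Clinton", "Boston"),        # existing key, new value: updated
    (3, "Nara Chandrababu", "hyderabad"), # new key: inserted
]
result = incremental_load(delta, conn)
print(result)
```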
Data mart:
It is a subset of a data warehouse, or a subject-oriented database that supports middle-level management, (or) it is a high
There are two approaches to designing a data mart:
- Top-down approach (W. H. Inmon)
- Bottom-up approach (Ralph Kimball)
Based on the design, data marts are categorized into two types:
- dependent data mart
- independent data mart
- Data marts in the top-down approach are dependent, because their design depends on the design of the enterprise data warehouse.
- Data marts in the bottom-up approach are independent, because their design does not depend on an enterprise data warehouse.
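The difference between dependent and independent data marts can be sketched with SQLite as a stand-in database. The dependent mart is derived from an enterprise warehouse table by subject, so its shape follows the warehouse. The independent mart is built directly from its own source. All table names, subjects and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise data warehouse table holding several subject areas.
conn.execute("CREATE TABLE edw_fact (subject TEXT, loc TEXT, amt INTEGER)")
conn.executemany(
    "INSERT INTO edw_fact VALUES (?, ?, ?)",
    [("sales", "boston", 70000), ("sales", "Newyork", 50000), ("hr", "boston", 12000)],
)

# Top-down: the dependent mart is a subject-oriented subset derived from the EDW.
conn.execute(
    "CREATE TABLE sales_mart_dependent AS SELECT loc, amt FROM edw_fact WHERE subject = 'sales'"
)

# Bottom-up: the independent mart is loaded straight from its own source, with no EDW involved.
source_sales = [("hyderabad", 40000)]
conn.execute("CREATE TABLE sales_mart_independent (loc TEXT, amt INTEGER)")
conn.executemany("INSERT INTO sales_mart_independent VALUES (?, ?)", source_sales)

dep_count = conn.execute("SELECT COUNT(*) FROM sales_mart_dependent").fetchone()[0]
ind_count = conn.execute("SELECT COUNT(*) FROM sales_mart_independent").fetchone()[0]
print(dep_count, ind_count)
```

If the EDW table changes, the dependent mart's definition must change with it. The independent mart is unaffected, which is the dependency the two bullets above describe.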