Data warehousing is a process of building a data warehouse. A process includes.
- Business requirement analysis
- database design
- ETL development and testing
- Report development and testing
Business analysis and onsite technical coordinates gather the business requirement and technical requirements.
BRS (Business Requirement Specification)
A BRS contains the business requirement which is collected by an analysis.
An SRS contains software and hardware requirement which are collected by senior technical people.
The process of designing the database is called as a data modeling (or) dimensional modeling. A database artechict (or) data modeler designs the warehouse with a set of tables.
OLAP (Online Analytical Processing)
An OLAP is a technology that supports the business manager to make a query from the data warehouse. An OLAP provides the gateway between users and the data warehouses.
A data warehouse is known as an OLAP database
Ex:- Cognos, BO’s
Differences between OLTP database and Data Warehouse
Enterprise Data warehousing objects
A relational database is defined us collection of objects such as tables. Views, procedures, macro’s triggers etc...
Table:- A table is a two-dimensional object where the data can be stored in the form of rows and columns.
view:- A view is like a window into one (or) more table it provides a customized access to the base table and provides
- Restricting which columns are visible from base tales
- Restricting which rows are visible from base tables
- Combining rows and column from several base tables
It my a define a subject of rows of table
It my a define a subject of columns of table
Data Warehouse – RDBMS
The following are the relation databases can be defined to build a data warehouse.
One of the best RDBMS to store massive historical information. Parallel storage, retrieval is Teradata.
It is a process of extracting the relevant business information. Transforming data into a required business format and loading into the target system.
Data acquisition is defined with the following type of process
- data extraction
- data transformation
- data loading
There are two types of ETL used to build data Acquisition
- Codebase ETL
- GUI based ETL
An ETL application can be developed using some programming language such as SQL, PLSQL
Ex:- SAS base, SAS excess, Teradata, ETL Utilities
TERADATA ETL UTILITIES
- Fast load
- Multi Load
- T pump
GUI based ETL
ETL applications can be design with simple graphical user interfacing, point, and click techniques.
Ex: Informatics, data stage, abnitio, ODI (oracle data integrated), data services, data manager, SSIS (SQL server integration services)
It is a process of reading the data from various types of source systems. The following are a type of sources which are used to define extraction.
- ERP Source
- Oracle applications
- JD Edwards
- people soft
- File Sources
- XML Files
- Flat files
- Relational Sources
- SQL Server
- Sys bases
- Legacy Sources
- Main frames
- AS 400
- Cabal files etc
It is the process of transforming data and cleansing the data into the required business format.
The following are data transformation activities take place in the stage
- Data merging
- data cleansing
- data scrubbing
- data aggregation
Staging is a temporary memory where the following data transformation activities take place
- Data merging
- Data scrubbing
- Data cleansing
- Data aggregation
It is a process of changing inconsistencies and inaccuracy
It’s the process of removing unwanted data from staging
- Removing duplicates is a data cleansing
- Records which contains a null
- Removing specs
It is a process of deriving new attributes.
Attributes nothing but a table columns.
It’s process of calculating the summary ls from detailed data.
It’s a process of integrating the data from multiple source’s system
There are two type of data merge operation takes places in the staging
- Horizontal merging
- vertical merging
It’s process of merging the records horizontally.
It’s process of merging the records vertically when the two sources are having same “meta data” (union)
Meta data means data structures (Two (or) three table column names are same)
It’s the process of inserting the data into a target system. There are two type of data loads
- Initial load (or) delta load
- Incremental load (or) delta load
ETL Client Server Technology
An ETL plan defines extraction, transformation, and loading. An ETL plan is a design with following types of metadata.
It is the structure of source data (or) source table from data extracts.
It is the structure of the garget table to which data load.
It defined the business logic used to for processing the data.
It defined the structure of the data. Which is represented as column name, data type, precision, scale and key’s (primary, foreign key).
An ETL client is a graphical user component where an ETL developer can design ETL plane
An ETL Repository is a brain of an ETL system where you can store metadata such as ETL plants.
It is an ETL Engine that performer extraction transformation and loading.
In informatics, an ETL plan is a call as mapping
In data stage, it is called as job
Initial Load on full load
We are extracting data from source system to load the data into target system first time the records are entered directly into the target system
Incremental load (or) delta load
We are extracting data from the source system and load into targets system first time newly entered the records as well as update the records into the target system
If we want to design ETLPlan we need only metadata.
- Data definition we call it as metadata
- GUI base ETL uses we are extracting the data from the different databases (ex:- SQL, SQL Server, Sybase, Oracle) in the different places sources system.
- We have to give the length of data type same (or) more of the target system, but we have not given the less length of the source system.
Data warehouse – Database design
A data warehouse is a design with the following types of schema.
- star schema
- snow flake schema
- Galaxy schema (consternation schema, integrated schema, highbred schema and multi star schema)
- The process of designing the database is known as data modeling.
- A data base Architects (or) data modeler creates database schemas using a GUI base database designing tool called “ERIWN” it is a process of computer associates.
- Star Schema:- A star schema is a database design which contains centenary located fact table which is surrounded by dimension tables.
- Since database design looks like a star hence it called as star schema database design.
- In data warehouse facts are numeric. A fact table contains facts
- not every numeric is a fact, but numeric which is of type key performance indicator are known as facts.
- Facts are business measures which are used to evaluate the performance of an entire price
- A fact table contains the facts at lowest level granularity.
- A fact granularity determine the level of details
- A dimension is a descriptive data which describes the key performance know as facts
- A Dimension table contains a de-normalized data.
- A fact table contains a normalized data
- A fact table contains a component key where each candidate key s a foreign key to the dimension table
- A dimension provides an answer to the following business question.
Who ii. What iii. When iv. Where
Snow flake a large dimension table is spited into one (or) more table to flat and high achy
- n snow flake a dimension may have a parent tables.
- * a large dimension table spited into one (or) more normalized table (de composite dimension)
- Dim means dimension
- In snow flake, a large dimension table is spited into one or more table to flat and hierarchy
- In snowflake, a dimension may have a parent table
- A large dimension table spited into one or more normalized table (de-composite dimension)
- Dimension (D4) s spited into dimension four one (D4) and dimension four two (D42). D42 is a parent table, and all are child table.
- A schema s nothing but a table
KPI means key performance indicator
A numeric which is performing key role in the business analysis and estimating the business enterprise in the data warehouse is a fact.
A facts table contain a facts. Facts are numeric
Every numeric is not a fact
- Facts are business measure because the estimate the business performance
- A dimension is a descriptive data is stored called as facts.
- A Facts are analyses by descriptive data is called dimension
Mapping is an ETL plan which extraction – Transaction – Loading.
Data Warehouse – Galaxy schema
A Data warehouse is a design with integration of multiple stars schemas or snowflakes schema are both.
A galaxy schema is also known as hybrid schema or constellation schema.
It is a process of joining two fact tables.
A dimension table which can be shared by multiple fact tables is known as conformed dimensions.
Fact less fact table
A fact table with out any facts is known as fact less fact table.
A dimension which can’t be used to describe key performance indicators is known as junk dimension.
ex:- phone number, fax number, customer address, etc.
Slowly Changing dimensions
A dimension which can be altered over the period is known as the slowly changing dimension. There are three type3s of dimensions.
Type one dimension
A type one dimension stores only current data in the target it doesn’t maintain any history.
Type two dimension
A type two dimension mentions the full history in the target in each update it inserts a new record in the target.
Type three dimension
A type three dimension maintains partial history (current and previous information)
Difference of star and snowflake schema
- Star schema is a de-normalized it means “duplicated records are maintained:
- Snow flake schema spited table is a normalized it means don’t maintain the duplicate records.
- A snowflake schema is used to reduces the tablespaces
- Integration of star and snow flakes schema are called galaxy schema.
- Common (or) conformed, or reusable dimensional are shared by fact tables such as dimensional tables are called conformed
- A constellation is a process of joining two fact tables.
- Galaxy is the schema which can be more multiple schema
- Key performance indicator are called facts
- A fact table without any facts are called fact less
- A dimension which can’t be used two describe facts such dimension is Junk dimension
- A Junk dimension provide additional information to the main dimension
Common or conformed or reusable dimensions
- The dimensions are shared by facts tables such as dimensions tables are called conformed or reusable or common dimensions.
- A surrogate key is an artificial key that is treated as a “primary key”.
- A surrogate key is a system generated sequential number that is treated as a primary key.
OLAP is nothing but a set of specification (or) technology which client application in retrieving the data from the data warehouse.
OLAP is nothing but an interface (or) get way between the user and database.
Types of OLAPS
DOLAP (Desktop OLAP):-
An OLAP which can query the data from a database which is constructed by using desktop databases like dbase, FoxPro, clipper, etc.
XML file, TXT file, XL there are desktop database.
- ROLAP (Relation OLAP):- ROLAP is used to Query the data from relational sources like SQL, Oracle, Sybase, Teradata.
- MOLAP (MULTIOLAP):-It is used to Query the data from multi dimensional sources like a cube. DMR
- HDLAP (highbred OLAP):-It is a combination ROLAP and MOLAP.
For in-depth understanding of Informatica click on
- Informatica Tutorials & Interview Questions
- Introduction to Informatica
- Informatica Components
- Flat Files in Informatica
- Working with Tasks in Informatica