Informatica Concepts

Ratings:

(4)Views: 0

Share this blog:

Data warehousing

Data warehousing is a process of building a data warehouse. A process includes.

Business requirement analysis
database design
ETL development and testing
Report development and testing

Business analysis and onsite technical coordinates gather the business requirement and technical requirements.

BRS (Business Requirement Specification)

A BRS contains the business requirement which is collected by an analysis.

An SRS contains software and hardware requirement which are collected by senior technical people.

The process of designing the database is called as a data modeling (or) dimensional modeling. A database artechict (or) data modeler designs the warehouse with a set of tables.

If you want to enrich your career and become a professional in Informatica, then visit Tekslate - a global online training platform: "Informatica Training" This course will help you to achieve excellence in this domain.

OLAP (Online Analytical Processing)

An OLAP is a technology that supports the business manager to make a query from the data warehouse. An OLAP provides the gateway between users and the data warehouses.

A data warehouse is known as an OLAP database

Ex:- Cognos, BO’s

Differences between OLTP database and Data Warehouse

It is designed to support business transactions processing	it is design to support a decision making processing
Volatile data	non-volatile data
current data	historical data
Detailed data	summary data
design for running the business	design for analyzing the business
normalization	ok- normalization
application oriented data	subject oriented data
design for cortical operation	design for managerial operation
ER- modeling	dimensional modeling

Enterprise Data warehousing objects

A relational database is defined us collection of objects such as tables. Views, procedures, macro’s triggers etc...

Table:- A table is a two-dimensional object where the data can be stored in the form of rows and columns.

view:- A view is like a window into one (or) more table it provides a customized access to the base table and provides

Restricting which columns are visible from base tales
Restricting which rows are visible from base tables
Combining rows and column from several base tables

It my a define a subject of rows of table

It my a define a subject of columns of table

Data Warehouse – RDBMS

The following are the relation databases can be defined to build a data warehouse.

Oracle
SQL server
IBM DB2
Tera data
Greenplum
Netezza
Sybase
redbrick
Informix

One of the best RDBMS to store massive historical information. Parallel storage, retrieval is Teradata.

Data Acquisition

It is a process of extracting the relevant business information. Transforming data into a required business format and loading into the target system.

Data acquisition is defined with the following type of process

data extraction
data transformation
data loading

There are two types of ETL used to build data Acquisition

Codebase ETL
GUI based ETL

CodeBase ETL

An ETL application can be developed using some programming language such as SQL, PLSQL

Ex:- SAS base, SAS excess, Teradata, ETL Utilities

TERADATA ETL UTILITIES

BTEQ
Fast load
Multi Load
T pump

GUI based ETL

ETL applications can be design with simple graphical user interfacing, point, and click techniques.

Ex: Informatics, data stage, abnitio, ODI (oracle data integrated), data services, data manager, SSIS (SQL server integration services)

Data Extraction

It is a process of reading the data from various types of source systems. The following are a type of sources which are used to define extraction.

ERP Source
SAP
Oracle applications
JD Edwards
people soft
File Sources
XML Files
Flat files
Relational Sources
Oracle
SQL Server
DB2
Sys bases
Legacy Sources
Main frames
AS 400
Cabal files etc

Data transformation:-

It is the process of transforming data and cleansing the data into the required business format.

The following are data transformation activities take place in the stage

Data merging
data cleansing
data scrubbing
data aggregation

Informatica Interview Questions & Answers

Staging

Staging is a temporary memory where the following data transformation activities take place

Data merging
Data scrubbing
Data cleansing
Data aggregation

Data Cleansing

It is a process of changing inconsistencies and inaccuracy

(or)

It’s the process of removing unwanted data from staging

Ex:-

Screenshot_7

Examples

Removing duplicates is a data cleansing
Records which contains a null
Removing specs

Data Scrubbing

It is a process of deriving new attributes.

Attributes nothing but a table columns.

Screenshot_8

Data Aggregation

It’s process of calculating the summary ls from detailed data.

Screenshot_9

Data Merging

It’s a process of integrating the data from multiple source’s system

There are two type of data merge operation takes places in the staging

Horizontal merging
vertical merging

Horgintal Merging

It’s process of merging the records horizontally.

Vertical Merging

It’s process of merging the records vertically when the two sources are having same “meta data” (union)

Meta data means data structures (Two (or) three table column names are same)

Screenshot_10

Data Loading

It’s the process of inserting the data into a target system. There are two type of data loads

Initial load (or) delta load
Incremental load (or) delta load

ETL Client Server Technology

ETL Plan

An ETL plan defines extraction, transformation, and loading. An ETL plan is a design with following types of metadata.

Source definition

It is the structure of source data (or) source table from data extracts.

Target definition

It is the structure of the garget table to which data load.

Transformation rule

It defined the business logic used to for processing the data.

Meta Data

It defined the structure of the data. Which is represented as column name, data type, precision, scale and key’s (primary, foreign key).

Screenshot_12

ETL Client

An ETL client is a graphical user component where an ETL developer can design ETL plane

ETL Repository

An ETL Repository is a brain of an ETL system where you can store metadata such as ETL plants.

ETL Server

It is an ETL Engine that performer extraction transformation and loading.

In informatics, an ETL plan is a call as mapping

In data stage, it is called as job

Initial Load on full load

We are extracting data from source system to load the data into target system first time the records are entered directly into the target system

Screenshot_3

Incremental load (or) delta load

We are extracting data from the source system and load into targets system first time newly entered the records as well as update the records into the target system

If we want to design ETLPlan we need only metadata.

Data definition we call it as metadata
GUI base ETL uses we are extracting the data from the different databases (ex:- SQL, SQL Server, Sybase, Oracle) in the different places sources system.
We have to give the length of data type same (or) more of the target system, but we have not given the less length of the source system.

Data warehouse – Database design

A data warehouse is a design with the following types of schema.

star schema
snow flake schema
Galaxy schema (consternation schema, integrated schema, highbred schema and multi star schema)
The process of designing the database is known as data modeling.
A data base Architects (or) data modeler creates database schemas using a GUI base database designing tool called “ERIWN” it is a process of computer associates.
Star Schema:- A star schema is a database design which contains centenary located fact table which is surrounded by dimension tables.

Since database design looks like a star hence it called as star schema database design.
In data warehouse facts are numeric. A fact table contains facts
not every numeric is a fact, but numeric which is of type key performance indicator are known as facts.
Facts are business measures which are used to evaluate the performance of an entire price
A fact table contains the facts at lowest level granularity.
A fact granularity determine the level of details
A dimension is a descriptive data which describes the key performance know as facts
A Dimension table contains a de-normalized data.
A fact table contains a normalized data
A fact table contains a component key where each candidate key s a foreign key to the dimension table
A dimension provides an answer to the following business question.

Who ii. What iii. When iv. Where

Snow Flake

Snow flake a large dimension table is spited into one (or) more table to flat and high achy

n snow flake a dimension may have a parent tables.
* a large dimension table spited into one (or) more normalized table (de composite dimension)

Screenshot_5

Dim means dimension

Snow Flake

In snow flake, a large dimension table is spited into one or more table to flat and hierarchy
In snowflake, a dimension may have a parent table
A large dimension table spited into one or more normalized table (de-composite dimension)

Screenshot_6

Dimension (D4) s spited into dimension four one (D4) and dimension four two (D42). D42 is a parent table, and all are child table.

Extra:-

A schema s nothing but a table

Screenshot_7

KPI means key performance indicator

A numeric which is performing key role in the business analysis and estimating the business enterprise in the data warehouse is a fact.

A facts table contain a facts. Facts are numeric

Every numeric is not a fact

Facts are business measure because the estimate the business performance
A dimension is a descriptive data is stored called as facts.
A Facts are analyses by descriptive data is called dimension

Mapping is an ETL plan which extraction – Transaction – Loading.

Data Warehouse – Galaxy schema

A Data warehouse is a design with integration of multiple stars schemas or snowflakes schema are both.

A galaxy schema is also known as hybrid schema or constellation schema.

Fact Constellation

It is a process of joining two fact tables.

Conformed dimensions

A dimension table which can be shared by multiple fact tables is known as conformed dimensions.

Fact less fact table

A fact table with out any facts is known as fact less fact table.

Junk dimensions

A dimension which can’t be used to describe key performance indicators is known as junk dimension.

ex:- phone number, fax number, customer address, etc.

Slowly Changing dimensions

A dimension which can be altered over the period is known as the slowly changing dimension. There are three type3s of dimensions.

Type one dimension

A type one dimension stores only current data in the target it doesn’t maintain any history.

Type two dimension

A type two dimension mentions the full history in the target in each update it inserts a new record in the target.

Type three dimension

A type three dimension maintains partial history (current and previous information)

Dirty dimensions

Screenshot_11

Screenshot_12

Extra

Difference of star and snowflake schema

Star schema is a de-normalized it means “duplicated records are maintained:
Snow flake schema spited table is a normalized it means don’t maintain the duplicate records.
A snowflake schema is used to reduces the tablespaces

Galaxy Schema

Integration of star and snow flakes schema are called galaxy schema.
Common (or) conformed, or reusable dimensional are shared by fact tables such as dimensional tables are called conformed
A constellation is a process of joining two fact tables.
Galaxy is the schema which can be more multiple schema
Key performance indicator are called facts
A fact table without any facts are called fact less
A dimension which can’t be used two describe facts such dimension is Junk dimension

Screenshot_13

A Junk dimension provide additional information to the main dimension

Common or conformed or reusable dimensions

The dimensions are shared by facts tables such as dimensions tables are called conformed or reusable or common dimensions.

Screenshot_14

Screenshot_15

A surrogate key is an artificial key that is treated as a “primary key”.
A surrogate key is a system generated sequential number that is treated as a primary key.

OLAP

OLAP is nothing but a set of specification (or) technology which client application in retrieving the data from the data warehouse.

Screenshot_16

OLAP is nothing but an interface (or) get way between the user and database.

Types of OLAPS

DOLAP (Desktop OLAP):-

An OLAP which can query the data from a database which is constructed by using desktop databases like dbase, FoxPro, clipper, etc.

XML file, TXT file, XL there are desktop database.

ROLAP (Relation OLAP):- ROLAP is used to Query the data from relational sources like SQL, Oracle, Sybase, Teradata.
MOLAP (MULTIOLAP):-It is used to Query the data from multi dimensional sources like a cube. DMR
HDLAP (highbred OLAP):-It is a combination ROLAP and MOLAP.

For in-depth understanding of Informatica click on

You liked the article?

Like: 1

Vote for difficulty

Current difficulty (Avg): Medium

EasyMediumHardDifficultExpert

IMPROVE ARTICLEReport Issue

Recommended Courses

DataStage Training 4.95789 : Learners

Teradata Training 4.92763 : Learners

1/6

About Author

Name

TekSlate

Author Bio

TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills in the market.

Stay Updated

Get stories of change makers and innovators from the startup ecosystem in your inbox

Related Blogs

Informatica Concepts

Data warehousing

BRS (Business Requirement Specification)

OLAP (Online Analytical Processing)

Differences between OLTP database and Data Warehouse

Enterprise Data warehousing objects

Data Warehouse – RDBMS

Data Acquisition

CodeBase ETL

TERADATA ETL UTILITIES

GUI based ETL

Data Extraction

Staging

Data Cleansing

Examples

Data Scrubbing

Data Aggregation

Data Merging

Horgintal Merging

Vertical Merging

Data Loading

ETL Client Server Technology

Source definition

Target definition

Transformation rule

Meta Data

ETL Client

ETL Repository

ETL Server

Initial Load on full load

Informatica Components

Incremental load (or) delta load

Data warehouse – Database design

Snow Flake

Snow Flake

Extra:-

Data Warehouse – Galaxy schema

Fact Constellation

Conformed dimensions

Fact less fact table

Junk dimensions

Slowly Changing dimensions

Type one dimension

Type two dimension

Type three dimension

Dirty dimensions

Informatica Transformations

Extra

Difference of star and snowflake schema

Galaxy Schema

Common or conformed or reusable dimensions

OLAP

Types of OLAPS

Recommended Articles

Recommended Courses

About Author