
Informatica Concepts

Data warehousing

Data warehousing is the process of building a data warehouse. The process includes:

  • Business requirement analysis
  • Database design
  • ETL development and testing
  • Report development and testing

A business analyst and an onsite technical coordinator gather the business requirements and technical requirements.

BRS (Business Requirement Specification)

A BRS contains the business requirements, which are collected by a business analyst.

An SRS (Software Requirement Specification) contains the software and hardware requirements, which are collected by senior technical people.

The process of designing the database is called data modeling (or dimensional modeling). A database architect (or data modeler) designs the warehouse as a set of tables.

OLAP (Online Analytical Processing)

OLAP is a technology which enables business managers to query the data warehouse. OLAP provides the gateway between users and the data warehouse.

A data warehouse is also known as an OLAP database.

Ex:- Cognos, Business Objects (BO)

Differences between OLTP Database and Data Warehouse

  • An OLTP database is designed to support business transaction processing; a data warehouse is designed to support decision-making processing.
  • An OLTP database holds volatile data; a data warehouse holds non-volatile data.
  • An OLTP database holds current data; a data warehouse holds historical data.
  • An OLTP database holds detailed data; a data warehouse holds summary data.
  • An OLTP database is designed for running the business; a data warehouse is designed for analyzing the business.
  • OLTP data is normalized; data warehouse data is de-normalized.
  • OLTP data is application oriented; data warehouse data is subject oriented.
  • An OLTP database is designed for clerical operations; a data warehouse is designed for managerial operations.
  • An OLTP database uses ER modeling; a data warehouse uses dimensional modeling.

Enterprise Data Warehouse Objects

A relational database is defined as a collection of objects such as tables, views, procedures, macros, triggers, etc.

Table:- A table is a two-dimensional object where data is stored in the form of rows and columns.

View:- A view is like a window into one or more tables. It provides customized access to the base tables and supports:

  • Restricting which columns are visible from the base tables
  • Restricting which rows are visible from the base tables
  • Combining rows and columns from several base tables

A view may define a subset of the rows of a table.

A view may define a subset of the columns of a table.
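
As a minimal sketch in generic SQL (the table and column names here are hypothetical, not from the original material), the view below restricts both the columns and the rows that are visible from its base table:

    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        emp_name  VARCHAR(50),
        dept_id   INTEGER,
        salary    DECIMAL(10,2)
    );

    -- Expose only two columns and only the rows for department 10
    CREATE VIEW dept10_employee AS
    SELECT emp_id, emp_name
    FROM   employee
    WHERE  dept_id = 10;

    -- The view is queried exactly like a table
    SELECT * FROM dept10_employee;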

Data Warehouse – RDBMS

A data warehouse can be built on a relational database (RDBMS).

One of the best RDBMSs for storing massive historical information, with parallel storage and retrieval, is Teradata.

Data Acquisition

It is the process of extracting the relevant business information, transforming the data into the required business format, and loading it into the target system.

Data acquisition is defined by the following types of processes:

  • Data extraction
  • Data transformation
  • Data loading

There are two types of ETL used to build data acquisition:

  • Code-based ETL
  • GUI-based ETL

Code-Based ETL

An ETL application can be developed using a programming language such as SQL or PL/SQL.

Ex:- SAS Base, SAS Access, Teradata ETL utilities
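
As a hedged illustration of code-based ETL in plain SQL (the src_sales and tgt_sales tables and the exchange-rate rule are hypothetical), a single INSERT … SELECT statement can extract, transform, and load in one step:

    -- Extract from the source table, apply a transformation rule,
    -- and load into the target table
    INSERT INTO tgt_sales (sale_id, sale_date, amount_usd)
    SELECT s.sale_id,
           s.sale_date,
           s.amount * s.exchange_rate          -- transformation rule
    FROM   src_sales s
    WHERE  s.sale_date >= DATE '2016-01-01';   -- extraction filter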

Teradata ETL Utilities

  • BTEQ
  • FastLoad
  • MultiLoad
  • TPump

GUI-Based ETL

An ETL application can be designed with a simple graphical user interface using point-and-click techniques.

Ex:- Informatica, DataStage, Ab Initio, ODI (Oracle Data Integrator), Data Services, Data Manager, SSIS (SQL Server Integration Services)

Data Extraction

It is the process of reading data from various types of source systems. The following types of sources are used for extraction:

  • ERP sources: SAP, Oracle Applications, JD Edwards, PeopleSoft
  • File sources: XML files, flat files
  • Relational sources: Oracle, SQL Server, DB2, Sybase
  • Legacy sources: Mainframes, AS/400, COBOL files, etc.

Data Transformation

It is the process of transforming and cleansing the data into the required business format.

The following data transformation activities take place in the staging area:

  • Data merging
  • Data cleansing
  • Data scrubbing
  • Data aggregation

Staging

Staging is a temporary storage area where the following data transformation activities take place:

  • Data merging
  • Data scrubbing
  • Data cleansing
  • Data aggregation

Data Cleansing

It is the process of correcting inconsistencies and inaccuracies in the data.

(or)

It is the process of removing unwanted data in the staging area.


Examples

  • Removing duplicate records
  • Removing records which contain nulls
  • Removing unwanted spaces
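
A minimal sketch of these cleansing steps in generic SQL (the stg_customer staging table and its columns are hypothetical, and the exact syntax varies by database):

    -- Remove records that contain NULL in a mandatory column
    DELETE FROM stg_customer
    WHERE  customer_name IS NULL;

    -- Remove leading and trailing spaces
    UPDATE stg_customer
    SET    customer_name = TRIM(customer_name);

    -- Remove duplicates, keeping one row per customer_id
    DELETE FROM stg_customer
    WHERE  stg_row_id NOT IN (
             SELECT MIN(stg_row_id)
             FROM   stg_customer
             GROUP  BY customer_id
           );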

Data Scrubbing

It is the process of deriving new attributes from existing data.

Attributes are nothing but table columns.
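
For example, a minimal scrubbing sketch in generic SQL (the stg_order table and its columns are hypothetical) derives a new attribute from existing columns:

    -- Derive a new attribute (total_price) from existing columns
    UPDATE stg_order
    SET    total_price = quantity * unit_price;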


Data Aggregation

It is the process of calculating summary values from detailed data.
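
A minimal aggregation sketch in generic SQL (the stg_sales table and its columns are hypothetical) summarizes detailed sales into monthly totals:

    -- Roll detailed rows up to one summary row per product and month
    SELECT product_id,
           EXTRACT(YEAR  FROM sale_date) AS sale_year,
           EXTRACT(MONTH FROM sale_date) AS sale_month,
           SUM(amount)                   AS total_amount,
           COUNT(*)                      AS sale_count
    FROM   stg_sales
    GROUP  BY product_id,
              EXTRACT(YEAR  FROM sale_date),
              EXTRACT(MONTH FROM sale_date);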


Data Merging

It is the process of integrating data from multiple source systems.

There are two types of data merge operations that take place in the staging area:

  • Horizontal merging
  • Vertical merging

Horizontal Merging

It is the process of merging the records horizontally, combining columns from two or more sources (a join).

Vertical Merging

It is the process of merging the records vertically when the two sources have the same "metadata" (a union).

Metadata here means the data structure (the two or three tables have the same column names).
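
Both merge operations can be sketched in generic SQL (all table and column names below are hypothetical):

    -- Horizontal merging: combine columns from two sources on a common key (join)
    SELECT c.customer_id,
           c.customer_name,
           a.city,
           a.country
    FROM   src_customer c
    JOIN   src_address  a ON a.customer_id = c.customer_id;

    -- Vertical merging: combine rows from two sources with the same metadata (union)
    SELECT customer_id, customer_name FROM src_customer_north
    UNION
    SELECT customer_id, customer_name FROM src_customer_south;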


Data Loading

It is the process of loading data into a target system. There are two types of data loads:

  • Initial load (or full load)
  • Incremental load (or delta load)

ETL Client-Server Technology

ETL Plan

An ETL plan defines extraction, transformation, and loading. An ETL plan is designed with the following types of metadata:

  • Source definition

It is the structure of the source data or source table from which data is extracted.

  • Target definition

It is the structure of the target table into which data is loaded.

  • Transformation rule

It defines the business logic used for processing the data.

Meta Data

It defines the structure of the data, which is represented as column names, data types, precision, scale, and keys (primary key, foreign key).
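
For illustration, a table definition captures exactly this kind of metadata; the following sketch in generic SQL uses hypothetical names:

    CREATE TABLE dim_category (
        category_id   INTEGER PRIMARY KEY,
        category_name VARCHAR(50)
    );

    -- Column names, data types, precision/scale, and keys all form the metadata
    CREATE TABLE dim_product (
        product_key   INTEGER        PRIMARY KEY,                        -- primary key
        product_name  VARCHAR(100),                                      -- length 100
        unit_price    DECIMAL(10,2),                                     -- precision 10, scale 2
        category_id   INTEGER REFERENCES dim_category (category_id)     -- foreign key
    );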


ETL Client

An ETL client is a graphical user interface component where an ETL developer can design ETL plans.

ETL Repository

An ETL repository is the brain of the ETL system, where metadata such as ETL plans is stored.

ETL Server

It is the ETL engine which performs extraction, transformation, and loading.

In Informatica, an ETL plan is called a mapping.

In DataStage, it is called a job.

Initial Load (or Full Load)

When we extract data from the source system and load it into the target system for the first time, the records are inserted directly into the target system.


Incremental Load (or Delta Load)

After the first load, we extract data from the source system and load it into the target system so that newly entered records are inserted and changed records are updated in the target system.
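
A minimal incremental-load sketch in SQL uses a MERGE (upsert) statement; the tables below are hypothetical and MERGE syntax varies slightly by database:

    -- Update existing target rows, insert new ones
    MERGE INTO tgt_customer t
    USING stg_customer s
    ON (t.customer_id = s.customer_id)
    WHEN MATCHED THEN
        UPDATE SET t.customer_name = s.customer_name,
                   t.city          = s.city
    WHEN NOT MATCHED THEN
        INSERT (customer_id, customer_name, city)
        VALUES (s.customer_id, s.customer_name, s.city);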

To design an ETL plan we need only metadata.

  • A data definition is called metadata.
  • With GUI-based ETL we can extract data from different databases (ex:- SQL, SQL Server, Sybase, Oracle) located in different source systems.
  • The length of a target column must be the same as or greater than the length of the corresponding source column; it must not be less than the source length.

Data warehouse – Database design

A data warehouse is designed with the following types of schemas:

  • Star schema
  • Snowflake schema
  • Galaxy schema (also called constellation schema, integrated schema, hybrid schema, or multi-star schema)
  • The process of designing the database is known as data modeling.
  • A database architect (or data modeler) creates database schemas using a GUI-based database design tool called ERwin, a product of Computer Associates.
  • Star Schema:- A star schema is a database design which contains a centrally located fact table surrounded by dimension tables (see the example after this list).
  • Since the database design looks like a star, it is called a star schema database design.
  • In a data warehouse, facts are numeric. A fact table contains facts.
  • Not every numeric value is a fact; only numeric values which are key performance indicators are known as facts.
  • Facts are business measures which are used to evaluate the performance of an enterprise.
  • A fact table contains the facts at the lowest level of granularity.
  • The fact granularity determines the level of detail.
  • A dimension is descriptive data which describes the key performance indicators known as facts.
  • A dimension table contains de-normalized data.
  • A fact table contains normalized data.
  • A fact table contains a composite key, where each component of the key is a foreign key to a dimension table.
  • A dimension provides answers to the following business questions:

i. Who   ii. What   iii. When   iv. Where
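
As a sketch (with hypothetical table and column names), a star schema in SQL is simply a fact table whose composite key is made up of foreign keys to the surrounding dimension tables:

    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name VARCHAR(50), city VARCHAR(50));
    CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  VARCHAR(50), category VARCHAR(30));
    CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, full_date DATE, month_name VARCHAR(10), year_no INTEGER);

    CREATE TABLE fact_sales (
        customer_key  INTEGER REFERENCES dim_customer (customer_key),
        product_key   INTEGER REFERENCES dim_product  (product_key),
        date_key      INTEGER REFERENCES dim_date     (date_key),
        sales_amount  DECIMAL(12,2),   -- fact (KPI)
        quantity_sold INTEGER,         -- fact (KPI)
        PRIMARY KEY (customer_key, product_key, date_key)   -- composite key of foreign keys
    );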

Snowflake Schema

  • In a snowflake schema, a large dimension table is split into one or more tables to flatten a hierarchy.
  • In a snowflake schema, a dimension may have a parent table.
  • A large dimension table is split into one or more normalized tables (decomposed dimensions); see the sketch after this list.
  • "Dim" means dimension.
  • Dimension D4 is split into dimension D41 and dimension D42; D42 is the parent table and the others are child tables.
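
As a hedged illustration in SQL (hypothetical names), snowflaking the product dimension pulls the category attributes out into a normalized parent table:

    -- Parent (normalized) table
    CREATE TABLE dim_category (
        category_key  INTEGER PRIMARY KEY,
        category_name VARCHAR(50)
    );

    -- Child dimension table points to its parent
    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,
        product_name  VARCHAR(50),
        category_key  INTEGER REFERENCES dim_category (category_key)
    );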

Extra:-

  • A schema is nothing but an arrangement of tables.


KPI means key performance indicator

A numeric value which plays a key role in analyzing and estimating the performance of the business enterprise in the data warehouse is a fact.

A fact table contains facts. Facts are numeric.

Not every numeric value is a fact.

  • Facts are business measures because they estimate the business performance.
  • A dimension is descriptive data that describes the facts.
  • Facts are analyzed by descriptive data, which is called a dimension.

A mapping is an ETL plan which defines extraction, transformation, and loading.

Data Warehouse – Galaxy schema

A data warehouse can be designed as an integration of multiple star schemas, snowflake schemas, or both.

A galaxy schema is also known as hybrid schema or constellation schema.

Fact Constellation

It is the process of joining two fact tables.

Conformed dimensions

A dimension table which can be shared by multiple fact tables is known as a conformed dimension.

Factless Fact Table

A fact table without any facts is known as a factless fact table.

Junk dimensions

A dimension which cannot be used to describe key performance indicators is known as a junk dimension.

ex:- phone number, fax number, customer address etc.

Slowly Changing dimensions

A dimension which changes over a period of time is known as a slowly changing dimension. There are three types of slowly changing dimensions; a small SQL sketch follows the list.

  • Type 1 dimension

A Type 1 dimension stores only the current data in the target; it does not maintain any history.

  • Type 2 dimension

A Type 2 dimension maintains the full history in the target; for each update it inserts a new record into the target.

  • Type 3 dimension

A Type 3 dimension maintains partial history (current and previous information).
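
The three types can be sketched in generic SQL (the dim_customer table, its columns, and the sample values are hypothetical; syntax varies by database):

    -- Type 1: overwrite in place, no history kept
    UPDATE dim_customer
    SET    city = 'Hyderabad'
    WHERE  customer_id = 101;

    -- Type 2: expire the current row and insert a new one, keeping full history
    UPDATE dim_customer
    SET    current_flag = 'N'
    WHERE  customer_id = 101 AND current_flag = 'Y';

    INSERT INTO dim_customer (customer_key, customer_id, city, effective_date, current_flag)
    VALUES (2001, 101, 'Hyderabad', DATE '2016-01-01', 'Y');

    -- Type 3: keep current and previous values in separate columns (partial history)
    UPDATE dim_customer
    SET    previous_city = city,
           city          = 'Hyderabad'
    WHERE  customer_id = 101;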


Dirty dimensions


Extra

Differences between Star and Snowflake Schemas

  • A star schema is de-normalized, which means duplicate values are maintained.
  • In a snowflake schema, the split tables are normalized, which means duplicate values are not maintained.
  • A snowflake schema is used to reduce table space.

Galaxy Schema

  • The integration of star and snowflake schemas is called a galaxy schema.
  • Common (or conformed, or reusable) dimensions are dimension tables shared by multiple fact tables.
  • A constellation is the process of joining two fact tables.
  • A galaxy schema can contain multiple schemas.
  • Key performance indicators are called facts.
  • A fact table without any facts is called a factless fact table.
  • A dimension which cannot be used to describe facts is a junk dimension.


  • A junk dimension provides additional information to the main dimension.

Common or conformed or reusable dimensions

  • Dimension tables which are shared by multiple fact tables are called conformed, reusable, or common dimensions.


  • A surrogate key is an artificial key that is treated as a primary key.
  • A surrogate key is a system-generated sequential number that is treated as a primary key.
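
As a minimal sketch (hypothetical table; standard SQL identity-column syntax, which some databases replace with sequences), a surrogate key is simply a system-generated number used as the primary key alongside the natural business key:

    CREATE TABLE dim_customer (
        customer_key  INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key
        customer_id   VARCHAR(20),    -- natural (business) key from the source system
        customer_name VARCHAR(50)
    );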

OLAP

OLAP is nothing but a set of specifications (or a technology) which supports client applications in retrieving data from the data warehouse.


OLAP is nothing but an interface (or gateway) between the user and the database.

Types of OLAP

  • DOLAP (Desktop OLAP):- An OLAP which can query data from a database built using desktop databases like dBase, FoxPro, Clipper, etc. XML files, TXT files, and Excel files are also desktop data sources.
  • ROLAP (Relational OLAP):- ROLAP is used to query data from relational sources like SQL, Oracle, Sybase, and Teradata.
  • MOLAP (Multidimensional OLAP):- It is used to query data from multidimensional sources like cubes and DMR.
  • HOLAP (Hybrid OLAP):- It is a combination of ROLAP and MOLAP.
