Informatica Concepts

Data warehousing

Data warehousing is the process of building a data warehouse. The process includes:

  • Business requirement analysis
  • Database design
  • ETL development and testing
  • Report development and testing

Business analysts and onsite technical coordinators gather the business requirements and technical requirements.

BRS (Business Requirement Specification)

A BRS contains the business requirements, which are collected by a business analyst.

An SRS contains the software and hardware requirements, which are collected by senior technical people.

The process of designing the database is called data modeling (or dimensional modeling). A database architect (or data modeler) designs the warehouse as a set of tables.




OLAP (Online Analytical Processing)

OLAP is a technology that supports business managers in querying the data warehouse. OLAP provides the gateway between users and the data warehouse.

A data warehouse is known as an OLAP database.

Ex:- Cognos, Business Objects (BO)

Differences between OLTP database and Data Warehouse

  • OLTP is designed to support business transaction processing; a data warehouse is designed to support decision-making processing
  • OLTP holds volatile data; a data warehouse holds non-volatile data
  • OLTP holds current data; a data warehouse holds historical data
  • OLTP holds detailed data; a data warehouse holds summary data
  • OLTP is designed for running the business; a data warehouse is designed for analyzing the business
  • OLTP data is normalized; data warehouse data is de-normalized
  • OLTP data is application-oriented; data warehouse data is subject-oriented
  • OLTP is designed for clerical operations; a data warehouse is designed for managerial operations
  • OLTP uses ER modeling; a data warehouse uses dimensional modeling

Enterprise Data warehousing objects

A relational database is defined as a collection of objects such as tables, views, procedures, macros, triggers, etc.

Table:- A table is a two-dimensional object in which data is stored in the form of rows and columns.

View:- A view is like a window into one (or more) tables. It provides customized access to the base tables by:

  • Restricting which columns are visible from the base tables
  • Restricting which rows are visible from the base tables
  • Combining rows and columns from several base tables

A view may define a subset of the rows of a table.

A view may define a subset of the columns of a table.
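As a sketch of these points, here is a minimal example using Python's built-in sqlite3 module; the employees table and its columns are made up purely for illustration.

```python
import sqlite3

# Hypothetical employees table, used only to illustrate views.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Anil", "Sales", 40000), (2, "Ravi", "HR", 35000), (3, "Sita", "Sales", 45000)],
)

# The view restricts both the visible columns (salary is hidden) and the
# visible rows (only the Sales department) of the base table.
conn.execute(
    "CREATE VIEW sales_staff AS SELECT id, name FROM employees WHERE dept = 'Sales'"
)

rows = conn.execute("SELECT name FROM sales_staff ORDER BY id").fetchall()
print(rows)  # [('Anil',), ('Sita',)]
```

Queries against the view behave exactly like queries against a table, but the base table's hidden columns and rows never appear.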

Data Warehouse – RDBMS

A data warehouse can be built on a relational database such as Oracle, SQL Server, DB2, or Teradata.

Teradata is one of the best RDBMSs for storing massive historical information with parallel storage and retrieval.

Data Acquisition

It is the process of extracting the relevant business information, transforming the data into the required business format, and loading it into the target system.

Data acquisition consists of the following processes:

  • data extraction
  • data transformation
  • data loading

There are two types of ETL tools used for data acquisition:

  • Codebase ETL
  • GUI based ETL

Code-Based ETL

An ETL application can be developed using a programming language such as SQL or PL/SQL.

Ex:- SAS Base, SAS/ACCESS, and the Teradata ETL utilities:

  • BTEQ
  • FastLoad
  • MultiLoad
  • TPump

GUI-Based ETL

ETL applications can be designed with a simple graphical user interface using point-and-click techniques.

Ex: Informatica, DataStage, Ab Initio, ODI (Oracle Data Integrator), Data Services, Data Manager, SSIS (SQL Server Integration Services)

Data Extraction

It is the process of reading data from various types of source systems. The following types of sources are used in extraction:

  • ERP sources: SAP, Oracle Applications, JD Edwards, PeopleSoft
  • File sources: XML files, flat files
  • Relational sources: Oracle, SQL Server, DB2, Sybase
  • Legacy sources: Mainframes, AS/400, COBOL files, etc.

Data Transformation:-

It is the process of transforming and cleansing the data into the required business format.

The following data transformation activities take place in the staging area:

  • Data merging
  • Data cleansing
  • Data scrubbing
  • Data aggregation

Staging is a temporary storage area where these data transformation activities (data merging, data scrubbing, data cleansing, data aggregation) take place.

Data Cleansing

It is the process of correcting inconsistencies and inaccuracies in the data, and of removing unwanted data from the staging area. Examples:

  • Removing duplicate records
  • Rejecting records that contain NULL values
  • Removing unwanted spaces
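The cleansing steps above can be sketched in a few lines of Python; the staging records and field names here are hypothetical.

```python
# Toy staging records (hypothetical data) showing the three problems named
# above: a duplicate, a NULL, and stray spaces.
staging = [
    {"id": 1, "name": " Anil "},
    {"id": 1, "name": " Anil "},   # duplicate record
    {"id": 2, "name": None},       # record containing a NULL
    {"id": 3, "name": "Ravi"},
]

def cleanse(records):
    seen, clean = set(), []
    for rec in records:
        if rec["name"] is None:                      # reject NULL records
            continue
        rec = {**rec, "name": rec["name"].strip()}   # remove unwanted spaces
        key = (rec["id"], rec["name"])
        if key in seen:                              # remove duplicates
            continue
        seen.add(key)
        clean.append(rec)
    return clean

print(cleanse(staging))  # [{'id': 1, 'name': 'Anil'}, {'id': 3, 'name': 'Ravi'}]
```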

Data Scrubbing

It is the process of deriving new attributes. An attribute is nothing but a table column. For example, an age attribute can be derived from a date-of-birth column.



Data Aggregation

It is the process of calculating summary values from detailed data.
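A minimal sketch of aggregation, using made-up detail records:

```python
from collections import defaultdict

# Detailed transactions (hypothetical) rolled up into one summary row per product.
detail = [("soap", 10), ("soap", 15), ("brush", 5)]

summary = defaultdict(int)
for product, amount in detail:
    summary[product] += amount   # summarize the detailed amounts

print(dict(summary))  # {'soap': 25, 'brush': 5}
```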


Data Merging

It is the process of integrating data from multiple source systems.

There are two types of data merge operations that take place in the staging area:

  • Horizontal merging
  • Vertical merging

Horizontal Merging

It is the process of merging records horizontally: records from different sources are combined side by side on a common key (a join).

Vertical Merging

It is the process of merging records vertically, when the two sources have the same "metadata" (a union).

Metadata here means the data structure: the tables have the same column names and types.
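Both merge operations can be sketched in Python; the customer and order data are invented for illustration (a horizontal merge corresponds to a join, a vertical merge to a union):

```python
# Horizontal merge: records from two sources combined side by side on a key.
customers = {1: "Anil", 2: "Ravi"}              # source 1: customer master
orders = [(1, 250.0), (2, 99.0)]                # source 2: order amounts
horizontal = [(cid, customers[cid], amt) for cid, amt in orders]

# Vertical merge: two sources with identical metadata stacked on top of
# each other (a union).
region_a = [(3, "Sita")]
region_b = [(4, "Kiran")]
vertical = region_a + region_b

print(horizontal)  # [(1, 'Anil', 250.0), (2, 'Ravi', 99.0)]
print(vertical)    # [(3, 'Sita'), (4, 'Kiran')]
```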


Data Loading

It is the process of inserting data into the target system. There are two types of data loads:

  • Initial load (or full load)
  • Incremental load (or delta load)

ETL Client Server Technology

ETL Plan

An ETL plan defines the extraction, transformation, and loading. An ETL plan is designed with the following types of metadata:

  • Source definition

It is the structure of the source data (or source table) from which data is extracted.

  •  Target definition

It is the structure of the target table into which data is loaded.

  • Transformation rule

It defines the business logic used for processing the data.

Meta Data

It defines the structure of the data, represented as column names, data types, precision, scale, and keys (primary key, foreign key).


ETL Client

An ETL client is a graphical user interface component in which an ETL developer designs ETL plans.

ETL Repository

An ETL repository is the brain of an ETL system; it stores metadata such as ETL plans.

ETL Server

It is the ETL engine that performs extraction, transformation, and loading.

In Informatica, an ETL plan is called a mapping.

In DataStage, it is called a job.

Initial Load (or Full Load)

When extracting data from the source system to load into the target system for the first time, all records are inserted directly into the target system.



Incremental load (or) delta load

After the initial load, we extract data from the source system and load into the target system only the newly entered records, along with updates to existing records.

To design an ETL plan, we need only metadata.

  • Data definitions are what we call metadata.
  • A GUI-based ETL tool can extract data from different databases (e.g., SQL Server, Sybase, Oracle) across different source systems.
  • The length of a target column's data type must be the same as or greater than that of the source column; it must not be less.
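The two load types can be sketched as follows; the target is modeled as a simple dictionary keyed by the record's primary key (an illustration only, not Informatica's actual mechanism):

```python
# The target is modeled as a dict keyed by primary key (illustration only).
target = {}

def initial_load(source):
    """Initial (full) load: every source record goes straight into the empty target."""
    target.clear()
    target.update(source)

def incremental_load(source):
    """Incremental (delta) load: insert new records and update changed ones."""
    for key, row in source.items():
        target[key] = row

initial_load({1: "Anil", 2: "Ravi"})
incremental_load({2: "Ravi Kumar", 3: "Sita"})  # one update, one new insert
print(target)  # {1: 'Anil', 2: 'Ravi Kumar', 3: 'Sita'}
```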

Data warehouse – Database design

A data warehouse is designed with the following types of schemas:

  • Star schema
  • Snowflake schema
  • Galaxy schema (also called constellation schema, integrated schema, hybrid schema, or multi-star schema)

The process of designing the database is known as data modeling. A database architect (or data modeler) creates database schemas using a GUI-based database design tool such as "ERwin" (a product of Computer Associates).

Star Schema

A star schema is a database design that contains a centrally located fact table surrounded by dimension tables. Since the design looks like a star, it is called a star schema.

  • In a data warehouse, facts are numeric. A fact table contains facts.
  • Not every numeric value is a fact; only numeric values that act as key performance indicators are facts.
  • Facts are business measures used to evaluate the performance of an enterprise.
  • A fact table contains facts at the lowest level of granularity; the fact granularity determines the level of detail.
  • A dimension is descriptive data that describes the key performance indicators (facts).
  • A dimension table contains de-normalized data; a fact table contains normalized data.
  • A fact table contains a composite key in which each key is a foreign key to a dimension table.
  • A dimension answers the following business questions: who, what, when, and where.
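A star-schema lookup can be sketched with plain Python structures; the product/store dimensions and sales facts below are invented for illustration:

```python
# Dimension tables (hypothetical) hold the descriptive data.
dim_product = {10: "soap"}       # product_id -> product name
dim_store = {7: "Hyderabad"}     # store_id -> store location

# The fact table holds numeric facts plus foreign keys into the dimensions.
fact_sales = [
    {"product_id": 10, "store_id": 7, "amount": 120.0},
    {"product_id": 10, "store_id": 7, "amount": 80.0},
]

# A typical star-schema question (what, where): total soap sales in Hyderabad.
total = sum(
    f["amount"]
    for f in fact_sales
    if dim_product[f["product_id"]] == "soap"
    and dim_store[f["store_id"]] == "Hyderabad"
)
print(total)  # 200.0
```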

Snowflake Schema

  • In a snowflake schema, a large dimension table is split into one or more normalized tables to flatten a hierarchy (a decomposed dimension).
  • In a snowflake schema, a dimension may have parent tables.
  • For example, dimension D4 can be split into D41 and D42, where D42 is the parent table and the others are child tables. ("Dim" means dimension.)

A schema is a collection of database objects such as tables.


KPI means key performance indicator.

A numeric value that plays a key role in analyzing the business and estimating the performance of the enterprise is a fact in the data warehouse.

  • A fact table contains facts. Facts are numeric, but not every numeric value is a fact.
  • Facts are business measures because they estimate the business performance.
  • A dimension is the descriptive data by which facts are analyzed.

A mapping is an ETL plan that defines extraction, transformation, and loading.


Data Warehouse – Galaxy schema

A data warehouse can be designed as an integration of multiple star schemas, snowflake schemas, or both.

A galaxy schema is also known as hybrid schema or constellation schema.

Fact Constellation

It is a process of joining two fact tables.

Conformed dimensions

A dimension table which can be shared by multiple fact tables is known as conformed dimensions.

Factless fact table

A fact table without any facts is known as a factless fact table.

Junk dimensions

A dimension that cannot be used to describe key performance indicators is known as a junk dimension.

ex:- phone number, fax number, customer address, etc.

Slowly Changing dimensions

A dimension that changes over a period of time is known as a slowly changing dimension. There are three types of slowly changing dimensions:

  • Type 1 dimension

A Type 1 dimension stores only the current data in the target; it does not maintain any history.

  • Type 2 dimension

A Type 2 dimension maintains the full history in the target; for each update it inserts a new record into the target.

  • Type 3 dimension

A Type 3 dimension maintains partial history (the current and previous values).
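The three SCD types can be sketched as follows (a toy illustration with a hypothetical customer record, not Informatica's implementation):

```python
def scd_type1(row, new_city):
    """Type 1: overwrite in place; no history is kept."""
    row["city"] = new_city
    return row

def scd_type2(history, key, new_city):
    """Type 2: keep full history by inserting a new version row."""
    history.append({"key": key, "city": new_city})
    return history

def scd_type3(row, new_city):
    """Type 3: keep partial history (current and previous value only)."""
    row["prev_city"] = row["city"]
    row["city"] = new_city
    return row

# A hypothetical customer whose city changes from Pune to Delhi.
print(scd_type1({"key": 1, "city": "Pune"}, "Delhi"))
print(scd_type2([{"key": 1, "city": "Pune"}], 1, "Delhi"))
print(scd_type3({"key": 1, "city": "Pune"}, "Delhi"))
```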

Dirty dimensions





Differences between the star and snowflake schemas

  • A star schema is de-normalized, which means duplicate values are maintained.
  • In a snowflake schema, the split tables are normalized, so duplicate values are not maintained.
  • A snowflake schema is used to reduce table space.

Galaxy Schema

  • The integration of star and snowflake schemas is called a galaxy schema; a galaxy schema can contain multiple schemas.
  • Key performance indicators are called facts; a fact table without any facts is called a factless fact table.
  • A constellation is formed by joining two fact tables.
  • A dimension that cannot be used to describe facts is a junk dimension; a junk dimension provides additional information to the main dimension.
  • Dimension tables shared by multiple fact tables are called common, conformed, or reusable dimensions.



  • A surrogate key is an artificial, system-generated sequential number that is treated as the primary key.
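Surrogate-key generation can be sketched as follows; the natural keys are made up for illustration:

```python
import itertools

# A surrogate key is a system-generated sequential number used as the primary
# key, independent of the source system's natural key. Natural keys here are
# hypothetical.
next_key = itertools.count(start=1)

dim_customer = {}
for natural_id in ["CUST-A", "CUST-B"]:
    dim_customer[natural_id] = next(next_key)  # assign the next sequential key

print(dim_customer)  # {'CUST-A': 1, 'CUST-B': 2}
```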


OLAP is a set of specifications (or a technology) that helps client applications retrieve data from the data warehouse.

OLAP is an interface (or gateway) between the user and the database.

Types of OLAPS

DOLAP (Desktop OLAP):-

       An OLAP that queries data from a database built with desktop databases such as dBase, FoxPro, Clipper, etc. XML files, TXT files, and Excel files are also desktop data sources.

  • ROLAP (Relational OLAP):- Used to query data from relational sources such as SQL Server, Oracle, Sybase, and Teradata.
  • MOLAP (Multidimensional OLAP):- Used to query data from multidimensional sources such as a cube (DMR).
  • HOLAP (Hybrid OLAP):- A combination of ROLAP and MOLAP.
