Introduction to DataStage

DataStage Overview

DataStage is a comprehensive ETL tool that provides end-to-end ERP solutions.

Some of the Most popular ETL Tools are:

DSPX → leader of ETL tools, started from 2006
Informatica
ODI
SAS (ETL Studio)
BODI
Ab Initio


History of DataStage

DataStage has more than 12 years of history.

The first release was in 1997, from VMARK (UK).

Mr. Lee Scheffler is regarded as the father of DataStage.

In 1997, DataStage was known as Data Integrator → Torrent (Data Integrator).

In 2000, IBM acquired Informix along with its database.

2000: Ascential DataStage Server Edition

It is the combination of Informix and Data Integrator.

2000: Ascential DataStage Server + Orchestrate

Orchestrate is an ETL tool with extensive parallel capabilities.
It executes only on UNIX flavors.

UNIX flavors

AIX
Linux
HP-UX
Sun Solaris

Through the combination with Orchestrate, DataStage acquired parallel capabilities.

Versions 6 to 7.5.1: parallel environment

Ascential DataStage PX (Parallel Extender)

It can be configured only on UNIX flavors.

Up to version 7.5.1, server components are configured only on UNIX flavors.

December 2004

7.5x2 ASDSPX + MKS Toolkit

ASDSPX: Ascential DataStage PX
MKS Toolkit: creates a UNIX-like virtual environment on Windows XP in which DataStage can run

So the MKS Toolkit makes it possible to run DataStage on Windows.

7.5x2 ASDSPX with the MKS Toolkit can perform only data transformation.

MKS Toolkit → Ascential Suite components, released as software:

(a) ProfileStage

(b) QualityStage

(d) MetaStage

(e) DataStage PX

(f) DataStage TX

2005

IBM acquired the whole of Ascential → IBM DataStage Enterprise Edition 7.5x2 (used by 50% of users)

2006

IBM WebSphere DataStage & QualityStage 8.0.1 → IDE (integrated environment) (used by 40% of users)

It is an integrated environment of:

(a) ProfileStage

(b) QualityStage

(d) MetaStage

(e) DataStage PX

2009

IBM InfoSphere DataStage & QualityStage 8.0.1 → improved web services, and the server changed (used by 10% of users)

Features of Data Stage

Any to Any
Platform Independent
Node Configuration
Partition Parallelism
Pipeline Parallelism

Any to Any

Reads the data from any Source and loads it to any Target.

Any source ↔ any target

Platform Independent

A job designed for one OS can be executed on another.

A platform can generally be either software or hardware.

In DataStage, platform independence is with respect to hardware.

Hardware environment

Uniprocessing environment

Hard disk → CPU → RAM

Symmetric Multi-Processing (SMP)

A single hard disk shared by multiple CPUs; an SMP system can have 32–64 CPUs.

Massively Parallel Processing (MPP)

A collection of different SMPs.

Node Configuration

Node configuration is the best feature of DataStage.
It is a technique for creating logical CPUs.

Node → logical CPU (or) instance of a physical CPU

It is software that creates virtual CPUs.

DataStage is executed on logical CPUs.
To run a job in DataStage, we require at least 1 node.

Example: ETL

Uniprocessing

Hard disk → CPU → RAM

To access 1000 records, it takes 10 minutes.

SMP

To access 1000 records with 4 CPUs, it takes 2.5 minutes.

Node configuration

Uniprocessing → virtual SMP

The OS does not use the maximum capabilities of the CPU, so node configuration is software that divides the CPU into different nodes. That makes fuller use of the CPU's capacity.
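The timing figures in the example above follow from simple division. The sketch below is only an illustration of that arithmetic, assuming the work splits evenly across nodes with no overhead (real jobs rarely scale this cleanly); the function name `estimated_minutes` is hypothetical and not part of DataStage.

```python
def estimated_minutes(single_node_minutes: float, nodes: int) -> float:
    """Idealized run time when work is split evenly across `nodes` logical CPUs.

    Assumption: perfect linear speedup, no partitioning or collection overhead.
    """
    if nodes < 1:
        # The article states that a job needs at least 1 node.
        raise ValueError("a job needs at least 1 node")
    return single_node_minutes / nodes


# 1000 records take 10 minutes on one node in the article's example.
for n in (1, 2, 4):
    print(n, "node(s):", estimated_minutes(10, n), "min")
```

With 4 nodes the estimate matches the 2.5 minutes quoted for the 4-CPU case.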

Partition parallelism

→ Horizontal combining

→ Combining primary rows with secondary rows with respect to key column values

Partitioning

It is a technique of distributing the records across the nodes, based on partitioning techniques.

Partitioning Techniques

In addition to the eight partitioning techniques, there is a ninth technique known as 'Auto'.

Note:

Partitioning techniques play an important role in performance tuning.

Key-based techniques ensure that rows with the same key column value are collected in the same partition.

Example:

EMP

Key column: DNO

E NO	E Name	DNO
11	a	10
12	b	20
13	c	10
14	d	30
15	e	20

D NO	D Name	Loc
10	ACE	Hyd
20	Meter	Sec
30	Sales	Eng

When the two tables are combined using horizontal combining, rows with the same key column value are collected in the same partition.
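The idea that equal key values land in the same partition can be sketched in a few lines. This is a minimal illustration, not DataStage's implementation: it assumes 3 nodes and a simple modulus rule on DNO chosen for the example, and the function names are hypothetical. The rows are the two example tables above.

```python
from collections import defaultdict

# Rows from the example tables: (ENO, EName, DNO) and (DNO, DName, Loc).
EMP = [(11, "a", 10), (12, "b", 20), (13, "c", 10), (14, "d", 30), (15, "e", 20)]
DEPT = [(10, "ACE", "Hyd"), (20, "Meter", "Sec"), (30, "Sales", "Eng")]


def partition_by_key(rows, key_index, nodes):
    """Send each row to node (key % nodes), so equal keys share a node."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key_index] % nodes].append(row)
    return parts


def partitioned_join(nodes=3):
    """Join EMP and DEPT on DNO, one node at a time."""
    emp_parts = partition_by_key(EMP, 2, nodes)
    dept_parts = partition_by_key(DEPT, 0, nodes)
    joined = []
    for node in range(nodes):
        # Every DNO an employee on this node needs is also on this node.
        dept_lookup = {d[0]: d for d in dept_parts[node]}
        for eno, ename, dno in emp_parts[node]:
            _, dname, loc = dept_lookup[dno]
            joined.append((eno, ename, dno, dname, loc))
    return sorted(joined)


for row in partitioned_join():
    print(row)
```

Because both tables are partitioned on the same key with the same rule, each node can join its own rows without looking at any other node.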

Repartitioning

The partitioned data is partitioned once again.

Example:

EName	Dno	Loc
A	10	AP
B	20	TN
C	10	TN
D	20	KN
E	30	TN
F	10	KN
G	20	AP

Partitioning and repartitioning are automatic processes in DataStage.
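As a rough sketch of what repartitioning means for the example table, the code below first groups the rows on Dno and then regroups the same rows on Loc. The choice of Loc as the second key, the one-partition-per-key-value simplification, and the function names are all assumptions made for illustration.

```python
# Rows from the example table: (EName, Dno, Loc).
ROWS = [
    ("A", 10, "AP"), ("B", 20, "TN"), ("C", 10, "TN"), ("D", 20, "KN"),
    ("E", 30, "TN"), ("F", 10, "KN"), ("G", 20, "AP"),
]


def partition(rows, key_index):
    """Group rows so each distinct key value gets its own partition."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key_index], []).append(row)
    return parts


def repartition(parts, key_index):
    """Flatten the existing partitions and regroup them on a different key."""
    all_rows = [row for rows in parts.values() for row in rows]
    return partition(all_rows, key_index)


by_dno = partition(ROWS, 1)       # first partitioning: on Dno
by_loc = repartition(by_dno, 2)   # repartitioning: on Loc

for loc, rows in sorted(by_loc.items()):
    print(loc, [r[0] for r in rows])
```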

Reverse Partitioning

Reverse partitioning is collecting the data from the nodes.
It happens in only one situation: when data moves from parallel to sequential processing.

Reverse partitioning is also called collecting.

Different Collecting Methods

Ordered
Round Robin
Sort Merge
Auto
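The first three collecting methods can be sketched as plain functions over a list of partitions. This is an illustrative approximation only: the function names are hypothetical, the sort-merge version assumes each partition is already sorted, and Auto is left out because its behavior depends on the job.

```python
import heapq
from itertools import zip_longest


def collect_ordered(partitions):
    """Read all of the first partition, then all of the second, and so on."""
    return [row for part in partitions for row in part]


def collect_round_robin(partitions):
    """Take one row from each partition in turn until all are exhausted."""
    missing = object()  # sentinel for partitions that ran out of rows
    out = []
    for group in zip_longest(*partitions, fillvalue=missing):
        out.extend(row for row in group if row is not missing)
    return out


def collect_sort_merge(partitions):
    """Merge partitions that are each already sorted into one sorted stream."""
    return list(heapq.merge(*partitions))


parts = [[1, 5, 9], [2, 3], [4, 6, 7, 8]]
print(collect_ordered(parts))
print(collect_round_robin(parts))
print(collect_sort_merge(parts))
```

All three return the same rows; only the order in which the single sequential stream receives them differs.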

Pipeline Parallelism

Extraction, transformation, and loading are carried out simultaneously.

Pipe (link)

A channel through which data moves from one stage to another.
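Python generators give a small-scale picture of data moving through links between stages. The sketch below is an analogy under stated assumptions, not how DataStage runs jobs: the stage functions are hypothetical, and the stages take turns within one process rather than truly running at the same time. What it does show is that each row is passed on as soon as it is ready, without waiting for the previous stage to finish all rows.

```python
def extract(source, log):
    """Stage 1: read rows one at a time from the source."""
    for row in source:
        log.append(f"extract {row}")
        yield row


def transform(rows, log):
    """Stage 2: transform each row as soon as it arrives."""
    for row in rows:
        log.append(f"transform {row}")
        yield row.upper()


def load(rows, target, log):
    """Stage 3: write each transformed row to the target."""
    for row in rows:
        log.append(f"load {row}")
        target.append(row)


def run_pipeline(source):
    log, target = [], []
    # Each generator acts as a link feeding the next stage.
    load(transform(extract(source, log), log), target, log)
    return target, log


target, log = run_pipeline(["a", "b"])
print(target)
print(log)
```

The log shows row "a" passing through all three stages before row "b" is even extracted, which is the opposite of a batch design that finishes one stage for all rows first.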

Traditional Batch Processing

(Server jobs)

Sequential processing

Example: suppose we have 3 instructions.

I1: Fetch (F), Decode (D), Execute (E), Write back (W)

I2: F, D, E, W

I3: F, D, E, W

In sequential processing, each instruction completes all four steps before the next one starts.

Parallel Processing

All instructions run in parallel: each one starts one time slot after the previous one, so their steps overlap.

T1	T2	T3	T4	T5	T6	T7	T8
F	D	E	W
	F	D	E	W
		F	D	E	W
			F	D	E	W
				F	D	E	W
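The staggered table above can be reproduced with a short calculation. This is a sketch assuming four equal-length steps (F, D, E, W) and one new instruction starting per time slot; the function names are illustrative.

```python
STAGES = ["F", "D", "E", "W"]


def sequential_slots(instructions, stages=len(STAGES)):
    """Each instruction finishes every step before the next one begins."""
    return instructions * stages


def pipelined_slots(instructions, stages=len(STAGES)):
    """A new instruction starts every slot, so the steps overlap."""
    return instructions + stages - 1


def timeline(instructions):
    """Build the staggered grid: row i starts one slot after row i - 1."""
    width = pipelined_slots(instructions)
    rows = []
    for i in range(instructions):
        row = [""] * width
        row[i:i + len(STAGES)] = STAGES
        rows.append(row)
    return rows


print("sequential:", sequential_slots(5), "slots")
print("pipelined: ", pipelined_slots(5), "slots")
for row in timeline(5):
    print(" ".join(step or "." for step in row))
```

For the five rows shown in the table, the pipelined schedule needs 8 time slots (T1 to T8), while running the same five instructions one after another would need 20.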

**The core differences between versions 7.5x2 and 8.0.1 of DataStage**

7.5x2	8.0.1
4 client components: DS Designer, DS Director, DS Manager, DS Admin	5 client components: DS Designer, DS Director, DS Admin, Web Console, Information Analyzer
OS-dependent users (OS users become DataStage users)	OS-independent users (users can be created in DataStage, but one dependency remains)
File-based repository (folder)	Database repository (default is DB2)
No web-based administration	Web-based administration
2 architecture components: server, client	5 architecture components: common user interface, common repository, common engine, common connectivity, common shared services
Can perform phases 3 and 4	Can perform phases 1, 2, 3, and 4
2-tier	N-tier

Note:

The features of Manager in 7.5x2 are integrated into Designer in 8.0.1.

(a) In 7.5x2, the user IDs used to log in for authentication are created in the OS; OS users become DataStage users.

(b) In 8.0.1, they are created in the DataStage environment.

Repository

In 7.5x2, everything is stored in a folder in the form of files.

In 8.0.1, data is organized in 2 layers:

Global repository → database → more security
Local repository → folder → performance

In 8.0.1, an admin can work from home by using the Web Console component.

(a) In 7.5x2, the architecture is 2-tier:

S → server

C → client machine

(b) In 8.0.1, we can have multiple servers/engines but only 1 repository:

R – C1, C – C2, E1 – C3, E2 – C4, E3 – C4 ... En – Cn → the n-tier components can be configured on n machines.

**Client components of 7.5x2 and 8.0.1**

**7.5x2 Designer**

Create jobs: mainframe jobs (MF), server jobs (SJ), sequence jobs (SJ), parallel jobs (PJ)
Compile
Run
Multiple job Compile

Director

Views

- Jobs
- Status
- Logs
Monitor
Batch Jobs
Unlock jobs
Message Handling
Schedule jobs

Manager

Import / Export
Node Configuration

Admin

Create projects
Delete projects
Organize projects

8.0.1 Designer

Create jobs: mainframe jobs (MF), server jobs (SJ), sequence jobs (SJ), parallel jobs (PJ)
Compile
Run
Multiple job Compile
Import / Export
Node Configuration
Advanced Find
Performance Analysis
Estimate Resource

Director

Views
Jobs
Status
Logs
Monitor
Batch Jobs
Unlock jobs
Message Handling
Schedule jobs

Admin

Create projects
Delete projects
Organize projects

Web Console

Security Services
Reporting Services
Logging Services
Scheduling Services
Domain Management
Session Management

Information Analyzer (console for IBM Information Server)

Data profiling: column analysis (CA), primary key analysis (PA), foreign key analysis (FA), baseline analysis, cross-domain analysis

Author: TekSlate
