
Introduction to DataStage

DataStage Overview

DataStage is a comprehensive ETL tool that provides end-to-end ERP solutions.

Some of the most popular ETL tools are:

DSPX (DataStage PX) – – > leader of ETL tools, started from 2006

Informatica

ODI

SAS (ETL Studio)

BODI

Ab Initio

 

History of DataStage

DataStage has more than 12 years of history.

The first release was in 1997.

1997 – VMARK – UK

Mr. Lee Scheffler – – > father of DataStage

– – > During 1997, DataStage was called Data Integrator – – > Torrent (Data Integrator)

Screenshot_21

 

 

IBM acquired Informix, along with its database, in 2000.

  • 2000: Ascential DataStage Server Edition

It is the combination of Informix + Data Integrator.

  • 2000: Ascential DataStage Server + Orchestrate
  • Orchestrate is an ETL tool, and it has extensive parallel capabilities
  • It is executed only on UNIX flavors

 

 UNIX flavors

  • AIX
  • Linux
  • HP-UX
  • Sun Solaris

– – > Due to the combination with Orchestrate, DataStage acquired parallel capabilities

  • Version – 6
  • Version – 6 – 7.5.1
  • Parallel Environment

 

Ascential DataStage PX (Parallel Extender)

It can be configured only on UNIX flavors.

– – > Up to version 7.5.1, server components are configured only on UNIX flavors.

 

2004 December

  • 7.5 * 2 ASDSPX + MKS Toolkit

ASDSPX – – > Ascential DataStage PX

MKS Toolkit – – > creates a virtual environment (like UNIX) in Windows XP to run DataStage

  • So, the MKS Toolkit makes it possible to run DataStage on Windows.
  • 7.5 * 2 ASDSPX + MKS Toolkit can perform only data transformation.

– – > MKS Toolkit – – > Ascential Suite components

Release

(a) Profile Stage

(b) Quality Stage

(c) Audit Stage

(d) Meta Stage

(e) DataStage PX

(f) DataStage TX as software

 

2005

 

– – > IBM acquired all of Ascential – – > IBM DataStage Enterprise Edition 7.5 * 2 – – > (used by 50% of users)

 

2006

 

– – > IBM WebSphere DataStage & QualityStage 8.0.1 – – > IDE (Integrated Environment) – – > (used by 40% of users)

 

Integrated Environment of

(a) Profile Stage

(b) Quality Stage

(c) Audit Stage

(d) Meta Stage

(e) DataStage PX

 

 

2009

– – > IBM InfoSphere DataStage & QualityStage 8.0.1 – – > Improved web services, and the server has changed. – – > (used by 10% of users)

 

Features of Data Stage

 

  1. Any to Any
  2. Platform Independent
  3. Node Configuration
  4. Partition Parallelism
  5. Pipeline Parallelism

 

Any to Any

Reads the data from any Source and loads it to any Target.

Any SRC    ↔   Any Target

 

Platform Independent

A job designed for one OS can be executed on any other OS.

 

– – > A platform can generally be either software or hardware.

Screenshot_22

 

 

 

– – > In DataStage, platform is with respect to hardware.

Hardware environment 

  • Uniprocessing environment

Hard disk – – > CPU – – > RAM

 

  • Symmetric Multiprocessing (SMP)

 

Screenshot_23

An SMP system can have 32–64 CPUs, that is, a hard disk with multiple CPUs.

 

  • Massively Parallel Processing (MPP)

Screenshot_24

  • A collection of different SMPs

 

Node Configuration

  • Best feature of DataStage
  • It is a technique of creating logical CPUs

Node – – > logical CPU (or) instance of a (physical) CPU

– – > It is software that creates virtual CPUs

  • DataStage is executed on logical CPUs
  • To run a job in DataStage, we require at least 1 node.

Ex: ETL

Uniprocessing

Hard disk – – > CPU – – > RAM

  • To access 1000 records, it takes 10 minutes.

SMP

Screenshot_25

  • To access 1000 records with 4 CPUs, it takes 2.5 minutes.

Node configuration:

  • Uniprocessing – – > Virtual SMP

Screenshot_26

The system is not using the maximum capabilities of the CPU, so node configuration is software that divides it into different nodes. That is, it boosts the capabilities of the CPU.
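
The arithmetic in the example above can be sketched in a few lines of Python. This is an idealized model that assumes the work divides evenly across nodes with no overhead, which real jobs rarely achieve. The function name `ideal_minutes` is made up for illustration and is not part of DataStage.

```python
def ideal_minutes(single_cpu_minutes, nodes):
    """Idealized run time when records are split evenly across logical CPUs (nodes)."""
    if nodes < 1:
        # The text notes that a job needs at least 1 node.
        raise ValueError("at least 1 node is required")
    return single_cpu_minutes / nodes


# 1000 records take 10 minutes on one CPU in the example above.
for nodes in (1, 2, 4):
    print(f"{nodes} node(s): {ideal_minutes(10, nodes)} minutes")
```

With 4 nodes the model gives the 2.5 minutes quoted in the SMP example.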

 

Partition Parallelism

– – > Horizontal combining

– – > Combining primary rows with secondary rows with respect to key column values

Screenshot_27


Partitioning

It is a technique of distributing the records across the nodes, based on partitioning techniques.

Screenshot_28

 

  • In addition, we have a 9th technique known as 'Auto'

 

Note:

  • Partitioning techniques play an important role in performance tuning.

Note:

– – > A key-based technique ensures that the same key column values are collected in the same partition.

 

Ex:

EMP

DNO = primary key

ENo   EName   DNo
11    a       10
12    b       20
13    c       10
14    d       30
15    e       20

DNo   DName   Loc
10    ACE     Hyd
20    Meter   Sec
30    Sales   Eng

 

When we combine them, i.e., using horizontal combining:

 

Screenshot_29

 

That is, the same key column values are collected in the same partition.
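
A minimal Python sketch of this idea, using the EMP rows and department rows from the example above. It assumes a simple modulus on the key column and 3 nodes; both are illustrative choices, since the list of partitioning techniques is not reproduced in the text. The helper names `partition_by_key` and `join_on_node` are hypothetical and not DataStage functions.

```python
from collections import defaultdict


def partition_by_key(rows, key, num_nodes):
    """Send each row to node (key value mod num_nodes), so equal keys share a node."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key] % num_nodes].append(row)
    return dict(partitions)


def join_on_node(emp_rows, dept_rows, key="dno"):
    """Combine each EMP row with the matching department row held on the same node."""
    dept_by_key = {dept[key]: dept for dept in dept_rows}
    return [{**emp, **dept_by_key[emp[key]]} for emp in emp_rows if emp[key] in dept_by_key]


emp = [
    {"eno": 11, "ename": "a", "dno": 10},
    {"eno": 12, "ename": "b", "dno": 20},
    {"eno": 13, "ename": "c", "dno": 10},
    {"eno": 14, "ename": "d", "dno": 30},
    {"eno": 15, "ename": "e", "dno": 20},
]
dept = [
    {"dno": 10, "dname": "ACE", "loc": "Hyd"},
    {"dno": 20, "dname": "Meter", "loc": "Sec"},
    {"dno": 30, "dname": "Sales", "loc": "Eng"},
]

NUM_NODES = 3  # assumed node count for this illustration
emp_parts = partition_by_key(emp, "dno", NUM_NODES)
dept_parts = partition_by_key(dept, "dno", NUM_NODES)

# Each node can join its own rows without looking at any other node.
for node in sorted(emp_parts):
    joined = join_on_node(emp_parts[node], dept_parts.get(node, []))
    print(node, [(row["ename"], row["dname"]) for row in joined])
```

Because both inputs are partitioned on the same key with the same rule, every EMP row finds its department row on its own node, which is what makes the horizontal combining work in parallel.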

 

Repartitioning

The partitioned data is partitioned once again.

Ex:

EName   DNo   Loc
A       10    AP
B       20    TN
C       10    TN
D       20    KN
E       30    TN
F       10    KN
G       20    AP

 

Screenshot_30

 

  • Partitioning and repartitioning are automatic processes in DataStage
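
As a rough illustration, the sketch below partitions the example rows on the department number and then partitions them again on the location. Which columns the original figure used is an assumption here, and the helper names `partition_on` and `repartition_on` are hypothetical. For simplicity each distinct value gets its own partition, which is not how a fixed number of nodes would behave.

```python
from collections import defaultdict


def partition_on(rows, column):
    """Group rows so that every distinct value of `column` gets its own partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def repartition_on(partitions, column):
    """Redistribute rows that are already partitioned, this time on another column."""
    all_rows = [row for part in partitions.values() for row in part]
    return partition_on(all_rows, column)


rows = [
    {"ename": "A", "dno": 10, "loc": "AP"},
    {"ename": "B", "dno": 20, "loc": "TN"},
    {"ename": "C", "dno": 10, "loc": "TN"},
    {"ename": "D", "dno": 20, "loc": "KN"},
    {"ename": "E", "dno": 30, "loc": "TN"},
    {"ename": "F", "dno": 10, "loc": "KN"},
    {"ename": "G", "dno": 20, "loc": "AP"},
]

by_dno = partition_on(rows, "dno")        # first partitioning, on department number
by_loc = repartition_on(by_dno, "loc")    # repartitioning, on location

for loc, part in by_loc.items():
    print(loc, [row["ename"] for row in part])
```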

 

Reverse Partitioning

  • Reverse partitioning is collecting the data from the nodes.
  • It happens in only one situation: parallel to sequential.

Screenshot_31

 

 

Reverse partitioning is also called collecting.

 

Different Collecting Methods

  1. Ordered
  2. Round Robin
  3. Sort – Merge
  4. Auto
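
The first three collecting methods can be sketched in Python as follows. This is a simplified model on plain lists, not DataStage code, and the function names are hypothetical. Auto is left out because the text does not describe how it behaves. The sort-merge sketch assumes each partition is already sorted.

```python
import heapq
from itertools import zip_longest


def collect_ordered(partitions):
    """Read all rows from the first partition, then all from the second, and so on."""
    return [row for part in partitions for row in part]


def collect_round_robin(partitions):
    """Read one row from each partition in turn, skipping partitions that run out."""
    missing = object()  # sentinel marking an exhausted partition
    return [
        row
        for group in zip_longest(*partitions, fillvalue=missing)
        for row in group
        if row is not missing
    ]


def collect_sort_merge(partitions, key=None):
    """Merge partitions that are each already sorted into one sorted stream."""
    return list(heapq.merge(*partitions, key=key))


partitions = [[1, 8, 9], [2, 3], [4, 5, 6]]

print(collect_ordered(partitions))
print(collect_round_robin(partitions))
print(collect_sort_merge(partitions))
```

All three return the same rows; only the order in which the single sequential stream receives them differs.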

 

Pipeline Parallelism

Performing the extraction, transformation, and loading steps simultaneously.

Pipe link

A channel through which data moves from one stage to another stage
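
One way to picture a pipe link is with Python generators, where each row moves on to the next stage as soon as it is produced instead of waiting for the whole previous stage to finish. This sketch is single-threaded, so it shows the row-by-row flow rather than true simultaneous execution. The stage functions and the multiply-by-10 transformation are invented for illustration.

```python
def extract(source_rows, log):
    """Source stage: hand rows downstream one at a time."""
    for row in source_rows:
        log.append(f"extract {row}")
        yield row


def transform(rows, log):
    """Transform stage: process each row as soon as it arrives."""
    for row in rows:
        log.append(f"transform {row}")
        yield row * 10


def load(rows, log):
    """Target stage: write each transformed row as it arrives."""
    target = []
    for row in rows:
        log.append(f"load {row}")
        target.append(row)
    return target


log = []
target = load(transform(extract([1, 2, 3], log), log), log)

for entry in log:
    print(entry)
```

The log shows the first row being loaded before the second row is even extracted, in contrast to batch processing where each stage would finish all rows first.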

 

Screenshot_32

 

Traditional Batch Processing (Server jobs)

Sequential processing

Ex: Suppose we have 3 instructions:

I1 – Fetch (F), Decode (D), Execute (E), Write back (W)

I2 – F, D, E, W

I3 – F, D, E, W

– – > In sequential processing:

Screenshot_33

 

Parallel Processing

Screenshot_34

 

Running all the instructions in parallel, overlapped across time slots T1 to T8:

 

T1  T2  T3  T4  T5  T6  T7  T8
F   D   E   W
    F   D   E   W
        F   D   E   W
            F   D   E   W
                F   D   E   W
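
The saving can be checked with a small calculation. The sketch below assumes every stage takes exactly one time slot and that a new instruction can start in each slot, which is the idealized picture shown above. The function names are made up for illustration.

```python
STAGES = "FDEW"  # Fetch, Decode, Execute, Write


def sequential_slots(instructions):
    """Each instruction finishes all its stages before the next one starts."""
    return instructions * len(STAGES)


def pipelined_slots(instructions):
    """A new instruction starts every slot, so only the first pays the full depth."""
    if instructions == 0:
        return 0
    return instructions + len(STAGES) - 1


def pipelined_schedule(instructions):
    """One text row per instruction, shifted right by one slot each time."""
    width = pipelined_slots(instructions)
    return [("." * i + STAGES).ljust(width, ".") for i in range(instructions)]


for line in pipelined_schedule(5):
    print(line)
print("sequential:", sequential_slots(5), "slots")
print("pipelined: ", pipelined_slots(5), "slots")
```

For the five overlapped rows shown above, the pipelined schedule needs 8 slots, against 20 if the same five ran one after another.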

 

Core differences between versions 7.5 * 2 and 8.0.1 of DataStage

 

  • 7.5 * 2: 4 client components (DS Designer, DS Director, DS Manager, DS Admin)
    8.0.1: 5 client components (DS Designer, DS Director, DS Admin, Web Console, Information Analyzer)
  • 7.5 * 2: OS dependent (OS users will be DataStage users)
    8.0.1: OS independent (users can be created in DataStage, but one dependent)
  • 7.5 * 2: File-based repository (folder)
    8.0.1: Database repository (default is DB2)
  • 7.5 * 2: No web-based administration
    8.0.1: Web-based administration
  • 7.5 * 2: 2 architecture components (server, client)
    8.0.1: 5 architecture components (common user interface, common repository, common engine, common connectivity, common shared services)
  • 7.5 * 2: Can perform phases 3 and 4
    8.0.1: Can perform phases 1, 2, 3, and 4
  • 7.5 * 2: 2-tier
    8.0.1: N-tier

 

 

Note:–

Features of Manager in 7.5 * 2 are integrated into Designer in 8.0.1.

  1. Users

(a) In 7.5 * 2, the user ID and password used to log in for authentication are created in the OS; OS users become DataStage users.

(b) In 8.0.1, they are created in the DataStage environment.

  2. Repository

(a) In 7.5 * 2, everything is stored in a folder in the form of files.

(b) In 8.0.1, data is organized in 2 layers:

    • Global repository – – > database – – > more security
    • Local repository – – > folder – – > performance

  3. In 8.0.1, the admin can work from home, that is, by using the Web Console component.

  4. Tiers

(a) In 7.5 * 2, it is 2-tier:

S – – > server

C – – > client machine

(b) In 8.0.1, we can have multiple servers / engines, but only 1 repository.

R – C1, C – C2, E1 – C3, E2 – C4, E3 – C4 ... En – Cn – – > n-tier: components can be configured on n number of machines.

 

Client components of 7.5 * 2 and 8.0.1

 7.5*2

Designer

 

  1. Create jobs: Mainframe jobs (MF), Server jobs (SJ), Sequence jobs (SJ), Parallel jobs (PJ)
  2. Compile
  3. Run
  4. Multiple job compile

 

Director

  1. Views
    • Jobs
    • Status
    • Logs
  2. Monitor
  3. Batch Jobs
  4. Unlock jobs
  5. Message handling
  6. Schedule jobs

 

Manager

 

  • Import / Export
  • Node Configuration

 

Admin

  • Create projects
  • Delete projects
  • Organize projects

 

 8.0.1

Designer

 

  1. Create jobs: Mainframe jobs (MF), Server jobs (SJ), Sequence jobs (SJ), Parallel jobs (PJ)
  2. Compile
  3. Run
  4. Multiple job compile
  5. Import / Export
  6. Node Configuration
  7. Advanced Find
  8. Performance Analysis
  9. Estimate Resource

 

Director

  1. Views
    • Jobs
    • Status
    • Logs
  2. Monitor
  3. Batch Jobs
  4. Unlock jobs
  5. Message handling
  6. Schedule jobs

 

Admin

 

  1. Create projects
  2.  Delete projects
  3. Organize projects

 

Web Console

  1. Security  Services
  2. Reporting Services
  3. Logging Services
  4. Scheduling Services
  5. Domain Management
  6. Session Management

Information Analyzer / Console for IBM Information Server

Data profiling (CA, PA, FA, baseline, cross-domain)
