To overcome the limitations of the Sequential File stage, we use the Data Set stage.

  1. Data Set is a parallel processing stage used for staging data when we design dependent jobs

(That is, transformation jobs depend on extraction jobs, and loading jobs depend on transformation jobs.)

  2. A Data Set can accommodate more than 2 GB of data
  3. A Data Set stores data in native format, so no conversion is required
  4. The data lands inside the DataStage repository

Types of Data Set

Virtual :- Data moving through a link is virtual (temporary)

Persistent :- Data created with a Data Set is persistent (permanent)

In the target file, e.g. C:/data/output.ds → persistent → permanent

Alias of Data set

  1. Orchestrate file
  2. Operating System file

Data set files

A Data Set is not a single file; it consists of multiple files:

  • Descriptor file
  • Data file
  • Control file
  • Header file

  1. Descriptor file :-

Contains the schema details and the address of the data → structure / table definition

C:/data/file.ds

  2. Data file :-

Contains the data in native format

C:/IBM/InformationServer/Server/Datasets/file.ds

  3. Control file (or) header file :-

Resides in the operating system.
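Where the data files actually land is controlled by the parallel configuration file (pointed to by the APT_CONFIG_FILE environment variable). A minimal sketch of such a file, with hypothetical node name, hostname, and paths:

```
{
  node "node1" {
    fastname "etl-server"
    pools ""
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/tmp" {pools ""}
  }
}
```

The descriptor file stays wherever the job writes it (the .ds path), while the data files are spread across the resource disk directories of each node in this configuration.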

How to create and reuse a Data Set



  • After compiling → Run
  • Now copy the target file path, that is, the .ds file path
  • Go to Columns → Save → Table Definition → OK → Select the file → OK




Here are the properties:

Paste the extraction target path → Columns → Load

  • Data in the extraction target gets into the transformation Data Set.

To view a Data Set outside the job

  • A Data Set cannot be directly copied or deleted, because it has multiple files

  • To organize Data Sets (view, copy, delete), we use the Data Set utilities.


Data Set utilities

  1. GUI - Data Set Management (Windows)
  2. CMD - $ orchadmin (UNIX command line)


Tools → Data Set Management → shows the list of files → select the file → OK → the Data Set Management window opens → here we can view data, copy data, and delete data.
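As a sketch, some commonly used orchadmin commands (the data set paths are placeholders; exact options can vary between DataStage releases):

```
orchadmin describe -l /data/output.ds       # show schema and data file details
orchadmin dump /data/output.ds              # print the records to standard output
orchadmin copy /data/output.ds /data/bkp.ds # copy a data set (never use the OS copy command)
orchadmin rm /data/output.ds                # delete the descriptor and all data files together
```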


Data Set version

  • By default 4.1
  • We can create a Data Set with whatever version we want
  • Version control can be done using an environment variable.




Step 1:

Select the Job Properties symbol → Parameters → Add Environment Variable → select APT_WRITE_DS_VERSION → OK → now compile and run → at run time, we can select the version.
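For reference, the values this variable accepts look like the following (option names as seen in some releases; verify against your installation's documentation):

```
APT_WRITE_DS_VERSION = v4_1     (default)
APT_WRITE_DS_VERSION = v4_0_3
APT_WRITE_DS_VERSION = v3_1
APT_WRITE_DS_VERSION = v3_0
```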


Data Set operators

The Data Set stage does not have an operator of its own; it generally uses the copy operator.

To see the operators of each stage:

Job Properties → Generated OSH

How to flush data in a Data Set?
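One way to flush (empty) a data set without deleting its descriptor is the orchadmin truncate command (the path is a placeholder):

```
orchadmin truncate /data/output.ds   # removes all records but keeps the schema/descriptor
```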

File set :-  (.fs extension)

File Set is a file stage, which is used for staging data when we design jobs.

→ Similarities between Data Set and File Set

  • Both can store more than 2 GB of data
  • Both support parallel processing (PX)

→ Differences

Data Set:

  1. No reject handling
  2. Internal use only
  3. Native format
  4. .ds extension
  5. Uses the copy operator
  6. Data is organized as segments

File Set:

  1. Supports rejects
  2. Can be used by external applications
  3. Binary format
  4. .fs extension
  5. Uses the import/export operator
  6. Data is organized as multiple files

Internal use

Data that is created with a Data Set can be used only internally; that is, the .ds format can be read only by DataStage.

→ File Set creation is the same as Data Set creation.

Sequential Stage Target properties 


Target properties

File =?

File update mode:

  a) Overwrite
  b) Append
  c) Create (error if exists)

Creates the file if the target file does not exist


Clean up on failure = True

First line is column names = False

Reject mode = Continue

Step 1

Select the Job Properties symbol → Parameters → Add Environment Variable → select APT_CLOBBER_OUTPUT → now compile and run → at run time:

APT_CLOBBER_OUTPUT = False → aborts (if the file already exists)

APT_CLOBBER_OUTPUT = True → creates (overwrites) the file

Clean up on failure :- (works with Append mode)

If True, partially loaded records are automatically cleared if the job fails for any reason, so it can restart from the point where it stopped.

Development and Debug Stage 

Divided into three groups

  1. Stages that generate sample data

    1. row generator
    2. Column generator


Row Generator

  It is a development stage which generates sample data (system/user defined) and supports only one output.

Column Generator

  It is a development stage which generates columns with sample data (system/user defined) and supports one input and one output.

  2. Stages that pick sample data

  • Head
  • Tail
  • Sample

  3. Stage that helps in debugging

  • Peek

→ Development stages create sample data. If the client does not give data, we create the structure and sample data ourselves.

 Row Generator :-  (one output)

→ System generated


Right click on Row Generator → Properties → Number of Records = 100 → OK → Columns → Load → Select the file → OK → View Data


Properties → Number of Records → Columns → Load → double-click on the column → the Edit Column Meta Data window opens → Generator → Type (cycle / random)

  • Limit

Varchar


Algorithm = Cycle

Value = ?

(Here Value takes n number of names and cycles through them)

Algorithm = Alphabet

String = ? (String takes only one name, and each alphabet is displayed in turn)

That is, if string = abc

the output is: a, b, c, ... (one character per record)




  • Generator
  • Type = Cycle → increment, initial value, limit → part, part count


Type = Random → limit, seed, signed


For Date columns:

  • Generator
  • Epoch
  • Percent invalid
  • Type = Cycle / Random
  • Use current date.


Column Generator :-  (one input and one output)



Column Generator is often used together with aggregation: based on a generated column we can group all employees into one group.

Right click on Column Generator → Properties

Within Options:

Column Method = Explicit

Column To Generate = COMPANY

Column To Generate = COUNTRY

Click on the Output options

Left-hold the mouse, drag it, and perform the mapping

View Data

[Mapping is needed for Column Generator, because it has both input and output]










Double click on COUNTRY

Algorithm = Cycle

Value = IBM


