To overcome the limitations of the Sequential File stage, we use the Data Set stage.
- A Data Set is a parallel processing stage used for staging data when we design dependent jobs
(that is, transformation jobs depend on extraction jobs, and loading jobs depend on transformation jobs).
- A Data Set can hold more than 2 GB of data.
- A Data Set stores data in native format, so no conversion is required.
- The data lands inside the DataStage repository.
Types of Data Set
Virtual :- Data moving through a link is virtual (temporary).
Persistent :- Data created with the Data Set stage is persistent (permanent).
For example, a target file such as C:/data/output.ds is persistent (permanent).
Aliases of a Data Set
- Orchestrate file
- Operating System file
Data Set files
A Data Set is not a single file; it consists of multiple files:
- Descriptor file
- Data file
- Control file
- Header file
- Descriptor file :-
Holds the schema details (the structure, that is, the table definition) and the address of the data.
- Data file :-
Contains the data in native format.
C:/IBM/Information Server/Server/data set/file.ds
- Control file (or) header file :-
Resides in the operating system.
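To make the multi-file layout concrete, here is a minimal Python sketch. The class, field names, column types, and paths are all hypothetical and do not reflect DataStage's actual on-disk format. The sketch only illustrates the idea that the descriptor carries the schema plus the addresses of separate data files, so all of them have to move together.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Descriptor:
    """Hypothetical model of a .ds descriptor: schema plus data-file addresses."""
    schema: Dict[str, str]                                # column name -> type
    data_files: List[str] = field(default_factory=list)  # where the data lives


def files_to_copy(descriptor_path: str, descriptor: Descriptor) -> List[str]:
    """Every file that must move together for the data set to stay usable."""
    return [descriptor_path] + list(descriptor.data_files)


desc = Descriptor(
    schema={"EMPNO": "int32", "ENAME": "string"},
    # Made-up data file locations, one per processing node.
    data_files=["/data/node1/output.part0", "/data/node2/output.part1"],
)
print(files_to_copy("/data/output.ds", desc))
```

Copying only the descriptor path would leave the data files behind, which is why a plain file copy is not enough for a Data Set.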
How to create and reuse a Data Set
- After compiling the extraction job, run it.
- Copy the target file path, that is, the .ds file path.
- Go to Columns → Save → Table Definition → OK → select the file → OK.
- In the properties of the transformation job's Data Set stage, paste the extraction target path, then go to Columns → Load.
- The data in the extraction target now flows into the transformation job's Data Set.
To view a Data Set outside the job
- A Data Set cannot be copied or deleted directly, because it consists of multiple files.
- Organizing a Data Set means viewing, copying, or deleting it.
- We use the Data Set utilities for organizing the data:
- GUI: Data Set Management (Windows)
- CMD: $ orchadmin (UNIX command line)
Tools → Data Set Management → the list of files is shown → select the file → OK → the Data Set Management window opens. Here we can view, copy, and delete data.
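For the command-line route, the following Python sketch builds (but does not run) orchadmin argument lists for the view, copy, and delete actions. The subcommand names `dump`, `describe`, `cp`, and `rm` are assumptions based on common orchadmin usage, and the paths are made up; verify both against your installation before running anything.

```python
import shlex
from typing import List

# Assumed orchadmin subcommands; confirm them in your environment's documentation.
ACTIONS = {"view": "dump", "describe": "describe", "copy": "cp", "delete": "rm"}


def orchadmin_command(action: str, *datasets: str) -> List[str]:
    """Build (but do not execute) an orchadmin argument list for a data set action."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not datasets:
        raise ValueError("at least one .ds descriptor path is required")
    return ["orchadmin", ACTIONS[action], *datasets]


# Hypothetical descriptor paths, for illustration only.
cmd = orchadmin_command("copy", "/data/output.ds", "/data/output_copy.ds")
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the command safe to hand to `subprocess.run` later without shell quoting problems.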
Data Set version
- The default version is 4.1.
- We can create a Data Set with whatever version we want.
- Version control is done through an environment variable.
Select the Job Properties symbol → Parameters → Add Environment Variable → select APT_WRITE_DS_VERSION → OK. Now compile and run. At run time, we can select the version.
Data Set operators
The Data Set stage generally has no operator of its own; it uses the copy operator.
To see the operators of each stage:
Job Properties → Generated OSH
How to flush data in the Data Set?
File set :- (.fs extension)
File set is a file stage, which is used for staging the data when we design jobs.
Similarities and differences between Data Set and File Set
Both:
- Size > 2 GB
- PX (parallel) stage
Data Set:
- No reject
- Internal use
- Native format
- Copy operator
- Data is organized as segments
File Set:
- External applications
- Binary format
- Import / export operator
- Multiple files
Data created with a Data Set can be used only internally. That is, the .ds format applies only to the Data Set stage.
File Set creation is the same as Data Set creation.
Sequential Stage Target properties
File update mode:
- a) overwrite
- b) Append
- c) Create (error if exists)
Create the file if the target file does not exist.
Clean up on failure = True
First line is column name = False
Reject mode = Continue
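The three file update modes behave much like the standard file-open modes in most languages. The sketch below is an analogy in Python, not DataStage code: `write_target` and the mode mapping are hypothetical names, with "overwrite", "append", and "create (error if exists)" mapped to Python's `w`, `a`, and `x` modes.

```python
import os
import tempfile

# Python open() modes that behave like the three update modes (an analogy only).
MODES = {"overwrite": "w", "append": "a", "create": "x"}


def write_target(path, rows, update_mode="overwrite"):
    """Write rows to path; 'create' raises FileExistsError if the file exists."""
    with open(path, MODES[update_mode], encoding="utf-8") as fh:
        for row in rows:
            fh.write(",".join(map(str, row)) + "\n")


target = os.path.join(tempfile.mkdtemp(), "output.txt")
write_target(target, [(1, "a")], "create")       # file does not exist yet: created
write_target(target, [(2, "b")], "append")       # first row is kept
try:
    write_target(target, [(3, "c")], "create")   # file exists: error
    create_failed = False
except FileExistsError:
    create_failed = True
write_target(target, [(9, "z")], "overwrite")    # replaces everything
with open(target, encoding="utf-8") as fh:
    final_contents = fh.read()
```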
Select the Job Properties symbol → Parameters → Add Environment Variable → select APT_CLOBBER_OUTPUT. Now compile and run. At run time:
APT_CLOBBER_OUTPUT = False: the job aborts
APT_CLOBBER_OUTPUT = True: the file is created
Clean up on failure :- (works with Append mode)
If True, partially loaded records are cleared automatically when the job fails for any reason, so a restart resumes from the point where the load dropped.
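The clean-up behaviour in Append mode can be pictured as "remember the file size before the load, and cut back to it if the load fails". The Python sketch below illustrates that idea under stated assumptions; `append_with_cleanup` is a hypothetical helper, and a `None` row stands in for a mid-load failure. It does not describe how DataStage implements the option internally.

```python
import os
import tempfile


def append_with_cleanup(path, rows, cleanup_on_failure=True):
    """Append rows; on error, optionally truncate back to the original size."""
    size_before = os.path.getsize(path) if os.path.exists(path) else 0
    try:
        with open(path, "a", encoding="utf-8") as fh:
            for row in rows:
                if row is None:              # stand-in for a mid-load failure
                    raise RuntimeError("load failed")
                fh.write(f"{row}\n")
    except RuntimeError:
        if cleanup_on_failure:
            with open(path, "r+b") as fh:
                fh.truncate(size_before)     # drop the partially loaded records
        raise


target = os.path.join(tempfile.mkdtemp(), "target.txt")
append_with_cleanup(target, ["r1", "r2"])            # successful first load
try:
    append_with_cleanup(target, ["r3", None, "r4"])  # fails after writing r3
except RuntimeError:
    pass
with open(target, encoding="utf-8") as fh:
    after_failure = fh.read()                        # r3 has been cleaned up
```

With `cleanup_on_failure=False`, the partially written `r3` record would remain in the file after the failure.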
Development and Debug Stages
These stages are divided into three groups:
- Stages that generate sample data
  - Row Generator: a development stage that generates sample data (system- or user-defined) and supports only one output.
  - Column Generator: a development stage that generates columns with sample data (system- or user-defined) and supports one input and one output.
- Stages that pick sample data
- Stages that help in debugging
Development stages create sample data. If the client does not supply data, we create the structure and the sample data ourselves.
Row Generator :- (one output)
Right-click the Row Generator → Properties → Number of Records = 100 → OK → Columns → Load → select the file → OK → View Data.
Properties → Number of Records → Columns → Load → click the serial number of a column → the Edit Column Meta Data window opens → Generator → Type (cycle / random)
Algorithm = cycle
Value = ?
(Here Value accepts any number of names.)
Algorithm = alphabet
String = ? (string data only; takes a single name and displays each of its letters)
That is, String = abc
Output is : a
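The two string algorithms can be sketched in Python as follows. This is an illustration under assumptions, not the stage's implementation: the function names and the sample values other than IBM are made up, and the alphabet sketch emits one character per row, whereas the real stage may repeat or pad that character to the column length.

```python
from itertools import cycle, islice


def cycle_values(values, n_rows):
    """Cycle algorithm: repeat the listed values in order until n_rows are produced."""
    return list(islice(cycle(values), n_rows))


def alphabet_values(string, n_rows):
    """Alphabet algorithm: emit one character of the string per row, wrapping around."""
    return list(islice(cycle(string), n_rows))


# "TCS" and "CTS" are illustrative values only.
print(cycle_values(["IBM", "TCS", "CTS"], 5))
print(alphabet_values("abc", 5))
```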
- Type = cycle: increment, initial value, limit, part, partcount
- Type = random: limit, seed, signed
- Percent invalid
- Type = cycle / random
- Use current date
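For numeric columns, the cycle and random options can be pictured with the Python sketch below. The wrap-around rule (return to the initial value once the limit is passed), the assumption of a positive increment, and the range used for signed random values are guesses made for illustration; the function names are hypothetical.

```python
import random
from itertools import islice


def cycle_numbers(initial=0, increment=1, limit=None):
    """Yield initial, initial+increment, ...; wrap to initial after passing limit.

    Assumes a positive increment.
    """
    value = initial
    while True:
        yield value
        value += increment
        if limit is not None and value > limit:
            value = initial


def random_numbers(limit, seed=None, signed=False):
    """Yield pseudo-random integers up to limit; the same seed repeats the sequence."""
    rng = random.Random(seed)
    low = -limit if signed else 0
    while True:
        yield rng.randint(low, limit)


first_five = list(islice(cycle_numbers(initial=1, increment=2, limit=6), 5))
print(first_five)
```

Fixing the seed is what makes a generated test data set reproducible from one run to the next.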
Column Generator :- (only one input and one output)
Column Generator is associated with
Based on this we can group all employees into one group.
Right-click on the Column Generator.
Within Options:
Column Method = Explicit
Column to Generate = COMPANY
Column to Generate = COUNTRY
Click on the Output options.
Hold the left mouse button, drag the columns across, and perform the mapping.
[Mapping is needed for the Column Generator because it has both an input and an output.]
Double-click on COUNTRY.
Algorithm = cycle
Value = IBM
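Conceptually, the Column Generator passes each input row through and appends the generated columns. The Python sketch below shows that idea only; `column_generator`, the input rows, and the COUNTRY value are hypothetical, and each generated column uses the cycle algorithm over its value list.

```python
from itertools import cycle


def column_generator(rows, generated):
    """Append generated columns to each input row.

    rows: iterable of dicts (the input link).
    generated: column name -> list of values to cycle through.
    """
    cyclers = {name: cycle(values) for name, values in generated.items()}
    for row in rows:
        out = dict(row)                  # input columns are mapped through unchanged
        for name, values in cyclers.items():
            out[name] = next(values)     # next value from that column's cycle
        yield out


employees = [{"ENAME": "A"}, {"ENAME": "B"}]     # made-up input rows
result = list(column_generator(employees, {"COMPANY": ["IBM"], "COUNTRY": ["IN"]}))
print(result)
```

Because every row receives the same COMPANY value, the new column can serve as a common key for grouping all employees together.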