Home
Blog
IBM DataStage
Types of Data Sets in Data Stage

Types of Data Sets in Data Stage

Ratings:

(4)Views: 0

Share this blog:

To overcome the limitations of sequential file, we use Data set

Data set is a parallel processing Stage which is used for staging the data, when we design dependent jobs

(That is Transformation jobs are dependent on extraction jobs. And loading jobs are dependent on Transformation jobs ).

Data set can accommodate more than 2 GB
Data set stores data in Native format, so no conversion is required
Data lands inside the data stage repository

Types of Data Set

Virtual :- Data moving through the link is virtual, (temporary) Persistent :- Data created with Data set is persistent, (permanent) In target file, C: /data/output. DS àpersistent àpermanent

Alias of Data set

Orchestrate file
Operating System file

Data set files

Data set is not a single file, but has multiple files

Descriptor file
Data file
Control file
header file

Descriptor file : -

Schema details and address of data à Structure () table definition C:/data/file. ds

Data file :-

Contains data in Native format C:/IBM/Information Server / Server/data set/ file. Ds

Control file (or) header file :-

Resides in operating system.

How to create and recues Data set

Extraction: Screenshot_45

After compiling à Run
Now, copy the Target file path that is .ds file path
Go to columnsàSaveàTable DefineràOkàSelect the fileàOk

Transformation Screenshot_46 Here are properties Paste the Extraction Target pathColumns àLoad

Data is the Extraction Target gets into Transformation Data set.

To view Data set outside the job

Data set cannot be directly copied or deleted because, it has multiple files

Organize Data set :- (View, copy, delete)

We use Data set utilities for organizing Data.

Dataset utilities

GUI- Data set Management – wind one
CMD - $ orch admin – UNIX Command line

Tools Data set Management Shows the list of files Select the file ok Data set Management window opens Here we can view data, copy data, delete data. Data Set version

By default 4.1
We can create Dataset with whatever version we want
Version control can be done using an Environmental variable.

Screenshot_47 Step 1: Select the job properties Symbol parameters Add Environmental variable Select APT – WPITE –DS-VERSION –okNow Compile and Run While RUN, We can Select the version Data set operators It does not have an operator generally but uses copy operator To see the operators of each stage : Job properties Generated ASH How to flush Data in the Dataset ? File set :- (.fs extension) File set is a file stage, which is used for staging the data when we design jobs. Similarities between Dataset and file set

S > 2GB PX
S >2GB PX

Differences D.S

No reject
Internal use
Native format
Ds
Copy operator
Data is organized as segments.

F.S

Rejects
External Applications
Binary format
.fs
Import / export operator
Multiple files

Internal use Data that is created with Dataset can be used only for internal use. That is, .ds format is the only w. r. t Dataset file set Creation is same as Dataset creations Sequential Stage Target properties Screenshot_48 Target properties File =? File update mode:

a) overwrite
b) Append
c) Create (error if exists)

Create the file, if the target file does not exit Options Clean up on failure = True First line is column name = False Reject mode = Continue Step 1 Select the job properties Symbol parameters Add Environmental variable Select APT – CLOBER – OUTPUT Now compile and Full At Run time APT – CLOBER – OUTPUT = False Aborts APT – CLOBER – OUTPUT = True create file Clean up on failure :- (Works with Append mode) If true = automatically clears partially loaded records, if the job is failed for any restart from the point where it has drooped. Development and Debug Stage Divided into three groups

Stages that generate sample data
1. row generator
2. Column generator

Row generator It is a development Stage, Which generates Sample data (sys/user defined) And Supports only one output. Column generator It is a development Stage, Which generates columns with Sample data (sys/user defined) And Supports input and output.

Satges that pick Sample data

Head
Tail
Sample

Stage that helps in debugging

Peek

Development Stage creates Sample data, Suppose if the Client does not give data, we create the Structure and sample data. Row generator :- (one input) System Generated Screenshot_49 Right click on Row generator properties No. of Records = 100 okàcolumn load Select the file ok view data User-defined Properties No. of Records columns load click on Serial fileàEdit column Meta data window opens Generator type (cycle / random)

Limit

Var char Generator Algorithm = cycle Value = ? (Here values takes n, no. of name) Algorithm = Alphabet String = ? (String data only 1 name, and display each Alphabet ) That is string = abc Output is : a b c Integer

Generator
Type = cycle Increment initial value limit part part count

Desired to gain proficiency on DataStage? 
Explore the blog post on DataStage training to become a pro in DataStage.

Type = Random limit seed signed Data

Generator

Eparch

Percent invalid

Type = cycle Random

Use current date.

Column Generator :- (only 1 input /output) Screenshot_50 Column Generator is associated with

Aggregate
Tunneling
Supports

Based on this we can group all Employee into 1 group Right click on column Generator

↓

Properties

↓

With in options

↓

Column method = Explicit

↓

Column to Generator = COMPANEY

↓

Column to Generator = COUNTRY

↓

Click, on output options

↓

Left hold the mouse, and drag it and perform Mapping

↓

Ok View data [Mapping is needed for column Generator, because it has both input and output] User-defined C.G

↓

Properties

↓

Output

↓

Columns

↓

Double click on country

↓

Generator

↓

Algorithm = cycle

↓

Value = IBM

↓

You liked the article?

Like: 0

Vote for difficulty

Current difficulty (Avg): Medium

EasyMediumHardDifficultExpert

IMPROVE ARTICLEReport Issue

Recommended Courses

DataStage Training 4.95789 : Learners

Teradata Training 4.92763 : Learners

1/6

About Author

Name

TekSlate

Author Bio

TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills in the market.

Stay Updated

Get stories of change makers and innovators from the startup ecosystem in your inbox

Related Blogs