
Types of Data Sets in DataStage

To overcome the limitations of the Sequential File stage, we use the Data Set stage.

  1. Data set is a parallel processing stage which is used for staging the data when we design dependent jobs (that is, transformation jobs depend on extraction jobs, and loading jobs depend on transformation jobs).
  2. Data set can hold more than 2 GB of data.
  3. Data set stores data in native format, so no conversion is required.
  4. Data lands inside the DataStage server environment (on its resource disks) rather than in a single user-managed flat file.
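DataStage is not scripted in Python, so the following is only an analogy for the "native format, no conversion" point above. It assumes that a delimited text file (Python's `csv` module) stands in for a Sequential File and that `pickle` stands in for the Data set's native format; the rows and column names are hypothetical.

```python
import csv
import io
import pickle

# Hypothetical rows produced by an extraction job.
rows = [{"EMPNO": 101, "SAL": 2500.5}, {"EMPNO": 102, "SAL": 3100.0}]


def via_text(records):
    """Round-trip through delimited text (the Sequential File analogy)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["EMPNO", "SAL"])
    writer.writeheader()
    writer.writerows(records)
    buf.seek(0)
    # csv gives every field back as a string, so types must be converted again.
    return list(csv.DictReader(buf))


def via_native(records):
    """Round-trip through a binary serialisation (the Data set analogy)."""
    return pickle.loads(pickle.dumps(records))


text_rows = via_text(rows)
native_rows = via_native(rows)
print(type(text_rows[0]["EMPNO"]).__name__)    # str
print(type(native_rows[0]["EMPNO"]).__name__)  # int
```

Values read back from the text file are strings and would have to be converted again in the next job, while the binary copy keeps the original types.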

Types of Data Set

Virtual: data moving through a link is virtual (temporary).

Persistent: data written to a Data set is persistent (permanent).

For example, a target file such as C:/data/output.ds → persistent → permanent.

Aliases of a Data set

  1. Orchestrate file
  2. Operating System file

Data set files

A Data set is not a single file; it consists of multiple files:

  • Descriptor file
  • Data file
  • Control file (or) header file

  1. Descriptor file: contains the schema details and the address of the data, i.e. the structure (table definition).

C:/data/file.ds

  2. Data file: contains the data in native format.

C:/IBM/Information Server/Server/data set/file.ds

  3. Control file (or) header file: resides in the operating system.
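To picture why a Data set is "multiple files", here is a hypothetical Python sketch: a descriptor file holds the schema and the paths of the data files, and one data file is written per partition. The JSON layout, the file names and the `write_dataset` / `read_dataset` helpers are illustrative assumptions; the real .ds descriptor and data file formats are internal to DataStage.

```python
import json
import os
import tempfile


def write_dataset(descriptor_path, schema, partitions):
    """Write one data file per partition plus a descriptor pointing at them.

    Hypothetical layout for illustration only; the real .ds format differs.
    """
    data_dir = descriptor_path + ".data"
    os.makedirs(data_dir, exist_ok=True)
    data_files = []
    for i, rows in enumerate(partitions):
        path = os.path.join(data_dir, f"part{i:04d}.json")
        with open(path, "w") as f:
            json.dump(rows, f)
        data_files.append(path)
    # The descriptor holds the schema (table definition) and the data addresses.
    with open(descriptor_path, "w") as f:
        json.dump({"schema": schema, "data_files": data_files}, f)


def read_dataset(descriptor_path):
    """Read the descriptor first, then every data file it lists."""
    with open(descriptor_path) as f:
        descriptor = json.load(f)
    rows = []
    for path in descriptor["data_files"]:
        with open(path) as f:
            rows.extend(json.load(f))
    return descriptor["schema"], rows


base = tempfile.mkdtemp()
ds = os.path.join(base, "output.ds")
write_dataset(ds, {"EMPNO": "int32", "ENAME": "string"},
              [[[101, "SMITH"]], [[102, "ALLEN"], [103, "WARD"]]])
schema, rows = read_dataset(ds)
print(schema)
print(rows)
```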

How to create and reuse a Data set

Extraction:

Screenshot_45

  • After compiling → Run.
  • Now copy the target file path, that is, the .ds file path.
  • Go to Columns → Save → Table Definition → OK → select the file → OK.

Transformation

Screenshot_46

 

Here are the properties:

Paste the extraction target path → Columns → Load

  • The data in the extraction target now flows into the transformation job through the Data set.

To view a Data set outside the job:

  • A Data set cannot be copied or deleted directly, because it consists of multiple files.
  • Organizing a Data set means viewing, copying, and deleting it.
  • We use the Data set utilities for organizing Data sets.

 

Data set utilities

  1. GUI: Data Set Management window
  2. CMD: $ orchadmin (UNIX command line)

Tools → Data Set Management → shows the list of files → select the file → OK → the Data Set Management window opens → here we can view, copy, and delete data.

 

Data Set version

  • By default, version 4.1
  • We can create a Data set with whatever version we want.
  • The version is controlled by an environment variable.

 

Screenshot_47

 

Step 1:

Select the Job Properties icon → Parameters → Add Environment Variable → select APT_WRITE_DS_VERSION → OK → now compile and run → at run time we can select the version.

 

Data set operators

The Data Set stage generally does not have an operator of its own; it uses the copy operator.

To see the operators of each stage:

Job Properties → Generated OSH

How to flush data in a Data set?

File set: (.fs extension)

A File set is a file stage which is used for staging the data when we design jobs.

→ Similarities between Data set and File set

  • Both support more than 2 GB of data.
  • Both are parallel (PX) stages.

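Differences (Data set vs File set)

  1. Rejects: a Data set has no reject link; a File set supports rejects.
  2. Usage: a Data set is for internal use; a File set can be used by external applications.
  3. Format: a Data set stores data in native format; a File set stores data in binary format.
  4. Extension: .ds for a Data set; .fs for a File set.
  5. Operator: a Data set uses the copy operator; a File set uses the import / export operators.
  6. Storage: in a Data set the data is organized as segments; a File set consists of multiple files.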

 

Internal use 

Data created with a Data set can be used only internally: the .ds format can be read only through the Data Set stage.

→ A File set is created the same way as a Data set.

Sequential File Stage Target Properties

Screenshot_48

Target properties

File = ?

File update mode:

  a) Overwrite
  b) Append
  c) Create (error if exists): creates the file if the target file does not exist
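The three update modes behave much like Python's file-open modes, so here is a hedged analogy rather than DataStage code: the `UPDATE_MODES` mapping and the `write_rows` helper are hypothetical, and only the mode names come from the stage properties above.

```python
import os
import tempfile

# Hypothetical mapping of the "File update mode" property onto open() modes.
UPDATE_MODES = {
    "overwrite": "w",  # truncate an existing file, or create a new one
    "append": "a",     # add rows at the end, or create a new file
    "create": "x",     # create a new file; raise FileExistsError if it exists
}


def write_rows(path, rows, mode):
    """Write comma-delimited rows using the chosen update mode."""
    with open(path, UPDATE_MODES[mode]) as f:
        for row in rows:
            f.write(",".join(map(str, row)) + "\n")


path = os.path.join(tempfile.mkdtemp(), "target.txt")
write_rows(path, [(1, "A")], "create")
write_rows(path, [(2, "B")], "append")
try:
    write_rows(path, [(3, "C")], "create")
except FileExistsError:
    print("create failed: target file already exists")
write_rows(path, [(4, "D")], "overwrite")
with open(path) as f:
    print(f.read())  # only the overwritten row "4,D" remains
```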

Options

Clean up on failure = True

First line is column name  = False

Reject mode = Continue

Step 1

Select the Job Properties icon → Parameters → Add Environment Variable → select APT_CLOBBER_OUTPUT → now compile and run → at run time:

APT_CLOBBER_OUTPUT = False → the job aborts (an existing target file is not overwritten)

APT_CLOBBER_OUTPUT = True → the file is created (an existing file is overwritten)
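The check that APT_CLOBBER_OUTPUT controls can be pictured with a small Python sketch. Only the variable name comes from DataStage; the `open_target` helper is hypothetical, and treating "True"/"1" as enabled is an assumption made for this illustration.

```python
import os
import tempfile


def open_target(path):
    """Hypothetical check in the spirit of APT_CLOBBER_OUTPUT.

    Only the variable name comes from DataStage; treating "true"/"1" as
    enabled is an assumption made for this sketch.
    """
    value = os.environ.get("APT_CLOBBER_OUTPUT", "False").strip().lower()
    clobber = value in ("true", "1")
    if os.path.exists(path) and not clobber:
        raise RuntimeError(f"{path} already exists and APT_CLOBBER_OUTPUT is False")
    return open(path, "w")  # creates the file, replacing any existing one


path = os.path.join(tempfile.mkdtemp(), "target.txt")
open(path, "w").close()  # an output file left over from an earlier run

os.environ["APT_CLOBBER_OUTPUT"] = "False"
try:
    open_target(path)
    aborted = False
except RuntimeError:
    aborted = True  # the "job" aborts instead of overwriting

os.environ["APT_CLOBBER_OUTPUT"] = "True"
with open_target(path) as f:
    f.write("new run\n")
```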

Clean up on failure: (works with Append mode)

If True, partially loaded records are cleared automatically when the job fails, so that a restart begins from the point where the job dropped.
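A hypothetical Python sketch of the "Clean up on failure" idea in Append mode: remember the file size before appending and, if the load fails, truncate back to it. The `append_with_cleanup` helper and the use of `None` as a failing record are assumptions; DataStage's own mechanism is not shown here.

```python
import os
import tempfile


def append_with_cleanup(path, rows, cleanup_on_failure=True):
    """Append rows; on failure, optionally truncate back to the original size.

    Hypothetical sketch of "Clean up on failure" in Append mode.
    A None row stands in for a record that makes the job fail.
    """
    start = os.path.getsize(path) if os.path.exists(path) else 0
    try:
        with open(path, "a") as f:
            for row in rows:
                if row is None:
                    raise ValueError("bad record, job fails")
                f.write(f"{row}\n")
    except Exception:
        if cleanup_on_failure:
            with open(path, "r+") as f:
                f.truncate(start)  # remove the partially appended rows
        raise


path = os.path.join(tempfile.mkdtemp(), "target.txt")
with open(path, "w") as f:
    f.write("1\n")  # rows loaded by an earlier, successful run

try:
    append_with_cleanup(path, [2, None, 3])
except ValueError:
    pass
with open(path) as f:
    print(f.read())  # the partial row "2" was cleaned up; only "1" remains
```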

Development and Debug Stages

They are divided into three groups:

  1. Stages that generate sample data
    1. Row Generator
    2. Column Generator

Row Generator

  It is a development stage which generates sample data (system-generated or user-defined) and supports only one output.

Column Generator

  It is a development stage which generates columns with sample data (system-generated or user-defined) and supports one input and one output.

  2. Stages that pick sample data
  • Head
  • Tail
  • Sample
  3. Stage that helps in debugging
  • Peek
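As a rough Python analogy for the sampling and debugging stages listed above (on a single partition only), the hypothetical helpers below mimic Head, Tail, a percent-style Sample and Peek. The function names and parameters are illustrative, not stage properties.

```python
import random
from collections import deque
from itertools import islice


def head(rows, n):
    """First n rows, like the Head stage on a single partition."""
    return list(islice(rows, n))


def tail(rows, n):
    """Last n rows, like the Tail stage on a single partition."""
    return list(deque(rows, maxlen=n))


def sample_percent(rows, percent, seed=None):
    """Keep each row with probability percent/100 (a percent-style sample)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() * 100 < percent]


def peek(rows, n, log=print):
    """Log the first n rows and pass every row through unchanged."""
    for i, row in enumerate(rows):
        if i < n:
            log(f"peek: {row}")
        yield row


data = list(range(1, 21))
print(head(data, 3))  # [1, 2, 3]
print(tail(data, 3))  # [18, 19, 20]
print(sample_percent(data, 25, seed=7))
passed = list(peek(data, 2))  # logs "peek: 1" and "peek: 2"
```

Peek is written as a pass-through generator because the debugging stage is meant to show data without changing what flows downstream.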

→ The development stages create sample data: if the client does not give us data, we create the structure and the sample data ourselves.

Row Generator: (one output)

→ System generated

Screenshot_49

Right-click on Row Generator → Properties → No. of Records = 100 → OK → Columns → Load → select the file → OK → View Data

User defined

Properties → No. of Records → Columns → Load → click on the serial number of a column → the Edit Column Meta Data window opens → Generator → Type (cycle / random)

  • Limit
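To show the shape of a user-defined Row Generator (no input, one output, a fixed number of records, one generator per column), here is a hypothetical Python sketch; `row_generator` and the column names are illustrative assumptions, and Python's `itertools.count`/`cycle` stand in for the column generator types.

```python
from itertools import count, cycle


def row_generator(num_records, column_generators):
    """Yield num_records rows; each column takes the next value from its generator.

    Hypothetical sketch: there is no input, only one output stream.
    """
    gens = {name: iter(gen) for name, gen in column_generators.items()}
    for _ in range(num_records):
        yield {name: next(gen) for name, gen in gens.items()}


rows = list(row_generator(4, {
    "EMPNO": count(101),                   # 101, 102, 103, ...
    "DEPT": cycle(["HR", "IT", "SALES"]),  # repeats the listed values in order
}))
for row in rows:
    print(row)
```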

Varchar

Generator

Algorithm = cycle

Value = ?

(Here Value takes n values, for example n names, which are repeated in order.)

Algorithm = Alphabet

String = ? (the string holds only one value, and each character of it is displayed in turn)

That is, String = abc

Output is: a, b, c (one character per row)
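The two string algorithms can be sketched in Python as follows. The helper names are hypothetical, and starting again from the first character after the last one (in `alphabet`) is an assumption; the walkthrough above only shows a, b, c for String = abc.

```python
from itertools import cycle, islice


def cycle_values(values, n):
    """Algorithm = cycle: return n values, repeating the list in order."""
    return list(islice(cycle(values), n))


def alphabet(string, n):
    """Algorithm = alphabet: one character of the string per row.

    Starting again from the first character after the last one is an assumption.
    """
    return list(islice(cycle(string), n))


print(cycle_values(["RAM", "SITA", "JOHN"], 5))  # ['RAM', 'SITA', 'JOHN', 'RAM', 'SITA']
print(alphabet("abc", 3))                        # ['a', 'b', 'c']
```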

Integer 

  • Generator
  • Type = cycle → Increment, Initial value, Limit, Part, Part count

Type = random → Limit, Seed, Signed
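A hedged Python sketch of the integer options: `int_cycle` uses Initial value, Increment and Limit, and `int_random` uses Limit, Seed and Signed. Both helpers are hypothetical; wrapping back to the initial value after the limit, and the value ranges used for random, are assumptions rather than documented DataStage behaviour (Part and Part count are not modelled).

```python
import random


def int_cycle(n, initial_value=0, increment=1, limit=None):
    """Type = cycle: start at initial_value and add increment for each row.

    Restarting from initial_value once limit is exceeded is an assumption.
    """
    values, value = [], initial_value
    for _ in range(n):
        values.append(value)
        value += increment
        if limit is not None and value > limit:
            value = initial_value
    return values


def int_random(n, limit, seed=None, signed=False):
    """Type = random: values up to limit, reproducible for a given seed.

    The ranges [0, limit] and, when signed, [-limit, limit] are assumptions.
    """
    rng = random.Random(seed)
    low = -limit if signed else 0
    return [rng.randint(low, limit) for _ in range(n)]


print(int_cycle(6, initial_value=1, increment=2, limit=5))  # [1, 3, 5, 1, 3, 5]
print(int_random(5, limit=100, seed=42))
```

Using a seeded `random.Random` instance means the same seed always reproduces the same "random" column, which is the point of the Seed option.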

Date

  • Generator
  • Epoch
  • Percent invalid
  • Type = cycle / random
  • Use current date
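The date options can be illustrated with a hypothetical Python sketch that steps forward day by day from an epoch, optionally starting from today's date and marking a percentage of rows as invalid. The `date_values` helper, its parameters, and representing invalid rows as `None` are assumptions made for this illustration.

```python
import datetime as dt
import random


def date_values(n, epoch, increment_days=1, use_current_date=False,
                percent_invalid=0, seed=None):
    """Hypothetical date generator: step forward day by day from a start date.

    use_current_date replaces the epoch with today's date. Rows picked as
    "invalid" are returned as None here, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    start = dt.date.today() if use_current_date else epoch
    values = []
    for i in range(n):
        if rng.random() * 100 < percent_invalid:
            values.append(None)
        else:
            values.append(start + dt.timedelta(days=i * increment_days))
    return values


print(date_values(3, epoch=dt.date(2016, 1, 30)))
# [datetime.date(2016, 1, 30), datetime.date(2016, 1, 31), datetime.date(2016, 2, 1)]
```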

 

Column Generator: (one input / one output)

Screenshot_50

 

Column Generator is associated with

  1. Aggregate
  2. Tunneling
  3. Supports

Based on this, we can group all employees into one group.
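The "one group" idea can be sketched in Python as an analogy: first add the same generated value to every row (the Column Generator step), then group on that column (the Aggregator step). The employee rows and the `GRP` column name are hypothetical.

```python
from collections import defaultdict

# Hypothetical employee rows.
employees = [
    {"ENAME": "SMITH", "SAL": 800},
    {"ENAME": "ALLEN", "SAL": 1600},
    {"ENAME": "WARD", "SAL": 1250},
]

# Column Generator analogy: add the same constant value to every row.
with_key = [{**e, "GRP": 1} for e in employees]

# Aggregator analogy: group on the generated column, so everyone lands in one group.
totals = defaultdict(lambda: {"count": 0, "sum_sal": 0})
for row in with_key:
    totals[row["GRP"]]["count"] += 1
    totals[row["GRP"]]["sum_sal"] += row["SAL"]

print(dict(totals))  # {1: {'count': 3, 'sum_sal': 3650}}
```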

Right-click on Column Generator → Properties → within Options: Column Method = Explicit → Column To Generate = COMPANY → Column To Generate = COUNTRY → click on Output → hold the left mouse button and drag the columns to perform the mapping → OK → View Data

[Mapping is needed for the Column Generator, because it has both an input and an output.]

User defined 

Column Generator → Properties → Output → Columns → double-click on COUNTRY → Generator → Algorithm = cycle → Value = IBM → OK
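Here is a hypothetical Python sketch of what the Column Generator does with one input and one output: each input row passes through and gets the generated columns appended, each cycling through its values. The `column_generator` helper and the values are illustrative assumptions (the walkthrough above uses Value = IBM).

```python
from itertools import cycle


def column_generator(rows, new_columns):
    """Pass every input row through and append generated columns.

    new_columns maps a column name to the values to cycle through.
    Hypothetical sketch: one input stream in, one output stream out.
    """
    gens = {name: cycle(values) for name, values in new_columns.items()}
    for row in rows:
        yield {**row, **{name: next(gen) for name, gen in gens.items()}}


employees = [{"EMPNO": 7369}, {"EMPNO": 7499}, {"EMPNO": 7521}]
out = list(column_generator(employees, {
    "COMPANY": ["IBM"],           # a single value repeats on every row
    "COUNTRY": ["INDIA", "USA"],  # hypothetical values, cycled in order
}))
for row in out:
    print(row)
```

Because the input columns are copied through unchanged, this also shows why the mapping step is required: the output must carry both the original and the generated columns.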

Copyright © 2016 - All Rights Reserved.
