Types of Data Sets in Data Stage


To overcome the limitations of the Sequential File stage, we use the Data Set stage.

  1. The Data Set stage is a parallel-processing stage used for staging data when we design dependent jobs (that is, transformation jobs depend on extraction jobs, and loading jobs depend on transformation jobs).
  2. A Data Set can hold more than 2 GB.
  3. A Data Set stores data in native format, so no conversion is required.
  4. Data lands inside the DataStage repository.

Types of Data Set

  • Virtual: data moving through a link is virtual (temporary).
  • Persistent: data created with the Data Set stage is persistent (permanent). For example, a target file such as C:/data/output.ds → persistent → permanent.

Aliases of Data Set

  1. Orchestrate file
  2. Operating System file

Data set files

A Data Set is not a single file; it consists of multiple files:

  • Descriptor file
  • Data file
  • Control file
  • Header file

  1. Descriptor file: holds the schema details (the structure, or table definition) and the address of the data, for example C:/data/file.ds
  2. Data file: contains the data in native format, for example C:/IBM/InformationServer/Server/Datasets/file.ds
  3. Control file (or header file): resides in the operating system.
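The split between a descriptor file and separate data files can be illustrated with a small Python sketch. This is a toy model only, not DataStage's actual on-disk format: the JSON descriptor, the `.partN` file names, and the helper functions are all assumptions made for illustration. It shows why copying just the `.ds` descriptor is not enough, since the descriptor only points at the data files.

```python
import json
import shutil
import tempfile
from pathlib import Path


def write_toy_dataset(descriptor, segment_dir, rows, partitions=2):
    """Write a toy 'data set': one descriptor plus one data file per partition."""
    segment_dir.mkdir(parents=True, exist_ok=True)
    segments = []
    for p in range(partitions):
        seg = segment_dir / f"{descriptor.stem}.part{p}"
        # Distribute the rows round-robin across the partitions.
        seg.write_text("\n".join(rows[p::partitions]))
        segments.append(str(seg))
    # The descriptor holds the schema and the addresses of the data files.
    descriptor.write_text(json.dumps({"schema": ["name"], "segments": segments}))


def read_toy_dataset(descriptor):
    """Follow the descriptor to every data file and return all rows, sorted."""
    meta = json.loads(descriptor.read_text())
    rows = []
    for seg in meta["segments"]:
        rows.extend(Path(seg).read_text().splitlines())
    return sorted(rows)


def demo():
    root = Path(tempfile.mkdtemp())
    descriptor = root / "jobs" / "output.ds"
    descriptor.parent.mkdir(parents=True)
    write_toy_dataset(descriptor, root / "segments", ["a", "b", "c", "d"])
    before = read_toy_dataset(descriptor)

    # Copy only the descriptor, then remove the data files it points to.
    copy = root / "copy.ds"
    shutil.copy(descriptor, copy)
    shutil.rmtree(root / "segments")
    try:
        read_toy_dataset(copy)
        orphaned = False
    except FileNotFoundError:
        orphaned = True
    return before, orphaned
```

Calling `demo()` reads the rows back through the descriptor, then shows that a copied descriptor becomes unreadable once the data files are gone. This is the reason a Data Set is managed with dedicated utilities rather than ordinary file commands.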

How to create and reuse a Data Set

Extraction (Screenshot_45)

  • After compiling, run the job.
  • Copy the target file path, that is, the .ds file path.
  • Go to Columns → Save → Table Definition → OK → select the file → OK.

Transformation (Screenshot_46)

  • In the properties, paste the extraction target path, then go to Columns → Load.
  • The data in the extraction target is read into the transformation Data Set.

To view a Data Set outside the job

  • A Data Set cannot be copied or deleted directly, because it consists of multiple files.
  • Organizing a Data Set means viewing, copying, or deleting it.
  • We use the Data Set utilities to organize the data.

  Data Set utilities

  1. GUI: Data Set Management (window)
  2. Command line: $ orchadmin (UNIX)

  Tools → Data Set Management → shows the list of files → select the file → OK → the Data Set Management window opens → here we can view, copy, and delete data.

  Data Set version

  • The default version is 4.1.
  • We can create a Data Set with whichever version we want.
  • The version can be controlled using an environment variable.

  Screenshot_47

  Step 1: Select the Job Properties symbol → Parameters → Add Environment Variable → select APT_WRITE_DS_VERSION → OK → compile and run → while running, we can select the version.

  Data Set operators

  A Data Set generally does not have an operator of its own; it uses the copy operator. To see the operators of each stage: Job Properties → Generated OSH.

  How to flush data in the Data Set?

  File Set (.fs extension)

  File Set is a file stage used for staging data when we design jobs.

  Similarities between Data Set and File Set

  • Data Set: staging (S), more than 2 GB, parallel (PX)
  • File Set: staging (S), more than 2 GB, parallel (PX)

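Differences between Data Set (D.S) and File Set (F.S)

  1. Rejects: a Data Set has no rejects; a File Set supports rejects.
  2. Usage: a Data Set is for internal use; a File Set can be used by external applications.
  3. Format: a Data Set uses native format; a File Set uses binary format.
  4. Extension: .ds for a Data Set; .fs for a File Set.
  5. Operator: a Data Set uses the copy operator; a File Set uses the import/export operators.
  6. Organization: Data Set data is organized as segments; File Set data is held in multiple files.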

  Internal use

  Data created with a Data Set can be used only internally; that is, the .ds format is specific to the Data Set. File Set creation is the same as Data Set creation.

  Sequential Stage target properties

  Screenshot_48

  Target properties

  File = ?

  File update mode:

  a) Overwrite
  b) Append
  c) Create (error if exists)
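The three update modes above behave much like the file modes of Python's built-in `open()`. The sketch below is only an analogy, not how DataStage implements the Sequential File stage; `write_target` and `read_target` are hypothetical helper names.

```python
import os
import tempfile

# Map each update mode to the open() mode that behaves the same way.
OPEN_MODES = {
    "overwrite": "w",  # replace any existing content
    "append": "a",     # add to the end of the existing content
    "create": "x",     # raise FileExistsError if the file already exists
}


def write_target(path, lines, update_mode):
    """Write lines to the target file using the chosen update mode."""
    with open(path, OPEN_MODES[update_mode], encoding="utf-8") as target:
        for line in lines:
            target.write(line + "\n")


def read_target(path):
    """Return the target file's content as a list of lines."""
    with open(path, encoding="utf-8") as target:
        return target.read().splitlines()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "target.txt")
    write_target(path, ["r1"], "create")     # file does not exist yet
    write_target(path, ["r2"], "append")     # file now holds r1, r2
    after_append = read_target(path)
    write_target(path, ["r3"], "overwrite")  # file now holds only r3
    after_overwrite = read_target(path)
    try:
        write_target(path, ["r4"], "create")  # file exists, so this fails
        create_failed = False
    except FileExistsError:
        create_failed = True
    return after_append, after_overwrite, create_failed
```

The second "create" call fails because the file already exists, which mirrors the "error if exists" behaviour of option c).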

Create: creates the file if the target file does not exist.

Options

  • Clean up on failure = True
  • First line is column names = False
  • Reject mode = Continue

Step 1: Select the Job Properties symbol → Parameters → Add Environment Variable → select APT_CLOBBER_OUTPUT → compile and run. At run time:

  • APT_CLOBBER_OUTPUT = False → the job aborts
  • APT_CLOBBER_OUTPUT = True → the file is created

Clean up on failure (works with Append mode): if True, partially loaded records are cleared automatically when the job fails for any reason, so the load can restart from the point where it dropped.

Development and Debug Stages

These stages are divided into three groups:

  1. Stages that generate sample data
    • Row Generator
    • Column Generator

  Row Generator: a development stage that generates sample data (system-defined or user-defined) and supports only one output.

  Column Generator: a development stage that generates columns with sample data (system-defined or user-defined) and supports an input and an output.

  2. Stages that pick sample data
    • Head
    • Tail
    • Sample
  3. Stage that helps in debugging
    • Peek
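What the sampling and debugging stages do to a stream of rows can be approximated in a few lines of Python. This is a rough sketch under assumptions: the function names and parameters (`n`, `percent`, `seed`, `log`) are illustrative and do not correspond to the stages' actual properties.

```python
import random
from collections import deque
from itertools import islice


def head(rows, n):
    """Keep only the first n rows."""
    return list(islice(rows, n))


def tail(rows, n):
    """Keep only the last n rows."""
    return list(deque(rows, maxlen=n))


def sample(rows, percent, seed=0):
    """Keep each row with the given percentage chance (seeded for repeatability)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() * 100 < percent]


def peek(rows, n, log):
    """Pass every row through unchanged, recording the first n rows in log."""
    for i, row in enumerate(rows):
        if i < n:
            log.append(row)
        yield row
```

For example, with `rows = list(range(1, 11))`, `head(rows, 3)` keeps rows 1 to 3 and `tail(rows, 2)` keeps rows 9 and 10, while `peek` leaves the stream untouched and only records what passed through it.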

A development stage creates sample data. If the client does not provide data, we create the structure and the sample data ourselves.

Row Generator (one output)

System-generated (Screenshot_49): right-click on Row Generator → Properties → Number of Records = 100 → OK → Columns → Load → select the file → OK → View Data.

User-defined: Properties → Number of Records → Columns → Load → click on Serial file → the Edit Column Meta Data window opens → Generator → Type (cycle / random)

  • Limit

Varchar

  • Generator
  • Algorithm = cycle, Value = ? (the value list can take any number of names)
  • Algorithm = alphabet, String = ? (the string takes only one name, and each letter is output in turn). For example, String = abc gives the output a, b, c.

Integer

  • Generator
  • Type = cycle → Increment, Initial value, Limit → part, partcount

  • Type = random → Limit, Seed, Signed

Date

  • Generator
  • Epoch
  • Percent invalid
  • Type = cycle / random
  • Use current date
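The cycle, alphabet, and random generator settings described above can be sketched in Python. This is an approximation under assumptions, not the Row Generator's actual implementation: the function names are hypothetical, the increment is assumed to be positive, and the wrap-around rule (return to the initial value after passing the limit) is an assumption about how "cycle" behaves.

```python
import random
from itertools import cycle, islice


def cycle_integers(count, initial=0, increment=1, limit=None):
    """Yield count integers, wrapping back to initial after passing limit."""
    value = initial
    for _ in range(count):
        yield value
        value += increment
        if limit is not None and value > limit:
            value = initial


def cycle_values(count, values):
    """Cycle through a list of values (Varchar, Algorithm = cycle)."""
    return list(islice(cycle(values), count))


def alphabet(count, string):
    """Emit one character of the string per row (Varchar, Algorithm = alphabet)."""
    return list(islice(cycle(string), count))


def random_integers(count, limit, seed):
    """Seeded random integers between 0 and limit (Integer, Type = random)."""
    rng = random.Random(seed)
    return [rng.randint(0, limit) for _ in range(count)]
```

With `String = abc`, `alphabet(3, "abc")` produces a, b, c, matching the example in the text. Using a fixed seed makes the random output repeatable from run to run.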

  Column Generator (only one input and one output)

  Screenshot_50

  Column Generator is associated with:

  1. Aggregate
  2. Tunneling
  3. Supports

Based on this, we can group all employees into one group.

Right-click on Column Generator → Properties → within Options → Column Method = Explicit → Column to Generate = COMPANY → Column to Generate = COUNTRY → click on Output options → hold the left mouse button, drag the columns, and perform the mapping → OK → View Data

Mapping is needed for the Column Generator because it has both an input and an output.

User-defined Column Generator

Properties → Output → Columns → double-click on COUNTRY → Generator → Algorithm = cycle → Value = IBM → OK
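The effect of the Column Generator, where input columns pass through and generated columns are appended to every row, can be sketched as follows. This is a simplified illustration, not the stage's implementation: `generate_columns` is a hypothetical helper and `EMPNO` is an assumed input column; the COUNTRY column and the value IBM are taken from the steps above.

```python
from itertools import cycle


def generate_columns(rows, generated):
    """Append generated columns to each input row.

    generated maps a column name to the list of values to cycle through.
    """
    sources = {name: cycle(values) for name, values in generated.items()}
    for row in rows:
        out = dict(row)  # input columns pass through unchanged (the mapping step)
        for name, source in sources.items():
            out[name] = next(source)
        yield out


def demo():
    # EMPNO is a hypothetical input column used only for this example.
    rows = [{"EMPNO": 1}, {"EMPNO": 2}]
    return list(generate_columns(rows, {"COUNTRY": ["IBM"]}))
```

Because every row receives the same generated value, the new column can then be used as a common key to group all rows together.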

About Author
TekSlate