Step 1

  • Start DB/2 repository and Data stage Server (In the Task box, we have a Green color icon, àRight click àStart)

DataStage Server  

Start 7Program 7 Web sphere  7 Application server 7 profiles à default 7 start the   server 7 Next click on Web Console  7 we find the login page 7 that is Server has Started

  • If the server is not started, the page cannot be displayed is displayed.

Step 2

Select Designer from Desktop  7 Show a login page 7 Authenticate your self  

User id: Admin

Password: phi, and attach to an appropriate project

Project: INDIA / DWS 7 ok7Display a New Window

Step 3

  • Select parallel jobs from new
  • A designer canvas and palette of parallel jobs is opened
  • Select appropriate Stages from the palette
  • Palette is a component that has Short shortcuts of the stages.
  • 8 categories of stages are divided into 2 groups

  1. Active Stages: ETL –“T”
  2. Passive Stages: ETL – “E” & “L”
  3. General
  4. Data Quality
  5. Database
  6. Development
  7. File
  8. Processing
  9. Real-time
  10. Restructure
  11. Favorites

7In General, Annotations are used for comments.

Step 4

  • Drag and Drop required Stage from Palette, and create a Sample job
  • Set the Stage properties.
  • Save, Compile and Run
  • Shortcuts:-

Compile –F7 Run – Ctrl +F5

Reading Data From Files

We have 3 types of files.

  1. Txt
  2. CSU
  3. XML

  .txt:  ( also have different formats)

  1. Fixed width flat files
  2. comma-separated files
  3. variable length files
  4. Head & Tail files
  5. Single column format
  6. Space & tab Separated

In this current job, We are using a “comma-separated format”. We are using “Sequential file” Sequential file – to read the file When we read the Sequential file, we have to follow 4 things

  • File Name
  • Location (paths)
  • Format (comma Structure)
  • Structure (Table Definition or Meat data)
  • IN this jobs, Source is a file, loading the data that is Target is also a file

  Target file

  • The naming conversion will be decided by the people who prepare transformation SQ- PRIMARY –DATA
  • Double click on Sequential file (or) Right-click on  Sequential file7We get the properties window 7Select file

  File -7 C:/data /primary .net

   

Columns 7 (importing the Structure of the text file into O.S from D.S load into Repository  T>B)

   

Click load

   

We get a window

   

“Table Definition” 7 it shows the list of files in Tb (Right-click on TB )

   

Import Table Definition

   

Sequential file  Definition

   

Browse Directory

 

Ok

  

Select files

   

Select “primary 22 .txt”

   

Import   (click)

   

The first line is the column name “Define” Sequential Metadata

   

Ok  now the file imports the location into D.S if we want to view the data click “VIEW DATA”

   

Target file à “Select and drive”  take forex: D:/

   

File Name  -- Sample.txt

   

Save  -- jobs  (RC) 7new folder 7 Shilpa 7 jot name (or) item

   

Compile 7 run  

  • Green color indicates that jobs have finished

o/p:- total rows  

Sequential files stage properties:

It is a file Stage which reads or writes Sequentially or parallels.

Note:

  1. 1 file – Sequential (Reading data from only 1 file)

N file – parallel    (Reading data from more than 1 file)

  1. Sequential Stage supports 1 i/p or 1 o/p and 1 reject link

(That is either 1 input or 1 output, but not both from each sequential file) - - > IT can  Support i/p + reject link (or) o/p +reject link

Multiple Sources

We have an option to read data from Multiple Source Structure of file 1 and file 2 should be the same There is a Memory limit for Text files (That is  As the text file, can accommodate only 2 GB, if we have 5 GB, so we need To have multiple text file) - - >TO Read the data from Multiple Sources /files,  Sequential file Supports. “Read Method 2 types”

  1. Specific files 7give file name specifically
  2. File pattern 7users wild card characters

Ex:- if we select “filename” C:/data/primary * .txt 7wild card characters7 Any characters which have multi-purpose That is we have different text files as Primary 11.txt Primary 22.net Primary 11.det * specific char. Matching? multiple char matching # numbers 7Primary is common, for that reason, we use * symbol C:/ Data primary ? ? txt After primary Characters or numbers or numbers occupied.

Handling Format Mismatch Records

“Reject Mode”  à 3 options Continue - Drop format mismatch record and load rest. Fail - job aborts if any format mismatch records Output - captures rejected data, through Reject link   7 Reject link is allowed only with Reject link “output option” is Reject  Mode.

Reject Data 

  • Format Mismatch
  • Data type Mismatch
  • Condition Mismatch

Reject link is allowed only with the “output option” is Reject Mode

Missing file Mode: ---  (if the file is missed)

  1. Error – fails job if any file is missed.
  2. Ok – continue with the rest of the file.

Option:-  (To different which record is Coming from which file including the path)

  1. fileName column 7source 7properties 7option
  2. Get src into tag 7Select file name column 7come to the column
  3. Select FNC – add to columns 7Add the at Column Name
  4. Row number columns

1) gives record no. at src to target gets sequence no. of the src at the target, it is integers

  1. Read first rows
  2. Get first ‘n’ records from f1,f2,f3……….fn(that is Read rows = 3, display 1st 3 records from f1,f2,f3……….fn)
  3. Filter – we can use UNIX based comments like a group

  • grep “William”  grep   “moon” – case Sensitive

grep –   I “moon” – ignore the case

grep –   I “moon” – exact Matching

grep –   I “moon” –  other than Moon,(case sensitive)

Inclined to build a profession as Datastage Developer? Then here is the blog post on, explore Datastage Training

Types of links

  Link Types

link Default Colors

7Red – Stage Not Connected property (or) job Aborted

7Blue – In process

7Green – finished

7Black  - Stage is ready to set the properties

Link Marks

Indicates  what data is moved between Stages

Link Marks  

  • Box at SRC 7 SRC Stage ready with Mata Data.
  • BOX a Tag 7 Parallel to parallel auto Partitioning

  Link mark  

W.r.to Data stage :-

  • The compilation is a process of Converting GUI into its machine code, in Process, it checks link requirements, Mandatory Stage properties, and logical  errors

  Datastage    

Lab Exercise

In the file pattern, What are the options that are missing ?

Sol: Options

First line is column Name  = false

Keep file partitions = false

Reject Mode = continue

Report progress = yes

Properties  File Name column Row Number column Schema file

Specific files 

Options 

First line is column Name  = false

Keep file partitions = false

Missing File Mode = Depends

Reject Mode = continue

Report progress = yes

Properties

  • File Name column
  • Filter
  • of Record deeper Node
  • Read first Rows
  • Read from multiple nodes
  • Row number column
  • Schema file

SRC = 1 file (10 records) , Tag = 2 files, how the output will at the target.

Sol:  Output will come in 2 files with alternate records  .

That is if we have to records, we get 5 in each target  file

Example:-    Support, we have 10 records will 1,2,3,4…………..10

The output will be

Target 1  1,3,5,7,9

Target 2 2,4,6,8,10 SRC = 2 files , Tag = 2 files, output = ?

Sol:  1st SRC file into 1st target file

2st SRC file into 2nd target file

SRC = 2 files , Tag = 1 files, output = ?

File 1 = c id, c name, address  72 records

File 2 = c id, c name, address, country 72 record

Sol:  When we load a file for the first – time

File 1

C idC nameAddress
1abcHYD
2defBNG

File 2

C id C nameAddressCountry
3grichneIndia
4shaknkIndia

While file 1 loads, the output will be 4 rows

C idC nameaddress
1abcHYd
2defBNG
3griChne, India
4shaKNK, India

That is, c id, c name, Address in the file is these in file 2(file 1 Structure is there file 2, so it display records from 2 tables)

- - > While file 2 Is loaded, the output will be 2 rows

C id C nameAddressCountry
3grichneIndia
4shaknkIndia

That is cid,c name, Address,  in file 2, are not there in file 1, to it does not consider file 2.

That is file 2 structure is not there in file 1   structure so file 1 is discarded and display only File records.

For an in-depth understanding of DataStage click on