
Scheduling Jobs in DataStage

Step 1

  • Start the DB2 repository and the DataStage server (in the taskbar there is a green icon: right-click → Start)

DataStage Server

Start → Programs → WebSphere → Application Server → Profiles → default → Start the server. Next, click Web Console: if the login page appears, the server has started.

  • If the server has not started, the browser shows “Page cannot be displayed”.

Step 2

Select Designer from the desktop → a login page is shown → authenticate yourself

User id: Admin

Password: phi, and attach to an appropriate project

Project: INDIA / DWS → OK → a new window is displayed

Step 3

  • Select Parallel Job from the New menu
  • A designer canvas and the palette for parallel jobs open
  • Select the appropriate stages from the palette
  • The palette is a component that holds shortcuts to the stages.
  • Stages are divided into 2 groups:
  1. Active stages: the “T” of ETL
  2. Passive stages: the “E” and “L” of ETL
  • The palette arranges the stages in 8 categories, plus Favorites:
  1. General
  2. Data Quality
  3. Database
  4. Development
  5. File
  6. Processing
  7. Real Time
  8. Restructure
  9. Favorites

→ In the General category, Annotations are used for comments.

Step 4

  • Drag and drop the required stages from the palette and create a sample job
  • Set the stage properties.
  • Save, compile and run
  • Shortcuts:

Compile: F7

Run: Ctrl+F5

Reading Data From Files

We have 3 types of files.

  1. txt
  2. csv
  3. xml

.txt files also come in different formats:

  1. Fixed-width flat files
  2. Comma-separated files
  3. Variable-length files
  4. Head and tail files
  5. Single-column format
  6. Space- and tab-separated files

In the current job we are using the comma-separated format.

We use the Sequential File stage to read the file.

When we read a sequential file, we have to specify 4 things:

  • File name
  • Location (path)
  • Format (comma-separated)
  • Structure (table definition or metadata)
  • In this job the source is a file, and the target being loaded is also a file
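
The four items above (file name, location, format, structure) can be illustrated outside DataStage with a small Python sketch. This is not DataStage code: the file names `primary.txt` and `sample.txt`, the temporary folder, and the column names `cid`, `cname`, `address` are assumptions chosen only for the example.

```python
import csv
import os
import tempfile

# Hypothetical structure (table definition); the column names are assumptions.
COLUMNS = ["cid", "cname", "address"]


def copy_csv(source_path, target_path, columns=COLUMNS):
    """Read a comma-separated source file and load it into a target file.

    Returns the number of rows written to the target.
    """
    rows_written = 0
    with open(source_path, newline="") as src, \
            open(target_path, "w", newline="") as tgt:
        reader = csv.DictReader(src, fieldnames=columns, delimiter=",")
        writer = csv.DictWriter(tgt, fieldnames=columns, delimiter=",")
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written


def demo():
    """Build a tiny source file in a temporary folder and copy it."""
    folder = tempfile.mkdtemp()                   # location (path)
    source = os.path.join(folder, "primary.txt")  # file name (assumed)
    target = os.path.join(folder, "sample.txt")
    with open(source, "w", newline="") as f:
        f.write("1,abc,HYD\n2,def,BNG\n")
    count = copy_csv(source, target)
    with open(target) as f:
        return count, f.read().splitlines()
```

Calling `demo()` returns the row count together with the lines of the target file, which plays the role of the total row count reported after a job run.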

 


  • The naming convention is decided by the people who prepare the transformation, for example
    SQ-PRIMARY-DATA
  • Double-click the Sequential File stage (or right-click it) → the properties window opens → select File

 

  1. File → C:/data/primary.net
  2. Columns → import the structure of the text file from the operating system into the DataStage repository (under Table Definitions)
  3. Click Load; a window opens
  4. “Table Definitions” shows the list of definitions in the repository; right-click Table Definitions
  5. Import Table Definition → Sequential File Definitions
  6. Browse to the directory → OK
  7. Under Select files, select “primary 22.txt”
  8. Click Import
  9. Select “First line is column names”, then use “Define” to set the sequential metadata
  10. Click OK. The file definition is now imported into DataStage; to view the data, click “View Data”
  11. Target file → select a drive, for example D:/
  12. File name: Sample.txt
  13. Save → Jobs (right-click) → New Folder → shilpa → job name (or item)
  14. Compile → Run

 

  • Green links indicate that the job has finished

Output: the total number of rows

 

Sequential File stage properties:

It is a file stage that reads or writes sequentially or in parallel.

Note:

  1. 1 file: sequential (reading data from only 1 file)
     N files: parallel (reading data from more than 1 file)
  2. The Sequential File stage supports 1 input link or 1 output link, plus 1 reject link
     (that is, each Sequential File stage has either 1 input or 1 output, but not both).
     → It can support input + reject link, or output + reject link.
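
The link rule in the note can be written down as a tiny check. This is only a sketch of the rule as stated here, not a DataStage API; the function name `links_are_valid` is hypothetical.

```python
def links_are_valid(inputs, outputs, rejects):
    """Check link counts against the rule stated in the note.

    Allowed: exactly 1 input or exactly 1 output (never both),
    optionally accompanied by at most 1 reject link.
    """
    one_direction = (inputs, outputs) in {(1, 0), (0, 1)}
    return one_direction and 0 <= rejects <= 1
```

For example, `links_are_valid(1, 0, 1)` describes input + reject link, while `links_are_valid(1, 1, 0)` describes a stage with both an input and an output, which the rule forbids.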

Multiple Sources

The Sequential File stage has an option to read data from multiple sources.

The structure of file 1 and file 2 should be the same.

There can be a size limit on text files (for example, if a text file can hold only 2 GB and we have 5 GB of data, we need multiple text files).

→ To read data from multiple sources or files, the Sequential File stage offers two read methods.

“Read Method”: 2 types

  1. Specific files → the file names are given explicitly
  2. File pattern → uses wildcard characters

Example: if we give the file name as a pattern

C:/data/primary*.txt → a wildcard character is a character that stands in for other characters

That is, we have different text files such as

Primary 11.txt

Primary 22.net

Primary 11.det

* matches any number of characters

? matches exactly one character

# numbers

→ “Primary” is common to all the names, so we use the * symbol for the rest

C:/data/primary??.txt

Here each ? after “primary” is occupied by exactly one character or number.
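
How `*` and `?` differ can be tried out with Python's standard `fnmatch` module, which implements shell-style wildcards. This is a sketch to illustrate the matching rules only; the file names in the list are hypothetical and DataStage itself is not involved.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
names = ["primary11.txt", "primary22.txt", "primary333.txt", "secondary11.txt"]


def select(pattern, candidates):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


# * matches any number of characters after "primary".
star_matches = select("primary*.txt", names)

# ?? matches exactly two characters after "primary".
two_char_matches = select("primary??.txt", names)
```

With these names, the `*` pattern also picks up `primary333.txt`, while the `??` pattern accepts only names with exactly two characters between `primary` and `.txt`.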

Handling Format Mismatch Records

“Reject Mode” → 3 options

Continue: drops format-mismatch records and loads the rest.

Fail: the job aborts if there is any format-mismatch record.

Output: captures the rejected data through a reject link.

→ A reject link is allowed only when Reject Mode is set to “Output”.

Reject data

  • Format mismatch
  • Data type mismatch
  • Condition mismatch
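
The three reject modes can be mimicked in a few lines of Python. This is a simplified sketch, not how DataStage implements them: a record counts as a format mismatch here only when its number of comma-separated fields is wrong, and the function name `read_records` is hypothetical.

```python
def read_records(lines, expected_fields, reject_mode="continue"):
    """Split comma-separated lines and apply a reject mode to bad records.

    Returns (accepted, rejected). `rejected` is filled only in "output" mode.
    """
    if reject_mode not in ("continue", "fail", "output"):
        raise ValueError(f"unknown reject mode: {reject_mode}")
    accepted, rejected = [], []
    for line in lines:
        fields = line.split(",")
        if len(fields) == expected_fields:
            accepted.append(fields)
        elif reject_mode == "fail":
            # Fail: abort on the first format-mismatch record.
            raise RuntimeError(f"format mismatch: {line!r}")
        elif reject_mode == "output":
            # Output: capture the record, as a reject link would.
            rejected.append(line)
        # Continue: drop the record silently and load the rest.
    return accepted, rejected
```

Running it on the same input with each mode shows the difference: "continue" loses the bad record, "output" keeps it in the second list, and "fail" raises an error.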

Missing File Mode (what happens if a file is missing):

  1. Error: the job fails if any file is missing.
  2. OK: the job continues with the rest of the files.
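
The two settings behave like the following Python sketch. It only imitates the described behaviour with ordinary file-system checks; the function name `existing_files` and the mode strings are assumptions for the example.

```python
import os


def existing_files(paths, missing_file_mode="error"):
    """Return the paths to read, applying a missing-file mode.

    "error": raise as soon as any listed file is missing.
    "ok":    skip missing files and continue with the rest.
    """
    if missing_file_mode not in ("error", "ok"):
        raise ValueError(f"unknown missing file mode: {missing_file_mode}")
    found = []
    for path in paths:
        if os.path.isfile(path):
            found.append(path)
        elif missing_file_mode == "error":
            raise FileNotFoundError(path)
    return found
```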

Options (to tell which record comes from which file, including the path):

  1. File Name Column: Source → Properties → Options. To get the source file name into the target, select File Name Column, go to the Columns tab and add a column with that name.
  2. Row Number Column: gives the record number at the source, so the target gets the sequence number of each source record. It is an integer.
  3. Read First Rows: gets the first n records from each of f1, f2, f3, ..., fn (that is, Read First Rows = 3 displays the first 3 records from each of f1, f2, f3, ..., fn).
  4. Filter: we can use UNIX commands such as grep.
  • grep “William”, grep “moon”: case sensitive

grep -i “moon”: ignores case

grep -w “moon”: exact (whole-word) matching

grep -v “moon”: everything other than moon (case sensitive)
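
The effect of these options can be sketched in Python with in-memory files. This is an illustration under stated assumptions, not DataStage behaviour: the field names `file_name` and `row_number` are hypothetical, the filter is applied before the first-rows limit, row numbers restart for each file, and the `grep` helper treats its text literally.

```python
import re


def grep(text, ignore_case=False, whole_word=False, invert=False):
    """Build a line predicate that mimics grep, grep -i, grep -w and grep -v."""
    pattern = re.escape(text)
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return lambda line: bool(regex.search(line)) != invert


def read_files(files, first_rows=None, line_filter=None):
    """Read records from several in-memory files.

    files:       dict mapping a file name to its list of lines
    first_rows:  if set, keep only the first n lines of each file
    line_filter: optional predicate applied to each line (the grep role)
    Each output record carries its source file name and a row number.
    """
    records = []
    for file_name, lines in files.items():
        if line_filter is not None:
            lines = [line for line in lines if line_filter(line)]
        if first_rows is not None:
            lines = lines[:first_rows]
        for row_number, line in enumerate(lines, start=1):
            records.append(
                {"file_name": file_name, "row_number": row_number, "line": line}
            )
    return records
```

For instance, `read_files(files, first_rows=3)` keeps three records per file, and `read_files(files, line_filter=grep("moon", invert=True))` keeps only the lines that do not contain `moon`.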


Types of links

Link default colors

→ Red: stage not connected properly, or job aborted

→ Blue: in process

→ Green: finished

→ Black: stage is ready to have its properties set

Link Marks

Link marks indicate what data is moved between stages.

  • Box at the source → the source stage is ready with metadata.
  • Box at the target → parallel-to-parallel auto partitioning

 

  1. With respect to DataStage:
  • Compilation is the process of converting the GUI design into machine code. In the process it checks link requirements, mandatory stage properties, and logical errors.

 

 

Lab Exercise

In File pattern, which options are missing?

Sol: compare the options of the two read methods.

File pattern

Options

First Line is Column Names = false

Keep File Partitions = false

Reject Mode = continue

Report Progress = yes

Properties

  • File Name Column
  • Row Number Column
  • Schema File

Specific files

Options

First Line is Column Names = false

Keep File Partitions = false

Missing File Mode = Depends

Reject Mode = continue

Report Progress = yes

Properties

  • File Name Column
  • Filter
  • Number of Readers Per Node
  • Read First Rows
  • Read From Multiple Nodes
  • Row Number Column
  • Schema File
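
The answer to the exercise is the difference between the two lists, which a short Python sketch can compute. The names below are transcribed from the lists in this exercise; their exact spelling in the tool is an assumption, in particular `Number of Readers Per Node`.

```python
# Options and properties listed for the File pattern read method.
file_pattern = {
    "First Line is Column Names",
    "Keep File Partitions",
    "Reject Mode",
    "Report Progress",
    "File Name Column",
    "Row Number Column",
    "Schema File",
}

# Options and properties listed for the Specific files read method.
specific_files = {
    "First Line is Column Names",
    "Keep File Partitions",
    "Missing File Mode",
    "Reject Mode",
    "Report Progress",
    "File Name Column",
    "Filter",
    "Number of Readers Per Node",
    "Read First Rows",
    "Read From Multiple Nodes",
    "Row Number Column",
    "Schema File",
}

# Entries available for Specific files but missing in File pattern.
missing_in_file_pattern = sorted(specific_files - file_pattern)
```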

SRC = 1 file (10 records), target = 2 files: what will the output be at the target?

Sol: The output goes to the 2 files with alternate records.

That is, if we have 10 records, we get 5 in each target file.

Example: suppose we have 10 records numbered 1, 2, 3, 4, ..., 10.

The output will be

Target 1: 1,3,5,7,9

Target 2: 2,4,6,8,10
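
The alternating result matches what a round-robin distribution would produce. The Python sketch below reproduces the example under that assumption; the function name `round_robin` is hypothetical and the code is not taken from DataStage.

```python
def round_robin(records, target_count):
    """Deal records to targets in turn: record i goes to target i % target_count."""
    targets = [[] for _ in range(target_count)]
    for index, record in enumerate(records):
        targets[index % target_count].append(record)
    return targets


# 10 records numbered 1..10 distributed over 2 targets.
target_1, target_2 = round_robin(list(range(1, 11)), 2)
```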

SRC = 2 files, target = 2 files, output = ?

Sol: The 1st source file goes into the 1st target file.

The 2nd source file goes into the 2nd target file.

SRC = 2 files, target = 1 file, output = ?

File 1 = cid, cname, address → 2 records

File 2 = cid, cname, address, country → 2 records

Sol: The result depends on which file's structure is loaded.

File 1

cid   cname   address
1     abc     HYD
2     def     BNG

File 2

cid   cname   address   country
3     gri     chne      India
4     sha     knk       India

→ When the structure of file 1 is loaded, the output will be 4 rows

cid   cname   address
1     abc     HYd
2     def     BNG
3     gri     Chne, India
4     sha     KNK, India

That is, cid, cname and address from file 1 are also present in file 2 (the structure of file 1 is contained in file 2), so records from both files are displayed.

→ When the structure of file 2 is loaded, the output will be 2 rows

cid   cname   address   country
3     gri     chne      India
4     sha     knk       India

That is, the structure of file 2 (cid, cname, address, country) is not present in file 1, so file 1 is discarded and only the file 2 records are displayed.
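
The two outcomes can be imitated with a small Python sketch. It rests on assumptions made to match the tables in this exercise: extra text on a line stays in the last column, and a line with too few fields is dropped. The function name `load` is hypothetical.

```python
def load(lines, column_count):
    """Parse comma-separated lines against a structure with `column_count` columns.

    Extra text stays in the last column; lines with too few fields are dropped.
    """
    rows = []
    for line in lines:
        fields = line.split(",", column_count - 1)
        if len(fields) == column_count:
            rows.append(fields)
    return rows


file_1 = ["1,abc,HYD", "2,def,BNG"]
file_2 = ["3,gri,chne,India", "4,sha,knk,India"]

# Structure of file 1 (3 columns): all 4 records load.
with_file_1_structure = load(file_1 + file_2, 3)

# Structure of file 2 (4 columns): only the 2 records of file 2 load.
with_file_2_structure = load(file_1 + file_2, 4)
```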
