Step 1
- Start DB/2 repository and Data stage Server (In the Task box, we have a Green color icon, àRight click àStart)
DataStage Server
Start Program
Web sphere
Application server
profiles à default
start the server
Next click on Web Console
we find the login page
that is Server has Started
- If the server is not started, the page cannot be displayed is displayed.
Step 2
Select Designer from Desktop Show a login page
Authenticate your self
User id: Admin
Password: phi, and attach to an appropriate project
Project: INDIA / DWS ok
Display a New Window
Step 3
- Select parallel jobs from new
- A designer canvas and palette of parallel jobs is opened
- Select appropriate Stages from the palette
- Palette is a component that has Short shortcuts of the stages.
- 8 categories of stages are divided into 2 groups
- Active Stages: ETL –“T”
- Passive Stages: ETL – “E” & “L”
- General
- Data Quality
- Database
- Development
- File
- Processing
- Real-time
- Restructure
- Favorites
In General, Annotations are used for comments.
Step 4
- Drag and Drop required Stage from Palette, and create a Sample job
- Set the Stage properties.
- Save, Compile and Run
- Shortcuts:-
Compile –F7 Run – Ctrl +F5
Reading Data From Files
We have 3 types of files.
- Txt
- CSU
- XML
.txt: ( also have different formats)
- Fixed width flat files
- comma-separated files
- variable length files
- Head & Tail files
- Single column format
- Space & tab Separated
In this current job, We are using a “comma-separated format”. We are using “Sequential file” Sequential file – to read the file When we read the Sequential file, we have to follow 4 things
- File Name
- Location (paths)
- Format (comma Structure)
- Structure (Table Definition or Meat data)
- IN this jobs, Source is a file, loading the data that is Target is also a file
- The naming conversion will be decided by the people who prepare transformation SQ- PRIMARY –DATA
- Double click on Sequential file (or) Right-click on Sequential file
We get the properties window
Select file
File - C:/data /primary .net
↓
Columns (importing the Structure of the text file into O.S from D.S load into Repository T>B)
↓
Click load
↓
We get a window
↓
“Table Definition” it shows the list of files in Tb (Right-click on TB )
↓
Import Table Definition
↓
Sequential file Definition
↓
Browse Directory
↓
Ok
↓
Select files
↓
Select “primary 22 .txt”
↓
Import (click)
↓
The first line is the column name “Define” Sequential Metadata
↓
Ok now the file imports the location into D.S if we want to view the data click “VIEW DATA”
↓
Target file à “Select and drive” take forex: D:/
↓
File Name -- Sample.txt
↓
Save -- jobs (RC) new folder
Shilpa
jot name (or) item
↓
Compile run
- Green color indicates that jobs have finished
o/p:- total rows
Sequential files stage properties:
It is a file Stage which reads or writes Sequentially or parallels.
Note:
- 1 file – Sequential (Reading data from only 1 file)
N file – parallel (Reading data from more than 1 file)
- Sequential Stage supports 1 i/p or 1 o/p and 1 reject link
(That is either 1 input or 1 output, but not both from each sequential file) - - > IT can Support i/p + reject link (or) o/p +reject link
Multiple Sources
We have an option to read data from Multiple Source Structure of file 1 and file 2 should be the same There is a Memory limit for Text files (That is As the text file, can accommodate only 2 GB, if we have 5 GB, so we need To have multiple text file) - - >TO Read the data from Multiple Sources /files, Sequential file Supports. “Read Method 2 types”
- Specific files
give file name specifically
- File pattern
users wild card characters
Ex:- if we select “filename” C:/data/primary * .txt wild card characters
Any characters which have multi-purpose That is we have different text files as Primary 11.txt Primary 22.net Primary 11.det * specific char. Matching? multiple char matching # numbers
Primary is common, for that reason, we use * symbol C:/ Data primary ? ? txt After primary Characters or numbers or numbers occupied.
Handling Format Mismatch Records
“Reject Mode” à 3 options Continue - Drop format mismatch record and load rest. Fail - job aborts if any format mismatch records Output - captures rejected data, through Reject link Reject link is allowed only with Reject link “output option” is Reject Mode.
Reject Data
- Format Mismatch
- Data type Mismatch
- Condition Mismatch
Reject link is allowed only with the “output option” is Reject Mode
Missing file Mode: --- (if the file is missed)
- Error – fails job if any file is missed.
- Ok – continue with the rest of the file.
Option:- (To different which record is Coming from which file including the path)
- fileName column
source
properties
option
- Get src into tag
Select file name column
come to the column
- Select FNC – add to columns
Add the at Column Name
- Row number columns
1) gives record no. at src to target gets sequence no. of the src at the target, it is integers
- Read first rows
- Get first ‘n’ records from f1,f2,f3……….fn(that is Read rows = 3, display 1st 3 records from f1,f2,f3……….fn)
- Filter – we can use UNIX based comments like a group
- grep “William” grep “moon” – case Sensitive
grep – I “moon” – ignore the case
grep – I “moon” – exact Matching
grep – I “moon” – other than Moon,(case sensitive)
Inclined to build a profession as Datastage Developer? Then here is the blog post on, explore Datastage Training
Types of links
link Default Colors
Red – Stage Not Connected property (or) job Aborted
Blue – In process
Green – finished
Black - Stage is ready to set the properties
Link Marks
Indicates what data is moved between Stages
- Box at SRC
SRC Stage ready with Mata Data.
- BOX a Tag
Parallel to parallel auto Partitioning
W.r.to Data stage :-
- The compilation is a process of Converting GUI into its machine code, in Process, it checks link requirements, Mandatory Stage properties, and logical errors
Lab Exercise
In the file pattern, What are the options that are missing ?
Sol: Options
First line is column Name = false
Keep file partitions = false
Reject Mode = continue
Report progress = yes
Properties File Name column Row Number column Schema file
Specific files
Options
First line is column Name = false
Keep file partitions = false
Missing File Mode = Depends
Reject Mode = continue
Report progress = yes
Properties
- File Name column
- Filter
- of Record deeper Node
- Read first Rows
- Read from multiple nodes
- Row number column
- Schema file
SRC = 1 file (10 records) , Tag = 2 files, how the output will at the target.
Sol: Output will come in 2 files with alternate records .
That is if we have to records, we get 5 in each target file
Example:- Support, we have 10 records will 1,2,3,4…………..10
The output will be
Target 1 1,3,5,7,9
Target 2 2,4,6,8,10 SRC = 2 files , Tag = 2 files, output = ?
Sol: 1st SRC file into 1st target file
2st SRC file into 2nd target file
SRC = 2 files , Tag = 1 files, output = ?
File 1 = c id, c name, address 2 records
File 2 = c id, c name, address, country 2 record
Sol: When we load a file for the first – time
File 1
C id | C name | Address |
1 | abc | HYD |
2 | def | BNG |
File 2
C id | C name | Address | Country |
3 | gri | chne | India |
4 | sha | knk | India |
While file 1 loads, the output will be 4 rows
C id | C name | address |
1 | abc | HYd |
2 | def | BNG |
3 | gri | Chne, India |
4 | sha | KNK, India |
That is, c id, c name, Address in the file is these in file 2(file 1 Structure is there file 2, so it display records from 2 tables)
- - > While file 2 Is loaded, the output will be 2 rows
C id | C name | Address | Country |
3 | gri | chne | India |
4 | sha | knk | India |
That is cid,c name, Address, in file 2, are not there in file 1, to it does not consider file 2.
That is file 2 structure is not there in file 1 structure so file 1 is discarded and display only File records.
For an in-depth understanding of DataStage click on
- Introduction to DataStage
- Architecture of Data Stage
- Oracle Enterprise in Data Stage
- SCD(Slow changing Dimension) in Data Stage
- DataStage Tutorials
- Horizontal Sorting in DataStage