Join
Join needs Same key Column Name
When we join 2 tables
For ex:-EMP and Dept
In join, While mapping
If Join type = Inner/left outer join, then Dept no while joining comes from left
If join type = right outer join, then deptno comes from right
If join type = full outer join, then deptno comes from both left and right
If join type = Right outer join , then Dept no comes from Right
Note :-
Full outer join Can take only 2 inputs, that is it can’t join any 3 rd table
- Job1 :- Join
Join
Key = Dept no
Options
Join type = InneràOutputDo Mapping
Join type = Inner / left outer/Right outer/ fullouter
Output
EMP No E Name Job Mgr Hire date Sal Comm Dept no D Name loc
- Job2 : If Column Names are Different use an copy operator
Oracle 0
Load EMP table
oracle 1
Load EMP table columns
Stage any columns Name
Copy :-
Output Columns
Change the column Name
Join
Key = EMP No
Join type = INNER Output
Do Mapping
Output
EMP No E Name Job Mgr Hire date Sal Comm Dept no
Job 3
T1 : C id, E name, d no
T2:D no, D name, loc id
T3:loc id , loc Name, area id
T4: Area id , add1, add2
Output
EID No E Name Dept no D Name loc id loc name area id add1 add2
first load T1, T2 and perform join 0 key = dno
Now ,load T3 and perform join1, key =load
load T4 and perform join 2 key = area id
Compile and Run
Output :-
Area id , add1,add2,loc id , loc name, d no, d name , c id , e name
job 4
If we have 2 tables with same structure and same column Names
E1
E id | E name | Add |
11 | A | HYD |
22 | B | SEC |
E2
E id | E name | Add |
11 | A | DEL |
22 | B | GNT |
Output
A A HYD DEL
B B SEC GNT
change the column Name in any one of the table using copy operator with out changing key column name
Copy
OutputDo mapping
Columns
Change E name to E name 1
Join
Key = c id Do mapping
Output
E id | E name | Add | E name | Add |
11 | A | HYD | A | DEL |
22 | B | SEC | B | GNT |
changing the column Name without using copy operator
Load EMP1, EMP2 Select Read Method = user defined SQL= Select EMP no, E name, as E name , job as job 1, mgr as magr1, hire date as Hire data1 , Sal as Sal1, comm. As comm. From EMP
columns
change Names Here also
Output
E id E name job mgr E name 1 job 1 mgr 1 ……..
job 5 :- 2 Tables with out key dependency
EMP
E id E name Add
11 A HYD
22 B SEC
SRC
SRC- Name TBL – Name
Kindle CRM – CUST
Create an column for key dependency using column Generator.
Create a new column ,in both the column generator
Column generate = Dummy
Algorithm = cycle
Value = 1
Output
E id E name Add Src - Name TABLE – Names
11 A HYD Kindle CRM – CUST
22 B SEC kindle CRM – CUST
Aspired to become an DataStage? Explore the post to discover the know-hows on DataStage Training.
Merge
It has two Modes
- Keep (left outer join)
- Drop (inner join)
Master table :- (Stable info)
Stable which are not changes frequently.
P id, prd. Desc, prd -manuf
11, Linux, Tata
22, Santoor, Wipro
33, Wipro, unilever
44, Margo, unilever
up date 1
P id, prd – age , prd – price
11, 4, 87
22, 2, 42
55, 1, 35
66, 7, 90
update 2 :-
Pid, prd – supplier, prd – Category
11, hind, health
33, Tata, Nutri
77, Bagan , Health
88, recom, Home
job 1
Merge
- Merge keys
Key = p id
Sort order = Ascending
Options
Un matched master mode = Drop / keep
Warn on Reject updates = True
Warn on un matched master = True
Click on the link ordering
Output Do mapping
Output
Data set 0:
PID | Prod_desc | Prod_manuf | Prd_age | prdprice | Prd_supplier | Prd catagory |
11 | Lux | TATA | 4 | 87 | Hind | health |
22 | Samtoor | Wipro | 2 | 42 | ||
33 | wipro | Uniliver | 0 | 0 | tata | utri |
Dataset1
Pid | Prd age | Prd price |
55 | 1 | 35 |
66 | 7 | 90 |
Dataset 2
pid | Prd supplier | Prd cat |
77 | bagan | health |
99 | Re com | Home |
De- Duplication :- (Removing Duplicate records)
C id E name
11, a
11, b
22, c
11, d
33, u
22, n
33, i
11, t
22, l
Different ways
Remove Duplicates
Sorted Data
E id E name
11 a
11 b
11 d
22 b
22 n
22 l
33 i
33 u
- Remove Duplicate is a stage which supports 1 input and 1 output
- Used to remove duplicate
- Expects only Sorted data
Output
E id , E name
11, a
22, b
33, i
Aggregator
Aggregator is a processing Stage which works on groups and perform counting and calculations supports 1 input and 1 output.
(a) job :- Need to find the no .of duplicates
Aggregate type = Count rows
Count output column = dept . count
Options
Allow Null output = false
Method = hash output
Do Mapping
Output
E id dept – count
11 3
22 3
33 2
(b) fork join :-
Copy
Output Do mapping for job, Aggregator
Aggregator
Group = c id
Aggregation type = Count Rows
Count output column = dept – countoutput
Do mapping
Join
Key = c id output
Do mapping
output
cid | cname | Dept_count |
11 | A | 3 |
11 | B | 3 |
11 | D | 3 |
22 | B | 3 |
22 | N | 3 |
22 | L | 3 |
33 | I | 2 |
33 | u | 2 |
job
Aggregation Type = calculation
Column for calculation = Sal
Maximum value output column = MAX Sal
Minimum value output column = MIN Sal
SUM output column = total output
DO mapping
Output
Dept No | Max Sal | Min Sal | total |
10 | 25000 | 21300 | 68750 |
20 | 23000 | 20800 | 110875 |
30 | 22850 | 20950 | 129400 |
job
Copy :- output Do Mapping
Aggregator
Group = dept no
Aggregation Type = Calculation
Column for calculation = Sal
=Max Sal
= Min Sal
= tat Sal
output
Do mapping
Column generator
Column to generate = companyMapping
Value = IBM
Aggregator 1
Group = Company
Column for calculation = Sal
= Max Sal
Output Mapping
Output
Data set 1
Company Max Sa
IBM 25000
job
All the Employee details of an particular dept no. should be display
Copy :- output Do Mapping
Aggregator
Group = dept no
Aggregation Type = Calculation
Column for calculation = Sal
=Max Sal
= Min Sal
= tat Sal
Join
Key = e id output
DO mapping
Remove Duplicates
Key = c idDuplicate to Retain = first
Output
Do Mapping
For indepth understanding of DataStage click on