Joins in Data Stage

20 September, 2018


Related Blogs


Join needs Same key Column Name When we join 2 tables For ex:-EMP and Dept In join, While mapping If Join type   = Inner/left outer join, then Dept no while joining comes from left Screenshot_55   7If join type = right outer join, then deptno comes from right 7If join type = full outer join, then deptno comes from both left and right   If join type = Right outer join , then Dept no comes from Right   Note :- Full outer join Can take only 2 inputs, that is it can’t join any 3 rd table

  1. Job1 :- Join

  Screenshot_56   Join Key = Dept no Options Join type = InneràOutput7Do Mapping Join type = Inner / left outer/Right outer/ fullouter   Output EMP No    E Name    Job      Mgr   Hire date        Sal  Comm   Dept no     D Name  loc  

  1. Job2 : If Column Names are Different use an copy operator    

  Screenshot_57     Oracle 0 Load EMP table   oracle 1 Load EMP table 7columns7Stage any columns Name   Copy :- Output 7Columns7Change the column Name   Join Key = EMP No Join type = INNER 7 Output 7 Do Mapping   Output  EMP No    E Name    Job      Mgr   Hire date        Sal  Comm   Dept no Job 3  T1 : C id, E name, d no T2:D no, D name, loc id T3:loc id , loc Name, area id T4: Area id , add1, add2   Output  EID No    E Name       Dept no     D Name  loc id loc name area id add1 add2   Screenshot_58     7first load T1, T2 and perform join 0 key  = dno 7Now ,load T3 and perform join1, key =load 7load T4 and perform join 2        key = area id 7Compile and Run   Output :- Area id , add1,add2,loc id , loc name,  d no, d name ,  c id , e name job 4 If we have 2 tables with same structure and same column Names E1

E id E name Add
11 A HYD
22 B SEC


E id E name Add
11 A DEL
22 B GNT

  Output  A          A     HYD               DEL B          B         SEC             GNT   change the column Name in any one of the table using copy operator with out changing key column name   Screenshot_59   Copy Output7Do mapping7Columns7Change E name to E name 1   Join Key   = c id 7Do mapping   Output 

E id E name Add E name Add

  changing the column Name without using copy operator   Screenshot_60     Load EMP1, EMP2 7Select Read Method  = user defined SQL= Select EMP no, E name, as E name , job as job 1, mgr as magr1, hire date as Hire data1 , Sal as Sal1, comm. As comm. From EMP 7columns 7 change Names  Here also Output  E id  E name job mgr E name 1 job 1 mgr 1 ……..   job 5 :- 2 Tables with out key dependency   EMP E id         E name                 Add 11                 A                      HYD 22                  B                     SEC   SRC SRC- Name              TBL – Name Kindle                    CRM – CUST   7Create an column for key dependency using column Generator.   Screenshot_61   7 Create a new column ,in both the column generator Column generate = Dummy Algorithm = cycle Value = 1   Output E id      E name      Add    Src  - Name      TABLE – Names 11               A          HYD        Kindle                   CRM – CUST 22                 B       SEC           kindle                   CRM – CUST

Aspired to become an DataStage? Explore the post to discover the know-hows on DataStage Training.

Merge It has two Modes

  1. Keep (left outer join)
  2. Drop (inner join)

Master table :-  (Stable info) Stable 7 which are not changes frequently.   P id,   prd. Desc,   prd  -manuf 11,          Linux,               Tata 22,         Santoor,        Wipro 33,           Wipro,       unilever 44,         Margo,       unilever   up date 1  P id,  prd – age ,  prd – price 11,         4,                        87 22,          2,                         42 55,         1,                          35 66,        7,                           90   update 2 :- Pid,          prd – supplier,          prd – Category 11,                  hind,                     health 33,                Tata,                       Nutri 77,              Bagan ,                   Health 88,             recom,                     Home    job 1  Screenshot_62   Merge

  • Merge keys

Key = p id Sort order = Ascending Options   Un matched master mode  = Drop / keep Warn on Reject updates  = True Warn on un matched master = True Click on the link ordering Output 7Do mapping   Output  Data set 0:

PID Prod_desc Prod_manuf Prd_age prdprice Prd_supplier Prd catagory
11 Lux TATA 4 87 Hind health
22 Samtoor Wipro 2 42    
33 wipro Uniliver 0 0 tata utri



Pid Prd age Prd price
55 1 35
66 7 90


Dataset 2

pid Prd supplier Prd cat
77 bagan health
99 Re com Home

   De- Duplication  :-   (Removing  Duplicate records) C id        E name 11,                a 11,                b 22,               c 11,               d 33,                u 22,                n 33,                 i 11,                 t 22,                l  

Different ways

Screenshot_63   Remove Duplicates 

Sorted Data

  E id                  E name 11                      a 11                       b 11                      d 22                     b 22                      n 22                     l 33                    i 33                     u  

  • Remove Duplicate is a stage which supports 1 input and 1 output
  • Used to remove duplicate
  • Expects only Sorted data

Screenshot_63 Output E id ,   E name 11,  a 22, b 33, i Aggregator Aggregator  is a processing Stage which works on groups and perform counting and calculations supports  1 input and 1 output.   (a)  job  :-   Need to find the no .of duplicates Screenshot_66 Aggregate type = Count rows Count output column = dept . count


Allow   Null output   = false Method   = hash  7 output  7Do  Mapping   Output  E id        dept – count 11                 3 22                 3 33                 2   (b)  fork join   :- Screenshot_66  


Output 7 Do mapping for job,  Aggregator   Aggregator Group  = c id Aggregation   type =   Count Rows Count  output column  = dept – count7output7Do mapping Join Key = c id 7 output 7 Do mapping   output

cid cname Dept_count
11 A 3
11 B 3
11 D 3
22 B 3
22 N 3
22 L 3
33 I 2
33 u 2

  job Screenshot_67   Aggregation  Type = calculation Column for calculation   = Sal Maximum  value output column = MAX Sal Minimum  value output column = MIN Sal SUM output column = total   7output 7DO mapping   Output 

Dept No Max Sal Min Sal total
10 25000 21300 68750
20 23000 20800 110875
30 22850 20950 129400

job Screenshot_68   Copy :-  output 7Do Mapping   Aggregator Group  = dept no Aggregation Type = Calculation Column for calculation = Sal =Max Sal = Min Sal = tat Sal output Do mapping   Column generator Column to generate = company7Mapping7Value = IBM   Aggregator 1 Group  = Company Column for calculation = Sal = Max Sal Output 7Mapping   Output Data set 1  Company            Max Sa IBM                      25000 job  Screenshot_69     7All the Employee details of an particular dept no. should be display Copy :-  output 7Do Mapping   Aggregator  Group  = dept no Aggregation Type = Calculation Column for calculation = Sal =Max Sal = Min Sal = tat Sal   Join  Key = e id 7 output 7 DO mapping   Remove Duplicates Key = c id7Duplicate to Retain = first7Output7Do Mapping For indepth understanding of DataStage click on

About Author


Author Bio

TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills .