Joins in Data Stage

Join

Join needs Same key Column Name

When we join 2 tables

For ex:-EMP and Dept

In join, While mapping

If Join type   = Inner/left outer join, then Dept no while joining comes from left

Screenshot_55

 

7If join type = right outer join, then deptno comes from right

7If join type = full outer join, then deptno comes from both left and right

 

If join type = Right outer join , then Dept no comes from Right

 

Note :-

Full outer join Can take only 2 inputs, that is it can’t join any 3 rd table

  1. Job1 :- Join

 

Screenshot_56

 

Join

Key = Dept no

Options

Join type = InneràOutput7Do Mapping

Join type = Inner / left outer/Right outer/ fullouter

 

Output

EMP No    E Name    Job      Mgr   Hire date        Sal  Comm   Dept no     D Name  loc

 

  1. Job2 : If Column Names are Different use an copy operator    

 

Screenshot_57

 

 

Oracle 0

Load EMP table

 

oracle 1

Load EMP table 7columns7Stage any columns Name

 

Copy :-

Output 7Columns7Change the column Name

 

Join

Key = EMP No

Join type = INNER 7 Output 7 Do Mapping

 

Output 

EMP No    E Name    Job      Mgr   Hire date        Sal  Comm   Dept no

Job 3 

T1 : C id, E name, d no

T2:D no, D name, loc id

T3:loc id , loc Name, area id

T4: Area id , add1, add2

 

Output 

EID No    E Name       Dept no     D Name  loc id loc name area id add1 add2

 

Screenshot_58

 

 

7first load T1, T2 and perform join 0 key  = dno

7Now ,load T3 and perform join1, key =load

7load T4 and perform join 2        key = area id

7Compile and Run

 

Output :-

Area id , add1,add2,loc id , loc name,  d no, d name ,  c id , e name

job 4

If we have 2 tables with same structure and same column Names

E1

E idE nameAdd
11AHYD
22BSEC

 

E2

E idE nameAdd
11ADEL
22BGNT

 

Output 

A          A     HYD               DEL

B          B         SEC             GNT

 

change the column Name in any one of the table using copy operator with out changing key column name

 

Screenshot_59

 

Copy

Output7Do mapping7Columns7Change E name to E name 1

 

Join

Key   = c id 7Do mapping

 

Output 

E idE nameAddE nameAdd
11AHYDADEL
22BSECBGNT

 

changing the column Name without using copy operator

 

Screenshot_60

 

 

Load EMP1, EMP2 7Select Read Method  = user defined SQL= Select EMP no, E name, as E name , job as job 1, mgr as magr1, hire date as Hire data1 , Sal as Sal1, comm. As comm. From EMP

7columns 7 change Names  Here also

Output 

E id  E name job mgr E name 1 job 1 mgr 1 ……..

 

job 5 :- 2 Tables with out key dependency

 

EMP

E id         E name                 Add

11                 A                      HYD

22                  B                     SEC

 

SRC

SRC- Name              TBL – Name

Kindle                    CRM – CUST

 

7Create an column for key dependency using column Generator.

 

Screenshot_61

 

7 Create a new column ,in both the column generator

Column generate = Dummy

Algorithm = cycle

Value = 1

 

Output

E id      E name      Add    Src  - Name      TABLE – Names

11               A          HYD        Kindle                   CRM – CUST

22                 B       SEC           kindle                   CRM – CUST

Aspired to become an DataStage? Explore the post to discover the know-hows on DataStage Training.

Merge

It has two Modes

  1. Keep (left outer join)
  2. Drop (inner join)

Master table :-  (Stable info)

Stable 7 which are not changes frequently.

 

P id,   prd. Desc,   prd  -manuf

11,          Linux,               Tata

22,         Santoor,        Wipro

33,           Wipro,       unilever

44,         Margo,       unilever

 

up date 1 

P id,  prd – age ,  prd – price

11,         4,                        87

22,          2,                         42

55,         1,                          35

66,        7,                           90

 

update 2 :-

Pid,          prd – supplier,          prd – Category

11,                  hind,                     health

33,                Tata,                       Nutri

77,              Bagan ,                   Health

88,             recom,                     Home

 

 job 1 

Screenshot_62

 

Merge

  • Merge keys

Key = p id

Sort order = Ascending

Options

 

Un matched master mode  = Drop / keep

Warn on Reject updates  = True

Warn on un matched master = True

Click on the link ordering

Output 7Do mapping

 

Output 

Data set 0:

PIDProd_descProd_manufPrd_ageprdpricePrd_supplierPrd catagory
11LuxTATA487Hindhealth
22SamtoorWipro242
33wiproUniliver00tatautri

 

Dataset1

PidPrd agePrd price
55135
66790

 

Dataset 2

pidPrd supplierPrd cat
77baganhealth
99Re comHome

 

 De- Duplication  :-   (Removing  Duplicate records)

C id        E name

11,                a

11,                b

22,               c

11,               d

33,                u

22,                n

33,                 i

11,                 t

22,                l

 

Different ways

Screenshot_63

 

Remove Duplicates 

Sorted Data

 

E id                  E name

11                      a

11                       b

11                      d

22                     b

22                      n

22                     l

33                    i

33                     u

 

  • Remove Duplicate is a stage which supports 1 input and 1 output
  • Used to remove duplicate
  • Expects only Sorted data

Screenshot_63

Output

E id ,   E name

11,  a

22, b

33, i

Aggregator

Aggregator  is a processing Stage which works on groups and perform counting and calculations supports  1 input and 1 output.

 

(a)  job  :-   Need to find the no .of duplicates

Screenshot_66

Aggregate type = Count rows

Count output column = dept . count

Options

Allow   Null output   = false

Method   = hash  7 output  7Do  Mapping

 

Output 

E id        dept – count

11                 3

22                 3

33                 2

 

(b)  fork join   :-

Screenshot_66

 

Copy

Output 7 Do mapping for job,  Aggregator

 

Aggregator

Group  = c id

Aggregation   type =   Count Rows

Count  output column  = dept – count7output7Do mapping

Join

Key = c id 7 output 7 Do mapping

 

output

cidcnameDept_count
11A3
11B3
11D3
22B3
22N3
22L3
33I2
33u2

 

job

Screenshot_67

 

Aggregation  Type = calculation

Column for calculation   = Sal

Maximum  value output column = MAX Sal

Minimum  value output column = MIN Sal

SUM output column = total   7output 7DO mapping

 

Output 

Dept NoMax SalMin Saltotal
10250002130068750
202300020800110875
302285020950129400

job

Screenshot_68

 

Copy :-  output 7Do Mapping

 

Aggregator

Group  = dept no

Aggregation Type = Calculation

Column for calculation = Sal

=Max Sal

= Min Sal

= tat Sal

output

Do mapping

 

Column generator

Column to generate = company7Mapping7Value = IBM

 

Aggregator 1

Group  = Company

Column for calculation = Sal

= Max Sal

Output 7Mapping

 

Output

Data set 1 

Company            Max Sa

IBM                      25000

job 

Screenshot_69

 

 

7All the Employee details of an particular dept no. should be display

Copy :-  output 7Do Mapping

 

Aggregator 

Group  = dept no

Aggregation Type = Calculation

Column for calculation = Sal

=Max Sal

= Min Sal

= tat Sal

 

Join 

Key = e id 7 output 7 DO mapping

 

Remove Duplicates

Key = c id7Duplicate to Retain = first7Output7Do Mapping

For indepth understanding of DataStage click on