• USA : +1 973 910 5725
  • INDIA: +91 905 291 3388
  • info@tekslate.com
  • Login

Sorting Techniques in DataStage

Sorting

7Arranging the data in ascending / descending order It improves performance

7Sort is required by ‘Remove duplicate’, ’Aggregator’, ’Join’, ’Merge’

3 Types of Sorting

  1. Source level (user defined SQL)  Select * from emp order by  Dept no
  2. Link sort (Traditional Sorting asc/desc) instage Sort
  3. Stage Sort (complex Sort)

In Data stage

Screenshot_23

Screenshot_24

 

If Stable is true 1st occurrence will display

If Stable is false  last  occurrence will display

Output

7934

7834

7797

7902

7876

7788

7369

7566

7698

7499

7521

7844

7654

 

Screenshot_25

Output

7782

7834

7934

7369

7566

7788

7876

7902

7499

7521

7694

7698

7844

7960

 

Screenshot_26

Screenshot_27

 

Output

7934

7902

7788

Sequential

For Sequential :  parallel we have partition type

For parallel :  Sequential  we have Collecting method

For  Sequential:  Sequential we don’t have type

 

Seq            seq

Oracle     7      file

 

Load EMP file            Partitioning 7Perform Sort7Select Dept No

7Compile And RUN

 

Link Sort

 

Screenshot_28

 

Compile Sort

Generating Group ID

Screenshot_29

 

Group ID

After Sorting  , every group’s 1st record is Set to ‘1’ and Corresponding records floow with ‘0’.

Job1 :- Creating Group ID

Sequential file          Sort         Data set

 

Create an file Sorting.txt

E id 11 22 11 33 11 22 11 33 22
E Name a b c d e f g h i

Sequential file

Loading Sorting.txt

Data set

Creates an .ds file

Sort – – >properties

-Sorting keys

Key = cid

Sort key Mode  = Sort

Sort order   = Ascending

 

Screenshot_30

 

 

Options

Allow duplicates = True

Create Cluster key change column = True

Output Statistics = False

Sort utility = Data Stage

Stable Sort = True

Properties

Sort key Mode = Don’t Sort  (provisory Group)

=Don’t Sort (provisory Sorted)

= Sort

Sort Stage options

Cluster Key

Key change

Note :-

When create Key change column = True Sort mode should be sort

Key word

When it is Set to  true, this column is set to ‘ 1’ for the 1st  row is each group and corresponding records follow with ‘0’in the Same group and so on

Sort7Properties7Output7Do mapping7Compile and RUN

Output

C id E Name Key change
11 a 1
11 b 0
11 c 0
11 d 0
22 e 1
33 f 1
33 g 0

Job 2 :- Generating Group’s for already Sorted data

if data is already in sorted state then

Oracle —Sort—dataset

Load Sorted file   7 properties  7Sort key Mode  =  Sort  (previously Sorted)

(and)

Create cluster key change column = True

output :-

Generates Group ID’s

Filter

Blocking  unwanted Data

7 In Data stages filter can be performed is 3 difference.

  1. Source level 7 user defined (Database / file)
  2. Stages 7 (filter, Switch, External filter)
  3. Constraints (Specified in transform look up)

 

Filter Switch External filter
  1. Conditional on multiple columns
 Conditions on single columns Based on UNIX command
  1. Supports

1 – i/p

N – o/p

1—reject link

support1– i/p128– o/p

1– Reject link

1—i/p1—o/pNo reject link

Options

Output Row only once = True

If it is Said to true, it will Send that row only to 1st where clause when it matches , and if it is false and that row is send to all where clauses that makes.

Inclined to build a profession as DataStage Developer?
Then here is the blog post on DataStage Training.

job4

Oracle enterprise 7filter7  dept 7data set 1  (dept no =10)

Sat7 data set  (Sal > 2000 and Sal <22000)

 

Load EMP table

 

Properties 7

Where clause = dept no = 10

Output link         = 0

Where clause = Sal > 20000  and Sal < 22000

Output link = 1

 

Options

Output Reject = false

Output Row only once = false

Click on link 

dept

1 Sal                output link

Click on output

  • Do Mapping for Dept, Sal
  • Column and RUN

Output

Data set 1 

7782   10

7839   10

7934  10

Data set 2

7369   20

7499  30

7521   30

7654     30

7844   30

7836   20

7940  20

7934    10

Job 5 

Oracle         filter              dept           ds1

Sal                ds2

Where clause = dept no = 10

Output link         = 0

Where clause = Sal > 20000  and Sal < 22000

Options

Output  Row only once = True

Output

Data set 2

7369   20

7499    30

7521    30

7654   30

7844   30

7876  20

7900 30

Job 6

 If we don’t want to display Sal > 20000 and Sal < 22000 condition for dept no = 10, then just inter change the where

clauses

Output row once = True

Where clause = Sal > 20000  and Sal < 22000

Output link = 0

Where clause = dept no = 10

Output link         = 1

Output Row only once = True

Output

  Data set 1          Data set 2

7369   20                    7782 10

7499   30                  7832 10

7521   30

7654   30

7844  30

7873   20

7900  30

7934   10

job 7 :-

we a reject link to capture discarded data :-

 

Screenshot_31

 

Output 

DS      3

7566   20

7698    30

  • 20

 

Job8 

Screenshot_32

In SQL plus, Create an table Sorting

Create table Sorting (c id number (3), E name varchar (5));

Insert in to Sorting (c id, E name) values (I, cid, ‘& e Name’)

Oracle :-  load Sorting table

Sort :-  key = E id

Sort key Mode = Sort

Create key change column = true

  • Go to output
  • Mapping

Filter

where clause = key change = 1

Output

Where clause = keyword = 0

Output link =1

7click on output  7DO Mapping  7Compile And RUN

Output 

DS 1

E id E name Key change
11 A 1
22 B 1
33 C 1

 

DS2 :-

E id E name Key change
11 C 0
11 E 0
11 G 0
22 F 0
22 I 0
33 H 0

 

Job 9:-

Screenshot_33

 

Peek

Output  7Map EMP, ds link 3

 

Filter

Where = dept no = 10

Output link = 0

Where = Sal > 20000 and Sal < 22000

Output link  = 1àlink ordering

 

Output :-  Map dept, Sal

 

Options

Output rejects  = True

Output row once only = True

7Compile and RUN

 

Job10

 

Screenshot_34

 

Filter

Where = dept no = 10

Output link = 0

Where = Sal > 20000 and Sal < 22000

Output link  = 1

Output rejects  = True

Output row once only = True

Output

Do Mapping

Filter 1

Where = dept no = 10

Output link = 0

Output rejects = True

7Compile and RUN

 Switch

 

Screenshot_35

 

Switch 7 input

Selector = dept no     (Column on which we are writing condition)

Selector mode = user . defined mapping

User defined Mapping

Case = 10 =0          link ordering

Case = 20=1

 

Options  :-  If Not found  = Drop

 

Screenshot_36

 

 

7Switch 7 properties 7options 7If not found = Output

  1. External filter :-  (uses UNIX Commands)
  • Sequential file —-External filter —Dataset

 

Create a file

11 Suman new work 100

22kumar new work  101

33 sumathi  new wok 102

44 archana pennushavania 100

55 renu new jersey  101

66 eswamin new York 001

77 sravani new York 101

S.F

File = D:/shilpa/external .txt àLoad it with Single column àColumns records varchar 255

External filter 

Filter command  grep – I “newyork” 7Output 7 load the Structure 7  records varchar 255

 

Data set

Properties7Load a file

Screenshot_37

 

Peek :-  DO mapping

 

External filter 1

Filter command = grep  -I   “new  York”

OutputàLoad with single column

External filter 2 

Filter command = grep  -I   “pernysla varia”

Output:

Load with single column

Group wise Sorting

Oracle —Sort 1—Sort2 —-Data set

Load

EMP

table

Sort 1

Key = Dept No

Create Key change column = True  7output 7Do Mapping

Sort 2

Key = Dept No

Sort key word Mode = Don’t Sort previously Grouped

Sort order = Ascending

Key = Sal

Sort key Mode= Sort

Create key change column = True7Output 7 Do mapping

For indepth understanding of DataStage click on

Summary
Review Date
Reviewed Item
Sorting Techniques in DataStage
Author Rating
5

“At TekSlate, we are trying to create high quality tutorials and articles, if you think any information is incorrect or want to add anything to the article, please feel free to get in touch with us at info@tekslate.com, we will update the article in 24 hours.”

0 Responses on Sorting Techniques in DataStage"

Leave a Message

Your email address will not be published. Required fields are marked *

Site Disclaimer, Copyright © 2016 - All Rights Reserved.