Sorting Techniques in DataStage

Ratings:
(4)
Views:0
Banner-Img
  • Share this blog:

Sorting

7Arranging the data in ascending / descending order It improves performance

7Sort is required by ‘Remove duplicate’, ’Aggregator’, ’Join’, ’Merge’

Types of Sorting

  1. Source level (user-defined SQL)  Select * from emp order by  Dept no
  2. Link sort (Traditional Sorting asc/desc) in stage Sort
  3. Stage Sort (complex Sort)

In the Data stage

Sorting

Sorting  

If Stable is true 1st occurrence will display

If Stable is false  last  occurrence will display

Output

7934

7834

7797

7902

7876

7788

7369

7566

7698

7499

7521

7844

7654

  output

Output

7782

7834

7934

7369

7566

7788

7876

7902

7499

7521

7694

7698

7844

7960

  output

Output  

Output

7934

7902

7788

Sequential

For Sequential:  parallel we have partition type

For parallel:  Sequential  we have the Collecting method

For  Sequential:  Sequential we don’t have type

  Seq            seq

Oracle     7      file  

Load EMP file            Partitioning 7Perform Sort7Select Dept No

7Compile And RUN  

Link Sort

  Link Sort  

Compile Sort

Generating Group ID

Compile sort  

Group ID

After Sorting, every group’s 1st record is Set to ‘1’ and Corresponding records flow with ‘0’.

Job1:- Creating Group ID Sequential file          Sort         Data set   Create a file Sorting.txt

E id 11 22 11 33 11 22 11 33 22
E Name a b c d e f g h i

Sequential file

Loading Sorting.txt

Data set

Creates a .ds file

Sort - - >properties

-Sorting keys

Key = cid

Sort key Mode  = Sort

Sort order   = Ascending

  Data set    

Options

Allow duplicates = True

Create Cluster key change column = True

Output Statistics = False

Sort utility = Data Stage

Stable Sort = True

Properties

Sort key Mode = Don’t Sort  (provisory Group) =Don’t Sort (provisory Sorted) = Sort

Sort Stage options

Cluster Key

Key change

Note:- When creating Key change column = True Sort mode should be sort

Keyword

When it is set to  true, this column is set to ‘ 1’ for the 1st  row is each group and corresponding records follow with ‘0’in the Same group and so on Sort7Properties7Output7Do mapping7Compile and RUN

Output

C id E Name Key change
11 a 1
11 b 0
11 c 0
11 d 0
22 e 1
33 f 1
33 g 0

Job 2:- Generating Group’s for already Sorted data

if data is already in a sorted state then

Oracle ---Sort—dataset

Load Sorted file   7 properties  7Sort key Mode  =  Sort  (previously Sorted)

(and)

Create cluster key change column = True

output:- Generates Group ID’s

Filter

Blocking unwanted Data

7 In Data stages filter can be performed is 3 difference.

  1. Source level 7 user-defined (Database/file)
  2. Stages 7 (filter, Switch, External filter)
  3. Constraints (Specified in transform lookup)

 

Filter Switch External filter
  1. Conditional on multiple columns
 Conditions on single columns Based on UNIX command
  1. Supports
1 – i/p N – o/p 1—reject link
support1-- i/p128– o/p 1-- Reject link 1—i/p1—o/pNo reject link

Options

Output Row only once = True

If it is Said to true, it will Send that row only to 1st where clause when it matches, and if it is false and that row is sent to all were clauses that make.

Inclined to build a profession as Datastage Developer? Then here is the blog post on, explore Datastage Training

job4

Oracle enterprise 7filter7  dept 7data set 1  (dept no =10)

Sat7 data set  (Sal > 2000 and Sal <22000)  

Load EMP table  

Properties 7

Where clause = dept no = 10

Output link         = 0

Where clause = Sal > 20000  and Sal < 22000

Output link = 1  

Options

Output Reject = false

Output Row only once = false

Click on link 

dept

1 Sal                output link

Click on output

  • Do Mapping for Dept, Sal
  • Column and RUN

Output

Data set 1 

7782   10

7839   10

7934  10

Data set 2

7369   20

7499  30

7521   30

7654     30

7844   30

7836   20

7940  20

7934    10

Job 5 

Oracle filter dept           ds1

Sal                ds2

Where clause = dept no = 10

Output link         = 0

Where clause = Sal > 20000  and Sal < 22000

Options

Output  Row only once = True

Output

Data set 2

7369   20

7499    30

7521    30

7654   30

7844   30

7876  20

7900 30

Job 6

 If we don’t want to display Sal > 20000 and Sal < 22000 conditions for dept no = 10, then just interchange the where clauses

Output row once = True

Where clause = Sal > 20000  and Sal < 22000

Output link = 0

Where clause = dept no = 10

Output link         = 1

Output Row only once = True

Output

  Data set 1          Data set 2

7369   20                    7782 10

7499   30                  7832 10

7521   30

7654   30

7844  30

7873   20

7900  30

7934   10

job 7:-

we a reject link to capture discarded data:-

  Filter  

Output 

DS      3

7566   20

7698    30

  • 20

Job8

  Sort

In SQL plus, Create a table Sorting

Create table Sorting (c id number (3), E name varchar (5));

Insert into Sorting (c id, E name) values (I, cid, ‘& e Name’)

Oracle:-  load Sorting table

Sort:-  key = E id

Sort key Mode = Sort

Create key change column = true

  • Go to output
  • Mapping

Filter

where clause = key change = 1

Output

Where clause = keyword = 0

Output link =1

7click on output  7DO Mapping  7Compile And RUN

Output 

DS 1

E id E name Key change
11 A 1
22 B 1
33 C 1

  DS2:-

E id E name Key change
11 C 0
11 E 0
11 G 0
22 F 0
22 I 0
33 H 0

 

Job 9:-

Filter  

Peek

Output  7Map EMP, ds link 3  

Filter

Where = dept no = 10

Output link = 0

Where = Sal > 20000 and Sal < 22000

Output link  = 1àlink ordering  

Output :-  Map dept, Sal  

Options

Output rejects  = True

Output row once only = True

7Compile and RUN

  Job10

  Sort  

Filter

Where = dept no = 10

Output link = 0

Where = Sal > 20000 and Sal < 22000

Output link  = 1

Output rejects  = True

Output row once only = True

Output

Do Mapping

Filter 1

Where = dept no = 10

Output link = 0

Output rejects = True

7Compile and RUN

Switch

  Switch  

Switch 7 input

Selector = dept no     (Column on which we are writing condition)

Selector mode = user . defined mapping

User defined Mapping Case = 10 =0          link ordering

Case = 20=1  

Options  :-

  If Not found  = Drop

  Sort    

7Switch 7 properties 7options 7If not found = Output

  1. External filter:-  (uses UNIX Commands)
  • Sequential file ----External filter ---Dataset

 

Create a file

11 Suman new work 100

22kumar new work  101

33 sumathi  new wok 102

44 Archana Pennsylvania 100

55 renu new jersey  101

66 eswamin new York 001

77 sravani new York 101

S.F

File = D:/shilpa/external .txt àLoad it with Single column àColumns records varchar 255

External filter 

Filter command  grep – I “NewYork” 7Output 7 load the Structure 7  records varchar 255  

Data set

Properties7Load a file

Dataset  

Peek:-  DO mapping  

External filter 1

Filter command = grep  -I   “new  York”

OutputàLoad with a single column

External filter 2 

Filter command = grep  -I   “pernyslavaria”

Output:

Load with a single column

Group wise Sorting

Oracle ---Sort 1—Sort2 ----Data set

Load

EMP

table

Sort 1

Key = Dept No

Create a Key change column = True  7output 7Do Mapping

Sort 2

Key = Dept No

Sort keyword Mode = Don’t Sort previously Grouped

Sort order = Ascending

Key = Sal

Sort key Mode= Sort

Create key change column = True7Output 7 Do mapping

About Author
Authorlogo
Name
TekSlate
Author Bio

TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills in the market.