
Types of Partition

-Hash

-Modulus

-DB2

-Auto

-Random

-Range

-Round Robin

-Entire

-Same

Key based partition

  • Hash
  • Modulus
  • DB2

Hash

Determines the partition based on the key value(s). One or more keys, of different data types, are supported.

Example: Key is State. All “CA” rows go into one partition; all “MA” rows go into one partition. Two rows of the same state never go into different partitions.
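The State example above can be sketched in Python. This is an illustrative toy, not DataStage's internal hash function; a stable character-sum hash is used instead of Python's built-in `hash` (which is randomized per process). The guarantee it demonstrates is the one stated above: rows with the same key value always land in the same partition.

```python
def hash_partition(rows, key, num_partitions):
    """Assign each row (a dict) to a partition based on a hash of its key value."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Same key value -> same hash -> same partition, every time.
        idx = sum(ord(c) for c in str(row[key])) % num_partitions
        partitions[idx].append(row)
    return partitions

rows = [{"state": "CA", "id": 1},
        {"state": "MA", "id": 2},
        {"state": "CA", "id": 3}]
parts = hash_partition(rows, "state", 4)
```

After this call, both "CA" rows sit in one partition and the "MA" row in another; no state is ever split across partitions.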

Modulus

Determines the partition as the key value modulo the number of partitions. The key must be an integer type.

Example: Key is OrderNumber (Integer type). Rows with the same order number will all go into the same partition.
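The OrderNumber example can be sketched the same way. This is a minimal illustration of the modulo rule, not DataStage code; the field name `order_number` is assumed.

```python
def modulus_partition(rows, key, num_partitions):
    """Route each row to partition (integer key value mod partition count)."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row[key] % num_partitions].append(row)
    return partitions

orders = [{"order_number": 1001},
          {"order_number": 1002},
          {"order_number": 1001}]
parts = modulus_partition(orders, "order_number", 4)
# 1001 % 4 == 1, so both 1001 rows land in partition 1; 1002 % 4 == 2.
```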

DB2

Matches DB2 EEE partitioning.

Example: This partition is used when loading data into a DB2 table. It takes the partitioning key from the target DB2 table and distributes the records accordingly. If a partitioning key is defined in the DB2 database, that key is used; otherwise it defaults to the primary key.

This figure gives a clear view of key-based partitioning and repartitioning.

[Figure: key-based partitioning and repartitioning]

Ascential parallel technology operates by a divide-and-conquer technique, splitting the largest integration jobs into subsets (“partition parallelism”) and flowing these subsets concurrently across all available processors (“pipeline parallelism”). This combination of pipeline and partition parallelism delivers true linear scalability (defined as an increase in performance proportional to the number of processors) and makes hardware the only limiting factor on performance. However, downstream processes may need data partitioned differently. Consider a transformation that is partitioned on customer last name, while the enrichment needs to occur on zip code (for house-holding purposes), with loading into the warehouse based on customer credit card number (more on parallel database interfaces below). With dynamic data re-partitioning, data is re-partitioned on the fly between processes, without landing the data to disk, based on the partitioning needs of the downstream process.
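The last-name/zip-code scenario above can be sketched as follows. This is a toy model of repartitioning between stages, with assumed helper names, not a DataStage API; the point is that rows partitioned by one key are redistributed by another key between stages, without an intermediate write to disk.

```python
def partition_by(rows, key, n):
    """Hash-partition rows on the given key (stable character-sum hash)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[sum(ord(c) for c in str(row[key])) % n].append(row)
    return parts

def repartition(parts, key, n):
    """Flatten the upstream partitions and redistribute by a new key."""
    return partition_by([r for p in parts for r in p], key, n)

customers = [{"last_name": "Smith", "zip": "01801"},
             {"last_name": "Jones", "zip": "01801"}]
by_name = partition_by(customers, "last_name", 4)  # transformation stage
by_zip = repartition(by_name, "zip", 4)            # enrichment stage
```

The two customers start in different partitions (different last names) but end up together after repartitioning, because house-holding needs all rows for a zip code on one node.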

Keyless partition

-Round Robin

-Entire

-Same

Round Robin

The first record goes to the first processing node, the second to the second processing node, and so on. When DataStage reaches the last processing node in the system, it starts over. This method is useful for resizing partitions of an input data set that are not equal in size. The round robin method always creates approximately equal-sized partitions. This method is the one normally used when DataStage initially partitions data.

Example: Assume 8 nodes are allocated to store the records. The 1st record goes to the first node, the 2nd record to the second node, and so on; the 8th record goes to the eighth node, the 9th record wraps back to the first node, and the cycle continues.
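A minimal Python sketch of the dealing behavior described above (illustrative only, not DataStage code). Because records are dealt in turn, partition sizes never differ by more than one record.

```python
def round_robin_partition(rows, num_partitions):
    """Deal rows to partitions in turn, wrapping back to the first."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions

# 10 records over 8 nodes: records 9 and 10 wrap to nodes 1 and 2.
parts = round_robin_partition(list(range(10)), 8)
```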

Entire

Sends every row to every partition.

Example: Assume 8 nodes are allocated; then every record is passed to all 8 nodes.
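As a sketch (toy code, not DataStage), Entire partitioning simply gives each partition a full copy of the data, which is why it suits reference data on a Lookup stage:

```python
def entire_partition(rows, num_partitions):
    """Every partition receives a complete copy of the input rows."""
    return [list(rows) for _ in range(num_partitions)]

parts = entire_partition([1, 2, 3], 8)  # 8 identical copies
```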

Same

Preserve the same partitioning.

Example: Consider two stages in a job, a Sort stage followed by a Data Set stage. If the Sort stage uses “Hash” partitioning and the Data Set stage uses “Same”, the data arriving at the Data Set stage keeps the hash partitioning.

Application Execution:

Parallel jobs can be executed in two ways:

-Sequential

-Parallel

The first slot of the figure below shows sequential execution. A parallel job can run on two processing architectures, SMP and MPP, shown in the second and third slots.

[Figure: sequential execution (slot 1) and parallel execution on SMP and MPP (slots 2 and 3)]

Best allocation

Best allocation of Partitions in DataStage for storage area

S. No | Scenario | Volume of Data | Best Way of Partition | Allocation in Configuration File (Nodes)
1 | DB2 EEE extraction in serial | Low | | 1
2 | DB2 EEE extraction in parallel | High | Node number = current node (key) | 64 (depends on how many nodes are allocated)
3 | Partition or repartition in DataStage stages | Any | Modulus (a single integer key) or Hash (any number of keys, of any data types) | 8 (depends on how many nodes are allocated for the job)
4 | Writing into DB2 | Any | DB2 |
5 | Writing into a Dataset | Any | Same | 1, 2, 4, 8, 16, 32, 64, etc. (based on the incoming records)
6 | Writing into a Sequential File | Low | | 1

Best allocation of Partitions in DataStage for each stage

S. No | Stage | Best Way of Partition | Important Points
1 | Join | Left and right links: Hash or Modulus | All input links should be sorted on the join key and partitioned in the same key order.
2 | Lookup | Main link: Hash or Same; reference link: Entire | Neither link needs to be sorted.
3 | Merge | Master and update links: Hash or Modulus | All input links should be sorted on the merge key and partitioned in the same key order. Pre-sorting makes the merge “lightweight” on memory.
4 | Remove Duplicates, Aggregator | Hash or Modulus | Performs better if the input link is sorted on the key.
5 | Sort | Hash or Modulus | Sorting happens after partitioning.
6 | Transformer, Funnel, Copy, Filter | Same | None
7 | Change Capture | Left and right links: Hash or Modulus | Both input links should be sorted on the key and partitioned in the same key order.

Appropriate Ways

Step 1: (Serial extraction with proper partition)

In this job, extraction is serial in both DB2 stages, so all records reside in 1 partition. Because input link1 of the Sort stage is “hash” partitioned, that 1 partition is split into 8 partitions based on the given key(s), and records are sorted on the joining key(s). In the Join stage, link2 uses “same” partitioning, while link3 on the other side uses “hash” partitioning with internal sorting on the same key(s), so its records are likewise split from 1 partition into 8 and sorted within each partition. The join is then performed effectively, and the results are captured in the dataset with “same” partitioning, which keeps the records in 8 partitions. One page is created per partition, so 32 pages are created for the entire job.

[Figure: job design for Step 1, serial extraction with proper partitioning]

Suggestion:

This is one of the best ways to design a job for a huge volume of data, but extracting the records serially degrades performance. Multiple instances can be enabled for this job and called in parallel to attain parallelism.

Step 2: (Parallel extraction with proper partition)

This job is the same as the previous one, except that extraction alone is done in parallel. Extracting the records on the DB2 side creates 64 partitions.

[Figure: job design for Step 2, parallel extraction with proper partitioning]

Suggestion:

This is one of the best ways to design jobs for a huge volume of data, but a large number of database connections is established.

Step 1:

As extraction is done in serial mode in both DB2 stages, 1 node is allocated, that is, all records sit in one partition; all remaining stages use “same” partitioning, so the number of partitions stays at 1 in every stage. Hence 4 pages are created for this job.

[Figure: job design with serial extraction and Same partitioning throughout]

Suggestion:

This job works fine for low record volumes, but for huge volumes it will hang, since the entire job runs in serial mode.

Step 2:

Here extraction is made parallel in both DB2 stages, so 64 nodes are allocated, and since all the other stages use “same” partitioning, every stage keeps 64 nodes. Hence 256 pages are created for this job.

[Figure: job design with parallel extraction and Same partitioning throughout]

Suggestion:

Whether the record volume is small or huge, this job runs very fast, but it occupies maximum space in the system. Memory is wasted (a paging issue may come up) because 256 pages are created, and sometimes such a job can bring the entire system down.

Step 3:

Unpredictable results may occur here, because with auto partitioning each stage decides the partitioning on its own.

[Figure: job design using auto partitioning]

Suggestion

Never use auto partitioning in a job.

Step 4:

In this case there is no performance issue, but data will be lost at the Join stage: Link2 arrives on 8 nodes while Link3 arrives on 64 nodes, so DataStage cannot join the data accurately. The Join stage follows the master link, here Link2, so it joins only across those 8 nodes and ignores the other 56 nodes coming from Link3. Data arriving on those 56 nodes is therefore not considered.

[Figure: job design with mismatched partitioning at the Join stage]

Suggestion:

Never join links that are not partitioned on the same key on both sides; otherwise data loss may occur. Also sort the records before the join.





