Partition Technique in DataStage


Partitioning Technique With Performance Tuning

Partitioning is the process of dividing an input data set into multiple segments, or partitions. Each processing node in your system then performs an operation on an individual partition of the data set rather than on the entire data set.

 

DataStage supports two types of partitioning: key-based and keyless.

How do you tune jobs using the partitioning techniques?

 

Key-based Techniques

  • Hash
  • Modulus
  • Range
  • DB2

Hash

Rows with the same value in the key column (or the same combination of values in multiple key columns) go to the same partition. Hash is used very often and can improve performance, but keep in mind that it does not guarantee load balance: if a few key values dominate the data, the partitions become skewed and performance suffers.
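A minimal Python sketch of the idea, to make the skew risk concrete. The function name hash_partition, the sample department numbers, and the use of zlib.crc32 are assumptions made for illustration; this is not the hash function DataStage uses internally.

```python
import zlib
from collections import Counter


def hash_partition(key_values, num_partitions):
    """Map a tuple of key column values to a partition number."""
    # crc32 is a stand-in chosen because it is deterministic across runs.
    encoded = "|".join(str(v) for v in key_values).encode("utf-8")
    return zlib.crc32(encoded) % num_partitions


NUM_PARTITIONS = 4

# Hypothetical input: department 10 dominates the data.
dept_nos = [10] * 8 + [20, 30]

assignments = [(d, hash_partition((d,), NUM_PARTITIONS)) for d in dept_nos]

# How many rows landed on each partition.
rows_per_partition = Counter(p for _, p in assignments)

# Which partitions each key value was sent to (always exactly one).
partitions_per_key = {}
for dept_no, partition in assignments:
    partitions_per_key.setdefault(dept_no, set()).add(partition)

print(rows_per_partition)
print(partitions_per_key)
```

Every key value maps to a single partition, but the partition that receives department 10 holds at least eight of the ten rows, which is the skew described above.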

Modulus

Data is partitioned on one specified numeric field by taking the field value modulo the number of partitions. Not used very often.
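The calculation can be sketched in a few lines of Python. The function name modulus_partition and the sample department numbers are hypothetical and only illustrate the arithmetic.

```python
def modulus_partition(key, num_partitions):
    """Return the partition number for an integer key."""
    if not isinstance(key, int):
        raise TypeError("modulus partitioning needs an integer key")
    return key % num_partitions


NUM_PARTITIONS = 4

# Hypothetical department numbers.
for dept_no in [10, 20, 30, 40, 11, 21]:
    print(dept_no, "->", modulus_partition(dept_no, NUM_PARTITIONS))
```

With four partitions, department 10 goes to partition 2 (10 mod 4) and department 11 goes to partition 3 (11 mod 4).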

Range

An expensive refinement to hash partitioning. Like hash, it sends rows with the same key to the same partition, but the partition mapping is user-determined and the partitions are ordered. Rows are distributed according to the values in one or more key fields, using a range map, which must be created beforehand with the 'Write Range Map' stage. Because building the range map means processing the data twice, range partitioning is hard to justify in most jobs.
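The two passes can be sketched in Python as follows. The helper names build_range_map and range_partition and the sample keys are assumptions for illustration; the real range map is produced by the Write Range Map stage, not by code like this.

```python
from bisect import bisect_right


def build_range_map(sample_keys, num_partitions):
    """First pass: pick num_partitions - 1 boundary values from the sorted keys."""
    ordered = sorted(sample_keys)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def range_partition(key, boundaries):
    """Second pass: find the ordered partition whose range contains the key."""
    return bisect_right(boundaries, key)


# Hypothetical key values.
keys = [5, 17, 42, 8, 23, 99, 61, 3, 35, 70, 12, 50]

boundaries = build_range_map(keys, 3)

partitions = {0: [], 1: [], 2: []}
for key in keys:
    partitions[range_partition(key, boundaries)].append(key)

print(boundaries)
print(partitions)
```

Every key in partition 0 is smaller than every key in partition 1, and so on, which is what "partitions are ordered" means here. The data is read once to build the boundaries and again to distribute the rows.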

 

Keyless Techniques

  • Same
  • Entire
  • Round Robin
  • Random

Entire

Every partition receives a complete copy of the dataset. Because the rows are duplicated on each partition, the data volume increases significantly.

Same

Existing partitioning remains unchanged. No data is moved between nodes.

Round robin

Rows are dealt out to the partitions in turn. This method is very fast and gives a near-exact load balance between nodes: the row counts of any two partitions differ by at most one.
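A short Python sketch of round robin distribution. The function name round_robin_partition and the ten sample rows are hypothetical.

```python
def round_robin_partition(rows, num_partitions):
    """Deal rows out to the partitions in turn."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions


# Ten hypothetical rows spread over four partitions.
result = round_robin_partition(list(range(10)), 4)
sizes = [len(p) for p in result]
print(sizes)
```

Ten rows over four partitions give sizes of 3, 3, 2 and 2, so the load is balanced regardless of the values in the rows.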

Random

Rows are randomly distributed across partitions.

By default, all key-based stages use Hash as their key-based technique.

Hash Technique

Hash partitioning is one of the most popular and frequently used techniques in DataStage. With this technique, rows with the same key column value are sent to the same partition.

Example


Rows with the same key column value are sent to the same node.

Hash partitioning is selected in two cases:

    • If there is more than one key column
    • If there is exactly one key column and it is not an integer

Why Modulus?

  • Modulus performs better than Hash.
  • Modulus distributes the data by calculating the MOD value of the key, that is, the key value modulo the number of partitions.

Modulus is selected when there is only one key column and it is an integer.
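The selection rule for key-based partitioning can be written as a small Python function. The function name choose_key_partitioner and the type labels such as "integer" and "varchar" are hypothetical; this only restates the rule of thumb from this article and is not a DataStage API.

```python
def choose_key_partitioner(key_column_types):
    """Pick Hash or Modulus from the types of the key columns."""
    if not key_column_types:
        raise ValueError("key-based partitioning needs at least one key column")
    # One integer key column: Modulus.
    if len(key_column_types) == 1 and key_column_types[0] == "integer":
        return "Modulus"
    # More than one key column, or a single non-integer key column: Hash.
    return "Hash"


print(choose_key_partitioner(["integer"]))
print(choose_key_partitioner(["integer", "integer"]))
print(choose_key_partitioner(["varchar"]))
```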

Join: key columns Dept no and E no

We use Hash because there is more than one key column: Dept no and E no.


For the Aggregator that follows the Join, we use SAME: the key columns are the same as in the Join, so we want to keep the distribution that the Join already produced. SAME is a keyless technique.


Here we cannot use SAME: the Join has two key columns, D no and loc, so if the Aggregator kept that partitioning we would not know which partition the rows for a given D no arrive on after the Join.

We use Modulus instead, because there is only one key column (D no) and it is an integer.
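A small Python sketch of why repartitioning is needed in this case. The toy function composite_hash_partition stands in for hashing on both Join keys, and the rows and location values are made up for illustration; none of this is DataStage's own logic.

```python
def composite_hash_partition(dno, loc, num_partitions):
    """Toy hash over both Join key columns (D no, loc)."""
    return (dno * 31 + sum(ord(c) for c in loc)) % num_partitions


def modulus_partition(dno, num_partitions):
    """Modulus on the single integer key column D no."""
    return dno % num_partitions


def partitions_per_dno(rows, partitioner):
    """Collect the set of partitions on which each D no value appears."""
    seen = {}
    for dno, loc in rows:
        seen.setdefault(dno, set()).add(partitioner(dno, loc))
    return seen


NUM_PARTITIONS = 4
rows = [(10, "NY"), (10, "LA"), (20, "NY"), (20, "LA")]

# Partitioning inherited from the Join (what SAME would keep).
after_join = partitions_per_dno(
    rows, lambda d, l: composite_hash_partition(d, l, NUM_PARTITIONS)
)

# Repartitioning on D no alone with Modulus.
after_modulus = partitions_per_dno(
    rows, lambda d, l: modulus_partition(d, NUM_PARTITIONS)
)

print(after_join)
print(after_modulus)
```

With the Join's partitioning, each D no value is spread over two partitions, so an Aggregator grouping on D no alone would see only part of each group on any node. After Modulus on D no, each value sits on exactly one partition.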


About Author
TekSlate