Ab Initio Components
There are several components in Ab Initio for building graphs. They are divided into two sets:
-Dataset components: components that hold data
-Program components: components that process data
Input File:
Input File represents records read as input to a graph from one or more serial files or from a multifile.
Input Table unloads records from a database into a graph, allowing you to specify as the source either a database table or an SQL statement that selects records from one or more tables.
Output File represents records written as output from a graph into one or more serial files or a multifile.
The output file can be created in write or append mode, and permissions for other users can be controlled.
When the target of an Output File component is a particular file (such as /dev/null, NUL, a named pipe, or some other special file), the Co>Operating System never deletes and re-creates that file, nor does it ever truncate it.
Output Table loads records from a graph into a database, letting you specify the destination either directly as a single database table, or through an SQL statement that inserts records into one or more tables.
Sort components: The Sort component reorders records. You can use Sort to order records before you send them to a component that requires grouped or sorted records. It has two parameters:
Key (key_specifier, required): one of the parameters of the Sort component; it describes the collation order, that is, the name(s) of the key field(s) and the sequence specifier(s) you want the component to use when it orders records.
Max-core: The max-core parameter controls how often the Sort component dumps data from memory to disk.
It specifies the maximum memory usage in bytes.
The default is 100663296 (about 100 MB).
When the component reaches the number of bytes specified in the max-core parameter, it sorts the records it has read and writes a temporary file to disk.
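The max-core spill behavior described above can be sketched in Python. This is a minimal illustration, not Ab Initio's implementation (the real component measures bytes, not record counts, and spills to disk rather than to in-memory lists):

```python
import heapq

def external_sort(records, key, max_core=2):
    """Sketch of Sort's spill behavior: when the in-memory buffer reaches
    the max_core limit (records here, bytes in the real component), sort
    the buffer and set it aside as a "temporary run", then merge all the
    sorted runs at the end."""
    runs, buffer = [], []
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= max_core:            # max-core limit reached
            runs.append(sorted(buffer, key=key))  # "spill to disk"
            buffer = []
    if buffer:
        runs.append(sorted(buffer, key=key))
    # Merge the sorted runs into one fully ordered output flow.
    return list(heapq.merge(*runs, key=key))

out = external_sort([5, 1, 4, 2, 3], key=lambda x: x, max_core=2)
```

A smaller max-core means more temporary runs and more merging work; a larger one means more memory used before any spill occurs.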
Reformat changes the record format of data records by dropping fields, or by using DML expressions to add fields, combine fields, or transform the data in the records. By default, Reformat has one output port; you can add more by increasing the value of the count parameter. A separate transform function must then be written for each output port.
If any selection from the input port is required, the select parameter can be used instead of placing a Filter by Expression component before Reformat.
-Reads a record from the in port
-Passes the record as an argument to the transform function (xfr)
-Writes the record to the out port if the function returns a success status
-Writes the record to the reject port if the function returns a failure status
Parameters of Reformat Component
-Transform (Xfr) Function
-Use Limit & Ramp
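The per-record flow listed above can be sketched in Python. This is an illustrative analogy only (Reformat transforms are written in DML, and the field names below are hypothetical); a failure status is modeled here as a raised exception:

```python
def reformat(records, xfr):
    """Sketch of Reformat's per-record flow: each record is passed to the
    transform function; successful results go to the out port, and records
    whose transform fails (here, raises) go to the reject port."""
    out, reject = [], []
    for rec in records:
        try:
            out.append(xfr(rec))
        except Exception:
            reject.append(rec)
    return out, reject

# Hypothetical transform: drop the "age" field and add a derived field.
def xfr(rec):
    return {"name": rec["name"], "name_len": len(rec["name"])}

ok, bad = reformat([{"name": "alice", "age": 30}, {"id": 7}], xfr)
```

The second record lacks a "name" field, so its transform fails and it is routed to the reject output, mirroring the reject-port behavior described above.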
Join reads records from multiple in ports, operates on records with matching keys using a multi-input transform function, and writes the result to its output ports.
Join deals with two activities.
-Transforming data sources with different record format.
-Combining data sources with the same record format.
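A keyed join can be sketched in Python as follows. This is a simplified inner join with a field-merge standing in for the multi-input transform function; the record and field names are hypothetical, and the real component also supports outer-join semantics and unused/reject ports:

```python
def keyed_join(left, right, key):
    """Sketch of an inner join on a key: index one input by key, then
    combine each matching pair of records with a two-input 'transform'
    (here, a simple field merge)."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    out = []
    for l in left:
        for r in index.get(l[key], []):
            out.append({**l, **r})   # the "transform": merge the fields
    return out

customers = [{"cust_id": 1, "name": "alice"}, {"cust_id": 2, "name": "bob"}]
orders = [{"cust_id": 1, "amount": 50}]
joined = keyed_join(customers, orders, "cust_id")
```

Records without a matching key on the other input (bob, above) produce no output in this inner-join sketch.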
Filter by Expression:
Filter by Expression filters records according to a specified DML expression.
It can be compared to the WHERE clause of an SQL SELECT statement.
Various DML functions, including lookup functions, can be used in the select expression of the Filter by Expression component.
-Reads data records from the in port.
-Applies the expression in the select_expr parameter to each record. If the expression returns:
-Non-0 value — Filter by Expression writes the record to the out port.
-0 — Filter by Expression writes the record to the deselect port. If you do not connect a flow to the deselect port, Filter by Expression discards the records.
-NULL — Filter by Expression writes the record to the reject port and a descriptive error message to the error port.
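The three-way routing above can be sketched in Python, with None standing in for a NULL expression result. This is an illustrative analogy (the real select_expr is a DML expression, and the field names below are hypothetical):

```python
def filter_by_expression(records, select_expr):
    """Sketch of Filter by Expression's routing: non-zero -> out port,
    0 -> deselect port, NULL (None here) -> reject port."""
    out, deselect, reject = [], [], []
    for rec in records:
        result = select_expr(rec)
        if result is None:
            reject.append(rec)     # NULL result
        elif result:
            out.append(rec)        # non-zero result
        else:
            deselect.append(rec)   # zero result
    return out, deselect, reject

records = [{"amount": 150}, {"amount": 50}, {"amount": None}]
expr = lambda r: None if r["amount"] is None else int(r["amount"] > 100)
out, deselect, reject = filter_by_expression(records, expr)
```

If nothing is connected to the deselect or reject outputs in a real graph, those records are simply discarded (deselect) or cause errors to be reported (reject).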
Normalize generates multiple output records from each of its input records. You can directly specify the number of output records for each input record, or the number of output records can depend on some calculation.
Reads the input record.
-If you have not defined input_select, Normalize processes all records.
-If you have defined input_select, only the input records for which the expression returns a non-zero value are processed.
Performs iterations of the normalize transform function for each input record.
Performs temporary record initialization.
Sends the output record to the out port.
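The fan-out behavior of Normalize can be sketched in Python. This is an analogy only (the real component uses a DML length function or expression and a normalize transform; the vector field below is a hypothetical example):

```python
def normalize(records, length_of, xfr):
    """Sketch of Normalize: for each input record, call the transform once
    per iteration index (0 .. length-1) and emit one output record per call."""
    out = []
    for rec in records:
        for i in range(length_of(rec)):
            out.append(xfr(rec, i))
    return out

# Hypothetical input: one record holding a vector of phone numbers,
# exploded into one output record per phone number.
rec = {"name": "alice", "phones": ["111", "222"]}
out = normalize(
    [rec],
    lambda r: len(r["phones"]),          # the "length" of each record
    lambda r, i: {"name": r["name"], "phone": r["phones"][i]},
)
```

Here the number of output records depends on a per-record calculation (the vector length), matching the second case described above.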
Denormalize Sorted consolidates groups of related records by key into a single output record with a vector field for each group, and optionally computes summary fields in the output record for each group. Denormalize Sorted requires grouped input.
For example, if you have a record for each person that includes the households to which that person belongs, Denormalize Sorted can consolidate those records into a record for each household that contains a variable number of people.
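The household example above can be sketched in Python using grouped input, as Denormalize Sorted requires. The field names are hypothetical, and the real component builds the vector through a DML denormalize transform rather than a list comprehension:

```python
from itertools import groupby

def denormalize_sorted(records, key_field):
    """Sketch of Denormalize Sorted: input must already be grouped on the
    key; each group collapses into one output record with a vector field
    for the group members and an optional summary field (count)."""
    out = []
    for k, group in groupby(records, key=lambda r: r[key_field]):
        people = [r["person"] for r in group]
        out.append({"household": k, "people": people, "count": len(people)})
    return out

rows = [
    {"household": "h1", "person": "alice"},
    {"household": "h1", "person": "bob"},
    {"household": "h2", "person": "carol"},
]
groups = denormalize_sorted(rows, "household")
```

Because itertools.groupby only groups adjacent equal keys, this sketch fails on ungrouped input in the same way Denormalize Sorted does: records must be sorted or grouped on the key first.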
Multistage components are transform components in which records are transformed in five stages: input selection, temporary record initialization, processing, finalization, and output selection.
Examples of multistage components are Aggregate, Rollup, and Scan.
-Rollup: Rollup evaluates a group of input records that have the same key, and then generates records that either summarize each group or select certain information from each group.
-Aggregate: Aggregate generates records that summarize groups of records. In general, use ROLLUP for new development rather than Aggregate. Rollup gives you more control over record selection, grouping, and aggregation. However, use Aggregate when you want to return the single record that has a field containing either the maximum or the minimum value of all the records in the group.
-Scan: For every input record, Scan generates an output record that includes a running cumulative summary for the group the input record belongs to. For example, the output records might include successive year-to-date totals for groups of records. Scan can be used in continuous graphs.
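The difference between Rollup and Scan can be sketched in Python: Rollup emits one summary record per key group, while Scan emits one record per input record with the running total so far. This is an analogy only (the real components use DML transforms, and the field names are hypothetical):

```python
from itertools import groupby

def rollup(records, key_field, value_field):
    """Sketch of Rollup on grouped input: one summary record per key group."""
    return [
        {key_field: k, "total": sum(r[value_field] for r in grp)}
        for k, grp in groupby(records, key=lambda r: r[key_field])
    ]

def scan(records, key_field, value_field):
    """Sketch of Scan: one output per input record, carrying the
    cumulative total for that record's group so far."""
    out, running = [], {}
    for r in records:
        k = r[key_field]
        running[k] = running.get(k, 0) + r[value_field]
        out.append({**r, "running_total": running[k]})
    return out

sales = [
    {"dept": "A", "amt": 10},
    {"dept": "A", "amt": 20},
    {"dept": "B", "amt": 5},
]
summary = rollup(sales, "dept", "amt")
running = scan(sales, "dept", "amt")
```

For three input records across two groups, Rollup produces two records and Scan produces three, which is exactly the distinction drawn above.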
Partition components are used to divide data sets into multiple sets for further processing.
Several partition components are available:
-Partition by Round-robin
-Partition by Key
-Partition by Expression
-Partition by Range
-Partition by Percentage
-Partition by Load Balance
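Two of the strategies listed above can be sketched in Python. This is a simplified illustration (real partitioning happens across parallel flows, and Ab Initio's hash function differs from Python's built-in hash):

```python
def partition_round_robin(records, n):
    """Sketch of Partition by Round-robin: deal records across n flows
    in turn, balancing record counts evenly."""
    flows = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        flows[i % n].append(rec)
    return flows

def partition_by_key(records, key, n):
    """Sketch of Partition by Key: hash the key value so that records
    sharing a key always land in the same flow."""
    flows = [[] for _ in range(n)]
    for rec in records:
        flows[hash(key(rec)) % n].append(rec)
    return flows
```

Round-robin gives the most even split but scatters equal keys across flows, whereas partitioning by key keeps groups together, which is what key-based components such as Rollup require downstream.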
Departition components read data from multiple flows or operations and are used to recombine data records from different flows. Departitioning combines many flows of data to produce one flow; it is the opposite of partitioning. Each departition component combines flows in a different manner.
Several departition components are available: Concatenate, Gather, Interleave, and Merge.
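Two common departition components, Concatenate and Merge, can be sketched in Python to show how they combine flows differently. This is an illustration only, using in-memory lists in place of parallel flows:

```python
import heapq

def concatenate(flows):
    """Sketch of Concatenate: append flows end to end, preserving the
    order of the flows and of the records within each flow."""
    return [rec for flow in flows for rec in flow]

def merge_flows(flows, key):
    """Sketch of Merge: combine flows that are each already sorted on the
    key into a single flow that is still sorted on that key."""
    return list(heapq.merge(*flows, key=key))
```

Concatenate preserves flow order but not any global sort order, while Merge preserves sortedness, which matters when the partitions were produced from sorted data.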