Ab Initio Interview Questions & Answers

We know the Rollup component in Abinitio is used to summarize groups of data records, so why do we also use aggregation?

Both Aggregate and Rollup are used to summarize data.

Rollup is more capable and more convenient to use.

Rollup provides additional functionality, such as input filtering and output filtering of records.

Aggregate does not expose intermediate results in main memory, whereas Rollup can.

Analyzing a particular summarization is much simpler with Rollup than with Aggregate.

Mention what is Abinitio?

“Ab initio” is a Latin phrase meaning “from the beginning.” Abinitio is a tool used to extract, transform, and load data. It is also used for data analysis, data manipulation, batch processing, and GUI-based parallel processing.

What are the operations that help avoid duplicate records?

Duplicate records can be avoided by using the following:

Using Dedup sort

Performing aggregation

Utilizing the Rollup component

Mention what is Rollup Component?

The Rollup component enables users to group records on certain field values. It is a multistage function whose stages include initialize and rollup.
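As a rough illustration of the grouping idea, the Python sketch below mimics a multistage rollup: an initialize step creates a temporary accumulator for each key, and a rollup step folds each record into it. This is not Ab Initio DML. The function names, the region and amount fields, and the closing finalize step are assumptions made for the example.

```python
def initialize(record):
    """Create the temporary accumulator for a new group (hypothetical shape)."""
    return {"count": 0, "total": 0.0}

def rollup(temp, record):
    """Fold one input record into the group's accumulator."""
    temp["count"] += 1
    temp["total"] += record["amount"]
    return temp

def finalize(key, temp):
    """Assumed last stage: turn the accumulator into one output record."""
    return {"key": key, "count": temp["count"], "total": temp["total"]}

def run_rollup(records, key_field):
    """Group records on key_field and emit one summary record per group."""
    groups = {}  # insertion-ordered, so groups appear in first-seen order
    for rec in records:
        key = rec[key_field]
        if key not in groups:
            groups[key] = initialize(rec)
        groups[key] = rollup(groups[key], rec)
    return [finalize(key, temp) for key, temp in groups.items()]

records = [
    {"region": "east", "amount": 10.0},
    {"region": "west", "amount": 5.0},
    {"region": "east", "amount": 2.5},
]
summary = run_rollup(records, "region")
print(summary)
```

Each distinct key value yields exactly one output record, which is also why a rollup can be used to remove duplicates.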

What kind of layouts does Abinitio support?

Abinitio supports serial and parallel layouts.

A graph can use serial and parallel layouts at the same time.

The parallel layout depends on the degree of data parallelism.

For example, a multifile system can be a 4-way parallel system.

A component in a graph can then run 4-way parallel on that system.

Explain what is the architecture of Abinitio?

Architecture of Abinitio includes

GDE (Graphical Development Environment)

Co>Operating System

Enterprise Meta>Environment (EME)


What is MAX CORE of a component?

MAX CORE is the maximum amount of memory a component may use for its calculations.

Each component has its own MAX CORE value.

A component's performance is influenced by its MAX CORE setting.

A badly chosen MAX CORE can slow the process down, and a well-chosen one can speed it up.

Explain what is de-partition in Abinitio?

De-partitioning reads data from multiple flows or operations and re-joins the data records from those flows. Several de-partition components are available, including Gather, Merge, Interleave, and Concatenate.
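The difference between these components is mostly the order in which records from the partitions reach the single output flow. The Python sketch below models three of them on plain lists, based on the usual description of each component: Concatenate appends the flows one after another, Merge keeps already sorted flows in sorted order, and Interleave takes records round-robin. Gather is left out because its output order is arbitrary. The function names and sample data are illustrative and are not Ab Initio APIs.

```python
import heapq
from itertools import chain, zip_longest

_MISSING = object()  # sentinel marking an exhausted flow

def concatenate(*flows):
    """All of flow 0, then all of flow 1, and so on."""
    return list(chain(*flows))

def merge(*flows):
    """Combine flows that are each already sorted, preserving sort order."""
    return list(heapq.merge(*flows))

def interleave(*flows):
    """Take one record from each flow in turn, skipping exhausted flows."""
    out = []
    for group in zip_longest(*flows, fillvalue=_MISSING):
        out.extend(rec for rec in group if rec is not _MISSING)
    return out

# Three partitions, each sorted on its own.
p0, p1, p2 = [1, 8, 9], [2, 3], [4, 5, 6]

print(concatenate(p0, p1, p2))
print(merge(p0, p1, p2))
print(interleave(p0, p1, p2))
```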

How do you add default rules in transformer?

The process to add default rules in the transformer is as follows:

Double click on the transform parameter in the parameter tab page in component properties

Click on Edit menu in Transform editor

Select Add Default Rules from the dropdown list box.

It shows Match Names and Wildcard options. Select either of them.

Mention what is the role of the Co>Operating System in Abinitio?

The Abinitio Co>Operating System provides features like

Manage and run Abinitio graphs and control the ETL processes

Provide Abinitio extensions to the operating system

ETL processes monitoring and debugging

Meta-data management and interaction with the EME

State the first_defined function with an example.

This function is similar to the NVL() function in Oracle Database.

It returns the first value that is not NULL among the values passed to the function, and that value is assigned to the variable.

Example: A set of variables, say v1, v2, v3, v4, v5, v6, are all assigned NULL.
Another variable, num, is assigned the value 340 (num = 340).
num = first_defined(NULL, v1, v2, v3, v4, v5, v6, num)
The result of num is 340.
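The same behavior can be sketched in Python, with None standing in for NULL. The helper below is a stand-in written for this example and is not the Ab Initio function itself.

```python
def first_defined(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None

# Mirror the example: v1..v6 are NULL, num holds 340.
v1 = v2 = v3 = v4 = v5 = v6 = None
num = 340
num = first_defined(None, v1, v2, v3, v4, v5, v6, num)
print(num)  # 340
```

Note that the check is "is not None" rather than truthiness, so a legitimate value such as 0 or an empty string is still returned.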

Explain what is SANDBOX?

A SANDBOX is a collection of graphs and related files that are stored in a single directory tree and treated as a group for the purposes of navigation, version control, and migration.

How to run a graph infinitely?

To run a graph infinitely…

The end script of the graph should call the graph's own .ksh file.

For example, if the graph is named abc.mp, its end script should call abc.ksh.

Explain what does dependency analysis mean in Abinitio?

In Abinitio, dependency analysis is the process through which the EME examines an entire project and traces how data is transferred and transformed: from component to component, field by field, within and between graphs.

Explain PDL with an example?

PDL (Parameter Definition Language) is used to make a graph behave dynamically.

Suppose a dynamic field needs to be added to a predefined DML while the graph is executing.

A graph-level parameter can be defined for that field.

The parameter is then used when embedding the DML in the output port.

For example: define a parameter named myfield with the value string("|") name;

Use ${myfield} when embedding the DML in the out port.

Use $ substitution as the interpretation option.
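To show what the substitution step amounts to, the Python sketch below resolves a ${myfield} reference inside a record layout using the standard string.Template class. It only imitates the idea of $ substitution. The surrounding record layout and the id field are made up for the example, and the parameter value string("|") name; is an assumed reading of the example value.

```python
from string import Template

# Hypothetical DML text with a placeholder for the dynamic field.
dml_template = Template(
    "record\n"
    "  decimal(10) id;\n"
    "  ${myfield}\n"
    "end;"
)

# Graph-level parameter and its value (assumed for illustration).
params = {"myfield": 'string("|") name;'}

# Replace every ${name} reference with the parameter's value.
resolved = dml_template.substitute(params)
print(resolved)
```

Changing the parameter value changes the resolved record format without editing the layout itself, which is the dynamic behavior the answer describes.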

Mention what the Dedup component and the Replicate component do?

Dedup component: It is used to remove duplicate records.

Replicate component: It combines the data records from its inputs into one flow and writes a copy of that flow to each of its output ports.
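A minimal Python sketch of both behaviors follows. It assumes the input to the dedup step is already sorted on the key and that the first record of each key group is the one to keep. The function names and sample records are illustrative only.

```python
from itertools import groupby

def dedup_sorted(records, key):
    """Keep the first record of each key group; input must be sorted by key."""
    return [next(group) for _, group in groupby(records, key=key)]

def replicate(records, n_outputs):
    """Return n_outputs independent copies of the same flow."""
    flow = list(records)
    return [list(flow) for _ in range(n_outputs)]

records = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5)]

unique = dedup_sorted(records, key=lambda rec: rec[0])
print(unique)

out0, out1 = replicate(unique, 2)
print(out0 == out1)
```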

What is a local lookup?

A local lookup file holds records that can be kept in main memory.

Transform functions can then retrieve these records much faster than they could from disk.
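The speed-up comes from indexing the lookup records by key in memory once and then answering each request with a single keyed access. The Python sketch below shows that pattern with a dictionary. The country fields and the lookup helper are hypothetical and are not Ab Initio's lookup functions.

```python
# Lookup records, as they might be read once from a small reference file.
lookup_records = [
    {"country_code": "IN", "country_name": "India"},
    {"country_code": "US", "country_name": "United States"},
    {"country_code": "DE", "country_name": "Germany"},
]

# Build the in-memory index a single time, keyed on the lookup key.
index = {rec["country_code"]: rec for rec in lookup_records}

def lookup(key):
    """Return the matching lookup record, or None when the key is absent."""
    return index.get(key)

print(lookup("IN"))
print(lookup("XX"))
```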

Mention how you can connect EME to the Abinitio Server?

There are several ways to connect to the Abinitio Server:


Log in to the EME web interface: http://serverhost:[serverport]/abinitio

Connect to the EME data-store through the GDE

Use air commands

Describe the Evaluation of Parameters order.

Following is the order of evaluation:

Host setup script will be executed first

All common (that is, included) parameters are evaluated

All Sandbox parameters are evaluated

The project script – project-start.ksh is executed

All form parameters are evaluated

Graph parameters are evaluated

The Start Script of graph is executed

Explain what is Sort Component in Abinitio?

The Sort component in Abinitio re-orders the data. It has two parameters, “Key” and “Max-core”.

Key: This parameter determines the collation order of the sort.

Max-core: This parameter controls how often the Sort component dumps data from memory to disk.

What is a ramp limit?

Limit is an integer parameter that represents an absolute number of reject events.

Ramp is a real-number parameter that represents the rate of reject events relative to the number of records processed.

The formula is: number of bad records allowed = limit + (number of records x ramp)

Ramp is a fraction between 0 and 1.

Together, these two parameters set the threshold for bad records.
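The formula can be worked through in a few lines of Python. The function names are made up for this sketch, and treating "more rejects than allowed" as the abort condition is an assumption about how the threshold is applied.

```python
def allowed_rejects(limit, ramp, records_processed):
    """Bad records allowed = limit + (records processed x ramp)."""
    return limit + records_processed * ramp

def exceeds_threshold(rejects, limit, ramp, records_processed):
    """Assumed rule: the threshold is crossed when rejects exceed the allowance."""
    return rejects > allowed_rejects(limit, ramp, records_processed)

# With limit = 5 and ramp = 0.01, 1000 processed records allow 5 + 10 rejects.
threshold = allowed_rejects(limit=5, ramp=0.01, records_processed=1000)
print(threshold)

print(exceeds_threshold(12, limit=5, ramp=0.01, records_processed=1000))
print(exceeds_threshold(20, limit=5, ramp=0.01, records_processed=1000))
```

With ramp set to 0 the allowance is just the fixed limit, and with limit set to 0 it scales purely with the number of records processed.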

Mention what information a .dbc file provides to connect to the database?

A .dbc file provides the GDE with the information it needs to connect to the database:

Name and version number of the database to which you want to connect

Name of the computer on which the database instance or server to which you want to connect runs, or on which the database remote access software is installed

Name of the server, database instance, or provider to which you want to link

Explain the methods to improve performance of a graph?

The following are ways to improve the performance of a graph:

Use only a limited number of components in any one phase

Set optimum max core values for the sort and join components

Use as few sort components as possible

Use as few sorted join components as possible, replacing them with in-memory (hash) joins where that is needed and possible

Pass only the needed fields through sort, reformat, and join components

Use phasing or flow buffers where merges or sorted joins are involved

Use a sorted join when both inputs are huge; otherwise use a hash join
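The last point rests on how an in-memory (hash) join works: the smaller input is loaded into a hash table, and the larger input is streamed past it without any sorting. The Python sketch below shows that idea as an inner join. The function name, field names, and sample data are assumptions for illustration.

```python
from collections import defaultdict

def hash_join(small, large, key):
    """Inner join: index the small input in memory, then probe with the large one."""
    index = defaultdict(list)
    for rec in small:
        index[rec[key]].append(rec)
    for rec in large:
        # .get avoids creating empty entries for keys with no match
        for match in index.get(rec[key], []):
            yield {**match, **rec}

customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
]
orders = [
    {"cust_id": 1, "order": "A-1"},
    {"cust_id": 3, "order": "C-9"},
    {"cust_id": 1, "order": "A-2"},
]

joined = list(hash_join(customers, orders, "cust_id"))
print(joined)
```

Because the whole smaller input has to fit in memory, this approach stops being practical when both inputs are huge, which is where a sorted join becomes the better choice.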

