Define DataStage?
DataStage is a tool used to design, develop, and execute applications that populate tables in a data warehouse or data marts. It extracts data from source databases and transforms it for loading into the warehouse. It is part of the IBM WebSphere Data Integration suite.
What are the main features of DataStage?
DataStage has the following features to aid the design and processing required to build a data warehouse:
- Uses graphical design tools. With simple point-and-click techniques you can draw a scheme to represent your processing requirements.
- Extracts data from any number or type of databases.
- Handles all the metadata definitions required to define your data warehouse. You can view and modify the table definitions at any point during the design of your application.
- Aggregates data. You can modify the SQL SELECT statements used to extract data.
- Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data, and you can extend this functionality by defining your own transforms.
- Loads the data warehouse.
What are Stage Variables, Derivations, and Constraints?
Stage Variable – an intermediate processing variable that retains its value while rows are read; it is not passed to a target column.
Derivation – an expression that specifies the value to be passed to a target column.
Constraint – a condition, evaluating to true or false, that controls the flow of data down a link.
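The three concepts can be pictured as a row-by-row loop. The following is a minimal Python sketch, not DataStage code; all column names and values are invented for illustration:

```python
# Hypothetical sketch of a Transformer-style row loop: a stage variable keeps
# its value between rows, a constraint decides whether a row flows down the
# link, and a derivation computes the target column value.

rows = [
    {"amount": 120, "region": "EU"},
    {"amount": -5,  "region": "US"},
    {"amount": 80,  "region": "EU"},
]

running_total = 0          # stage variable: retains its value between rows
output = []

for row in rows:
    running_total += row["amount"]          # stage variable updated per row
    if row["amount"] > 0:                   # constraint: true/false flow control
        output.append({
            "amount_x": row["amount"] * 2,  # derivation: target column value
            "total_so_far": running_total,
        })

print(len(output))  # 2 rows satisfy the constraint
```

Note that the stage variable is updated for every input row, while the derivation only fires for rows that pass the constraint.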
What are the types of views in DataStage Director?
There are three types of views in DataStage Director:
a) Job View – shows the dates when jobs were compiled.
b) Log View – shows warning messages, event messages, and program-generated messages from a job's run log.
c) Status View – shows the status of each job's last run.
How do you execute a DataStage job from the command line?
Use the "dsjob" command, for example:
dsjob -run -jobstatus <projectname> <jobname>
The -jobstatus option makes dsjob wait for the job to finish and return an exit code reflecting the job's status.
What is the functionality of the Link Partitioner and Link Collector?
Link Partitioner: splits data into multiple partitions (data flows) using one of several partitioning methods.
Link Collector: collects the data coming from the partitions, merges it into a single data flow, and loads it into the target.
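The partition-then-collect round trip can be sketched in plain Python. This is only an analogy, not DataStage code; the hash partitioning method shown is one of the methods the Link Partitioner offers, and all names here are invented:

```python
# Illustrative sketch: split a row stream into partitions by hashing a key
# column (like Link Partitioner's hash method), then merge the partitions
# back into one flow (like Link Collector).

NUM_PARTITIONS = 3

def partition(rows, key):
    """Distribute rows across partitions by hashing the key column."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[hash(row[key]) % NUM_PARTITIONS].append(row)
    return parts

def collect(parts):
    """Merge partition outputs back into a single data flow."""
    merged = []
    for p in parts:
        merged.extend(p)
    return merged

rows = [{"id": i, "val": i * 10} for i in range(6)]
parts = partition(rows, "id")
merged = collect(parts)
print(len(merged))  # all 6 rows survive the round trip
```

Hash partitioning guarantees that all rows with the same key land in the same partition, which matters for operations such as aggregation or de-duplication.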
What are the types of jobs available in DataStage?
a. Server Job
b. Parallel Job
c. Sequencer Job
d. Container Job
What is the difference between Server Jobs and Parallel Jobs?
- Server jobs do not support partitioning techniques; parallel jobs do.
- Server jobs do not support SMP/MPP; parallel jobs support SMP and MPP platforms.
- Server jobs run on a single node; parallel jobs run on multiple nodes.
- Prefer server jobs when the source data volume is small; for large data volumes, prefer parallel jobs.
What is a project?
A DataStage project may contain several jobs and user-defined components. A project contains:
- DataStage jobs.
- Built-in components: predefined components used in jobs.
- User-defined components: customized components created using the DataStage Manager; each user-defined component performs a specific task in a job.
All work is done within a project. Projects are created during and after the installation process; you can add a project on the Projects tab of the Administrator.
A project is associated with a directory, which DataStage uses to store jobs, other DataStage objects, and metadata.
What is a sequencer?
A sequencer lets you graphically create a controlling job without using the job control function.
What is a container?
A group of stages and links in a job design is called a container.
There are two kinds of Containers:
Local and Shared.
Local containers exist only within the single job in which they are used. Use shared containers to simplify complex job designs.
Shared containers exist outside of any specific job. They are listed in the Shared Containers branch in the Manager and can be added to any job. Shared containers are frequently used to share a commonly used set of job components.
A container includes two unique stages: the Container Input stage passes data into the container, and the Container Output stage passes data out of it.
What are mainframe jobs?
A mainframe job is compiled and run on the mainframe. Data extracted by such jobs is then loaded into the data warehouse.
What are parallel jobs?
These are compiled and run on the DataStage server in a similar way to server jobs, but they support parallel processing on SMP, MPP, and cluster systems.
How do you use a stored procedure in a DataStage job?
Use the ODBC plug-in: pass one dummy column and give the procedure name in the SQL tab.
What is the ODBC stage?
A stage that extracts data from, or loads data into, any database that implements the industry-standard Open Database Connectivity (ODBC) API. It can be used to represent a data source, an aggregation step, or a target data table (server jobs only).
What is a hash file? What are its types?
A hash file is like an indexed sequential file: the file is internally indexed on a particular key value. There are two types of hash files: static hash files and dynamic hash files.
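A hash file's main use is fast keyed lookups. The following is a rough Python analogy only (a dictionary standing in for the hashed file); the record layout and names are invented:

```python
# Analogy: a hash file behaves like a keyed store, so a reference lookup
# against it is a direct hash access on the key rather than a table scan.

# Build the "hash file": records indexed by the key column cust_id
hash_file = {
    "C001": {"name": "Acme",   "country": "US"},
    "C002": {"name": "Globex", "country": "DE"},
}

# Stream rows through a lookup, as a Transformer would against a hashed file
orders = [{"cust_id": "C001", "qty": 5}, {"cust_id": "C999", "qty": 1}]
enriched = []
for order in orders:
    ref = hash_file.get(order["cust_id"])   # O(1) keyed lookup
    enriched.append({**order, "name": ref["name"] if ref else None})

print(enriched[0]["name"])  # Acme; unmatched keys yield None
```

This is why hashed files are the usual choice for reference (lookup) data in server jobs: each probe costs a single hashed access on the key.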
What type of hash file is to be used in general in DataStage jobs?
Static hash files.
What is a stage variable?
In a DataStage Transformer, we can define variables and assign them values derived from the source.
What are constraints and derivations?
We can create constraints and derivations using stage variables: a constraint controls which rows pass down a link, and a derivation computes the value for a target column.
How do you reject records in a transformer?
Records can be rejected through a DataStage constraint: rows that fail a link's constraint can be captured on a reject link.
Why do you need stage variables?
It depends on the job requirement; stage variables let you hold intermediate values between rows, which you can then use to filter data.
What is the precedence of stage variables, derivations, and constraints?
Stage variables => constraints => derivations
What are data elements?
A data element is a specification that describes the type of data in a column and how the data is converted.
What are routines?
A routine in DataStage is like a function that can be called from a DataStage job. There are built-in routines, and you can also create your own.
What are transforms, and what is the difference between routines and transforms?
Transforms are used to manipulate data within a DataStage job: a transform converts a data value from one type or format to another, while a routine is a general-purpose function that can also be called from expressions, job control code, and before/after subroutines.
What is a DataStage macro?
In DataStage, macros can be used in expressions, job control routines, and before/after subroutines. The available macros are concerned with ascertaining job status.
What is job control?
A job control routine provides a means of controlling other jobs from the current job. A set of one or more jobs can be validated, run, reset, stopped, or scheduled in much the same way as the current job.
How many types of stages are there?
There are three basic types of stages:
Built-in stages: supplied with DataStage and used for extracting, aggregating, transforming, or writing data. All job types have these stages.
Plug-in stages: additional stages that can be installed in DataStage to perform specialized tasks that the built-in stages do not support. Server jobs and parallel jobs can make use of these.
Job Sequence stages: special built-in stages which allow you to define sequences of activities to run. Only job sequences have these.
Define the difference between active and passive stages?
There are two kinds of stages:
Passive stages define read and write access to data sources and repositories.
Active stages define how data is filtered and transformed.
What are the plug-in stages used in your projects?
Plug-in stage: a stage that performs specific processing that is not supported by the standard server job stages.
Plug-ins used: ORAOCI8, Orabulk.
What is the difference between ORAOCI8 and Orabulk?
ORAOCI8: this stage allows a job to connect to an Oracle database.
Orabulk: this plug-in generates control and data files for bulk loading into a single table on an Oracle target database. The files are suitable for loading into the target database using the Oracle sqlldr utility.
What is the Sort plug-in?
A mainframe processing stage that sorts input columns.
What is the Aggregator stage?
A stage type that computes totals or other functions over sets of data.
What are the Hashed File stage and the Sequential File stage?
The Hashed File stage extracts data from or loads data into a database containing hashed files; it is typically used as a reference for lookups. The Sequential File stage reads data from, or writes data to, a text file.
What types of flat files have you used? Have you used tab-delimited files?
Sequential flat files with comma-separated values.
What is job control code?
Job control code is used in a job control routine to create a controlling job, which invokes and runs other jobs.