Teradata Interview Questions and Answers
How do you define Teradata?
Teradata is an RDBMS (relational database management system) built around parallel processing, which is what has made it a leading platform in the data warehousing industry. In Teradata, rows are distributed among the AMPs based on an index rather than a key. Teradata provides utilities such as BTEQ, FastLoad, FastExport, MultiLoad, TPump, and TPT, which are used to load data from files into the database and to export data from the database to files. Because of its parallel architecture, data distribution and retrieval are typically very fast on large data volumes.
What is Teradata Aster?
Aster Analytics was a product from Aster Data Systems, a company founded in 2005 and acquired by Teradata in 2011.
Aster Analytics has three engines - SQL, Graph, and MapReduce - and focuses on analytics as opposed to OLTP. Teradata renamed the offering as Teradata Aster after the acquisition.
Teradata acquired Aster Data Systems to buy its way into the NoSQL segment of the Big Data world. Teradata has always handled big data, but traditionally with SQL RDBMS-based OLTP and OLAP systems.
Features of Teradata's Hadoop integration?
- It provides a SQL interface for transferring data between Teradata Database and remote Hadoop hosts.
- Import Hadoop data into a temporary or permanent Teradata table.
- Export data from temporary or permanent Teradata tables into existing Hadoop tables.
- Create or drop tables in Hadoop from Teradata Database.
- Reference tables on the remote hosts in SELECT and INSERT statements.
- Select Hadoop data for use with a business tool.
- Select and join Hadoop data with data from independent data warehouses for analytical use.
When is Teradata the best choice as a database system?
For decision support systems. Let me give some examples from my own experience. I had the chance to rewrite two jobs that used a lot of joins and produced a final table of close to 130 million rows. With Oracle and SAS code, these jobs used to run for around 18 hours and 12 hours respectively. After the rewrite in Teradata, they ran in less than 30 minutes and 15 minutes respectively. This just shows the power of Teradata when you have large volumes of data. I have also been part of a team that migrated from Oracle to Teradata, and everyone in the company is very happy with Teradata's processing power. Teradata is very good for OLAP but may not be as beneficial for OLTP, mainly because of its cost and architecture.
Explain a few components of Teradata.
Teradata is made up of the following components –
Processor Chip – The processor is the BRAIN of the Teradata system. It is responsible for all the processing done by the system, and all tasks are carried out under its direction.
Memory – The memory is known as the HAND of the Teradata system. Data is retrieved from the hard drives into memory, where the processor manipulates, changes, or alters it. Once the changes are made in memory, the processor directs the information back to the hard drive for storage.
Hard Drives – These are known as the SPINE of the Teradata system. All the data of the Teradata system is stored on the hard drives, so the size of the hard drives reflects the size of the Teradata system.
In Teradata, how do we generate a sequence?
In Teradata, we generate a sequence by using an identity column.
At display time, how is a sequence generated in Teradata?
All you have to do is use CSUM, for example CSUM(1, sort_column), to number the rows as they are displayed.
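CSUM works as a row counter when it keeps a running total of the constant 1 over the sorted rows. The following is a minimal Python sketch of that behavior, not Teradata code; the function name `csum_sequence` and the sample rows are made up for illustration.

```python
from itertools import accumulate


def csum_sequence(rows, sort_key):
    """Mimic a cumulative sum of 1 over rows ordered by sort_key."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    # Running total of the constant 1: 1, 2, 3, ...
    running = accumulate(1 for _ in ordered)
    return [(seq, row) for seq, row in zip(running, ordered)]


# Hypothetical sample data
employees = [{"name": "Ravi"}, {"name": "Anu"}, {"name": "Meena"}]
numbered = csum_sequence(employees, "name")
```

Each row comes back paired with its position in the sorted order, which is the sequence number shown at display time.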
A table is loaded every hour. The traffic is relatively low in the morning and very high at night. Which utility is most advisable in this situation, and how should it be used to load the table?
The most suitable utility here is TPump. By increasing or decreasing the pack factor (and the statement rate), the changing traffic can be handled easily.
If a FastLoad script fails and only the error tables are available to you, how will you restart?
There are basically two ways of restarting in this case.
- Rerunning the old file – Make sure that you do not drop the error tables. Instead, fix the errors in the script or the file and then execute it again.
- Running a new file – In this case, first execute the script with only the BEGIN LOADING and END LOADING statements. This releases the lock on the target table and may also remove the corresponding record from the fastlog table. Once this is done, you are free to run the whole script again.
Mention a few of the ETL tools that are used with Teradata.
Some of the ETL tools commonly used with Teradata are DataStage, Informatica, SSIS, etc.
Highlight a few of the advantages that ETL tools have over TD.
Some of the advantages that ETL tools have over TD are: –
- Multiple heterogeneous sources, as well as destinations, can be handled.
- Debugging is much easier with ETL tools owing to their full-fledged GUI support.
- Components of ETL tools can be easily reused, and as a result, if there is an update to the main server, then all the corresponding applications connected to the server are updated automatically.
- Pivoting and de-pivoting can be easily done using ETL tools.
What is the meaning of Caching in Teradata?
Caching is an added advantage of using Teradata. It works best with source data that stays in the same order, i.e. does not change frequently. The cache is often shared among applications.
How can we check the version of Teradata that we are using currently?
In BTEQ, just give the command .SHOW VERSION. The database release itself can also be read by querying the DBC.DBCInfo view.
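Give a justifiable reason why MultiLoad supports NUSI but not USI.
In a NUSI, the index subtable row is stored on the same AMP as the data row it points to, so each AMP can work separately and in parallel. In a USI, the index subtable row may be on a different AMP from the data row, which would require communication between AMPs during the load.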
How is MLOAD restarted after the client system fails during execution?
The script has to be resubmitted manually so that it resumes loading the data from the last checkpoint.
How is MLOAD restarted after the Teradata server fails during execution?
When the server is restarted, the MLOAD script restarts along with it and resumes from the last known checkpoint.
What is meant by a node?
A node is a collection of hardware and software components. Usually a server is referred to as a node.
Let us say there is a file that consists of 100 records, out of which we need to skip the first 20 and the last 20 records. What will the code snippet be?
We can use the BTEQ utility for this task. SKIP 20 and REPEAT 60 are used in the script, so the first 20 records are skipped and only the next 60 are processed.
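The arithmetic behind skipping 20 and repeating 60 is easy to check outside BTEQ. This is a small Python sketch of the record selection only, not a BTEQ script; `select_records` and the generated record names are hypothetical.

```python
def select_records(records, skip, repeat):
    """Drop the first `skip` records, then keep the next `repeat` records."""
    return records[skip:skip + repeat]


# Hypothetical 100-record input file
records = [f"record-{i}" for i in range(1, 101)]
loaded = select_records(records, skip=20, repeat=60)
```

With 100 input records this keeps records 21 through 80, so the last 20 records are never read.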
What is PDE?
PDE stands for Parallel Database Extensions. It is a software interface layer that sits on top of the operating system and lets the database operate in a parallel environment.
What is TPD?
TPD stands for Trusted Parallel Database, and it runs under PDE. Teradata is a database that runs under PDE, which is why it is usually referred to as a Trusted Parallel or Pure Parallel database.
What is meant by a Channel Driver?
A channel driver is software that acts as a medium of communication between PEs and all the applications that are running on channels which are attached to the clients.
What is meant by Teradata Gateway?
Just like the channel driver, the Teradata Gateway acts as a medium of communication between the Parsing Engine and applications running on network-attached clients. Only one Gateway is assigned per node.
What is meant by a Virtual Disk?
A virtual disk is a collection of cylinders drawn from an array of physical disks. It is sometimes referred to as a disk array.
What is meant by an AMP?
AMP stands for Access Module Processor. It is a virtual processor that manages a single portion of the database, and that portion cannot be shared by any other AMP. This form of architecture is commonly referred to as a shared-nothing architecture.
What does an AMP contain, and what operations does it perform?
An AMP consists of a Database Manager subsystem and can perform the operations listed below.
- Performing DML
- Performing DDL
- Implementing Aggregations and Joins.
- Releasing and applying locks, etc.
What is meant by a Parsing Engine?
The PE is a kind of vproc. Its primary function is to receive SQL requests and return the responses. It consists of several software components that break the SQL into steps and then send those steps to the AMPs.
What do you mean by parsing?
Parsing is the process of analyzing a string of symbols, written either in a computer language or in natural language, according to the rules of a grammar.
What are the functions of a Parser?
A Parser: –
- Checks semantic errors
- Checks syntactical errors
- Checks object existence
What is meant by a dispatcher?
The dispatcher takes the whole collection of requests and keeps them in a queue. The same queue is kept throughout the process in order to deliver multiple sets of responses.
What is the maximum number of sessions that a PE can handle at a time?
A PE can handle a maximum of 120 sessions at a time.
What is BYNET?
BYNET serves as the medium of communication between the components. It is primarily responsible for sending messages, and it also performs merging and sorting operations.
What is meant by a Clique?
A clique is a group of nodes that share a common set of disk drives. Cliques are important because they protect the system against node failures.
What happens when a node fails?
Whenever a node fails, all of its vprocs immediately migrate from the failed node to another node in the same clique, so that the data on the common drives remains accessible.
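The migration can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: the node and vproc names are invented, and the round-robin reassignment is an assumption made for the example, not a description of how Teradata actually chooses target nodes.

```python
def migrate_vprocs(clique, failed_node):
    """Move the failed node's vprocs to the surviving nodes of the same clique.

    `clique` maps node name -> list of vproc names. Round-robin placement is
    an assumption made for this sketch.
    """
    survivors = sorted(node for node in clique if node != failed_node)
    if not survivors:
        raise RuntimeError("no surviving node left in the clique")
    result = {node: list(clique[node]) for node in survivors}
    for i, vproc in enumerate(clique[failed_node]):
        result[survivors[i % len(survivors)]].append(vproc)
    return result


# Hypothetical three-node clique
clique = {
    "node1": ["amp0", "amp1"],
    "node2": ["amp2", "amp3"],
    "node3": ["amp4", "amp5"],
}
after_failure = migrate_vprocs(clique, "node2")
```

Because every node in the clique is attached to the same disks, the migrated vprocs can still reach their data from their new node.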
List out all forms of LOCKS that are available in Teradata.
There are basically four types of LOCKS that fall under Teradata. These are: –
- Read Lock
- Access Lock
- Exclusive Lock
- Write Lock
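How these four lock types interact is usually described as a compatibility matrix. The sketch below encodes the commonly documented rules (Access coexists with everything except Exclusive, Read coexists with Access and Read, Write coexists only with Access, Exclusive coexists with nothing). The names `COMPATIBLE` and `can_grant` are illustrative and not part of any Teradata API.

```python
# For each requested lock type: the held lock types it can coexist with.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held):
    """Return True if `requested` is compatible with every lock in `held`."""
    return all(lock in COMPATIBLE[requested] for lock in held)
```

For example, `can_grant("WRITE", ["READ"])` is False: a writer has to wait until the readers are done, while a request for an Access lock goes through even when a Write lock is held.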
At which levels can a LOCK be applied in Teradata?
- Table Level – All the rows that are present inside the table are locked.
- Database Level Lock – All the objects that are present inside the database are locked.
- Row Hash Level Lock – Only the rows that share the row hash of the row being accessed are locked.
With a Primary Index access, how many AMPs are actively involved?
Only one AMP is actively involved in a Primary Index access.
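The reason is that the primary index value is hashed, and the hash decides which AMP owns the row. The Python sketch below illustrates the idea only: it uses MD5 as a stand-in for Teradata's own row hash and hash map, and `amp_for` and `distribute` are hypothetical helper names.

```python
import hashlib


def amp_for(pi_value, num_amps):
    """Map a primary index value to one AMP (MD5 is a stand-in hash)."""
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def distribute(rows, pi_column, num_amps):
    """Place every row on the AMP chosen by hashing its primary index value."""
    amps = {amp: [] for amp in range(num_amps)}
    for row in rows:
        amps[amp_for(row[pi_column], num_amps)].append(row)
    return amps
```

A lookup by primary index value repeats the same hash, so only the single AMP returned by `amp_for` has to be asked for the row.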
In Teradata, what is the significance of UPSERT command?
UPSERT stands for Update Else Insert: the row is updated if it already exists and inserted otherwise. In Teradata this is available as a single UPDATE ... ELSE INSERT statement.
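The update-else-insert logic itself is simple. Here is a minimal Python sketch that uses a dictionary as a stand-in for a table keyed on its primary index; the `upsert` function and the sample data are illustrative only.

```python
def upsert(table, key, values):
    """Update the row stored under `key` if it exists, else insert it."""
    if key in table:
        table[key].update(values)
        return "updated"
    table[key] = dict(values)
    return "inserted"


# Hypothetical table keyed by customer id
customers = {}
first = upsert(customers, 101, {"city": "Pune"})
second = upsert(customers, 101, {"city": "Delhi"})
```

The first call inserts the row and the second call updates it, so the caller never has to check beforehand whether the row exists.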
Highlight the advantages of PPI (Partitioned Primary Index).
PPI is used for range-based or category-based data storage. For range queries, a full table scan is not needed: the query goes straight to the relevant partitions and skips all the others.
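Partition elimination can be illustrated with a small model. The sketch below assumes monthly partitions on a date column; the function names and the `sale_date` column are hypothetical, and real PPI tables are defined in SQL rather than in Python.

```python
from collections import defaultdict
from datetime import date


def build_partitions(rows, date_column):
    """Group rows into one partition per (year, month) of date_column."""
    partitions = defaultdict(list)
    for row in rows:
        d = row[date_column]
        partitions[(d.year, d.month)].append(row)
    return partitions


def range_query(partitions, date_column, start, end):
    """Scan only the partitions that overlap [start, end]; skip the rest."""
    scanned = 0
    hits = []
    for month_key, rows in partitions.items():
        if (start.year, start.month) <= month_key <= (end.year, end.month):
            scanned += 1
            hits.extend(r for r in rows if start <= r[date_column] <= end)
    return hits, scanned
```

The second return value counts how many partitions were actually read, which shows how much of the table a range query can avoid touching.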
Teradata data type: NUMBER vs. INT/SMALLINT/BYTEINT. Which is better?
As stated in the question, fixed-length data types have been used historically, and a lot of enhancements have been made to extend support for variable-length data types (like multi-value compression for VARCHAR in Teradata 13.10). But the NUMBER data type is really only useful as a replacement for DECIMAL and FLOAT, not for INTEGER, as different approaches to calculation are used.
As for joins, I wasn't able to find anything about how hashing is done for the NUMBER data type (and thus how indexes are distributed across AMPs). But I ran some experiments, and here's what I found:
- All the different NUMBER types are hashed to the same value as long as there is no need for rounding. This makes sense: without rounding, the mantissa and exponent [1 p.131] will be the same, and so will the internal representation of the number.
- For the same value, the NUMBER data type will have a hash different from INTEGER, FLOAT and DECIMAL.
- The same value cast to NUMBER with different precision will have different hashes (due to rounding).
Answering your specific examples, for the cases where Table1 and Table2 have the following primary indexes:
A: Table1 has index on NUMBER(10) and Table2 on NUMBER(22).
B: Table1 has index on NUMBER(8) and Table2 on INTEGER.
C: Table1 has index on NUMBER and Table2 on NUMBER(15,2).
Hash join (no data redistribution) will occur only for cases A and C.
What is the need for collecting statistics in Teradata? Can anyone explain with suitable examples? Also, how can we choose the columns to collect statistics on?
Collecting statistics is an important concept in Teradata: it allows the PE to come up with the least-cost plan for a requested query. Collected statistics determine the PE's confidence level when estimating how many rows it is going to access, how many unique values a table has, how many null values there are, and so on. All of this information is stored in the data dictionary. Once you submit a query in Teradata, the parsing engine checks whether statistics are available for the requested table. If statistics have been collected earlier, the PE generates a plan with "high confidence"; in the absence of collected statistics, the plan will be with "low confidence". However, Teradata's optimizer is very robust and intelligent. Even if you do not collect statistics on a table, column, or index, the PE does a "dynamic AMP sampling", which means it selects a random AMP, and this random AMP returns information about the table data it holds. Based on this, the PE (which knows the data demographics and the available system components) estimates the workload and generates a plan.
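The idea behind sampling a single random AMP can be sketched in a few lines. This is a simplified illustration, not the optimizer's real algorithm: it assumes the estimate is just the sampled AMP's row count scaled by the number of AMPs, and `random_amp_sample_estimate` is a hypothetical name.

```python
import random


def random_amp_sample_estimate(rows_per_amp, seed=None):
    """Estimate a table's row count from the row count of one random AMP."""
    rng = random.Random(seed)
    sampled_amp = rng.randrange(len(rows_per_amp))
    estimate = rows_per_amp[sampled_amp] * len(rows_per_amp)
    return sampled_amp, estimate


# Hypothetical per-AMP row counts for an evenly distributed table
even_table = [500] * 8
amp, estimate = random_amp_sample_estimate(even_table, seed=1)
```

When rows are spread evenly, any AMP gives a good estimate. When the table is skewed, the estimate depends heavily on which AMP happens to be picked, which is one reason collected statistics lead to more reliable plans.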
Difference between Database and user
Both may own objects such as tables, views, macros, procedures, and functions. Both users and databases may hold privileges. However, only users may log on, establish a session with the Teradata Database, and submit requests.
A user performs actions, whereas a database is passive. Users have passwords and startup strings; databases do not. Users can log on to the Teradata Database, establish sessions, and submit SQL statements; databases cannot.
Creator privileges are associated only with a user because only a user can log on and submit a CREATE statement. Implicit privileges are associated with either a database or a user because each can hold an object, and an object is owned by the named space in which it resides.
Difference between primary key and primary index
If you are specifically looking for the differences, a few are listed below:
- A primary key cannot be NULL; a primary index can be NULL.
- A primary key is not mandatory in Teradata; a primary index is mandatory in Teradata.
- A primary key does not help in data distribution; a primary index helps in data distribution.
- A primary key has to be unique; a primary index can be unique or non-unique.
- A primary key is a logical implementation; a primary index is a physical implementation.
- A primary key cannot be changed; a primary index can be changed.
What is spool space in Teradata? Explain with examples.
Spool space is the temporary work space Teradata uses to hold the intermediate and final results of a query. We have a rule: if a query takes more than one terabyte of spool, we are supposed to abort it.
My question is: let's say the total spool is used up by a query. What is the expected behavior of the system? Will the system restart, or what else can happen?
The next question is related to the first line: if we have around 10 terabytes of spool, is it logical to abort a query that has just crossed 1 TB of spool? I think we should allow it more spool, up to 9 TB or so, if there are no other sessions. Please provide your analysis of the above cases.
Explain the need for performance tuning.
Performance tuning in Teradata is fundamentally done to identify all the bottlenecks and then resolve them.
Explain whether a bottleneck is an error or not.
A bottleneck is not a type of error, but it certainly causes a certain amount of delay in the system.
Best ways to detect Teradata skewing
Teradata skewing can be considered one of the worst problems on any Teradata system. A high skew factor means that, in effect, the parallelism of the system is degraded, leading to:
- Poor CPU parallel efficiency on full table scans and bulk inserts. The AMP holding the most records for the skewed values will be the bottleneck, forcing all other AMPs to wait.
- Increased I/O for updates and inserts of skewed values, because of the extra workload on the AMP that holds a high number of rows for the same NUPI value.
The causes of Teradata skewing hide in many places. This article shows 3 ways to discover your Teradata skewing problems.
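Before looking for causes, it helps to put a number on the skew. The sketch below uses one commonly quoted definition of the skew factor, (1 - average rows per AMP / maximum rows per AMP) * 100. The formula choice and the `skew_factor` function are assumptions for illustration, and the per-AMP row counts would have to come from your own system.

```python
def skew_factor(rows_per_amp):
    """Skew factor in percent: 0 means the rows are spread perfectly evenly."""
    largest = max(rows_per_amp)
    if largest == 0:
        return 0.0  # empty table: nothing to skew
    average = sum(rows_per_amp) / len(rows_per_amp)
    return (1 - average / largest) * 100
```

A table spread evenly over four AMPs gives 0, while a table whose rows all sit on one of four AMPs gives 75, meaning three quarters of the system's parallel capacity is idle for that table.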