Netezza Interview Questions and Answers
Q1) Explain FPGA and how is it useful for query performance?
- FPGA: Field Programmable Gate Array (FPGA) is located on each SPU. Netezza is different from other architectures. Netezza can do a “hardware upgrade” through software by using FPGA. Hardware is reconfigured during install.
- While reading data from disk, FPGA on each SPU also helps in ‘filtering’ unnecessary data before getting loaded into memory on each SPU. This way, FPGA does not overwhelm with all the data from the disk.
Q2) What is a zone map?
Ans: The zone map in Netezza is similar (concept wise) to partitions in Oracle. Netezza maintains a map for data so that it does rely on a zone map to pull only the range it is interested in. For example, if we need to pull out data from Jan 2009 till June 2009 from a table that is distributed on the date column, the zone map helps us to achieve this. The zone map is maintained by Netezza automagically, no user intervention is needed. Zone mapping is done at a block (extent) level. Netezza has zone maps for all columns (not just distributed column) and includes information such as minimum, the maximum, total number of records.
Q3) What are different ways to load?
- External tables
- Create table AS (aka, CTAS).
- Inserts (Eeeewee!!)
Q4) Does everything gets cached in Netezza (or any other data appliance)?
Ans: Typically only schema and other database objects are cached in appliances. Data is not cached, in general. In most cases, data is not saved anywhere (in any cache or on the host computer) and is streamed directly from SPU to client software.
Q5) How is load achieved in Netezza and why is that quick/fast?
Ans: Loads bypass a few steps that typically a query would go through (a query goes through plan generation, optimization, and transaction management). Loads are done in terms of “sets” and this set is based on underlying table structure (thus loads for two different tables are different as their sets are based on table structures). Data is processed to check the format and distribution of records calculated very quickly (in one step), fills into the ‘set’ structure, and writes to the storage structure. Storage also performs space availability and other admin tasks, all these operations go pretty quick (think of them as UNIX named pipes that streams data and SPU stores these records).
Q6) What databases are we able to configure pushdown optimization?
- IBM DB2
- Microsoft SQL Server
- Sybase ASE
- Databases that use ODBC drivers
Q7) When are we likely to receive incorrect (aggregate) results?
Ans: Very rarely a driver may return aggregated results that are still getting processed back to the client. In this case, the client may assume that the calculation is complete, instead of updating with the latest or final results. Obviously, the driver has to wait for Netezza to complete operation on the host computer, before delivering results.
Q8) What is real-time processing?
Ans: Data sources such as JMS, WebSphere MQ, TIBCO, webMethods, MSMQ, SQP, and web services can publish data in real-time. These real-time sources can be leveraged by Informatica Power Centre to process data on-demand. A session can be specifically configured for real-time processing.
Q9) Explain how data gets stored in Netezza and how does SPU failover take place?
Ans: Data is stored based on a selected field(s) that are used for distribution.
==Data (A)==> Hash Function (B) ==> Logical SPU identifier list (C) ==> Physical SPU list (D) ==> Storage (E)
When data arrives, it is hashed based on the field(s) and a hash function (B) is used for this purpose. For example, for a hypothetical 32 node system, the logical SPU identifier list has 32 unique entries. If there are 1000 hashed data items from (B), there are 1000 entries in (C), all having only 32 SPU entries (several data items go to the same SPU, thus multiple (B) entries map to the same (C)). For instance, (C) has values [3,19,30,7,20,25,11,3,22,19….]. This way, 1000 data entries are mapped. (D) has a physical IP address of both primary and failover SPU. If there is a failover, this is the only place where Netezza needs to update its entries. The same goes for a system that has a new SPU added. It is a little complicated, in principle, this is the concept.
Q10) What is a real-time processing terminating condition?
Ans: A real-time processing terminating condition determines when the Integration Service stops reading messages from a real-time source and ends the session.
Q11) What are 4 environment variables that are required. What are the different states on Netezza?
Ans: Environment variables: NZ_HOST, NZ_DATABASE, NZ_USER, and NZ_PASSWORD
- Online: Normal or usual state.
- Stopped: Netezza will shutdown after completing current queries, no new queries are allowed.
- Offline: Waits for completion of current queries, new or queries in queue receive an error.
- Paused: Same as above, but no error displayed. Typically caused during Netezza boot or startup.
- Down: Just plain down could be due to the Netezza server problem or user-initiated.
Q12) What is a pipeline partition and how does provide a session with higher performance?
Ans: Within a mapping, a session can break apart different source qualifier to target pipelines into their own reader/transformation/writer thread(s). This allows the Integration Service to run the partition in parallel with other pipeline partitions in the same mapping. The parallelism creates a higher-performing session.
Q13) What is the maximum number of partitions that can be defined in a single pipeline?
Ans: You can define up to 64 partitions at any partition point in a pipeline.
Q14) What is a dynamic session partition?
Ans: A dynamic session partition is where the Integration Service scales the number of session partitions at runtime. The number of partitions is based on several factors including the number of nodes in a grid or source database partitions.
Q14) List three dynamic partitioning configurations that cause a session to run with one partition?
- Set dynamic partitioning to the number of nodes in the grid, and the session does not run on a grid.
- Create a user-defined SQL statement or a user-defined source filter.
- Use dynamic partitioning with an Application Source Qualifier.
Q15) List the different types of pushdown optimization that can be configured?
- Source-side pushdown optimization – The Integration Service pushes as much transformation logic as possible to the source database.
- Target-side pushdown optimization – The Integration Service pushes as much transformation logic as possible to the target database.
- Full pushdown optimization – The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization
Q16) What happens to records that are loaded during the nzload process, but were not committed?
Ans: They are logically deleted and administrators can run nzreclaim, we may also truncate the table.
Q17) What is a control task?
Ans: A control task is used to alter the normal processing of a workflow by stopping, aborting, or failing a workflow or work.
Q18) Can a group become a member of another group in Netezza user administration. Can we use the same group name for databases?
Ans: In Netezza, a public group is created automatically and everyone is a member of this group by default. We can create as many groups and any user can be a member of any group(s). Group can not be a member of another group. Group names, user names, and database names are unique. That is, we can not have a database called sales and a group also called sales.
Q19) How can we give global permission to user joe so that he can create a table in any database?
Ans: Login into system database and give that permission to the user by saying “grant create a table for joe;”
Q20) What permission will you give to connect to a database?
Ans: List. Grand list, select on the table to the public (if logged into sales database, this allows all users to query tables in sales database).
Q21) Do we need to drop all tables and objects in the database, before dropping a database?
Ans: No, drop database will take care of it.
Q22) What constraints on a table are enforced?
Ans: Not null and default. Netezza does not apply PK and FK.
Q23) Why NOT NULL specification is better in Netezza?
Ans: Specifying not null results in better performance as NULL values are tracked at row header level. Having NULL values results in storing references to NULL values in the header. If all columns are NOT NULL, then there is no record header.
Q24) Create Table AS (CTAS), does it distribute data randomly or based on the table on which it received data?
Ans: Response: Newly created table from CTAS gets distribution from the original table.
Q25) Why do you prefer to truncate instead of drop table command?
Ans: Just empties data from the table, keeping table structure, and permission intact.
Q26) When no distribution clause is used while creating a table, what distribution is used by Netezza?
Ans: First column (same as in Teradata).
Q27) Can we update all columns in a Netezza table?
Ans: No, the column that is used in the distribution clause cannot be used for updates. Remember, up to four columns can be used for the distribution of data on SPU. From a practical sense, updating distribution columns result in a redistribution of data; the single most performance hit when a large table is involved. This restriction makes sense.
Q29) What is data slice and SPU?
Ans: For me, they are the same! Of course, this answer is not an accurate reply in your interview(s).
Q30) What data type works best for zone maps?
Ans: Zone maps work best for integer data types.
Q31) What feature in Netezza you do not like?
Ans: Of course, a large list, especially when compared to Oracle. PK and FK enforcement is a big drawback though this is typically enforced at ETL or ELT process [ELT: Extract, Transform, and Load. Note that ‘Transform’ and ‘Load’ can happen within Netezza].
Q32) List three real-time processing terminating conditions?
- Idle time – Time Integration Service waits to receive messages before it stops reading from the source.
- Message count – Number of messages the Integration Service reads from a real-time source before it stops reading from the source.
- Reader time limit – Amount of time in seconds that the Integration Service reads source messages from the real-time source before it stops reading from the source.
Q33)What is real-time processing message recovery?
Ans: Real-time processing message recovery allows the Integration Service to recover unprocessed messages from a failed session. Recovery files, tables, queues, or topics are used to recover the source messages or IDs. Recovery mode can be used to recover these unprocessed messaged.
Q34) Does Netezza support the concurrent updates of the same record?
Ans: In case of conflict in which the same record is set for modification, Netezza rolls back a recent transaction that is attempted on the same record, in fact, the same table. This is generally acceptable in DW environments. Netezza does support serialization transactions and does not permit dirty reads.