Data Warehouse Interview Questions and Answers
Q1: Define Data Warehousing?
A: A Data Warehousing is the repository of data and it is used for the Management decision support system. It consists of a wide variety of data that has a high-level of business conditions at a single point in time
Q2: What are the stages of data warehousing?
A: There are four stages of Data warehousing. They are as follows:
Offline Operational Database
Offline Data Warehouse
Real-Time Data Warehouse
Integrated Data Warehouse
Q3: List the functions of data warehouse tools and utilities?
A: The various functions performed by the Data warehouse tool and utilities are as follows:
Data loading and refreshing
Q4: Define Data mining?
A: Data mining is a process of analyzing data from a different perspective and converting it into brief useful information. In simple terms, it is a process of extracting data from large data sets.
Q5: Define Real-time data warehousing?
A: Real-time Data warehousing describes a system that reflects the condition of the warehouse in real-time. It has low latency data and also provides the current (real-time) data.
Q6: What is the Dimension table?
A: A Dimension table is a table in a star schema of a data warehouse. The Dimensional data model consists of fact and dimensional tables that are used to build Data Warehouses. These Dimensions tables are used to describe dimensions that contain dimension keys, values, and attributes.
Q7: What is the Fact table?
A: A fact table is a central table in a star schema of a data warehouse. It consists of the measurements, metrics, or facts of the business process. Each record in this fact table is uniquely defined by a product, day, and a store of a business.
Q8: What are Aggregate tables?
A: Aggregate table is a table that contains the existing data which has been grouped to a certain level of dimensions. It is easy to retrieve data from the aggregate tables than the original tables that have more records. It also reduces the load in the database server and increases the performance of the query.
Q9: Differentiate between OLAP and OLTP?
Q10: What is Data Mart?
A: A Datamart is a condensed version of a Data warehouse and is designed for a specific department, unit, or a set of users in an organization eg: Marketing, Sales, HR, etc. It usually draws data from a few sources when compared to the data warehouse and also more flexible.
Q11: List the schema that a data warehouse implements?
A: The schema that a data warehouse implements are as follows:
- Star schema
- Snowflake schema
- Galaxy schema
- Fact constellation schema
Q12: Define Star Schema?
A: Star schema is a type of organizing in the tables in such a way that results can be retrieved from the database quickly in the data warehouse environment. It is used to develop the data warehouse and dimensional data mart.
Q13: What is Snowflake Schema?
A: Snowflake Schema is an extension of a star schema and it adds additional dimensions. It is termed a snowflake because its diagram looks like a snowflake. In this schema, dimensions are present in a normalized form from multiple related tables.
Q14: Difference between View and Materialized View?
A: A view can be defined as a virtual table that takes the output of the query and it can be used in place of tables.
A materialized view is indirect access to the table data by saving the results of the query during a separate schema. It is a physical copy, snapshot, or a picture of the base table.
Q15: Define ODS?
A: Operational Data Store (ODS) is a type of database that is often used as an interim logical area for a data warehouse. It can be used for integrating different data from multiple sources so that the business operations, analysis, and reporting can be executed while the business operations have occurred.
Q16: Define the ER diagram?
A: Entity-Relationship diagram that illustrates the inter-relationship between the entities in the database. This diagram shows the layout of each table and the links between the tables.
Q17: What is Active Data warehousing?
A: An Active Data warehousing is a data warehouse that has the ability to record the transactions when they change and combine them into the warehouse, along with batch or scheduled cycle refreshers.
Q18: Define SCD?
A: Slowly changing dimensions (SCD) is a dimension that changes slowly over time, rather than changing on a regular schedule, time-base. It stores and manages both current and historical data over time in a data warehouse.
Q19: What are the types of SCD?
A: There are three types of SCD and they are as follows:
- SCD1: The new record replaces the original record
- SCD2: A new record is added to the prevailing customer dimension table
- SCD3: Original data is modified to new data.
Q20: Define VLDB?
A: Very Large Database (VLDB) is a repository of big data that contains a very large amount of data and it is set to be more than one TB database. These are the decision support systems that are used to serve a large number of users.
Q21: Define ETL?
A: Extract, Transform and Load (ETL) is a software tool that extracts the data from different sources, transforms the data by applying calculations, concatenation, etc. and then loads the data into the Data warehouse system. This process requires active inputs from various stakeholders, developers, analysts, testers, top executives, etc.
Q22: What are the tools available for ETL?
A: The tools that are available for ETL are as follows:
- Data Stage
- Data Junction
- Warehouse Builder
- Ab Initio
Q23: What are the different types of Data warehouse?
A: The following are the different types of Data warehouse:
- Enterprise Data warehouse
- Operational Data Store
Q24: Mention the four components of a Data warehouse?
A: The four components of a Data warehouse are as follows:
- Query manager
- Warehouse manager
- Load manager
- End-user access tools
Q25: Define the Load manager?
A: The load manager performs all the operations associated with the extraction and loads the data into the warehouse. The size and complexity of the load manager vary between specific solutions from the data warehouse to the data warehouse.
Q26: Define Metadata?
A: Metadata is simply defined as the data that is used to represent the other data is termed as metadata. It contains information like a number of columns used, fixed-width, and limited width, ordering of fields, and data types of the fields.
Q27: Define Warehouse manager?
A: The warehouse manager performs operations related to the management of data in the warehouse. It executes the operations like data analysis to make sure the consistency, transformation and merging of data source, creation of indexes and views, archiving and backing up the data, and generation of denormalization and aggregations.
Q28: Define Dimensional modeling?
A: Dimensional modeling is a methodology that consists of dimensions and fact tables that can be used by data warehouse designers to build their own data warehouse. The fact table consists of facts and measurements of the business, while the dimension table contains the context of measurements.
Q29: What are the different types of Dimensional modeling?
A: There are three types of Dimensional Modeling and they are as follows.
- Physical Modeling
- Conceptual Modeling
- Logical Modeling
Q30: What are the steps to build the data warehouse?
A: Following are the steps to be following to build the data warehouse
- Gathering business requirements
- Identifying the necessary sources
- Organize the Attribute hierarchy
- Identifying the facts
- Defining the dimension
- Defining the attributes
- Redefine the dimension and attributes if required
- Assign unique Identifiers
- Define Relationships
Q31: Which language is used for defining schema definition?
A: Data Mining Query Language (DMQL) is the language used for Schema definition.
Q32: Define Non-additive facts?
A: Non-additive facts are the facts that cannot be summed up for any dimensions that are present in the fact tables. If there are changes in the dimensions, the same facts can be useful.
Q33: Define Partial Backup?
A: The Partial backup in an OS backup short of a full backup, taken while the database is open or shut down. It is similar to a full database backup, but it does not contain all the filegroups.
Q34: Define Cube in Data warehousing?
A: Cubes are the data processing units composed of dimensions and fact tables from the data warehouse. They provide multidimensional views of querying, data, and analytical capabilities to clients. It can be stored on a single analysis server and then defined as a linked cube on other Analysis servers.
Q35: What are the loops in Data warehousing?
A: In data warehousing, loops are existing between the tables. If there is a loop between the tables, then the query generation takes more time and it creates uncertainty. It also advised to avoid loops between the tables.
Q36: What needs to be done while starting the database?
A: The following need to be done to start the database:
- Start an Instance
- Mount the database
- Open the database
Q37: What needs to be done when the database is shut down?
A: Following needs to be done when the database is shutdown
- Close the database
- Dismount the database
- Shutdown the instance
Q38: What are the approaches used by Optimizer during the execution plan?
A: There are two approaches used by the optimizer during the execution plan. They are as follows:
Q39: List the functions of OLAP?
A: The functions of OLAP are as follows:
Q40: List the phases involved in the data warehouse delivery process?
A: The stages that are involved in Data warehouse are as follows:
- IT strategy
- Business case analysis
- Build the version
- Technical Blueprint
- History load
- Ad hoc query
- Requirement Evolution
- Extending scope