Data Warehouse Interview Questions & Answers
A data warehouse is a federated repository for all the data that an enterprise's various business systems collect. The repository may be physical or logical.
What is BI?
Business intelligence (BI) is a technology-driven process for analyzing data and presenting actionable information to help executives, managers and other corporate end users make informed business decisions.
Explain dimension table ?
A dimension table is a table in a star schema of a data warehouse. A dimension table stores attributes, or dimensions, that describe the objects in a fact table.
Explain is fact table?
A fact table is the central table in a star schema of a data warehouse. A fact table stores quantitative information for analysis and is often denormalized.
What are fundamental stages of Data Warehousing?
Stages of a data warehouse helps to find and understand how the data in the warehouse changes.
At an initial stage of data warehousing data of the transactions is merely copied to another server. Here, even if the copied data is processed for reporting, the source data’s performance won’t be affected.
In the next evolving stage, the data in the warehouse is updated regularly using the source data.
In Real time Data warehouse stage data in the warehouse is updated for every transaction performed on the source data (E.g. booking a ticket)
When the warehouse is at integrated stage, It not only updates data as and when a transaction is performed but also generates transactions which are passed back to the source online data.
What are fundamental stages of Data Warehousing?
Offline Operational Databases: This is the initial stage of data warehousing. In this stage the development of database of an operational system to an off-line server is done by simply copying the databases.
Offline Data warehouse: In this stage the data warehouses are updated on a regular time cycle from operational system and the data is persisted in an reporting-oriented data structure.
Real time Data Warehouse: Data warehouses are updated based on transaction or event basis in this stage. An operational system performs a transaction every time.
Integrated Data Warehouse: The activity or transactions generation which are passed back into the operational system is done in this stage. These transactions or generated transactions are used in the daily activity of the organization.
What is Virtual Data Warehousing?
The aggregate view of complete data inventory is provided by Virtual Warehousing
What is active data warehousing?
The transactional data captured and reposited in the Active Data Warehouse
What Is Operational Data Store (ODS)?
An operational data store (ODS) is a type of database that's often used as an interim logical area for a data warehouse.
While in the ODS, data can be scrubbed, resolved for redundancy and checked for compliance with the corresponding business rules. An ODS can be used for integrating disparate data from multiple sources so that business operations, analysis and reporting can be carried out while business operations are occurring. This is the place where most of the data used in current operation is housed before it's transferred to the data warehouse for longer term storage or archiving.
What is the difference between view materialized view and view?
A view is created by combining data from different tables. Hence, a view does not have data of itself.
On the other hand, Materialized view usually used in data warehousing has data. This data helps in decision making, performing calculations etc. The data stored by calculating it before hand using queries.
When a view is created, the data is not stored in the database. The data is created when a query is fired on the view. Whereas, data of a materialized view is stored.
What is the difference between view and materialized view?
- Tail raid data representation is provided by a view to access data from its table.
- It has logical structure can not occupy space.
- Changes get affected in corresponding tables.
- Pre calculated data persists in materialized view.
- It has physical data space occupation.
- Changes will not get affected in corresponding tables.
What is VLDB? Very Large Databases
Welcome to the discussion on VLDB. The purpose of this series, these white papers is to present and discuss VLDB regarding it’s implementation, application, design and architecture needs. The
focus here will be centered on the techniques necessary to setup, maintain, and grow a database of 1 to 80 Terabytes in nature. The target audience is those who have a background in systems
architecture, database administration, or systems administration. This series will not discuss the business benefits nor pro’s and con’s of VLDB from a business perspective.
What the series promotes are the following goals of understanding:
Impacts of VLDB on Hardware and Software
Sizing and scooping of Hardware
Parallel Processing, multi-threaded Architecture in relation to VLDB
The other areas of discussion will focus on: networks, partitions, clustering files and indexes, database sharing, CPU speed and quantity, RAM speed and quantity, and an introduction to MPP
possibilities (such as NUMA-Q). A new series will introduce new structures capable of assisting in housing VLDB data sets: a structure called a Data Collection, and a new data architecture model
called the Data Vault™ will be presented as a means for massive data storage in a detailed normalized form. Watch for this series in the coming months.
Much of the theories and hypothesis presented here are supported by mathematical functions, however mathematical theory is not the focus of this series; therefore the mathematical functions may be
referenced in certain places. However the proof of these theorem’s or math functions will not be demonstrated.
REAL-TIME DATA WAREHOUSING
Real-time analytics requires your data warehouse to have timely data available, based on a continuous and efficient data acquisition process. Implementing such a process with homegrown and some traditional ETL software can be complex, lengthy, costly and inefficient.
Attunity can enable real-time data warehousing with Attunity Replicate, providing CDC with optimized integration to all major data warehouse platforms. In addition, Attunity Compose provides end-to-end data warehouse automation that can automatically generate ETL code and help data architects more rapidly configure and manage data warehouses.
What are aggregate tables and aggregate fact tables?
MicroStrategy uses optimized SQL to query the relational database directly to answer users’ questions. Users can ask any question that is supported by the data in their warehouse and then analyze the results until they find a precise answer.
The disadvantage to this relational OLAP (ROLAP) methodology is that accessing huge fact tables can be potentially time-consuming. Multidimensional OLAP (MOLAP) is sometimes considered by some to be the answer to this problem. However, MOLAP is not scalable for large projects because of the difficulty of maintaining every possible combination of aggregates as the number of attributes and the amount of data increases. MicroStrategy’s solution is the use of aggregate tables to provide quicker access to frequently-accessed data while still retaining the power to answer any user query.
Aggregate tables are advantageous because they:
• Reduce input/output, CPU, RAM, and swapping requirements
• Eliminate the need to perform dynamic calculations
• Decrease the number of physical disk reads and the number of records that must be read to satisfy a query
• Minimize the amount of data that must be aggregated and sorted at run time
• Move time-intensive calculations with complicated logic or significant computations into a batch routine from dynamic SQL executed at report run time