Dimensional Modeling Interview Questions

Dimensional data modelling is a technique for organizing data in a database so that it can be easily analyzed and queried. It is often used in business intelligence and data warehousing applications to represent complex relationships between data in a simple and intuitive way.

Here are some common interview questions that you may encounter when interviewing for a role that involves dimensional data modelling.

## Dimensional Modeling Interview Questions

### Q1) What is Dimensional Modeling?

Ans: Dimensional modelling is often used in data warehousing. In simpler words, it is a rational and consistent design technique used to build a data warehouse. DM uses the facts and dimensions of a warehouse for its design. Dimensional models are typically represented by star and snowflake schemas.

### Q2) Give us a non-computer example of preemptive and non-preemptive scheduling?

Ans: Consider any system where people use and compete for some kind of resource. A non-computer example of preemptive scheduling is traffic on a single-lane road: if there is an emergency, or an ambulance is on the road, the other vehicles give way to the vehicle in need. A non-computer example of non-preemptive scheduling is people standing in a queue for tickets: each person is served fully before the next one begins.

### Q3) What is the difference between star flake and snowflake schema?

Ans:

Star Schema:

In a star schema, you place your desired facts and the primary keys of your dimension tables in the fact table. The fact table's primary key is the union of all its dimension table keys. In a star schema, dimension tables are usually not in BCNF form.

SnowFlake:

It is almost like a star schema, but the dimension tables are normalized to third normal form (3NF), so there are more dimension tables. These dimension tables are linked by primary and foreign key relations.
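The contrast between the two layouts can be sketched in DDL. The following is a minimal, hypothetical example using SQLite's in-memory database; the table and column names (a sales mart with a product dimension) are illustrative, not from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: one denormalized dimension table, category attributes kept
# inline (so the dimension is not in BCNF/3NF).
cur.execute("""
    CREATE TABLE dim_product_star (
        product_key   INTEGER PRIMARY KEY,
        product_name  TEXT,
        category_name TEXT,
        category_manager TEXT
    )""")

# Snowflake schema: the dimension is normalized into 3NF, so the category
# attributes move to their own table linked by a foreign key.
cur.execute("""
    CREATE TABLE dim_category (
        category_key INTEGER PRIMARY KEY,
        category_name TEXT,
        category_manager TEXT
    )""")
cur.execute("""
    CREATE TABLE dim_product_snow (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    )""")

# In both layouts the fact table holds the measures plus the dimension keys;
# its primary key is the combination of those dimension keys.
cur.execute("""
    CREATE TABLE fact_sales (
        product_key INTEGER,
        date_key    INTEGER,
        quantity    INTEGER,
        PRIMARY KEY (product_key, date_key)
    )""")
```

Either way, queries join the fact table to the dimensions; the snowflake variant simply adds one more join hop (product to category) in exchange for less redundancy in the dimension.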

### Q4) What is data sparsity and how it affects aggregation?

Ans: Data sparsity is a term used for how much data we have for a particular dimension/entity of the model.

It affects aggregation depending on how deeply the members of the sparse dimension combine. If there are many combinations and those combinations have no factual data, then creating space to store those aggregations is wasteful, and as a result the database becomes huge.
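A quick way to see the effect is to count possible dimension-member combinations against combinations that actually carry facts. This is a toy sketch with made-up dimensions (products, regions, months); the numbers are illustrative only.

```python
from itertools import product

# Hypothetical dimensions: 4 products x 5 regions x 12 months = 240 possible cells.
products = ["P1", "P2", "P3", "P4"]
regions = ["N", "S", "E", "W", "C"]
months = list(range(1, 13))

# Only a handful of (product, region, month) combinations actually have facts.
facts = {("P1", "N", 1): 100, ("P2", "S", 3): 50, ("P1", "W", 6): 75}

possible = len(products) * len(regions) * len(months)
sparsity = 1 - len(facts) / possible
print(f"{len(facts)} of {possible} cells populated; sparsity = {sparsity:.1%}")

# Pre-aggregating every possible cell would mostly store empties; keeping only
# populated combinations keeps the aggregate store small.
populated_only = {combo: facts[combo]
                  for combo in product(products, regions, months)
                  if combo in facts}
```

Here 237 of 240 cells are empty; materializing all of them would waste roughly 98.75% of the space.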

### Q5) What is the difference between hashed file stage and sequential file stage in relates to DataStage Server?

Ans: In DataStage server jobs, you can use a sequential file stage for a lookup instead of a hashed file stage. The advantage of the hashed file stage is speed: a search is faster in hashed files because the hash algorithm gives you the address of a record directly, since records are stored that way, whereas in a sequential file you must compare all the records.
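The same trade-off can be demonstrated outside DataStage with plain data structures: a linear scan versus a hash-based lookup. This is a generic sketch, not DataStage code; the record layout is invented for illustration.

```python
# Sequential-file style vs hashed-file style lookups on the same records.
records = [{"id": i, "name": f"cust_{i}"} for i in range(10_000)]

def sequential_lookup(recs, key):
    """Linear scan: compare record by record, O(n) in the worst case."""
    for rec in recs:
        if rec["id"] == key:
            return rec
    return None

# Hashed style: build the hash once, then each probe is O(1) on average,
# because the key's hash leads directly to the record's bucket.
hashed = {rec["id"]: rec for rec in records}

# Both find the same record; the hashed probe just avoids the scan.
assert sequential_lookup(records, 9_999) is hashed[9_999]
```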


### Q6) When should you consider denormalization?

Ans: Denormalization is used when there are many tables involved in retrieving data. It is done in dimensional modelling to construct a data warehouse; it is not usually done for the databases of transactional systems.

### Q7) What is ERD?

Ans: An ERD (entity-relationship diagram) is a data model used in analysis to describe the data requirements and assumptions in the system from a top-down perspective. ER models also set the stage for the design of databases later on in the SDLC.

There are three basic elements in ER models:

1. Entities are the "things" about which we seek information.
2. Attributes are the data we collect about the entities.
3. Relationships provide the structure needed to draw information from multiple entities.

### Q8) What is the third normal form?

Ans: An entity is in the third normal form if it is in the second normal form and all of its attributes are not transitively dependent on the primary key. Transitive dependence means that descriptor key attributes depend not only on the whole primary key but also on other descriptor key attributes that, in turn, depend on the primary key. In SQL terms, the third normal form means that no column within a table is dependent on a descriptor column that, in turn, depends on the primary key.

For 3NF, first, the table must be in 2NF, plus, we want to make sure that the non-key fields are dependent upon ONLY the PK, and not other non-key fields for its existence. This is very similar to 2NF, except that now you are comparing the non-key fields to OTHER non-key fields. After all, we know that the relationship with the PK is good because we established that in 2NF.
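The transitive dependency described above can be shown concretely. The following is a small illustrative sketch in SQLite; the employee/department names and columns are invented for the example.

```python
import sqlite3

# Transitive dependency: emp_id -> dept_id -> dept_name. dept_name depends on
# the primary key only through dept_id, so this table is in 2NF but not 3NF.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE employee_2nf (
    emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER, dept_name TEXT)""")

# 3NF decomposition: move the transitively dependent column to its own table.
cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
cur.execute("""CREATE TABLE employee_3nf (
    emp_id INTEGER PRIMARY KEY, emp_name TEXT,
    dept_id INTEGER REFERENCES department(dept_id))""")

cur.execute("INSERT INTO department VALUES (10, 'Finance')")
cur.executemany("INSERT INTO employee_3nf VALUES (?, ?, ?)",
                [(1, 'Asha', 10), (2, 'Ravi', 10)])

# The department name is now stored once, however many employees reference it.
row = cur.execute("""SELECT e.emp_name, d.dept_name FROM employee_3nf e
                     JOIN department d USING (dept_id) WHERE e.emp_id = 1""").fetchone()
```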

### Q9) Why are recursive relationships bad? How do you resolve them?

Ans: Recursive relationships are an interesting and more complex concept than the relationships you have seen in the previous chapters, such as one-to-one, one-to-many, and many-to-many. A recursive relationship occurs when there is a relationship between an entity and itself.

For example, a one-to-many recursive relationship occurs when an employee is the manager of other employees. The employee entity is related to itself, and there is a one-to-many relationship between one employee (the manager) and many other employees (the people who report to the manager). Because of the more complex nature of these relationships, we will need slightly more complex methods of mapping them to a schema and displaying them in a stylesheet.
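The usual resolution is a nullable self-referencing foreign key, queried with a self-join. Here is a minimal sketch in SQLite; the employee names are invented for illustration.

```python
import sqlite3

# One-to-many recursive relationship: an employee's manager is another
# employee, modeled as a nullable self-referencing foreign key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    emp_name TEXT,
    manager_id INTEGER REFERENCES employee(emp_id)  -- NULL for the top manager
)""")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, 'Priya', None), (2, 'Arun', 1), (3, 'Meena', 1)])

# Resolving the relationship is a self-join: alias the same table twice,
# once for the report and once for the manager.
reports = cur.execute("""
    SELECT e.emp_name, m.emp_name AS manager
    FROM employee e JOIN employee m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id""").fetchall()
```

The inner join naturally excludes the top-level manager, whose `manager_id` is NULL.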

### Q10) What is an artificial (derived) primary key? When should it be used?

Ans: Using a name as the primary key violates the principle of stability. The social security number might be a valid choice, but a foreign employee might not have a social security number. This is a case where a derived, rather than a natural, primary key is appropriate. A derived key is an artificial key that you create. A natural key is one that is already part of the database.

 Check out our tutorial on Data Modeling Tutorial!

### Q11) What is the second normal form?

Ans: An entity is in the second normal form if all of its attributes depend on the whole (primary) key. In relational terms, every column in a table must be functionally dependent on the whole primary key of that table. Functional dependency indicates that a link exists between the values in two different columns.

If the value of an attribute depends on a column, the value of the attribute must change if the value in the column changes. The attribute is a function of the column. The following explanations make this more specific:

• If the table has a one-column primary key, the attribute must depend on that key.
• If the table has a composite primary key, the attribute must depend on the values in all its columns taken as a whole, not on one or some of them.
• If the attribute also depends on other columns, they must be columns of a candidate key; that is, columns that are unique in every row.
• If you do not convert your model to the second normal form, you risk data redundancy and difficulty in changing data. To convert first-normal-form tables to second-normal-form tables, remove columns that are not dependent on the primary key.
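The partial-dependency case in the second bullet can be made concrete. The following is an illustrative sketch in SQLite; the order/product columns are invented for the example.

```python
import sqlite3

# Partial dependency: with composite key (order_id, product_id), the column
# product_name depends on product_id alone, so this table is not in 2NF.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE order_line_1nf (
    order_id INTEGER, product_id INTEGER,
    product_name TEXT,            -- depends on part of the key only
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id))""")

# 2NF decomposition: columns that depend on part of the composite key
# move to a table keyed by just that part.
cur.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT)")
cur.execute("""CREATE TABLE order_line_2nf (
    order_id INTEGER, product_id INTEGER REFERENCES product(product_id),
    quantity INTEGER, PRIMARY KEY (order_id, product_id))""")
```

`quantity` stays in the line table because it genuinely depends on the whole key: the same product can appear on many orders with different quantities.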

### Q12) Is this statement TRUE or FALSE? All databases must be in third normal form.

Ans: FALSE. In general, organizational databases are normalized to 3NF in order to remove redundancy and allow efficient access, but a database can also be created without normalization. Hence it is not mandatory that a database be in 3NF.

### Q13) What is Conceptual Data Modeling?

Ans: Conceptual data model includes all major entities and relationships and does not contain much detailed level of information about attributes and is often used in the INITIAL PLANNING PHASE.

The conceptual data model is created by gathering business requirements from various sources like business documents, discussions with functional teams, business analysts, smart management experts, and end-users who do the reporting on the database. Data modellers create a conceptual data model and forward that model to the functional team for their review.

Conceptual Data Model - Highlights

• CDM is the first step in constructing a data model in a top-down approach and is a clear and accurate visual representation of the business of an organization.
• CDM visualizes the overall structure of the database and provides high-level information about the subject areas or data structures of an organization.
• CDM discussion starts with the main subject area of an organization and then all the major entities of each subject area are discussed in detail.
• CDM comprises entity types and relationships. The relationships between the subject areas and the relationships between the entities in a subject area are drawn using symbolic notation (IDEF1X or IE). In a data model, cardinality represents the relationship between two entities, i.e. one-to-one, one-to-many, or many-to-many relationships between entities.
• CDM contains data structures that have not been implemented in the database.
• In a CDM discussion, technical as well as non-technical teams project their ideas for building a sound logical data model.

### Q14) What is Enterprise Data Modeling?

Ans: The development of a common consistent view and understanding of data elements and their relationships across the enterprise is referred to as Enterprise Data Modeling. This type of data modelling provides access to information scattered throughout an enterprise under the control of different divisions or departments with different databases and data models.

Enterprise Data Modeling is sometimes called a global business model, and the entire information about the enterprise is captured in the form of entities.

Data Model Highlights

• When an enterprise logical data model is transformed into a physical data model, supertypes and subtypes may not be as is. i.e. the logical and physical structure of supertypes and subtypes may be entirely different. A data modeller has to change that according to the physical and reporting requirement.
• When an enterprise logical data model is transformed into a physical data model, the length of table names, column names etc may exceed the maximum number of characters allowed by the database. So a data modeller has to manually edit that and change the physical names according to the database or organization’s standards.
• One of the important things to note is the standardization of the data model. Since the same attribute may be present in several entities, the attribute names and data types should be standardized and a conformed dimension should be used to connect to the same attribute present in several tables.
• A standard Abbreviation document is a must so that all data structure names would be consistent across the data model.

### Q15) Logical V/s Physical Data Model?

Ans: When a data modeller works with a client, his title may be logical data modeller, physical data modeller, or a combination of both. A logical data modeller designs the data model to suit business requirements, creates and maintains the lookup data, compares versions of the data model, maintains a changelog, and generates reports from the data model, whereas a physical data modeller has to know the properties of the source and target databases.

A physical data modeller should know the technical know-how to create data models from existing databases and to tune the data models with referential integrity, alternate keys, indexes, and how to match indexes to SQL code. It would be good if the physical data modeler knows about replication, clustering, and so on.

The differences between a logical data model and a physical data model are shown below.

| Logical Data Modeling | Physical Data Modeling |
| --- | --- |
| Represents business information and defines business rules | Represents the physical implementation of the model in a database |
| Entity | Table |
| Attribute | Column |
| Primary Key | Primary Key Constraint |
| Alternate Key | Unique Constraint or Unique Index |
| Inversion Key Entry | Non-Unique Index |
| Rule | Check Constraint, Default Value |
| Relationship | Foreign Key |
| Definition | Comment |

### Q16) Relational vs Dimensional?

Ans: Relational Data Modeling is used in OLTP systems which are transaction-oriented and Dimensional Data Modeling is used in OLAP systems which are analytical-based. In a data warehouse environment, the staging area is designed on OLTP concepts, since data has to be normalized, cleansed and profiled before being loaded into a data warehouse or data mart. In an OLTP environment, lookups are stored as independent tables in detail whereas these independent tables are merged as a single dimension in an OLAP environment like a data warehouse.

| Relational Data Modeling | Dimensional Data Modeling |
| --- | --- |
| Data is stored in an RDBMS | Data is stored in an RDBMS or in multidimensional databases |
| Tables are units of storage | Cubes are units of storage |
| Data is normalized and used for OLTP; optimized for OLTP processing | Data is denormalized and used in data warehouses and data marts; optimized for OLAP |
| Several tables and chains of relationships among them | Few tables; fact tables are connected to dimension tables |
| Volatile (several updates) and time-variant | Non-volatile and time-invariant |
| Detailed level of transactional data | Summary of bulky transactional data (aggregates and measures) used in business decisions |

### Q17) What is Data Modeling Development Cycle?

Ans:

• Gathering Business Requirements - First Phase: Data modelers have to interact with business analysts to get the functional requirements and with end users to find out the reporting needs.
• Conceptual Data Modeling(CDM) - Second Phase: This data model includes all major entities and relationships and it will not contain much detail about attributes and is often used in the INITIAL PLANNING PHASE.
• Logical Data Modeling(LDM) - Third Phase: This is the actual implementation of a conceptual model in a logical data model. A logical data model is the version of the model that represents all of the business requirements of an organization.
• Physical Data Modeling(PDM) - Fourth Phase: This is a complete model that includes all required tables, columns, relationships, and database properties for the physical implementation of the database.
• Database - Fifth Phase: DBAs instruct the data modelling tool to create SQL code from the physical data model. Then the SQL code is executed in the server to create databases.

Standardization Needs in Data Modeling:

Several data modelers may work on the different subject areas of a data model, and all of them should follow the same naming conventions when writing definitions and business rules.

Nowadays, business-to-business transactions(B2B) are quite common, and standardization helps in understanding the business in a better way. Inconsistency across column names and definitions would create chaos across the business.

For example: when a data warehouse is designed, it may get data from several source systems and each source may have its own names, data types etc. These anomalies can be eliminated if proper standardization is maintained across the organization.

Table Names Standardization:

Giving a full name to a table gives an idea of what its data is about. Generally, do not abbreviate table names; however, this may differ according to the organization’s standards. If the table name length exceeds the database’s limit, then abbreviate the table name. Some general guidelines that may be used as a prefix or suffix for tables are listed below.

Examples:

Lookup – LKP - Used for Code, Type tables by which a fact table can be directly accessed.

e.g. Credit Card Type Lookup – CREDIT_CARD_TYPE_LKP

Fact – FCT - Used for transaction tables:

e.g. Credit Card Fact - CREDIT_CARD_FCT

Cross Reference - XREF – Tables that resolve many to many relationships.

e.g. Credit Card Member XREF – CREDIT_CARD_MEMBER_XREF

History – HIST - Tables that store history.

e.g. Credit Card Retired History – CREDIT_CARD_RETIRED_HIST

Statistics – STAT - Tables that store statistical information.

e.g. Credit Card Web Statistics – CREDIT_CARD_WEB_STAT

Column Names Standardization:

Some general guidelines are listed below that may be used as a prefix or suffix for the column.

Examples:

Key – KEY - System-generated surrogate key.

e.g. Credit Card Key – CRDT_CARD_KEY

Identifier – ID - Character column that is used as an identifier.

e.g. Credit Card Identifier – CRDT_CARD_ID

Code – CD - Numeric or alphanumeric column that is used as an identifying attribute.

e.g. State Code – ST_CD

Description – DESC - Description for a code, identifier or key.

e.g. State Description – ST_DESC

Indicator – IND – to denote indicator columns.

e.g. Gender Indicator – GNDR_IND

Database Parameters Standardization:

Some general guidelines are listed below that may be used for other physical parameters.

Examples:

Index – IDX – for index names.

e.g. Credit Card Fact IDX 01 – CRDT_CARD_FCT_IDX01

Primary Key – PK – for Primary key constraint names.

e.g. Credit Card Fact PK01 – CRDT_CARD_FCT_PK01

Alternate Keys – AK – for Alternate key names.

e.g. Credit Card Fact AK01 – CRDT_CARD_FCT_AK01

Foreign Keys – FK – for Foreign key constraint names.

e.g. Credit Card Fact FK01 – CRDT_CARD_FCT_FK01
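Conventions like these are easy to enforce mechanically. The following is a small hypothetical checker for the table-name suffixes listed above; the function name and the assumption that names are upper snake case are mine, not the article's.

```python
import re
from typing import Optional

# Suffix conventions mirroring the examples in this section.
SUFFIXES = {"LKP": "lookup", "FCT": "fact", "XREF": "cross reference",
            "HIST": "history", "STAT": "statistics"}

def classify_table(name: str) -> Optional[str]:
    """Return the table kind implied by the name's suffix, or None if the
    name does not follow the UPPER_SNAKE_CASE + suffix standard."""
    match = re.fullmatch(r"[A-Z][A-Z0-9_]*_([A-Z]+)", name)
    return SUFFIXES.get(match.group(1)) if match else None

assert classify_table("CREDIT_CARD_TYPE_LKP") == "lookup"
assert classify_table("CREDIT_CARD_FCT") == "fact"
assert classify_table("credit_card_fct") is None  # violates the case standard
```

A check like this can run in a CI step over generated DDL so that non-conforming names are caught before the DBA ever sees the scripts.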

## Dimensional Data Modeling Interview Questions

### Q18) Steps to create a Data Model?

Ans: These are the general guidelines to create a standard data model and in real-time, a data model may not be created in the same sequential manner as shown below. Based on the enterprise’s requirements, some of the steps may be excluded or included in addition to these.

Sometimes, the data modeler may be asked to develop a data model based on the existing database. In that situation, the data modeler has to reverse engineer the database and create a data model.

• Create High Level, Conceptual Data Model.
• Create a Logical Data Model.
• Select target DBMS where the data modelling tool creates the physical schema.
• Create standard abbreviation documents according to business standards.
• Create domain.
• Create an Entity and add definitions.
• Create attributes and add definitions.
• Based on the analysis, try to create surrogate keys, super types and subtypes.
• Assign datatype to attribute. If a domain is already present then the attribute should be attached to the domain.
• Create primary or unique keys to attribute.
• Create a check constraint or default to an attribute.
• Create a unique index or bitmap index to attribute.
• Create foreign key relationships between entities.
• Create a Physical Data Model.
• Add database properties to the physical data model.
• Create SQL Scripts from Physical Data Model and forward that to DBA.
• Maintain Logical & Physical Data Model.

For each release (version of the data model), try to compare the present version with the previous version of the data model. Similarly, try to compare the data model with the database to find out the differences.

Create a change log document for differences between the current version and the previous version of the data model.

### Q19) What is Data Modeler's Role?

Ans:

• Interact with Business Analysts to get the functional requirements.
• Interact with end-users and find out the reporting needs.
• Conduct interviews, and brainstorming discussions with the project team to get additional requirements.
• Gather accurate data through data analysis and functional analysis.
• Development of data model:
• Create standard abbreviation documents for logical, physical and dimensional data models.
• Create logical, physical and dimensional data models(data warehouse data modelling).
• Document logical, physical and dimensional data models (data warehouse data modelling).
• Reports:
• Generate reports from a data model.
• Review:
• Review the data model with the functional and technical teams.
• Creation of database:
• Create SQL code from the data model and coordinate with DBAs to create a database.
• Check to see if data models and databases are in sync.
• Support & Maintenance:
• Assist developers, ETL, BI team and end users to understand the data model.
• Maintain a change log for each data model.



Dimensional Data Modeling:

It is a modelling technique used in data warehousing systems, different from the ER modelling technique used in OLTP systems. In dimensional modelling, tables are combined with the aim of optimized query performance in Decision Support Systems in relational databases.

Before further going into details of dimensional data modeling we should be aware of key concepts in Dimensional Data Modeling

• Dimension: It can be considered a category of information, for example, a geographic dimension. It provides a way to slice and dice the data in the data warehouse.
• Attribute: Attribute can be considered as characteristic of dimension. For example, the region is an attribute of geographic dimension. The dimension attribute is a column in the dimension table.
• Hierarchy: It represents relationships between attributes of the dimension and defines the levels within it. In the case of a geographic dimension, the hierarchy might be Continent -> Country -> Region -> State -> City -> Street.
• Fact Table: It represents the measure of interest; for an organization, product revenue is a measure of interest. The level of granularity is an important factor in designing fact tables: for example, revenue by continent or revenue by country. In this example, the fact table will have three columns: product, geographical region, and revenue.
• Lookup Table: The look-up table represents a dimension and constitutes attributes for a dimension. The lookup table for products will consist of all products available.
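The concepts above fit together in one small star. This sketch, using SQLite with invented geography and revenue data, shows a dimension attribute (`country`, one level of the hierarchy) slicing an aggregated fact measure.

```python
import sqlite3

# A geography dimension carrying hierarchy attributes, and a revenue fact.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE dim_geography (
    geo_key INTEGER PRIMARY KEY, country TEXT, region TEXT)""")
cur.execute("""CREATE TABLE fact_revenue (
    geo_key INTEGER REFERENCES dim_geography(geo_key), product TEXT, revenue REAL)""")
cur.executemany("INSERT INTO dim_geography VALUES (?, ?, ?)",
                [(1, 'India', 'South'), (2, 'India', 'North'), (3, 'France', 'Sud')])
cur.executemany("INSERT INTO fact_revenue VALUES (?, ?, ?)",
                [(1, 'Rice', 100.0), (2, 'Rice', 60.0), (3, 'Rice', 40.0)])

# "Slice and dice": aggregate the measure at the country level of the hierarchy.
by_country = cur.execute("""
    SELECT g.country, SUM(f.revenue)
    FROM fact_revenue f JOIN dim_geography g USING (geo_key)
    GROUP BY g.country ORDER BY g.country""").fetchall()
```

Changing the `GROUP BY` column to `region` would roll the same facts up at a different level of the hierarchy without touching the fact table.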

### Q22) Define star flake schema.

Ans: A starflake schema is a combination of a star schema and a snowflake schema. The hierarchies of star schemas are denormalized, while the hierarchies of snowflake schemas are normalized; starflake schemas normalize only where needed to remove redundancies in the dimensions.

### Q23) Define snowflake schema?

Ans: In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.


### Q25) Define conceptual models.

Ans: A conceptual model is a representation of a system, made of the composition of concepts that are used to help people know, understand, or simulate a subject the model represents. Some models are physical objects; for example, a toy model may be assembled and may be made to work like the object it represents.

### Q26) What is data mart?

Ans: Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing etc. stored in a data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse.

### Q27) What is ER model?

Ans: ER model or entity-relationship model is a particular methodology of data modelling wherein the goal of modelling is to normalize the data by reducing redundancy. This is different from dimensional modeling where the main goal is to improve the data retrieval mechanism.

### Q28) What is dimensional modelling?

Ans: A dimensional model consists of dimension and fact tables. Fact tables store the transactional measurements and the foreign keys to the dimension tables that qualify the data. The goal of the dimensional model is not to achieve a high degree of normalization but to facilitate easy and fast data retrieval.

Ralph Kimball is one of the strongest proponents of this very popular data modelling technique which is often used in many enterprise-level data warehouses.


### Q29)What is a dimension?

Ans: A dimension is something that qualifies a quantity (measure).

For example, "20 kg" on its own does not mean anything. But "20 kg of Rice (product) sold to Ramesh (customer) on 5th April (date)" gives a meaningful sense. Here product, customer, and date are dimensions that qualify the measure, 20 kg.

Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.

### Q30)What is Fact?

Ans: A fact is something that is quantifiable (Or measurable). Facts are typically (but not always) numerical values that can be aggregated.

### Q31) What are additive, semi-additive, and non-additive measures?

Ans:

Non-additive measures are those that cannot be used with any numeric aggregation function (e.g. SUM(), AVG(), etc.). One example of a non-additive fact is any kind of ratio or percentage, such as a 5% profit margin or a revenue-to-asset ratio. Non-numerical data stored in a fact table, e.g. some kind of varchar flag, is also a non-additive measure.

Semi-additive measures are those where only a subset of the aggregation functions can be applied. Take account balance: a SUM() over time does not give a useful result, but the MAX() or MIN() balance might be useful. Or consider a price or currency rate: a sum is meaningless on a rate, but an average might be useful.

Additive measures can be used with any aggregation function like Sum(), Avg(), etc. An example is Sales Quantity etc.
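The three behaviors above can be shown on toy numbers. The daily figures below are invented for illustration.

```python
# Additive vs semi-additive vs non-additive measures, on toy daily data.
days = ["Mon", "Tue", "Wed"]
quantity = [10, 20, 30]       # additive: SUM across any dimension is meaningful
balance = [100, 150, 120]     # semi-additive: SUM over time is wrong; MIN/MAX/last are fine
margin_pct = [5.0, 6.0, 4.0]  # non-additive: ratios cannot be summed

total_quantity = sum(quantity)   # a valid total across the time dimension
closing_balance = balance[-1]    # "last" is a valid time aggregation for balances
# sum(balance) = 370 would be meaningless, and so would sum(margin_pct).

# A ratio must be re-derived from its additive components instead:
profit = [5.0, 12.0, 12.0]
revenue = [100.0, 200.0, 300.0]
overall_margin_pct = 100 * sum(profit) / sum(revenue)
```

Note that the correct overall margin (sum of profit over sum of revenue) is not the average of the daily margins; storing the additive components in the fact table is what makes the ratio recomputable at any grain.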


### Q32) What is a junk dimension?

Ans: A junk dimension is a grouping of typically low-cardinality attributes (flags, indicators, etc.) so that those can be removed from other tables and can be junked into an abstract dimension table.

These junk dimension attributes might not be related. The only purpose of this table is to store all the combinations of the dimensional attributes which you could not fit into the different dimension tables otherwise. Junk dimensions are often used to implement Rapidly Changing Dimensions in the data warehouse.
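One common way to build such a table is to pre-populate every combination of the flag values and give each combination a surrogate key. This is a generic sketch; the flag names and domains are invented for illustration.

```python
from itertools import product

# Junk dimension sketch: three low-cardinality flags junked into one table.
flag_domains = {
    "is_gift": [0, 1],
    "payment_type": ["CASH", "CARD"],
    "promo_applied": [0, 1],
}

# Pre-populate all combinations; each flag-set gets one surrogate key.
junk_dim = {}   # (is_gift, payment_type, promo_applied) -> surrogate key
for key, combo in enumerate(product(*flag_domains.values()), start=1):
    junk_dim[combo] = key

# The fact row then carries a single junk_key instead of three flag columns.
fact_row = {"order_id": 42, "junk_key": junk_dim[(1, "CARD", 0)]}
```

With 2 x 2 x 2 values there are only 8 rows; even a dozen flags usually keeps the junk dimension small enough to enumerate up front.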

### Q33) What is a Surrogate key?

Ans: A surrogate key is a system-generated numeric attribute used as the primary key. It is a replacement for natural keys: instead of using a natural or composite primary key, data modelers create this key, which is very useful for identifying a record, writing SQL queries, and performance.
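A minimal sketch in SQLite, with an invented customer table: the natural key (here, email) stays as an alternate key, while a generated integer becomes the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate, generated
    email TEXT UNIQUE,                               -- natural/alternate key
    full_name TEXT)""")
cur.execute("INSERT INTO customer (email, full_name) VALUES ('a@x.com', 'Asha')")
cur.execute("INSERT INTO customer (email, full_name) VALUES ('r@x.com', 'Ravi')")

# Joins and lookups now use the small, stable integer rather than the
# mutable natural key: if the email changes, the key does not.
keys = [row[0] for row in
        cur.execute("SELECT customer_key FROM customer ORDER BY customer_key")]
```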

### Q34) What are Forward Engineering and Reverse Engineering in a data model?

Ans: Forward Engineering is a process by which Data Definition Language(DDL) scripts are generated from the data model. Data modeling tools have some options to create DDL scripts by coupling or connecting with several databases. By using these scripts, databases can be created. Reverse Engineering is a process used for creating data models from databases or scripts. Data modeling tools have some options to connect with the database through which we can reverse engineer a database into a data model.
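Reverse engineering boils down to reading a database's catalog and rebuilding a model description from it. The following sketch does that by hand with SQLite's introspection pragmas, the way a modeling tool would when you point it at an existing database; the two tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT)")
conn.execute("CREATE TABLE fact_sales (date_key INTEGER, amount REAL)")

# Read the catalog and build a rough model: table -> [(column, type, is_pk)].
model = {}
for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"):
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    model[table] = [(c[1], c[2], bool(c[5])) for c in cols]
```

Forward engineering runs the other direction: the modeling tool emits DDL scripts (like the two CREATE TABLE statements above) from the model, and the DBA executes them against the server.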

