Dimensional Data Modeling Interview Questions

What is Dimensional Modeling?

Dimensional modeling is a design technique often used in data warehousing. In simpler words, it is a rational, consistent technique used to build a data warehouse. DM uses the facts and dimensions of a warehouse for its design. Star and snowflake schemas are examples of dimensional models.

Give us a non-computer example of preemptive and non-preemptive scheduling?

Consider any system where people use and compete for some kind of resource. A non-computer example of preemptive scheduling is traffic on a single-lane road: if there is an emergency, such as an ambulance on the road, the other vehicles give way to the vehicle in need. An example of non-preemptive scheduling is people standing in a queue for tickets.

What is the difference between star schema and snowflake schema?

Star Schema:

In a star schema, you enter your desired facts and the primary keys of all your dimension tables into the fact table. The fact table's primary key is the union of all its dimension table keys. In a star schema, the dimension tables are usually not in BCNF form.

Snowflake Schema:

It is almost like a star schema, but here the dimension tables are in third normal form, so there are more dimension tables, and these dimension tables are linked by primary key/foreign key relationships.
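
The two layouts above can be sketched as DDL. This is a minimal sketch using Python's built-in sqlite3; the table and column names (dim_product_star, fact_sales, etc.) are illustrative, not from any particular warehouse:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
-- Star schema: one denormalized dimension table, linked
-- straight to the fact table; category is kept inline.
CREATE TABLE dim_product_star (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    category_name TEXT
);

-- Snowflake: the same dimension split into 3NF tables, so
-- category becomes its own table referenced by product.
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);

-- Fact table: primary key is the union of the dimension keys.
CREATE TABLE fact_sales (
    product_key INTEGER,
    date_key    INTEGER,
    quantity    INTEGER,
    PRIMARY KEY (product_key, date_key)
);
""")

# In the snowflake layout, reaching the category takes one extra join.
cur.execute("INSERT INTO dim_category VALUES (1, 'Grocery')")
cur.execute("INSERT INTO dim_product_snow VALUES (10, 'Rice', 1)")
cur.execute("INSERT INTO fact_sales VALUES (10, 20240405, 20)")
row = cur.execute("""
    SELECT c.category_name, f.quantity
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_key = f.product_key
    JOIN dim_category c     ON c.category_key = p.category_key
""").fetchone()
print(row)  # ('Grocery', 20)
```

The extra join is the practical trade-off: the snowflake saves storage in the dimension at the cost of query complexity.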

What is data sparsity and how does it affect aggregation?

Data sparsity is a term used for how much data we have for a particular dimension/entity of the model.

It affects aggregation depending on how deep the combinations of members of the sparse dimension go. If there are many combinations and those combinations have no factual data, then creating space to store their aggregations is a waste, and as a result the database becomes huge.
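
The idea can be made concrete with a small sparsity calculation; the dimension members and fact combinations below are made up for illustration:

```python
from itertools import product

# Hypothetical dimension members.
products = ["rice", "wheat", "oil"]
regions = ["north", "south", "east", "west"]
months = ["jan", "feb", "mar"]

# The few combinations that actually have factual data.
facts = {("rice", "north", "jan"), ("oil", "south", "feb"),
         ("wheat", "west", "mar")}

possible = len(products) * len(regions) * len(months)   # 36 cells
populated = sum(1 for combo in product(products, regions, months)
                if combo in facts)

# Pre-aggregating every empty cell would waste 33 of 36 slots.
sparsity = 1 - populated / possible
print(f"{populated}/{possible} cells populated, sparsity = {sparsity:.0%}")
```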

What is the difference between the hashed file stage and the sequential file stage in relation to DataStage Server?

In DataStage server jobs, can we use a sequential file stage for a lookup instead of a hashed file stage? If yes, what is the advantage of a hashed file stage over a sequential file stage?

Search is faster in hashed files because you can get the address of a record directly from the hash algorithm, since records are stored that way; in a sequential file you must compare against every record.
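
The difference can be sketched in plain Python, with a dict standing in for the hashed file and a list for the sequential file; the record layout is invented for illustration:

```python
# Hashed file: key -> record, one hash probe per lookup.
hashed = {"C100": ("C100", "Ramesh", "Chennai"),
          "C200": ("C200", "Suresh", "Mumbai")}

# Sequential file: records in arrival order, scanned front to back.
sequential = [("C200", "Suresh", "Mumbai"),
              ("C100", "Ramesh", "Chennai")]

def lookup_hashed(key):
    return hashed.get(key)            # O(1) on average

def lookup_sequential(key):
    for rec in sequential:            # O(n): compare every record
        if rec[0] == key:
            return rec
    return None

assert lookup_hashed("C100") == lookup_sequential("C100")
```

Both work for a lookup; the hashed structure simply avoids the full scan, which is the advantage the answer above describes.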

When should you consider denormalization?

Denormalization is used when many tables are involved in retrieving data. It is done in the dimensional modeling used to construct a data warehouse; it is not usually done for the databases of transactional systems.

What is ERD?

Data models are tools used in analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC.

There are three basic elements in ER models:

Entities are the “things” about which we seek information.

Attributes are the data we collect about the entities.

Relationships provide the structure needed to draw information from multiple entities.

What is third normal form?

An entity is in the third normal form if it is in the second normal form and none of its attributes is transitively dependent on the primary key. Transitive dependence means that descriptor key attributes depend not only on the whole primary key, but also on other descriptor key attributes that, in turn, depend on the primary key. In SQL terms, the third normal form means that no column within a table is dependent on a descriptor column that, in turn, depends on the primary key.

For 3NF, the table must first be in 2NF; in addition, we want to make sure that the non-key fields depend ONLY on the PK, and not on other non-key fields, for their existence. This is very similar to 2NF, except that now you are comparing the non-key fields to OTHER non-key fields. After all, we know that the relationship to the PK is good, because we established that in 2NF.
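
A small illustration of removing a transitive dependency; the employee, zip code, and city data are hypothetical:

```python
# 2NF but not 3NF: emp_id -> zip_code -> city is transitive,
# so the city repeats for every employee sharing a zip code.
employees_2nf = [
    {"emp_id": 1, "name": "Asha",  "zip_code": "600001", "city": "Chennai"},
    {"emp_id": 2, "name": "Ravi",  "zip_code": "600001", "city": "Chennai"},
    {"emp_id": 3, "name": "Meena", "zip_code": "400001", "city": "Mumbai"},
]

# 3NF: move the zip -> city dependency into its own table.
employees_3nf = [{"emp_id": e["emp_id"], "name": e["name"],
                  "zip_code": e["zip_code"]} for e in employees_2nf]
zip_codes = {e["zip_code"]: e["city"] for e in employees_2nf}

# The city is now stored once per zip code, not once per employee.
print(zip_codes)  # {'600001': 'Chennai', '400001': 'Mumbai'}
```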

Why are recursive relationships bad? How do you resolve them?

Recursive relationships are an interesting and more complex concept than the relationships you have seen in the previous chapters, such as one-to-one, one-to-many, and many-to-many. A recursive relationship occurs when there is a relationship between an entity and itself. For example, a one-to-many recursive relationship occurs when an employee is the manager of other employees. The employee entity is related to itself, and there is a one-to-many relationship between one employee (the manager) and many other employees (the people who report to the manager). Because of the more complex nature of these relationships, we need slightly more complex methods of mapping them to a schema and displaying them in a stylesheet.
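
Mapping a recursive relationship to a schema can be sketched with a self-referencing foreign key and a recursive query; this uses sqlite3 with invented employee data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# The foreign key manager_id points back at the same table:
# that is the recursive relationship mapped to a schema.
cur.execute("""
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT,
    manager_id INTEGER REFERENCES employee(emp_id)  -- self-reference
)""")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, "Priya", None),   # top manager
                 (2, "Arun", 1),
                 (4, "Dev", 2)])

# Walking up the hierarchy needs a recursive query.
chain = [r[0] for r in cur.execute("""
WITH RECURSIVE mgmt(emp_id, name, manager_id) AS (
    SELECT emp_id, name, manager_id FROM employee WHERE emp_id = 4
    UNION ALL
    SELECT e.emp_id, e.name, e.manager_id
    FROM employee e JOIN mgmt m ON e.emp_id = m.manager_id
)
SELECT name FROM mgmt""")]
print(chain)  # ['Dev', 'Arun', 'Priya']
```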

What is an artificial (derived) primary key? When should it be used?

Using a name as the primary key violates the principle of stability. The social security number might be a valid choice, but a foreign employee might not have a social security number. This is a case where a derived, rather than a natural, primary key is appropriate. A derived key is an artificial key that you create. A natural key is one that is already part of the database.
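
A minimal sketch of a derived (surrogate) key in sqlite3, where an INTEGER PRIMARY KEY is auto-generated and stays stable even when natural attributes are missing; the columns are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# emp_key is a derived key: a meaningless generated integer,
# stable even when the natural candidate key (ssn) is absent.
cur.execute("""
CREATE TABLE employee (
    emp_key INTEGER PRIMARY KEY,  -- sqlite auto-generates this rowid alias
    name    TEXT NOT NULL,
    ssn     TEXT                  -- natural candidate key, may be NULL
)""")
cur.execute("INSERT INTO employee (name, ssn) VALUES ('Hans', NULL)")
cur.execute("INSERT INTO employee (name, ssn) VALUES ('Maya', '123-45-6789')")

keys = [r[0] for r in
        cur.execute("SELECT emp_key FROM employee ORDER BY emp_key")]
print(keys)  # [1, 2] - generated, independent of the natural attributes
```

The foreign employee without a social security number still gets a valid, stable primary key.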

What is second normal form?

An entity is in the second normal form if all of its attributes depend on the whole (primary) key. In relational terms, every column in a table must be functionally dependent on the whole primary key of that table. Functional dependency indicates that a link exists between the values in two different columns.

If the value of an attribute depends on a column, the value of the attribute must change if the value in the column changes. The attribute is a function of the column. The following explanations make this more specific:

If the table has a one-column primary key, the attribute must depend on that key.

If the table has a composite primary key, the attribute must depend on the values in all its columns taken as a whole, not on one or some of them.

If the attribute also depends on other columns, they must be columns of a candidate key; that is, columns that are unique in every row.

If you do not convert your model to the second normal form, you risk data redundancy and difficulty in changing data. To convert first-normal-form tables to second-normal-form tables, remove columns that are not dependent on the primary key.   
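
A small illustration of the 2NF conversion: with a composite key (order_id, product_id), product_name depends on only part of the key, so it moves to its own table. The data is invented:

```python
# 1NF table with composite key (order_id, product_id);
# product_name depends only on product_id - a partial dependency.
order_lines_1nf = [
    {"order_id": 1, "product_id": "P1", "product_name": "Rice", "qty": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Oil",  "qty": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Rice", "qty": 5},
]

# 2NF: remove the column not dependent on the whole key,
# and keep it in a product table keyed by product_id alone.
order_lines_2nf = [{k: row[k] for k in ("order_id", "product_id", "qty")}
                   for row in order_lines_1nf]
products = {row["product_id"]: row["product_name"]
            for row in order_lines_1nf}

print(products)  # {'P1': 'Rice', 'P2': 'Oil'}
```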

Is this statement TRUE or FALSE: all databases must be in third normal form?

In general, organizational databases are normalized to 3NF in order to remove redundancy and allow efficient access. A database can also be created without normalization. Hence it is not mandatory that a database be in 3NF.

What is Conceptual Data Modeling?

Conceptual data model includes all major entities and relationships and does not contain much detailed level of information about attributes and is often used in the INITIAL PLANNING PHASE.

A conceptual data model is created by gathering business requirements from various sources such as business documents, discussions with functional teams, business analysts, subject matter experts, and the end users who do the reporting on the database. Data modelers create the conceptual data model and forward it to the functional team for review.

Conceptual Data Model – Highlights

CDM is the first step in constructing a data model in top-down approach and is a clear and accurate visual representation of the business of an organization.

CDM visualizes the overall structure of the database and provides high-level information about the subject areas or data structures of an organization.

CDM discussion starts with main subject area of an organization and then all the major entities of each subject area are discussed in detail.

CDM comprises entity types and relationships. The relationships between the subject areas and the relationships between the entities in a subject area are drawn with a symbolic notation (IDEF1X or IE). In a data model, cardinality represents the relationship between two entities, i.e. a one-to-one, one-to-many, or many-to-many relationship between the entities.

CDM contains data structures that have not been implemented in the database.

In CDM discussions, the technical as well as the non-technical team project their ideas for building a sound logical data model.

What is Enterprise Data Modeling?

The development of a common consistent view and understanding of data elements and their relationships across the enterprise is referred to as Enterprise Data Modeling. This type of data modeling provides access to information scattered throughout an enterprise under the control of different divisions or departments with different databases and data models.

Enterprise Data Modeling is sometimes called the global business model; the entire information about the enterprise is captured in the form of entities.

Data Model Highlights

When an enterprise logical data model is transformed to a physical data model, supertypes and subtypes may not carry over as is; i.e. the logical and physical structures of supertypes and subtypes may be entirely different. The data modeler has to adjust them according to the physical and reporting requirements.

When an enterprise logical data model is transformed to a physical data model, the lengths of table names, column names, etc. may exceed the maximum number of characters allowed by the database. So the data modeler has to edit them manually and change the physical names according to the database's or organization's standards.

One of the important things to note is the standardization of the data model. Since the same attribute may be present in several entities, attribute names and data types should be standardized, and a conformed dimension should be used to connect to the same attribute present in several tables.

Standard Abbreviation document is a must so that all data structure names would be consistent across the data model.

Logical vs Physical Data Model?

When a data modeler works with a client, his title may be logical data modeler, physical data modeler, or a combination of both. A logical data modeler designs the data model to suit business requirements, creates and maintains lookup data, compares versions of the data model, maintains the change log, and generates reports from the data model, whereas a physical data modeler also has to know the properties of the source and target databases.

A physical data modeler should have the technical know-how to create data models from existing databases and to tune the data models with referential integrity, alternate keys, and indexes, and should know how to match indexes to SQL code. It is also good if the physical data modeler knows about replication, clustering, and so on.

The differences between a logical data model and a physical data model are shown below.

Logical vs Physical Data Modeling

LDM: Represents business information and defines business rules | PDM: Represents the physical implementation of the model in a database

LDM: Entity | PDM: Table

LDM: Attribute | PDM: Column

LDM: Primary Key | PDM: Primary Key Constraint

LDM: Alternate Key | PDM: Unique Constraint or Unique Index

LDM: Inversion Key Entry | PDM: Non-Unique Index

LDM: Rule | PDM: Check Constraint, Default Value

LDM: Relationship | PDM: Foreign Key

LDM: Definition | PDM: Comment

Relational vs Dimensional?

Relational data modeling is used in OLTP systems, which are transaction oriented, while dimensional data modeling is used in OLAP systems, which are analytical. In a data warehouse environment, the staging area is designed on OLTP concepts, since data has to be normalized, cleansed, and profiled before being loaded into a data warehouse or data mart. In an OLTP environment, lookups are stored as independent, detailed tables, whereas in an OLAP environment such as a data warehouse these independent tables are merged into a single dimension.

Relational vs Dimensional

RDM: Data is stored in an RDBMS | DDM: Data is stored in an RDBMS or a multidimensional database

RDM: Tables are the units of storage | DDM: Cubes are the units of storage

RDM: Data is normalized and optimized for OLTP processing | DDM: Data is denormalized, used in data warehouses and data marts, and optimized for OLAP

RDM: Several tables and chains of relationships among them | DDM: Few tables; fact tables are connected to dimension tables

RDM: Volatile (several updates) and time variant | DDM: Non-volatile and time invariant

RDM: Detailed transactional data | DDM: Summaries of bulky transactional data (aggregates and measures) used in business decisions

What is Data Modeling Development Cycle?

Gathering Business Requirements – First Phase: Data Modelers have to interact with business analysts to get the functional requirements and with end users to find out the reporting needs.

Conceptual Data Modeling(CDM) – Second Phase:

This data model includes all major entities, relationships and it will not contain much detail about attributes and is often used in the INITIAL PLANNING PHASE.

Logical Data Modeling(LDM) – Third Phase:

This is the actual implementation of a conceptual model in a logical data model. A logical data model is the version of the model that represents all of the business requirements of an organization.

Physical Data Modeling(PDM) – Fourth Phase:

This is a complete model that includes all required tables, columns, relationships, and database properties for the physical implementation of the database.

Database – Fifth Phase:

DBAs instruct the data modeling tool to create SQL code from the physical data model. The SQL code is then executed on the server to create the databases.

Standardization Needs in Data Modeling:

Several data modelers may work on different subject areas of a data model, and all of them should use the same naming conventions, definitions, and business rules.

Nowadays, business-to-business (B2B) transactions are quite common, and standardization helps in understanding the business better. Inconsistency across column names and definitions would create chaos across the business.

For example, when a data warehouse is designed, it may get data from several source systems and each source may have its own names, data types etc. These anomalies can be eliminated if a proper standardization is maintained across the organization.

Table Names Standardization:

Giving tables full names gives an idea of what the data is about. Generally, do not abbreviate table names; however, this may differ according to the organization's standards. If a table name's length exceeds the database's limits, then abbreviate it. Some general guidelines that may be used as a prefix or suffix for tables are listed below.

Examples:

Lookup – LKP – Used for Code, Type tables by which a fact table can be directly accessed.

e.g. Credit Card Type Lookup – CREDIT_CARD_TYPE_LKP

Fact – FCT – Used for transaction tables:

e.g. Credit Card Fact – CREDIT_CARD_FCT

Cross Reference – XREF – Tables that resolve many-to-many relationships.

e.g. Credit Card Member XREF – CREDIT_CARD_MEMBER_XREF

History – HIST – Tables that store history.

e.g. Credit Card Retired History – CREDIT_CARD_RETIRED_HIST

Statistics – STAT – Tables that store statistical information.

e.g. Credit Card Web Statistics – CREDIT_CARD_WEB_STAT

Column Names Standardization:

Some general guidelines are listed below that may be used as a prefix or suffix for the column.

Examples:

Key – KEY – System-generated surrogate key.

e.g. Credit Card Key – CRDT_CARD_KEY

Identifier – ID – Character column that is used as an identifier.

e.g. Credit Card Identifier – CRDT_CARD_ID

Code – CD – Numeric or alphanumeric column that is used as an identifying attribute.

e.g. State Code – ST_CD

Description – DESC – Description for a code, identifier or a key.

e.g. State Description – ST_DESC

Indicator – IND – to denote indicator columns.

e.g. Gender Indicator – GNDR_IND

Database Parameters Standardization:

Some general guidelines are listed below that may be used for other physical parameters.

Examples:

Index – IDX – for index names.

e.g. Credit Card Fact IDX 01 – CRDT_CARD_FCT_IDX01

Primary Key – PK – for Primary key constraint names.

e.g. Credit Card Fact PK01 – CRDT_CARD_FCT_PK01

Alternate Keys – AK – for Alternate key names.

e.g. Credit Card Fact AK01 – CRDT_CARD_FCT_AK01

Foreign Keys – FK – for Foreign key constraint names.

e.g. Credit Card Fact FK01 – CRDT_CARD_FCT_FK01
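
The naming standards above can be mechanized with a small helper; the abbreviation dictionary here is a hypothetical stand-in for an organization's Standard Abbreviation document:

```python
# Hypothetical abbreviation dictionary; a real one would come from
# the organization's Standard Abbreviation document.
ABBREV = {"credit": "CRDT", "card": "CARD", "description": "DESC",
          "identifier": "ID", "indicator": "IND", "state": "ST"}

def physical_name(logical_name, suffix=None):
    """Build a standardized physical name from a logical name."""
    words = [ABBREV.get(w.lower(), w.upper()) for w in logical_name.split()]
    if suffix:
        words.append(suffix)            # e.g. FCT, LKP, CD, PK01
    return "_".join(words)

print(physical_name("Credit Card", "FCT"))      # CRDT_CARD_FCT
print(physical_name("State", "CD"))             # ST_CD
print(physical_name("Credit Card Identifier"))  # CRDT_CARD_ID
```

A helper like this keeps every modeler on the project producing identical physical names from the same logical names.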

Steps to create a Data Model?

These are general guidelines for creating a standard data model; in practice, a data model may not be created in exactly the sequence shown below. Based on the enterprise's requirements, some of these steps may be excluded, or others included.

Sometimes, data modeler may be asked to develop a data model based on the existing database. In that situation, the data modeler has to reverse engineer the database and create a data model.

Get Business requirements.

Create High Level Conceptual Data Model.

Create Logical Data Model.

Select target DBMS where data modeling tool creates the physical schema.

Create standard abbreviation document according to business standard.

Create domain.

Create Entity and add definitions.

Create attribute and add definitions.

Based on the analysis, try to create surrogate keys, super types and sub types.

Assign datatype to attribute. If a domain is already present then the attribute should be attached to the domain.

Create primary or unique keys to attribute.

Create check constraint or default to attribute.

Create unique index or bitmap index to attribute.

Create foreign key relationship between entities.

Create Physical Data Model.

Add database properties to physical data model.

Create SQL Scripts from Physical Data Model and forward that to DBA.

Maintain Logical & Physical Data Model.

For each release (version of the data model), try to compare the present version with the previous version of the data model. Similarly, try to compare the data model with the database to find out the differences.

Create a change log document for differences between the current version and the previous version of the data model.

What is Data Modeler Role?

Business Requirement Analysis:

Interact with Business Analysts to get the functional requirements.

Interact with end users and find out the reporting needs.

Conduct interviews, brain storming discussions with project team to get additional requirements.

Gather accurate data by data analysis and functional analysis.

Development of data model:

Create standard abbreviation document for logical, physical and dimensional data models.

Create logical, physical and dimensional data models(data warehouse data modelling).

Document logical, physical and dimensional data models (data warehouse data modelling).

Reports:

Generate reports from the data model.

Review:

Review the data model with the functional and technical teams.

Creation of database:

Create SQL code from the data model and coordinate with the DBAs to create the database.

Check that the data models and databases are in sync.

Support & Maintenance:

Assist developers, ETL, BI team and end users to understand the data model.

Maintain change log for each data model.

Dimensional Data Modeling:

It is a modeling technique used in data warehousing systems, and it differs from the ER modeling technique used in OLTP systems. In dimensional modeling, tables are designed together with the aim of optimized query performance for decision support systems on relational databases.

Before going further into the details of dimensional data modeling, we should be aware of its key concepts:

Dimension: A dimension can be considered a category of information, for example the geographic dimension. It provides a way to slice and dice the data in a data warehouse.

Attribute: An attribute can be considered a characteristic of a dimension. For example, region is an attribute of the geographic dimension. A dimension attribute is a column in the dimension table.

Hierarchy: A hierarchy represents the relationships between the attributes of a dimension and defines the different levels within the dimension. For the geographic dimension it looks like:

Continent -> Country -> Region -> State -> City -> Street

Fact Table: A fact table represents a measure of interest; for an organization, product revenue is a measure of interest. The level of granularity is an important factor in designing fact tables: for example, revenue can be by continent or by country. In this example the fact table will have three columns: product, geographic region, and revenue.

Lookup Table: A lookup table represents a dimension and holds the attributes for that dimension. The lookup table for product will list all available products.
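
Granularity can be illustrated with a tiny fact set: rows stored at the (product, region) grain roll up to coarser grains by aggregation. The figures are invented:

```python
# Fact rows at the (product, region) grain; revenue figures
# are illustrative.
facts = [
    ("Rice", "Asia",   120),
    ("Rice", "Europe",  80),
    ("Oil",  "Asia",    50),
]

# Rolling up to the product grain: aggregate over the
# region dimension.
by_product = {}
for product, region, revenue in facts:
    by_product[product] = by_product.get(product, 0) + revenue

print(by_product)  # {'Rice': 200, 'Oil': 50}
```

The stored grain must be at least as fine as the finest question you want to answer; you can always aggregate up, never disaggregate down.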

Define starflake schema?

A starflake schema is a combination of a star schema and a snowflake schema. The hierarchies of star schemas are denormalized, while the hierarchies of snowflake schemas are normalized. Starflake schemas are normalized to remove redundancies in the dimensions.

Define snowflake schema?

In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity-relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables connected to multiple dimensions.

Define conceptual models?

A conceptual model is a representation of a system, made of the composition of concepts which are used to help people know, understand, or simulate a subject the model represents. Some models are physical objects; for example, a toy model which may be assembled, and may be made to work like the object it represents.

What is data mart?

Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing etc. stored in data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse.

What is ER model?

ER model or entity-relationship model is a particular methodology of data modeling wherein the goal of modeling is to normalize the data by reducing redundancy. This is different than dimensional modeling where the main goal is to improve the data retrieval mechanism.

What is dimensional modeling?

The dimensional model consists of dimension and fact tables. Fact tables store various transactional measurements and the foreign keys from the dimension tables that qualify the data. The goal of the dimensional model is not to achieve a high degree of normalization but to facilitate easy and fast data retrieval.

Ralph Kimball is one of the strongest proponents of this very popular data modeling technique which is often used in many enterprise level data warehouses.

If you want to read a quick and simple guide on dimensional modeling, please check our Guide to dimensional modeling.

What is dimension?

A dimension is something that qualifies a quantity (measure).

For example, consider this: if I just say "20kg", it does not mean anything. But if I say, "20kg of rice (product) was sold to Ramesh (customer) on 5th April (date)", then that makes meaningful sense. Product, customer, and date are dimensions that qualify the measure, 20kg.

Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.

What is Fact?

A fact is something that is quantifiable (Or measurable). Facts are typically (but not always) numerical values that can be aggregated.

What are additive, semi-additive and non-additive measures?

Non-additive Measures: Non-additive measures are those which cannot be used inside any numeric aggregation function (e.g. SUM(), AVG(), etc.). One example of a non-additive fact is any kind of ratio or percentage, such as a 5% profit margin or a revenue-to-asset ratio. Non-numerical data can also be a non-additive measure when stored in fact tables, e.g. varchar flags in the fact table.

Semi-additive Measures: Semi-additive measures are those to which only a subset of aggregation functions can be applied. Take account balance: a SUM() over balances does not give a useful result, but the MAX() or MIN() balance might be useful. Or consider a price rate or currency rate: SUM is meaningless on a rate; however, an average might be useful.

Additive Measures: Additive measures can be used with any aggregation function, like SUM(), AVG(), etc. An example is sales quantity.
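
The three kinds of measures can be demonstrated with a few invented numbers:

```python
# Daily closing account balances: a semi-additive measure.
balances = [100, 120, 90]          # Mon, Tue, Wed (illustrative)
# Units sold per day: an additive measure.
sales_qty = [5, 3, 7]

meaningless_total = sum(balances)  # 310 - does NOT mean "total money held"
useful_max = max(balances)         # 120 - peak balance is meaningful
useful_avg = sum(balances) / len(balances)  # average daily balance

total_qty = sum(sales_qty)         # 15 - fully additive, always valid
print(meaningless_total, useful_max, round(useful_avg, 2), total_qty)
```

A ratio like profit margin would be the non-additive case: neither SUM nor AVG over per-row margins yields the overall margin; it must be recomputed from its additive components.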

At this point, it is worth pausing to read the article "Classifying data for successful modeling", which helps you understand the differences between dimensional and factual data from a fundamental perspective.

What is junk dimension?

A junk dimension is a grouping of typically low-cardinality attributes (flags, indicators, etc.), so that they can be removed from other tables and collected into one abstract dimension table.

These junk dimension attributes might not be related to each other. The only purpose of the table is to store all the combinations of the dimensional attributes that you could not otherwise fit into the other dimension tables. Junk dimensions are also often used to implement rapidly changing dimensions in a data warehouse.
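
Building a junk dimension amounts to enumerating the cross product of the flag values; the flag names below are illustrative:

```python
from itertools import product

# Low-cardinality flags pulled out of the fact table into one
# junk dimension holding every combination (names are invented).
flags = {
    "is_gift":      ["Y", "N"],
    "payment_type": ["CASH", "CARD"],
    "is_promo":     ["Y", "N"],
}

# One row per combination, each with a surrogate junk_key.
junk_dim = [
    dict(zip(flags, combo), junk_key=i + 1)
    for i, combo in enumerate(product(*flags.values()))
]

# 2 * 2 * 2 = 8 rows replace three flag columns in the fact table;
# the fact table now carries a single junk_key foreign key.
print(len(junk_dim))   # 8
print(junk_dim[0])
```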
