Star Schema and Snowflake Schema in Data Warehousing
What is Star Schema?
- Star schema gives a very simple structure for storing data in the data warehouse. At the center of a star schema are one or more fact tables, which reference a series of dimension tables. To understand the star schema, it is important to understand fact tables and dimension tables in depth.
- The purpose of the star schema is to separate numerical “fact” data about the business from “dimensional”, or descriptive, data. Fact data is data in numerical format, such as weight, price, quantity, and speed. Dimensional data describes things such as model names, colors, employee names, and geographical locations, and it can also include numerical values.
- The fact data is organized in fact tables, and the dimensional data is organized in dimension tables.
Types of Fact tables:
Accumulating snapshot fact tables: accumulating snapshot fact tables track the progress of a business process that has a defined start and end, such as order fulfillment; a row is updated as the process reaches each milestone.
Transaction fact tables: transaction fact tables record one row per event, such as an individual stock sale.
Periodic snapshot fact tables: periodic snapshot fact tables record data that applies to a particular moment in time, such as month-end account statements.
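To make the difference between transaction and periodic snapshot fact tables concrete, here is a minimal Python sketch. It assumes a hypothetical list of account transactions (the tuple layout `account, date, amount` is illustrative) and rolls those one-row-per-event records up into one month-end balance row per account, which is the grain of a periodic snapshot.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction fact rows: one row per event (account, date, amount).
transactions = [
    ("A-1", date(2024, 1, 5), 100.0),
    ("A-1", date(2024, 1, 20), -30.0),
    ("A-1", date(2024, 2, 3), 50.0),
    ("B-7", date(2024, 1, 11), 200.0),
]

def month_end_snapshot(rows):
    """Build periodic snapshot rows: one month-end balance per (account, month)."""
    # Net change per account per month.
    net = defaultdict(float)
    for account, day, amount in rows:
        net[(account, day.year, day.month)] += amount
    # Walk the months in order per account, carrying the running balance forward.
    snapshot = []
    balance = defaultdict(float)
    for account, year, month in sorted(net):
        balance[account] += net[(account, year, month)]
        snapshot.append((account, year, month, balance[account]))
    return snapshot

for row in month_end_snapshot(transactions):
    print(row)
```

A production snapshot would usually also emit a row for months with no activity; this sketch only emits months that have at least one transaction, to keep it short.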
Types of Dimension Tables:
Employee dimension table: employee dimension tables contain information about salespeople and employees like employee phone number, address, email address, and so on.
Geography dimension table: geography dimension tables contain location and address information.
Range dimension table: range dimension tables contain information related to ranges of values for price, time, and so on.
Time dimension table: time dimension tables contain information that identifies the exact date, year, month, and time.
Customer dimension table: customer dimension table contains information like customer number, name, address, and so on.
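Time dimensions are usually generated rather than loaded from a source system. The Python sketch below builds a day-grain time dimension; the column set (`date_key`, `full_date`, `year`, `quarter`, `month`, `day`, `weekday`) and the function name `build_time_dimension` are illustrative assumptions, not a required layout.

```python
from datetime import date, timedelta

def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Integer surrogate key in YYYYMMDD form, a common convention.
            "date_key": current.year * 10000 + current.month * 100 + current.day,
            "full_date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day": current.day,
            # ISO weekday number: 1 = Monday ... 7 = Sunday (locale-independent).
            "weekday": current.isoweekday(),
        })
        current += timedelta(days=1)
    return rows

dim_time = build_time_dimension(date(2024, 3, 30), date(2024, 4, 2))
for row in dim_time:
    print(row["date_key"], row["quarter"], row["weekday"])
```

Precomputing attributes such as quarter and weekday lets reports filter and group on them directly instead of deriving them from raw dates in every query.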
In a star schema, each dimension is represented by only one dimension table.
A dimension table contains a set of attributes.
The fact table at the center contains a foreign key to every dimension table.
The fact table also contains measures such as units sold and dollars sold.
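The structure just described can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_date`, `dim_product`, `dim_store`, and so on) and the sample rows are hypothetical; the point is the shape: one fact table holding a foreign key to each dimension plus the numeric measures, and dimension tables that are not linked to each other.

```python
import sqlite3

# In-memory database for illustration; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- e.g. 20240115
    year     INTEGER,
    month    INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT,
    category    TEXT                -- denormalized: stored in every product row
);
CREATE TABLE dim_store (
    store_key INTEGER PRIMARY KEY,
    city      TEXT,
    region    TEXT
);
-- The fact table holds a foreign key to every dimension plus the measures.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
""")

conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240115, 2024, 1), (20240210, 2024, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Trail Shoe", "Footwear"), (2, "Rain Jacket", "Apparel")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)",
                 [(10, "Lyon", "South"), (11, "Lille", "North")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(20240115, 1, 10, 3, 270.0),
                  (20240115, 2, 11, 1, 120.0),
                  (20240210, 1, 11, 2, 180.0)])

print(conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0])
```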
Star schema characteristics:
Each dimension table is connected to the fact table by a foreign key.
Dimension tables are not joined to each other.
The schema is widely supported by BI tools.
The dimension tables are not normalized.
Star schema is very easy to understand.
Star schema uses more disk space than a normalized design, in exchange for simpler and faster queries.
Advantages of star schema:
Star schemas simplify writing business reports, such as period-over-period reports.
Star schemas can feed Online Analytical Processing (OLAP) systems.
Because all the data connects through a single fact table, the dimension tables can be treated as one large table of data, which makes queries easier to write.
Read-only queries run very fast, because every dimension is only one join away from the fact table.
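A period-over-period report on a star schema is a single "star join": the fact table is joined once to each dimension it needs, then grouped. The minimal sketch below uses Python's `sqlite3` with hypothetical tables `fact_sales`, `dim_date`, and `dim_store` to report revenue per region per month, so January and February can be compared side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    dollars_sold REAL
);
INSERT INTO dim_date  VALUES (20240115, 2024, 1), (20240210, 2024, 2);
INSERT INTO dim_store VALUES (10, 'South'), (11, 'North');
INSERT INTO fact_sales VALUES
    (20240115, 10, 270.0), (20240115, 11, 120.0),
    (20240210, 11, 180.0), (20240210, 10, 90.0);
""")

# Star join: each dimension is one hop from the fact table, so the report
# needs exactly one join per dimension and no dimension-to-dimension joins.
report = conn.execute("""
    SELECT d.year, d.month, s.region, SUM(f.dollars_sold) AS revenue
    FROM fact_sales AS f
    JOIN dim_date  AS d ON f.date_key  = d.date_key
    JOIN dim_store AS s ON f.store_key = s.store_key
    GROUP BY d.year, d.month, s.region
    ORDER BY d.year, d.month, s.region
""").fetchall()

for row in report:
    print(row)
```

Adding another dimension to the report, such as product, would add one more join from `fact_sales`, never a chain of joins.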
What is a Snowflake Schema?
Now that you are familiar with the star schema, you are ready to understand the snowflake schema. The snowflake schema normalizes the data that is denormalized in the star schema. This addresses the slower writes of the star schema, where a change to a repeated descriptive value must be applied to many rows, along with a few other problems associated with the star schema.
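The write slowdown can be seen by renaming a product category in both designs. In this minimal `sqlite3` sketch (all table names and rows are hypothetical), the star-style dimension repeats the category text in every product row, while the snowflake-style design stores it once in a separate category table that products reference by key. The `rowcount` of each `UPDATE` shows how many rows the change had to touch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style (denormalized) product dimension: category text repeated per row.
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
INSERT INTO dim_product_star VALUES
    (1, 'Trail Shoe', 'Footwear'), (2, 'Road Shoe', 'Footwear'), (3, 'Sandal', 'Footwear');

-- Snowflake-style: category moved into its own table and referenced by key.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product_snow (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
INSERT INTO dim_category VALUES (100, 'Footwear');
INSERT INTO dim_product_snow VALUES
    (1, 'Trail Shoe', 100), (2, 'Road Shoe', 100), (3, 'Sandal', 100);
""")

# Renaming the category touches every matching product row in the star version...
star_rows = conn.execute(
    "UPDATE dim_product_star SET category = 'Shoes' WHERE category = 'Footwear'"
).rowcount
# ...but only a single row in the snowflake version.
snow_rows = conn.execute(
    "UPDATE dim_category SET category = 'Shoes' WHERE category = 'Footwear'"
).rowcount

print(star_rows, snow_rows)  # 3 1
```

With three products the difference is trivial, but in a dimension with millions of rows the star-style update grows with the number of rows that carry the value, while the snowflake-style update stays at one row.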
The snowflake schema is also a multidimensional structure. As in the star schema, the fact table connects to the dimension tables; the difference is that in a snowflake schema the dimension tables are divided into sub-dimension tables, which creates a snowflake pattern.
The snowflake schema normalizes each dimension connected to the fact table by moving “low cardinality” attributes (attributes with few distinct values, such as a product category) into separate tables, splitting the dimension into multiple tables until it is fully normalized.
Hierarchies are divided into different tables.
Very complex database design.
Normalized data structure.
Low data redundancy.
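Splitting a hierarchy into separate tables also changes how queries are written. In this minimal `sqlite3` sketch, a hypothetical product hierarchy is snowflaked into `dim_product`, `dim_category`, and `dim_department`. Reporting revenue by department and category now requires walking the chain of joins from the fact table, whereas a star-style `dim_product` holding all three levels as columns would need just one join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product hierarchy: product -> category -> department.
CREATE TABLE dim_department (department_key INTEGER PRIMARY KEY, department TEXT);
CREATE TABLE dim_category (
    category_key   INTEGER PRIMARY KEY,
    category       TEXT,
    department_key INTEGER REFERENCES dim_department(department_key)
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    dollars_sold REAL
);
INSERT INTO dim_department VALUES (1, 'Outdoor');
INSERT INTO dim_category   VALUES (10, 'Footwear', 1), (11, 'Apparel', 1);
INSERT INTO dim_product    VALUES (100, 'Trail Shoe', 10), (101, 'Rain Jacket', 11);
INSERT INTO fact_sales     VALUES (100, 270.0), (101, 120.0), (100, 180.0);
""")

# Three joins are needed to reach the department level of the hierarchy.
rows = conn.execute("""
    SELECT dep.department, c.category, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_product    AS p   ON f.product_key    = p.product_key
    JOIN dim_category   AS c   ON p.category_key   = c.category_key
    JOIN dim_department AS dep ON c.department_key = dep.department_key
    GROUP BY dep.department, c.category
    ORDER BY c.category
""").fetchall()

print(rows)
```

Each extra join is cheap on small lookup tables, but across many dimensions and large volumes the additional joins are one reason snowflake queries tend to be slower and harder to write.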
Snowflake Schema Characteristics:
The biggest advantage of the snowflake schema is that it uses less disk space.
Query performance is reduced because queries must join more tables.
More maintenance effort is required because there are more lookup tables.
Advantages of Snowflake Schema:
Some OLAP database tools that data scientists use for data modeling and analysis are designed mainly to work with snowflake schemas.
Normalizing data that would typically be denormalized in a star schema can greatly reduce disk space requirements. This is because long, repeated strings of non-numerical data are replaced with numerical keys, which are far less taxing from a storage viewpoint.
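The storage argument can be checked with rough arithmetic. The Python sketch below compares storing a text attribute directly in every row (star style) against storing a fixed-size integer key per row plus one copy of each distinct value in a lookup table (snowflake style). It assumes 4-byte integer keys, counts UTF-8 bytes only, and ignores per-row overhead, indexes, and compression, so the numbers are an illustration rather than a measurement of any particular database; the function name `estimate_bytes` and the sample values are hypothetical.

```python
def estimate_bytes(column_values, key_bytes=4):
    """Rough byte counts for one text column, denormalized vs. normalized."""
    # Star style: the full text is stored in every row.
    denormalized = sum(len(v.encode("utf-8")) for v in column_values)
    # Snowflake style: each row stores a fixed-size key, and the lookup table
    # stores one (key, text) pair per distinct value.
    distinct = set(column_values)
    lookup = sum(key_bytes + len(v.encode("utf-8")) for v in distinct)
    normalized = len(column_values) * key_bytes + lookup
    return denormalized, normalized

# Hypothetical category column for one million product rows.
column = ["Outdoor Recreation Equipment"] * 600_000 + ["Home and Garden"] * 400_000
print(estimate_bytes(column))
```

Here the 28-byte and 15-byte labels shrink to 4-byte keys, so the column drops from about 22.8 MB to about 4 MB. The savings grow with longer labels and fewer distinct values, which is why low-cardinality attributes are the usual candidates for snowflaking.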
Star Schema vs Snowflake Schema:
Let us see the difference between star schema and snowflake schema:
| Aspect               | Star schema        | Snowflake schema |
|----------------------|--------------------|------------------|
| Design approach      | Top-down           | Bottom-up        |
| Query execution time | Significantly less | More             |
| Design complexity    | Very simple        | Complex          |
| Data redundancy      | High               | Low              |
| Foreign keys         | Fewer              | More             |
Star and snowflake schemas are alike at heart: a central fact table surrounded by dimension tables. The difference lies within the dimensions themselves. In a star schema every logical dimension is denormalized into one table, while in a snowflake schema at least some of the dimensions are normalized.