Slowly Changing Dimensions Explained: A Straightforward Guide for Beginners
Introduction to Slowly Changing Dimensions
Slowly Changing Dimensions (SCDs) are a crucial concept in data warehousing, helping to track and preserve historical changes in dimension data. Since the core objective of a data warehouse is to enable historical analysis, it’s important to store the different states of data over time.
In data warehouses, we work with fact tables to record transactions and dimension tables to describe those facts. Data originates from operational databases and is transferred to the warehouse using Extract-Transform-Load (ETL) processes.
For instance, customer or product data are examples of dimensions. These attributes can evolve over time, and unlike operational systems, where changes overwrite existing data, in data warehousing, we must maintain the historical record for analytical purposes. SCDs allow us to manage such changes effectively.
What are Slowly Changing Dimensions?
There are several types of SCDs, each designed to handle changes in dimension data in different ways. Below is a summary of the most common SCD types:
  
| SCD Type | Description | 
|---|---|
| Type 0 | No changes are tracked; data remains static. | 
| Type 1 | Data is overwritten with the new value. | 
| Type 2 | Historical changes are recorded as new rows. | 
| Type 3 | Changes are tracked with new columns. | 
| Type 4 | A separate dimension table stores changes. | 
| Type 6 | A hybrid combining Type 2 and Type 3. | 
SCD Type 0: Static Data (No Change)
Type 0 SCDs involve keeping certain attributes unchanged throughout their lifecycle. This approach is used for data that should not evolve over time, such as an employee's initial position or start date. 
Example Employee Hiring Info:
  
    
  
    
Historical Table:
  
    
  
    
  
  
    
  
    
  
Example Employee Hiring Info:
When an employee joins a company, certain details like the joining date and initial job title are recorded. These fields are never updated, even if the employee gets promoted or changes departments. These fields remain static because they reflect historical facts that don't change over time.
For instance:
- Employee John joined the company as a Junior Developer on January 1, 2015.
- Even if he gets promoted to Senior Developer later, his joining date and initial job title (Junior Developer) remain the same in the data.
This approach helps in maintaining historical data for reporting purposes where the original details are crucial.
| Employee ID | Name | Hire Date | 
|---|---|---|
| 101 | John Smith | 2015-01-01 | 
SCD Type 1: Overwrite Data
In SCD Type 1, changes overwrite the existing data without retaining the historical version. This method is often used when the business doesn't require tracking changes over time.
  
    
After an update:
  
    
  
    
  
Example Customer Contact Information:
Suppose a business stores customer addresses, but only the most recent address is relevant for operational purposes. When a customer moves to a new address, the old address is overwritten with the new one, and the old data is not retained.
For instance:
- Customer Jane initially lives at 123 Main St..
- After moving to 456 Oak St., the address in the system is updated to the new address, and the old address (123 Main St.) is no longer stored in the database.
In this case, only the current address matters, and there's no need to track the customer's previous addresses.
| Employee ID | Name | Phone Number | 
|---|---|---|
| 101 | John Smith | 123-456-7890 | 
After an update:
| Employee ID | Name | Phone Number | 
|---|---|---|
| 101 | John Smith | 987-654-3210 | 
SCD Type 2: Storing Historical Changes with New Rows
SCD Type 2 is the most commonly used method in data warehouses, as it allows full tracking of historical changes. This type creates new rows for each change, preserving both the old and new values.Example Employee Job History:
When an employee's job position changes, you create a new record for each change to keep track of their career progression. The old record remains in the database with details about the previous position, while a new record is added for the current role.
For instance:
- Employee John starts as a Junior Developer on January 1, 2015. This record has a StartDate and an EndDate.
- After a promotion to Senior Developer on January 1, 2018, a new record is added with the new job title, StartDate for the promotion, and the old record's EndDate is updated to reflect when the junior developer role ended.
| Employee ID | Job Title | Start Date | End Date | IsCurrent | 
|---|---|---|---|---|
| 101 | Junior Developer | 2015-01-01 | 2018-01-01 | No | 
| 101 | Senior Developer | 2018-01-01 | NULL | Yes | 
In this way, both the historical and current job positions are tracked, allowing reports to show the employee's job history over time.
SCD Type 3: Storing Limited History with New Columns
In SCD Type 3, only a limited history is tracked using new columns in the same row. This method is useful when you need to store only the most recent change alongside the current value.
For instance:
  
    
  
  
- In an employee job history scenario, if an employee changes job titles, you might want to keep their current job title as well as the previous one.
- This allows for easy access to the last job title without maintaining an entire history of job changes.
Example Employee Job History:
| Employee ID | Current Job Title | Previous Job Title | Start Date | 
|---|---|---|---|
| 101 | Senior Developer | Junior Developer | 2018-01-01 | 
SCD Type 4: Using a Separate Table for Historical Data
SCD Type 4 creates a separate historical data table to manage changes efficiently. This is particularly beneficial when the main dimension table can become overloaded with too many historical records.
For instance:
- In an employee job history case, the main employee table might only contain current information, while a separate historical table stores all previous job titles and their corresponding dates.
- This separation keeps the main table concise while still providing access to historical data.
Example Employee Job History Table:
Main Table:
| Employee ID | Job Title | Start Date | Is Current | 
|---|---|---|---|
| 101 | Senior Developer | 2018-01-01 | Yes | 
Historical Table:
| Employee ID | Job Title | Start Date | End Date | 
|---|---|---|---|
| 101 | Junior Developer | 2015-01-01 | 2018-01-01 | 
SCD Type 5: Mini-Dimension
SCD Type 5 combines features from SCD Types 1, 2, 3, and 4. It keeps the current data in a main table while using a separate mini-table to store historical changes. This way, the main table remains clean, but historical data is still available for analysis.
For Instance:
Imagine a retail company that needs to track customer roles over time. With SCD Type 5, they can keep current roles in one table and maintain historical roles in another.
Example Table SCD Type 5:
Main Dimension Table (Current Data)
| CustomerID | Name | CurrentRole | 
|---|---|---|
| 101 | John Smith | Manager | 
| 102 | Bob Johnson | Senior Analyst | 
Mini-Dimension Table (Historical Data)
| CustomerID | Role | StartDate | EndDate | 
|---|---|---|---|
| 101 | Team Lead | 2023-01-01 | 2024-10-01 | 
| 101 | Analyst | 2022-01-01 | 2023-01-01 | 
| 102 | Junior Analyst | 2022-05-01 | 2024-10-01 | 
Summary
- Main Table: Holds only the latest role of customers.
- Mini-Table: Tracks all historical roles with start and end dates.
SCD Type 6: Combining Types 2 and 3
SCD Type 6 combines the features of Type 2 and Type 3, enabling flexible historical analysis. This type allows for both row-level and column-level tracking of changes, accommodating comprehensive data needs.
For instance: 
- In an employee job history situation, the table would store the current job title, the previous job title, and the historical start date of the previous position.
- This approach enables you to see both the latest and a limited history without creating an overload of rows.
Example Employee Job History:
| Employee ID | Current Job Title | Previous Job Title | Previous Start Date | Start Date | 
|---|---|---|---|---|
| 101 | Senior Developer | Junior Developer | 2015-01-01 | 2018-01-01 | 
These examples illustrate how different SCD types manage historical data in various ways while maintaining the essential information needed for analysis and reporting. Adjust the styling as needed to fit your design preferences!
  Conclusion
Slowly Changing Dimensions (SCDs) play a crucial role in data warehouses by enabling accurate tracking and analysis of changes over time. Each SCD type is designed for specific needs, with Type 2 and Type 6 being the most widely utilized for maintaining comprehensive historical records. By comprehending these SCD types, organizations can effectively manage the evolution of their data while retaining a historical perspective.
SCDs are vital for monitoring the progression of dimension data in a data warehouse. The selection of an appropriate SCD type is influenced by the organization's requirements for preserving and analyzing historical data.

Comments
Post a Comment