Slowly Changing Dimensions in Data warehouse

Slowly Changing Dimensions in Data warehouse

Slowly changing dimensions (SCD) are the dimensions in which the data changes slowly, rather than changing regularly on a time basis or in a regular schedule. For instance, the cost of a product will change by time to time, people change their names for some personal reason. These are a few example of SCD since some changes are happening to them over a period of time.

Slowly Changing Dimensions problem:

Prakash is a valuable customer of the IGuddy Inc.; he is now living in US. So the original entry in the customer table will have the following records,

 

Customer Id Name current_country

101

Prakash US

At a later date he moved back to his home country India. Now how should IGuddy Inc., now modify its customer table to reflect this change? This is the Slowly Changing Dimension Problem.

We can solve this problem in three ways using three different types of SCD as below.

The SCD has been broadly classified into three categories based on whether we want to store data with no history, partial history or full history.

  • SCD Type1 – we do not store any history in SCD Type 1 tables.
  • SCD Type2 – We will store full history in SCD Type2 tables.
  • SCD Type3 – We will store only partial history in SCD Type3 tables.

SCD Type1 – we do not store any history in SCD Type 1 tables

In Type1 SCD the new information simply overwrites the original information. In other words, we are not maintaining any history for these changes.

After Prakash moved to India, The new information replaces the original information and we have the following customer table.

 

Customer Id Name current_country

101

Prakash India

Advantage:

This is the easiest way to handle the SCD problem, since there is no need to keep track of historical data.

Disadvantage:

Here we are not keep track of history, So it is not possible to get the information like where Prakash lived in before comes to India.

SCD Type2 – We will store full history in SCD Type2 tables

In Type2 SCD a new record will be inserted to the table to represent the new information. So that both the old and new information will be present in the table. We can track the history of the customer at any point of time, since we are maintaining history for all the changes.

In order to design the above tables as SCD Type 2 we will have to add 3 more columns to the table, effective_from_date, effective_to_date, is_latest_y_n. These columns are called SCD Type 2 Meta columns.

After Prakash moved to India, The new information will be inserted into the table as a new records with changes in meta columns and we have the following customer table.

 

Customer Id Name current_country effective_from_date effective_to_date is_latest_y_n

101

Prakash US

10-Jan-12

31-12-9999

N

101

Prakash India

5-Feb-13

10-Jan-12

Y

The above three Meta columns help us to store the change history, and is_latest_y_n flag helps to identify the latest record.

Advantages:

We can easily keep track of historical information at any time

Disadvantages:

The size of the table will grow fast.

SCD Type3 – We will store only partial history in SCD Type3 tables

In Type3 SCD a new column will be added to track the latest changes.

After Prakash moved to India, The new information will be inserted into a separate column in the table and we have the following customer table.

 

Customer Id Name current_country old country

101

Prakash India US

Advantage:

We will keep track of partial historical data

Disadvantage:

If the record is changed more than one then it will be difficult to keep track of previous historical data, since we are tracking only the latest changes.