While working in Michigan (Ann Arbor), I needed to explain how, or where to start a star schema dimensional model. A previous employee had created a data warehouse in third normal form. He had also started working on a data mart intended to de-normalize the 3NF data warehouse. I had been asked by a team of dba’s, app dev, data scientists, and middle managers where to start the star schema. In their case, they had three main data sources completed in the data warehouse with many more data sources needed to be developed. The 3 data sources are as follows:
- HR_Data (Human Resources)
- AD_Data (Active Directory)
- ERP_Data (Enterprise Resource Planning)
To answer their questions, I reminded them about what they already knew and understood. I reminded them about the fact that the data mart was already de-normalized for us and all we need to do was identify what facts and dimensions were required by doing business interviews in order to gather requirements and user stories. Everyone was in agreement, but I could sense that they wanted more information.
My approach was to model an example from the 3 Enterprise Data Sources listed above. The star schema (dimensional model) that I wanted to illustrate contained 1 fact and 3 dimensions. In the diagrams below, these are the color keys:
In my mind, my 1st step was to look a these data sources and understand what data they held. I thought to myself, the HR Data would provide the data for the employee dimension. I continued to think about the employee dimension and I decided that there was also employee attributes I could obtain from the AD Data. I thought about the AD Data some more, and I discovered that I could also model an Organization dimension (not in the example diagrams). So at this point, I had 1 new dimension, dimEmpoyee
Next, I started to think about the ERP Data. The ERP Data was an enterprise project module. This data became my second dimension, dimProject. In both of the diagrams (above and below), I included business keys (BK) which are stored with the dimension so that we can update and maintain the related fact attributes.
My mind then went to the fact table. While I looked at the ERP data in the Data Mart, I noticed the many columns that we can count, aggregate. I realized that the ERP data was also a source for project fact information. I should note, most dates (not dates used as attributes only) should be stored in your fact table. Additive columns such as nuts, bolts, and washers were included in the fact. The green shaded columns in the diagram below are the surrogate keys loaded from their associated dimension such as start and end date and Project Lead name.
Once I had these 2 dimensions and 1 fact, I noticed that I was missing a date dimension. The fact table above, indicates that we have a start date and an end date. Notice that both of these dates in the diagram above have _SK_FK appended to the end of the StartDate, or EndDate. This represents that the value stored is the Surrogate Key (SK) defined in the related dimension table. The Foreign Key (FK) is indicated to represent that this is a foreign key to the surrogate key stored in the dimension. In my production, I would just use the _FK. I wanted to include the SK_FK to clarify to new readers that we are loading the SK from the dimension in the fact table.
The date dimension looks similar to below:
Now, if we look at all of the pieces above, we have a dimensional model plan that looks like below:
Finally, we use ER/Studio Data Architect to create a Dimensional Data Model (star schema).
Next Blog Topic: Modeling with ER/Studio Data Architect | The Dimensional Model Plan
Next Blog Topic: Modeling with ER/Studio Data Architect | Plan and Model Slowly Changing Dimensions