One of my favorite projects as a BI consultant is the Data Warehouse Checkup. This is an opportunity for me to roll up my sleeves and get hands on some data. I love shining a light in the dark corners of the data warehouse (DW), bringing all the dark secrets to the light where we can get ahead of problems before they get out of hand.
Let’s start with my checklist.
- Find or build a dimensional bus matrix to document relationships between fact and dimension tables.
- Extend the dimensional bus matrix to group dimension tables by the five W’s (Who, What, Where, When, Why)
- Extend the dimensional bus matrix to link each fact tables with their associated business processes
- Review or create documentation of the granularity of each fact table.
- Audit data and note any unusual gaps, patterns and outliers.
- Review performance of
- Top 25 slowest queries
- Top 25 queries by cpu and disk io (input output)
- Are there a lot of ad hoc queries that use more than one fact table?
- Are all the fact tables transactional fact tables? Would the reporting systems benefit from the addition of some accumulating snapshot fact tables or periodic snapshot fact tables?
- Are the fact tables exclusively made up of numeric values that link to your dimension tables and the things they are measuring, or is their “data warehouse” really just a bunch of report driven tables?*
Usually, by the time I’ve worked through this list I have a good understanding of the data warehouse and at least a couple hidden issues were brought to light. From here it’s fairly straight forward (but not always easy) to work through any problems or opportunities for improvement that have been discovered and bring several positive changes to the stability and performance of the data warehouse.
* Before I go, I just can’t resist elaborating on that last bullet point in the list above. This is something that fascinates me. I’ve lost track of the number of databases I have looked at that were described as dimensional data warehouses, but were in fact not following any of the commonly accepted design practices for these systems. The first place I usually see this issue present itself is in the fact table. When you look at a fact table and you find several text fields (like first name, last name, business address …) you know you might have a problem. If not a problem, at least a conflict between what you’ve found and how it has been described.
I could go on and on about this problem, but fortunately the Angry DBA (Grant Fritchey) has beat me to it. Check out his blog post on the cargo cult data professional. I’ve often thought of these databases as scarecrow data warehouses. They look legit from a distance, but the closer you get the scarier they are. The good news is, even a scarecrow can be cleaned up and put to work. The list above is a great start to make that happen.