|Published (Last):||5 August 2009|
|PDF File Size:||1.53 Mb|
|ePub File Size:||2.89 Mb|
|Price:||Free* [*Free Regsitration Required]|
Data Warehouse Design — Inmon versus Kimball Data Warehouse Design — Inmon versus Kimball Published: Author Sakthi Rangarajan Introduction We are living in the age of a data revolution, and more corporations are realizing that to lead—or in some cases, to survive—they need to harness their data wealth effectively.
The data warehouse, due to its unique proposition as the integrated enterprise repository of data, is playing an even more important role in this situation. There are two prominent architecture styles practiced today to build a data warehouse: the Inmon architecture and the Kimball architecture. This paper attempts to compare and contrast the pros and cons of each architecture style and to recommend which style to pursue based on certain factors.
Background In terms of how to architect the data warehouse, there are two distinctive schools of thought: the Inmon method and Kimball method. They both view the data warehouse as the central data repository for the enterprise, primarily serve enterprise reporting needs, and they both use ETL to load the data warehouse.
The key distinction is how the data structures are modeled, loaded, and stored in the data warehouse. This difference in the architecture impacts the initial delivery time of the data warehouse and the ability to accommodate future changes in the ETL design.
When a data architect is asked to design and implement a data warehouse from the ground up, what architecture style should he or she choose to build the data warehouse? The Inmon Approach The Inmon approach to building a data warehouse begins with the corporate data model.
This model identifies the key subject areas, and most importantly, the key entities the business operates with and cares about, like customer, product, vendor, etc. From this model, a detailed logical model is created for each major entity.
For example, a logical model will be built for Customer with all the details related to that entity. There could be ten different entities under Customer. All the details including business keys, attributes, dependencies, participation, and relationships will be captured in the detailed logical model. The key point here is that the entity structure is built in normalized form. Data redundancy is avoided as much as possible. This leads to clear identification of business concepts and avoids data update anomalies.
The next step is building the physical model. The physical implementation of the data warehouse is also normalized. This normalized model makes loading the data less complex, but using this structure for querying is hard as it involves many tables and joins.
So, Inmon suggests building data marts specific for departments. The data marts will be designed specifically for Finance, Sales, etc. Any data that comes into the data warehouse is integrated, and the data warehouse is the only source of data for the different data marts. This ensures that the integrity and consistency of data is kept intact across the organization.
Figure 1. Data update anomalies are avoided because of very low redundancy. This makes ETL process easier and less prone to failure.
The business processes can be understood easily, as the logical model represents the detailed business entities. Very flexible — As the business requirements change or source data changes, it is easy to update the data warehouse as one thing is in only one place. Can handle varied reporting needs across the enterprise. Here are some of the disadvantages of Inmon method: The model and implementation can become complex over time as it involves more tables and joins.
Need resources who are experts in data modeling and of the business itself. These type of resources can be hard to find and are often expensive. The initial set-up and delivery will take more time, and management needs to be aware of this. More ETL work is needed as the data marts are built from the data warehouse.
A fairly large team of specialists need to be around to successfully manage the environment Breslin, The Kimball Approach The Kimball approach to building the data warehouse starts with identifying the key business processes and the key business questions that the data warehouse needs to answer.
The key sources operational systems of data for the data warehouse are analyzed and documented. ETL software is used to bring data from all the different sources and load into a staging area. From here, data is loaded into a dimensional model. Here the comes the key difference: the model proposed by Kimball for data warehousing—the dimensional model—is not normalized. The fundamental concept of dimensional modeling is the star schema. In the star schema, there is typically a fact table surrounded by many dimensions.
The fact table has all the measures that are relevant to the subject area, and it also has the foreign keys from the different dimensions that surround the fact. The dimensions are denormalized completely so that the user can drill up and drill down without joining to another table. Multiple star schemas will be built to satisfy different reporting requirements.
So, how is integration achieved in the dimensional model? The key dimensions, like customer and product, that are shared across the different facts will be built once and be used by all the facts Kimball et al. This ensures that one thing or concept is used the same way across the facts.
This is the document where the different facts are listed vertically and the conformed dimensions are listed horizontally. Where ever the dimensions play a foreign key role in the fact, it is marked in the document. This serves as an anchoring document showing how the star schemas are built and what is left to build in the data warehouse. The star schema can be easily understood by the business users and is easy to use for reporting.
Most BI tools work well with star schema. The foot print of the data warehousing environment is small;it occupies less space in the database and it makes the management of the system fairly easier. The performance of the star schema model is very good. This is known to be a very effective database operation. A small team of developers and architects is enough to keep the data warehouse performing effectively Breslin, Works really well for department-wise metrics and KPI tracking, as the data marts are geared towards department-wise or business process-wise reporting.
Drill-across, where a BI tool goes across multiple star schemas to generate a report can be successfully accomplished using conformed dimensions. Redundant data can cause data update anomalies over time. Adding columns to the fact table can cause performance issues.
This is because the fact tables are designed to be very deep. If new columns are to be added, the size of the fact table becomes much larger and will not perform well. This makes the dimensional model hard to change as the business requirements change. Cannot handle all the enterprise reporting needs because the model is oriented towards business processes rather than the enterprise as a whole. Integration of legacy data into the data warehouse can be a complex process. Deciding Factors Now that we have seen the pros and cons of the Kimball and Inmon approaches, a question arises.
Which approach should be used when? This question is faced by data warehouse architects every time they start building a data warehouse. Here are the deciding factors that can help an architect choose between the two: Reporting Requirements — If the reporting requirements are strategic and enterprise-wide and integrated reporting is needed, then Inmon works best.
Project Urgency — If the organization has enough time to wait for the first delivery of the data warehouse say 4 to 9 months , then Inmon approach can be followed. If there is very little time for the data warehouse to be up and running say, 2 to 3 months then the Kimball approach is best Breslin, Future Staffing Plan — If the company can afford to have a large sized team of specialists to maintain the data warehouse, then the Inmon method can be pursued.
If the future plan for the team is to be thin, then Kimball is more suited. Frequency of Changes — If the reporting requirements are expected to change more rapidly and the source systems are known to be volatile, then the Inmon approach works better, as it is more flexible. If the requirements and source systems are relatively stable, the Kimball method can be used.
Organization Culture — If the sponsors of the data warehouse and the managers of the firm understand the value proposition of the data warehouse and are willing to accept long-lasting value from the data warehouse investment, the Inmon approach is better. If the sponsors do not care about the concepts but want a solution to get better at reporting, then the Kimball approach is enough.
Conclusion It has been proven that both the Inmon and Kimball approach work for successfully delivering data warehouses. In a hybrid model, the data warehouse is built using the Inmon model, and on top of the integrated data warehouse, the business process oriented data marts are built using the star schema for reporting. We cannot generalize and say that one approach is better than the other; they both have their advantages and disadvantages, and they both work fine in different scenarios.
The architect has to select an approach for the data warehouse depending on the different factors; a few key ones were identified in this paper. References Breslin, Mary. Accessed May 22, Inmon, W.
Building the Data Warehouse, Fourth Edition. Marakas, George M. Prentice Hall, Accessed May 23, Accessed May 26, Accessed May 25,
Data Warehouse and Business Intelligence Toolkit Books
They are discussed in detail in this section. Dimensional Data Model : Dimensional data model is commonly used in data warehousing systems. This section describes this modeling technique, and the two common schema types, star schema and snowflake schema. Slowly Changing Dimension : This is a common issue facing data warehousing practioners.
Data Warehousing Concepts
Data Warehouse Design — Inmon versus Kimball Data Warehouse Design — Inmon versus Kimball Published: Author Sakthi Rangarajan Introduction We are living in the age of a data revolution, and more corporations are realizing that to lead—or in some cases, to survive—they need to harness their data wealth effectively. The data warehouse, due to its unique proposition as the integrated enterprise repository of data, is playing an even more important role in this situation. There are two prominent architecture styles practiced today to build a data warehouse: the Inmon architecture and the Kimball architecture. This paper attempts to compare and contrast the pros and cons of each architecture style and to recommend which style to pursue based on certain factors.