The “Data Warehouse” terminology was introduced in 1990 by Bill Inmon, recognized as the Dad of it. Years late, Ralph Kimball got this wave to describe it technically deeper. These are the main names you will always see as the “masters” of the Data Warehouse.
Data Warehouse. Powerful name! OK. And that means?
Reading some classical books, it’s possible to pick some different ways to define that term by them:
A collection of data, oriented by subject, integrated, time-variant, and volatile, to support the decision process management.Bill Inmon
The data source for organization consultancy, or nothing more than the data union of all existent Data Marts. […] A copy of transaction data, structured for consultancy and analysis.Ralph Kimball
As possible to see, Kimball was used to be more technical in your description than Inmon. It’s useful to know it if you are interested to read one of the existing DW books.
Our focus here is not talking about books and get much theory in detail but present that topic a little more summarized. There are some basics characteristics that define if a system is a Data warehouse or not:
- Supply corporate and organizational data.
- Has consistency data, promoting a “same point of view” for everyone that consults it.
- It’s organized by subjects, called too Data Marts.
- The stored data can be sliced and combined with business characteristics.
- It’s not only to store data, but make possible consult, analyze, and present information to support decisions.
- It’s the place where important and relevant data from the organization is stored, to be used as a data propagation area.
- Possibility to integrate data from different sources, consolidating and adjusting to being available in the same place, with the same format and standard regulations.
A formal and classical Data Warehouse should something similar to the image below, of course changing according to the used technologies.
- First, the data is extracted from the existing and desired data sources to the staging areas, the first stage the data is in in a DW.
- Them this data is transferred and transformed by some ETL process (know more in the previous Business Intelligence post), doing data cleaning, format adjustments, and as necessary, business calculations. At the end of that step, usually, this data will be stored in ODS (Operational Data Storage) objects, which means, a common relational database.
- After the ODS consolidation, it’s time to bring that to an OLAP server area, where commonly will be stored in cubes*¹ in some dedicated Data Mart.
cubes*¹ technology is being transformed due to new models possibility, specifically because of in-memory databases and cloud computing services, but that is something we should talk in the next post. Anyway, the very classic cubes model should be used for the next years yet due to the necessary investment to get new capabilities.