<Nico?/>

Technology, HelloWorld() and much++✌

What about OLAP and Cubes?

OLAP, OLTP, Cubes, but before we go ahead, I assume you have read the previous post or you already know what is a Data Warehouse. If not, read it first by clicking here. Now, as part of the Data Warehouse universe, let’s talk about cubes and OLAP!

What OLAP means?

The OLAP (Online Analytical Processing) term was used for the first time by Edgar Frank Codd. For the ones that have studied at university about relational database has listened about him as the “father” of the relational database. According to his understanding, OLAP is a technology category of software that allows getting data information quickly, consistency, and interactively, that were transformed from some raw data to reflect the company dimension, according to the user needs. Ralph Kimball, the ones from Data Warehouse definition, define a little differently and shortly: Consultancy representation of textual and numeric data of a Data Warehouse.

To define OLAP applications, Codd have defined 12 rules:

  1. Multi-dimensional conceptual view of the database
  2. Transparency concept
  3. Accessibility concept
  4. Consistent reporting performance
  5. Client-server architecture
  6. Generic dimensionality
  7. Dynamic sparse matrix handling
  8. Multi-user support
  9. Unrestricted cross-dimensional operations
  10. Intuitive data manipulation
  11. Flexible reporting
  12. Unlimited dimensions and aggregation levels

When we talk about OLAP, we should talk about OLTP too. According to Microsoft documentation, OLTP (or Online Transaction Processing) “systems record business interactions as they occur in the day-to-day operation of the organization, and support querying of this data to make inferences”. Below you can see the main differences between then:

OLTPOLAP
TypeTransactional SystemAnalytical System
CharacteristicsFocus in a high number of transactionsFocus in a large volumes of data and complex queries
Granularity DetailedAggregated
Query typesSimple standard queriesComplex and flexible queries
Database OperationsINSERT, UPDATE, DELETE and SELECTMost of SELECT commands to aggregate data for reporting
Response timeMilliseconds for the simple amount of dataSeconds, minutes, or hours depending on the amount of data to process and load
SourceTransactionsAggregated data from transactions
PurposeControl of ongoing business operations in real-timePlan, give direction, support decisions, discover insights, look for past data
Data updatesShort, fast updates commonly initiated by the userData periodically refreshed with scheduled, long-running batch jobs
Space requirementsGenerally small if historical data is archivedGenerally large due to aggregating large datasets
Backup and recoveryRegular backups required to ensure business continuity and meet legal and governance requirementsLost data can be reloaded from OLTP database as needed in lieu of regular backups
ProductivitySupport the productivity of end-usersSupport the productivity of business managers, data analysts, and executives
Database designNormalized databases for efficiencyDenormalized databases for analysis
Data behavior Constantly changeable (with restrictions)Not changeable (with restrictions)

In general, one of the classics measures to provide OLAP data reporting is storing data in Cubes objects, due to performance and database flexibility.

Cubes and terminology

  • a Cube can be considered a multi-dimensional generalization of a set of data organized in a different matrix that can fits together for some desired result, promoting flexible and fast analysis possibilities.
  • Cubes are composed by Dimensions: a group of members (characteristics) that converges all together with the same topic, in the same face.
  • Measures are part of Fact tables, that are special dimensions used to store all calculable values, it means, SUM, AVERAGE, MAX, MIN, COUNT, etc.

To get it a little more simple, let’s check the example of a multidimensional data set product sales around the world along with them some time.

  • Dimensions: Product, Date, and Country
  • Members (respective dimensions):
    • TV, PC, VCR
    • 1Qrt, 2Qrt, 3Qrt, 4Qrt
    • China, India, Japan
  • Measures would be each piece of that cubes, where the more you pick up pieces, the more you aggregate your data.

Even it’s named and represented as a cube, actually from the relational database scheme it’s is commonly designed by a star schema, where it’s possible to see clearly all dimensions connected through the same Fact table:

Entenda o esquema em estrela e a importância para o Power BI ...

Star-Schema

The star schema have two types of tables:

  • Facts: Measures of a process, generally numeric and agreeable (e.g. sales amount), or contable quantitatives. It have characteristics connetions with the another tables (dimmensions). In a star schema there is only one fact table.
  • Dimensions: These are formed by relevant caracteristics as when, where, what, how, from, to, etc. (e.g. customer, supplier, partner, data-time, adress). Usually, a star-scheme is based in multiple dimension , all of them connected through the fact table.

Advantages

  • It makes the query selection easy, scripted, or note, it’s much more simple work in that model.
  • Good performance for reading command.
  • Self-explainable model, simple to understand or explain,

Disadvantages

  • It’s not a model to receive constants inserts/updates, due to not normalized relations. It needs a specific load process that can take some considerable time, normally out of business time.
  • Once it’s modeled, it’s not so flexible for different business understanding, due to it’s constructed based and focused on some business report need.

Author

Nícolas Estevan Padovani MartinsBI System Analyst, Computer Engineer & Developer
“Networking it’s not about how many connections you have, but the much you are really connected to them. - Let’s learn, let’s share.”

Leave a Reply

Your email address will not be published. Required fields are marked *