OLAP Cube, ROLAP, MOLAP, HOLAP: How to use Stores

8. Juli 2010, 22:20

The built in Offline Stores in instantOLAP allow to aggregate the classical cubes fully or partwise into light speed local data stores. With Offline Stores, you can realize all modern forms of OLAP storage: ROLAP (direct access to relational databases when querying data), MOLAP (fully aggregated cubes with high performance) or HOLAP (a hybrid mixture of both).

[toc]

Classical ROLAP Cubes

In instantOLAP, every Cube is by default a ROLAP cube. This is because we believe creating a new cube is very productive with adhoc cubes: You can add new dimensions to cubes or modify mappings and then instantly test the new results without any aggregation process (which may take time).

After you finished the first production quality version of your cube it is time to test its performance and to improve it if neccesary. This is when Offline Stores come into sight.

Building MOLAP Cubes

Creating a new Store

Since version 2.6 building offline stores is very easy and comfortable: Use the context menu of the cube to add a new Offline Store to it. Then you must decide, which dimensions you want to add to the cube and how you want them to be loaded.

After you added the store, a new column will appear right to the mappings. In this new column you can define, which dimensions and facts you want to be added to the new Store.

Now it’s time to create the most simple form of a store: Add every dimension and fact to the store by clicking into the related cells of the column. A checkmark will show that the dimension or fact is now part of the Store.

Filling the Store with the Model Manager

Now you should open the Model Manager and fill the Store with data. Locate the Model in the Manager, open the Model Information dialog for „demo/demo“ by double clicking the model and switch to the Cubes tab in the dialog.

Here you can see the new Store which replaced its source cube in the Cubes tab. The most important information for Stores is the „State“ columns which shows if the Store is currently available (online), not available (offline) or building (online + building or offline + building).

Because our cube is not loaded yet, its state is „offline“. To load the Store, select it and use the „Rebuild“ button from the toolbar at the right side. Then wait a few seconds and use the „Refresh“ button from the toolbar to view the current state.

Now the store is loaded and the execution time of the demo reports will shrink from seconds to milliseconds. There are a couple of new informations in the table: The build time, the error message for the case the build process stopped with an error, the number of raised warning while loading the data and the Store size.

The Store size, the last reload timestamp and the build time are simple informations from the loading process. The build time show how long it took to load and aggregate the source data and the size shows the effective (compressed) size of the Store. This is not the number of cells the Store can deliver but the number of informations instantOLAP wrote to the local disk. In this oexample, the Store needs 73,863 records to deliver data for 151,632,000 cells, a compression of 2000:1!

The error message and the warnings are very important informations for the case your cube delivers no or wrong data or didn’t load at all.

How to change the aggregation scheme for dimensions and facts

There are two different ways how to aggregate the values for higher levels of a dimension: By doing a rollup or by querying each single level.

Rolling up means that the system will only query the lowest level and single mapped keys from the database and calculate every upper level manually. Querying the levels means the system will perform one SQL query per level.

Be careful with the query scheme, because usually your Store contains more than one dimension. E.g. if you have three dimensions with three levels, this will perform 27 queries to the database. Ten dimensions with three levels means 118.000 queries, one for each possible combination of each dimension level!

But why should you use the query scheme? There are some reasons you might need this:

One of your facts uses a COUNT DISTINCT aggregation, and this cannot be calculate by performing a rollup in some cases
Your hierarchy cannot be calculated by rolling up data because a key value is not the SUM, AVG or else of its children
Sometimes it can be faster to perform many small queries than a single, large one

Also for facts you can change the aggregation scheme, but here you can change the aggregation formula the system uses. The default value is „default“ – then the Store automatically determines the aggregation type from the SQL expression you mapped the fact with. But can also change it manually to one of the aggregations „SUM“, „AVG“, „MIN“, „MAX“ etc. This is needed whenever the system is unable to automatically determine the aggregation from the SQL expression or simply want something different.

Resolving errors in Stores

There are some pitfalls when loading the data of a Cube into a Store. The Store might raise an error or collect warning while its building process. Whenever an error is raised you should, of course, fixed it and reload the Store again. But even warning should always be fixed, because they can be a reason for wrong data. But why can a Store raise errors or warnings when original cube works correct?

There two main reasons and they are not always easy to understand:

In ROLAP cubes, the cube will of course only load data for the requested dimensions and facts. Which also means it will only use the tables containing information for this dimension and facts. But if you combined, for example, two different fact tables in one cube, the Store will perform one SQL query to gather both facts – and combining two fact tables means a full permutaion of both tables, the numbers get exponentially higher. Maybe with the ROLAP version of your cube you didn’t see the error, because you always used only one of the facts, but the Store combines everything.
Your cube may deliver data for more keys than a dimension contains. E.g. if you build a product dimension and filter the dimension to load only yellow products, you must apply the same filter to the cube. Otherwise, the cube will load facts for keys which are not part of the dimension, warn you (in fact, this is 99.99% of all warning a cube will raise are „missing keys“) and drop the row. Dropping the data itself is not the problem, because you don’t want the cube to load data for unknown keys, but your original ROLAP does it! E.g. if the ROLAP cube is queried for the Amount of 2010 for all products, the ROLAP Cube will simply generate no WHERE statement for products, because you want all of them. But the store will calculate this value by rolling up all product data and the value will be less than the ROLAP value. This is not an error in the Store, it is originally located in the ROLAP Cube!

As you can see, the source for all problems with Stores is the ROLAP Cube itself. Therefore, you should always look for error messages and warnings and try to eliminate the problems in you cube.

Also important for warnings: You need to know how to find them, because the model information only shows the number of raised warnings but not the warnings itself. The warnings are sent to the log while building the Store, therefore you should open the log console before starting your first Cube aggregations. It is also important to switch of the session filter before building the Store, because this is a system process not related to any (or your) user session.

Combining multiple Stores

Our simple approach to load all dimensions into a single Store is good for small Cubes, but with a raising number of dimensions the Store size will grow exponentially (this is, because the number of cells in a Cube is the product of all dimension sizes).

Adding more and more dimensions to a MOLAP Cube will bring every OLAP system to its limit – including instantOLAP. We have a good experience with cubes up to one billion compressed datasets (Stores of this size can hold up to 5000 billion cells), but even if this number sounds very high it can easily be reached by combining 20 or more large dimensions. And aggregating this number of cells can take many hours.

Therefore, we have a different approach for this problem: Combining of multiple Stores. In practice, there are several combinations of dimensions which are used often and other combination which are used never or very seldom. This means, you can aggregate a ROLAP Cube into more than one Store, and each Store holds a smaller number of dimensions the original Cube contains.

In this example we combined two Stores, one holding the Campaing dimension, the other the Manufacturer dimension. The system will now automatically choose the correct Store whenever a user queries the cube.

Building HOLAP Cubes

Now you might say „Okay, but what happens if the user queries all dimensions together or an unknown combination?“. This is where HOLAP comes into play: The „H“ in HOLAP stands for hybrid – a mixture between offline MOLAP Stores and adhoc ROLAP Stores. You can create subset Stores and tell the system to use the orignial ROLAP Cube whenever an unknown combination of dimensions is queried. This means, you can deliver performant Stores but also a fallback solution for unmapped combinations.

How to turn an MOLAP Cube into an HOLAP Cube? There is a property for Cubes, „Keep Adhoc Cube With Stores“. Simply set the property to „true“ and redisplay the Cubes in the Model Information dialog. Now you will see the original ROLAP Cube behind your Stores. And, because the order of Cubes and Stores is important in instantOLAP, the system will use the ROLAP cube only, if none of the Stores can deliver the wanted data.

You can see this in the Model Information also: When switching the property „Keep Adhoc Cube With Stores“ to „true“, the original Cube appears behind the Stores. Simple but effective.

Refreshing Offline Stores

This is the last question: When will the system build or refresh my Stores? Of course, you don’t want to create the Stores manually after creating them. Therefore, there are two different ways to keep Stores up to date: Dimension triggered (automatic refresh) or time triggered (crontab refresh).

Automatic refresh

The automatic refresh will always start the build of a Store whenever it is offline or one of its source dimensions changed (in this case, the Store must be rebuild otherwise it would contain corrupted data). To turn the automatic refresh on, set the switch „Build Offline Store Immedialtey“ of the configuration / model to true.

Crontab refresh

The alternative to automatic refresh is to refresh a Store triggered by a cron pattern. This cron pattern, which is a property of the Stores, you can manually control when the Store should be rebuild. Whenever a source dimension of the Store changes, the Store will fall back into the ROLAP mode and the data will be queried from the underlying database.

Usually, it is more comfortable to use automatic refresh, but there are sometimes reasons to manually controll the cron pattern. E.g. if you don’t want to stress your source database with the large SQL queries at daytime or if a source database is not available at certain times.

Schlagwörter: HOLAP, HowTos, MOLAP, OLAP, OLAP Cube, ROLAP, Stores
Category: HowTos | Comment (RSS) | Trackback