Data Cube Storage Preparation and Use
This applies to: Managed Dashboards, Managed Reports
When you select warehouse or in-memory storage, you can build the storage on demand or schedule it to run periodically, such as during off-peak hours. You can also choose to perform incremental updates, retrieving only data added to the database since the last build, which can help large data cubes build faster.
Important: A data cube that contains public parameters or user-specific filters cannot be built as a warehouse or in-memory cube. A security hierarchy can be used instead of filtering by custom attributes.
Data Cube Preparation
The in-memory storage option in particular can require a lot of server memory, so if you choose this type, we strongly recommend using your data cube's Process Result transform to remove or hide any elements (columns, hierarchies, or measures) you do not need. This can also benefit the warehouse storage type by leaving out unnecessary data.
To completely remove columns from the data cube output, double-click the Process Result transform to open its configuration dialog, and then uncheck output columns that you don't need.
Alternatively, click Process Result to open the Data Cube Elements pop-up, where you can click the gray minus icon on the right to hide elements. Hidden elements are moved to the Hidden list below, where you can unhide them by clicking their plus icons. A hidden column is excluded from storage but still available for linking in hierarchy keys (for example, if you later need to use the column as part of a composite key when defining a hierarchy).
In-memory storage is also affected by the number of aggregators enabled on each measure. For example, edit a measure from your data cube and check its list of Supported Aggregators: by default, only Sum, Count, and Average are selected to conserve memory, but you can adjust this setting as needed.
Elements that describe or provide details about the items identified by another hierarchy can be selected as Attribute Hierarchies for that hierarchy. This reduces the amount of in-memory storage required, while also providing benefits when selecting data, such as disabling unwanted totals and expand/collapse functionality by default.
For example, with a ProductID that uniquely identifies each product, you could select various elements such as Color, Class, and Size as attribute hierarchies for ProductID that describe each product.
Attributes can also be set up when creating a predefined hierarchy as shown in Automatic joins and hierarchies.
Note: More details and tips for improving performance for in-memory cube storage are provided below.
Using Data Cube Storage
When editing a data cube that is checked out to you, click Storage Type in the toolbar and select the type you want.
The Warehouse storage type lets you store or cache data in the Symphony Warehouse database. The In-Memory storage type stores data in server memory for even greater potential performance gains on data analysis queries; it also stores its data in the Symphony Warehouse database so that it can be reloaded back into memory in the event of a server restart.
Check in Your Data Cube
Once you've chosen a storage type, you must Check In your data cube before you can build its storage or set up scheduling.
After checking in, you will be asked if you want to build the data cube. Click Ok to begin building the warehouse or in-memory storage, or click Cancel if you want to build it later as shown in the next section.
Next, you will be asked if you want to set up a schedule. Click Ok to open the Schedule Rule dialog, otherwise click Cancel and you can still choose to set up a schedule later.
Build the Data Cube
After checking in the data cube, you can also build its warehouse or in-memory storage on demand in any of these three ways:
When the data cube is open for editing but still checked in, click Warehouse in the toolbar and then Build Warehouse, or click In-Memory and then Build In-Memory.
Locate the data cube within the Data Cubes folder (navigate there from the Data option in the main menu or an Explore window), right-click the data cube, and choose Build Warehouse or Build In-Memory.
If you are a Symphony administrator, you can use the Managing Jobs or Manage Data Cubes screens to run the data cube storage job.
You'll see a message indicating the build has started. The time required to build the cube depends on the amount of data and other factors described elsewhere in this article.
You can check on the status of the build from the Data Cubes dialog, which you can access from Data Cube Status.
Important: If you check out the data cube later, its data will be displayed for you live until you undo your check out or check in again and re-build the data cube with your changes. (This does not affect other users, who can only see the checked-in version.)
Schedule Data Cube Storage
To open the Schedule Rule dialog for scheduling, open the data cube for editing while it is still checked in.
Click Warehouse in the toolbar and then Schedule Warehouse, or click In-Memory and then Schedule In-Memory.
In the dialog, choose from a wide variety of scheduling options to have the data cube automatically re-build its storage periodically, then click Submit at the bottom.
Administrators can also go to the Managing Jobs or Manage Data Cubes screens to schedule data cube builds.
Job Failure
From your profile, you can confirm whether data cube storage processing was successful for the data cubes you have access to.
Your administrator may have set up Configure Job Failure Emails so that someone is notified by email when processing a data cube's storage fails. This can be configured to let you choose which user to notify for each data cube that uses storage:
Click Data Cube Settings in the toolbar when editing a data cube that is checked out to you and is set to use storage.
If available in the dialog that opens, you can set Notified On Failure to the user who should receive these emails.
There is also a configuration setting that administrators can change to Automatically Retry Jobs, including for data cube storage.
Disable Memory Management
When using in-memory storage, once the storage exceeds a certain number of records, it begins compressing the records to maintain stable RAM usage. When records are accessed, they are decompressed, and then recompressed again after some time.
You can increase the performance of the in-memory data cube by preventing it from recompressing records.
Important: This setting allows for unlimited use of RAM. Using it with an insufficient amount of RAM can result in memory paging, which would likely lower the performance rather than improve it.
Click Data Cube Settings in the toolbar when editing an in-memory data cube that is checked out to you.
In the Data Cube Settings dialog, tick the check box under Disable Memory Management. Then click Submit.
The new setting takes effect the next time the data cube is built.
Important: As an alternative, a system administrator can change the number of records after which compression starts by editing the Record Block Size setting under Admin > Setup > Config.
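To picture the trade-off, here is a conceptual Python sketch (not the product's implementation; the block size value is hypothetical) of how compressing blocks of records keeps RAM usage stable at the cost of extra CPU work whenever records are accessed:

    import zlib

    RECORD_BLOCK_SIZE = 100_000  # hypothetical value, analogous to Record Block Size

    # Once a block of records fills up, it is compressed to keep RAM usage stable.
    block = ",".join("record-%d" % i for i in range(RECORD_BLOCK_SIZE)).encode()
    compressed = zlib.compress(block)  # uses less RAM, but compressing costs CPU time

    # Each access decompresses the block (and it is later recompressed); this
    # repeated work is what disabling memory management avoids, in exchange
    # for unbounded RAM usage.
    restored = zlib.decompress(compressed)
    assert restored == block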
Administrator Options
If you are a Symphony administrator, you can also access additional options that allow you to manage data cube storage for multiple data cubes at once, including running jobs, creating batched sequence jobs, changing storage types, scheduling, and automatically building storage for all referenced data cubes. Under Projects / File System, click Data Cubes to access these options. See Manage Projects and the File System.
You can also manage each data cube's storage job from the Jobs page in Administration as shown in Monitoring system health.
Limitations
The following limitations exist for in-memory storage:
In-memory storage does not support binary data types.
In-memory storage relies on the double numeric type, which may approximate rather than guarantee accuracy when a value has a large number of significant digits (e.g., many digits to the right of the decimal point), as illustrated below.
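For illustration, here is a minimal Python sketch of the behavior of IEEE 754 doubles (the type in question):

    # Doubles carry roughly 15-17 significant decimal digits, so values
    # requiring more precision are stored as approximations.
    print(0.1 + 0.2)         # 0.30000000000000004, not exactly 0.3
    print(0.1 + 0.2 == 0.3)  # False

    big = 12345678901234567890     # 20 significant digits
    print(int(float(big)) == big)  # False: a double cannot hold all the digits
    print(f"{float(big):.0f}")     # prints a nearby approximation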
Incremental Data Update
Incrementally building a data cube retrieves only the data added to the database since the last build, as tracked by one or more incremental element fields in your data, such as a date or a sequential ID. This can substantially decrease the time needed to build the data cube; however, it will not update or remove existing data.
Avoid incremental updates, or use them with caution, on a data cube where changes to existing records or rows are possible: records deleted in the source will remain in the cube, records updated with the same or a smaller incremental element value will not be refreshed, and records updated with a larger incremental element value can appear as new, separate records after the data cube is re-built.
Important: Incremental data updates are not available for data structures that do not allow parameters, such as Excel.
You can enable incremental data updates when using either the warehouse or in-memory storage type, and then build the data cubes manually or via a schedule, as described above.
Click Data Cube Settings in the toolbar when the data cube is checked out to you.
Select Perform Incremental Build, and then select the Incremental Elements to use to identify new records. Each time the data cube builds, it will only load new records whose selected values are greater than those from the previous build, as in the sketch below.
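Conceptually, each incremental build issues a query like the one in this simplified Python sketch (the table and column names are hypothetical, and this is not the product's actual query):

    # Hypothetical incremental element: a sequential OrderID column.
    last_max_order_id = 1000  # highest value loaded by the previous build

    # Only rows with a greater incremental element value are retrieved.
    query = f"SELECT * FROM Orders WHERE OrderID > {last_max_order_id}"
    print(query)

    # Rows deleted from Orders, or changed without increasing their OrderID,
    # are never revisited -- hence the cautions above.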
Configuration Notes
After you've built your data cube storage, you can check the storage type and status, review in-memory build performance, and work with your data as raw data.
Check the Storage Type and Status
If your data cube is checked in, you can identify its storage type from the Explore window or other similar file explorers.
Data cubes appear with a distinct icon indicating their storage type. The tooltip that appears when hovering over a data cube (or long-tapping it) also describes its storage type.
If available, the last build time is also displayed in this tooltip, and in the status bar when the data cube is open (not available for some data cubes on multi-tenant instances).
You can also view all the data cubes you have access to from the Data Cube Status, along with their storage types, schedules, and build statuses.
In-Memory Build Performance
The total time required to build in-memory storage for a particular data cube, and the amount of RAM required, can vary greatly depending on a range of factors. Estimating build times is therefore not as simple as saying that building X records takes Y minutes.
For example, memory consumption generally depends on:
How memory is allocated by .NET
The number of hierarchies and measures used in your data cube process (especially with distinct count)
How sparse your fact table is (end result of your data cube process)
Ideally, if the in-memory storage result fits entirely in RAM, the total build time is generally divided as follows on a typical 8-core CPU machine:
50% of time spent retrieving/loading data from your data sources
50% of time spent building the in-memory data structures
With a higher number of processing cores, the time to build the in-memory data structures can be reduced due to parallelization.
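As a rough, hypothetical illustration of that split and the effect of additional cores:

    # Hypothetical numbers only, following the 50/50 split described above.
    load_min = 20    # retrieving/loading data from sources
    build_min = 20   # building in-memory structures on 8 cores

    print("8 cores: ", load_min + build_min)           # 40 minutes total
    # Only the structure-building half shrinks with more cores:
    print("16 cores:", load_min + build_min * 8 / 16)  # ~30 minutes total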
Here are some general guidelines for optimizing in-memory builds:
De-select columns you don't need in your data cube as early in the process as possible.
Minimize the number of hierarchies, measures, and aggregators in your data cube.
Include columns as attributes of a hierarchy where appropriate instead of separate hierarchies.
Minimize the complexity of your data cube process (e.g., type and number of transforms used).
Use the Filter transform to restrict data to required ranges.
Raw Data
When the Data Retrieval Format on a metric set is set to Raw and <Row Number> is placed on Rows, the raw data may be retrieved directly from the original database, bypassing the data cube's storage.
See Adding More Data.
Optimize Storage Jobs
If you have a very large number of data cubes using warehouse or in-memory storage, you could reach a limit on the number of storage jobs that can run simultaneously, for reasons including the volume of data, database configuration, and your server's hardware. To avoid issues when building storage for many data cubes at once, try one or more of the following:
Avoid scheduling or manually running all of the storage jobs at the same time: stagger each job's run time to allow some or all of the previous jobs to complete, and/or use the batching feature to schedule a special sequence job that automatically builds each selected cube in sequence.
Increase the value of the max_connections setting in PostgreSQL for the warehouse database.
Change the Data Warehouse Connection String in your configuration settings to specify a higher Maximum Pool Size parameter value, as in the example below.
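For example, the two settings above might look like the following (illustrative names and values only; adjust them for your environment):

    # postgresql.conf on the warehouse database server
    max_connections = 300

    # Data Warehouse Connection String, with a hypothetical host and database
    Host=warehouse-host;Database=warehouse;Username=dbuser;Password=secret;Maximum Pool Size=200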