Data Reflections course - 7

Mar 21, 2022 22:36

Sorting
Both raw and aggregating reflections support sorting.

The sort option is useful for optimizing filters and range queries, especially on columns with high cardinality. When a reflection is sorted, Dremio can skip over large blocks of records that cannot match the filter during query execution.
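The block-skipping idea can be illustrated with a toy model. This is a minimal sketch, not Dremio's implementation: it assumes data is split into fixed-size blocks and that a min/max pair is kept for each block (the names `Block`, `make_blocks`, and `range_query` are hypothetical). A range filter only has to read blocks whose [min, max] overlaps the filter, and on a high-cardinality column that overlap is narrow only when the data is sorted.

```python
import random
from dataclasses import dataclass


@dataclass
class Block:
    """A fixed-size chunk of records plus hypothetical min/max statistics for it."""
    values: list[int]
    lo: int
    hi: int


def make_blocks(values: list[int], block_size: int) -> list[Block]:
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def range_query(blocks: list[Block], lo: int, hi: int) -> tuple[list[int], int]:
    """Return (matching values, blocks read), skipping blocks that cannot overlap [lo, hi]."""
    matches, scanned = [], 0
    for b in blocks:
        if b.hi < lo or b.lo > hi:
            continue  # min/max prove no record in this block can match
        scanned += 1
        matches.extend(v for v in b.values if lo <= v <= hi)
    return matches, scanned


random.seed(0)
data = random.sample(range(100_000), 100_000)  # high-cardinality column: all values distinct
unsorted_blocks = make_blocks(data, 1_000)
sorted_blocks = make_blocks(sorted(data), 1_000)

_, unsorted_scanned = range_query(unsorted_blocks, 50_000, 50_999)
_, sorted_scanned = range_query(sorted_blocks, 50_000, 50_999)
print(unsorted_scanned, sorted_scanned)
```

On the unsorted copy almost every block spans nearly the whole value range, so almost nothing is skipped; on the sorted copy the same filter touches a single block.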
Sorting on more than one column in a single reflection is typically not beneficial: it does not improve read performance, but it does increase the cost of reflection maintenance.
For workloads that need several different sort orders, consider creating multiple reflections, each sorted on a single column.
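A toy model of why a second sort column rarely helps. This sketch assumes per-block min/max statistics are used for skipping; the helpers `block_ranges` and `blocks_to_scan` and the two-column sample data are hypothetical. With a lexicographic sort on (a, b), column b is ordered only within runs of equal a, so nearly every block still spans almost the full range of b and a filter on b cannot skip it. A separate copy sorted on b alone prunes well.

```python
import random


def block_ranges(rows, key_index, block_size):
    """Per-block (min, max) of one column, as a reader could use for skipping."""
    ranges = []
    for start in range(0, len(rows), block_size):
        col = [r[key_index] for r in rows[start:start + block_size]]
        ranges.append((min(col), max(col)))
    return ranges


def blocks_to_scan(ranges, lo, hi):
    """Count blocks whose [min, max] overlaps the filter [lo, hi]."""
    return sum(1 for bmin, bmax in ranges if not (bmax < lo or bmin > hi))


random.seed(1)
# Two columns: a has 100 distinct values, b has 1000 distinct values.
rows = [(random.randrange(100), random.randrange(1000)) for _ in range(100_000)]

by_a_then_b = sorted(rows)                    # one copy sorted on (a, b)
by_b_only = sorted(rows, key=lambda r: r[1])  # separate copy sorted on b only

# Filter on column b only.
multi = blocks_to_scan(block_ranges(by_a_then_b, 1, 1_000), 500, 509)
single = blocks_to_scan(block_ranges(by_b_only, 1, 1_000), 500, 509)
print(multi, single)
```

The (a, b) copy still has to read nearly all of its blocks for a filter on b, while the copy sorted on b reads only a handful, which is the reasoning behind one sort column per reflection.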
Adding sorting or making other configuration changes to an existing reflection does not trigger a rebuild; the new configuration is applied at the next refresh. To force an immediate rebuild, move the reflection's slider to off and click Save, then reopen the reflection, move the slider back to on, and click Save again.

Partitioning
With partitioning, Dremio organizes the reflection's data into directories by partition key value. When a query filters on the partition key, Dremio applies partition pruning: rather than scanning every directory, it scans only the directories that correspond to the values in the filter, minimizing the amount of data read.
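A minimal sketch of partition pruning, assuming rows are dicts and each distinct value of the partition key maps to one "directory" (modeled here as a dict entry). The helpers `partition` and `scan` and the `sales` data are hypothetical. A filter on the key reads only the matching directories and never touches the others.

```python
from collections import defaultdict


def partition(rows, key):
    """Group rows into 'directories' keyed by the partition column's value."""
    dirs = defaultdict(list)
    for row in rows:
        dirs[row[key]].append(row)
    return dict(dirs)


def scan(dirs, wanted):
    """Read only the directories whose partition value is in `wanted` (partition pruning)."""
    read = [value for value in dirs if value in wanted]
    rows = [row for value in read for row in dirs[value]]
    return rows, len(read)


sales = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 20},
    {"region": "EU", "amount": 5},
    {"region": "APAC", "amount": 7},
]
dirs = partition(sales, "region")
rows, dirs_read = scan(dirs, {"EU"})
print(len(dirs), dirs_read, sum(r["amount"] for r in rows))  # 3 1 15
```

Of the three region directories, only the EU one is read to answer a filter on `region = 'EU'`.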

Use a low-cardinality field as your partition key. A high-cardinality key produces a very large number of partitions, resulting in reflections that are expensive to maintain and less effective than other techniques, such as sorting.
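To see why cardinality matters, a quick check can count how many partitions a candidate key would create and how many rows each would hold on average. This is a sketch with a made-up `orders` table and a hypothetical `partition_stats` helper; a unique key such as `order_id` produces one tiny partition per row, the expensive, low-benefit case described above.

```python
from collections import Counter


def partition_stats(rows, key):
    """Return (number of partitions, average rows per partition) for a candidate key."""
    counts = Counter(row[key] for row in rows)
    return len(counts), len(rows) / len(counts)


# Hypothetical table: 10,000 orders spread over 4 regions, each with a unique order_id.
orders = [
    {"order_id": i, "region": ("EU", "US", "APAC", "LATAM")[i % 4]}
    for i in range(10_000)
]

print(partition_stats(orders, "region"))    # (4, 2500.0)
print(partition_stats(orders, "order_id"))  # (10000, 1.0)
```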