Priority filter for faster livemodel

Tim · ‎07-09-2023

Summary: Live Model slow, can be fixed by prioritising a filter

I use a Live Model to return info about a few records from a big dataset. One filter should efficiently narrow the big dataset down. But the server sometimes applies other filters or joins first, making it unusably slow.

I want an option to tell the Live Model to fetch the important filter's items first, and then write those items into the filtering clause when writing the main query. That will cause the server to correctly discover that applying the important filter first is the most efficient query plan.

Details

(I'm querying SQL Server, but I suspect other sources would also benefit.)

Sisense documentation says "Analytical Engine inserts filtering clauses into the innermost SQL statement. This means that filtering is the first operation performed". Sometimes that helps performance. But it depends on the source server. If you apply multiple filters or joins, the server may decide to perform a less efficient one first, or it may parallelise. In my case, my query went from a few seconds to over an hour depending on the server's choice.

The problem is that the server can't estimate the cardinality of the "filtering clauses" entered by Sisense. For example: The server doesn't know that my "dimCustomer.[Customer Name] = 'Joe Bloggs'" filter returns only 100 rows from my fact table, whereas "dimDateTime.FiscalYear = 2022" returns 100,000,000 rows from the fact table.

There is a solution. Instead of "Where Fact.[Customer ID] In (Select [Customer ID] From dimCustomer Where [Customer Name] = 'Joe Bloggs')", Sisense should first run the query Select [Customer ID]... query to find that Joe's ID is 5. Then put the result into the filtering clause, i.e. "Where Fact.[Customer ID] IN (5)". For a hardcoded ID like that, the server can estimate the cardinality, so it sees that filtering on [Customer ID] first will be more efficient that other query plans.

Sisense mustn't just do that for all filters. If it tries prefetching "dimDateTime.FiscalYear = 2022", that'll return all 31,536,000 seconds, and then pass that back in the query as "Where Fact.DateTimeID IN (...)".

Solution: The solution I request is: Allow the Live Model developer to set a "Prefetch Filter Items" flag on a field. If it is set, when a widget or dashboard filters that field, then Sisense first creates a query "Select dimSomething.[Relationship column] From dimSomething.FieldWithPrioritisedFilter". Then, when building the main query, change the "filtering clauses" to be "Where Fact.[Relationship column] IN (<put prefetched values here>)".

Reference: For reference: https://docs.sisense.com/main/SisenseLinux/analytical-engine-overview.htm: Analytical Engine inserts filtering clauses into the innermost SQL statement. This means that filtering is the first operation performed, minimizing the number of rows that need to be scanned in order to process the query. Complex filters, such as top ranking and filter on measure are significantly improved as a result.

Proof of concept:
Build a Live Model.
FactA joins to DimB on ID_B and to DimC on ID_C.
The source is SQL Server.
In SQL Server, all three tables have millions of rows, and the FactA has foreign keys to DimB and DimC.
DimB.Name has a unique name for each row.
DimC.Type is 1 for half the rows, and 0 for half the rows.
Create a widget that filters for DimB.Name=<some name> and DimC.Type=1.
Use SQL Profiler to find the query your Live Model runs.
Run the query; it should take over a minute or time out. Then change the "Where FactA.ID_B IN (Select ID_B From DimB Where DimB.Name IN ('<some name>'))" to instead be "Where FactA.ID_B IN (<copy paste the ID of <some name> here>)". It should run much quicker. The original query might run quickly without change, depending on server settings; if so, mess about with more complicated joins or ask me. In my setting key queries were reliably slow.

Johan-Bekker · ‎01-23-2024

Hey @Tim, great explanation! Thanks for taking the time to explain it in such detail. We are facing the exact same problem; Sisense is prioritizing a filter key which forces the database to scan ALL micropartitions in our Snowflake database, while only 4 partitions are necessary and can be accessed by filtering on the 'right' filtering key first.

Next to me confirming the need for such a feature, did you happen to find a workaround for this? If so, a pointer in the right direction would be greatly appreciated.

Hope to hear from you!

Tim · ‎01-23-2024

Hi @Johan-Bekker ,

I'm glad I'm not alone! I did build a workaround. It's nasty and counterintuitive, but it worked: To make it faster, I query in a way that blocks some indexes/foreignkeys and allows others.

First, build a table function like this:

TableFunction(UniqueID) = Select * From OriginalView Where UniqueID.

Then in Sisense, query like this:

Select

OriginalView.Field1,

OriginalView.Field2,

TableFunction.Field3,

TableFunction.Field4,

etc.

From

OriginalView

Cross Apply TableFunction(OriginalView.UniqueID)

--(And Sisense will add a Where clause with your filters when it performs a query)

Fields that you select from OriginalView can use indexes. Take the fields that will reduce it to 4 partitions from OriginalView. Fields that you select from TableFunction can't use indexes. Take your other fields fields from here.

Result: Selecting from TableFunction can't use indexes to filter or foreign keys to join. So, the optimiser sees that it'll be very slow to join/filter TableFunction. So, it first applies all the filters you've given to OriginalView fields. Once it's got those results, it then calculates TableFunction for each UniqueID. That calculation is inefficient, but because you're already down to just a few rows/partitions, it'll run quickly.

Snowflake vs SQL Server:

I work on SQL Server; Snowflake might be different. In particular: I cross-apply to a Table Function, which is a way to join that prevents using foreign keys for the join and prevents using indexes to apply the filters from Sisense's Where clause. If Snowflake works differently, you might need to get creative (I came up with the table function idea because that's one of the "don't do this because it prevents indexes" tips for SQL Server; you might have to study how to write bad queries for Snowflake 😂...).

Another thing to look into might be "writing your own query plans"; that might be neater, but it sounded like even more hassle to learn, maintain, and teach others about than my index-blocking solution.

Sisense Community

Priority filter for faster livemodel