Product Feedback Forum
cancel
Showing results for 
Search instead for 
Did you mean: 
Status: New Idea

Summary: Live Model slow, can be fixed by prioritising a filter

I use a Live Model to return info about a few records from a big dataset. One filter should efficiently narrow the big dataset down. But the server sometimes applies other filters or joins first, making it unusably slow.

I want an option to tell the Live Model to fetch the important filter's items first, and then write those items into the filtering clause when writing the main query. That will cause the server to correctly discover that applying the important filter first is the most efficient query plan.

 

Details

(I'm querying SQL Server, but I suspect other sources would also benefit.)

Sisense documentation says "Analytical Engine inserts filtering clauses into the innermost SQL statement. This means that filtering is the first operation performed". Sometimes that helps performance. But it depends on the source server. If you apply multiple filters or joins, the server may decide to perform a less efficient one first, or it may parallelise. In my case, my query went from a few seconds to over an hour depending on the server's choice. 

The problem is that the server can't estimate the cardinality of the "filtering clauses" entered by Sisense. For example: The server doesn't know that my "dimCustomer.[Customer Name] = 'Joe Bloggs'" filter returns only 100 rows from my fact table, whereas "dimDateTime.FiscalYear = 2022" returns 100,000,000 rows from the fact table.

There is a solution. Instead of "Where Fact.[Customer ID] In (Select [Customer ID] From dimCustomer Where [Customer Name] = 'Joe Bloggs')", Sisense should first run the query Select [Customer ID]... query to find that Joe's ID is 5. Then put the result into the filtering clause, i.e. "Where Fact.[Customer ID] IN (5)". For a hardcoded ID like that, the server can estimate the cardinality, so it sees that filtering on [Customer ID] first will be more efficient that other query plans.

Sisense mustn't just do that for all filters. If it tries prefetching "dimDateTime.FiscalYear = 2022", that'll return all 31,536,000 seconds, and then pass that back in the query as "Where Fact.DateTimeID IN (...)". 

Solution: The solution I request is: Allow the Live Model developer to set a "Prefetch Filter Items" flag on a field. If it is set, when a widget or dashboard filters that field, then Sisense first creates a query "Select dimSomething.[Relationship column] From dimSomething.FieldWithPrioritisedFilter". Then, when building the main query, change the "filtering clauses" to be "Where Fact.[Relationship column] IN (<put prefetched values here>)".

Reference: For reference: https://docs.sisense.com/main/SisenseLinux/analytical-engine-overview.htmAnalytical Engine inserts filtering clauses into the innermost SQL statement. This means that filtering is the first operation performed, minimizing the number of rows that need to be scanned in order to process the query. Complex filters, such as top ranking and filter on measure are significantly improved as a result.

Proof of concept:
Build a Live Model.
FactA joins to DimB on ID_B and to DimC on ID_C.
The source is SQL Server.
In SQL Server, all three tables have millions of rows, and the FactA has foreign keys to DimB and DimC.
DimB.Name has a unique name for each row.
DimC.Type is 1 for half the rows, and 0 for half the rows.
Create a widget that filters for DimB.Name=<some name> and DimC.Type=1.
Use SQL Profiler to find the query your Live Model runs.
Run the query; it should take over a minute or time out. Then change the "Where FactA.ID_B IN (Select ID_B From DimB Where DimB.Name IN ('<some name>'))" to instead be "Where FactA.ID_B IN (<copy paste the ID of <some name> here>)". It should run much quicker. The original query might run quickly without change, depending on server settings; if so, mess about with more complicated joins or ask me. In my setting key queries were reliably slow.