Community_Admin
Community Team Member

Sisense Performance: A Billion Records in a Single Server

Browsing our site, speaking to our team, or reading about us in analyst reviews, you might have noticed that we dig technology here at Sisense. That’s why, when clients and prospects want to push the limits of both data complexity and data volume, we happily oblige. 
 
After asking how much data we would recommend hosting in a single Sisense server, one newly signed client (a prospect at the time) passed us one billion transactional records and three million dimensional records to host in a single Sisense node. That's 500 GB of data, to be tested with 100 concurrent users logging in and banging around on the server. We used a cloud machine with 32 CPU cores and 244 GB of RAM for the job, in line with our straightforward specs. We’ll cut to the chase and share the details from Load Impact below. 
 
Tested Setup
  • AWS Instance - r4.8xlarge (32 CPU cores, 244 GB RAM)
  • 100 Concurrent Users
  • 120 minutes
  • 38 Max Concurrent Queries
    • Sisense Concurrency is defined as querying within the same millisecond
  • 2 types of usage scenarios
    • 50% of users returned results from the entire billion-record dataset
    • 50% of users viewed a subset of the data, simulating use by clients who see only their own data
Conclusions
  • Query response time averaged 0.1 seconds and maxed at 3.1 seconds. This represents the time for Sisense to receive a query from the web application and return a result set to the client application.
  • The Sisense Elasticube RAM consumption remained stable at approximately 100 GB despite the 500 GB+ of data loaded onto the disk of the Elasticube Server.
  • The average CPU usage during the load test was approximately 10-20%. This is spread across all of the distinct CPU cores.
Performance Details 
We used a tool called logz.io to aggregate the server logs from the load test into KPIs, which we analyzed to gauge the load on the server and to estimate the impact in production. 
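As a rough illustration of what aggregating logs into KPIs can mean, the sketch below reduces a list of per-query durations to an average and a maximum response time. The sample durations and the `summarize` function are made up for illustration; they are not the logs from this test, and this is not a description of how logz.io works internally.

```python
from statistics import mean

def summarize(durations_s):
    """Reduce per-query durations (in seconds) to average and max KPIs."""
    if not durations_s:
        raise ValueError("no query durations to summarize")
    return {"avg_s": mean(durations_s), "max_s": max(durations_s)}

# Hypothetical sample values, NOT data from the load test.
sample = [0.05, 0.08, 0.12, 0.25, 3.1]
kpis = summarize(sample)
print(f"avg={kpis['avg_s']:.2f}s max={kpis['max_s']:.1f}s")
```

A single slow query pulls the maximum up sharply while barely moving the average over a large number of queries, which is why both figures are worth reporting.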
Here’s what those query performance results looked like across the two-hour test. To summarize, no query took longer than 3.1 seconds to return results to the web front end.

 
kb-13b2a897-ff2c-449a-8f1f-7bf037600c7f.png

When it comes to the server usage, we passed the test with flying colors as well. Our amazing in-chip technology was on full display - we hosted 500 GB of data without utilizing more than 128 GB of RAM. CPU utilization during query times never rose above 75% throughout the load test, and it averaged less than 20%.
 
kb-48594be6-44db-484f-b2a1-bffc7f1328fd.png

 

Methodology
We used a tool called Load Impact to create artificial users that log in and interact with dashboards to mimic production. That includes the following types of actions in Sisense: 
  • Loading a dashboard with nine widgets
  • Changing filter from one account to another and from one year to two years
  • Filtering by clicking on context from one chart to control the others
  • Drilling from country town to region-level data
  • Downloading a .csv of the information in a Sisense widget
  • Switching to another dashboard and repeating all the steps above
 
The two different user types (the two usage scenarios listed under Tested Setup) performed the same steps. One group, however, had a WHERE clause appended to all of their queries to limit their view to one of the seven customer accounts. This simulates the external, OEM use case, in which you deploy dashboards for your own clients to view.
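To make the difference between the two user groups concrete, here is a minimal sketch of the same aggregate query run with and without an account filter. It uses plain SQL against an in-memory SQLite database as a stand-in; the `purchases` table, its columns, the account names, and the `revenue_by_category` helper are all hypothetical and do not reflect the client's schema or how Sisense builds queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (account TEXT, category TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("acme", "planes", 300.0), ("acme", "trains", 80.0), ("globex", "planes", 450.0)],
)

def revenue_by_category(conn, account=None):
    """Total revenue per category; restricted to one account when given."""
    sql = "SELECT category, SUM(revenue) FROM purchases"
    params = ()
    if account is not None:
        # The appended WHERE clause limits the user to a single account.
        sql += " WHERE account = ?"
        params = (account,)
    sql += " GROUP BY category ORDER BY category"
    return conn.execute(sql, params).fetchall()

print(revenue_by_category(conn))          # first group: the whole dataset
print(revenue_by_category(conn, "acme"))  # second group: one account only
```

Passing the account as a bound parameter rather than formatting it into the SQL string keeps the filter safe against injection, which matters when the value comes from a logged-in user.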
Here is a visualization describing the usage pattern over the timeframe. Across the two hours on the x-axis, the number of virtual users (VUs) is displayed on the y-axis. As you can see, the number of users ramped up for 50 minutes, remained steady for 10 minutes, and then did the same thing during the second hour.

kb-fe0d4d9a-7352-4cb4-9154-b5f1ae9a96bc.png
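The ramp-and-hold pattern described above can be sketched as a small function. This is an approximation under stated assumptions: the ramp is treated as linear, the peak is taken to be the 100 concurrent users from the setup, and the second hour is assumed to restart from zero. The article does not spell out any of these details, and `vus_at` is a hypothetical helper, not part of Load Impact.

```python
def vus_at(minute, peak=100, ramp_min=50, hold_min=10):
    """Target virtual users at a given minute of a repeating ramp-and-hold cycle."""
    if minute < 0:
        raise ValueError("minute must be non-negative")
    pos = minute % (ramp_min + hold_min)     # position within the current cycle
    if pos < ramp_min:
        return round(peak * pos / ramp_min)  # assumed linear ramp-up
    return peak                              # steady plateau

# Sample a few points across the two-hour (120-minute) window.
for m in (0, 25, 50, 59, 60, 85, 115):
    print(m, vus_at(m))
```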
 
The concurrent number of queries over the two-hour test increased throughout testing, as shown below. In Sisense, concurrency represents two or more users initiating a query within the same millisecond.
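Under that definition, the peak concurrency of a test is the largest number of queries that started in any single millisecond. A minimal sketch of that calculation follows; the start times are invented for illustration and `max_concurrent_queries` is a hypothetical helper, not a Sisense function.

```python
from collections import Counter

def max_concurrent_queries(start_times_ms):
    """Return the largest number of queries that started in the same millisecond."""
    buckets = Counter(int(t) for t in start_times_ms)
    return max(buckets.values(), default=0)

# Hypothetical query start times, in milliseconds since the test began.
starts = [1000, 1000, 1000, 1001, 1500, 1500, 2750]
print(max_concurrent_queries(starts))
```

A result of 1 would mean that no two queries ever coincided. This is a much stricter measure than counting queries that merely overlap in time, so a figure such as 38 is harder to reach than it might first appear.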
Data Details
The data represented one billion purchases on a website, each with its own unique transaction ID. The purchases were split into three categories - planes, trains, and automobiles. Furthermore, the analysts wanted to kick the tires on Sisense’s ability to join large tables on demand. On user request, a three million record dimension table would join with that one billion record fact table to provide revenues from the fact table grouped by origin/destination combinations, contained in the dimension table. 
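The shape of that join can be sketched at toy scale. The example below uses an in-memory SQLite database purely as a stand-in; the table names, column names, and rows are hypothetical, and the real test ran inside a Sisense Elasticube, not SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE routes (route_id INTEGER PRIMARY KEY, origin TEXT, destination TEXT);
    CREATE TABLE purchases (transaction_id INTEGER PRIMARY KEY, route_id INTEGER,
                            category TEXT, revenue REAL);
""")
# Dimension table: one row per origin/destination combination.
conn.executemany("INSERT INTO routes VALUES (?, ?, ?)",
                 [(1, "NYC", "BOS"), (2, "NYC", "LAX")])
# Fact table: one row per purchase, each with a unique transaction ID.
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)",
                 [(10, 1, "trains", 120.0), (11, 1, "automobiles", 60.0),
                  (12, 2, "planes", 400.0)])

# Join fact to dimension and sum revenue per origin/destination pair.
rows = conn.execute("""
    SELECT r.origin, r.destination, SUM(p.revenue) AS revenue
    FROM purchases AS p
    JOIN routes AS r ON r.route_id = p.route_id
    GROUP BY r.origin, r.destination
    ORDER BY r.origin, r.destination
""").fetchall()
print(rows)
```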
 
The Elasticube looked like this:
kb-036e9a5c-0436-4b39-8ae5-f395a26ef4fc.png
 
Dashboard Details
At the end of the day, the client wanted dashboards that tracked revenues, bookings, and average revenues per booking across time, client types, and fee types. 
Here's one of the dashboards used during the testing:
 
kb-2f1293f2-1f3b-4742-af84-a92eb76061fb.png