Flattening JSON Objects In SQL Server
Question: When tables in SQL Server contain JSON objects in a column, you can use Sisense's import SQL query editor to flatten those objects into individual fields. For example, a table with a single JSON column can be turned into a table with one column per key (the before/after screenshots are not reproduced here).

Solution: SQL Server has native functions that can parse JSON objects and flatten them into individual fields. The code below uses JSON_VALUE, which extracts a scalar value from a JSON string. The first parameter takes an expression, typically the name of a variable or column that contains JSON text. The second parameter takes a JSON path that specifies the property to extract. The default path mode is lax mode, and the path uses the syntax '$.<keyname>'.

Below is the SQL used to flatten the JSON values in the sample data:

select SalesOrderNumber
, JSON_Value(JSONValue,'$.CustomerID') as CustomerID
, JSON_Value(JSONValue,'$.OrderDate') as OrderDate
, JSON_Value(JSONValue,'$.TotalDue') as TotalDue
, JSON_Value(JSONValue,'$.ShipMethodID') as ShipMethodID
, JSON_Value(JSONValue,'$.TerritoryID') as TerritoryID
, JSON_Value(JSONValue,'$.SalesPersonID') as SalesPersonID
from OrderHeaderJSON

Other useful functions for working with JSON in SQL Server:
- ISJSON (Transact-SQL) tests whether a string contains valid JSON.
- JSON_VALUE (Transact-SQL) extracts a scalar value from a JSON string.
- JSON_QUERY (Transact-SQL) extracts an object or an array from a JSON string.
- JSON_MODIFY (Transact-SQL) changes a value in a JSON string.
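If you need to flatten every key at once, or to expand JSON arrays into rows, OPENJSON with an explicit schema (SQL Server 2016 and later) is another native option. The sketch below reuses the table, column, and key names from the sample above purely as an illustration; adjust names and data types to your own schema.

```sql
-- Sketch: flatten the JSON column with OPENJSON instead of repeated JSON_VALUE calls.
-- Table, column, and key names follow the sample above; adjust to your schema.
SELECT  oh.SalesOrderNumber,
        j.CustomerID,
        j.OrderDate,
        j.TotalDue,
        j.ShipMethodID,
        j.TerritoryID,
        j.SalesPersonID
FROM    OrderHeaderJSON AS oh
CROSS APPLY OPENJSON(oh.JSONValue)
WITH (
    CustomerID    INT           '$.CustomerID',
    OrderDate     DATETIME2     '$.OrderDate',
    TotalDue      DECIMAL(19,4) '$.TotalDue',
    ShipMethodID  INT           '$.ShipMethodID',
    TerritoryID   INT           '$.TerritoryID',
    SalesPersonID INT           '$.SalesPersonID'
) AS j
WHERE ISJSON(oh.JSONValue) = 1;  -- skip rows whose JSON is not valid
```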
Year cannot be recognized
Hi! I just started using Sisense, and it's a bit embarrassing that this is such a basic problem, but I cannot seem to visualize the following columns about the year and the number of crimes in Japan. When I visualize it, the year only shows as 1905 (see attached).

Year  合計(人) (total persons)
2002  22,294
2003  23,971
2004  22,568
2005  20,388
2006  18,649
2007  16,922
2008  15,922
2009  15,271
2010  14,878
2011  14,144
2012  14,581
2013  14,596
2014  14,051
2015  12,565
2016  11,546
2017  10,888
2018  10,544
2019  9,899

Please let me know how I can resolve this.
How do you handle timezone conversions on dashboards?
We store our dates in UTC format and want the ability for dashboard viewers to view the date data in their timezone, or in a timezone they select. Has anyone figured out a way to do this? We have a LOT of date data in our data model.
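The accepted solution is not included in this excerpt. Purely as an illustration of one possible server-side approach, assuming the data model is backed by SQL Server (2016 or later) and using hypothetical table and column names, UTC timestamps can be shifted with AT TIME ZONE; the target zone could be driven by a small time-zone lookup table exposed as a dashboard filter:

```sql
-- Sketch only: hypothetical names, assuming a SQL Server source for the data model.
-- CreatedAtUtc is stored as datetime2 in UTC; AT TIME ZONE 'UTC' attaches the UTC offset,
-- and the second AT TIME ZONE converts to the viewer's selected zone.
SELECT  o.OrderID,
        o.CreatedAtUtc,
        o.CreatedAtUtc AT TIME ZONE 'UTC' AT TIME ZONE 'Pacific Standard Time' AS CreatedAtPacific
FROM    dbo.Orders AS o;
```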
LAG / LEAD Functions
Question:
LAG and LEAD are functions that give access to multiple rows within a table without the need for a self-join. The LAG function is used to access data from a previous row, while the LEAD function is used to return data from rows further down the result set. A perfect example of these functions in use is the "Sales Order" scenario: a department store is trying to monitor the purchase trends of its customers; more specifically, per customer, what was the average time it took to repeat an order? To answer that, we must take a record in the table (Order Date) and compare it to another record in the same table. What are some ways we can do that? In this example we will analyze and compare the efficiency of multiple solutions, starting with the most common/traditional way of solving a problem like this.

Answer:
Solution 1: Rank And Self-Join
Steps:
1. Create a new ElastiCube. Add Data > Microsoft SQL Server > Connect To Server, and pull in the Sales Order Header table. How are we going to manipulate that table to get it into a format we want to work with? Uncheck the fields that you don't need, keeping only the high-level/important data, and build.
2. Create a custom SQL expression. What we are getting here is a single customer's activity, with the Order Date sorted in ascending order (from first date to last). For this specific customer, we want to see the number of days between each order. How can we compute and compare those dates? We want to take the order table and join it to itself, where the customer is the same and the order date is offset to the previous or future date. So let's start by leveraging the RANK function to assign an order number to every date. The RANK function is important here because, within each Customer ID, we want to rank based on the order date: group per customer and order by the order date. This sets up a ranking per order in which each record can be compared to the record ranked one position earlier. To join the table to itself, we create a subquery and put the entire customer's table inside it. (The screenshots showing the join type, the join condition, and the result are not reproduced here.)
3. Now that we have the Order Date and the Total Due, we can start subtracting them from each other and computing differences. Go back to the top, to the first SELECT of the subquery, and update it accordingly (the screenshot of the updated expression and its preview are not reproduced here). Once we see that this works, we can delete the customer filter and the ORDER BY filter, and we get all the differences across all of the customers.

Issue With This Solution: the main issue is that it requires a lot of processing due to the two tables. Although we are really using one table, we still have two, and the same data is repeated between them. This extra storage isn't necessary, makes processing slow, and can also create confusion for the customer.
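Since the screenshots of the custom SQL expression are not reproduced here, the following is only a rough sketch of the rank-and-self-join approach described in steps 2 and 3, assuming the AdventureWorks SalesOrderHeader table; adjust table and column names to your source:

```sql
-- Sketch of the RANK + self-join approach (Solution 1), assuming AdventureWorks' SalesOrderHeader.
-- Each order is ranked per customer by OrderDate, then joined to the order ranked one position earlier.
SELECT  cur.CustomerID,
        cur.OrderDate,
        prev.OrderDate                               AS PrevOrderDate,
        DATEDIFF(day, prev.OrderDate, cur.OrderDate) AS DaysBetweenOrders,
        cur.TotalDue - prev.TotalDue                 AS DiffOrderAmount
FROM (
        SELECT CustomerID, OrderDate, TotalDue,
               RANK() OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS OrderRank
        FROM   SalesOrderHeader
     ) AS cur
LEFT JOIN (
        SELECT CustomerID, OrderDate, TotalDue,
               RANK() OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS OrderRank
        FROM   SalesOrderHeader
     ) AS prev
       ON  prev.CustomerID = cur.CustomerID
       AND prev.OrderRank  = cur.OrderRank - 1;
```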
Solution 2: Doing A Lookup Function On Itself (Own Data Set)
To improve on the last solution, is there a way to pull in one table and do the manipulation in Sisense without creating a second table? Yes: we can take the custom table and add it as a view in the source, so that we only import one table.
Steps:
1. Duplicate "SalesOrderHeader 1", rename it "SalesOrderHeader 2", and build. To do a lookup on this table, we need one key to join on. The key is a combination of the Customer ID and the ranking; that is what makes it unique.
2. First, create a new field, Record_Rank (Int), with the ranking expression as its input, and build. Then Add Data > Custom SQL Table.
3. Head back into SalesOrderHeader 2 and set up the fields that make up the key: + New Field > CurrCustomerRankKey (Text), then do an entire build.
4. Create a test function: + Custom SQL Expression > Test Function.
5. Go back to the SalesOrderHeader 2 table and do the lookups there: + New Field > PrevOrderDate (Date-Time), + New Field > PrevTotalDue (Float), + New Field > DaysBetweenOrders (Int), + New Field > DiffOrderAmount. Then run a schema build. (The screenshots of each expression are not reproduced here.)

Solution 3: LAG & LEAD Functions
This solution does not include a self-join.
Steps:
1. Open SQL Server (TC SQL), go to the same database (Adventure Works 2012), and create a new query starting from a plain SELECT over the order table. Now we want to dynamically compare one record against another. Instead of doing a self-join, we tell the query to look at the record that is above or below and pull that value in. It is almost like a lookup, but it doesn't require a join; it only requires you to tell it how to sort the data, and it dynamically goes up or down and pulls the field you want into the current record. This is where the LAG and LEAD functions step in. To go back a record, we call the LAG function: within it, we define the field we want to pull in (order date) and how many records back we want to go (1). Next we partition the data per customer (PARTITION BY) and order it by the order date, either ascending or descending; the lookup can be refined further to make it more distinct. If we want to compare against future dates, we use LEAD instead. We can also pull in the Total Due and calculate the previous Total Due, which requires a subquery; to get rid of the first NULL we add a filter, pull in everything from t2.*, and then calculate our differences (DATEDIFF, etc.). (The screenshots of the intermediate queries and their results are not reproduced here.) To get this entire query into Sisense, we create a view.
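The final query and its results appear only as screenshots in the original post and are not reproduced here. A minimal sketch of the LAG/LEAD version described above, assuming AdventureWorks 2012's Sales.SalesOrderHeader table, might look like this:

```sql
-- Sketch of Solution 3 (no self-join), assuming AdventureWorks 2012's Sales.SalesOrderHeader.
-- LAG pulls values from the previous order per customer; LEAD would pull from the next one.
SELECT  t2.*,
        DATEDIFF(day, t2.PrevOrderDate, t2.OrderDate) AS DaysBetweenOrders,
        t2.TotalDue - t2.PrevTotalDue                 AS DiffOrderAmount
FROM (
        SELECT CustomerID,
               SalesOrderNumber,
               OrderDate,
               TotalDue,
               LAG(OrderDate, 1) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS PrevOrderDate,
               LAG(TotalDue, 1)  OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS PrevTotalDue
        FROM   Sales.SalesOrderHeader
     ) AS t2
WHERE t2.PrevOrderDate IS NOT NULL;  -- drops each customer's first order, which has no previous row
```

The post suggests wrapping a query like this in a view on the source database so it can be imported into Sisense as a single table.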
Ensuring Elasticube is in Sync with Data Source (Deletion & Updates in Accumulate Build)
Capturing accurate business transactions is imperative to understanding business operations. If data is constantly changing, for example being deleted or updated, how do you ensure your Elasticube is in sync with your data source? Depending on the use case, "Replace All" data could sometimes be an option, but that might not be the most efficient choice if you have a huge amount of data. "Accumulated By" is another option, but it could result in duplicated records, and deleted records would still be shown in the Elasticube. Given these potential results, another path to consider is a custom code solution.

Custom Code Solution
Leverage the demo below to learn more. In the demo we use an Oracle database with a transaction table. The transaction table has an updated_date column that captures the latest data changes, and an audit table where deleted transaction IDs are stored with timestamps via a trigger. Please note that all tables in the demo contain dummy records. You can find the DDL for the table and trigger here.

Using custom code to create new tables in the Elasticube keeps the Elasticube in sync with your data source table. The new table shows only the latest records and removes any deleted records via the trigger, while the audit table captures the deleted records. (NOTE: adding triggers could impact database performance, as every deleted record requires an extra write operation.) Additionally, this custom code solution provides change data capture (CDC), so you can answer business questions such as at what intervals a record was updated and/or deleted via the audit table.

Let's begin by building an Elasticube. Use updated_date as "Accumulated By" for both tables and build the Elasticube by selecting BY TABLE. After this, we will add Custom Code that contains all the logic required to obtain the latest records and remove any deleted records.

Step 1: Create the tables, triggers, and records in the source database. Sample code is available here (a rough sketch of such a trigger is shown after the steps below).
Step 2: Create the Elasticube by adding both tables, selecting the deleted_on column as "Accumulated By" for the audit table and updated_on for the fact_sales table. Then build using "BY Table". (Note: the audit table will be empty since we haven't deleted any records yet.)
Step 3: Create the Custom Code table.
- If not enabled, go to Admin > Feature Management > Custom Code and save.
- Select "Create your own Notebook".
- In the input parameters, add both base tables, making sure you select the correct parameters, and add columns for the output schema. Then hit "Open code editor" to open a Jupyter notebook in a new tab where you can add the logic.
- Once the Jupyter notebook is open, clear all the cells and copy-paste the code from the .ipynb file in the repo. Make changes to the notebook based on your use case; for example, if the Elasticube and table names are different, provide them in cell 2. For this example, date_change_threshold is the parameter we are choosing to restrict.
- Save the notebook, return to the previous tab, select the table for the input parameters, and click Done to create a new Custom Code table in the Elasticube.
- First build "Changes Only", which will create the data for the Custom Code table.
Step 4: Insert a new record, update an old record, and/or delete a record from the fact_sales table in your data source. The update trigger will change the Updated_On value for id 9, while the delete trigger will create a new record in the audit table.
Step 5: Build the Elasticube by selecting "By Table".
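The linked DDL is not reproduced in this excerpt. Purely as an illustration of the pattern, assuming Oracle (as in the demo) and hypothetical table and column names (a fact_sales table with id and updated_on columns, and an audit table recording deleted ids with a deleted_on timestamp), the delete-audit trigger and the "latest records minus deleted ids" logic might look roughly like this:

```sql
-- Sketch only: hypothetical Oracle DDL in the spirit of the demo's audit pattern.
-- Every delete on fact_sales writes the deleted id and a timestamp to the audit table,
-- so downstream logic can filter those ids out of the Elasticube.
CREATE TABLE fact_sales_audit (
    id         NUMBER    NOT NULL,
    deleted_on TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);

CREATE OR REPLACE TRIGGER trg_fact_sales_delete
AFTER DELETE ON fact_sales
FOR EACH ROW
BEGIN
    INSERT INTO fact_sales_audit (id, deleted_on)
    VALUES (:OLD.id, SYSTIMESTAMP);
END;
/

-- The custom code notebook's "latest version of each record, excluding deleted ids" logic
-- is roughly equivalent to this query (shown in SQL only for illustration):
SELECT fs.*
FROM   fact_sales fs
WHERE  fs.updated_on = (SELECT MAX(f2.updated_on) FROM fact_sales f2 WHERE f2.id = fs.id)
AND    NOT EXISTS (SELECT 1 FROM fact_sales_audit a WHERE a.id = fs.id);
```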
Now, in the Elasticube, you have a fact_sales table that contains the full history of changes per record and a Custom Code table that holds only the latest records, in sync with the data source table. (The screenshots of the table capturing all data changes and of the table in sync with the data source are not reproduced here.)

These steps should result in accurate data in your Elasticube, leading to smarter business decisions. Please note that this is one possible workaround for users with similar use cases. Keep in mind that editing the Jupyter code is another possibility; for example, you can try using a Dask dataframe instead of pandas and see which performs better with your data.

Disclaimer: Please note that this blog post contains one possible custom workaround solution for users with similar use cases. We cannot guarantee that the custom code solution described in this post will work in every scenario or with every Sisense software version. As such, we strongly advise users to test solutions in their own environment prior to deploying them to ensure that the solutions proffered function as desired in their environment. For the avoidance of doubt, the content of this blog post is provided to you "as-is" and without warranty of any kind, express, implied or otherwise, including without limitation any warranty of security and/or fitness for a particular purpose. The workaround solution described in this post incorporates custom coding which is outside the Sisense product development environment and is therefore not covered by Sisense warranty and support services.
Connecting to Clickhouse
I was recently speaking with Kat about certified connectors and the related status for those connectors – specifically Clickhouse, which is listed as a Certified data connector for Sisense. Are there any Sisense customers out there that are using Clickhouse successfully with Sisense? To us, "successfully" would at a minimum include:
- Being able to connect using a Live Connection for interactive query and aggregation via Sisense Notebooks
- Being able to load data from Clickhouse into Elasticubes
Thanks all!
How to configure Data Groups
Introduction
This article will guide you on how to configure the Data Groups' parameters based on current performance and business requirements.

What are "Data Groups"?
The "Data Groups" administrative feature allows you to configure Quality of Service (QoS) for Sisense instances and to limit or reserve the resources of individual data models. By utilizing data groups you make sure available resources are controlled and managed to support your business needs.

Why Should One Configure "Data Groups"?
The main benefits of configuring "Data Groups" are:
- Controlling which cluster nodes are used for building and querying
- Limiting the amount of RAM and CPU cores a data model uses
- Configuring indexing, parallel processing, and caching behavior of a data model
- Mitigating "Out-Of-Memory" issues caused by lack of resources
- Preventing "Safe Mode" exceptions caused by lack of resources
Read this article to learn about the different "Data Groups" parameters.

"Data Group" Use Cases
The following two scenarios are good examples of using "Data Groups":

Use case 1
Scenario: A customer embeds Sisense (OEM use case) and has many tenants that can design dashboards (a.k.a. self-service BI). Each tenant has a dedicated Elasticube data model.
Challenge: A single tenant is capable of creating a large number of dashboards, causing the underlying data model to utilize a significant portion of the available CPU/RAM resources. The excessive use of resources by a single tenant may lead to resource depletion and degrade the user experience for them and other tenants.
Solution: Resource governance – set up resource utilization limitations using "Data Groups". Doing so will limit each tenant's usable resources and prevent tenants from affecting each other.

Use case 2
Scenario: A customer has many Elasticube data models created by various departments of the organization. Some of the data models are used to generate high-priority strategic dashboards (used by C-level managers). Other dashboards are prioritized as well (e.g., operational dashboards vs. dashboards used for testing).
Challenge: C-level dashboards must have the highest priority and should always be available. Other dashboards should still be operational and behave based on their priority. In case of conflict or a temporary lack of resources, a critical Elasticube may run out of memory or trigger a 'safe mode' event.
Solution: Resource governance – setting up priority-based Data Groups results in allocating more resources to business-critical data models and limiting the less critical ones.

Data Groups Resource Governance Strategies
- Limiting a data model's resources: governance rules can be used to limit the resources used by a single data model or a group of multiple data models. This can be done by configuring a "Maximal" amount of allocated CPU/RAM. Note that the data model will be limited to the configured resource restrictions even when additional resources are available for use.
- Pre-allocating a data model's resources: governance rules can be used to pre-allocate the resources used by a single data model or a group of multiple data models. This can be done by configuring a "Reserved" amount of allocated CPU/RAM. Note that other data models will then be limited to partial server resources even when a pre-allocated resource is idle.
- Prioritizing data models: governance rules can be used to prioritize certain data models by allocating different amounts of resources to different data models. High-priority data models are allocated more resources than lower-priority ones.
You may also choose not to limit a data model's resources (no Data Group configuration); however, this will re-introduce the initial risk of resource depletion.

How To Configure Data Groups?
Configuring "Data Groups" requires high expertise and attention to detail. A dedicated tool is provided to assist you with this task. Follow the directions below to acquire the tool and implement Data Groups.

Prerequisite
To base your calculations and decide how to configure data groups, you will need data monitoring enabled. To learn more about Sisense monitoring and the data reported, read the following article: https://support.sisense.com/kb/en/article/enable-sending-sisense-monitoring-logs

Step #1 – Download the "Data Groups Configuration Tool"
The data groups tool (Excel document) is attached to this article. Download a local copy to your computer.

Step #2 – Access Logz.IO
To access the Logz.IO platform:
- Log in to the Logz.io website
- Navigate to the "SA - Linux - Data Groups" dashboard
- Set the timeframe filter to a 21-day timeframe

Step #3 – Fill out the "Data Groups Configuration Tool" Document
The "Elasticubes" sheet holds the data for the decision making regarding the different groups. Its fields are (Field - Description - Comment):
- CubeName - The ElastiCube name.
- Size GB - The ElastiCube's size on disk. The measure is taken from the "Data" tab ElastiCube list.
- Estimated size in memory - The estimated maximal ElastiCube size in memory. Auto-calculated ([Size GB] x 2.5).
- Peak query memory consumption (GB) - The actual maximal ElastiCube size in memory (should be smaller than the estimated maximal size). The measure is taken from the logz.io dashboard ("Max Query Memory" field).
- Build frequency - The frequency of the ETL Sisense performs. The measure is taken from Sisense's build scheduler.
- Average of concurrent Query - Average concurrent queries sent from dashboards to the ElastiCube. The measure is taken from the logz.io dashboard: search for the 'Max concurrent queries per Cube' widget, download the CSV file with the data, and calculate the average value of the "Max concurrent queries per Cube" column.
- Business Criticality - A business measure that determines the QoS of the ElastiCube. True (High) / False (Low).
- Data security - Whether "Data Security" is applied to this ElastiCube. This column helps determine whether the "ElastiCube Query Recycler" parameter will improve performance or not (explanation here under "ElastiCube Query Recycler").
- Data Group - The Data Group this ElastiCube should be a part of. Try to fill in this column by classifying your ElastiCubes according to common characteristics, both in business terms and in terms of resource consumption.

Step #4 – Define your Data Groups
Using the information you've collected and the explanations in this article, define the data groups you wish to use and fill in the information in the "Required data groups" sheet. The "Required data groups" sheet provides the name and configuration for each data group. Use this tab to describe the Data Groups that meet your business needs, and use the "Intent" column to describe each group's purpose.
The configuration in this sheet will later be used to configure the Data Groups in the Sisense Admin console. Its fields are (Field - Description - Comment):
- Group name - The Data Group's name.
- Intent - The Data Group's description (from a business point of view).
- Instances - The number of query instances (pods) created in the Sisense cluster. This parameter is very useful; however, increasing this value will increase the Elasticube's memory footprint. You should only consider changing this value if your CPU usage reaches 100%, which is mostly caused by high user concurrency.
- Build on Local Storage - Whether builds use local storage (enabled for multi-node deployments). Using local storage can decrease the time required to query the cube after the first build on local storage.
- Simultaneous Query Executions - The maximum number of queries processed simultaneously. If your widgets contain heavy formulas, it is worth reducing this number to lower the pressure. Used when experiencing out-of-memory (OOM) issues; parallelization is a trade-off between memory usage and query performance. 8 is the optimal number of queries.
- % Concurrent Cores per Query - Related to the Mitosis and Parallelism technology. To minimize the risk of OOM, set this value to the equivalent of 8 cores, e.g., 25 for a 32-core node or 14 for 64 cores (the value should be an even number). It can be increased up to a maximum of half the total cores, i.e., treat the default as the recommended maximum that can only be adjusted down. Change it when experiencing out-of-memory (OOM) issues.
- Max Cores - The maximum number of cores allowed to be used by the qry pod. A limit for less critical groups.
- Query: Max RAM (MB) - The maximum RAM that may be consumed by each of the ElastiCubes in the group, i.e., the maximum RAM allowed for the qry pod (dashboards), applied to each query instance if Instances was increased. Take the value from the "MAX of Peak query memory consumption (GB)" column in the "Summary of data groups" sheet. Please note that reaching 85% of the pod usage or of the overall server RAM will cause a Safe-Mode exception: dashboard users will see a Safe-Mode error, the qry pod will be deleted and started again, and the next dashboard refresh will bring the data back.

Step #5 – Verify
The "Summary of Data Groups" sheet includes a pivot chart that calculates the maximum memory consumption for each data group. This value correlates to the "Query: Max RAM in MB" configuration. Take the maximal value of "Peak query memory consumption (GB)" from the "Summary of data groups" tab and multiply it by 1.20 to avoid Safe Mode.

Step #6 – Process the results and configure Sisense Data Groups
Prerequisite - Read this article on how to create data groups (5 min read): https://documentation.sisense.com/docs/creating-data-groups
In the Sisense web application, for each new data group:
- Navigate to "Admin" --> "Data Groups" and click the "+ Data Group" button.
- Fill out the information from the "Required data groups" tab.

Step #7 – Monitor the Sisense Instance
The final step is to follow the information in the "Sisense Monitor" and make sure performance is improving. Review the documentation below for more details regarding monitoring: https://docs.sisense.com/main/SisenseLinux/monitoring-sisense-on-linux.htm

Good luck!
Converting a working Python script into Bash to upload as Pre/Post Build Script
I have a customized automation script for row-level security written in Python using the Pysense library. We were planning to upload this as a Pre/Post build script on the elasticube side, via the admin page, to automate the script execution with the cube build. But according to the documentation, only a bash script is allowed. I do not have knowledge/skill with bash scripts. Is there any easy way for me to convert this working Python script with the Pysense library into a bash script? Any kind of help would be appreciated.
Thanks
Hema
Matching charts with different level of number
Good day fellow members, I have a quick question. I have temperature data (e.g., 11.5 Celsius) and the number of cases (e.g., 50k), and I want to correlate them in the same graph. However, since the measurement scales are different, the temperature bars come out really low and are almost invisible (see attached). Could you tell me if there is any way I can put them on comparable scales, like in Tableau? Thank you!
mongo connector
I need to create a cube with the mongo connector, but when I go to create the connection I do not see the connector. I went and enabled it from the configuration, then went back to add the source, and the connector still does not appear. What additional step should I do to make the connector appear?