Choosing the Right Data Model
This post has become outdated. You can find guidance on choosing a data model on our documentation site: https://docs.sisense.com/main/SisenseLinux/choosing-the-right-data-model.htm

Introduction

Customers often run into the question of which data model they should use (an ElastiCube, a Live model, or a Build-to-Destination). The following article presents some of the aspects you should consider when choosing between them. Sisense recommends that you discuss your needs and requirements with Sisense's technical team during the Jumpstart process, so the result will best meet your business expectations.

Definitions

The ElastiCube Data Model

Importing data into an ElastiCube data model allows the customer to pull data from multiple data sources on-demand or at a scheduled time, and create a single source of truth inside Sisense. The imported data can then be transformed and aggregated to meet your business needs. Once imported, the data snapshot is used to generate analytical information. The process of importing the data, known as a "Build", includes the following steps:

1. Extract the data: Query the different data source(s) for data.
2. Load the data: Write the data extracted to Sisense (the local MonetDB).
3. Transform the data: Transform the local MonetDB (using SQL queries).

To read more about ElastiCubes, see Introducing ElastiCubes.

The Live Data Model

Using a Live data model does not require importing data. Only the data's schema needs to be defined. Once configured, analytical information required by the user is queried directly against the backend data source. To read more about Live models, see Introducing Live Models.

Determining Factors

Refresh Rate

One of the most fundamental aspects of determining your data model is your data's refresh rate.
The data refresh rate refers to the age of the data in your dashboards:

For Live models, the data displayed on your dashboards is near-real-time, as every query is passed directly to the backend database. A good example of using a Live model (due to refresh rate requirements) is a dashboard that shows stock prices.

For ElastiCubes, the data displayed on your dashboard is current to the last successful build event. Every query is passed to the local database for execution. A good example of using an ElastiCube (due to refresh rate requirements) is a dashboard that shows historical stock prices. In this case, a daily ETL process will provide results that are good enough.

To make a choice based on this factor, answer the following questions:
How frequently do I need to pull new data from the database?
Do all my widgets require the same data refresh frequency?
How long does an entire ETL process take?

Data Transformation Options

The ETL process includes a "Transformation" phase. This transformation phase usually includes:
Migrating the data tables into a dim-fact schema
Enriching your data
Pre-aggregating the data to meet your business needs

The amount of data transformation on Sisense helps determine the suitable data model:

For Live models, Sisense allows minimal to no data transformation. Data is not imported before a query is issued from the front end. Therefore, data cannot be pre-conditioned or pre-aggregated. Most data sources used by Live models are data warehouses that may perform all data preparations themselves.

For ElastiCubes, data is imported before a query is issued from the front end. Therefore, it may be pre-conditioned and pre-aggregated. A user may customize the data model to optimally answer their business questions.

To make a choice based on this factor, answer the following questions:
Is my data in a fact-dim schema?
Does my data require enriching or pre-conditioning?
Can my data be pre-aggregated?

Operational Database Load

Your operational databases do more than serve your analytical system. Any application that puts load on the operational databases should be closely examined:

For Live models, Sisense constantly queries information from your operational databases and feeds it into your dashboard widgets. This occurs every time a user loads a dashboard.

For ElastiCubes, Sisense places a heavy load on your operational databases during an ETL process, while it reads all tables.

To make a choice based on this factor, answer the following questions:
Does the analytical system stress my operational database(s)?
Can the query load be avoided by using a "database replica"?

Operational Database Availability

The availability of your operational database(s) is critical for collecting information for your analytical system.

For Live models, all queries are redirected to your data sources. If the data source is not available, widgets will generate errors and not present any data.

For ElastiCubes, data source availability is critical only during the ETL process. If the data source is not available, your widgets will still display data, but that data will not necessarily be up to date.

To make a choice based on this factor, answer the following questions:
How frequently are analytical data sources offline?
How critical is my analytical system?
Is being offline (showing out-of-date information) acceptable?

Additional Vendor Costs

Various database vendors use a chargeback charging model, meaning that you are charged by the amount of data you pull from the database or by the computational power required to process your data.

For Live models, every time a user loads a dashboard, each widget triggers (at least) one database query. A combination of a chargeback charging model and a large user load may result in high costs.

For ElastiCubes, every time the user triggers an ETL process, a large amount of data is queried from the database and loaded into Sisense.
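
The two cost patterns described above can be roughed out with simple arithmetic. The sketch below is only an illustration: the function names, the flat per-query and per-build prices, and the usage figures are all hypothetical, and real vendors may bill by bytes scanned or compute time instead. It also treats "at least one query per widget" as exactly one and ignores caching, so the Live figure is a lower bound.

```python
"""Rough daily-cost comparison under a chargeback pricing model.

All prices and usage figures are hypothetical placeholders; real vendors
may bill by bytes scanned or compute time rather than per query or build.
"""


def daily_cost_live(users, loads_per_user, widgets_per_dashboard, cost_per_query):
    """Lower-bound daily cost of a Live model.

    Assumes exactly one query per widget per dashboard load (the article
    says "at least" one) and ignores any query caching.
    """
    queries = users * loads_per_user * widgets_per_dashboard
    return queries * cost_per_query


def daily_cost_elasticube(builds_per_day, cost_per_build):
    """Daily cost of an ElastiCube: only builds query the source database."""
    return builds_per_day * cost_per_build


def break_even_users(builds_per_day, cost_per_build,
                     loads_per_user, widgets_per_dashboard, cost_per_query):
    """User count at which both models cost the same per day."""
    cost_per_user = loads_per_user * widgets_per_dashboard * cost_per_query
    if cost_per_user <= 0:
        raise ValueError("per-user Live cost must be positive")
    return daily_cost_elasticube(builds_per_day, cost_per_build) / cost_per_user


if __name__ == "__main__":
    # Hypothetical figures chosen only to illustrate the arithmetic.
    live = daily_cost_live(users=100, loads_per_user=5,
                           widgets_per_dashboard=10, cost_per_query=0.01)
    cube = daily_cost_elasticube(builds_per_day=4, cost_per_build=25.0)
    tipping = break_even_users(4, 25.0, 5, 10, 0.01)
    print(f"Live: {live:.2f}/day, ElastiCube: {cube:.2f}/day, "
          f"break-even at about {tipping:.0f} users")
```

Below the break-even user count, the Live model is the cheaper option under these assumptions; above it, scheduled builds cost less. Replace the placeholders with figures from your own vendor's pricing before drawing conclusions.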

To make a choice based on this factor, answer the following questions:
How many users use my dashboards, and what is my "build" frequency?
Which data model will result in lower costs? What is the tipping point?
Are you willing to pay more for real-time data?

Database Size

For ElastiCubes, please refer to these documents:
Introducing ElastiCubes
Minimum Requirements for Sisense in Linux Environments

For Live models, there is no limitation, as only the data's schema is imported to Sisense, not the data itself.

To make a choice based on this factor, answer the following questions:
What is the amount of data I need in my data model?
What is the amount of history I need to store?
Can I reduce the amount of data (e.g., by trimming historical data or reducing the number of columns)?

Query Performance

Query performance depends on the underlying work required to fetch data and process it. Although every widget generates a query, the underlying data model determines the work necessary to execute it.

For ElastiCubes, every query is handled inside Sisense:
1. The client-side widget sends a JAQL query to the Sisense analytical system.
2. The query is translated into SQL syntax and run against an internal database.
3. The query result is transformed back to JAQL syntax and returned to the client-side.

For Live models, every query is forwarded to an external database and then processed internally:
1. The client-side widget sends a JAQL query to the Sisense analytical system.
2. The query is translated into SQL syntax and run against an external database.
3. Sisense waits for the query to execute.
4. Once returned, the query result is transformed back into JAQL syntax and returned to the client-side.

To make a choice based on this factor, answer the following questions:
How sensitive is the client to a delay in the query's result?
When showing real-time data, is this extra latency acceptable?

Connector Availability

Sisense supports hundreds of data connectors (see Data Connectors).
However, not all connectors are available for Live data models. The reason is connector performance: a "slow connector", or one that requires a significant amount of processing, may lead to a bad user experience when using Live models (that is, widgets take a long time to load):

For ElastiCubes, Sisense allows the user to utilize all the data connectors.

For Live models, Sisense limits the number of data connectors to a few high-performing ones (including most data warehouses and high-performing databases).

To make a choice based on this factor, answer the following questions:
Does my data source's connector support both data model types?
Should I consider moving my data to a different data source to allow live connectivity?

Caching Optimization

Sisense optimizes performance by caching query results. In other words, query results are stored in memory for faster retrieval in case the same queries are re-executed. This ability provides a great benefit and improves the end-user experience:

For ElastiCubes, Sisense recycles (caches) query results.

For Live models, Sisense performs minimal caching to make sure data is near real-time. (Note that caching can be turned off upon request.)

To make a choice based on this factor, answer the following questions:
Do I want to leverage Sisense's query caching?
How long do I want to cache data?

Dashboard Design Limitations

Specific formulas (such as Mode and Standard Deviation) and widget types (such as Box plots or Whisker plots) may result in "heavy" database queries:

For Live models, Sisense limits the use of these functions and visualizations, as their results may take a long time to compute, causing a bad user experience.

For ElastiCubes, Sisense allows the user to use them, as processing them is internal to Sisense.

To make a choice based on this factor, answer the following questions:
Do I need these functions and visualizations?
Can I pre-aggregate the data and move these calculations to the data source instead of Sisense?

See also Choosing a Data Strategy for Embedded Self-Service.

Restoring missing connector
Connectors can sometimes go missing, which breaks data connectivity. Restoring them is crucial to maintaining a smooth data flow within your Sisense environment. This article describes the process of restoring a missing connector in Sisense; the accompanying tutorial describes how to restore missing connectors in the Sisense Linux environment.

The Ability To Manage JDBC Connections
Question

Is there an easy way to change the connection strings?

Answer

Using the REST API, you can update the data connection strings directly.

To return a list of all your connections:
GET http://localhost:8081/api/v1/connection

To update a connection:
PATCH http://localhost:8081/api/v1/connection/{id}

Personio API Pre-Processing Data
Question

Is there a way to add pre-processing (a Python script) to the data that will be uploaded from a custom API connector? Or should I run the script locally on a fixed schedule and use the Sisense API to upload the data after it is processed?

Answer

Some information on your questions (we have come across this many times before):

You can automatically run any Python script before or after a Build by using an ElastiCube Plugin; scroll to the part where it talks about Python scripts.

Sisense doesn't provide an API to modify data within your cube. So if you would like the Python script to do some extra processing before the data gets pushed to the cube, you can save that data locally on the Sisense machine in CSV format, and then use another post-build plugin to delete that CSV once the build is finished.

Note that you can use any Python version you'd like (their code sample uses the old 2.7).

Sisense Data Pipeline Best Practices
Determining how to construct data pipelines that optimize performance, minimize cost, and maintain quality controls is a complex challenge. This article describes a few architectural choices in designing data pipelines when implementing Sisense.

Reducing Windows Memory Pressure by Removing unused JVM Connectors
A simple technique can reduce memory pressure on Windows versions of Sisense: deactivating unused JVM connectors. To do this, search for the Sisense JVM Connectors Configuration application from the Windows Start menu. After opening the application, uncheck any connectors that are not in use and press the Save button.

The amount of memory saved varies by connector. On average, each connector uses 2 GB, so deactivating 5 connectors saves about 10 GB of memory.

If you need additional help, please contact Sisense Support.

PLEASE NOTE: If you cannot locate the application, the executable name is "usedConnectorsEditor.bat" and it should be located in the following directory:
C:\Program Files\Sisense\DataConnectors\JVMContainer\bin

Google Analytics CDATA Connector
In May 2022, after Google’s April announcement deprecating an old API, Sisense announced that it would be deprecating the native Google Analytics connector. Despite this, users can still use CDATA drivers as a workaround to connect. This article will show you two ways of using a CDATA driver to connect to a Google Analytics data source.

Determine Driver's Class Name for JDBC Connector
Question

How do you determine the Driver’s Class Name from the JDBC driver file that is installed with the Sisense JDBC Connector?

Prerequisites:

A Sisense Data Administrator may require a separate JDBC Connector to connect to a specific data source.
https://www.sisense.com/data-connectors/
https://www.cdata.com/solutions/bi/sisense/

A download of the specific JDBC driver is required.
https://docs.sisense.com/main/SisenseLinux/connecting-to-custom-connectors-with-jdbc-drivers.htm

Relevant Background Information:

The Sisense documentation at https://documentation.sisense.com/docs/connecting-to-dynamodb describes the steps for installing a DynamoDB JDBC driver. In the section "Adding DynamoDB Tables to your ElastiCube" (Step 7), the document provides the Driver’s Class Name to use:
cdata.jdbc.amazondynamodb.AmazonDynamoDBDriver

The document provides the Driver’s Class Name but does not contain details on how to obtain it.

Answer

1. As an example, let's use the Salesforce Marketing Cloud JDBC Driver: https://www.cdata.com/drivers/salesforcemarketing/jdbc/ Open the JDBC driver file as an archive file. Filename: cdata.jdbc.sfmarketingcloud.jar Note: You can use any archive application, such as WinZip, 7-Zip, or WinRAR, to extract the JAR file contents.
2. Navigate to the META-INF/services/ subdirectory within the archive.
3. Extract the file named java.sql.Driver.
4. Open java.sql.Driver in a text editor. It contains the Driver’s Class Name: cdata.jdbc.sfmarketingcloud.SFMarketingCloudDriver

Additional Notes:

These steps can be applied when installing any JDBC driver that is supported with Sisense.

If you need additional help, please contact Sisense Support or create a Support Case.
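
The manual steps above can also be scripted. The following is a minimal sketch rather than an official Sisense or CData tool: the function name find_driver_class_names is hypothetical, and it assumes the driver JAR registers itself through the standard META-INF/services/java.sql.Driver entry, as in the example. A JAR is a ZIP archive, so Python's built-in zipfile module can read that entry without extracting anything to disk.

```python
import zipfile

# Entry inside the JAR that lists the JDBC driver class name(s).
SERVICE_ENTRY = "META-INF/services/java.sql.Driver"


def find_driver_class_names(jar_path):
    """Return the driver class names declared in a JDBC driver JAR.

    Returns an empty list if the JAR has no java.sql.Driver service entry.
    """
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read(SERVICE_ENTRY)
        except KeyError:
            # The archive does not contain the service entry.
            return []

    names = []
    for line in raw.decode("utf-8").splitlines():
        # Service files may contain '#' comments and blank lines; skip both.
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line)
    return names
```

Calling find_driver_class_names("cdata.jdbc.sfmarketingcloud.jar") on the driver file from the example should return a list containing the class name shown in step 4. If the list comes back empty, the driver does not use this registration mechanism, and the class name has to come from the vendor's documentation.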

Document References:
https://www.sisense.com/data-connectors/
https://www.cdata.com/solutions/bi/sisense/
https://docs.sisense.com/main/SisenseLinux/connecting-to-custom-connectors-with-jdbc-drivers.htm
https://documentation.sisense.com/docs/copying-a-cdata-jar-file-installed-locally-to-a-remote-server
https://www.cdata.com/drivers/salesforcemarketing/jdbc/