Data Prep Essentials for AI-Driven Analytics - Part 2
Learn how AI training and validation shape reliable models. Explore how structured, diverse data helps AI recognize patterns, generalize effectively, and avoid overfitting—ensuring optimal performance in real-world applications.

Data Prep Essentials for AI-Driven Analytics - Part 3
Discover how to improve data quality for AI and machine learning. In Part 3 of our Data Preparation series, learn how to fix missing values, remove duplicates, correct data types, and standardize formats—with SQL and Python scripts to help you clean your data for accurate, AI-ready analytics.

Have You Heard About Sisense Intelligence?
Sisense Intelligence is a new suite of AI-powered tools that help product managers and developers build smarter, more intuitive analytics experiences. Features like natural language dashboards, automated insights, forecasting, and data explanations accelerate every stage of the analytics journey. Designed for seamless integration, it empowers teams to deliver fast, user-friendly insights at scale.

Use the Dashboard API to Get Data into a Python DataFrame
I have a widget that displays users with points, and I want to fetch the top 10 users from that widget. I also need to select the current month ("01/2025") from the filter. Let me know how this can be done. However, I'm getting a weird error. Please help! I'm an admin, so I have a token from my profile section.

```python
import requests

url = "https://bp.sisense.com/api/v1/dashboards/4577fc4415dd7d003339bfdc/widgets/4577c7716dd7d554441bjae"

# Authentication
headers = {
    "Content-Type": "application/json",
    "Authorization": ""
}

# Send the request
response = requests.get(url, headers=headers)

# Check if the request was successful
data = response.json()
print(data)
```

Error:

```
{'error': {'code': 5002, 'message': 'Invalid token.', 'status': 401, 'httpMessage': 'unauthorized'}}
```
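The 401 "Invalid token" response above is usually caused by the empty Authorization header: the Sisense REST API expects the value in the form "Bearer <token>". Below is a minimal sketch of the corrected call and of loading the JSON response into a pandas DataFrame; the instance URL, dashboard ID, and widget ID are taken from the question, everything else is illustrative. Note that this endpoint returns the widget definition (title, metadata, JAQL panels) rather than its result rows, so the top-10 ranking and the "01/2025" month filter would still need to be applied when querying the underlying data.

```python
import requests
import pandas as pd

BASE_URL = "https://bp.sisense.com"           # your Sisense instance
DASHBOARD_ID = "4577fc4415dd7d003339bfdc"     # from the question
WIDGET_ID = "4577c7716dd7d554441bjae"         # from the question
API_TOKEN = "<your-api-token>"                # token from your profile section

url = f"{BASE_URL}/api/v1/dashboards/{DASHBOARD_ID}/widgets/{WIDGET_ID}"

# The Authorization header must carry the token with the "Bearer " prefix;
# an empty value produces the 401 "Invalid token" error shown above.
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_TOKEN}",
}

response = requests.get(url, headers=headers)
response.raise_for_status()          # fail loudly on 401/403/404
widget = response.json()             # widget definition, not result rows

# Flatten the JSON into a DataFrame for inspection; the exact structure
# depends on the widget type, so adjust as needed.
df = pd.json_normalize(widget)
print(df.head())
```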
Exploring RAG: A Sisense-Based Approach with BigQuery and Gemini

Continuing from our previous article, Automated Machine Learning with Sisense Fusion: A Practical Guide, where we discussed how Sisense and AutoML simplify complex machine learning workflows, I am now excited to introduce an advanced feature: a chatbot interface powered by Retrieval-Augmented Generation (RAG) and a large language model (LLM) like Gemini. This enhancement allows users to interact with their data in natural language through Sisense's native Blox widget. Imagine asking questions like "How many products did I sell per category?" and receiving insightful answers instantly. This not only improves data accessibility but also bridges the gap between technical data repositories and business decision-making.

In this article, I'll cover the first step of enabling this functionality: setting up a seamless integration pipeline between Sisense and Google BigQuery, embedding table and column metadata, and preparing the data for efficient querying.

Important Note:
- Experimental Project: This implementation is an experimental project and not an official Sisense feature. It demonstrates how Sisense's functionalities can be used to build a custom Retrieval-Augmented Generation (RAG) pipeline, or you can reuse this example by importing the provided notebooks.
- Google Ecosystem: The example provided is specific to the Google ecosystem, using services like BigQuery (BQ) and Gemini for query generation and processing.
- Associated Costs: Each prompt sent to the chatbot incurs a cost, which is billed to your Google Cloud Platform (GCP) account. This includes charges for LLM processing and database queries.

Building the Foundation: Data Preparation and Embeddings

To enable natural language interactions with your data, we first need to establish a robust data infrastructure. This includes creating a vector store for embeddings in Google BigQuery (BQ) and preparing metadata about tables and columns. The process is fully automated using a pre-built Sisense custom code notebook, which simplifies embedding generation and management.

Dataset in BigQuery

The journey begins with your dataset in Google BigQuery. The notebook retrieves metadata such as table and column names, descriptions, and schema information from BigQuery's Information_Schema. If any descriptions are missing, the system automatically generates them, ensuring comprehensive metadata coverage.

Setting Up the Vector Store in BigQuery

The vector store acts as the backbone for similarity searches within the RAG system. Using the custom notebook, the system:
- Creates the vector store: A structured repository to store embeddings for tables and columns.
- Organizes metadata: Ensures all table and column descriptions are structured and accessible.

Generating Table and Column Descriptions

Missing descriptions can hinder data understanding. The notebook includes a Description Agent that automatically generates meaningful text for:
- Tables: Contextual descriptions based on schema and usage.
- Columns: Descriptions highlighting their role within the dataset.

These enhancements ensure the metadata is both informative and ready for embedding generation.

Creating Embeddings with Vertex AI

To enable semantic search, metadata descriptions are converted into numerical embeddings using Vertex AI's TextEmbeddingModel. This is facilitated by the EmbedderAgent, which:
- Accepts strings or lists of strings (e.g., column descriptions).
- Generates embeddings through Vertex AI.
- Handles both single and batch processing for efficiency.
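To make the embedding step concrete, here is a minimal sketch of what the EmbedderAgent's core call might look like with the Vertex AI SDK. The project, region, model name (text-embedding-004), function name, and example descriptions are assumptions for illustration, not the notebook's actual code.

```python
from typing import List

import vertexai
from vertexai.language_models import TextEmbeddingModel

# Assumed project/region; replace with your own GCP settings.
vertexai.init(project="my-gcp-project", location="us-central1")

# The model name is an assumption; any Vertex AI text embedding model works the same way.
model = TextEmbeddingModel.from_pretrained("text-embedding-004")

def embed_texts(texts: List[str]) -> List[List[float]]:
    """Convert table/column descriptions into embedding vectors."""
    embeddings = model.get_embeddings(texts)
    return [e.values for e in embeddings]

# Example: embed a couple of column descriptions before writing them
# to the BigQuery vector store.
vectors = embed_texts([
    "orders.order_date: the calendar date on which the order was placed",
    "orders.total_amount: the order value in USD, including tax",
])
print(len(vectors), len(vectors[0]))  # number of vectors, embedding dimensionality
```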
Efficient Embedding with Chunked Processing

For large datasets, embeddings are generated in chunks using the get_embedding_chunked function. This ensures:
- Scalability: Handles datasets of all sizes without performance issues.
- Parallel Processing: Processes text chunks simultaneously to speed up the workflow.
- Structured Outputs: Embeddings are returned in a structured DataFrame for storage or analysis.

Storing Embeddings in the Vector Store

The final step is storing these embeddings in the BigQuery vector store. This ensures that:
- Similarity searches are fast and efficient.
- Metadata is always accessible for chatbot interactions.

[Screenshot: the vector store table, with columns such as "table_schema", "column_name", "data_type", and "source_type", along with sample values and related metadata describing the database structure.]

How the Chatbot Interface Works

Now that the foundation for embeddings and metadata storage is set, let's explore the chatbot interface in action. Imagine opening the chatbot in the Blox widget and asking a question about your dataset. Within moments, the chatbot responds in natural language, providing actionable insights. But what exactly happens under the hood to generate this seamless interaction?

RAG Notebook and the Chatbot Workflow

The chatbot operates using a pre-built RAG custom code transformation notebook, which orchestrates the end-to-end process. With this notebook, the entire pipeline—from understanding the query to generating the response—is automated. The notebook uses multiple specialized agents, each responsible for a specific task, ensuring precision and efficiency at every step.

SQL Query Builder Agent (BuildSQLAgent)

This agent specializes in constructing SQL queries for BigQuery. It uses the LLM to analyze the user's natural language query and matches it with table schemas and column details from the vector store. It outputs a fully formed SQL query tailored to the user's dataset and question.

SQL Validation Agent (ValidateSQLAgent)

The ValidateSQLAgent validates the SQL query before execution using a Large Language Model (LLM). Validation ensures the query adheres to essential rules, including:
- The presence of all referenced columns and tables.
- Proper table relationships and join conditions based on the schema.
- Formatting and compliance with BigQuery-specific SQL standards.

Validation occurs during the debugging process, specifically within the DebugSQLAgent, to identify potential errors before attempting a dry run or execution. It provides a detailed JSON response:
- If valid, the process moves to the next step (dry run or execution).
- If invalid, the DebugSQLAgent uses the error details to refine the query iteratively.

SQL Debugging Loop Agent (DebugSQLAgent)

This agent runs the debugging loop to refine queries that fail validation or execution. The process includes:
1. Validation: The query is passed to ValidateSQLAgent to check syntax, schema compliance, and structure. If valid, the query is ready for execution.
2. Dry Run: If validation passes, the query is tested using a dry run via the test_sql_plan_execution function to confirm execution readiness.
3. Execution: Once validation and dry runs succeed, the final query is executed using the retrieve_df function, which returns results as a DataFrame.
4. Iterative Refinement: If the query fails either validation or the dry run, the DebugSQLAgent uses the LLM to troubleshoot and generate an alternative query. The loop repeats until a valid query is generated or the maximum number of debugging rounds is reached.

This agent ensures the final query is:
- Correctly structured and semantically valid.
- Optimized for performance and aligned with the original user intent.
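As an illustration of the dry-run and execution steps described above, here is a minimal sketch using the google-cloud-bigquery client. The helper names only approximate what the notebook's test_sql_plan_execution and retrieve_df functions might do, and the dataset/table in the example query is invented.

```python
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP credentials and project

def dry_run(sql: str) -> int:
    """Validate a generated query without running it; returns bytes that would be scanned."""
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # raises if the SQL is invalid
    return job.total_bytes_processed

def run_query(sql: str) -> pd.DataFrame:
    """Execute the query and return the results as a DataFrame."""
    return client.query(sql).to_dataframe()

sql = "SELECT category, COUNT(*) AS products_sold FROM `my_dataset.sales` GROUP BY category"
try:
    print(f"Dry run OK, would scan {dry_run(sql)} bytes")
    print(run_query(sql).head())
except Exception as err:
    # In the notebook, errors like this are what the DebugSQLAgent feeds back to the LLM.
    print(f"Query failed: {err}")
```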
Response Agent (ResponseAgent)

This agent translates the SQL query results into natural language. It bridges the gap between technical SQL outputs and user-friendly communication. By combining the query results with the user's original question, it crafts a clear and relevant response.

How the Workflow Executes

Here's the step-by-step process for generating a response:
1. User Query Embedding: The EmbedderAgent converts the user's natural language question into a numerical embedding. Using BigQuery native vector search, the system retrieves similar embeddings from the vector store created in the first phase.
2. Schema and Content Retrieval: Based on the retrieved embeddings, the system fetches relevant table and column schema details from the vector store.
3. SQL Query Generation: The BuildSQLAgent uses the retrieved schema details to construct an SQL query that aligns with the user's question.
4. SQL Validation and Execution: The ValidateSQLAgent checks the generated SQL for accuracy and potential errors. If the SQL passes validation, it is executed against the BigQuery database.
5. Debugging (if needed): If the query fails or generates an error, the DebugSQLAgent refines it iteratively until a valid query is produced.
6. Response Generation: The ResponseAgent uses the query results and the user's original prompt to generate a natural language response. If the system fails to generate a valid response, it communicates the issue to the user.

Conclusion

By combining the foundational embedding process with this RAG-powered workflow, the chatbot transforms how users interact with their data. From seamless SQL query generation to delivering natural language responses, the system exemplifies the power of Sisense Fusion and advanced AI tools to simplify data-driven decision-making. As always, please reach out to your Customer Success Manager (CSM) if you would like to implement this in your own environment.
Automated Machine Learning with Sisense Fusion: A Practical Guide

In this article, we'll explore how using Sisense and AutoML (Automated Machine Learning) can simplify the process of applying machine learning to real-world business problems. AutoML takes care of tasks such as data preprocessing, model selection, and hyperparameter optimization without requiring deep expertise in machine learning. Let's dive into some practical business challenges where machine learning can make a significant impact.

Understanding the Business Use Cases

To illustrate how machine learning (ML) solves business challenges, we'll look at two real-world use cases:

- Optimizing Inventory for a Popular Retail Product (Regression Problem): Imagine a popular clothing store trying to manage stock for a trendy item that frequently sells out. By applying machine learning, the store could predict future demand for this product. This is an example of a regression problem, where the model forecasts continuous values—such as the number of items to stock—based on historical sales patterns, seasonal trends, and customer behavior. This allows the store to optimize its inventory, avoid shortages, and maximize sales, demonstrating how machine learning can enhance operational efficiency.
- Improving Customer Retention in Subscription Services (Classification Problem): For subscription-based businesses, predicting customer churn is essential. By analyzing data such as usage patterns, customer engagement, and support history, machine learning can predict whether a customer is likely to cancel their subscription. This classification model enables businesses to proactively target at-risk customers with personalized offers or support, helping to improve retention and customer satisfaction. The predictive power of machine learning transforms how businesses engage with their users, reducing churn and increasing long-term loyalty.

Difference Between Regression and Classification

- Regression models are used to predict continuous values, such as quantities (e.g., how many products to stock) or prices.
- Classification models are used to predict categorical outcomes, such as Yes/No (e.g., whether a customer will churn) or other distinct categories (e.g., fraud/no fraud).
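To make the distinction concrete, here is a small, self-contained sketch using scikit-learn with synthetic data: a regressor for the inventory use case and a classifier for the churn use case. The feature names and numbers are invented purely for illustration and are not part of the Sisense workflow described below.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)

# Regression: predict a continuous quantity (units to stock next week)
# from invented features such as last week's sales and a seasonality index.
X_demand = rng.normal(size=(200, 2))                       # [last_week_sales, season_index]
y_units = 50 + 10 * X_demand[:, 0] + 5 * X_demand[:, 1] + rng.normal(size=200)
regressor = GradientBoostingRegressor().fit(X_demand, y_units)
print("Forecast units:", regressor.predict(X_demand[:1]))  # a number, e.g. 48.7

# Classification: predict a categorical outcome (churn: 1 = yes, 0 = no)
# from invented features such as monthly usage and support tickets.
X_usage = rng.normal(size=(200, 2))                        # [monthly_usage, support_tickets]
y_churn = (X_usage[:, 0] - X_usage[:, 1] < 0).astype(int)
classifier = GradientBoostingClassifier().fit(X_usage, y_churn)
print("Churn prediction:", classifier.predict(X_usage[:1]))         # a class label, 0 or 1
print("Churn probability:", classifier.predict_proba(X_usage[:1]))  # class probabilities
```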
Hands-On: Using Sisense Fusion and AutoML

Everything demonstrated in this video, including the integration of machine learning models, is powered by Sisense Fusion's native APIs and features. The web app is simply a wrapper that adds a polished interface; all actions performed in the demo—whether selecting data models, training machine learning models, or making predictions—can be done entirely within Sisense Fusion's native platform without the need for external code or API calls. The power of Sisense Fusion lies in its ability to manage and integrate these machine learning tasks natively, making it easy for users to build, deploy, and interact with models without needing deep technical expertise or external integrations. The web app simply provides a visually engaging way to demonstrate the capabilities of the Sisense Fusion platform.

Step 1: Selecting the Data Model and Dataset

In the first part of the video, we use the web app, which leverages Sisense Fusion's native API features, to select the data model we want to work with. Here's a breakdown of what happens:

- Selecting a Data Model: After starting the app, we select the data model available in Sisense that contains our data.
- Choosing the Dataset: Once the data model is selected, the app displays all the tables or datasets contained in that model. We then select the dataset we want to train the machine learning model on.
- Target Variable Selection: After selecting the dataset, the app presents all the columns within the dataset. We select the target variable (the column we want to predict). For example, in customer churn prediction, this could be the Exited column, which indicates whether a customer has churned (1) or not (0).
- Selecting the Prediction Type: Next, we select whether the task is a regression or classification problem, based on the target variable. Since we are predicting customer churn, we select classification.
- Storing Information: Once all selections are made, the app stores this information. This data will later be used when we select the machine learning model for training.

Step 2: Exploratory Data Analysis (EDA)

After submitting, a Flask application behind the scenes generates an Exploratory Data Analysis (EDA) report based on the dataset. This report provides important insights, such as:
- Number of customer records.
- Missing values in the dataset.
- Relationships between variables.

These insights help us select relevant columns to ensure the machine learning model performs optimally.

Step 3: Model Training Options

We have multiple options for training our model within Sisense Fusion, for example:
- Auto-Sklearn: This is an open-source AutoML library that automates the model training process. Since it runs locally within Sisense, data never leaves the platform, ensuring data security. However, model training can be computationally expensive, meaning your Sisense cluster should have adequate resources to handle it.
- AWS Autopilot: This option leverages Amazon Web Services (AWS) infrastructure to train the model, offering more reliable performance. However, it incurs additional costs and requires your data to be sent to AWS for processing.

After selecting the model training method, the process begins automatically, and you'll see the status on the screen as it progresses.

Step 4: Integration with AWS SageMaker and Dashboard Creation

After selecting the AWS option for model training in the web app, a new Custom Code Table is added to the Sisense data model. This Custom Code Table automates the training and deployment of the machine learning model using AWS SageMaker Autopilot. Here's how it works.

Input Parameters for the Notebook

The custom code notebook contains a set of input parameters that are passed based on the selections made earlier in Step 1 (dataset and target column). Other parameters include:
- Dataset: The table you selected for model training.
- Target Column: The data column you want to predict (churn, in this case).
- Drop Feature: Columns you wish to exclude from model training (optional).
- AWS Credentials: Paths to AWS access keys and secret keys to authenticate with AWS.
- S3 Bucket Name: A unique S3 bucket where the dataset is stored for training.
- AWS Role ARN: The role with the necessary permissions to access S3 and SageMaker.

The notebook code reads these parameters and uses them to call the AWS SageMaker Autopilot API, which automates the model training and deployment process. The trained model is deployed as an endpoint on SageMaker, allowing for online predictions.
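For orientation, this is roughly what a call to the SageMaker Autopilot API looks like with boto3, using the kinds of parameters described above. The job name, bucket, dataset path, role ARN, region, and objective metric are placeholders; the notebook's actual code and parameter handling may differ.

```python
import boto3

sagemaker = boto3.client("sagemaker", region_name="us-east-1")  # assumed region

# Placeholders standing in for the notebook's input parameters.
job_name = "churn-autopilot-job"
bucket = "my-sisense-automl-bucket"
target_column = "Exited"
role_arn = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

# Launch an Autopilot job on the training CSV previously uploaded to S3.
sagemaker.create_auto_ml_job(
    AutoMLJobName=job_name,
    InputDataConfig=[{
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                        "S3Uri": f"s3://{bucket}/training/churn.csv"}},
        "TargetAttributeName": target_column,
    }],
    OutputDataConfig={"S3OutputPath": f"s3://{bucket}/output/"},
    ProblemType="BinaryClassification",          # classification, since we predict churn
    AutoMLJobObjective={"MetricName": "F1"},
    RoleArn=role_arn,
)

# Check job status; in practice you would poll until it completes
# and then read the BestCandidate details for deployment.
status = sagemaker.describe_auto_ml_job(AutoMLJobName=job_name)
print(status["AutoMLJobStatus"])
```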
Creating a Blox Widget with Dynamic Input Fields

The custom code notebook also contains code that dynamically creates a Sisense dashboard and a Blox widget based on the dataset selected for model training. Here's what happens:

- Dynamic Input Fields: Based on the feature columns in the dataset, the Blox widget dynamically generates input fields (boxes) for each feature. This is crucial for online predictions, as it allows users to input new data for the model to predict outcomes in real time.
- Predict Button: A predict button is added to the widget. When a user inputs new data into the input boxes and clicks the predict button, the system sends a request to the SageMaker endpoint, passing the input data. The model processes this data and returns the prediction, which is displayed in the widget.

This setup enables real-time, online predictions directly from the Sisense dashboard, with predictions powered by the AWS SageMaker endpoint. The dynamic nature of the widget allows the interface to adjust based on the dataset used for training, making the system flexible and user-friendly.

The custom code table outputs key information about the trained model and its deployment status, which includes the following columns:
- Model Name: The name assigned to the trained machine learning model.
- Metric Name: The evaluation metric used to assess the model's performance, such as accuracy, precision, or recall.
- Score: The metric score that indicates how well the model performed during evaluation.
- Local Path: The path within the Sisense environment where the model is stored.
- Model S3 Location: The S3 location where the trained model is saved after deployment.
- AWS Model Name: The name of the model registered in AWS SageMaker.
- Endpoint Name: The name of the deployed SageMaker endpoint used for making real-time predictions.

This output allows users to track key details about the model, including where it's stored, its performance, and the endpoint used for predictions.

Step 5: Saving Model Versions

The notebook not only trains the model but also saves important metadata within Sisense's file management storage. This allows you to maintain version control over your models. For each model training session, I store details such as the model metrics (accuracy, precision, etc.) and save each model in a folder named with the timestamp. This ensures easy traceability and allows multiple versions of models to be stored and retrieved as needed.

Step 6: Making Predictions

Once the model is trained, we move on to predictions. There are two ways to handle predictions:
- Batch Predictions (Offline): This method allows you to process thousands of records at once. It's suitable for scenarios where real-time predictions are not required and predictions can be generated in bulk.
- Online Predictions (Real-Time): For real-time applications, you can provide individual customer records and receive immediate predictions. This is ideal for real-time decision-making, such as predicting whether a new customer will churn based on their current attributes.

Online Predictions (Real-Time)

When the custom code table was built, it automatically generated a Sisense dashboard and a Blox widget based on the input features used during model training. This integration allows predictions to be embedded directly into Sisense dashboards, enabling users to interact with the model seamlessly. Here's how it works:

- The Blox widget takes the input data from the user and sends an API request to Sisense's custom code transformation.
- In the case of Auto-Sklearn, the pre-trained model is loaded locally within Sisense, as the model was trained and stored in the local environment.
- For AWS SageMaker, instead of loading a local model, the system sends a request to the SageMaker endpoint (where the model is deployed) for predictions.
- The prediction results, whether generated locally with Auto-Sklearn or through the SageMaker API, are returned to the dashboard and displayed within the Blox widget in real time.

This process ensures that predictions are fully integrated into the Sisense environment, providing an interactive, real-time experience for users, with the flexibility to use either local or cloud-based models depending on their needs.
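To illustrate the real-time path, here is a minimal sketch of calling a deployed SageMaker endpoint with boto3. The endpoint name, region, and feature values are placeholders; Autopilot endpoints typically accept CSV input, but check the format your deployed model expects.

```python
import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")  # assumed region

# One customer record, serialized as CSV in the same order as the training features.
# The feature values below are invented for illustration.
payload = "619,42,2,0.0,1,1,1,101348.88"

response = runtime.invoke_endpoint(
    EndpointName="churn-autopilot-endpoint",  # placeholder endpoint name
    ContentType="text/csv",
    Body=payload,
)

# Autopilot endpoints usually return the predicted label (and optionally probabilities) as text.
prediction = response["Body"].read().decode("utf-8")
print("Predicted churn:", prediction)
```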
Conclusion

Sisense Fusion, combined with AutoML, offers an efficient and powerful way to integrate machine learning into real-world business applications. Whether using Auto-Sklearn for local, cost-efficient model training or AWS Autopilot for cloud-based scalability, Sisense provides seamless version control and easy integration into dashboards, making it a comprehensive platform for automating machine learning at scale. If you're interested in integrating this solution into your Sisense deployment, please reach out to your dedicated Customer Success Manager (CSM) for further assistance.

Related Content:
- https://docs.sisense.com/main/SisenseLinux/ai-overview.htm
- https://academy.sisense.com/gen-ai