Automated Machine Learning with Sisense Fusion: A Practical Guide

himanshu_negi · 10-21-2024


In this article, we’ll explore how using Sisense and AutoML (Automated Machine Learning) can simplify the process of applying machine learning to real-world business problems. AutoML takes care of tasks such as data preprocessing, model selection, and hyperparameter optimization without requiring deep expertise in machine learning. Let’s dive into some practical business challenges where machine learning can make a significant impact.

Understanding the Business Use Cases

To illustrate how machine learning (ML) solves business challenges, we’ll look at two real-world use cases:

Optimizing Inventory for a Popular Retail Product (Regression Problem): Imagine a popular clothing store trying to manage stock for a trendy item that frequently sells out. By applying machine learning, the store could predict future demand for this product. This is a regression problem: the model forecasts a continuous value, such as the number of items to stock, based on historical sales patterns, seasonal trends, and customer behavior. With these forecasts, the store can optimize its inventory, avoid shortages, and maximize sales, which shows how machine learning can improve operational efficiency.
Improving Customer Retention in Subscription Services (Classification Problem): For subscription-based businesses, predicting customer churn is essential. By analyzing data such as usage patterns, customer engagement, and support history, machine learning can predict whether a customer is likely to cancel their subscription. This classification model enables businesses to proactively target at-risk customers with personalized offers or support, helping to improve retention and customer satisfaction. The predictive power of machine learning transforms how businesses engage with their users, reducing churn and increasing long-term loyalty.

Difference Between Regression and Classification

Regression models are used to predict continuous values, such as quantities (e.g., how many products to stock) or prices.
Classification models are used to predict categorical outcomes, such as Yes/No (e.g., whether a customer will churn) or other distinct categories (e.g., fraud/no fraud).
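To make the distinction concrete, here is a minimal Python sketch (standard library only) of how the two kinds of predictions are typically scored: a regression forecast by how far it is from the actual value, a classification by how often the predicted label is correct. The data is made up for illustration, and the helper names are hypothetical.

```python
# Regression: the model outputs a number, so it is scored by how far off it is.
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average absolute gap between actual and predicted quantities."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Classification: the model outputs a label, so it is scored by how often it is right.
def accuracy(actual: list[int], predicted: list[int]) -> float:
    """Fraction of records whose predicted label matches the true label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: units to stock (regression) vs. churned = 1 / retained = 0 (classification).
units_actual, units_predicted = [120, 80, 95], [110, 90, 95]
churn_actual, churn_predicted = [1, 0, 0, 1], [1, 0, 1, 1]

print(mean_absolute_error(units_actual, units_predicted))  # ≈ 6.67 units off on average
print(accuracy(churn_actual, churn_predicted))  # 0.75
```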

Hands-On: Using Sisense Fusion and AutoML

Everything demonstrated in this video, including the integration of machine learning models, is powered by Sisense Fusion’s native APIs and features. The web app is simply a wrapper that adds a shiny interface, but all actions performed in the demo—whether selecting data models, training machine learning models, or making predictions—can be done entirely within Sisense Fusion’s native platform without the need for external code or API calls.

The power of Sisense Fusion lies in its ability to manage and integrate these machine learning tasks natively, making it easy for users to build, deploy, and interact with models without deep technical expertise or external integrations. The web app simply provides a visually engaging way to demonstrate the capabilities of the Sisense Fusion platform.

Step 1: Selecting the Data Model and Dataset

In the first part of the video, we use the web app that leverages Sisense Fusion’s native API features to select the data model we want to work with. Here’s a breakdown of what happens:

Selecting a Data Model:

After starting the app, we select the Sisense data model that contains our data.

Choosing the Dataset:

Once the data model is selected, the app displays all the tables or datasets contained in that model. We then select the dataset we want to train the machine learning model on.

Target Variable Selection:

After selecting the dataset, the app presents all the columns within the dataset. We select the target variable (the column we want to predict). For example, in customer churn prediction, this could be the Exited column, which indicates whether a customer has churned (1) or not (0).

Selecting the Prediction Type:

Next, we select whether the task is a regression or classification problem, based on the target variable. Since we are predicting customer churn, we select classification.
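The app leaves this choice to the user. As an illustration only, a simple heuristic could pre-select a default from the target column's values. The function below and its `max_classes` threshold are hypothetical, not part of the app or of Sisense Fusion.

```python
def suggest_problem_type(target_values: list, max_classes: int = 10) -> str:
    """Suggest 'classification' or 'regression' for a target column.

    Hypothetical heuristic: non-numeric targets, or numeric targets with only a
    few distinct values, are treated as classes; everything else as continuous.
    """
    values = [v for v in target_values if v is not None]  # ignore missing values
    if not values:
        raise ValueError("target column has no non-missing values")
    numeric = all(isinstance(v, (int, float)) for v in values)
    if not numeric or len(set(values)) <= max_classes:
        return "classification"
    return "regression"


# The Exited column (1 = churned, 0 = retained) has two distinct values.
print(suggest_problem_type([1, 0, 0, 1, None]))  # classification
```

A heuristic like this misreads integer quantities with few distinct values (for example, stock counts between 0 and 5) as classes, which is one reason to keep the final choice with the user.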

Storing Information:

Once all selections are made, the app stores this information. This data will later be used when we select the machine learning model for training.
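How the web app stores these selections isn't described. One plausible approach, sketched below under that assumption, is to keep them in a small typed record and serialize it to JSON so the training step can read it later. The `TrainingSelection` class and its field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingSelection:
    """Hypothetical record of the choices made in Step 1."""

    data_model: str     # Sisense data model chosen first
    dataset: str        # table inside that data model
    target_column: str  # column to predict, e.g. "Exited"
    problem_type: str   # "classification" or "regression"

    def __post_init__(self) -> None:
        if self.problem_type not in ("classification", "regression"):
            raise ValueError(f"unknown problem type: {self.problem_type}")

    def to_json(self) -> str:
        """Serialize the selection for later use by the training step."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "TrainingSelection":
        """Rebuild (and re-validate) a selection from its JSON form."""
        return cls(**json.loads(payload))


selection = TrainingSelection("Customer Churn", "customers", "Exited", "classification")
stored = selection.to_json()
assert TrainingSelection.from_json(stored) == selection
```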

Step 2: Exploratory Data Analysis (EDA)

After we submit these selections, a Flask application running behind the scenes generates an Exploratory Data Analysis (EDA) report based on the dataset. This report provides important insights, such as:

Number of customer records.
Missing values in the dataset.
Relationships between variables.

These insights help us decide which columns to keep for training, which in turn helps the machine learning model perform well.
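The report itself is produced by the author's Flask application, whose code isn't shown. The following standard-library sketch computes the same kinds of figures (record count, missing values per column, and a Pearson correlation between two numeric columns) from a list of row dictionaries. The function names and the row format are assumptions.

```python
import math


def eda_summary(rows: list[dict]) -> dict:
    """Record count and missing-value count per column (None counts as missing)."""
    columns = sorted({key for row in rows for key in row})
    missing = {col: sum(1 for row in rows if row.get(col) is None) for col in columns}
    return {"record_count": len(rows), "missing_values": missing}


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation between two numeric columns; pairs containing None are skipped."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("a column is constant, so correlation is undefined")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical customer records.
rows = [
    {"Age": 40, "Balance": 0.0, "Exited": 0},
    {"Age": 55, "Balance": None, "Exited": 1},
    {"Age": 30, "Balance": 1200.0, "Exited": 0},
]
summary = eda_summary(rows)
age_vs_exited = pearson([r["Age"] for r in rows], [r["Exited"] for r in rows])
```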

Step 3: Model Training Options

There are multiple options for training the model within Sisense Fusion, for example:

Auto-Sklearn:

This is an open-source AutoML library that automates the model training process. Since it runs locally within Sisense, data never leaves the platform, ensuring data security. However, model training can be computationally expensive, meaning your Sisense cluster should have adequate resources to handle it.

AWS SageMaker Autopilot:

This option uses Amazon Web Services (AWS) infrastructure to train the model, so the compute-heavy training runs on managed AWS resources instead of your Sisense cluster. However, it incurs additional AWS costs and requires your data to be sent to AWS for processing.

After selecting the model training method, the process begins automatically, and you’ll see the status on the screen as it progresses.
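For the AWS option, an on-screen status like this would typically come from polling the training job. A minimal sketch, assuming the job is an Amazon SageMaker Autopilot job described through boto3's `describe_auto_ml_job`, whose response carries `AutoMLJobStatus` and `AutoMLJobSecondaryStatus`. `wait_for_training` and its callback are hypothetical helpers, and the describe call is injected so the loop can run without AWS.

```python
import time
from typing import Callable

# AutoMLJobStatus values after which the job will not change any more.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_training(
    describe: Callable[[], dict],
    poll_seconds: float = 60.0,
    on_status: Callable[[str, str], None] = lambda status, detail: None,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a job description until it reaches a terminal status, reporting each step."""
    while True:
        job = describe()
        status = job["AutoMLJobStatus"]
        on_status(status, job.get("AutoMLJobSecondaryStatus", ""))  # e.g. update the UI
        if status in TERMINAL_STATES:
            return job
        sleep(poll_seconds)


# Real usage (not executed here; requires AWS credentials and a hypothetical job name):
#   import boto3
#   sm = boto3.client("sagemaker")
#   job = wait_for_training(
#       lambda: sm.describe_auto_ml_job(AutoMLJobName="churn-automl-001"),
#       on_status=lambda status, detail: print(status, detail),
#   )
```

Injecting `describe` and `sleep` keeps the loop independent of AWS, which also makes it easy to reuse for a local training backend that reports its own status.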

Step 4: Integration with AWS SageMaker and Dashboard Creation

After the AWS option for model training is selected in the web app, a new Custom Code Table is added to the Sisense data model. This Custom Code Table automates the training and deployment of the machine learning model using AWS SageMaker Autopilot. Here’s how it works:

Input Parameters for the Notebook

The custom code notebook takes a set of input parameters, some of which are passed from the selections made earlier in Step 1 (dataset and target column). The parameters are:

Dataset: The table you selected for model training.
Target Column: The data column you want to predict (churn, in this case).
Drop Feature: Columns you wish to exclude from model training (optional).
AWS Credentials: Paths to AWS access keys and secret keys to authenticate with AWS.
S3 Bucket Name: A unique S3 bucket where the dataset is stored for training.
AWS Role ARN: The role with the necessary permissions to access S3 and SageMaker.

The notebook code reads these parameters and uses them to call the AWS SageMaker Autopilot API, which automates the model training and deployment process. The trained model is deployed as an endpoint on SageMaker, allowing for online predictions.
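The notebook's code isn't included in the article. As a rough sketch of the steps described above, assuming boto3 is used, the notebook would drop the excluded feature columns, write the table as CSV with a header row, and assemble the keyword arguments for the SageMaker `create_auto_ml_job` call. `NotebookParams`, the S3 key layout, the objective metrics chosen, and the endpoint naming are illustrative assumptions. The actual upload and API call are left as comments so the sketch runs without AWS.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class NotebookParams:
    """Hypothetical container for the notebook's input parameters."""

    dataset: str        # table selected in Step 1
    target_column: str  # column to predict, e.g. "Exited"
    s3_bucket: str
    role_arn: str
    problem_type: str   # "classification" or "regression"
    drop_features: list[str] = field(default_factory=list)


def to_training_csv(rows: list[dict], params: NotebookParams) -> str:
    """Serialize the table to CSV (with header) without the dropped feature columns."""
    if not rows:
        raise ValueError("dataset is empty")
    columns = [c for c in rows[0] if c not in params.drop_features]
    if params.target_column not in columns:
        raise ValueError("the target column must not be dropped")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)  # dropped columns are ignored via extrasaction
    return buffer.getvalue()


def build_autopilot_request(params: NotebookParams, job_name: str, num_classes: int = 2) -> dict:
    """Keyword arguments for boto3's SageMaker create_auto_ml_job call."""
    if not 1 <= len(job_name) <= 32:
        raise ValueError("Autopilot job names are limited to 32 characters")
    if params.problem_type == "regression":
        problem_type, metric = "Regression", "MSE"
    elif num_classes == 2:
        problem_type, metric = "BinaryClassification", "F1"
    else:
        problem_type, metric = "MulticlassClassification", "Accuracy"
    prefix = f"s3://{params.s3_bucket}/{params.dataset}"  # assumed key layout
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": f"{prefix}/train/"}},
            "TargetAttributeName": params.target_column,
        }],
        "OutputDataConfig": {"S3OutputPath": f"{prefix}/output/"},
        "ProblemType": problem_type,
        "AutoMLJobObjective": {"MetricName": metric},
        "RoleArn": params.role_arn,
        # Ask Autopilot to deploy the best candidate to a named endpoint.
        "ModelDeployConfig": {"AutoGenerateEndpointName": False, "EndpointName": f"{job_name}-endpoint"},
    }


# Not executed here (requires AWS credentials):
#   import boto3
#   boto3.client("s3").put_object(Bucket=params.s3_bucket,
#                                 Key=f"{params.dataset}/train/data.csv", Body=csv_text)
#   boto3.client("sagemaker").create_auto_ml_job(**request)
```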

Creating a Blox Widget with Dynamic Input Fields

The custom code notebook also contains code that dynamically creates a Sisense dashboard and a Blox widget based on the dataset selected for model training. Here’s what happens:

Dynamic Input Fields: Based on the feature columns in the dataset, the Blox widget dynamically generates input fields (boxes) for each feature. This is crucial for online predictions, as it allows users to input new data for the model to predict outcomes in real time.
Predict Button: A predict button is added to the widget. When a user enters new data into the input boxes and clicks the predict button, the system sends the input data to the SageMaker endpoint. The model processes this data and returns the prediction, which is displayed in the widget.

This setup enables real-time, online predictions directly from the Sisense dashboard, with predictions being powered by the AWS SageMaker endpoint. The dynamic nature of the widget allows the interface to adjust based on the dataset used for training, making the system flexible and user-friendly.
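Blox templates are built from Adaptive Cards-style JSON. The sketch below shows the general idea of generating one input per feature column plus a Predict button. The element choices, the `input_<column>` id convention, and the bare `Action.Submit` are assumptions, since the article doesn't show the generated template and real Blox actions are wired up separately.

```python
def build_prediction_card(feature_columns: list[str], title: str = "Churn prediction") -> dict:
    """Build an Adaptive Cards-style template with one text input per feature column."""
    body = [{"type": "TextBlock", "text": title, "weight": "bolder", "size": "medium"}]
    for column in feature_columns:
        body.append({
            "type": "Input.Text",
            "id": f"input_{column}",  # hypothetical id convention, used to read values back
            "placeholder": column,
        })
    return {
        "type": "AdaptiveCard",
        "version": "1.3",
        "body": body,
        # Hypothetical action: in Blox, the button would be bound to an action that
        # forwards the input values to the prediction service.
        "actions": [{"type": "Action.Submit", "title": "Predict"}],
    }


def features_from_submission(submission: dict, feature_columns: list[str]) -> list[str]:
    """Read submitted values back in training-column order (missing inputs become '')."""
    return [submission.get(f"input_{column}", "") for column in feature_columns]


features = ["CreditScore", "Age", "Balance"]  # hypothetical feature columns
card = build_prediction_card(features)
```

Because the inputs are generated from the feature list, retraining on a dataset with different columns only requires regenerating the card, which matches the "dynamic" behavior described above.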

The custom code table outputs key information about the trained model and its deployment status, which includes the following columns:

Model Name: The name assigned to the trained machine learning model.
Metric Name: The evaluation metric used to assess the model's performance, such as accuracy, precision, or recall.
Score: The metric score that indicates how well the model performed during evaluation.
Local Path: The path within the Sisense environment where the model is stored.
Model S3 Location: The S3 location where the trained model is saved after deployment.
AWS Model Name: The name under which the model is registered in AWS SageMaker.
Endpoint Name: The name of the deployed SageMaker endpoint used for making real-time predictions.

This output allows users to track key details about the model, including where it's stored, its performance, and the endpoint used for predictions.
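Assuming the notebook reads these values from the SageMaker `describe_auto_ml_job` response, a mapping into the output columns might look like the sketch below. The response keys used (`BestCandidate`, `FinalAutoMLJobObjectiveMetric`, `InferenceContainers`, `ModelDeployResult`) come from that API. Taking the first inference container's artifact as the "Model S3 Location" is an assumption, because Autopilot candidates can have several containers; `model_output_row` itself is a hypothetical helper.

```python
def model_output_row(job: dict, local_path: str, aws_model_name: str) -> dict:
    """Flatten a describe_auto_ml_job response into the output columns listed above."""
    best = job["BestCandidate"]
    metric = best["FinalAutoMLJobObjectiveMetric"]
    return {
        "Model Name": best["CandidateName"],
        "Metric Name": metric["MetricName"],
        "Score": metric["Value"],
        "Local Path": local_path,
        # Assumption: record the first container's artifact; candidates can have several.
        "Model S3 Location": best["InferenceContainers"][0]["ModelDataUrl"],
        "AWS Model Name": aws_model_name,
        # Empty when the job was not configured to deploy an endpoint.
        "Endpoint Name": job.get("ModelDeployResult", {}).get("EndpointName", ""),
    }
```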

Step 5: Saving Model Versions

The notebook not only trains the model but also saves important metadata within Sisense’s file management storage. This allows you to maintain version control over your models.

For each training session, we store details such as the model metrics (accuracy, precision, etc.) and save each model in its own folder named after the training timestamp. This makes every run traceable and allows multiple model versions to be stored and retrieved as needed.
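A minimal sketch of this timestamp-folder scheme, assuming a plain directory tree (the article doesn't show how Sisense's file management storage is accessed): each training run gets a folder named after its UTC timestamp that holds the model file and a `metrics.json`. The file names and helper functions are hypothetical.

```python
import json
import os
from datetime import datetime, timezone
from typing import Optional


def save_model_version(base_dir: str, metrics: dict, model_bytes: bytes,
                       now: Optional[datetime] = None) -> str:
    """Write the model and its metrics into a new folder named after the UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    version_dir = os.path.join(base_dir, stamp)
    os.makedirs(version_dir, exist_ok=False)  # fail rather than overwrite an existing version
    with open(os.path.join(version_dir, "model.bin"), "wb") as f:
        f.write(model_bytes)
    with open(os.path.join(version_dir, "metrics.json"), "w", encoding="utf-8") as f:
        json.dump(metrics, f, indent=2, sort_keys=True)
    return version_dir


def list_versions(base_dir: str) -> list[str]:
    """Version folders, oldest first (this fixed-width timestamp format sorts chronologically)."""
    return sorted(
        name for name in os.listdir(base_dir)
        if os.path.isdir(os.path.join(base_dir, name))
    )


def load_metrics(version_dir: str) -> dict:
    """Read back the metrics recorded for one version."""
    with open(os.path.join(version_dir, "metrics.json"), encoding="utf-8") as f:
        return json.load(f)
```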

Step 6: Making Predictions

Once the model is trained, we move on to predictions. There are two ways to handle predictions:

Batch Predictions (Offline):

This method allows you to process thousands of records at once. It's suitable for scenarios where real-time predictions are not required, and predictions can be generated in bulk.

Online Predictions (Real-Time):

For real-time applications, you can provide individual customer records and receive immediate predictions. This is ideal for real-time decision-making, such as predicting whether a new customer will churn based on their current attributes.
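The difference between the two modes is mostly about how records reach the model. In this sketch, `predict_many` stands in for any model that scores a list of records (a hypothetical interface): batch mode walks through the data in chunks, while online mode sends one record and returns its result right away. The chunk size of 500 is an arbitrary default.

```python
from typing import Callable, Iterable, Iterator, Sequence

Record = Sequence[float]
PredictMany = Callable[[list], list]  # hypothetical: scores a list of records


def in_chunks(records: Iterable[Record], chunk_size: int) -> Iterator[list]:
    """Yield records in fixed-size chunks (the last chunk may be smaller)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunk: list = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def predict_batch(records: Iterable[Record], predict_many: PredictMany, chunk_size: int = 500) -> list:
    """Offline: score many records, one chunk at a time, and collect all predictions."""
    predictions: list = []
    for chunk in in_chunks(records, chunk_size):
        predictions.extend(predict_many(chunk))
    return predictions


def predict_one(record: Record, predict_many: PredictMany):
    """Online: score a single record and return its prediction immediately."""
    return predict_many([record])[0]
```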

Online Predictions in the Dashboard:

When the Custom Code Table is built, it automatically generates a Sisense dashboard and a Blox widget based on the input features used during model training. This integration allows predictions to be embedded directly into Sisense dashboards, so users can interact with the model without leaving Sisense.

Here’s how it works:

The Blox widget takes the input data from the user and sends an API request to Sisense’s custom code transformation.
In the case of Auto-Sklearn, the pre-trained model is loaded locally within Sisense, as the model was trained and stored in the local environment.
For AWS SageMaker, instead of loading a local model, the system sends a request to the SageMaker endpoint (where the model is deployed) for predictions.
The prediction results, whether generated locally with Auto-Sklearn or through the SageMaker API, are returned to the dashboard and displayed within the Blox widget in real time.

This process ensures that predictions are fully integrated into the Sisense environment, providing an interactive and real-time experience for users, with the flexibility to use either local or cloud-based models depending on their needs.
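The routing described above can be sketched as a single function. It assumes the local Auto-Sklearn model exposes the scikit-learn style `predict` method and that the SageMaker endpoint accepts a single CSV row (`text/csv`), which Autopilot-generated endpoints commonly accept. `predict_online`, its `backend` values, and the way clients are passed in are hypothetical.

```python
import csv
import io
from typing import Any, Optional, Sequence


def to_csv_row(values: Sequence[Any]) -> str:
    """Serialize one record as a single CSV line without header or trailing newline."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(values)
    return buffer.getvalue().rstrip("\r\n")


def predict_online(
    features: Sequence[Any],
    backend: str,
    local_model: Optional[Any] = None,
    runtime_client: Optional[Any] = None,
    endpoint_name: str = "",
) -> str:
    """Return one prediction from either a locally stored model or a SageMaker endpoint."""
    if backend == "autosklearn":
        # The model was trained and stored locally; predict() takes a 2-D list of rows.
        return str(local_model.predict([list(features)])[0])
    if backend == "sagemaker":
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Body=to_csv_row(features),
        )
        return response["Body"].read().decode("utf-8").strip()
    raise ValueError(f"unknown backend: {backend}")


# Real clients (not executed here):
#   import boto3, joblib
#   runtime_client = boto3.client("sagemaker-runtime")
#   local_model = joblib.load("path/to/model.joblib")  # hypothetical path
```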

Conclusion

Sisense Fusion, combined with AutoML, offers an efficient and powerful way to integrate machine learning into real-world business applications. Whether using Auto-Sklearn for local, cost-efficient model training or AWS SageMaker Autopilot for cloud-based scalability, Sisense Fusion supports model versioning and straightforward integration into dashboards, making it a comprehensive platform for automating machine learning at scale.

If you’re interested in integrating this solution into your Sisense deployment, please reach out to your dedicated Customer Success Manager (CSM) for further assistance.

Related Content:

https://docs.sisense.com/main/SisenseLinux/ai-overview.htm

https://academy.sisense.com/gen-ai

Sisense Community