Knowledge Base Article

Understanding custom code tables: why “infer from notebook” fails and when to avoid it [Linux]

Custom Code Tables allow you to generate ElastiCube data using Python and Jupyter notebooks. However, confusion often arises around how data is actually ingested and why the Infer from Notebook option can fail in real-world scenarios. This article explains how the feature works internally, why inference can be fragile, and when a file-based approach is recommended.

Applies to: Sisense for Linux (Cloud and On-Prem)

Step-by-step guide

How custom code tables work (high-level)

When an ElastiCube build runs a Custom Code Table:

  1. Sisense executes the notebook.
  2. The notebook must return a pandas DataFrame at the end of execution.
  3. Sisense temporarily serializes that DataFrame to an internal CSV file.
  4. The CSV is ingested into the ElastiCube.
  5. The temporary file is deleted after the build completes.

Important:
Regardless of configuration, only the returned DataFrame is ingested. Displaying a DataFrame in the notebook UI does not load data into the cube.
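As a minimal illustration, the notebook's final cell might look like the sketch below. The column names here are hypothetical and the exact return convention can vary by version; in this sketch the DataFrame is simply the last expression of the notebook.

# Python
import pandas as pd

# Build the result set; in practice this would come from your query or API logic
df = pd.DataFrame({
    "order_id": [1001, 1002],
    "amount": [250.0, 99.5],
})

# The last expression must evaluate to the DataFrame; this returned value
# is what Sisense serializes to CSV and ingests during the build
df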

Option 1: “Infer from Notebook” (schema inference)

What it does

  • Runs the notebook.
  • Reads the returned DataFrame.
  • Automatically infers column names and data types.
  • Saves the inferred schema to the table definition.

This is intended to simplify setup by avoiding manual schema definition.

Why it can fail

Schema inference is fragile and commonly fails with:

  • Mixed data types in a column
  • Null-heavy columns
  • Date / datetime fields
  • Large datasets
  • Nested or irregular JSON structures

Failures typically surface as build-time errors, such as:

  • A NullPointerException (NPE) during column type guessing
  • CSV connector errors during schema guessing

Although these appear as ingestion errors, they actually occur before ingestion, during the schema inference step.
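If you need to keep using inference, explicitly normalizing column types before returning the DataFrame removes most of the ambiguity that trips up the type guesser. A minimal sketch, with hypothetical column names:

# Python
import pandas as pd

# Hypothetical raw frame with inference-hostile columns
df = pd.DataFrame({
    "id": ["1", 2, "3", 4],                               # mixed str/int values
    "created": ["2024-01-01", None, "2024-03-15", None],  # null-heavy dates
})

# Cast types explicitly so the type guesser has nothing to guess
df["id"] = pd.to_numeric(df["id"], errors="coerce").astype("Int64")
df["created"] = pd.to_datetime(df["created"], errors="coerce")
df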

When to use

  • Prototyping
  • Small, clean, well-typed datasets
  • Early development only

Option 2: File-based output (recommended for production)

What it does

  • The notebook explicitly writes a file (CSV, Parquet, etc.) to:
    /opt/sisense/storage/notebooks/output/
  • The Custom Code Table is created to read from that file.
  • The schema is fixed and deterministic.

Why this works reliably

  • Removes automatic schema guessing
  • Ensures consistent data types
  • Matches how most production-grade pipelines operate

Example (Python code)

# Python
# Write the final DataFrame to the shared notebooks output directory
output_path = "/opt/sisense/storage/notebooks/output/my_data.csv"
df.to_csv(output_path, index=False)

Note:
Even with this approach, Sisense still ingests the data via an internal CSV; the difference is that the schema is no longer inferred from a live DataFrame.
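If the notebook environment has a Parquet engine installed (pyarrow or fastparquet), writing Parquet instead of CSV also preserves column dtypes in the file itself. A sketch mirroring the CSV example above:

# Python
output_path = "/opt/sisense/storage/notebooks/output/my_data.parquet"
df.to_parquet(output_path, index=False)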

When to use

  • Production deployments
  • APIs and external data sources
  • Large or complex datasets
  • Any scenario where inference errors occur

Common misunderstanding: notebook “test” or preview cells

Developers often add cells that display a DataFrame for validation during development. While useful for debugging, these cells:

  • Affect only the Jupyter UI
  • Do not impact ElastiCube ingestion
  • Are ignored during the build process

During builds:

  • Sisense may ignore designated dev/test cells
  • Sisense injects its own cells for parameters and final serialization

Only the final returned DataFrame (or its serialized output) is ingested.
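A short contrast, using a hypothetical df, of a preview cell versus the final cell:

# Python
# Preview cell: renders a table in the Jupyter UI only; has no effect on the build
df.head()

# Final cell: the returned DataFrame is what gets serialized and ingested
df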

Conclusion

Custom Code Tables rely on a single ingestion pipeline: a returned pandas DataFrame that is serialized and loaded during the build. The Infer from Notebook option affects schema discovery, not ingestion, and can fail with real-world data. Writing a file and using it as the table source removes schema inference and provides the most stable, production-ready configuration.

Published 12-30-2025