OlehChudiiovych
Sisense Team Member

Linux | ElastiCube Local File System

The Sisense application uses MonetDB to store raw data in ElastiCubes (commonly referred to as EC or cube). This allows Sisense to query its own copy of the data rather than sending a query back to the data source every time a dashboard is accessed.

The process of importing raw data into a Sisense ElastiCube is called a "build".
On Linux, Sisense supports three build modes:

  1. Regular - shared storage 

  2. Build on local

  3. Build on local with S3

Each mode builds the cube differently, which means the cube is also loaded for querying differently.

 

1. Regular - Shared Storage

Regular builds write the data to shared storage under /opt/sisense/storage/farms while building.

The name of each farm directory is composed of the ElastiCube name and the timestamp at which the build started. For example, the directory for a Sample ECommerce cube that started building on September 23rd may look like this: aSampleIAAaECommerce_2022.09.23.07.26.31.585

A new directory (composed of name + timestamp) is created for every build.
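
The naming scheme above can be taken apart with plain shell parameter expansion. This is a minimal sketch, assuming the layout <cubeName>_<timestamp> with an optional "_next" suffix (used by accumulative builds); parse_farm_dir is a hypothetical helper name, not a Sisense command.

```shell
# Split a farm directory name into cube name and build timestamp.
# Assumes the layout <cubeName>_<yyyy.MM.dd.HH.mm.ss.SSS>[_next].
parse_farm_dir() {
  local dir="${1%/}"          # drop a trailing slash, if any
  dir="${dir##*/}"            # keep only the last path component
  dir="${dir%_next}"          # drop the accumulative-build suffix, if present
  local cube="${dir%_*}"      # everything before the last underscore
  local stamp="${dir##*_}"    # everything after the last underscore
  printf '%s %s\n' "$cube" "$stamp"
}

parse_farm_dir /opt/sisense/storage/farms/aSampleIAAaECommerce_2022.09.23.07.26.31.585
# prints: aSampleIAAaECommerce 2022.09.23.07.26.31.585
```

Splitting at the last underscore keeps the sketch working for cube names that themselves contain underscores.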

 

Full Build:

When a full build process starts, an empty directory is created, and the data is imported to that location.

In the case of a failure, this newly created directory will be deleted.

In the case of success, a new EC query pod will be loaded and attached to the new location. After a successful startup, the previous copy of the query pod of this cube (if it exists) will be terminated, and the directory attached to it will be deleted. These operations are done by the management service.
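
The directory lifecycle of a full build can be pictured with a small simulation. This is only an illustrative sketch: run_full_build, the "<cube>.current" marker file, and the import command passed as an argument are all hypothetical stand-ins, and the real switch-over is performed by the management service, not by a script like this.

```shell
# Illustrative only: mimics the directory lifecycle of a full build.
# Usage: run_full_build <farms_root> <cube> <timestamp> <import command...>
run_full_build() {
  local farms_root="$1" cube="$2" stamp="$3"
  shift 3
  local new_dir="$farms_root/${cube}_${stamp}"
  local marker="$farms_root/${cube}.current"   # hypothetical: records the active farm
  local old_dir=""
  mkdir -p "$new_dir" || return 1

  # "$@" stands in for the data import; it receives the new directory.
  if ! "$@" "$new_dir"; then
    rm -rf -- "$new_dir"      # failure: the newly created directory is deleted
    return 1
  fi

  # Success: attach to the new location, then delete the previous directory.
  if [ -f "$marker" ]; then
    old_dir="$(cat "$marker")"
  fi
  printf '%s\n' "$new_dir" > "$marker"
  if [ -n "$old_dir" ] && [ "$old_dir" != "$new_dir" ] && [ -d "$old_dir" ]; then
    rm -rf -- "$old_dir"
  fi
}
```

Note that the previous directory is removed only after the new one has been recorded as active, which mirrors the order described above: the old copy stays available until the new one has started successfully.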

 

Accumulative or Schema Changes Build:

In the case of an accumulative build or a schema change, the original directory is copied before the build starts. Because of this, the first build after accumulative behavior is selected takes more time.

After the build finishes successfully, and the new cube is loaded, the original cube will get patched with the changes, so it is not necessary to copy the full data set each time.

The new cube will be activated, and the old cube will become ready for the next build.  

The directory name will be the same as the original with "_next" appended, for example: aSampleIAAaECommerce_2021.02.23.07.26.31.585_next

When the next accumulative build starts, it will use the _next directory and will not need to copy the entire cube.

Note: this will double the storage size since both directories will remain. 
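
To see how much shared storage the "_next" copies occupy, something like the following could be used. It is a sketch under the assumption that the farms live directly under /opt/sisense/storage/farms as described above; next_farm_sizes is a hypothetical helper name.

```shell
# Print the size in KiB and the path of every "_next" farm directory.
next_farm_sizes() {
  local farms_root="${1:-/opt/sisense/storage/farms}"
  # Only look at direct children of the farms root whose name ends in "_next".
  find "$farms_root" -mindepth 1 -maxdepth 1 -type d -name '*_next' \
    -exec du -sk {} +
}

# Example: next_farm_sizes /opt/sisense/storage/farms
```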

 

Loading of the Cube

When the cube loads, the farms point to /tmp/aSampleIAAaECommerce_2021.02.23.07.26.31.585/dbfarm inside the cube pod.

The files are copied to  /opt/sisense/storage/farms/aSampleIAAaECommerce_2021.02.23.07.26.31.585/

 

2. Build on Local:

Full Build, Accumulative, or Schema changes:

Same as regular behavior, with a few modifications to farm locations: 

Builds put the data into local storage  /opt/sisense/local_storage/ while building.

Using the Sample Ecommerce Example: /opt/sisense/local_storage/aSampleIAAaECommerce

After the build finishes, the cube is copied into the shared storage:  /opt/sisense/storage/farms/aSampleIAAaECommerce_2021.02.23.07.26.31.585

The local folder  /opt/sisense/local_storage/aSampleIAAaECommerce is then deleted from the server.

Additionally, a compressed file, dbfarm.tar.gz, is created containing all the files smaller than 65K and all the symbolic links that are needed to operate. The symbolic links point to the shared storage.

This speeds up cube loading. Creating symbolic links takes time, and fetching many small files from the shared storage can generate high IOPS that overload it.

It is more efficient to extract a single tar.gz file that includes all the links and small files.
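
The idea of bundling small files and symbolic links into one archive can be sketched with find and tar. This is not the Sisense implementation: pack_small_files is a hypothetical name, reading "65K" as 65 * 1024 bytes is an assumption, and GNU find and tar are assumed.

```shell
# Pack every symlink and every regular file below the size cutoff into one
# tar.gz. The output file must be an absolute path outside the farm directory.
pack_small_files() {
  local farm_dir="$1" out_file="$2"
  local max_bytes="${3:-66560}"   # 65 * 1024; the exact cutoff is an assumption
  (
    cd "$farm_dir" || exit 1
    # -type l keeps symlinks as links (tar does not follow them by default);
    # -size -Nc matches regular files smaller than N bytes.
    find . \( -type l -o \( -type f -size "-${max_bytes}c" \) \) -print0 |
      tar -czf "$out_file" --null -T -
  )
}

# Example: pack_small_files /path/to/farm /tmp/dbfarm.tar.gz
```

Extracting one archive replaces many small reads and link creations with a single sequential read, which is the load-time saving described above.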

When the pod is terminated or killed, the temporary folder /opt/sisense/local_storage/aSampleIAAaECommerce is deleted.

If the pod is force-killed with "kubectl delete pod ec-sample-ECommerce-bld-749dd924-45a0-4-5866b6c47d-nlsq7 --force --grace-period=0" the delete will not take place, and the farm will remain on the server.

If the server reboots and /opt/sisense/local_storage is not ephemeral storage, the files will remain.

In AWS, it is recommended to use the local NVMe ephemeral storage that comes with r5d or m5d machines.

 

Loading of the Cube:

When the cube loads, the farm points to  /opt/sisense/local_storage/aSampleIAAaECommerce_2021.02.23.07.26.31.585-bbe0/

All files from  /opt/sisense/storage/farms/aSampleIAAaECommerce_2021.02.23.07.26.31.585/ will be copied there.

Additionally, the dbfarm.tar.gz, which was generated during the build stage, and contains small files and links to large files, is copied and extracted into a local storage folder.

Also, notice that the folders are under /opt/sisense/local_storage, which is a path mapped from the host, not pod-local storage.

The name of the folder has a unique identifier (-bbe0) to prevent two EC instances on the same server from stepping on each other.

When the POD is terminated or killed, we delete the temporary folder /opt/sisense/local_storage/aSampleIAAaECommerce_2021.02.23.07.26.31.585-bbe0/.

If you kill the pod with "kubectl delete pod ec-sample-ECommerce-qry-749dd924-45a0-4-5866b6c47d-nlsq7 --force --grace-period=0" the delete will not occur, and the farm will remain on the server.

If the server reboots and /opt/sisense/local_storage is not ephemeral storage, the files will remain.

In AWS, it is recommended to use the local NVMe ephemeral storage that comes with r5d or m5d machines.
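
Folders left behind after a forced pod deletion can be hunted for by age. The following is only a rough sketch: list_stale_local_farms is a hypothetical helper, the 7-day default is arbitrary, and GNU find is assumed. An old modification time does not prove a folder is orphaned, so compare the result against the running EC pods before removing anything.

```shell
# List direct subfolders of local storage last modified more than N days ago.
list_stale_local_farms() {
  local local_root="${1:-/opt/sisense/local_storage}"
  local min_age_days="${2:-7}"
  # -mtime +N matches entries whose age in whole days is greater than N.
  find "$local_root" -mindepth 1 -maxdepth 1 -type d \
    -mtime "+${min_age_days}" -print
}

# Example: list_stale_local_farms /opt/sisense/local_storage 7
```

The function only prints candidates and deletes nothing, which keeps it safe to run on a node that still hosts active cubes.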

 

3. Build on S3

This is the same as building on local, with a few additions.

Building happens on the local path - /opt/sisense/local_storage/aSampleIAAaECommerce.

At the end of the build, all the folders are packed into one big tar.gz file, which is copied to S3:  s3://sisensebucket/sisense/Default/aSampleIAAaECommerce_2021.02.23.07.26.31.585.tar.gz

The path is determined from s3://<management.S3bucket>/<management.S3path>/<DataGroupName>/<farms>.tar.gz
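
The path template can be written out as a small helper, which is handy when checking by hand whether the object for a given farm exists. This is a minimal sketch; s3_farm_uri is a hypothetical name and simply fills in the template above.

```shell
# Build the S3 object URI for a farm from the four parts of the template.
s3_farm_uri() {
  local bucket="$1" path="$2" data_group="$3" farm="$4"
  printf 's3://%s/%s/%s/%s.tar.gz\n' "$bucket" "$path" "$data_group" "$farm"
}

s3_farm_uri sisensebucket sisense Default aSampleIAAaECommerce_2021.02.23.07.26.31.585
# prints: s3://sisensebucket/sisense/Default/aSampleIAAaECommerce_2021.02.23.07.26.31.585.tar.gz
```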

To set the management.S3bucket and management.S3path use the CLI:

si config set -key management.S3bucket -new-value sisensebucket
si config set -key management.S3path -new-value sisense

After the copy finishes, the local directory will be deleted, which is similar to building on local. 

Loading of the cube:

When the cube loads, the farm points to /opt/sisense/local_storage/aSampleIAAaECommerce_2021.02.23.07.26.31.585-bbe0/

The farm is downloaded from S3 (s3://sisensebucket/sisense/Default/aSampleIAAaECommerce_2021.02.23.07.26.31.585.tar.gz) and extracted on the local file system.

Since the files are not deleted from S3 by Sisense and are left to be deleted by the S3 lifecycle, you may have a cube pointing to a file that the lifecycle deletes before you rebuild the cube.

The following rule deletes the files after 14 days. The empty "Prefix" applies the rule to every object in the bucket (lifecycle prefixes are matched literally, not as wildcards).

cat <<EOF >expire-s3-rule.json
{
  "Rules": [
    {
      "Expiration": {
        "Days": 14
      },
      "ID": "deleteold",
      "Filter": {
        "Prefix": ""
      },
      "Status": "Enabled",
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 14
      },
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 1
      }
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration --bucket sisense-shared-s3-storage --lifecycle-configuration file://expire-s3-rule.json

If the lifecycle deletes the file before the cube is rebuilt, the download will fail, and you will see that reflected in the log:

download failed:   s3://sisensebucket/sisense/Default/aSampleIAAaECommerce_2021.02.23.07.26.31.585.tar.gz to - An error occurred (404) when calling the HeadObject operation: Not Found

The cube will not be loaded.

To avoid this, make sure the lifecycle expiration period is longer than the interval between builds of the cube.
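
A quick way to reason about this is to compare the age of a farm with the expiration period. The sketch below is an approximation under stated assumptions: the timestamp in the farm name is treated as UTC and as the upload time (S3 actually counts from object creation and may expire objects somewhat later), GNU date is assumed, and farm_age_days and may_have_expired are hypothetical helper names.

```shell
# Age of a farm in whole days, derived from the timestamp in its name
# (yyyy.MM.dd.HH.mm.ss.SSS). The optional second argument is "now" in epoch seconds.
farm_age_days() {
  local stamp="$1" now_epoch="${2:-}"
  local day clock built_epoch
  [ -n "$now_epoch" ] || now_epoch="$(date -u +%s)"
  day="$(printf '%s' "$stamp" | cut -d. -f1-3 | tr . -)"     # e.g. 2021-02-23
  clock="$(printf '%s' "$stamp" | cut -d. -f4-6 | tr . :)"   # e.g. 07:26:31
  built_epoch="$(date -u -d "$day $clock" +%s)" || return 1
  echo $(( (now_epoch - built_epoch) / 86400 ))
}

# Succeeds when the farm is at least as old as the lifecycle expiration period.
may_have_expired() {
  local stamp="$1" expiry_days="$2" now_epoch="${3:-}"
  local age
  age="$(farm_age_days "$stamp" "$now_epoch")" || return 2
  [ "$age" -ge "$expiry_days" ]
}

# Example: may_have_expired 2021.02.23.07.26.31.585 14 && echo "archive may be gone"
```

If a cube is rebuilt every 30 days and the rule expires objects after 14, this check starts succeeding two weeks after each build, which is exactly the window in which a reload would fail.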

When the pod is terminated or killed, the temporary folder /opt/sisense/local_storage/aSampleIAAaECommerce_2021.02.23.07.26.31.585-bbe0/ is deleted.

If you kill the pod with "kubectl delete pod ec-sample-ECommerce-qry-749dd924-45a0-4-5866b6c47d-nlsq7 --force --grace-period=0" the delete will not occur, and the farm will remain on the server.

If the server reboots and /opt/sisense/local_storage is not ephemeral storage, the files will remain.

In AWS, it is recommended to use the local NVMe ephemeral storage that comes with r5d or m5d machines.

 

Conclusion

This information can be used to understand the path that raw data takes during import and the purpose of having multiple folders, and to help troubleshoot the build process itself.

As always, if you need additional help, please contact Sisense Technical Support to get more in-depth assistance!

Last update:
‎02-15-2024 09:19 AM