cancel
Showing results for 
Search instead for 
Did you mean: 
intapiuser
Community Team Member
Community Team Member
ARIMA Model Forecasting
ARIMA Model Forecasting Mean and Upper 95% Limit

OVERVIEW

In this article, we tackle how to produce a robust forecast by using the Sisense and R integration.

UNDERSTANDING THE NUANCE OF R

For forecasting functionality, we must consider this law of the R integration:
The length of the vector of data we pass to R, determined by the X-axis, equals the vector of data we receive from R.
Length of vector out (n-out) = vector in (n-in)
This raises a question, however:
Doesn’t a forecast return additional days to what we have in the X axis, and therefor n-in > n-out?
This is precisely the issue we tackle in this article.

HIGH-LEVEL PROCEDURE

  1. Data Modeling
    1. Creating records for future dates
  2. R Script Modification
    1. Modifying the R script to slice the n-in vector to future and past components
    2. Running the forecast on past values and returning forecasted values
    3. Concatenating past and forecasted values for returning to Sisense

FactData
We will use custom SQL expressions to extend our data into the future with dummy records. We will need to do this for every dimension we expect to analyze our data by. Above is the table we will be using to demonstrate our modeling strategy. We’ll call it “factData”. Not all dimensions may have data available for the last day, in this case, the 5th. Company c data is only available until the 4th, so we will need to populate our future values dynamically across dimensions.

DATA MODELING PROCEDURE

  1. Create a csv or excel document with a single column called “nums” and values 1 – n. n represents how many days out you will forecast. We’ll call this document “extraDays”
  2. Create a custom SQL table and create the following expression
SELECT *, ‘base’ as type 
FROM “factData”
UNION ALL
SELECT 
* 
FROM (
SELECT ADDDAYS(a.Date, b.nums) AS newDate, -1 as Units, Company, ‘forecast’ as type
FROM (       
SELECT
Company, MAX(Date) AS maxDate
FROM FactData
GROUP BY Company
) a
CROSS JOIN extraDays b
) z

R SCRIPT MODIFICATION

With our data modeling done we will have a full vector of data incoming to R from Sisense. This data will include valuable historical data and temporary dummy future values, in the case above, -1. We will handle slicing the data to represent past values and future values with R vector slicing.
One convenient way to control how to select historical values is by creating a function that will accept the vector of records we run our analysis on, and another value that specifies how many days out we will forecast (matching the n values in the excel document)
module <- function(fullArgs, forecastDays=30, frq=30){#function body}

We will use the forecastDays variable for both setting our forecast and slicing our vector. This can be done using the following syntax:
# Start of function
truncArgs <- fullArgs[1:(length(fullArgs)-forecastDays)]

We will run our analysis on a dataframe we build off truncArgs:
df3<-data.frame(fcastValues=truncArgs)
In this example we are running an ARIMA model and returning 30 days:
mySeries = ts(df3[, c(‘fcastValues’)])
library('forecast')
tsSeries = ts(na.omit(mySeries), frequency=frq)
decomp = stl(tsSeries, s.window="periodic")
deseasonal_cnt <- seasadj(decomp)
fit_w_seasonality = auto.arima(deseasonal_cnt, seasonal=TRUE)
seas_fcast <- forecast(fit_w_seasonality, h=forecastDays)
futureValues <- seas_fcast[["mean"]]
We will combine our original truncArgs with our future values into a single vector and return this vector back to Sisense. Doing this ensures we maintain the length of vector out (n-out) = vector in (n-in) rule.
resultSet <- c(truncArgs, futureValues)
return(resultSet)
# End of function
Save the R script on the server for it to be called during the widget query.

WRITING THE SISENSE FUNCTION

In Sisense we simply use the RDouble function and pass in the R function arguments as expected. In the R function defined above we included default values for arguments, so we do not need to include them as they default to 30.
Replace [Average Units] with a measure you are forecasting. It can be of any aggregation.
RDOUBLE(True,
"save(args, file='C:/Users/user/Documents/Rlog/log.txt');
source('C:/Users/user/Documents/Sisense/Rscript.R'); 
module (args[[1]]);"
,  [Average Units])
We created a "type" column in our data, so we can now use it to differentiate the forecast from past values by using the “break by” in the widget editor. This is how we achieved the green to orange shift in the first example picture.
Rate this article:
Version history
Last update:
‎03-02-2023 08:35 AM
Updated by:
Contributors