Trigger Pulse Alerts with Forecasted Values in R
Use Case

Imagine you're Jeff Bezos and you own a few servers. You sell compute on these servers to customers, and you want to make sure you always have enough compute power to meet demand. This means that when overall RAM consumption approaches the maximum RAM of your servers, you must buy more servers so your customers never have an outage. But it takes at least a week to set everything up! In this scenario, we need to be alerted before a metric hits its maximum. This is done with a forecasted value.

Solution

We can use the R integration to predict n days ahead and trigger a Pulse alert to notify us when predicted values breach a threshold value. In this example we will use the Prophet library to forecast our data, but any library can be used, even simple regression. As an example, let's look at a time series of the log daily page views for the Wikipedia page for Peyton Manning. Peyton Manning provides a nice example because it illustrates some of Prophet's features, such as multiple seasonality, changing growth rates, and the ability to model special days (such as Manning's playoff and Super Bowl appearances). The CSV is available here: https://facebook.github.io/prophet/docs/quick_start.html#r-api

Steps to Implement

1. Elasticube Setup
In the Elasticube, break the date you will forecast over into independent Year, Month, and Day integer columns, using a custom column for each:
year: getYear(date)
month: getMonth(date)
day: getDay(date)

2. RServe & Environment Setup
Run Rserve and install the following R packages: tidyr, dplyr, prophet.

3. Sisense Indicator and Function Setup
Paste the following code into an indicator's function editor.
RDOUBLE(TRUE, "
library('tidyr'); library('dplyr'); library('prophet');
df <- data.frame(y=args[[1]], year=args[[2]], month=args[[3]], day=args[[4]]);
df <- df %>% unite('ds', year, month, day, sep='-', remove = FALSE);
df <- df[c('ds', 'y')];
df$ds <- as.Date(df$ds, format = '%Y-%m-%d');
m <- prophet(df);
future <- make_future_dataframe(m, periods = args[[5]], include_history = FALSE);
forecast <- predict(m, future);
max(forecast$yhat);
", [y]*1, [year]*1, [month]*1, [day]*1, 60)

In the code above, the arguments to the function are the columns y, year, month, and day (each as "all items"), and the number 60:
y: the value to forecast
year: the integer year portion of the date
month: the integer month portion of the date
day: the integer day portion of the date
60: the parameter that determines how many days into the future to forecast

Save the widget. The number displayed represents the maximum value among the forecasted values.

4. Creating the Pulse Alert
On the widget you just made, create a Pulse alert and set a threshold, using the "greater than" option. When any of the values in the forecast goes above the threshold, you will be notified! This works with data security: each tenant's data will be forecasted and evaluated against the threshold when the Elasticube is built, and only users whose forecasted values are greater than the threshold will be notified. Cheers!

So How Exactly Does R Work With Sisense?
To learn how Sisense pipes data in and out of R and onto the display for visualization, we will review basic demos to make you say "aha!". Then we'll build a basic understanding of the underlying technologies that make this possible. Once you've got the idea of which technologies play which roles, we'll take a look at Advanced Analytics and R utility examples and how they can serve your Advanced Analytics dashboarding needs.

What can I do with R?
Almost anything! R is just another programming language, albeit a specialized one. Just as you enter code into a console for execution, you can write R in Sisense's RINT and RDOUBLE functions for execution.

Some Simple Examples
Let's look at some simple examples that illustrate the versatility of the Sisense R functions and review a couple of simple rules for writing R functions in Sisense.

Demos:
1. Create a CSV with the pivot table's aggregated values
2. Multiply [Numbers] by 10
3. Make an API call on widget refresh and get the most up-to-date conversion rate
4. Using R file modules

LAYING THE GROUNDWORK
It probably won't make much sense until we get to the examples, but bear these two rules in mind and you'll be writing your own functions in no time.

RINT
Returns an array of INTEGERS. Must return an array (called a vector in R) of the same LENGTH as the input array.

RDOUBLE
Returns an array of DOUBLES (decimals). Is NOT required to return a vector of the same length as the input array.

We'll use this table as a data source for our demos.

DEMO 1: CREATE A CSV WITH THE PIVOT TABLE'S AGGREGATED VALUES
This demo shows what data gets passed to R when we run a Sisense function. We'll save the data into a CSV so you can inspect it without having to open R.
R functions and syntax used:
write.csv(MyData, file = "MyData.csv", row.names = TRUE/FALSE)

1. Create a pivot table.
2. Add [Letters] to rows.
3. Add a custom function to values column1 and enter the RINT() function with the following arguments:

RINT(TRUE, "write.csv(args[[1]], file = 'C:\\numbers.csv', row.names = FALSE); args[[1]]", MAX([Numbers]), MAX([moreNumbers]))

TRUE means we want to run the R formula on every widget refresh, and the quoted string holds the R code to be executed. The args object that Sisense passes to R consists of two elements, our first and second aggregations: in this case MAX([Numbers]) and MAX([moreNumbers]). In the example above we are only writing out [Numbers], because the selector next to args is set to 1 and [Numbers] is the first item in the list.

Once you've saved this widget, you should see a CSV file on your C drive called numbers.csv with the same content as column1 in the pivot table! Can you guess what would happen if we changed args[[1]] to args[[2]]? That's right! It would overwrite the CSV with MAX([moreNumbers]) data and display [moreNumbers] data in the pivot.

DEMO 2: MULTIPLY [NUMBERS] BY 10
In this demo we are not going to use any new functions. Instead, we will demonstrate how to manipulate the set of data we pass to R. We will use the RDOUBLE function, since we'll be doing arithmetic and RDOUBLE is usually the safer bet in these conditions.
1. Open edit mode for the widget we created above.
2. Edit the R function value by first replacing RINT with RDOUBLE.
3. Replace the code inside the quotes with: args[[1]] * 10
4. Now let's compare the values to the original values of [Numbers] by placing MAX([Numbers]) in another value column.

Your code should be:
RDOUBLE(TRUE, "args[[1]] * 10", MAX([Numbers]), MAX([moreNumbers]))

You should see that our value has been multiplied by 10! Can you guess what would happen if you swapped args[[1]] for args[[2]]?
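If you want to see what Demo 2's expression does before wiring it into a widget, you can simulate the args list that Sisense passes to R in a plain R session. The values below are illustrative stand-ins, not real widget data:

```r
# Plain-R simulation of Demo 2 (values are illustrative stand-ins).
# Sisense passes each aggregation to R as one element of the `args` list.
args <- list(
  c(10, 25, 40),  # stand-in for MAX([Numbers]) per pivot row
  c(1, 2, 3)      # stand-in for MAX([moreNumbers]) per pivot row
)

# The exact expression from the RDOUBLE call:
result <- args[[1]] * 10
print(result)  # 100 250 400

# Swapping in args[[2]] would operate on [moreNumbers] instead:
print(args[[2]] * 10)  # 10 20 30
```

This also makes the length rule concrete: the expression returns one value per input element, so the output vector has the same length as the input vector.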
DEMO 3: MAKE AN API CALL AND GET THE MOST UP-TO-DATE CONVERSION RATE
Now we're really going to have some fun and push the limits of what we can do with this integration. We're going to imagine that every one of those numbers above is an American dollar and we want to convert it to euros, but we can't wait for the next build to happen. We need this conversion rate NOW!

First, you will need to run install.packages('httr') in the R GUI or command line to download the httr package, which allows us to make GET requests. In the previous widget, we will replace the second argument's content with the following code:

library('httr');
apiElement <- GET('https://api.exchangeratesapi.io/latest?base=USD');
USDtoEUR <- content(apiElement, 'parsed')$rates$EUR;
args[[1]] * USDtoEUR;

Let's walk through this code line by line to understand what's happening. First, we say that we want to use the httr library. Then we make a GET request and put the response into a variable called apiElement. We use the content function to parse the response, extract the desired information from it, and put the conversion rate into a variable. Finally, we multiply MAX([Numbers]) by the conversion rate and return the outcome.

DEMO 4: USING R FILE MODULES
A neat option for longer code is packaging it in its own R file and calling it from the dashboard instead of writing it out in the function editor. Take a look at how to do this in the example below:

RDOUBLE(TRUE, "source('C:/Users/Administrator/Documents/R/ArtemExample/AddThirtyForecast.R'); module(args, 30);", [Average Numbers])

The Complete Advanced Analytics Pipeline
Sisense leverages a technology called Rserve to pass instructions from the Sisense application to a dedicated data space and working directory for R execution, without having to spin up an instance of R every time. Once processing is done, Rserve sends Sisense the results to be visualized.
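The Demo 4 module pattern above can be tried end-to-end in a plain R session. This sketch uses a temporary file instead of a fixed Windows path, and a made-up module body (adding 30 to each value), so treat the file name and logic as illustrative rather than as the actual AddThirtyForecast.R:

```r
# Sketch of the Demo 4 R-file-module pattern, with a temp file standing in
# for a real script path on the Rserve machine (module body is illustrative).
module_file <- tempfile(fileext = ".R")
writeLines("module <- function(args, n = 30) { args[[1]] + n }", module_file)

source(module_file)       # same mechanism the dashboard formula relies on
args <- list(c(1, 2, 3))  # stand-in for the [Average Numbers] values
result <- module(args, 30)
print(result)  # 31 32 33
```

The dashboard formula only needs the source() call and the function invocation; everything else lives in the file, which keeps the function editor short and the R logic versionable.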
According to the original Rserve technology paper, Rserve is a TCP/IP server that allows other programs to use the facilities of R from various languages without the need to initialize R or link to the R library. Every connection has a separate workspace and working directory. Client-side implementations are available for popular languages such as C/C++ and Java. Rserve supports remote connections, authentication, and file transfer.

Scaling the Pipeline
The Advanced Analytics solution can be scaled by decentralizing Sisense and Rserve nodes. In the multi-node solution illustrated below, you can see how load balancing can direct users to the Rserve node with the most availability for processing. Rserve allows for the complete distribution of applications.

Advanced Analytics R Applications
Now that you know how Rserve works, it's time to look at applications that can provide value to your business. Take a look at the following pieces of documentation to get your creative juices flowing:
- K-Means Clustering
- R Logistic Regression
- Loading an Existing Model
- Building and Saving an R Model Using Data From Sisense

Exploring ElastiCube data with ODBC connection to R
Problem
You want to determine whether there are any problems with your ElastiCube dataset, and whether the question you are asking of Sisense with the R integration can be answered by the data you have.

Solution
Installing the Sisense ODBC driver
Download and install the Sisense ODBC driver from this link: Sisense ODBC. Note: Sisense ODBC is a licensed feature.

Adding a new Sisense ElastiCube data source to ODBC
1. Open ODBC -> System DSN.
2. Add a new Sisense ElastiCube data source.
3. From the ElastiCube drop-down list, select the ElastiCube.
4. Click Test to verify the connection to the ElastiCube.

Exploring ElastiCube data in the R environment
1. Open an R console window.
2. Install and load the "RODBC" package into R:
> install.packages("RODBC")
> library(RODBC)
3. Open a connection to the ODBC ElastiCube database:
> myconn <- odbcConnect("NW-Dan Kushner", uid="", pwd="")
4. View the ElastiCube data: submit a query and return a result for exploring in R.

Discussion
Exploratory data analysis is the process of exploring your data. It typically includes examining the structure and components of your dataset, the distributions of individual variables, and the relationships between two or more variables.

References
http://www.statmethods.net/input/dbinterface.html
https://www.sisense.com/data-connectors/odbc/

Connecting Sisense to R Server
Problem
You want to establish a connection between the Sisense Server Console and the R server.

Solution
Installing Rserve
1. Open an R console window.
2. Run the command:
> install.packages("Rserve")

Running Rserve
1. Open an R console window.
2. Load the Rserve package into R:
> library("Rserve")
3. Start the server:
> Rserve()
4. Open the Sisense Server Console from the Start menu.
5. Click the server preferences icon to open the server settings.
6. In the Rserver field, enter the IP address of your R server, and select the checkbox to enable the connection.
7. Test the connection by clicking the Test button. The default Rserve port is 6311; make sure it is accessible (test with the telnet command if necessary).

Windows users: running Rserve automatically every time you start Windows
Rserve is a stand-alone program, so it can be started directly as well.
1. Search for the "Rserve.exe" file and copy it.
2. Paste "Rserve.exe" into "C:\Program Files\R\R-3.2.4revised\bin\x64".
3. Create a shortcut to the "C:\Program Files\R\R-3.2.4revised\bin\x64\Rserve.exe" file.
4. Paste the Rserve.exe shortcut into the Startup folder (path: "%appdata%\Microsoft\Windows\Start Menu\Programs\Startup").

Discussion
Rserve acts as a socket server (TCP/IP or local sockets) that allows binary requests to be sent to R from various languages without the need to initialize R.

References
https://rforge.net/Rserve/doc.html#inst
https://cran.r-project.org/web/packages/Rserve/index.html
https://docs.sisense.com/main/SisenseLinux/using-r-in-formulas.htm
http://qu-c.com/blogs/install-r-server-on-aws/setup-security-for-r-server

Calling R Functions Defined In Other Text File From Sisense
Problem
In some cases, you want to capture a series of R commands in a text file and execute them from Sisense (in the edit formula window).

Solution
Write a script that will handle and manipulate the arguments you send from Sisense:
1. Open RGui or RStudio.
2. Write the script that will handle the arguments you send from Sisense; make sure you use function syntax (a subprogram that returns values).
(RGui script editor or RStudio script editor)
3. Save the script in the R working directory, or in another path on the R server machine.

Load the R script commands into the Sisense edit formula window:
The setwd() function sets the working directory to the folder path where we saved our R script. The source() function instructs R to read the text file and execute its contents. Sisense arguments are passed to the R script by calling the name of the function defined in the script (example line 3). Save the function/model for future usage.

Discussion
R is a general-purpose programming language, so we can use it to automate analysis and create new functions that extend the existing language features. You can capture a series of R commands in a text file and execute them from Sisense.

References
https://stat.ethz.ch/R-manual/R-devel/library/base/html/source.html
http://www.statmethods.net/management/userfunctions.html
http://www.statmethods.net/interface/workspace.html
http://www.sisense.com/documentation/using-r-in-formulas
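As a concrete illustration of the setwd()/source() pattern described above, here is a tiny script file written to disk, loaded, and called with a Sisense-style args list. The file name, function name, and logic are invented for the example; a real deployment would use a fixed path on the R server machine:

```r
# Illustrative end-to-end run of the setwd()/source() pattern; the file
# name, function name, and logic are invented for this demo.
script_dir <- tempdir()
writeLines(
  "normalize <- function(args) { x <- args[[1]]; (x - min(x)) / (max(x) - min(x)) }",
  file.path(script_dir, "myFunctions.R")
)

old_wd <- setwd(script_dir)   # point the working directory at the script folder
source("myFunctions.R")       # read the file and execute its contents
setwd(old_wd)                 # restore the previous working directory

result <- normalize(list(c(2, 4, 6)))  # args as Sisense would pass them
print(result)  # 0.0 0.5 1.0
```

Because source() simply executes the file's contents, any function it defines becomes callable by name from the Sisense formula, exactly as described in the solution steps.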
Anomaly Detection with Sisense using R

Anomaly Detection
AnomalyDetection is an open-source R package to detect anomalies that is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. The AnomalyDetection package can be used in a wide variety of contexts: for example, detecting anomalies in system metrics after a new software release, in user engagement after an A/B test, or for problems in econometrics, financial engineering, and the political and social sciences. The package will behave differently in different scenarios. Additional information about how the package works, as well as the package itself, can be found here.

How to get started
Install the R package using the following commands in the R console:
install.packages("devtools")
devtools::install_github("twitter/AnomalyDetection")
library(AnomalyDetection)

The function AnomalyDetectionTs is called to detect one or more statistically significant anomalies in an input time series. The function AnomalyDetectionVec is called to detect one or more statistically significant anomalies in a vector of observations. Both functions have several configurations and parameters, which can be found along with additional documentation by running the following commands respectively:
help(AnomalyDetectionTs)
help(AnomalyDetectionVec)
For more details on the AnomalyDetection package, visit https://github.com/twitter/AnomalyDetection

Sisense Example
In our example, we will plot daily data and, using the AnomalyDetectionTs function in R, find the anomaly. AnomalyDetectionTs will return 0 for non-anomalies and the value itself when it is recognized as an anomaly. We will visualize the data plotted at a daily level in two ways:
- A line chart with data markers, where the anomaly will be colored red
- A scatter chart in which the anomaly will be conditionally formatted according to our R formula

Line Chart
In our first example, we will plot days, Total Count, and the R formula on a line chart.
Setting markers on this line chart and configuring the colors accordingly, we will be able to see all the daily data, with the detected anomaly highlighted in a bold color.

Scatter Chart
In our second example, using a scatter chart, we will plot the dates on the x-axis and the Total Count on the y-axis, and break the color of the data markers by conditional formatting to point out the anomaly.

Note: By default, the x-axis date labels will render horizontally. To plot them at an angle, add the following widget script to the scatter chart:

widget.on('render', function(se, ev){
    se.queryResult.xAxis.labels.rotation = -70;
})

RDOUBLE function with R code
We'll break the formula into several components:
RDOUBLE([recycle], "Rcode", [Sisense arguments])

Recycle (true/false): an optional parameter that controls whether the results from R should be recycled (cached), so that subsequent queries will not have to be recalculated unless they or the data have changed.

Rcode: the R code found below introduces the Sisense arguments into an R data frame and runs the AnomalyDetectionTs function on them. Note that we'll need to send the dates to R as integers, so we'll need to first create a custom column in the ElastiCube table which calculates the dates as such. Name this custom column 'date_int'. In this example, we use the following formula:

getyear([_time])*10000+getmonth([_time])*100+getday([_time])

Note: as stated above, this example is for daily-granularity data; that is, the data is already aggregated at a daily level. For higher-resolution time requirements, one can use a more detailed formula. Bear in mind that this will require further adjustment of the R code. The R code will transform the integers representing dates back into dates, run the AnomalyDetectionTs function on them together with the Total Count values per date, and return the anomalies detected.

Sisense Arguments: as stated above, we will need to transfer the dates in a numeric representation.
This means that we'll need to set this in the function as well. For this, we will send Max(date_int) as a value; this plots the corresponding value of date_int for its respective date. The Sisense arguments will be sent in the following order: [Max date_int], [Total count], where 'Max date_int' holds our integers representing the dates, and 'Total Count' is the value we're testing for anomalies.

Note that the R code loads the 'AnomalyDetection' library; this may require adjusting the library's location:
library('AnomalyDetection', lib.loc='~/R/win-library/3.2');

Following, please find the RDOUBLE function reference:

RDOUBLE(false, "
df = data.frame(X_time=args[[1]], DailyFailedLogins=args[[2]]);
df$X_time <- as.Date(as.character(df$X_time), format='%Y%m%d');
df$X_time <- as.POSIXct(df$X_time);
library('AnomalyDetection', lib.loc='~/R/win-library/3.2');
res = AnomalyDetectionTs(df, max_anoms=0.02, direction='pos', plot=FALSE);
df$X_time_key <- as.Date(substr(as.character.Date(df$X_time),1,10));
res_df <- data.frame(X_time_key=as.Date(res$anoms$timestamp), anoms=res$anoms$anoms);
final_df <- merge(x=df, y=res_df, by='X_time_key', all.x = TRUE);
final_df$anoms
", [Max date_int], [Total count])

For additional documentation on using R in Sisense formulas, please see this link. Attached, please find an .ecdata file as well as a .dash file which holds the charts above, along with additional methods for visualizing the above.

Download:
AnomalyDetection.ecdata
AnomalyDetection.dash
Anomaly Detection.csv

Control Chart using R Integration
Problem
You want to create a control chart in Sisense to study how a process changes over time. A control chart always has a central line for the average, an upper line for the upper control limit, and a lower line for the lower control limit. By comparing current data to these lines, you can draw conclusions about whether the process variation is consistent (in control) or unpredictable (out of control).

Solution
Install the "qicharts" package in the R console:
install.packages("qicharts")

R scripts (copy and paste into a Sisense formula):

Center line:
library(qicharts); y<-c(do.call('cbind',args)); sisense_result<-qic(y,chart='i'); sisense_result$cl

Upper control limit:
library(qicharts); y<-c(do.call('cbind',args)); sisense_result<-qic(y,chart='i'); sisense_result$ucl

Lower control limit:
library(qicharts); y<-c(do.call('cbind',args)); sisense_result<-qic(y,chart='i'); sisense_result$lcl

Discussion
Control charts, also known as Shewhart charts or process-behavior charts, are tools used in statistical process control to determine whether a manufacturing or business process is in a state of statistical control.

References
https://www.isixsigma.com/tools-templates/control-charts/a-guide-to-control-charts
https://en.wikipedia.org/wiki/Control_chart
https://cran.r-project.org/web/packages/qicharts/vignettes/controlcharts.html#types-of-control-charts
http://blog.minitab.com/blog/understanding-statistics/control-charts-show-you-variation-that-matters

Pareto Chart using R Integration
Problem
You want to create a Pareto chart in Sisense from a collection of numbers that represent frequencies (or counts) of events or outcomes that fall into different categories.

Solution
R script (copy and paste into a Sisense formula):

counts <- c(do.call('cbind', args));
t <- data.frame(counts);
id <- as.numeric(rownames(t));
t <- cbind(id=id, t);
t <- t[order(t$counts, decreasing=TRUE),];
total <- sum(t$counts);
cumulative_percentage <- function(c){ return(c/total) };
sisense_result <- sapply(t$counts, cumulative_percentage);
t <- cbind(t, sisense_result);
t$sisense_result <- cumsum(t$sisense_result);
t <- t[order(t$id),];
t$sisense_result

Discussion
A Pareto chart is a sorted bar chart that displays the frequency (or count) of occurrences that fall into various categories, from greatest frequency on the left to least frequency on the right, with an overlaid line chart that plots the cumulative percentage of occurrences. The vertical axis on the left of the chart shows frequency (or count), and the vertical axis on the right shows the cumulative percentage.

A Pareto chart is typically used to visualize:
- Primary types or sources of defects
- Most frequent reasons for customer complaints
- The amount of some variable (e.g., money, energy usage, time) that can be attributed to or classified according to a particular category

The Pareto chart is typically used to separate the "vital few" from the "trivial many" using the Pareto principle, also called the 80/20 rule, which asserts that approximately 80% of effects come from 20% of causes in many systems. Pareto analysis can thus be used to find, for example, the most critical types or sources of defects, the most common complaints that customers have, or the essential categories within which to focus problem-solving efforts.

References
https://en.wikipedia.org/wiki/Pareto_chart
http://www.r-bloggers.com/pareto-charts-in-r

Advanced Forecasting With R
ARIMA Model Forecasting
ARIMA Model Forecasting Mean and Upper 95% Limit

OVERVIEW
In this article, we tackle how to produce a robust forecast using the Sisense and R integration.

UNDERSTANDING THE NUANCE OF R
For forecasting functionality, we must consider this law of the R integration: the length of the vector of data we pass to R, determined by the X-axis, equals the length of the vector of data we receive from R.

Length of vector out (n-out) = length of vector in (n-in)

This raises a question, however: doesn't a forecast return additional days beyond what we have on the X-axis, and therefore n-out > n-in? This is precisely the issue we tackle in this article.

HIGH-LEVEL PROCEDURE
Data Modeling
- Creating records for future dates
R Script Modification
- Modifying the R script to slice the n-in vector into future and past components
- Running the forecast on past values and returning forecasted values
- Concatenating past and forecasted values for returning to Sisense

FactData
We will use custom SQL expressions to extend our data into the future with dummy records. We will need to do this for every dimension we expect to analyze our data by. Above is the table we will use to demonstrate our modeling strategy; we'll call it "factData". Not all dimensions may have data available through the last day, in this case the 5th. Company c's data is only available until the 4th, so we will need to populate our future values dynamically across dimensions.

DATA MODELING PROCEDURE
1. Create a CSV or Excel document with a single column called "nums" and values 1 to n, where n is how many days out you will forecast.
We'll call this document "extraDays".
2. Create a custom SQL table with the following expression:

SELECT *, 'base' AS type
FROM factData
UNION ALL
SELECT *
FROM (
  SELECT ADDDAYS(a.maxDate, b.nums) AS newDate,
         -1 AS Units,
         a.Company,
         'forecast' AS type
  FROM (
    SELECT Company, MAX(Date) AS maxDate
    FROM factData
    GROUP BY Company
  ) a
  CROSS JOIN extraDays b
) z

R SCRIPT MODIFICATION
With our data modeling done, a full vector of data comes into R from Sisense. This data includes valuable historical data and temporary dummy future values (in the case above, -1). We will handle slicing the data into past and future values with R vector slicing.

One convenient way to control how historical values are selected is to create a function that accepts the vector of records we run our analysis on, plus a value that specifies how many days out we will forecast (matching the n values in the Excel document):

module <- function(fullArgs, forecastDays=30, frq=30){ # function body }

We will use the forecastDays variable both for setting our forecast horizon and for slicing our vector. This can be done with the following syntax:

# Start of function
truncArgs <- fullArgs[1:(length(fullArgs)-forecastDays)]

We will run our analysis on a data frame built from truncArgs:

df3 <- data.frame(fcastValues=truncArgs)

In this example we run an ARIMA model and return 30 days:

mySeries = ts(df3[, c('fcastValues')])
library('forecast')
tsSeries = ts(na.omit(mySeries), frequency=frq)
decomp = stl(tsSeries, s.window="periodic")
deseasonal_cnt <- seasadj(decomp)
fit_w_seasonality = auto.arima(deseasonal_cnt, seasonal=TRUE)
seas_fcast <- forecast(fit_w_seasonality, h=forecastDays)
futureValues <- seas_fcast[["mean"]]

We will combine our original truncArgs with our future values into a single vector and return this vector back to Sisense. Doing this ensures we maintain the length-of-vector-out (n-out) = length-of-vector-in (n-in) rule.
resultSet <- c(truncArgs, futureValues)
return(resultSet)
# End of function

Save the R script on the server so it can be called during the widget query.

WRITING THE SISENSE FUNCTION
In Sisense we simply use the RDOUBLE function and pass in the R function arguments as expected. The R function defined above includes default values for its arguments, so we do not need to include them; they default to 30. Replace [Average Units] with the measure you are forecasting; it can be of any aggregation.

RDOUBLE(TRUE, "save(args, file='C:/Users/user/Documents/Rlog/log.txt'); source('C:/Users/user/Documents/Sisense/Rscript.R'); module(args[[1]]);", [Average Units])

We created a "type" column in our data, so we can now use it to differentiate the forecast from past values by using the "break by" option in the widget editor. This is how we achieved the green-to-orange shift in the first example picture.

Polynomial Regression
polynomial regression where x = day, y = cost
polynomial regression degrees 2-4

Method
The polynomial regression can be generated by the RDOUBLE formula below. The function in the example below is for a polynomial regression of the third degree. If you require a function of a different degree, simply edit the 3 in the poly function to whichever degree you need.

Formulation
To keep this approach generalized, I did not convert the values to a time series (ts). Instead, I used the RANK function to convert each x-axis element to a vector of 1 to n.

rdouble(FALSE, "
x <- args[[2]];
y <- args[[1]];
model <- lm(y ~ poly(x, 3));
predict(model);
", [Total Cost], rank(max([Months in Date]), "ASC", "1223"))
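The same fit can be reproduced outside Sisense with synthetic data, which is a handy way to sanity-check the degree you choose before editing the widget formula. The x and y values below are invented: x plays the role of the RANK() output, y the role of [Total Cost]:

```r
# Standalone check of the cubic fit with invented data: x stands in for the
# RANK() output (1..n) and y stands in for [Total Cost].
x <- 1:12
y <- 2 + 0.5 * x - 0.3 * x^2 + 0.05 * x^3  # an exact cubic, for illustration
model <- lm(y ~ poly(x, 3))
fitted_values <- predict(model)

# Because y is itself a cubic, a degree-3 fit recovers it up to
# floating-point noise:
max(abs(fitted_values - y))
```

Raising or lowering the degree in poly(x, 3) and re-checking the residuals against real data is a quick way to judge whether a higher-degree fit is actually earning its extra flexibility.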