Generate Series Of Dates In Snowflake
As Snowflake doesn't have a native generate_series function, here is our solution to generating a table of incrementing dates, starting from the current date, in Snowflake. It is currently set to generate 1095 rows (3 years) of dates. select dateadd(day, '-' || seq4(), current_date()) as dte from table (generator(rowcount => 1095)) Was this article helpful?5.8KViews0likes0CommentsElevate Your Data Product’s Quality with Streamlined Version Control leveraging Our Git Integration
Elevate Your Data Product’s Quality with Streamlined Version Control Leveraging the Sisense Git Integration! In today's CI/CD ecosystems, efficient asset migration from development to production environments is crucial for delivering high-quality data products. Sisense being a leading embedded analytics technology offers a powerful Git integration that simplifies and enhances the migration process. In this blog, we will explore leveraging the Sisense Git Version Control to streamline asset migration, ensuring smooth transitions and maintaining data product integrity. To understand the value of Sisense Git Version Control it is important to understand what Git is. Git offers users (often developers and/or engineers) a structured and efficient approach to managing files, collaborating with others, and maintaining a clear history of changes. Git enhances team productivity, reduces errors, and provides a sense of control over projects. Teams who leverage Git ultimately benefit from better organization, teamwork, and effective management of files and projects. When building your data products in a technology like Sisense, there is massive value in integrating with your developer’s CI/CD workflow for continuity, quality, and time to delivery. Users who leverage the Sisense Git Version Control can collaborate on building data products, manage changes to products over time, and migrate assets across Sisense environments through remote Git repositories. The Sisense Git Integration is a feature that is offered out of the box with Sisense Linux Version(s) 2022.10 and up. To begin leveraging the Sisense Git Integration feature you can click on the Git Logo in the top right of your Sisense environment. The Git GUI will open in a separate browser tab and you will be asked to create a new project. After creating a new project your team will be prompted to name the project, name the default branch, and if you desire to connect to a remote Git repository (further instructions are included in Sisense Git Documentation depending on which Git repository your team leverages). After these steps are complete you can choose to invite others to collaborate with you on the project. If you choose collaborators or decide to lone-wolf a project you will be asked next if you’d like to “add assets” to the project. Do not worry lonely wolves, if you would like to invite collaborators down the road you can share the project after the fact. Assets available to modify/track in Sisense Git Version Control are Data Models and Dashboards, or you can simply continue without if you intend to “Pull” Sisense assets from a remote repository. Once a team has created and defined a project, they can start working. Users familiar with Git will find continuity in terminology and functionality with the Sisense Git GUI and popular Git repositories. Dashboards and Models are compressed into JSON files, allowing users to review, commit, or discard changes. Teams can create branches, checkout branches, and revert changes if needed. When a project is ready to progress to the next stage, users can "Push" the assets/branches to the remote repository. The assets can be reviewed in their JSON format in the remote repository. If a CI/CD pipeline includes QA, Staging, or Production Sisense environments, users can leverage the Git GUI in those environments to "Pull" assets for review or publication. So let’s land this plane! The Sisense Git Integration is a tool that provides tremendous value to your developer/engineering team's workflow, while significantly improving your business with better data product quality and delivery. If your team already leverages Git, this tool will be easy to incorporate and drive value. For users unfamiliar with Git, we strongly recommend adopting this approach, as it only involves a minimal learning curve but offers improved version control, streamlined asset migration, and overall enhanced quality. We hope this information3.7KViews3likes0CommentsAdd SQL Charts from Notebooks to a Dashboard
Sisense is happy to announce added functionality that allows users to add SQL charts to a Dashboard directly from Notebooks. This update significantly reduces the steps required to share insights and eliminates task switching between Notebooks, Dashboards, and Models while maintaining data security and data continuity.3.3KViews1like0CommentsFunnel Charts In Python
Funnel charts are a great way to represent any drop-offs in sample size throughout a series of steps. Using a little bit of Python handiwork in Sisense for Cloud Data Teams' R/Python integration, we can easily create this chart type. A common use case for this chart type is to visualize your pipeline via a sales funnel. This is an extremely effective way to spot opportunities to improve your current sales process! Below is the Python snippet used to create the funnel chart above and the input parameters. Inputs df: a data frame from your SQL output with 2 columns: "step" (the parts of the process you want to visualize in a funnel chart) and "val" (the value associated with the Step in the funnel) show_n (optional parameter): boolean value that determines whether the user wants to display the value corresponding to each step. Default is set to True show_pct (optional parameter): parameter that determines whether the user wants to show each step as... A percent of the previous step ('of last step') A percent of the initial sample ('of whole') Show no percent values ('NA') - default value Snippet # SQL output is imported as a pandas dataframe variable called "df" import pandas as pd import matplotlib.pyplot as plt # Function: funnel_chart, creates a funnel chart image from a dataframe of steps # Inputs: (1) dataframe with 2 columns. The first is the "step", and the second is the "val" corresponding to each step. (2) Optional boolean parameter show_n (displays the n size of each step). (3) Optional paramater show_pct which allows the user to show each step of the funnel as a percent of the original n size ('of whole') or a percent or the previous step ('of last step') # Output: matplotlib image representing a funnel chart def funnel_chart(df, show_n=True,show_pct='NA'): #set up data frame df['val']=df['val'].astype(int) my_max=df['val'][0] df = df.sort_values('val',ascending=False).reset_index(drop=True) df['pct_of_whole']=round((df['val']/my_max)*100).astype(int) pct_change=[100] for j in range(1,len(df['val'])): pct_change.append(int(round(df['val'][j]/df['val'][j-1]*100))) df['pct_change']=pct_change df = df.sort_values('val').reset_index(drop=True) df['left']=(my_max-df['val'])/2 df['col']=['indigo','purple','darkviolet','DarkOrchid','MediumOrchid','orchid','plum','thistle'] #initialize plot fig, ax = plt.subplots() for i in range(len(df['step'])): ax.barh(0.5*i+0.5, df['val'][i], height=0.3, left=df['left'][i], align='center', color=df['col'][i],alpha = 1.0, label=df['step'][i]) if(show_n==True): ax.annotate(' ' + df['step'][i] + ': ' + str(df['val'][i]),xy=(my_max,0.5*i+0.45),horizontalalignment='left') else: ax.annotate(' ' + df['step'][i],xy=(my_max,0.5*i+0.45),horizontalalignment='left') if(show_pct=='of whole'): ax.annotate(str(df['pct_of_whole'][i]) + '%' ,xy=(my_max/2,0.5*i+0.45),horizontalalignment='center',color='white') elif(show_pct=='of last step'): ax.annotate(str(df['pct_change'][i]) + '%' ,xy=(my_max/2,0.5*i+0.45),horizontalalignment='center',color='white') #remove border and align chart ax.axis('off') fig.subplots_adjust(right=0.8) return fig # Use Sisense for Cloud Data Teams to visualize a dataframe or an image by passing data to periscope.output() periscope.output(funnel_chart(df,show_n=True,show_pct='of last step')) What other chart types do you like to use for your sales processes?3.2KViews1like1CommentSankey Diagrams With Plot.Ly In Periscope
Sankey diagrams are a very easy-to-digest way to look at the flow of objects between different points. The example we will walk through here is a basic Sankey overview, leveraging data from a fictional gaming company. The emphasis of this post is focused on how to get your pandas dataframe data into something that can be used by plot.ly in a Sankey diagram. The Sankey set up itself is modeled after the first example in the documentation here. If you want to look at different Sankey plots from a visualization standpoint, plot.ly has a great library or resources. I've linked some of those resources at the bottom of this post for further reference Without any further ado, here is the annotated code! Note that you would need the Periscope Python and R integration to use our Plot.ly integration. Our SQL output is a dataframe that contains 3 columns: Source - The source that the user came from Platform - The user's platform of choice Count - The number of users for that source-platform combination The SQL to generate this is noted below for reference select source , platform , count(*) from users group by 1 , 2 And here is the Python 3.6 code, annotated up top where we set up our pandas dataframe df for use in plot.ly. # SQL output is imported as a dataframe variable called 'df' import pandas as pd import plotly.plotly as py # First, we get a list of all of sources, remove duplicates, and make this a list sources = df["source"].drop_duplicates().tolist() # Then, we get a list of all of platforms (our targets), remove duplicates, and make this a list platforms = df["platform"].drop_duplicates().tolist() # Finally, create a list of all our nodes. We will use this for giving an id to each node for plot.ly all_nodes = sources + platforms # Keeping the size of our dataframes, this would be useful for applying the same color for each "node" and "link" of our sankey diagram, if we so choose to do so n = len(all_nodes) n2 = len(df["source"]) # Create a dataframe that has all of the node ids. We will join this to the original data frame to reference later df1 = pd.DataFrame(all_nodes, columns = ['node']) df1 = df1.reset_index() df2 = pd.merge(pd.merge(df, df1, how = 'inner', left_on = "source", right_on ="node"), df1, how = 'inner', left_on = "platform", right_on ="node", suffixes = ('_source','_target')) # Setting up the data in the plotly "data" argument. # The nodes are described in the "node" dictionary (these are the vertical rectangles in the diagram) # The links are described in the "link" dictionary. These have 3 attributes, the "source" (the index of the node they start at), the "target" (the index of the node they end at), and the "value" the thickness of the band. Additional attributes, such as color can also be specified. data = dict( type='sankey', node = dict( pad = 15, thickness = 20, line = dict( color = "#435951", width = 0.5 ), label = all_nodes, color = ["#84baa6"] * n ), link = dict( source = df2["index_source"], target = df2["index_target"], value = df2["count"], color = ['#bdf9e5'] * n2 )) # Setting up the layout settings in the "layout" argument layout = dict( title = "An Example Sankey Diagram", font = dict( size = 12 ) ) fig = dict(data=[data], layout=layout) # Use Periscope to visualize a dataframe, text, or an image by passing data to periscope.table(), periscope.text(), or periscope.image() respectively. periscope.plotly(fig) Plot.ly Sankey Diagrams - 1 Plot.ly Sankey Diagrams - 2 Plot.ly Sankey Diagrams - 33.1KViews0likes0CommentsManipulating Dates, Datetimes, Unix Timestamps, And Other Date Formatting Tricks In Redshift
Dates aren't stored in the database in the format you want to display? You can use the Redshift TO_CHAR() function to create a string-formatted extract in order to display dates however you want! Note that in Redshift, there are DATE (no timestamp), TIMESTAMP (timestamp, no timezone) and TIMESTAMPTZ (timestamp with time zone) types. Another common datetime type you'll find is the UNIX timestamp, which is usually a 10 digit integer, representing the number of seconds passed since 1970-01-01 00:00:00 UTC (midnight). Less commonly you may find them with 13 digits (3 extra digits for millisecond precision) or 16 digits (microsecond precision). I've not seen a 19-digit nanosecond precision UNIX timestamp out in the wild, but if you happen to find one let me know. Reference Documents: Redshift Datetime types - https://docs.aws.amazon.com/redshift/latest/dg/r_Datetime_types.html Redshift formatters - https://docs.aws.amazon.com/redshift/latest/dg/r_FORMAT_strings.html Postgres - https://www.postgresql.org/docs/9.5/static/functions-formatting.html Here are some examples using various formatters! I'm going to use 'getdate()' so you can run the queries below, but you can substitute any datetime column for actual usage. SELECT to_char(getdate(), 'Mon') -> 'Feb' SELECT to_char(getdate(), 'MON') -> 'FEB' SELECT to_char(getdate(), 'Month') -> 'February' SELECT to_char(getdate(), 'Mon DD HH12:MI:SS') -> 'Feb 27 12:39:12' (12-hour clock) SELECT to_char(getdate(), 'Mon DD, HH24:MI:SS.MS') -> 'Feb 27, 00:39:12.331' (24-hour clock, milliseconds) How about something like grabbing the week number out of 52? select to_char(getdate(), 'WW') -> 09 Try 'DDD' for day number, 'D' for day of week (Sunday =1), or maybe you just really want the month number in Roman numerals; use 'RM' (or 'rm' lower case) for that. What if you have a unix timestamp i.e. 1519692428 ? Redshift doesn't have a native 'timestamp_to_datetime' type function. Luckily, you can use our Date Aggregators, here we'll use the :ts aggregator which turns a unix timestamp into a datetime: SELECT to_char([1519716997:ts] , 'W') -> 4 (week of month) What if you want to display the dates in the format 'Q1 - 2018'? select to_char([1519692428:ts], 'YYYY') || ' - Q' || to_char([1519692428:ts], 'Q') But what if you're running this directly on your Redshift cluster (outside of periscope) and don't have access to our handy SQL formatters? SELECT to_char(TIMESTAMP 'epoch' + 1519692428 * interval '1 second', 'DD-Mon-YY HH24:MM:SS') -> 27-Feb-18 00:02:08 More generally: SELECT to_char(TIMESTAMP 'epoch' + YOUR_UNIX_TIMESTAMP_COLUMN * interval '1 second', '<formatter string>') And the reverse, taking a date/datetime, and converting to unix timestamp (remember to account for timezone conversions if you have a TIMESTAMPTZ - unix timestamps are always defined in UTC) SELECT extract(epoch from getdate()) -> 1520205318 If you have a 13-digit timestamp (instead of the standard 10-digit), your timestamp is in milliseconds. Divide by 1000 (or if you don't care about the precision, just take the leftmost 10 digits) and use the above function. Same procedure for 16-digit microsecond timestamps. Between CAST/CONVERT, EXTRACT, TO_CHAR, and the date formatters, you can take any kind of date field and display it however you'd like! The examples above use the current-datetime-returning GETDATE() function; in Postgres, you'll use NOW(). You can pass any date or datetime column in for these examples, although you'll ned to make sure your underlying column has the necessary precision (i.e. a DATE type doesn't have time precision). Maybe your data is a bit messier and you have the date field stored as a string with mixed dates and datetimes. (if anyone has unix timestamps and nulls mixed in with their date/datetimes, buy your DBA a drink and ask them to fix it. But really, if someone is actually facing this issue, let me know in the comments and I'll whip up a solution... then ask your DBA to fix it). Or maybe you just need more manual control over over the output, like perhaps you wish to capitalize month names, but only if they're 31 days long, and also include the number of days in the month. The tried-and-true CASE WHEN never fails: CASE WHEN extract(MONTH from date_column::date) = 1 then 'JANUARY (31 Days)' WHEN date_part(month, date_column::date) = 2 then 'February (28 Days)' WHEN extract(month from date_column::date) = 3 then 'MARCH (31 Days)' WHEN date_part(month, date_column::date) = 4 then 'April (30 Days)' ... END as month_name and so on. You can use what you prefer between EXTRACT and DATE_PART here. If you're working with time precision, make sure to cast using ::timestamp instead of ::date. Finally, you probably don't want to type that in every time, so use a SQL Snippet to save and reuse!3KViews0likes0CommentsChart Type - Radial Bar Chart In Matplotlib (Python)
Here's a script that takes a data frame with two values, the current and benchmark, and returns radial bar charts to plot progress toward a goal. You can also choose a color using the color_theme parameter that takes values 'Grey', 'Purple, 'Blue', 'Green', 'Orange', or 'Red'. Reason to use it: One of the most common questions an analyst is asked is how close am I to a goal? We had a great thread about doing this in a variety of ways here. The most common tool that people default to on these is to use a gauge chart, which albeit feeling familiar with driving a car, is not necessarily appropriate for visually encoding in an elegant way. With Python in Sisense for Cloud Data Teams, we can start to create custom chart types that are awesome and fit use-cases perfectly. The biggest issue with gauge charts is when you go beyond 100%, do you loop back to the beginning of the gauge or just adjust where 100% is? If we take a step back and look at how the visual could work, we can assemble a few radial components into a full 360° = 100%. That way, when we want a 200%, we can just have multiple 360° rings. ####################################################################### ### LIBRARIES ### ####################################################################### import pandas as pd import matplotlib.pyplot as plt import numpy as np ####################################################################### ### HELPER FUNCTIONS ### ####################################################################### #USE: Create an array structure for rings. #INPUT: a df of row length 1 with the first column as the current metric value and the second colum is the target metric value #OUTPUT: an aray of arrays representing each ring def calculate_rings(df): if df.iloc[0,0] < df.iloc[0,1]: rings=[[df.iloc[0,0],df.iloc[0,1]-df.iloc[0,0]],[0,0]] elif df.iloc[0,0] / df.iloc[0,1] < 2: rings=[[df.iloc[0,0],0],[df.iloc[0,0] % df.iloc[0,1], df.iloc[0,1]-df.iloc[0,0] % df.iloc[0,1]]] else: rings = [[0,0],[0,0]] return rings #USE: Determine if the label for the rotating number label should be left/center/right #INPUT: a df of row length 1 with the first column as the current metric value and the second colum is the target metric value #OUTPUT: the proper text alignment def horizontal_aligner(df): metric = 1.0 * df.iloc[0,0] % df.iloc[0,1] / df.iloc[0,1] if metric in (0, 0.5): align = 'center' elif metric < 0.5: align = 'left' else: align = 'right' return align def vertical_aligner(df): metric = 1.0 * df.iloc[0,0] % df.iloc[0,1] / df.iloc[0,1] if metric < 0.25: align = 'bottom' elif metric < 0.75: align = 'top' elif metric > 0.75: align = 'bottom' else: align = 'center' return align #USE: Create a center label in the middle of the radial chart. #INPUT: a df of row length 1 with the first column as the current metric value and the second column is the target metric value #OUTPUT: the proper text label def add_center_label(df): percent = str(round(1.0*df.iloc[0, 0]/df.iloc[0, 1]*100,1)) + '%' return plt.text(0, 0.2, percent, horizontalalignment='center', verticalalignment='center', fontsize = 40, family = 'sans-serif') #USE: Formats a number with the apropiate currency tags. #INPUT: a currency number #OUTPUT: the properly formmated currency string def get_currency_label(num): currency = '' if num < 10**3: currency = '$' + str(num) elif num < 10**6: currency = '$' + str(round(1.0*num/10**3,1)) + 'K' elif df.iloc[0,0] < 10**9: currency = '$' + str(round(num/10**6,1)) + 'M' else: currency = '$' + str(round(num/10**9,1)) + 'B' return currency #USE: Create a dynamic outer label that servers a pointer on the ring. #INPUT: a df of row length 1 with the first column as the current metric value and the second column is the target metric value #OUTPUT: the proper text label at the apropiate position def add_current_label(df): currency = get_currency_label(df.iloc[0,0]) print('vertical: ' + vertical_aligner(df)) print('horizontal: ' + horizontal_aligner(df)) return plt.text(1.5 * np.cos(0.5 *np.pi - 2 * np.pi * (float(df.iloc[0,0]) % df.iloc[0,1] /df.iloc[0,1])), 1.5 * np.sin(0.5 *np.pi - 2 * np.pi * (float(df.iloc[0,0]) % df.iloc[0,1] / df.iloc[0,1])), currency, horizontalalignment=horizontal_aligner(df), verticalalignment=vertical_aligner(df), fontsize = 20, family = 'sans-serif') def add_sub_center_label(df): amount = 'Goal: ' + get_currency_label(df.iloc[0,1]) return plt.text(0, -.1, amount, horizontalalignment='center', verticalalignment='top', fontsize = 22,family = 'sans-serif') ####################################################################### ### MAIN FUNCTION ### ####################################################################### def create_radial_chart(df, color_theme = 'Purple'): # base styling logic color = plt.get_cmap(color_theme + 's') ring_width = 0.3 outer_radius = 1.5 inner_radius = outer_radius - ring_width # set up plot ring_arrays = calculate_rings(df) fig, ax = plt.subplots() if df.iloc[0, 0] > df.iloc[0, 1]: ring_to_label = 0 outer_edge_color = None inner_edge_color = 'white' else: ring_to_label = 1 outer_edge_color, inner_edge_color = ['white', None] # plot logic outer_ring, _ = ax.pie(ring_arrays[0],radius=1.5, colors=[color(0.9), color(0.15)], startangle = 90, counterclock = False) plt.setp( outer_ring, width=ring_width, edgecolor=outer_edge_color) inner_ring, _ = ax.pie(ring_arrays[1], radius=inner_radius, colors=[color(0.55), color(0.05)], startangle = 90, counterclock = False) plt.setp(inner_ring, width=ring_width, edgecolor=inner_edge_color) # add labels and format plots add_center_label(df) add_current_label(df) add_sub_center_label(df) ax.axis('equal') plt.margins(0,0) plt.autoscale('enable') return plt # call the chart maker function and display the chart periscope.output(create_radial_chart(df, color_theme='Purple')) # Currently supported color themes: Grey, Purple, Blue, Green, Orange, Red An additional layer of information can be introduced using a subtitle below the percentage. Thanks to Kyle Dempsey for developing a simple function to do this. Here's the code with an adjusted position of the percent and subtitle. def add_center_label(df): percent = str(round(1.0*df.iloc[0, 0]/df.iloc[0, 1]*100)) + '%' return plt.text(0, 0.2, percent, horizontalalignment='center', verticalalignment='center', fontsize = 40, family = 'sans-serif') def add_sub_center_label(df): amount = 'Goal: ' + get_currency_label(df.iloc[0,1]) return plt.text(0, -0.1, amount, horizontalalignment='center', verticalalignment='top', fontsize = 22,family = 'sans-serif')2.4KViews1like0CommentsBloX - A Technical Introduction
BloX - A Technical Introduction To start, download the attached file ‘BloXTutorial-Styling.dash’ file and import this into your Sisense environment. This dashboard uses the ‘Sample Ecommerce’ data model and contains all iterations of the BloX widget. If connecting the .dash file to your Sample Ecommerce data model does not work, the attached file ‘BloX-Tutorial-ECommerce-DataModel.zip’ contains a version of this data model that will work. Additionally, ensure BloX is already enabled on your Sisense environment. Step 1 - BloX Structure In the first iteration of this BloX widget, we start with a BloX structure as depicted in the image above. In our widget, we have: Country name panel item in the header TextBlock Three TextBlocks in the left column Profit Profit Margin Units Sold Three TextBlocks in the right column Profit panel item Profit Margin panel item Units Sold panel item It is worth noting that with the ‘showCarousel’ attribute set to ‘true’, only one Country will display at a time. Now that we understand the layout of the BloX widget, let’s add some styling to make it look more presentable. Step 2 - Styling I also encourage you to play around with the values of each property to get a better understanding of how these affect the widget. Contai ner style block In our Container, we’ve added a style element with padding, width, and two margin types. We can use these CSS properties to manipulate the Container element of our BloX widget. In this example, they are dictating the position of the container and, therefore all of the text within. Country panel TextBlock In our Country TextBlock, we’ve added three text styling elements: weight, color, and size. We can use these BloX elements to manipulate how a TextBlock is styled rather than using CSS properties. Note that the size element has also been added to the other TextBlocks as well. Notice that the CSS properties need to be used within a style element while the BloX elements do not. The BloX elements provide easy access to commonly used style properties. However, you will see we can accomplish the same using the style element. Step 3 - More Styling The third iteration of our BloX widget adds background coloring and a border radius. This gives our widget a much more presentable ‘card’ feel. Note: we’ve added a second container within our first container because the padding property conflicts with the border-radius property. Container style block 2 Here, we’ve added the border-radius property to give the container’s edges a sleeker, rounded feel. We also added the overflow property to prevent text from leaving the container. TextBlocks In each text block, we’ve removed the TextBlock attributes and replaced them with CSS properties. Using style elements, we have more control over the look and feel with CSS. The background-color property visually separates each row of text. Using the style attributes gives more control over the widget, but the code in our Editor can become redundant, bulky, and difficult to read. We can help alleviate this using CSS classes. Step 3.5 - CSS Classes The widget in this iteration looks identical to the previous version. However, we consolidated the style code and replaced the style elements with the class element. These classes are defined in the style element in our main BloX . The code in our editor is now much more readable and easy to understand. Main style attribute In our BloX main style attribute, we’ve defined several classes. Let's examine the ‘blox-row-odd-key’ class. We see ‘.blox-row-odd-key{css code} in this style attribute with this class referenced in two TextBlocks. We moved the style elements from the previous step within this class definition. This makes the code more readable and manageable. Now, if we want to change the background color for these TextBlocks, we only need to change it once. The downside to defining our classes in the BloX main style element is that the code is limited to one line and difficult to read itself. Using external stylesheets, we can maintain readability for our class definitions. Step 4 - Other Useful BloX elements The following BloX elements are useful and some are sparsely documented. The Step 4 widget in the .dash file has placeholders for all of these in the BloX main section. Changing the values of these will give a good sense of how they affect the widget. showCarousel Setting this element to ‘true’ limits the widget to only show one value of the Items Setting this element to ‘false’ shows multiple values from the Items carouselAnimation With the showCarousel element set to ‘true’, the ‘showButtons’ and ‘delay’ element become applicable Setting ‘showButtons’ element to false hides the carousel buttons Setting ‘delay’ element to an integer cycles through the items. The integer is the number of milliseconds between each cycle i.e. delay = 1000 cycles through the items once per second titleStyle Setting ‘diplay’ element of ‘titlestyle’ to ‘none’ hides the title header row of the BloX widget The ‘fontColor’ element allows changing the text color of the title The ‘backgroundColor’ element allows changing the background color of the title header row The ‘backgroundImage’ element allows changing the background of the title header row to an image with a url disableLazyLoading By default, a limited number of items (10) are loaded when the widget is rendered With showCarousel set to false, setting this element to true loads all values when the widget is rendered. This is most useful to show more than 10 columns Warning: Disabling lazy loading can have performance impacts on the dashboard Hopefully, you feel comfortable enough to jump in and start creating your own custom visualizations. This walkthrough gives users a taste of how styling with CSS can help you create your own BloX widgets and touched on some important BloX elements to be aware of. The only way to get better at something is to practice, so I encourage you to challenge yourself and create your own widget.2.4KViews1like0CommentsCalculating Trimmed Means (SQL And Python Variations)
Data can oftentimes have extreme outliers, which can heavily skew certain metrics, such as the mean. One way to get around this is to use a trimmed mean. Using a trimmed mean, users will remove the top and bottom x percent of their data and take the average of the result. We see trimmed ranges as a standard methodology that college programs use to communicate the standardized test scores of their admitted students (ex: 25th to 75th percentile of SAT scores). This is the same thought process that drives some data professionals to use trimmed means. For the examples below, let's assume we have the following data: user_ids, and how much money they spent on a fictional gaming app. Figure 1 Note: There are a few nuances to keep in mind when developing a protocol for trimmed means. Primarily, we need to assess how to handle multiple occurrences of a value when it is either the lower or upper bound of results to be trimmed. Case 1: Exclude upper and lower bounds of the trimmed dataset You can perform this calculation in Redshift SQL or Python Redshift SQL (assume the table in Figure 1 is stored in t1) with t2 as (select case when amt_paid <= PERCENTILE_DISC(0.1) WITHIN group(order by amt_paid) over() then null when amt_paid >= PERCENTILE_DISC(1-0.1) WITHIN group(order by amt_paid) over() then null else amt_paid end as trimmed_vals from t1) select avg(trimmed_vals) from t2 Python 3.6 (assume the table in Figure 1 is stored in a dataframe df) # SQL output is imported as a pandas dataframe variable called "df" # Source: https://stackoverflow.com/questions/19441730/trimmed-mean-with-percentage-limit-in-python import pandas as pd import matplotlib.pyplot as plt from scipy.stats import tmean, scoreatpercentile import numpy as np def trimmean(arr, percent): lower_limit = scoreatpercentile(arr, percent) upper_limit = scoreatpercentile(arr, 100-percent) return tmean(arr, limits=(lower_limit, upper_limit), inclusive=(False, False)) my_result = trimmean(df["amt_paid"].values,10) Case 2: Disregard duplicate values of the upper/lower bounds In other words, ensure that exactly x% of the lowest and x% of the highest values are removed from the dataset. If there are multiple records where the value is equal to the upper limit, discard enough copies such that the resulting sample size is 2*x% smaller than the initial sample size. SQL There is no known effective SQL equivalent for this behavior Python 3.6 # SQL output is imported as a pandas dataframe variable called "df" import pandas as pd from scipy.stats import trim_mean import numpy as np my_result = trim_mean(df["amt_paid"].values, 0.1) Case 3: Include upper and lower bounds of the trimmed dataset Redshift SQL (assume the table in Figure 1 is stored in t1) with t2 as (select case when amt_paid < PERCENTILE_DISC(0.1) WITHIN group(order by amt_paid) over() then null when amt_paid > PERCENTILE_DISC(1-0.1) WITHIN group(order by amt_paid) over() then null else amt_paid end as trimmed_vals from t1) select avg(trimmed_vals) from t2 Python 3.6 (assume the table in Figure 1 is stored in a dataframe df) # SQL output is imported as a pandas dataframe variable called "df" # Source: https://stackoverflow.com/questions/19441730/trimmed-mean-with-percentage-limit-in-python import pandas as pd import matplotlib.pyplot as plt from scipy.stats import tmean, scoreatpercentile import numpy as np def trimmean(arr, percent): lower_limit = scoreatpercentile(arr, percent) upper_limit = scoreatpercentile(arr, 100-percent) return tmean(arr, limits=(lower_limit, upper_limit), inclusive=(True, True)) my_result = trimmean(df["amt_paid"].values,10)2.3KViews0likes0Comments