cancel
Showing results for 
Search instead for 
Did you mean: 
intapiuser
Community Team Member
Community Team Member

Analytical Need

Sisense comes with a function called Median() to calculate the median of a set of values. However, this function works with one assumption: we have all the values stored in one column and each row in the table represents one occurrence of that value. What if we are dealing with a frequency table that stores each distinct value in one column and the number of occurrences of that value in another column? Take a look at this sample data below.

Challenge

In this example, we need to calculate the median score (Mark) that students received per subject (Class). We cannot use the median function for this type of table because each score has a 'weight' i.e. the number of occurrence, which is stored in a different column called Frequency, e.g. there are 12 students that received a score of 100 for Biology.
 

Solution

Key Concepts
This solution employs these key concepts. Please follow the links below for the detailed documentation.
Implementation:
To calculate the median for a frequency table, we need to follow this logic:
If the total number of occurrences (let's call it 'n', i.e. the sum of the frequencies / the total number of students) is odd, then the median is the ((n+1) / 2)-th value. If n is even, then the median is the average of the (n/2)-th and the ((n/2) + 1)-th value.
For example,
Odd data set:
1, 3, 4, 5, 6
n = 5
Median is the ((5+1)/2)-th value = the third value  = 4
Even data set:
1, 3, 45, 6, 9
n = 6
Median is the average of the (6/2)-th value and the ((6/2)+1)-th value = (the third value + the fourth value) / 2  = (4+5) / 2 = 4.5
 
Follow these steps to implement the weighted median in a pivot table:
Step 1:
To calculate median, values have to be sorted.
  • Add the value column (Mark) to the Rows panel, not Values panel.
  • Sort the table by the value column (Mark).
Step 2:
Add the frequency column, the running sum of the frequency, and the total sum of the frequency per Class to the Values panel. We're going to need these values for the next steps.
  • Frequency:
SUM(Frequency)
  • Running sum of the frequency, save and name the formula for reuse, e.g. 'Frequency Running Sum':
RSUM(SUM(Frequency))
  • Total sum of the frequency, save and name the formula for reuse, e.g. 'Total Frequency':
(SUM([Frequency]), ALL([Mark]))
 
Step 3:
To determine if n is odd or even, calculate the remainder of the total frequency divided by 2 using the modulo function. We can determine that n is even when the formula returns 0 or odd when the formula returns 1.
  • Add the modulo formula to the Values panel. Use the saved formula from the previous step.
MOD([Total Frequency], 2)
 
  • Save and name the formula for reuse, e.g. 'Modulo':
Step 4:
The next step is to determine the position(s) of the median value(s). If n is odd, then the position of the median is (n+1)/2. If n is even, then there are two median values that need to be averaged and the positions of those values are n/2 and (n/2)+1. Therefore, we need to add two columns, each holds the position of one of the median values (when n is odd, the second column will be NULL).
  • Add the first formula to get the median position when n is odd or the first median position when n is even to the Values panel. Use the saved formula from the previous step.
CASE WHEN [Modulo] = 1 THEN ([Total Frequency] + 1) / 2
ELSE [Total Frequency] / 2
END
  • Save and name the formula for reuse, e.g. 'Odd Median Point OR Even Median Point 1':
  • Add the second formula to get the second median position when n is even to the Values panel. If n is odd, then it will be NULL. Use the saved formula from the previous step.
CASE WHEN [Modulo] = 0 THEN ([Total Frequency] / 2) + 1
ELSE NULL
END
  • Save and name the formula for reuse, e.g. 'Even Median Point 2':
Step 5:
Once we have the position(s) of the median, we can determine the actual median value(s) by comparing the position(s) to the running sum. The first running sum record that is larger than the median position holds the actual median value(s). For example, if the median position is the 28-th, then the first record where the running sum is larger than 28 is the second row (running sum = 29), therefore the median value is 73 (see screenshot below).
We need to add two columns again, each holds the actual median value (when n is odd, the second column will be NULL).
  • Add the first formula to get the median value when n is odd or the first median value when n is even to the Values panel. Use the saved formula from the previous step.
(MIN([Mark], CASE WHEN [Frequency Running Sum] >= [Odd Median Point OR Even Median Point 1] THEN SUM([Mark])
ELSE NULL
END)
  • Save and name the formula for reuse, e.g. 'Median Value 1':
  • Add the second formula to get the second median value when n is even to the Values panel. If n is odd, then it will be NULL. Use the saved formula from the previous step.
MIN([Mark], CASE WHEN NOT ISNULL([Even Median Point 2]) and [Frequency Running Sum] >= [Even Median Point 2] THEN SUM([Mark])
ELSE NULL
END)
  • Save and name the formula for reuse, e.g. 'Median Value 2':
Step 6:
The last step to calculate the 'weighted median' is to average the two median values for when n is even. When n is odd, the value is equal to the 'Median Value 1' from the previous step.
  • Add the formula to get the final median value to the Values panel. Use the saved formula from the previous step.
CASE WHEN ISNULL([Median Value 2]) THEN [Median Value 1]
ELSE ([Median Value 1] + [Median Value 2]) / 2
END
  • Save and name the formula for reuse, e.g. 'Median':
Step 7:
Verify the results, then hide all the temp columns and only keep what you need to display. Note that the value column (Mark) should not be hidden/removed from the Rows panel, otherwise the calculation will be completely inaccurate.
Step 8 (optional):
It might be helpful to also show mean and mode in addition to median.
  • To calculate the mean, the formula is where x is the value and f is the frequency. Save and name the formula for reuse, e.g. 'Mean'.
SUM([Mark], SUM([Mark] * [Frequency])) / SUM([Mark], SUM([Frequency]))
  • To calculate the mode, choose the value with the highest frequency by using the Ranking filter (see screenshot below). Save and name the formula for reuse, e.g. 'Mode'.
(SUM([Mark]), [Mark])
The final result:
To get rid of the repeating rows and the 'Mark' column without affecting the calculation, you have to write a widget script to hide the Mark column and keep only the first record for each ID / Class, so that the final table looks like this.
Version history
Last update:
‎02-01-2024 10:56 AM
Updated by:
Contributors
Community Toolbox

Recommended quick links to assist you in optimizing your community experience:

Developers Group:

Product Feedback Forum:

Need additional support?:

Submit a Support Request

The Legal Stuff

Have a question about the Sisense Community?

Email [email protected]

Share this page: