Retention Cohort Without An 'All Activities' Table
Our blog has a post about calculating retention, but this assumes I have a table with user activity from every single day per user, which is, more often than not, not the case. So I put together a version that uses just the start date and end date, and generates the in between:
A: (this equals new_user_activity from the blog)
select subscriptionid, create month
B:
select Subscriptionid, end month
C:
A.subsciption id, A.create month, B.end month from a left join b on A.subsciption id=B.subsciption id
D:
select *, case when c.end month is not null
then datediff('month', start month, end month)
else datediff('month', start month, dateadd('hour', -6, getdate()))
end as months_lasted
from C
E: (This generates a series of months, to help us get those "middle" dates)
select
date_trunc('month', dateadd(month, (-1 * row_number() over(order by true)), getdate()::date)) as n
from
subscription
limit 60
F: (this equals activity in the blog)
select * from D join E on 1=1
where E.n >= D.start_ month
and E.n <= nvl(d.end_month,dateadd('hour',-6,getdate()))
G: (this equals cohort_active_user_count in the blog)
select start month, count(distinct id)
from A
group by 1
H:
select
A.start_month as date
datediff(month, a.start_month, f.n) as period
, max(g.count) as new_users
, count(distinct f.subscription_id) as retained_users
, count(distinct f.subscription_id) / max(g.count)::float as retention
from
A
left join F on
A.subscription_id = F.subscription_id
and A.start_month < F.n
left join G on
A.start_month =G.start_month
group by
1
, 2
I: (final query to format as a cohort)
select date
, new_users
, period
, new_users-retained_users as churned_users
, retained_users
, retention as retention_percentage
where period is not null
This uses month over month retention, in Redshift Pseudocode, but you can edit it to use any period, or even the aggregation filter.
Updated 03-02-2023
intapiuser
Admin
Joined December 15, 2022