Funnel Charts in R with funneljoin - Gaps and Deadlines

2019/11/09

funneljoin

In my previous blog post, I explored the awesome funneljoin package. Check that post out for an overview of the package and some examples.

In this post, I’m going to investigate business problems that can’t be modeled with gaps, and how we can work around that.

Gaps vs. deadlines

Gap

Example:

if a customer clicks on an ad, and then purchases the product 30 days later the company may not want to attribute that purchase to the ad.

Deadline

Examples:

Turbo Tax wants to know the conversion rate between uploading your W-2s and filing your taxes. But they only care if you file your taxes before April 15.

A football coaches need to print off opponent reports before their upcoming game.

How to model deadlines

In the last blog post, I showed how funneljoin supports gaps. Let’s figure out how we can model deadlines using funneljoin.

I’m going to start with the upload/submit/report logs again. Recall that the deadline to print is the same calendar week as the upload.

library(dplyr)
library(tidyr)
library(purrr)
library(funneljoin)
library(lubridate)

logs <- readRDS("funneljoin-logs.RDS")
logs
## # A tibble: 1,759 x 3
##    date        user event 
##    <date>     <int> <chr> 
##  1 2020-01-06     1 upload
##  2 2020-01-06     2 upload
##  3 2020-01-06     3 upload
##  4 2020-01-06     4 upload
##  5 2020-01-06     5 upload
##  6 2020-01-06     6 upload
##  7 2020-01-06     7 upload
##  8 2020-01-06     8 upload
##  9 2020-01-06     9 upload
## 10 2020-01-06    10 upload
## # … with 1,749 more rows

We want to separate the logs based on their deadline, which is the week of the upload. We can do that with lubridate::floor_date and tidyr::nest.

nested_events <- logs %>%
  mutate(deadline = floor_date(date, "week")) %>%
  nest(events = -deadline)

nested_events
## # A tibble: 4 x 2
##   deadline   events                                                        
##   <date>     <S3: vctrs_list_of>                                           
## 1 2020-01-05 18267 , 18267 , 18267 , 18267 , 18267 , 18267 , 18267 , 18267…
## 2 2020-01-12 18274 , 18274 , 18274 , 18274 , 18274 , 18274 , 18274 , 18274…
## 3 2020-01-19 18281 , 18281 , 18281 , 18281 , 18281 , 18281 , 18281 , 18281…
## 4 2020-01-26 18288 , 18288 , 18288 , 18288 , 18288 , 18288 , 18288 , 18288…

This gives us a set of events for each deadline period.

Now, we want to apply the funnel and summary for the events in each deadline period. We’ll use purrr::map to calculate the conversions for each set of events.

customer_funnel <- function(.data) {
  .data %>% 
    funnel_start(
      moment_type = "upload",
      moment = "event",
      tstamp = "date",
      user = "user"
    ) %>%
    funnel_steps(
      moment_types = c("submit", "print"),
      type = "any-firstafter",
    )
}
nested_events %>%
  mutate(conversions = map(
    events, 
    ~customer_funnel(.) %>%
      summarize_conversions(date_print)
  )) %>%
  unnest(conversions) %>%
  select(-events)
## # A tibble: 4 x 4
##   deadline   nb_users nb_conversions pct_converted
##   <date>        <int>          <int>         <dbl>
## 1 2020-01-05      300            132         0.44 
## 2 2020-01-12      250            184         0.736
## 3 2020-01-19      174            150         0.862
## 4 2020-01-26      118            118         1

In this case, the conversion percentages are the same whether we use gaps or deadline.

However, this isn’t real data. The simulated data I created doesn’t cover every scenario. With real data, we need to carefully think if we want gaps or deadlines.

What I’d like to see

As much as I love funneljoin, I think the

nest -> funnel -> summarise -> unnest

workflow is a little clunky.

I think it would be cool if funnel_start respected a grouped data frame, when looking for the next step in the funnel.

For example, we can pass a grouped data frame to funnel_start:

logs %>%
  mutate(week = week(date)) %>%
  group_by(week) %>%
  funnel_start(
    moment_type = "upload",
    moment = "event",
    tstamp = "date",
    user = "user"
  ) %>%
  funnel_steps(
    moment_types = c("submit", "print"),
    type = "any-firstafter"
  ) %>%
  select(date_upload, date_print)
## # A tibble: 5,476 x 3
## # Groups:   week_upload.x.x [4]
##    week_upload.x.x date_upload date_print
##              <dbl> <date>      <date>    
##  1               1 2020-01-06  2020-01-24
##  2               1 2020-01-06  2020-01-31
##  3               1 2020-01-06  2020-01-24
##  4               1 2020-01-06  2020-01-31
##  5               1 2020-01-06  2020-01-24
##  6               1 2020-01-06  2020-01-31
##  7               1 2020-01-06  2020-01-24
##  8               1 2020-01-06  2020-01-31
##  9               1 2020-01-06  2020-01-24
## 10               1 2020-01-06  2020-01-31
## # … with 5,466 more rows

But funneljoin finds the firstafter print date outside of the week of the upload (the group parameter for the data-frame).

I want to find the firstafter inside the week of the upload (the group parameter).

Conclusion

Gaps or deadlines depend on the business we are trying to model. While gaps are very easy to use with funneljoin, we can still analyze conversions in a deadline-based funnel with a little extra work.