Dynamic Time Warping and Time Series Clustering


Note: This is a part of a series of articles for my package tsrecipes (Github, website). The full article, including the code, can be found here.


Working with a set of time series measuring related observations requires a different set of tools compared to analyzing or forecasting a single time series.

If you want to cluster time series into groups with similar behaviors, one option is feature extraction: statistical summaries that characterize some feature of the time series, such as min, max, or spectral density. The feasts R package and the Python package tsfresh provide tools to make this easier.

Why not cluster on the time series directly? Standard methods don’t work as well, and can produce clusters that fail to capture visual similarities in shape and size.

Dynamic time warping is method that aligns with intuitive notions of time series similarity. To show how it works, I’ll walk through

  1. how standard distance metrics fail to create useful time series clusters

  2. dynamic time warping distance as a method for similarity

Distance Metrics

To cluster, we need to measure the distance between every member of the group.1 Typically we think of Euclidean distance: the length of a straight line between two points.

This distance pops up all the time in data science, usually in Mean Squared Error (MSE) or it’s counterpart Root Mean Squared Error (RMSE). These metrics are used to measure regression error in machine learning and assess the accuracy of a time series forecast.

To evaluate the fit of the forecast to the actual data, you can calculate the Euclidean distance between the corresponding points in the time series and the forecasts. The smaller the distance, the better the forecast: the more similar the two series are.

A straight line between two points isn’t always the possible. In a city grid, we are constrained by the blocks. In this situation, the distance between two points is called the Manhattan distance.

Time series also need a special distance metric. The most common is called Dynamic Time Warping.

Time Series Distance

Plotted below are three time series. I’ve plotted blue and green to both overlap red. Is blue or green more similar to red?

I think it’s blue: blue and red both has an early dip after 750. Around 1000 they both have a slim, deep trough. The major difference is that blue seems shifted to the left.

Green is all wrong: where red dips around 750, green has a bump. And the dip after 1000 is wider and shallower.

The Euclidean distance tells a different story. Red is actually closer to green, because it has a smaller distance metric (9.78 vs 9.83).

##           red    blue
## blue  9.83149        
## green 9.78531 9.82103

Dynamic Time Warping

To capture our intuition about the similarity of red and blue, we need a new metric. This metric can’t simply measure the point-to-point distance between the series. As we saw, blue is shifted to the left of red, even though the shape is really similar. We need to warp time to account for this shift!

In the visualizations below2, you can see how dynamic time warping stretches (warps) time to match up nearby points.

When comparing red to green below, there is a lot more warping going on to match up points (as measured by the light gray concentric lines between the series), so the time series are more dissimilar.

The dissimilarity between red and green is reflected when we calculate the dynamic time warping distance.

##            red     blue
## blue  28.26073         
## green 33.82476 31.50148

  1. The UC Business Analytics R Programming Guide has an excellent series on clustering, covering dissimilarity measures to the final clustering algorithms.↩︎

  2. https://www.r-bloggers.com/time-series-matching-with-dynamic-time-warping/↩︎