Visualizing the Similarity Between Multiple Time Series

Presenting the similarity between multiple time series in an intuitive manner is not an easy problem. The standard solution is a correlation matrix, but it’s a problematic approach. While it makes it easy to check the correlation between any two series (and, with the help of conditional formatting, between one series and all the rest), it’s hard to extract an intuitive picture of how all the series are related to each other. And if you want to add a time dimension to see how correlations have changed, things become even more troublesome.

The solution is multidimensional scaling (the “classical” version of which is known as Principal Coordinates Analysis). It takes a distance matrix and places each object in N dimensions such that the distances between them are preserved as well as possible. N = 2 is the obvious use case, as it makes for the simplest visualizations. MDS works similarly to PCA, but uses the dissimilarity matrix as input instead of the series themselves. Here’s a good take on the math behind it.
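For the curious, here is a minimal sketch of the classical algorithm (what cmdscale() does) in Python/numpy, assuming a symmetric matrix of pairwise distances. It’s illustrative only, not the exact code behind the charts in this post:

    import numpy as np

    def classical_mds(D, k=2):
        # Classical MDS / Principal Coordinates Analysis.
        # D: (n, n) symmetric matrix of pairwise distances.
        # Returns (n, k) coordinates whose Euclidean distances
        # approximate those in D as well as possible.
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
        B = -0.5 * J @ (D ** 2) @ J                    # double-centered squared distances
        eigvals, eigvecs = np.linalg.eigh(B)           # eigenvalues in ascending order
        idx = np.argsort(eigvals)[::-1][:k]            # keep the k largest
        scale = np.sqrt(np.maximum(eigvals[idx], 0))   # clip small negatives from noise
        return eigvecs[:, idx] * scale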

It should be noted that MDS doesn’t care about how you choose to measure the distance between the time series. While I used correlations in this example, you could just as easily use a technique like dynamic time warping.
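For illustration, here is what a naive DTW distance looks like in Python; assuming two comparable 1-D series, it could be swapped in as the dissimilarity measure in place of correlation. This is the textbook O(n·m) dynamic program with no window constraint, a sketch rather than production code:

    import numpy as np

    def dtw_distance(a, b):
        # Naive dynamic time warping distance between two 1-D series,
        # using absolute difference as the local cost.
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(a[i - 1] - b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                     cost[i, j - 1],       # deletion
                                     cost[i - 1, j - 1])   # match
        return cost[n, m]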

Below is an example with SPY, TLT, GLD, SLV, IWM, VNQ, VGK, EEM, and EMB, using 252-day correlations as the distance measure, calculated every Monday. The motion chart lets us see not only the distances between the ETFs at one point in time, but also how they have evolved.
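The weekly pipeline looks roughly like the sketch below, reusing the classical_mds helper from above. Two caveats: the post doesn’t spell out the correlation-to-distance transform, so sqrt(2·(1 − ρ)) is assumed here (a common choice that maps ρ = 1 to 0 and ρ = −1 to 2), and prices is a hypothetical DataFrame of daily closes with one column per ETF:

    import numpy as np
    import pandas as pd

    returns = prices.pct_change().dropna()            # 'prices': hypothetical daily closes
    sample_dates = returns.resample("W-MON").last().index

    coords = {}
    for day in sample_dates:
        window = returns.loc[:day].tail(252)          # trailing 252 trading days
        if len(window) < 252:
            continue
        rho = window.corr().values                    # pairwise correlation matrix
        D = np.sqrt(2.0 * (1.0 - rho))                # assumed correlation -> distance
        coords[day] = classical_mds(D, k=2)           # 2-D layout for that week

One wrinkle when animating: MDS coordinates are only determined up to rotation and reflection, so consecutive frames may need to be aligned (e.g. with a Procrustes rotation) to avoid spurious jumps between weeks.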

Some interesting stuff to note: watch how REITs (VNQ) become more closely correlated with equities during the financial crisis, how distant emerging market debt (EMB) is from everything else, and the changing relationship between silver (SLV) and gold (GLD).

Here’s the same thing with a bunch of sector ETFs:

To do MDS at home: in R and MATLAB you can use cmdscale(). I have posted a C# implementation here.
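For Python users, scikit-learn also ships an MDS estimator that accepts a precomputed dissimilarity matrix. Note that it minimizes stress via SMACOF rather than using the eigendecomposition behind cmdscale(), so the coordinates will differ slightly:

    from sklearn.manifold import MDS

    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(D)   # D: the pairwise distance matrix from above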

Comments (23)

  • drveen says:

    I’ve been experimenting with variations of your previous work from the post concerning candlestick patterns. I’ve simply captured the current (once closed) and prior 3-4 days as a vector, then walked (pandas, python) the test dataset, comparing each N-day window as an identically constructed vector. OK, that might seem “slow”, but on 10 years of daily data, it’s really quite quick. Seems awfully similar to the iterative qualities of DTW. Could you comment on this?

    Do you think either approach is “superior”? Pros? Cons?

    Thanks for posting the interesting work.

    • qusma says:

      >Seems awfully similar to the iterative qualities of DTW.

      I’m not sure what you mean, could you elaborate?

      DTW is just a way of measuring distance between time series…an alternative to, say, Euclidean distance (or whatever you’re using for the knn calcs). In the case of comparing sequences of OHLC bars it’s not directly comparable because they’re not really a single time series.

      • drveen says:

        In very layman’s terms, here’s how I’m understanding DTW: take two squiggly lines and lay them end to end. Now slide one across the other slowly until they “fit”. The slow slide is the “iteration”, especially if one line is much smaller (e.g. 3 days vs. 10 years).

        So, following your example from the candlestick pattern posts, I’m calculating a similar vector for the past 3 days of data (A), and “sliding it along” 10 years of (say, SPX). At each day in the 10-year sample, I calculate the 3-day vector at that point (B) and hand both A and B to a function which gives me a scalar value for the norm of their difference. (I believe it is doing the square root of the sum of squared differences.)

        import numpy as np
        np.linalg.norm(A - B)  # what I’m actually using in python

        So, realizing that I’m being quite naive, it does seem at least superficially similar. And yes, comparing constructed vectors of various features at each point is indeed different from comparing the squiggly lines formed by simply connecting the price close (y) at each day (x).

        Again, thanks.

        • qusma says:

          DTW doesn’t involve sliding, but “stretching” (the titular “warping”), see the pic below for an intuitive visual example of how the series are warped to fit each other. This is done by changing which point in one series “corresponds” to which point in the other. In a situation with 3 days vs 10 years of data this procedure is meaningless though.

          What you COULD do (though it wouldn’t work well, because the series are too short and not a true time series anyway) is, as you slide your 3-day vector against a rolling window over 10 years of data, compute the distance between the two vectors using DTW instead of the sum of squared diffs.

  • Mini Trader says:

    What is the input to the C# MDS function? Also, why couldn’t you just sample correlations over a moving window and take the corr at a given point in time? Trying to understand the advantage of this approach.

    • qusma says:

      Yeah that should’ve been a bit clearer…the input is a matrix of pairwise distances. The element at [x,y] is the distance between the xth and yth objects. So the diagonal is 0s, etc.

      >Also, why couldn’t you just sample correlations over a moving window and take the corr at a given point in time?

      That’s exactly what I’m doing, I’m sampling 252-day correlations every week, MDS simply provides a good way of visualizing those numbers.

  • Mini Trader says:

    Off topic. But I’m curious if you have tried applying genetic programming to any trading problems perhaps using a framework such as ECJ or HeuristicLab.

    • qusma says:

      Not yet, but it’s my next “big project”. I’m not really sure how effective the approach will be…there seem to be a lot of fundamental problems with GP that need to be solved, or at least worked around. The claims made by the Trading System Lab guy are interesting enough to at least warrant some investigation though.

      As for the framework of choice I was thinking of using the AForge.NET GP library, with some modifications to be able to use time series as inputs. ECJ looks great, but the C# port seems to be abandoned which is really problematic for me.

    • Dave says:

      If GP worked, then all those professors who published articles about GP and constructing trading systems in the 1990s would own the market today. It is just a curve-fitting approach. No statistical test will help here because, as Aronson says, regular tests of significance do not apply when there is large bias.

      • Mini Trader says:

        I completely agree. I think the best one can do is limit the allowable operations and the size of the GP programs, trying to find optimal cases using indicators deemed potentially important.

      • qusma says:

        Professors are notoriously terrible at actually trading, even if the underlying ideas are legit, so I don’t really consider their failure a bad sign.

        Ultimately every optimization method is a “curve-fitting approach”, the key is to fit just enough to create a model that does well OoS. I’m not necessarily saying GP is a good approach, but the reasons you named aren’t enough to completely rule it out.

  • DMcC says:

    Wouldn’t it be useful to label the items in the graphs you’re displaying above?

    • qusma says:

      It’s a problem with google charts. You can click on each marker to add a label, but I don’t think it’s possible to have them enabled by default.

  • Problems in the Optimization of Swing Portfolios - QUSMA says:

    […] and a China ETF (FXI). Let’s take a look at some numbers. Here are the correlations and MDS plot so you can get a feel for how these instruments relate to each […]

