DTW Barycenter Averaging (DBA) is an iterative algorithm that uses dynamic time warping to align the series being averaged with an evolving average. It was introduced in *A global averaging method for dynamic time warping, with applications to clustering* by Petitjean et al. As you’ll see below, the DBA method has several advantages that are quite important when it comes to combining financial time series. Note that it can also be used to cluster time series using k-means. Roughly, the algorithm works as follows:

- The n series to be averaged are labeled S₁…Sₙ and have length T.
- Begin with an initial average series A.
- While the average has not converged:
  - For each series Sᵢ, perform DTW against A and save the warping path.
  - Using the paths, construct a new average A by giving each point a new value: the average of every point from the S series connected to it in the DTW paths.

You can find detailed step-by-step instructions in the paper linked above.
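To make the update step concrete, here’s a rough sketch of one DBA iteration in C#. This is purely illustrative and not the implementation from the paper or the code linked at the bottom of this post: the DTW routine below uses a plain squared-difference point cost and no window or slope constraints.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DbaSketch
{
    // One DBA iteration: align every series to the current average with DTW,
    // then set each point of the average to the mean of all points aligned to it.
    public static double[] Iterate(double[] average, IEnumerable<double[]> series)
    {
        var buckets = new List<double>[average.Length];
        for (int i = 0; i < buckets.Length; i++)
            buckets[i] = new List<double>();

        foreach (var s in series)
            foreach (var (i, j) in DtwPath(average, s))
                buckets[i].Add(s[j]);

        return buckets.Select(b => b.Average()).ToArray();
    }

    // Basic DTW: returns the optimal alignment as (average index, series index) pairs.
    static List<(int, int)> DtwPath(double[] a, double[] b)
    {
        int n = a.Length, m = b.Length;
        var cost = new double[n, m];
        cost[0, 0] = Sq(a[0] - b[0]);
        for (int i = 1; i < n; i++) cost[i, 0] = cost[i - 1, 0] + Sq(a[i] - b[0]);
        for (int j = 1; j < m; j++) cost[0, j] = cost[0, j - 1] + Sq(a[0] - b[j]);
        for (int i = 1; i < n; i++)
            for (int j = 1; j < m; j++)
                cost[i, j] = Sq(a[i] - b[j]) +
                             Math.Min(cost[i - 1, j - 1], Math.Min(cost[i - 1, j], cost[i, j - 1]));

        // Trace back from the last cell to recover which points were matched to which.
        var path = new List<(int, int)> { (n - 1, m - 1) };
        int ci = n - 1, cj = m - 1;
        while (ci > 0 || cj > 0)
        {
            if (ci == 0) cj--;
            else if (cj == 0) ci--;
            else if (cost[ci - 1, cj - 1] <= cost[ci - 1, cj] && cost[ci - 1, cj - 1] <= cost[ci, cj - 1]) { ci--; cj--; }
            else if (cost[ci - 1, cj] <= cost[ci, cj - 1]) ci--;
            else cj--;
            path.Add((ci, cj));
        }
        path.Reverse();
        return path;
    }

    static double Sq(double x) => x * x;
}
```

The outer loop simply repeats this until the average stops changing. As discussed below, the choice of the initial average A is what really determines the quality of the end result.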

A good initialization is extremely important: while the DBA procedure itself is deterministic, the final result depends heavily on the initial average sequence. For our purposes, we have three distinct goals:

- To preserve the shape of the inputs.
- To preserve the magnitude of the extremes on the y axis.
- To preserve the timing of those extremes on the x axis.

Let’s take a look at how DBA compares to normal averaging, and how the initial average sequence affects the end result. For testing purposes I started out with this series:

Then I created a bunch of copies by adding some random variation and an x-axis offset:

To start out, let’s see what a simple average does. Note the shape, the distance between peak and valley, and the magnitude of the minimum and maximum values: all far from the original series.

Now, on to DBA. What are our initialization options? My first instinct was to try to start the process using the simple average, above. While this achieves goal #2, the overall shape is obviously wrong.

Petitjean et al. recommend picking one of the input series at random. This preserves the shape well, but the timing of the extremes depends on which series happens to be chosen. Additionally, a deterministic process is preferable for obvious reasons.

My solution was to use an input series for initialization, but to choose it through a deterministic process. I first de-trend every time series, then record the x-axis position of the y-axis maximum and minimum for each series. The series closest to the median of those positions is chosen. This allows us to preserve the shape and the magnitude of the y-axis extremes, and to get a good idea of the typical x-axis position of those extremes:

You can find C# code to do DBA here.
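For illustration, the initialization heuristic described above might look roughly like the sketch below. The least-squares detrending and the exact way of scoring “closest to the median” are assumptions on my part, not necessarily what the linked code does:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DbaInit
{
    // Pick the input series whose (detrended) extreme positions are closest to the
    // cross-sectional medians of those positions, and use it as the initial average.
    public static double[] ChooseInitialAverage(IList<double[]> series)
    {
        var maxPos = series.Select(s => ArgMax(Detrend(s))).ToArray();
        var minPos = series.Select(s => ArgMin(Detrend(s))).ToArray();
        double medMax = Median(maxPos), medMin = Median(minPos);

        // Score each series by how far its extremes are from the medians; pick the closest.
        int best = Enumerable.Range(0, series.Count)
            .OrderBy(i => Math.Abs(maxPos[i] - medMax) + Math.Abs(minPos[i] - medMin))
            .First();
        return series[best];
    }

    // Remove a least-squares linear trend.
    static double[] Detrend(double[] s)
    {
        int n = s.Length;
        double xMean = (n - 1) / 2.0, yMean = s.Average();
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) { sxy += (i - xMean) * (s[i] - yMean); sxx += (i - xMean) * (i - xMean); }
        double slope = sxy / sxx, intercept = yMean - slope * xMean;
        return s.Select((y, i) => y - (intercept + slope * i)).ToArray();
    }

    static int ArgMax(double[] s) { int k = 0; for (int i = 1; i < s.Length; i++) if (s[i] > s[k]) k = i; return k; }
    static int ArgMin(double[] s) { int k = 0; for (int i = 1; i < s.Length; i++) if (s[i] < s[k]) k = i; return k; }
    static double Median(int[] v) { var o = v.OrderBy(x => x).ToArray(); return o.Length % 2 == 1 ? o[o.Length / 2] : (o[o.Length / 2 - 1] + o[o.Length / 2]) / 2.0; }
}
```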

The main conceptual difference between K-means and K-medoids is how each cluster’s center is defined. K-means uses the distance from a centroid (an average of the points in the cluster), while K-medoids uses the distance from a medoid, which is simply a point selected from the data. The algorithms used for arriving at the final clusters are also quite different.

How does K-medoids compare to K-means? It has several features that make it superior for most problems in finance.

- K-medoids is not sensitive to outliers that might “pull” the centroid to a disadvantageous position.
- K-medoids can handle arbitrary distance functions. Unlike K-means, there is no need for a mean to be defined.
- Unlike K-means, K-medoids has no problems with differently-sized clusters.
- K-medoids is also potentially less computationally intensive when the distance function is expensive to compute, as distances only need to be calculated once for each pair of points.

Note that there’s no guarantee on the size of each cluster. If you want to, it’s trivial to add some sort of penalty function to force similarly-sized clusters.

The algorithm is simple:

- Choose k points from the sample to be the initial medoids (see below for specific methods).
- Give cluster labels to each point in the sample based on the closest medoid.
- Replace a medoid with some other point in the sample. If the total cost (the sum of distances from each point to its closest medoid) decreases, keep the new configuration.
- Repeat until there is no further change in medoids.
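A minimal sketch of that loop, assuming the pairwise distances have been precomputed into a matrix. This is an illustration of the steps above, not the code from the K-medoids class linked at the end of the post:

```csharp
using System;
using System.Linq;

static class KMedoidsSketch
{
    // dist[i, j] is the precomputed distance between points i and j.
    // Returns the indices of the final medoids; labels[i] is point i's cluster.
    public static int[] Cluster(double[,] dist, int[] initialMedoids, out int[] labels)
    {
        int n = dist.GetLength(0);
        var medoids = (int[])initialMedoids.Clone();
        double cost = TotalCost(dist, medoids);

        bool improved = true;
        while (improved)
        {
            improved = false;
            // Try swapping each medoid with each non-medoid point; keep swaps that lower total cost.
            for (int m = 0; m < medoids.Length; m++)
            {
                for (int p = 0; p < n; p++)
                {
                    if (medoids.Contains(p)) continue;
                    int old = medoids[m];
                    medoids[m] = p;
                    double newCost = TotalCost(dist, medoids);
                    if (newCost < cost) { cost = newCost; improved = true; }
                    else medoids[m] = old;
                }
            }
        }

        // Final cluster labels: the index of the closest medoid.
        labels = new int[n];
        for (int i = 0; i < n; i++)
            for (int m = 1; m < medoids.Length; m++)
                if (dist[i, medoids[m]] < dist[i, medoids[labels[i]]]) labels[i] = m;

        return medoids;
    }

    // Sum over all points of the distance to their closest medoid.
    static double TotalCost(double[,] dist, int[] medoids)
    {
        double total = 0;
        for (int i = 0; i < dist.GetLength(0); i++)
            total += medoids.Min(m => dist[i, m]);
        return total;
    }
}
```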

Initialization, i.e. picking the initial medoids before the algorithm is run, is an important issue. The final result is sensitive to the initial set-up, as the clustering algorithm can get caught in local minima. It is possible to simply assign labels at random and repeat the algorithm multiple times, but I prefer a deterministic initialization method: I use a procedure based on Park et al., the code for which you can find further down. The gist of it is that it selects the first medoid to be the point with the smallest average distance to all other points, then selects the remaining medoids based on maximum distance from the previously chosen medoids. It works best when k is set to (or at least close to) the number of clusters in the data.
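A rough sketch of that initialization idea as described above (my loose interpretation, not Park et al.’s exact procedure):

```csharp
using System.Collections.Generic;
using System.Linq;

static class MedoidInit
{
    // First medoid: the point with the smallest average distance to all other points.
    // Remaining medoids: points that are furthest from the already-chosen medoids.
    public static int[] InitializeMedoids(double[,] dist, int k)
    {
        int n = dist.GetLength(0);
        var chosen = new List<int>
        {
            Enumerable.Range(0, n)
                      .OrderBy(i => Enumerable.Range(0, n).Sum(j => dist[i, j]))
                      .First()
        };

        while (chosen.Count < k)
        {
            int next = Enumerable.Range(0, n)
                .Where(i => !chosen.Contains(i))
                .OrderByDescending(i => chosen.Min(c => dist[i, c]))
                .First();
            chosen.Add(next);
        }

        return chosen.ToArray();
    }
}
```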

An example, using two distinct groups (and two outliers):

The first medoid is selected due to its closeness to the rest of the points in the lower left cluster, then the second one is selected to be furthest away from the first one, thus “setting up” the two obvious clusters in the data.

In terms of practical applications, clustering can be used to group candlesticks and create a transition matrix (also here), to group and identify trading algorithms, or to cluster return series for forecasting (also here). I imagine there’s some use in finding groups of similar assets for statistical arbitrage, as well.

Something I haven’t seen done, but which I suspect has potential, is to cluster trades based on return, length, adverse excursion, etc., and then look at the average state of the market (as measured by some indicators) in each cluster, the most common industries of the stocks in each cluster, or perhaps simply the cumulative returns series of each trade at a reasonably high frequency. Differences between the “good trades” cluster(s) and the “bad trades” cluster(s) could then be used to create filters. The reverse, clustering based on conditions and then looking at the average returns in each cluster, would achieve the same objective.

I wrote a simple K-medoids class in C#, which can handle arbitrary data types and distance functions. You can find it here. I believe there are packages for R and python if that’s your thing.

IBS did pretty badly in 2012, and didn’t manage to reach the amazing performance of 2007-2010 this year either. However, it still worked reasonably well: IBS < 0.5 led to far higher returns than IBS > 0.5, and the highest quarter had negative returns. It still works amazingly well as a filter. Most importantly, the magnitude of the effect has diminished. This is partly due to the low volatility we’ve seen this year: after all, IBS does best when movements are large, and SPY’s 10-day realized volatility never even broke 20% this year. Here are the stats:

The original post can be found here. Performance in 2013 hasn’t been as good as in the past, but was still reasonably OK. I think the results are, again, at least partially due to the low volatility environment in equities this year.

I’ve done 3 posts on day of the month seasonality (US, EU, Asia), and on average the DOTM effect did its job this year. There are some cases where the top quarter does not have the top returns, but a single year is a relatively small sample so I doubt this has any long-term implications. Here are the stats for 9 major indices:

My studies on the implied volatility indices ratio turned out to work pretty badly. Returns when the VIX:VXV ratio was 5% above the 10-day SMA were -0.03%. There were no 200-day highs in the ratio in 2013!

Overall I would say it was a mixed bag for me this year. Returns were reasonably good, but a bit below my long-term expectations. It was a very good year for equities, and my results can’t compete with SPY’s 5.12 MAR ratio, which makes me feel pretty bad. Of course I understand that years like this one don’t represent the long-term, but it’s annoying to get beaten by b&h nonetheless.

Some strategies did really well:

Risk was kept under control and entirely within my target range, both in terms of volatility and maximum drawdown. Even when I was at the year’s maximum drawdown I felt comfortable…there is still “psychological room” for more leverage. Daily returns were positively skewed. My biggest success was diversifying across strategies and asset classes. A year ago I was trading few instruments (almost exclusively US equity ETFs) with a limited number of strategies. Combine that with a pretty heavy equity tilt in the GTAA allocation, and my portfolio returns were moving almost in lockstep with the indices (there were very few shorting opportunities in this year’s environment, so the choice was almost always between being long or in cash). Widening my asset universe combined with research into new strategies made a gigantic difference:

I made a series of mistakes that significantly hurt my performance figures this year. Small mistakes pile on top of each other and in the end have a pretty large effect. All in all I lost several hundred bp on these screw-ups. Hopefully you can learn from my errors:

- Back in March I forgot that US daylight saving time kicks in earlier than it does here in Europe. I had positions to exit at the open, and I got there 45 minutes late. Naturally the market had moved against me.
- A bug in my software led to dividends being handled incorrectly, which led to signals being calculated on incorrect prices, which led to a long position when I should have taken a short. It taught me the importance of testing with extreme caution.
- Problems with reporting trade executions at an exchange led to an error where I sent the same order twice and it took me a few minutes to close out the position I had inadvertently created.
- I took delivery on some FX futures when I didn’t want to; it cost me commissions and spread to unwind the position.
- An order entry error: I sent a buy order when I was trying to sell. I caught it immediately, so the cost was only commissions + spread.
- And of course the biggest one: not following my systems to the letter. A combination of fear, cowardice, over-confidence in my discretion, and under-confidence in my modeling skills led to some instances where I didn’t take trades that I should have. This is the most shameful mistake of all because of its banality. I don’t plan on repeating it in 2014.

My goals for 2014:

- Beat my 2013 risk-adjusted returns.
- Don’t repeat any mistakes.
- Make new mistakes! But minimize their impact. Every error is a valuable learning experience.
- Continue on the same path in terms of research.
- Minimize model implementation risk through better unit testing.

Finally, the most popular posts of the year:

- The original IBS post. Read the paper instead.
- Doing the Jaffray Woodriff Thing. I still need to follow up on that…
- Mining for Three Day Candlestick Patterns, which also spawned a short series of posts.

I want to wish you all a happy and profitable 2014!

In this post I’ll do a short presentation of dynamic time warping, a method of measuring the similarity between time series. In part 2 we will look at a clustering method called K-medoids. Finally, in part 3 we will put the two together and generate charts similar to the alpha curves. The terminology might be a bit intimidating, but the ideas are fundamentally quite intuitive. As long as you can grasp the concepts, the implementation details are easy to figure out.

To be honest I’m not so sure about the practical value of this concept, and I have no clue how to quantify its performance. Still, it’s an interesting idea and the concepts that go into it are useful in other areas as well, so this is not an entirely pointless endeavor. My backtesting platform still can’t handle intraday data properly, so I’ll be using daily bars instead, but the ideas are the same no matter the frequency.

So, let’s begin with why we need DTW in the first place. What can it do that other measures of similarity, such as Euclidean distance and correlation, cannot? Starting with correlation: one must keep in mind that it measures co-movement around each series’ mean and ignores differences between the means themselves. Return series with significantly different means can be highly correlated, yet produce strikingly different price series. For example, the returns of these two series have a correlation of 0.81, despite the series being quite dissimilar.

A second issue comes up with series that are slightly out of phase: they can be very similar, yet have low correlations and high Euclidean distances. The returns of these two curves have a correlation of 0.14:

So, what is the solution to these issues? Dynamic Time Warping. The main idea behind DTW is to “warp” the time series so that the distance measurement between two points does not require both points to have the same x-axis value. Instead, points further away can be matched, so as to minimize the total distance between the series. The algorithm (the original 1978 paper by Sakoe & Chiba can be found here) requires the path to start at the first points and end at the last points of both series. From there, the matching of points can be visualized as a path on an n by m grid, where n and m are the number of points in each time series.

The algorithm finds the path through this grid that minimizes the total distance. The function that measures the distance between each pair of points can be anything we want. To limit the number of possible paths, we constrain which points can be connected: the path must be monotonically increasing, its slope is limited, and it cannot stray too far from the diagonal. The difference between standard Euclidean distance and DTW can be demonstrated graphically. In this case I use two sine curves. The gray lines between the series show which points the distance measurements are done between.

Notice the warping at the start and end of the series, and how the points in the middle have identical y-values, thus minimizing the total distance.
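To make the mechanics concrete, here’s a bare-bones sketch of the DTW distance calculation with a Sakoe-Chiba window constraint. It’s purely illustrative (absolute-difference point cost, no slope constraint); a proper implementation such as the NDTW library mentioned below handles the full set of constraints:

```csharp
using System;

static class DtwSketch
{
    // DTW distance between two series with a Sakoe-Chiba band of half-width 'window':
    // point i of 'a' may only be matched to points j of 'b' with |i - j| <= window.
    public static double Distance(double[] a, double[] b, int window)
    {
        int n = a.Length, m = b.Length;
        window = Math.Max(window, Math.Abs(n - m)); // the band must be wide enough to reach the last cell

        var d = new double[n + 1, m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                d[i, j] = double.PositiveInfinity;
        d[0, 0] = 0;

        for (int i = 1; i <= n; i++)
        {
            for (int j = Math.Max(1, i - window); j <= Math.Min(m, i + window); j++)
            {
                double cost = Math.Abs(a[i - 1] - b[j - 1]);
                // Monotonically increasing path: each cell is reached from the left, from below, or diagonally.
                d[i, j] = cost + Math.Min(d[i - 1, j - 1], Math.Min(d[i - 1, j], d[i, j - 1]));
            }
        }

        return d[n, m];
    }
}
```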

What are the practical applications of DTW in trading? As we’ll see in the next parts, it can be used to cluster time series. It can also be used to average time series, with the DBA algorithm. Another potential use is k-nn pattern matching strategies, which I have experimented with a bit…some quick tests showed small but persistent improvements in performance over Euclidean distance.

If you want to test it out yourselves, there are plenty of tools out there. I’m using the NDTW .NET library. There are libraries available for R and python as well.

QDMS uses a client/server model. The server acts as a broker between clients and external data sources. It also manages metadata on instruments and local storage of historical data. Finally, it provides a UI for managing the metadata and data, as well as importing/exporting data from and to CSV files.

Currently it supports two external data sources: Interactive Brokers and Yahoo, but I’ll be adding more in the future.

Note that it’s not “production-ready” right now. There are still a few bugs to iron out, and it also uses unstable 3rd party libraries (the alpha version of the MySQL .NET connector, because it’s the only one that supports Entity Framework 6). All the “core” functionality is implemented and functional, however.

You can find the code here: https://github.com/qusma/qdms. I’m releasing it under the permissive BSD License. Contributions by way of pull requests are more than welcome. If you’d like to make any feature requests or just flame me for the quality of my code, leave a comment right here.

I personally use it in my backtester and portfolio performance evaluation applications, and I’m in the process of integrating it with my live trading app as well.

Using the client is easy; let’s take a look at some code examples:

**Getting a list of all instruments:**

List<Instrument> instruments = client.FindInstruments();

**Searching for a specific instrument, for example SPY:**

Instrument spy = client.FindInstruments(new Instrument { Symbol = "SPY" }).FirstOrDefault();

**Requesting historical data:**

var histRequest = new HistoricalDataRequest(spy, BarSize.OneDay, new DateTime(2013, 1, 1), new DateTime(2013, 1, 15));
client.RequestHistoricalData(histRequest);

**Requesting real time data:**

var rtRequest = new RealTimeDataRequest(spy, BarSize.OneSecond);
client.RequestRealTimeData(rtRequest);


The main idea behind WVF is that it acts similarly to VIX in high-volatility situations, possibly enough to serve as an implied volatility substitute for instruments where such an index does not exist. It can be used to “confirm” signals based on implied volatility, or to replace them completely where no such index exists. First of all, let’s take a look at an updated VIX & SPY WVF chart:
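As a refresher, WVF is typically calculated as the percentage distance between the highest close of the past 22 days (the conventional lookback) and today’s low. A quick sketch:

```csharp
using System;
using System.Linq;

static class WvfSketch
{
    // Williams VIX Fix: percentage distance of today's low from the highest close
    // of the past 'lookback' days (22 by convention).
    public static double[] Calculate(double[] close, double[] low, int lookback = 22)
    {
        var wvf = new double[close.Length];
        for (int t = 0; t < close.Length; t++)
        {
            int start = Math.Max(0, t - lookback + 1);
            double highestClose = close.Skip(start).Take(t - start + 1).Max();
            wvf[t] = 100.0 * (highestClose - low[t]) / highestClose;
        }
        return wvf;
    }
}
```

High WVF readings correspond to the price sitting far below its recent highs, which is exactly when VIX tends to spike as well.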

The first post was a while ago, so let’s check how VIX and WVF have performed for SPY since then. The number of signals is very small, and WVF alone has underperformed compared to its historical results, but once again we see that the combination of VIX and WVF offered by far the best results:

Let’s take a look at how these signals work internationally:

It’s clear that using only VIX is pretty useless. Overall the returns are not significantly different from zero, and are even negative in many cases. Let’s check out WVF, which appears to work far better across most ETFs:

Finally, when VIX and WVF extreme movements coincide, the results look fantastic:

Note that even in cases where WVF alone did not show good results (the VT and WVF ETFs for example), combining VIX and WVF still results in great improvement. There is an important, general lesson here about using non-price data as trade set-ups. With few exceptions, implied volatility, breadth, seasonality, etc. need to be “confirmed” by price to actually be useful.

The QUSMA Data Management System (QDMS) works as a centralized data access point: it connects clients to external historical/real time data sources, manages metadata on instruments, and also provides local storage for historical data.

I was heavily influenced by the MultiCharts approach, though my own system is of course a bit less complex. I based a lot of the instrument metadata management as well as some of the UI design on the MC “QuoteManager” application as I think their approach is quite intuitive.

The system is designed in a modular fashion so it’s trivial to add additional data sources (as well as alternative local storage mechanisms…if I ever start storing tick data I will have to move away from my current relational database storage mechanism). The interfaces for writing external data source modules are very simple right now:

A couple screenshots of the server interface:

There’s also the client side of things, here’s the interface for selecting data series in the backtester:

The client/server approach lets multiple clients use the same data stream. For example, if computations are distributed over multiple boxes and each client needs access to the same real time data, only a single connection to the external data source is required: the data is then distributed by the broker to every client that has requested that stream.

There is also the ability to push data into the local storage. One possible use for this is saving results from a backtest, then using that equity curve as a benchmark in a performance evaluation application.

I’m probably going to open source this project eventually, but right now I’m using a couple of proprietary libraries that prevent me from distributing it. It’ll take a bit of work to “disentangle” those bits. In any case I’m striving to comment well and write in a good style so that opening up the code will be relatively painless.

I learned a ton writing the QDMS because it was an opportunity to use a bunch of interesting libraries and technologies that I had never touched before: ZeroMQ, Protocol Buffers, the Entity Framework, WPF, NLog, and Reactive Extensions. I was amazed at the performance of ZMQ: out of the box, in a simple test using a single socket and a single thread, it managed to transfer nearly 200 OHLC bars per millisecond.

There’s still a bit of work to be done: one major issue is that there is no way to construct lower-frequency bars from higher-frequency data (e.g. daily bars made from 1-minute data), and only time-based bars are possible. The biggest missing piece however is generating continuous futures data. It’s a much harder problem than it seems at first glance because it’s necessary to incorporate a great deal of flexibility both in terms of the futures expiration rules and the rollover rules.

I haven’t done any actual research in quite a while because I’ve been preoccupied with coding but I’ll be back soon! I’ve been accumulating a giant backlog of ideas that are waiting to be tested. Hopefully my new tools will be good enough to give some special insights. In any case, I can’t wait to get started.

There are of course some “standard” straightness metrics. R-squared is the most popular, and it works pretty well. I like to raise it to the 4th power or so in order to magnify small differences and make it a bit more “readable”. Another popular metric is the K-Ratio, of which there are at least 3 different versions floating around. The K-Ratio also takes returns into account, so it’s not purely a straightness measure. I prefer the Zephyr version, which is calculated by regressing the equity curve on time and dividing the slope by its standard error.
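A quick sketch of that calculation (whether to use the raw equity curve or its log is left open here):

```csharp
using System;
using System.Linq;

static class KRatioSketch
{
    // Zephyr-style K-Ratio: regress the equity curve on time and divide the
    // slope by the standard error of the slope estimate.
    public static double KRatio(double[] equity)
    {
        int n = equity.Length;
        double xMean = (n - 1) / 2.0, yMean = equity.Average();

        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++)
        {
            sxx += (i - xMean) * (i - xMean);
            sxy += (i - xMean) * (equity[i] - yMean);
        }
        double slope = sxy / sxx, intercept = yMean - slope * xMean;

        // Residual sum of squares and the standard error of the slope.
        double sse = 0;
        for (int i = 0; i < n; i++)
        {
            double resid = equity[i] - (intercept + slope * i);
            sse += resid * resid;
        }
        double slopeStdErr = Math.Sqrt(sse / (n - 2) / sxx);

        return slope / slopeStdErr;
    }
}
```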

Let’s see if we can construct some alternatives. To start out, we need a benchmark to measure straightness against: the “ideal line”, which is simply the straight line connecting the first and last points of the equity curve. The further away the equity curve is from the ideal, the less desirable it is.

There are certain obvious principles we can derive from this simple analysis:

- We want to minimize the area of deviation from the ideal line.
- The further away we are from the ideal, the worse (non-linearly).
- Being below the ideal is worse than being above it.
- Being below the ideal for long periods of time is undesirable.

We can easily quantify these ideas into a useful measure of equity curve straightness by using numbers such as the total area of deviation from the ideal, the volatility of the deviation, the length of time spent below the ideal, etc.

An interesting heuristic to look at is the number of times the equity curve crosses the ideal line. The closer the equity tracks the ideal line, the more times it will cross it. This metric fails in idealized tests, but works well in real-world scenarios. It also tends to fail when there are few trades in the sample. Divide the number by the total number of observations in the sample to standardize it.
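Here’s a sketch quantifying a few of the ideas above: the average absolute deviation from the ideal line, the fraction of time spent below it, and the standardized number of crossings. The exact choices are illustrative, and this is not the QECSDDSM formula described further down:

```csharp
using System;
using System.Linq;

static class StraightnessSketch
{
    // A few simple straightness measures taken against the "ideal line"
    // connecting the first and last points of the equity curve.
    public static (double avgAbsDev, double timeBelow, double crossesPerObs) Measure(double[] equity)
    {
        int n = equity.Length;
        double slope = (equity[n - 1] - equity[0]) / (n - 1);

        // Deviation of the curve from the ideal line at each point.
        var dev = new double[n];
        for (int i = 0; i < n; i++)
            dev[i] = equity[i] - (equity[0] + slope * i);

        // Average absolute deviation, standardized by the total change in equity.
        double avgAbsDev = dev.Select(d => Math.Abs(d)).Average() / Math.Abs(equity[n - 1] - equity[0]);

        // Fraction of time spent below the ideal line.
        double timeBelow = dev.Count(d => d < 0) / (double)n;

        // Number of times the curve crosses the ideal line, per observation.
        int crosses = 0;
        for (int i = 1; i < n; i++)
            if (dev[i] * dev[i - 1] < 0)
                crosses++;
        double crossesPerObs = crosses / (double)n;

        return (avgAbsDev, timeBelow, crossesPerObs);
    }
}
```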

A similar metric is the average drawdown length. Perhaps it is even more useful because presumably long deviations below the ideal are more important than deviations above it, and the number of crosses does not differentiate between the two.

Some other numbers I think may be interesting: the ratio between the area of difference above and below the ideal, the volatility of the difference, the volatility of the difference below the ideal, average absolute deviation, and average absolute deviation below the ideal (both standardized to the magnitude of the curve).

I created a metric that arbitrarily and haphazardly combines some of the above concepts, and I’m calling it the **Q**USMA **E**quity **C**urve **S**traightness, **D**ownward **D**eviation, and **S**tability **M**easure (QECSDDSM, pronounced /keɪks-du-sʌm/). It is intended purely as a straightness measure, and does not take into account returns or the slope of the equity curve. It is calculated as follows (see the Excel file at the bottom to make sense of it):

Let’s take a look at some extreme examples:

First of all, note that both the Sharpe ratio and the MAR ratio would select the “wrong” strategy if they were used naively: they both prefer Series 3 & 4 over Series 2. Both the K-Ratio and QECSDDSM correctly prefer the first two. Note that the number of crosses is a useless metric here because the most perfect line has very few of them, simply due to being “too straight”. The ratio of the areas above and below the idealized line is not very useful in these scenarios because they are so extreme.

In general most of the numbers roughly agree with each other in terms of ordering the curves from best to worst, so the actual formulation of QECSDDSM doesn’t really matter all that much.

Let’s look at a slightly more realistic assortment of equity curves:

In this case the intuition behind the number of crosses metric becomes obvious. Interestingly QECSDDSM is the only metric to prefer Series 4 to Series 3, which I think is undesirable. Series 4 highlights a problem with the metrics that measure volatility or focus on the area below the ideal: simply having very few trades “gets around” them and produces an overly-high score. Again the Sharpe and MAR ratios produce an “incorrect” ranking by preferring Series 3 to Series 2. The difference mainly comes from the fact that the curve is not very volatile and does not spend a lot of time below the ideal. Some fine tuning of the parameters should smooth things out pretty easily, though.

Another potentially interesting approach to the issue would be to do some sort of regime change detection on the returns (here’s one simple approach). A straight curve will obviously have fewer changes in the average of the returns.

Finally, here’s an excel file that you can play around with.

**Accord.NET**: Based on the AForge.NET library, it offers tons of useful stuff for traders: matrices, descriptive statistics, probability distributions, optimization methods, regression, PCA, as well as a wide array of machine learning algorithms. I use it all the time.

Tons of useful math and stats functions, matrices & linear algebra (very fast), PCA, regression, unsupervised machine learning.

Probability distributions & random number generation, linear algebra, simple statistical analysis, various useful math functions.

Various extremely useful data structures: double ended queue, dictionary with multiple values per key, red-black tree, ordered dictionary/list.

**QLNet**: A port of QuantLib to C#. Not just derivatives; there’s a lot of useful stuff in here, such as calendars with holidays for a very wide array of markets. There’s also NQuantLib, which I haven’t tried.

**NodaTime**: Time handling done right. You need this.

**R.NET**: Lets you use R from your .NET applications. Slow, buggy as hell, hard to work with, but sometimes it’s very useful to have access to some of the more obscure/specialized R libraries.

Regression, some machine learning, PCA, optimization, and linear algebra, simple hypothesis testing.

There are several commercial options available as well, such as Extreme Optimization and NMath. I haven’t used either of them so I can’t comment on their quality.

I investigate mean reversion in equity ETF prices at the daily frequency by employing a simple technical indicator, Internal Bar Strength (IBS). IBS is based on the position of the day’s close in relation to the day’s range. I use it to forecast close-to-close returns with statistically and economically significant results for most instruments. A simple strategy based on IBS generates an average alpha of over 30% p.a. before transaction costs. I show that equity index ETFs have had strong and consistent mean reverting tendencies since the 90s, and that these effects can be exploited as part of a profitable trading strategy. The IBS effect is stronger during times of high volatility, in bear markets, after high-range days, after high-volume days, and early in the week.

**Feedback is highly appreciated, either in the comments below or by email to qusmablog at gmail dot com.**

Some of the interesting things you’ll find within:

Update: the comparison chart for the Australian ETF now correctly uses EWA instead of EWO (the Austrian ETF).
