Mining for Three Day Candlestick Patterns

I’ve been thinking a lot about candlestick patterns lately but grew tired of trying to generate ideas and instead decided to mine for them. I must confess I didn’t expect much from such a simplistic approach, so I was pleasantly surprised to see it working well. Unfortunately I wasn’t able to discover any short set-ups. The general bias of equity markets toward the upside makes it difficult to find enough instances of patterns that are followed by negative returns.

The idea is to mine past data for similar 3 day patterns, and then use that information to make trading decisions. There are several choices we must make:

  • The size of the lookback window. I use an expanding window that starts at 2000 days.
  • Once we find similar patterns, how do we choose which ones to use?
  • How do we measure the similarity between the patterns?

To fully describe a three day candlestick pattern we need 11 numbers. The close-to-close percentage change from day 1 to day 2, and from day 2 to day 3, as well as the positions of the open, high, and low relative to the close for each day.

To measure the degree of similarity between any two 3-day patterns, I tried both the sum of absolute differences and the sum of the squared differences between those 11 numbers; the results were quite similar. It would be interesting to try to optimize individual weights for each number, as I imagine some are more important than others.

The final step is to select a number of the closest patterns we find, and simply average their next-day returns to arrive at an expected return.

absolute difference 50 closest exp vs realized

Expected vs realized returns for SPY, 50 closest patterns by absolute difference. Numbers above the bars indicate the number of instances in each bucket.

How do we choose which patterns are “close enough” to use? Choose too few and the sample will be too small. Choose too many and you risk using irrelevant data. That’s a number that we’ll have to optimize.

histogram squared

Histogram of expected return estimates for different sample sizes.

When comparing the results we also run into another problem: the smaller the sample, the more spread out the expected return estimates will be, which means more trades will be chosen given a certain minimum limit for entry. My solution was to choose a different limit for trade entry, such that all sample sizes would generate the same number of days in the market (300 in this case). Here are the walk-forward results:

closest count tests

The trade-off between sample size and relevance is clear, and the “sweet spot” appears to be somewhere in the 50-150 range or so, for both the absolute difference and squared difference approaches. Depending on how selective you want to be, you can decrease the limit and trade off more trades for lower expected returns. For me, 30 bp is a reasonable area to aim for.

A nice little addition is to use IBS by filtering out any trades with IBS > 50%. Using squared differences, I select the 50 closest patterns. When their average next-day return is greater than 0.2%, a long position is taken. The results are predictably great:

equity curves with without IBS

squared 50 closest 0.2pct limit ibs filter results

The IBS filter removes close to 40% of days in the market yet maintains essentially the same CAGR, while also more than halving the maximum drawdown.

Let’s take a look at some of the actual patterns. Using squared differences, the 50 closest patterns, and a 0.2% limit, the last successful trade was on February 26, 2013. The expected return on that day was 0.307%. Here’s what those 3 days looked like, as well as the 5 closest historical patterns:

patterns

As you can see below, even the 50th closest pattern seems to be, based on visual inspection, rather close. The “main idea” of the pattern seems to be there:

patterns 50th closest

Here are the stats from a bunch of different equity index ETFs, using square differences, the 50 closest patterns, 0.2% expected return limit and the IBS < 0.5 filter.

ETFs square 50 closest 0.2 ibs filter results

The 0.2% limit seems to be too low for some of them, producing too many trades. Perhaps setting an appropriate limit per-instrument would be a good idea.

The obvious path forward is to also produce 2-day, 4-day, 5-day, etc. versions, perhaps with optimized distance weighting and some outlier filtering, and combine them all in a nice little ensemble to get your predictions out of. The implementation is left as an exercise for the reader.