I’ve been thinking a lot about candlestick patterns lately but grew tired of trying to generate ideas and instead decided to mine for them. I must confess I didn’t expect much from such a simplistic approach, so I was pleasantly surprised to see it working well. Unfortunately I wasn’t able to discover any short set-ups. The general bias of equity markets toward the upside makes it difficult to find enough instances of patterns that are followed by negative returns.
The idea is to mine past data for similar 3 day patterns, and then use that information to make trading decisions. There are several choices we must make:
- The size of the lookback window. I use an expanding window that starts at 2000 days.
- Once we find similar patterns, how do we choose which ones to use?
- How do we measure the similarity between the patterns?
To fully describe a three day candlestick pattern we need 11 numbers. The close-to-close percentage change from day 1 to day 2, and from day 2 to day 3, as well as the positions of the open, high, and low relative to the close for each day.
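As a rough illustration (not the original implementation), the 11 numbers could be assembled as below. The post does not spell out how "relative to the close" is measured, so the ratio-minus-one convention and the function name `pattern_features` are assumptions made only for this sketch.

```python
def pattern_features(opens, highs, lows, closes):
    """Describe a 3-day pattern with 11 numbers (oldest day first)."""
    if not (len(opens) == len(highs) == len(lows) == len(closes) == 3):
        raise ValueError("expected exactly 3 days of OHLC data")
    # Two close-to-close percentage changes: day 1 -> 2 and day 2 -> 3.
    feats = [closes[i + 1] / closes[i] - 1.0 for i in range(2)]
    # Assumed convention: open, high, low as a fraction of the same day's close.
    for o, h, l, c in zip(opens, highs, lows, closes):
        feats.extend([o / c - 1.0, h / c - 1.0, l / c - 1.0])
    return feats


example = pattern_features(
    opens=[100.0, 101.0, 102.0],
    highs=[102.0, 103.0, 104.0],
    lows=[99.0, 100.0, 101.0],
    closes=[101.0, 102.0, 103.0],
)
print(len(example))  # 11
```

Other conventions (for example differences scaled by the day's range) would give a different but equally valid feature set.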
To measure the degree of similarity between any two 3-day patterns, I tried both the sum of absolute differences and the sum of the squared differences between those 11 numbers; the results were quite similar. It would be interesting to try to optimize individual weights for each number, as I imagine some are more important than others.
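The two distance measures are simple enough to sketch directly. The weighted variant is hypothetical: it only shows where per-number weights would enter if one wanted to optimize them, and no such weights were fitted here.

```python
def abs_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sq_distance(a, b):
    """Sum of squared differences between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_sq_distance(a, b, weights):
    """Squared differences with one (hypothetical) weight per feature."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


# Toy 3-number vectors; real patterns would have 11 numbers each.
p = [0.010, -0.005, 0.002]
q = [0.008, -0.001, 0.002]
print(abs_distance(p, q), sq_distance(p, q))
```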
The final step is to select a number of the closest patterns we find, and simply average their next-day returns to arrive at an expected return.
How do we choose which patterns are “close enough” to use? Choose too few and the sample will be too small. Choose too many and you risk using irrelevant data. That’s a number that we’ll have to optimize.
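Putting the pieces together, a minimal sketch of the nearest-pattern forecast with an expanding window might look like this. The function names and data layout are assumptions for illustration; a brute-force sort is used for clarity, not speed.

```python
def knn_forecast(history, query, k=50):
    """Average next-day return of the k historical patterns closest to query.

    history: list of (features, next_day_return) pairs fully known before
             the query date. query: features of the latest 3-day pattern.
    """
    if len(history) < k:
        raise ValueError("not enough history for the requested k")

    # Squared-difference distance; the absolute version is a drop-in swap.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(history, key=lambda item: dist(item[0], query))[:k]
    return sum(ret for _, ret in nearest) / k


def walk_forward(features, next_returns, k=50, min_history=2000):
    """Expanding-window forecasts: day t uses only patterns ending before t."""
    forecasts = {}
    for t in range(min_history, len(features)):
        # next_returns[i] is realised at the close of day i + 1, so only
        # patterns with i < t are usable when forecasting at the close of t.
        history = list(zip(features[:t], next_returns[:t]))
        forecasts[t] = knn_forecast(history, features[t], k)
    return forecasts


# Toy data with 1 feature instead of 11, purely for illustration.
toy_history = [([0.00], 0.001), ([0.01], 0.004), ([0.02], 0.002), ([0.10], -0.030)]
print(knn_forecast(toy_history, query=[0.012], k=2))
```

Here `k` is the number to optimize: a small `k` gives noisy averages, a large `k` pulls in patterns that are no longer similar.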
When comparing the results we also run into another problem: the smaller the sample, the more spread out the expected return estimates will be, which means more trades will be chosen given a certain minimum limit for entry. My solution was to choose a different limit for trade entry, such that all sample sizes would generate the same number of days in the market (300 in this case). Here are the walk-forward results:
The trade-off between sample size and relevance is clear, and the “sweet spot” appears to be somewhere in the 50-150 range or so, for both the absolute difference and squared difference approaches. Depending on how selective you want to be, you can decrease the limit and trade off more trades for lower expected returns. For me, 30 bp is a reasonable area to aim for.
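The "same number of days in the market" normalization can be sketched as picking the entry limit from the ranked forecasts. This is an assumed reading of the procedure: the helper name is hypothetical, and it uses `>=` at the cutoff so the count comes out exact when there are no ties.

```python
def entry_limit_for_target_days(forecasts, target_days):
    """Entry limit that puts the strategy in the market on exactly
    `target_days` days (assuming no ties at the cutoff)."""
    if not 0 < target_days <= len(forecasts):
        raise ValueError("target_days must be between 1 and len(forecasts)")
    ranked = sorted(forecasts, reverse=True)
    # Trade whenever forecast >= the target_days-th largest forecast.
    return ranked[target_days - 1]


forecasts = [0.0031, -0.0004, 0.0012, 0.0025, 0.0007, 0.0040]
limit = entry_limit_for_target_days(forecasts, target_days=3)
days_in_market = sum(f >= limit for f in forecasts)
print(limit, days_in_market)  # 0.0025 3
```

Running this once per sample size, with `target_days=300`, would make the resulting average returns comparable across sample sizes.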
A nice little addition is to use IBS by filtering out any trades with IBS > 50%. Using squared differences, I select the 50 closest patterns. When their average next-day return is greater than 0.2%, a long position is taken. The results are predictably great:
The IBS filter removes close to 40% of days in the market yet maintains essentially the same CAGR, while also more than halving the maximum drawdown.
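For reference, a small sketch of the combined entry rule. IBS is taken here in its usual form, (close - low) / (high - low), computed on the third candle; the handling of zero-range days and the function names are assumptions.

```python
def ibs(high, low, close):
    """Internal bar strength: where the close sits in the day's range (0 to 1)."""
    if high == low:
        return 0.5  # assumption: treat a zero-range day as neutral
    return (close - low) / (high - low)


def go_long(expected_return, high, low, close, limit=0.002, max_ibs=0.5):
    """Long only if the forecast beats the limit and day-3 IBS is low."""
    return expected_return > limit and ibs(high, low, close) < max_ibs


print(go_long(0.00307, high=152.0, low=150.0, close=150.5))  # True
print(go_long(0.00307, high=152.0, low=150.0, close=151.8))  # False
```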
Let’s take a look at some of the actual patterns. Using squared differences, the 50 closest patterns, and a 0.2% limit, the last successful trade was on February 26, 2013. The expected return on that day was 0.307%. Here’s what those 3 days looked like, as well as the 5 closest historical patterns:
As you can see below, even the 50th closest pattern seems to be, based on visual inspection, rather close. The “main idea” of the pattern seems to be there:
Here are the stats from a bunch of different equity index ETFs, using squared differences, the 50 closest patterns, a 0.2% expected return limit and the IBS < 0.5 filter.
The 0.2% limit seems to be too low for some of them, producing too many trades. Perhaps setting an appropriate limit per-instrument would be a good idea.
The obvious path forward is to also produce 2-day, 4-day, 5-day, etc. versions, perhaps with optimized distance weighting and some outlier filtering, and combine them all in a nice little ensemble to get your predictions out of. The implementation is left as an exercise for the reader.
Emil says:
I did some research with candlestick patterns and a genetic algorithm. I measured the normalized return, range, volume and close-to-range, then let the algorithm optimize what the best three-day patterns would look like. Apparently it was usually a variation of the IBS indicator. I think the problem with data mining price patterns is that you just rediscover some general market property, like mean reversion in this case. The trick is of course how to set up the search.
qusma says:
I didn’t even consider volume because I’ve found it to be rather useless in general, but I’ll definitely have a look at it.
Naturally, patterns only “work” because they are a recurring expression of certain underlying processes. I don’t see this as a problem. Mean reversion isn’t just a single property: compare, for example, IBS(1), DV(2), and RSI(3). These are all short-term mean-reversion indicators, but trades based on them will be far from 100% overlapping. Mean reversion happens both on different time-frames and in response to different “types” of price action.
Even if the pattern approach adds zero new trade opportunities because it simply repeats the capture of an underlying process that we have modeled through other means, it can still serve a useful confirmatory/filtering purpose. It would be interesting to compare, for example, the returns to a simple RSI strategy when it overlaps with a pattern signal and when it does not.
Joe says:
Quite interesting work, thanks! Are you familiar with the price action lab software, which finds up to 6-day price patterns? And after how many days (bars) do you think significance is lost?
qusma says:
Thanks!
I’ve heard about price action lab but I’ve never used it. Not sure what the upper limit for relevance is, either…I haven’t had the time to get up there yet.
DeanJ says:
Great work, enjoying the blog immensely. I had a couple of questions on this one:
From what I can make out the system takes a position at the end of the third day of each 3 day pattern. If this is the case then I’m not sure what this means:
“I select the 50 closest patterns. When their average next-day return is greater than 0.2%, a long position is taken”.
I’m assuming this means at the end of the third day a long is taken with 0.2% being the minimum return filter? I thought the 0.2% actually applied to the minimum 4th day return criteria?
The second part is the IBS. In all of the examples above it looks like the IBS is > 50% for the 3rd candle. So I’m assuming those are pre-IBS and the IBS < 50% filter is on the third candle?
Thanks
Dean
qusma says:
Hi Dean,
The minimum criteria apply to the expected return from the 3rd to the 4th day (and the expectation is found by simply averaging the next-day return after each of those 50 closest patterns).
You are of course right on the examples, those are pre-IBS. The filter is applied on the third candle.
k-NN Candlestick Pattern Search Extensions: More Data : QUSMA says:
[…] is a followup to the Mining for Three Day Candlestick Patterns post. If you haven’t read the original post, do so now because I’m not going to repeat […]
Mat says:
So how are you closing the trades? I can’t seem to find that. Once a pattern is matched and a trade is opened, when is it closed?
qusma says:
Hi Mat,
The forecast is up to the next close, so that’s when I exit. The only exception is if the next forecast also satisfies the criteria to go long, in which case the position is simply maintained.
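A small sketch of that exit rule, under the assumption that signals and returns are aligned so that `next_day_returns[i]` is the close-to-close return following the signal at day `i`. The helper names are hypothetical.

```python
def strategy_returns(signals, next_day_returns):
    """Close-to-close returns earned by being long after each signal.

    Consecutive signals simply keep the position open, so the daily
    return stream is the same as re-entering at every signalled close.
    """
    return [r if s else 0.0 for s, r in zip(signals, next_day_returns)]


def count_round_trips(signals):
    """Number of separate trades: a run of consecutive signals is one trade."""
    trades, in_position = 0, False
    for s in signals:
        if s and not in_position:
            trades += 1
        in_position = s
    return trades


signals = [False, True, True, False, True]
rets = [0.001, 0.004, -0.002, 0.003, 0.005]
print(strategy_returns(signals, rets))  # [0.0, 0.004, -0.002, 0.0, 0.005]
print(count_round_trips(signals))       # 2
```

Holding through consecutive signals does not change the return stream, but it does reduce the number of round trips and therefore transaction costs.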
Mat says:
Thanks for the response. Maybe you should try some sort of trailing stop or a pattern-based exit and see how they perform. I’d just like to add that your blog is excellent, and the other articles are awesome too. Has this performed better than simply using the IBS filter with optimised thresholds, i.e. buy when >x close / sell when <y, with x and y found from the data?
qusma says:
I don’t use stops in my trading so using a trailing stop would be out of character for me. I don’t even have the means of testing it right now because I haven’t implemented stop orders in my backtesting software yet.
Using patterns to close the trade would probably not be a great idea because they’re unreliable…you can go for quite a long time before seeing a significant pattern, especially one that would tell you to exit.
If I wanted to make the trades longer I’d probably use something like a cross above the 5 day SMA or a neutral RSI(3) reading to exit.
Looking at the 10 days after a pattern signal is triggered, the returns are _slightly_ above average, but only the first day’s returns are statistically significantly different from an average day.
That being said it is of course possible to modify the thing that we are forecasting. If we forecasted 10-day returns instead of 1-day returns things would be very different.
And yes, of course this is a gigantic improvement over simply using IBS. The pattern + IBS combo gives roughly 50% higher next-day returns than IBS alone, and this is across the board (i.e. even at very extreme IBS values).
Alex says:
Quite interesting post – thank you so much!
I am curious what you think about the following. I replicated your analysis with only SPY, using daily data from 1990 until now. I calculated the 11 numbers on each day using a discrete-type return formula, as you described above, then fed them to a kd-tree for k-NN searches of the 50 closest neighbours based on Euclidean distance. I used an expanding lookback window starting at 2000 as well as a fixed-size 2000 window. I did not use any IBS-like filters or fees. The final result is that the expected return is not really different from zero, in contrast to your results.
I tried several implementations of kd-tree algo as well as bd-tree, so the findings can not be attributed to k-nn search algo as far as I can see.
What do you think could be wrong? (Did I miss something?)
qusma says:
Hi Alex,
If you want to mail me your code I could have a look. I have posted my own code here: http://qusma.com/2013/06/08/blueprint-of-a-backtesting-and-trading-suite/, perhaps that can help you.
Alex says:
I looked through your code but was not able to find much difference, but I am quite new to C# and could have overlooked something. Below is what I did using R
qusma says:
Hi Alex, I ran your code and it seems to be working perfectly fine. The results are not identical to my implementation, but very close. If you look at instances where the forecasted return is >20bp for example, next-day returns are ~28bp (in my implementation 27bp for some reason).
Alex says:
Still can’t understand where you get the numbers =)
In terms of the R code above: mean(mrets_oos[which(mrets_ins>20/100/100)])*100 ~= 0.065%
or just using scatter plot
qusma says:
> mean(mrets_oos[which(mrets_ins>20/100/100)])
[1] 0.002773157
Not sure why you’re getting different results. What version of R/the packages are you using?
Alex says:
Quite honestly I have no idea what was wrong. I re-ran the program and got the same results as yours.
Thank you so much for your help!!
qusma says:
Good to hear.
By the way, I HIGHLY recommend using a “real” language for this sort of thing. You can learn the basics of C# in a week. For data-intensive applications like this one the performance gains are insane. My C# implementation is at least two orders of magnitude quicker. Only having to wait a second or two for the results makes research not only much easier, but more enjoyable as well.
Alex says:
I absolutely agree with you!
Every time I need a boost in performance I resort to parallelism or kind of approximations or Rcpp sometimes. For this particular example I get approximately one order of magnitude boost using 8 cores and approximate kd-tree instead of exact one. And so every time I can not convince myself to start learning these low-level languages.
Maybe you could recommend some book to start with for those who are not completely new? I heard as well that Python is much faster for data intensive tasks than R, so have no idea what to learn better.
qusma says:
I would recommend either C# In Depth or C# 4.0 The Complete Reference.
quantivity says:
Here is a parallelized R version, which will utilize all local cores:
library("foreach")
library("doParallel")  # provides registerDoParallel; attaches parallel (makeCluster, detectCores)
library("quantmod")
library("RANN")
…dataset setup…
# Start one worker per local core and load the required packages on each.
cl <- makeCluster(detectCores())
clusterCall(cl, function(packages) {
  for (p in packages) { library(p, character.only=TRUE) }; NULL
}, c("xts", "zoo", "RANN"))
registerDoParallel(cl)
# Each iteration handles one out-of-sample date; foreach returns a list.
mrets_oos <- foreach(indx=index(dataset)[lookback:nrow(dataset)]) %dopar% {
  …for-loop body…
  # return out-of-sample predictor value, or -1 when the forecast is below 20bp
  if (mean(tmp[whichones[-1],]$predictor) > 0.002) {
    tmp[whichones[1],]$predictor
  } else { -1 }
}
stopCluster(cl)
Vladimir says:
I may be missing something, but it looks like the R code has a look-ahead bias (the closing price of a bar is unknown until the bar closes but we use it in features). Also, maybe it’s better to replace the second parameter of nn2 with tmp[nrow(tmp), -1] to increase speed? Anyway, I may be missing something obvious, so please correct me if I’m wrong. Here’s my R code, which runs relatively fast.
quantivity says:
Vladimir: look-ahead can be avoided either via backward lagging features (as you did) or forward lagging predictor (as Alex did with lag(ROC(Ad(x)),k=-1)). Your suggestion for the second nn2(.) parameter optimization is valid, although the marginal benefit may be modest given that the majority of processing time appears to be consumed preprocessing the ANN search data structure.
Mat says:
Why only daily returns?
Why not get higher frequency data?
That will give you a bigger data set, plus it might give you some surprisingly good results
EURUSD like 10gbs of data is readily available online
Why not use that?
Again I like to say, apart from a bit of vagueness (your blog isn’t always that clear, maybe it’s intentional), this work and creativity is by far the best I have seen as far as quant blogs go. Would be awesome if you could give me your details.
qusma says:
I’m glad you like the blog, Mat. If I’m a bit unclear sometimes, I think it’s because what I’m trying to get across with most of my posts is not a specific trading recommendation (“when X > Y, buy!”) but how I arrived at a strategy, how to think about the market, how to generalize and quantify the things you observe, how to evaluate your ideas and refine them, etc.
As for the frequency, well, daily is the frequency I trade at. Daily is…easy.
Intraday the competition is intense, average returns are going to be smaller, execution is far more important, you don’t have the IBS “crutch”, etc.
I can foresee a lot of complications: what frequency do we use and why? Do we use a fixed-size pattern, or an expanding-size pattern starting at the open? Does time of day matter, and how do we account for it? Do we stick to a time-based exit or do we need targets and stops? Do we isolate the pattern or try to place it within the day’s total range? For the currencies, how do we account for the different dynamics of the Asian/European/American sessions?
I’m not saying that going to higher frequencies is a bad idea. But there are obstacles to overcome, obstacles that simply don’t exist when you’re trading the daily chart.
Send me a mail (qusmablog at gmail dot com) if you want to chat about it more.
Shaun Overton says:
you don’t have the IBS “crutch”
That’s not my experience at all. I grabbed the IBS idea and applied it to EURUSD H1 charts. It works incredibly well in currencies. The lower the time frame, the stronger the IBS effect.
Sam says:
This is really excellent work here. This is an entirely new field for me but I find it very interesting to think about.
My first question: When you are looking back for similar 3-day patterns, why are you using the closest K neighbors, regardless of their absolute distance? That is, the 50 most similar data points might not be that similar at all, yet you’re considering them just as strongly. Why not only look at data points that meet some threshold of similarity?
Second question: Have you thought about making the data points scale invariant? That is, you can normalize each 11-dimensional vector so that all that really matters are the dimensions relative to one another. An example with only 3 dimensions (relative lows for d1, d2, d3, for example): [2%, -1.5%, 3%] would be considered the same as [1%, -.75%, 1.5%]; after normalization to unit length they’d both be the same vector [.51, -.38, .77], and so would be 100% similar. Perhaps this ends up as a better way to compute similarity.
Third question: My mind is already racing with different possibilities. For example, of the set of data points which are similar to the last 3 days, one could do all sorts of analysis on the expected result besides merely take average. For example, maybe only look for result sets with a high average and very low standard deviation… or look for certain types of distributions which may increase your confidence that the next day will be positive. You could also add in a few more dimensions, like volume moving average, volatility, etc. How does one decide what to use and what not to use?
Lastly, I feel like an idiot about this, but I’m very confused about the first histogram which is plotting expected return vs. next-day return. Why would “expected return”, which is the average of all similar 3-day periods’ next-day returns, ever be different from “next-day return”? Is this due to the expanding window?
I thought the process works like this, please correct me if wrong: for each day, look at the last 3 days, and collect the 50 most similar last-3-day data points. Of that dataset, look at each 3 day period and compute the average return of the next day. This is your “expected return”. Now, see what actually happens the next day. This is your “next day return”. It seems like over time expected and next-day returns would converge…
Thanks again for the cool write-up and sorry if I’m asking newb questions.
qusma says:
The first question I answered on the other post.
#2: Yeah I’ve tried it and the results are not as good. I think the reason behind that is simply that scale does matter. Markets will behave differently after a 2% drop than they do after a 0.5% drop, even though the pattern may (de-scaled) look similar.
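To make the scale-invariance idea concrete, here is a minimal sketch of unit-length (Euclidean) normalization, which is one assumed reading of "normalize" in the question. It shows that a pattern and its half-sized twin collapse to the same vector, which is exactly the information about magnitude that gets thrown away.

```python
import math


def unit_vector(v):
    """Scale a vector to unit Euclidean length (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


big = unit_vector([0.02, -0.015, 0.03])
small = unit_vector([0.01, -0.0075, 0.015])
print([round(x, 2) for x in big])    # [0.51, -0.38, 0.77]
print([round(x, 2) for x in small])  # [0.51, -0.38, 0.77]
```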
#3: I did try doing some basic analysis, for example there is a relationship between the standard deviation of the returns and the accuracy of the forecast. However, it’s very weak and can safely be ignored. There’s some other stuff that could be fruitful to look at, such as the skewness of the returns in the set.
As for input selection, your imagination is the limit, there’s obviously tons and tons of stuff you can add to the mix. There aren’t really any hard and fast rules on selecting inputs, the marginal effect of adding an input on out of sample performance is obviously what’s important. Exactly how to measure that is up to you.
#4: That’s a great question, deceptively simple.. We would only expect them to converge if the knn forecast was an unbiased estimator of next-day returns. I don’t see any a-priori reason to believe that would be the case, and it’s obviously not borne out in the data. The explanation is that the estimate is biased toward the overall average return (which is something like 0.03%).
An intuitive way to conceptualize this is to think of the sample as being partially made up of the actual population we’re trying to measure, and partially out of “average” days, which are “incorrectly” included. Thus the estimate of the mean will always be biased toward the average daily return.
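One hedged way to write that intuition down: assume, purely for illustration, that a fraction $p$ of the selected patterns come from the "true" population with mean next-day return $\mu$, and the remaining $1-p$ are ordinary days with mean $\bar{r}$. Neither $p$ nor this clean two-group split is estimated anywhere in the post.

```latex
\mathbb{E}[\hat{\mu}] = p\,\mu + (1 - p)\,\bar{r}
\quad\Longrightarrow\quad
\mathbb{E}[\hat{\mu}] - \bar{r} = p\,(\mu - \bar{r})
```

Since $0 < p < 1$, the forecast's distance from the average daily return is a shrunken version of the true distance, so when $\mu > \bar{r}$ the realised next-day returns would be expected to exceed the forecast on average.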
2013: Lessons Learned and Revisiting Some Studies says:
[…] Mining for Three Day Candlestick Patterns, which also spawned a short series of posts. […]
shahyad says:
This methodology is excellent. How did you get these?
disq_alex says:
Hi, could you please say what you mean by “as well as the positions of the open, high, and low relative to the close for each day”?
Do you mean:
Open1/Close1 ?
Open1-Close1 ?
(Close1-Open1) / Open1 * 100 ?
… ?
Could you please say how do you measure the relative position?