k-NN Candlestick Pattern Search Extensions: Combining Forecasts

This is the second, and probably final, followup to the Mining for Three Day Candlestick Patterns post. Previously, we improved performance by adding more data to the search. In this post we’ll try to improve the system further by combining multiple predictors. The central question is how to combine the forecasts: I test averaging, weighted averaging, OLS and ridge regression, and a voting scheme, and compare them against a baseline one-predictor strategy.


Combining predictors is a standard tactic in machine learning, but the case of k-NN predictors is a bit of an outlier. Typical ensemble methods depend on generating variations in the data set in order to generate different and complementary predictors (as in the cases of boosting and bagging). This doesn’t work very well with nearest neighbor predictors, however, because they tend to be insensitive to variations in the data set. So what can we vary? The choice of k, the choice of inputs, the choice of distance measure for the nearest neighbors, and some pre-processing options such as whether to adjust for volatility or not.

I am not going to vary the inputs, as that’s reserved for a post of its own. The idea is pretty simple: it’s essentially a random forest with k-NN predictors instead of decision trees (here’s an interesting paper on it).

So we’re left with k, sum of absolute or sum of square distances, and volatility adjustment. I picked 10 combinations of these options:


The k values were picked at random and I’m sure it’s possible to do better by optimizing them using cross validation.

The signals obviously overlap significantly, and have similar stats when used one-by-one:


Long signal stats. Long position threshold: forecast > 5 basis points & IBS < 0.5.


Short signal stats. Short position threshold: forecast < -10 basis points & IBS > 0.5.

The instrument traded is SPY. Additional data is taken from the following instruments for the pattern search: EWY, EWD, EWC, EWQ, EWU, EWA, EWP, EWH, EWL, EFA, EPP, EWM, EWI, EWG, EWO, IWM, QQQ, EWS, EWT, and EWJ. The thresholds in each case are adjusted to result in a similar length of time spent in the market. Position sizing is done based on the 10-day realized volatility of SPY, as described in this post: leverage is equal to 20% divided by 10-day realized annualized standard deviation, with a maximum leverage of 200%. Finally, an IBS filter is applied that allows long positions only when IBS < 0.5 and short positions only when IBS > 0.5.
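The sizing rule above can be sketched in a few lines. This is only an illustration, not the code behind the backtests: the class and method names are made up, and the 252-day annualization factor and the use of the sample standard deviation are assumptions, since the text doesn't specify either.

```csharp
using System;
using System.Linq;

public static class PositionSizing
{
    // Assumption: 252 trading days per year for annualization.
    private const double TradingDaysPerYear = 252.0;

    /// <summary>
    /// Leverage = target volatility / realized annualized volatility, capped at maxLeverage.
    /// dailyReturns holds decimal returns (0.01 = 1%); only the last `window` values are used.
    /// </summary>
    public static double Leverage(double[] dailyReturns, int window = 10,
                                  double targetVol = 0.20, double maxLeverage = 2.0)
    {
        if (dailyReturns.Length < window)
            throw new ArgumentException("Not enough returns for the volatility window.");

        double[] recent = dailyReturns.Skip(dailyReturns.Length - window).ToArray();
        double mean = recent.Average();

        // Sample standard deviation (n - 1 in the denominator).
        double variance = recent.Sum(r => (r - mean) * (r - mean)) / (window - 1);
        double annualizedVol = Math.Sqrt(variance) * Math.Sqrt(TradingDaysPerYear);

        // A perfectly flat window has zero volatility; fall back to the cap.
        if (annualizedVol <= 0) return maxLeverage;

        return Math.Min(maxLeverage, targetVol / annualizedVol);
    }
}
```

With ten days alternating between +1% and -1%, the realized annualized volatility is a bit under 17%, so the rule sizes the position at roughly 1.2x.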

The baseline is the PF3 predictor: k = 75, square distance measure, no volatility adjustment. Here’s the equity curve:


PF3 predictor equity curve. $0.005 per share in commissions.



Averaging

The simplest approach is obviously to just average the 10 forecasts and then use the average value to generate trades. A long position is taken when the average forecast is greater than 15 basis points, and a short position when the average is smaller than -12.5 basis points. Here’s what the equity curve looks like:

Equity curve using average forecast. $0.005 per share in commissions.
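A minimal sketch of the averaging rule, combined with the IBS filter described earlier. The names are hypothetical, forecasts are assumed to be expressed as decimal returns (so 1 basis point = 0.0001), and the return convention (+1 long, -1 short, 0 flat) is my own.

```csharp
using System;
using System.Linq;

public static class AverageCombiner
{
    // Thresholds from the text, as decimal returns.
    public const double LongThreshold = 0.0015;    // 15 bp
    public const double ShortThreshold = -0.00125; // -12.5 bp

    /// <summary>
    /// Returns +1 (long), -1 (short) or 0 (flat) from the equal-weighted
    /// average of the individual forecasts, subject to the IBS filter.
    /// </summary>
    public static int Signal(double[] forecasts, double ibs)
    {
        double average = forecasts.Average();

        if (average > LongThreshold && ibs < 0.5) return 1;
        if (average < ShortThreshold && ibs > 0.5) return -1;
        return 0;
    }
}
```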


It’s interesting to note that the dispersion of forecasts is inversely related to the accuracy of the average: the smaller the standard deviation of the forecasts, the more accurate they are. Unfortunately the effect is marginal and thus not particularly useful for improving the strategy.


Weighted Averaging

A simple extension that generates slightly better stats is to weight each forecast before averaging. There’s a wide array of stats one can use here (Sharpe/Sortino/MAR ratios are obvious candidates); I picked the mean square error. The inverse of the MSE becomes the forecast’s weight, so that smaller errors result in greater weights. The same thresholds as above are used to generate signals. The weights provide a slight improvement over plain averaging in terms of both Sharpe and MAR ratios. The equity curve:


Equity curve using weighted average forecast, with weights equal to the inverse of the mean square error. $0.005 per share in commissions.
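Here is a rough sketch of inverse-MSE weighting. The names are hypothetical, and the weights are normalized to sum to 1, which is my assumption. The text doesn't say over which window the MSE was measured; to avoid look-ahead, the error history passed in should only contain days before the forecast being combined.

```csharp
using System;
using System.Linq;

public static class InverseMseCombiner
{
    /// <summary>
    /// Weights proportional to 1 / MSE, normalized to sum to 1.
    /// pastForecasts[p][t] is predictor p's forecast for day t,
    /// realized[t] is the return that actually occurred on day t.
    /// Note: a predictor with an MSE of exactly zero would produce a non-finite weight.
    /// </summary>
    public static double[] Weights(double[][] pastForecasts, double[] realized)
    {
        double[] inverseMse = pastForecasts
            .Select(f => 1.0 / f.Zip(realized, (fc, r) => (fc - r) * (fc - r)).Average())
            .ToArray();

        double total = inverseMse.Sum();
        return inverseMse.Select(w => w / total).ToArray();
    }

    /// <summary>Weighted average of today's forecasts.</summary>
    public static double Combine(double[] forecasts, double[] weights)
    {
        return forecasts.Zip(weights, (f, w) => f * w).Sum();
    }
}
```

The combined forecast is then compared against the same long and short thresholds as the plain average.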



Voting

Each predictor casts a vote based on its own forecast: above 5 basis points counts as a “long” vote, and below -10 basis points as a “short” vote. The overlap between the votes is significant, between 88% and 97% for different estimators. How many votes should we require for a trade? It quickly becomes obvious that simple majority voting isn’t enough, as only near-unanimous decisions provide worthwhile predictions. The average next-day return when there are between 1 and 8 long votes is 0.4 basis points. The average return after 9 or 10 long votes is 23 basis points.

The resulting equity curve looks like this:


Equity curve using voting system. 9 or more votes required to take a position. $0.005 per share in commissions.
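The voting rule is simple enough to sketch directly. As before, names are hypothetical, forecasts are assumed to be decimal returns, and the IBS filter is applied on top of the vote count.

```csharp
using System;
using System.Linq;

public static class VotingCombiner
{
    // Per-predictor vote thresholds from the text, as decimal returns.
    public const double LongVoteThreshold = 0.0005;  // 5 bp
    public const double ShortVoteThreshold = -0.001; // -10 bp

    /// <summary>
    /// Returns +1 (long), -1 (short) or 0 (flat). A position requires at least
    /// requiredVotes predictors to agree, plus the IBS filter.
    /// </summary>
    public static int Signal(double[] forecasts, double ibs, int requiredVotes = 9)
    {
        int longVotes = forecasts.Count(f => f > LongVoteThreshold);
        int shortVotes = forecasts.Count(f => f < ShortVoteThreshold);

        if (longVotes >= requiredVotes && ibs < 0.5) return 1;
        if (shortVotes >= requiredVotes && ibs > 0.5) return -1;
        return 0;
    }
}
```

Note that one strongly bullish predictor can't drag the others along the way it can with averaging: every vote counts the same regardless of the size of the forecast.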


Ordinary Least Squares

It’s also possible to combine the forecasts using regression, with next-day returns as the dependent variable and the k-NN predictor forecasts as the independent ones.

The distribution of forecasts with OLS is very tightly clustered around 0, and for some reason higher forecasts are not associated with higher next-day returns (as they are for the 3 methods above). I don’t really understand why this is the case. The thresholds for trades are 0.5 basis points for a long trade, and -0.5 basis points for a short trade.

An issue here is, of course, multicollinearity due to the similarity of the independent variables. This can lead to, among other problems, overfitting (usually characterized by very large absolute values of the coefficients). Ridge regression mitigates this by penalizing large coefficients, which shrinks them toward zero.

A potentially interesting idea would be to constrain the coefficients to positive values, which might lessen the overfitting effects and also make much more sense on an intuitive level (after all, we know all the forecasts are similarly accurate, so negative coefficients don’t make much sense).


Equity curve using OLS regression. $0.005 per share in commissions.


Ridge Regression

If multicollinearity is a significant problem, we can use ridge regression to mitigate it. It offers a significant improvement over the OLS approach, but it still fares badly compared to the one-predictor case. The same thresholds as in the OLS approach are used. Here’s the equity curve:


Equity curve using ridge regression. $0.005 per share in commissions.
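For reference, ridge regression has a closed-form solution, beta = (X'X + lambda * I)^-1 X'y, which reduces to OLS when lambda is 0. Below is a minimal dependency-free sketch that solves it by Gaussian elimination. The names are hypothetical, and it assumes no intercept and no standardization of the forecasts; the text doesn't say how the regressions were actually set up or which penalty was used.

```csharp
using System;

public static class RidgeRegression
{
    /// <summary>
    /// Solves (X'X + lambda * I) beta = X'y. x[t] is the row of forecasts on day t,
    /// y[t] is the next-day return. lambda = 0 gives plain OLS.
    /// </summary>
    public static double[] Fit(double[][] x, double[] y, double lambda)
    {
        int n = x.Length, p = x[0].Length;

        // Augmented matrix [A | b] with A = X'X + lambda * I and b = X'y.
        var a = new double[p, p + 1];
        for (int i = 0; i < p; i++)
        {
            for (int j = 0; j < p; j++)
            {
                double sum = 0;
                for (int t = 0; t < n; t++) sum += x[t][i] * x[t][j];
                a[i, j] = sum + (i == j ? lambda : 0.0);
            }
            double xy = 0;
            for (int t = 0; t < n; t++) xy += x[t][i] * y[t];
            a[i, p] = xy;
        }

        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < p; col++)
        {
            int pivot = col;
            for (int row = col + 1; row < p; row++)
                if (Math.Abs(a[row, col]) > Math.Abs(a[pivot, col])) pivot = row;

            if (Math.Abs(a[pivot, col]) < 1e-12)
                throw new InvalidOperationException("Singular system; try a larger lambda.");

            for (int k = 0; k <= p; k++)
            {
                double tmp = a[col, k];
                a[col, k] = a[pivot, k];
                a[pivot, k] = tmp;
            }

            for (int row = col + 1; row < p; row++)
            {
                double factor = a[row, col] / a[col, col];
                for (int k = col; k <= p; k++) a[row, k] -= factor * a[col, k];
            }
        }

        // Back substitution.
        var beta = new double[p];
        for (int i = p - 1; i >= 0; i--)
        {
            double sum = a[i, p];
            for (int j = i + 1; j < p; j++) sum -= a[i, j] * beta[j];
            beta[i] = sum / a[i, i];
        }
        return beta;
    }
}
```

The extreme case shows why the penalty matters here: with two identical predictors, OLS (lambda = 0) has no unique solution and this sketch throws, while any positive lambda splits the weight evenly between the two.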



Here are the stats for the single-predictor base case and all the combination methods:


All of them other than the voting scheme failed horribly. I’m not sure why, but it’s good to know. The improvement provided by the voting system is sizable, however. Not only does the voting-based strategy achieve significantly higher risk-adjusted returns, it does so while spending 15% less time in the market. Those results are also easy to improve on by simply adding more predictors. The marginal gain from each new predictor will be diminishing, but there is definitely more value to wring out of it. And this is just with 3-day patterns: we can easily add 2 and 4 day patterns into the mix as well.

Other Possibilities

A wide array of machine learning methods can be used to combine predictions. Especially if the number of forecasts grew larger, techniques such as random forests or ANNs would be interesting to investigate. As long as simpler methods work very well I think there is little reason to increase the complexity (not to mention the opaqueness) of the strategy.

k-NN Candlestick Pattern Search Extensions: More Data

This is a followup to the Mining for Three Day Candlestick Patterns post. If you haven’t read the original post, do so now because I’m not going to repeat the basic mechanics of the strategy. While the approach was somewhat fruitful, it also had some obvious problems: it only seems to work in bearish or high volatility market regimes, and it couldn’t produce good short signals. The main idea I had to resolve these issues was simply to get more data.


Original strategy using only SPY data. Note long stretches of flat results.

That is easier said than done. Could we use mutual funds or index values to extend the dataset backwards? No, because the daily high/low values are inaccurate. The only alternative we are left with is using data from other instruments. So I picked a broad selection of equity ETFs to include: EWY, EWD, EWC, EWQ, EWU, EWA, EWP, EWH, EWL, EFA, EPP, EWM, EWI, EWG, EWO, IWM, QQQ, EWS, EWT, and EWJ.

The selection was comprehensive and unoptimized. I think you could do some sort of walk-forward optimization that picks the best combination of securities to include in the data set. I’m not sure how much that would help.

The additional data worked fantastically well, resolving both problems. The number of opportunities to trade increased significantly, long signals work very nicely under all market conditions, and predicting negative returns works far better. There was also an unexpected benefit: far less time is needed before the forecasts become usable. In the original implementation I waited 2000 days before starting to use the forecasts. With the extended data set this can be cut to 500, thus letting the backtest cover a longer period.

Performance-wise there were no problems, as the Accord.NET k-d tree implementation that I use is very quick. Finding the nearest 75 points in a data set of approximately 100,000, in 11 dimensions, takes less than 2 milliseconds on my overclocked 2500K.

The settings used in the search are simple: the length of the patterns is 3 days, the 75 closest ones are used to construct a forecast by averaging their next-day returns, and distance is calculated as the sum of squared distances in every dimension. Trades are taken when the forecast is above/below a certain threshold. They are then passed through a filter which only allows long positions when IBS < 0.5 and short positions only when IBS > 0.5.
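A small sketch of the IBS filter step. IBS (internal bar strength) is taken here in its usual form, (Close - Low) / (High - Low). The names are hypothetical, and treating a zero-range bar as neutral (0.5) is my assumption.

```csharp
using System;

public static class Ibs
{
    /// <summary>Internal bar strength: where the close sits within the day's range, from 0 to 1.</summary>
    public static double Compute(double high, double low, double close)
    {
        double range = high - low;
        // Assumption: a bar with no range is treated as neutral.
        return range > 0 ? (close - low) / range : 0.5;
    }

    /// <summary>
    /// Passes a raw signal (+1 long, -1 short, 0 flat) through the filter:
    /// longs only when IBS is below 0.5, shorts only when IBS is above 0.5.
    /// </summary>
    public static int Filter(int rawSignal, double ibs)
    {
        if (rawSignal > 0 && ibs < 0.5) return 1;
        if (rawSignal < 0 && ibs > 0.5) return -1;
        return 0;
    }
}
```

In other words, the pattern forecast decides the direction, and the filter only lets a long through after a close in the bottom half of the day's range and a short after a close in the top half.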

It should be noted that using traditional measures of “fit” does not work very well with pattern matching. Adding the above instruments actually increases the RMSE, despite significantly increasing the trading performance of the forecasts.

A look at forecasts vs realized next-day returns:

PatternFinderMultiInput (x-axis) vs next-day returns (y-axis), when IBS < 0.5

PatternFinderMultiInput (x-axis) vs next-day returns (y-axis), for IBS < 0.5 and forecast > 0

An important aspect to note is that even marginally positive forecasts work very well. For example, with the extended dataset, forecasts between 5 and 10 basis points resulted in an average 21 bp return the next day. On the other hand, using SPY data only, the return for those forecasts was just 5 basis points. What this means is that there are many more trades to take, which is what allows the strategy to do well in all market environments. Here’s the long-only equity curve:


Long position taken when IBS < 0.5 and forecast > 5 basis points. $0.005 per share in commissions.
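The forecast-bucket comparison above (e.g. the average next-day return for forecasts between 5 and 10 basis points) comes down to a simple conditional average. A sketch with hypothetical names, assuming the bucket includes its lower bound and excludes its upper bound:

```csharp
using System;
using System.Linq;

public static class ForecastBuckets
{
    /// <summary>
    /// Average realized next-day return over the days where the forecast fell in [lower, upper).
    /// Returns NaN if no forecast falls inside the bucket.
    /// </summary>
    public static double AverageReturn(double[] forecasts, double[] nextDayReturns,
                                       double lower, double upper)
    {
        double[] selected = forecasts
            .Zip(nextDayReturns, (f, r) => new { Forecast = f, Return = r })
            .Where(p => p.Forecast >= lower && p.Forecast < upper)
            .Select(p => p.Return)
            .ToArray();

        return selected.Length > 0 ? selected.Average() : double.NaN;
    }
}
```

Running this over a grid of buckets is a quick way to check whether higher forecasts actually go with higher realized returns.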


A couple of charts to analyze the sensitivity of the long-only strategy’s results to changes in inputs (IBS limit and minimum forecast limit):

sensitivity analysis

The additional data also has the benefit of making shorting possible. The equity curve doesn’t look as good, but it’s still a giant improvement over zero predictive ability on the short side:


Short position taken when IBS > 0.5 and forecast < -20 basis points. $0.005 per share in commissions.


Finally, the long and short strategies combined, along with the stats:


Long and short strategies above combined. $0.005 per share in commissions.



The concept also seems to work for stocks. For example, I tested a long-only strategy on AAPL, using the same settings as above, both with and without the addition of MSFT data. The Microsoft data improved every aspect of the results, with surprisingly consistent performance over nearly 20 years:


It would be interesting to try to apply this on a more massive scale, by increasing the data set to something like all S&P 500 stocks. Some technical restrictions prevent me from doing that right now, but I’ll come back to the idea in the future.

Blueprint for a Backtesting and Trading Software Suite

Posting has been slow lately because I’ve been busy with a bunch of other stuff, including the CFA Level 3 exam last weekend. I’ve also begun work on a very ambitious project: a fully-featured all-in-one backtesting and live trading suite, which is what prompted this post.

Over the last half year or so I’ve been moving toward more complex tools (away from Excel, R, and MATLAB), and generally just writing standalone backtesters in C# for every concept I wanted to try out, only using Multicharts for the simplest ideas. This approach is, of course, incredibly inefficient, but the software packages available to “retail” traders are notoriously horrible, and I have nowhere near the capital I’d need to afford “real” tools like QuantFACTORY or Deltix.

The good thing about knowing how to code is that if a tool doesn’t exist you can just write it, and that’s exactly what I’m doing. Proper portfolio-level backtesting and live trading that’ll be able to easily do everything from intraday pairs trading to long term asset allocation and everything in-between, all under the same roof. On the other hand it’s also tailored to my own needs, and as such contains no plans for things like handling fundamental data. Most importantly it’s my dream research platform that’ll let me go from idea, to robust testing & optimization, to implementation very quickly. Here’s what the basic design looks like:


What’s the point of posting about it? I know there are many other people out there facing the same issues I am, so hopefully I can provide some inspiration and ideas on how to solve them. Maybe it’ll prompt some discussion and idea-bouncing, or perhaps even collaboration.

Most of the essential stuff has already been laid down, so basic testing is already possible. A simple example based on my previous post can showcase some essential features. Below you’ll find the code behind the PatternFinder indicator, which uses the Accord.NET library’s k-d tree and k nearest neighbor algorithm implementation to do candlestick pattern searches as discussed here. Many elements are specific to my system, but the core functionality is trivially portable if you want to borrow it.

Note the use of attributes to denote properties as inputs, and set their default values. Options can be serialized/deserialized for easy storage in files or a database. Priority settings allow the user to specify the order of execution, which can be very important in some cases. Indexer access works with [0] being the current bar, [1] being the previous bar, etc. Different methods for historical and real time bars allow for a ton of optimization to speed up processing when time is scarce, though in this case there isn’t much that can be done.
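To illustrate the indexer convention only, here is a toy series class where [0] is the current bar and [1] the previous one. It is a hypothetical stand-in and not the platform's VariableSeries, which does considerably more (synchronization, default values and so on).

```csharp
using System;
using System.Collections.Generic;

public class BarSeries<T>
{
    private readonly List<T> _values = new List<T>();

    public int Count
    {
        get { return _values.Count; }
    }

    /// <summary>Appends the value for a new bar, which becomes index [0].</summary>
    public void Add(T value)
    {
        _values.Add(value);
    }

    /// <summary>[0] is the most recent bar, [1] the one before it, and so on.</summary>
    public T this[int barsAgo]
    {
        get
        {
            // Negative indices would mean peeking at bars that haven't happened yet.
            if (barsAgo < 0 || barsAgo >= _values.Count)
                throw new ArgumentOutOfRangeException("barsAgo");
            return _values[_values.Count - 1 - barsAgo];
        }
    }
}
```

Because the index counts bars back from the present, code written against it has no way to reference a future bar by accident.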


The VariableSeries class is designed to hold time series, synchronize them across the entire parent object, prevent data snooping, etc. The Indicator and Signal classes are all derived from VariableSeries, which is the basis for the system’s modularity. For example, in the PatternFinder indicator, OHLC inputs can be modified by the user through the UI, e.g. to make use of the values of an indicator rather than the instrument data.


The backtesting analysis stuff is still in its early stages, but again the foundations have been laid. Here are some stats using a two-day PatternFinder combined with IBS, applied on SPY:


Here’s the first iteration of the signal analysis interface. I have added 3 more signals to the backtest: going long for 1 day at every 15 day low close, the set-up Rob Hanna posted yesterday over at Quantifiable Edges (staying in for 5 days after the set-up appears), and UDIDSRI. The idea is to be able to easily spot redundant set-ups, find synergies or anti-synergies between signals, and easily get an idea of the marginal value added by any one particular signal.
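One simple way to quantify redundancy between two set-ups is the share of days on which both are active, out of the days on which at least one is. This particular overlap measure is my assumption for the sketch, not necessarily what the interface computes, and the names are hypothetical.

```csharp
using System;

public static class SignalOverlap
{
    /// <summary>
    /// Days on which both signals are active divided by days on which either is active.
    /// 1.0 means the signals always fire together; 0.0 means they never do.
    /// </summary>
    public static double Overlap(bool[] a, bool[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Signals must cover the same days.");

        int both = 0, either = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] && b[i]) both++;
            if (a[i] || b[i]) either++;
        }
        return either == 0 ? 0.0 : (double)both / either;
    }
}
```

A pair of signals with an overlap close to 1 adds little diversification, no matter how good each one looks in isolation.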


And here’s some basic Monte Carlo simulation stuff, with confidence intervals for cumulative returns and PDF/CDF of the maximum drawdown distribution:
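A bare-bones sketch of how a maximum drawdown distribution can be simulated. It resamples daily returns independently with replacement, which is an assumption on my part (it ignores autocorrelation and volatility clustering); the names are hypothetical and this is not the platform's implementation.

```csharp
using System;

public static class MonteCarlo
{
    /// <summary>Largest peak-to-trough loss of a compounded equity curve, as a fraction (0.25 = 25%).</summary>
    public static double MaxDrawdown(double[] returns)
    {
        double equity = 1.0, peak = 1.0, maxDrawdown = 0.0;
        foreach (double r in returns)
        {
            equity *= 1.0 + r;
            peak = Math.Max(peak, equity);
            maxDrawdown = Math.Max(maxDrawdown, 1.0 - equity / peak);
        }
        return maxDrawdown;
    }

    /// <summary>
    /// Builds `paths` resampled return sequences (i.i.d. bootstrap) and returns
    /// their maximum drawdowns sorted in ascending order.
    /// </summary>
    public static double[] SimulateMaxDrawdowns(double[] returns, int paths, int seed)
    {
        var rng = new Random(seed);
        var result = new double[paths];
        var path = new double[returns.Length];

        for (int p = 0; p < paths; p++)
        {
            for (int t = 0; t < path.Length; t++)
                path[t] = returns[rng.Next(returns.Length)];
            result[p] = MaxDrawdown(path);
        }

        Array.Sort(result);
        return result;
    }

    /// <summary>Nearest-rank percentile of an ascending-sorted array, with q between 0 and 1.</summary>
    public static double Percentile(double[] sorted, double q)
    {
        int index = (int)Math.Round(q * (sorted.Length - 1));
        return sorted[index];
    }
}
```

Calling Percentile on the sorted output at, say, 0.5 and 0.95 gives the median and a pessimistic estimate of the maximum drawdown; the same resampled paths can be compounded to get confidence bands for cumulative returns.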


Here’s the code for the PatternFinder indicator. Obviously it’s written for my platform, but it should be easily portable. The “meat” is all in CalcHistorical() and GetExpectancy().

using System;
using System.Linq;
// KDTree<T>. Assumption: this is the namespace in Accord.NET 2.x; adjust for your version.
using Accord.MachineLearning.Structures;

/// <summary>
/// K nearest neighbor search for candlestick patterns
/// </summary>
public class PatternFinder : Indicator
{
    public int PatternLength { get; set; }
    public int MatchCount { get; set; }
    public int MinimumWindowSize { get; set; }
    public bool VolatilityAdjusted { get; set; }
    public bool Overnight { get; set; }
    public bool WeighExpectancyByDistance { get; set; }
    public bool Classification { get; set; }
    public double ClassificationLimit { get; set; }
    public string DistanceType { get; set; }

    public VariableSeries<decimal> Open { get; set; }
    public VariableSeries<decimal> High { get; set; }
    public VariableSeries<decimal> Low { get; set; }
    public VariableSeries<decimal> Close { get; set; }
    public VariableSeries<decimal> AdjClose { get; set; }

    private VariableSeries<double> returns;
    private VariableSeries<double> stDev;
    private KDTree<double> _tree;

    public PatternFinder(QSwing parent, string name = "PatternFinder", int BarsCount = 1000)
        : base(parent, name, BarsCount)
    {
        Priority = 1;
        returns = new VariableSeries<double>(parent, BarsCount);
        // Defaults to 1 so that dividing by stDev is a no-op without volatility adjustment.
        stDev = new VariableSeries<double>(parent, BarsCount) { DefaultValue = 1 };
    }

    internal override void Startup()
    {
        // 4 ratios per bar (O/C, H/C, L/C, C/C); the oldest bar has no C/C ratio.
        _tree = new KDTree<double>(PatternLength * 4 - 1);
        switch (DistanceType)
        {
            case "Euclidean":
                _tree.Distance = Accord.Math.Distance.Euclidean;
                break;
            case "Absolute":
                _tree.Distance = AbsDistance;
                break;
            case "Chebyshev":
                _tree.Distance = Accord.Math.Distance.Chebyshev;
                break;
            default:
                _tree.Distance = Accord.Math.Distance.Euclidean;
                break;
        }
    }

    public override void CalcHistorical()
    {
        if (VolatilityAdjusted && CurrentBar > 0)
            returns.Value = (double)(AdjClose[0] / AdjClose[1] - 1);

        if (VolatilityAdjusted && CurrentBar > 11)
            stDev.Value = returns.StandardDeviation(10);

        if (CurrentBar < PatternLength + 1) return;

        // Forecast for the pattern ending at the current bar.
        if (CurrentBar > MinimumWindowSize)
            Value = GetExpectancy(GetCoords());

        // Today's return is the outcome of the pattern that ended on the previous bar.
        double ret = Overnight ? (double)(Open[0] / Close[1] - 1) : (double)(AdjClose[0] / AdjClose[1] - 1);
        double adjret = ret / stDev[0];

        if (Classification)
            _tree.Add(GetCoords(1), adjret > ClassificationLimit ? 1 : 0);
        else
            _tree.Add(GetCoords(1), adjret);
    }

    public override void CalcRealTime()
    {
        if (VolatilityAdjusted && CurrentBar > 0)
            returns.Value = (double)(AdjClose[0] / AdjClose[1] - 1);

        if (VolatilityAdjusted && CurrentBar > 11)
            stDev.Value = returns.StandardDeviation(10);

        if (CurrentBar > MinimumWindowSize)
            Value = GetExpectancy(GetCoords());
    }

    private double GetExpectancy(double[] coords)
    {
        if (!WeighExpectancyByDistance)
        {
            // Plain average of the neighbors' next-day returns, scaled back by current volatility.
            return _tree.Nearest(coords, MatchCount).Average(x => x.Node.Value) * stDev[0];
        }
        else
        {
            // Inverse squared distance weighting. Note: an exact match (distance 0)
            // makes the weights non-finite.
            var nodes = _tree.Nearest(coords, MatchCount);
            double totweight = nodes.Sum(x => 1 / Math.Pow(x.Distance, 2));
            return nodes.Sum(x => x.Node.Value * ((1 / Math.Pow(x.Distance, 2)) / totweight)) * stDev[0];
        }
    }

    // Sum of absolute differences in every dimension.
    private static double AbsDistance(double[] x, double[] y)
    {
        return x.Select((t, i) => Math.Abs(t - y[i])).Sum();
    }

    // Pattern coordinates, starting `offset` bars back from the current bar.
    private double[] GetCoords(int offset = 0)
    {
        double[] coords = new double[PatternLength * 4 - 1];
        for (int i = 0; i < PatternLength; i++)
        {
            coords[4 * i] = (double)(Open[i + offset] / Close[i + offset]);
            coords[4 * i + 1] = (double)(High[i + offset] / Close[i + offset]);
            coords[4 * i + 2] = (double)(Low[i + offset] / Close[i + offset]);

            if (i < PatternLength - 1)
                coords[4 * i + 3] = (double)(Close[i + offset] / Close[i + 1 + offset]);
        }
        return coords;
    }
}

Coming up Soon™: a series of posts on cross validation, an in-depth paper on IBS, and possibly a theory-heavy paper on the low volatility effect.