Blueprint for a Backtesting and Trading Software Suite

Posting has been slow lately because I’ve been busy with a bunch of other stuff, including the CFA Level 3 exam last weekend. I’ve also begun work on a very ambitious project: a fully-featured all-in-one backtesting and live trading suite, which is what prompted this post.

Over the last half year or so I’ve been moving toward more complex tools (away from Excel, R, and MATLAB), generally writing standalone backtesters in C# for every concept I wanted to try out, and only using MultiCharts for the simplest ideas. This approach is, of course, incredibly inefficient, but the software packages available to “retail” traders are notoriously horrible, and I have nowhere near the capital I’d need to afford “real” tools like QuantFACTORY or Deltix.

The good thing about knowing how to code is that if a tool doesn’t exist, you can just write it, and that’s exactly what I’m doing: proper portfolio-level backtesting and live trading that will easily handle everything from intraday pairs trading to long-term asset allocation and everything in between, all under the same roof. On the other hand, it’s also tailored to my own needs, and as such contains no plans for things like handling fundamental data. Most importantly, it’s my dream research platform: it’ll let me go from idea, to robust testing & optimization, to implementation very quickly. Here’s what the basic design looks like:

[Diagram: UML overview of the suite’s design]
What’s the point of posting about it? I know there are many other people out there facing the same issues I am, so hopefully I can provide some inspiration and ideas on how to solve them. Maybe it’ll prompt some discussion and idea-bouncing, or perhaps even collaboration.

Most of the essential stuff has already been laid down, so basic testing is already possible. A simple example based on my previous post can showcase some essential features. Below you’ll find the code behind the PatternFinder indicator, which uses the Accord.NET library’s k-d tree and k nearest neighbor algorithm implementation to do candlestick pattern searches as discussed here. Many elements are specific to my system, but the core functionality is trivially portable if you want to borrow it.

Note the use of attributes to mark properties as inputs and set their default values. Options can be serialized and deserialized for easy storage in files or a database. Priority settings let the user specify the order of execution, which can be very important in some cases. Indexer access works with [0] being the current bar, [1] the previous bar, and so on. Separate methods for historical and real-time bars allow for a ton of optimization to speed up processing when time is scarce, though in this case there isn’t much that can be done.
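A minimal sketch of how such an attribute-based input system can work. To be clear, the `InputAttribute` name and the reflection plumbing below are illustrative assumptions for readers who want to build something similar, not the platform’s actual code:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute marking a property as a user-settable input with a default.
[AttributeUsage(AttributeTargets.Property)]
public class InputAttribute : Attribute
{
    public object DefaultValue { get; }
    public InputAttribute(object defaultValue) { DefaultValue = defaultValue; }
}

public class ExampleIndicator
{
    [Input(3)]
    public int PatternLength { get; set; }

    [Input(75)]
    public int MatchCount { get; set; }

    public ExampleIndicator()
    {
        // Reflection applies the declared defaults at construction time.
        foreach (var p in GetType().GetProperties())
        {
            var attr = p.GetCustomAttribute<InputAttribute>();
            if (attr != null) p.SetValue(this, attr.DefaultValue);
        }
    }

    // The same property walk serializes the inputs to key=value pairs,
    // which is enough for storage in a file or a database column.
    public string SerializeInputs() =>
        string.Join(";", GetType().GetProperties()
            .Where(p => p.GetCustomAttribute<InputAttribute>() != null)
            .Select(p => $"{p.Name}={p.GetValue(this)}"));
}
```

Because both the defaults and the serialization come from one reflection pass over the tagged properties, adding a new input to an indicator is a one-line change.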


The VariableSeries class is designed to hold time series, synchronize them across the entire parent object, prevent data snooping, etc. The Indicator and Signal classes are all derived from VariableSeries, which is the basis for the system’s modularity. For example, in the PatternFinder indicator, OHLC inputs can be modified by the user through the UI, e.g. to make use of the values of an indicator rather than the instrument data.
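A stripped-down sketch of the idea; the type and member names here are illustrative, not the actual VariableSeries implementation. The key points are the reversed indexer ([0] = current bar) and the range check that refuses to serve bars outside the available history, which is what mechanically prevents look-ahead:

```csharp
using System;
using System.Collections.Generic;

// Minimal bar-indexed series: [0] is the current bar, [1] the previous bar, etc.
public class SimpleSeries<T>
{
    private readonly List<T> _data = new List<T>();
    public T DefaultValue { get; set; }

    public int CurrentBar => _data.Count - 1;

    // Advance to a new bar; until set, the new bar carries the default value.
    public void NewBar() => _data.Add(DefaultValue);

    // Value of the bar currently being formed.
    public T Value
    {
        get => _data[_data.Count - 1];
        set => _data[_data.Count - 1] = value;
    }

    // Reversed indexer with a guard against reading outside known history.
    public T this[int barsAgo]
    {
        get
        {
            if (barsAgo < 0 || barsAgo > CurrentBar)
                throw new ArgumentOutOfRangeException(nameof(barsAgo),
                    "Requested bar is outside available history.");
            return _data[_data.Count - 1 - barsAgo];
        }
    }
}
```

In the real system the parent object would also drive `NewBar()` for every registered series at once, which is what keeps indicators, signals, and instrument data synchronized on the same bar clock.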


The backtesting analysis side is still in its early stages, but again the foundations have been laid. Here are some stats using a two-day PatternFinder combined with IBS, applied to SPY:


Here’s the first iteration of the signal analysis interface. I have added three more signals to the backtest: going long for one day at every 15-day low close, the set-up Rob Hanna posted yesterday over at Quantifiable Edges (staying in for 5 days after the set-up appears), and UDIDSRI. The idea is to be able to easily spot redundant set-ups, find synergies or anti-synergies between signals, and quickly gauge the marginal value added by any one particular signal.
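The kind of statistic behind such a screen can be sketched as follows (an illustrative helper, not the actual interface code): a Jaccard-style overlap ratio to flag redundant set-ups, and the mean return on the days a signal fires alone to gauge its marginal value:

```csharp
using System;
using System.Linq;

public static class SignalOverlap
{
    // Fraction of active days on which two signals fire together (Jaccard overlap).
    // Values near 1 suggest the signals are redundant.
    public static double Overlap(bool[] a, bool[] b)
    {
        int both = a.Zip(b, (x, y) => x && y ? 1 : 0).Sum();
        int any = a.Zip(b, (x, y) => x || y ? 1 : 0).Sum();
        return any == 0 ? 0 : (double)both / any;
    }

    // Average return on days where this signal fires but the others do not:
    // a rough measure of the marginal value it adds to the existing set.
    public static double MarginalExpectancy(bool[] signal, bool[] others, double[] returns)
    {
        var alone = Enumerable.Range(0, returns.Length)
                              .Where(i => signal[i] && !others[i])
                              .Select(i => returns[i]).ToArray();
        return alone.Length == 0 ? 0 : alone.Average();
    }
}
```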


And here’s some basic Monte Carlo simulation stuff, with confidence intervals for cumulative returns and PDF/CDF of the maximum drawdown distribution:
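A minimal sketch of the simulation step, assuming simple i.i.d. resampling of daily returns: bootstrap the return series, build an equity curve per path, and collect each path’s maximum drawdown; sorting those values gives the empirical CDF, and per-day quantiles across paths give the cumulative-return bands:

```csharp
using System;
using System.Linq;

public static class MonteCarlo
{
    // Maximum peak-to-trough drawdown of a compounded return path.
    public static double MaxDrawdown(double[] returns)
    {
        double equity = 1.0, peak = 1.0, maxDd = 0.0;
        foreach (var r in returns)
        {
            equity *= 1 + r;
            if (equity > peak) peak = equity;
            maxDd = Math.Max(maxDd, 1 - equity / peak);
        }
        return maxDd;
    }

    // Bootstrap: resample daily returns with replacement and record each
    // simulated path's maximum drawdown. Sorted output = empirical CDF.
    public static double[] DrawdownDistribution(double[] dailyReturns, int paths, int seed = 42)
    {
        var rng = new Random(seed);
        var dds = new double[paths];
        for (int p = 0; p < paths; p++)
        {
            var sample = Enumerable.Range(0, dailyReturns.Length)
                .Select(_ => dailyReturns[rng.Next(dailyReturns.Length)]).ToArray();
            dds[p] = MaxDrawdown(sample);
        }
        Array.Sort(dds);
        return dds;
    }
}
```

I.i.d. resampling ignores autocorrelation and volatility clustering, so block bootstrapping is the natural refinement if those matter for the strategy being tested.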


Here’s the code for the PatternFinder indicator. Obviously it’s written for my platform, but it should be easily portable. The “meat” is all in CalcHistorical() and GetExpectancy().

/// <summary>
/// K nearest neighbor search for candlestick patterns
/// </summary>
public class PatternFinder : Indicator
{
    public int PatternLength { get; set; }

    public int MatchCount { get; set; }

    public int MinimumWindowSize { get; set; }

    public bool VolatilityAdjusted { get; set; }

    public bool Overnight { get; set; }

    public bool WeighExpectancyByDistance { get; set; }

    public bool Classification { get; set; }

    public double ClassificationLimit { get; set; }

    public string DistanceType { get; set; }

    public VariableSeries<decimal> Open { get; set; }

    public VariableSeries<decimal> High { get; set; }

    public VariableSeries<decimal> Low { get; set; }

    public VariableSeries<decimal> Close { get; set; }

    public VariableSeries<decimal> AdjClose { get; set; }

    private VariableSeries<double> returns;
    private VariableSeries<double> stDev;
    private KDTree<double> _tree;

    public PatternFinder(QSwing parent, string name = "PatternFinder", int BarsCount = 1000)
        : base(parent, name, BarsCount)
    {
        Priority = 1;
        returns = new VariableSeries<double>(parent, BarsCount);
        stDev = new VariableSeries<double>(parent, BarsCount) { DefaultValue = 1 };
    }

    internal override void Startup()
    {
        // Each pattern is described by PatternLength * 4 - 1 dimensions.
        _tree = new KDTree<double>(PatternLength * 4 - 1);
        switch (DistanceType)
        {
            case "Euclidean":
                _tree.Distance = Accord.Math.Distance.Euclidean;
                break;
            case "Absolute":
                _tree.Distance = AbsDistance;
                break;
            case "Chebyshev":
                _tree.Distance = Accord.Math.Distance.Chebyshev;
                break;
            default:
                _tree.Distance = Accord.Math.Distance.Euclidean;
                break;
        }
    }

    public override void CalcHistorical()
    {
        if (VolatilityAdjusted && CurrentBar > 0)
            returns.Value = (double)(AdjClose[0] / AdjClose[1] - 1);

        if (VolatilityAdjusted && CurrentBar > 11)
            stDev.Value = returns.StandardDeviation(10);

        if (CurrentBar < PatternLength + 1) return;

        if (CurrentBar > MinimumWindowSize)
            Value = GetExpectancy(GetCoords());

        double ret = Overnight ? (double)(Open[0] / Close[1] - 1) : (double)(AdjClose[0] / AdjClose[1] - 1);
        double adjret = ret / stDev[0];

        if (Classification)
            _tree.Add(GetCoords(1), adjret > ClassificationLimit ? 1 : 0);
        else
            _tree.Add(GetCoords(1), adjret);
    }

    public override void CalcRealTime()
    {
        if (VolatilityAdjusted && CurrentBar > 0)
            returns.Value = (double)(AdjClose[0] / AdjClose[1] - 1);

        if (VolatilityAdjusted && CurrentBar > 11)
            stDev.Value = returns.StandardDeviation(10);

        if (CurrentBar > MinimumWindowSize)
            Value = GetExpectancy(GetCoords());
    }

    private double GetExpectancy(double[] coords)
    {
        if (!WeighExpectancyByDistance)
            return _tree.Nearest(coords, MatchCount).Average(x => x.Node.Value) * stDev[0];

        // Weigh each neighbor's return by its inverse squared distance.
        var nodes = _tree.Nearest(coords, MatchCount);
        double totweight = nodes.Sum(x => 1 / Math.Pow(x.Distance, 2));
        return nodes.Sum(x => x.Node.Value * ((1 / Math.Pow(x.Distance, 2)) / totweight)) * stDev[0];
    }

    private static double AbsDistance(double[] x, double[] y)
    {
        return x.Select((t, i) => Math.Abs(t - y[i])).Sum();
    }

    private double[] GetCoords(int offset = 0)
    {
        double[] coords = new double[PatternLength * 4 - 1];
        for (int i = 0; i < PatternLength; i++)
        {
            coords[4 * i] = (double)(Open[i + offset] / Close[i + offset]);
            coords[4 * i + 1] = (double)(High[i + offset] / Close[i + offset]);
            coords[4 * i + 2] = (double)(Low[i + offset] / Close[i + offset]);

            // The oldest bar has no close-to-close return component,
            // hence the PatternLength * 4 - 1 dimensions.
            if (i < PatternLength - 1)
                coords[4 * i + 3] = (double)(Close[i + offset] / Close[i + 1 + offset]);
        }
        return coords;
    }
}

Coming up Soon™: a series of posts on cross validation, an in-depth paper on IBS, and possibly a theory-heavy paper on the low volatility effect.

Comments (12)

  • RMK says:

    How can I contact you via email?

  • JBK says:

    Are you reinventing the wheel?

    • qusma says:

      In a way, yes. But only because the wheels that I need are too expensive. The ones that are in my price range are actually hexagonal, made out of bad materials, and/or don’t fit the cars that I want to put them on.

  • Michael Halls-Moore says:

An ambitious project! However, (to bash an analogy to death) I would add that despite the fact that you may be “reinventing the wheel”, at least you have full control over your rubber compounds, rim size, nut threading and tyre pressure.

Outsourcing to a backtesting vendor (even some of the biggies) ensures you will always be taking one aspect of your end-to-end system on faith. Now, I can fully understand that a long-term vendor is significantly more likely to have fewer bugs than a home-grown project, but once those bugs are identified, you are at the mercy of the vendor, which isn’t the case when coding a bespoke system. As such, I agree that you’re doing the right thing by rolling your own.

    Your diagram is also extremely useful to get a holistic view of the system. However, I was wondering if you could elaborate on your transaction cost handling. I see the slippage/commission component is “scriptable”, but given the myriad of non-linearities inherent in transaction cost implementation how do you foresee the effectiveness of dealing with issues such as market impact?

-Mike

    • qusma says:


      First of all I am a tiny fish in a giant ocean. Nearly all my trades are in ultra-liquid ETFs like SPY and QQQ. So, an accurate model of my price impact is generally just to set it at zero and forget about it.

That being said, there aren’t really any restrictions on my ability to simulate price impact: any model can be implemented. At the moment the biggest issue is that my data structure doesn’t handle level 2 data, but that’s already in the plans, and should allow for more realistic modeling of price impact if the need ever arises.

  • Creating a Data Management System : QUSMA says:

    […] a year ago I posted about writing my own backtesting platform. While it has been even more challenging than I thought it would be, it’s going well: about […]

  • Lore says:

I am creating something similar. My focus is not heavy calcs for backtesting (that is a one-off thing I can easily do in xls); my goals are a) being able to monitor several indicators and spreads, and b) being able to download whatever data I need to look up a trade idea. A sort of Blmb replacement. To keep things simple I am JavaScript+SQL based, so far all on Google Cloud.

Do you download daily data one row every day? Avoiding data holes when the database is not running, the data source has changed, or some error has happened is probably my biggest concern, and I don’t really have a convincing approach for that.

    • qusma says:

Sounds interesting, though of course the route you’re going down is quite limited in terms of computational power if you want to do something more complex later on.

I do update the data every day to get new bars; generally there’s no problem with missing data. As long as you have some way to determine whether a day is a business day (pretty simple, because holidays are relatively few) you can easily check for missing data.
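A simple sketch of that check, covering weekends only; a real version would also consult a holiday calendar, which is the assumption omitted here:

```csharp
using System;
using System.Collections.Generic;

public static class DataChecks
{
    // Walk the expected business days in [from, to] and flag any
    // date the stored series lacks. Holidays are not handled.
    public static List<DateTime> FindMissingDays(HashSet<DateTime> storedDates,
                                                 DateTime from, DateTime to)
    {
        var missing = new List<DateTime>();
        for (var d = from.Date; d <= to.Date; d = d.AddDays(1))
        {
            bool businessDay = d.DayOfWeek != DayOfWeek.Saturday &&
                               d.DayOfWeek != DayOfWeek.Sunday;
            if (businessDay && !storedDates.Contains(d))
                missing.Add(d);
        }
        return missing;
    }
}
```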

  • Indicator says:

Any chance you could make the code available?

  • asd says:

What do you use for the UML diagram?
