A Few Notes on System Parameter Permutation

Before you read this post, read the (Wagner award winning) Know Your System! – Turning Data Mining from Bias to Benefit through System Parameter Permutation by Dave Walton.

The concept is essentially to use all the results from a brute force optimization and pick the median as the best estimate of out of sample performance. The first step is:

Parameter scan ranges for the system concept are determined by the system developer.

And herein lies the main problem. The scan range will determine the median. If the range is too wide, the estimate will be too low and based on data that is essentially irrelevant because the trader would never actually pick that combination of parameters. If the range is too narrow, the entire exercise is pointless. But the author provides no way of picking the optimal range a priori (because no such method exists). And of course, as is mentioned in the paper, repeated applications of SPP with different ranges is problematic.

To illustrate, let’s use my UDIDSRI post from October 2012. The in sample period will be the time before that post and the “out of sample” period will be the time after it; the instrument is QQQ, and the strategy is simply to go long at the close when UDIDSRI is below X (the value on the x-axis below).

As you can see, the relationship between next-day returns and UDIDSRI is quite stable. The out of sample returns are higher across most of the range but that’s just an artifact of the giant bull market in the out of sample period. What would have been the optimal SPP range in October 2012? What is the optimal SPP range in hindsight? Would the result have been useful? Ask yourself these questions for each chart below.

udidsri qqq in out sample performance

Let’s have a look at SPY:

udidsri spy in out sample performance

Whoa. The optimum has moved to < 0.05. Given a very wide range, SPP would have made a correct prediction in this case. But is this a permanent shift or just a result of a small sample size? Let’s see the results for 30 equity ETFs1:

udidsri all etfs in out sample performance

Well, that’s that. What about SPP in comparison to other methods?

The use of all available market data enables the best approximation of the long-run so the more market data available, the more accurate the estimate.

This is not the case. The author’s criticism of CV is that it makes “Inefficient use of market data”, but that’s a bad way of looking at things. CV uses all the data (just not in “one go”) and provides us with actual estimates of out of sample performance, whereas SPP just makes an “educated guess”. A guess that is 100% dependent on an arbitrarily chosen parameter range. Imagine, for example, two systems: one has stable optimal parameters over time, while the other one does not. The implications in terms of out of sample performance are obvious. CV will accurately show the difference between the two, while SPP may not. Depending on the range chosen, SPP might severely under-represent the true performance of the stable system. There’s a lot of talk about “regression to the mean”, but what mean is that?

SPP minimizes standard error of the mean (SEM) by using all available market data in the historical simulation.

This is true, but again what mean? The real issue isn’t the error of the estimate, it’s whether you’re estimating the right thing in the first place. CV’s data splitting isn’t an arbitrary mistake done to increase the error! There’s a point, and that is measuring actual out of sample performance given parameters that would actually have been chosen.

tl;dr: for some systems SPP is either pointless or just wrong. For some other classes of systems where out of sample performance can be expected to vary across a range of parameters, SPP will probably produce reasonable results. Even in the latter case, I think you’re better off sticking with CV.

Footnotes
  1. ASEA, DXJ, EEM, EFA, EIDO, EPP, EWA, EWC, EWD, EWG, EWH, EWI, EWJ, EWL, EWM, EWP, EWQ, EWS, EWT, EWU, EWY, EZA, FXI, ILF, IWM, QQQ, SPY, THD, VGK, VT[]

2013: Lessons Learned and Revisiting Some Studies

The year is over in a few hours and I thought it would be nice to do a quick review of the year, revisit some studies and the most popular posts of the year, as well as share some thoughts on my performance in 2013 and my goals for 2014.

Revisiting Old Studies

IBS

IBS did pretty badly in 2012, and didn’t manage to reach the amazing performance of 2007-2010 this year either. However, it still worked reasonably well: IBS < 0.5 led to far higher returns than IBS > 0.5, and the highest quarter had negative returns. It still works amazingly well as a filter. Most importantly the magnitude of the effect has diminished. This is partly due to the low volatility we’ve seen this year. After all IBS does best when movements are large, and SPY’s 10-day realized volatility never even broke 20% this year. Here are the stats:

ibs

UDIDSRI

The original post can be found here. Performance in 2013 hasn’t been as good as in the past, but was still reasonably OK. I think the results are, again, at least partially due to the low volatility environment in equities this year.

UDIDSRI performance, close-to-close returns after an zero reading.

UDIDSRI performance, close-to-close returns after a zero reading.

DOTM seasonality

I’ve done 3 posts on day of the month seasonality (US, EU, Asia), and on average the DOTM effect did its job this year. There are some cases where the top quarter does not have the top returns, but a single year is a relatively small sample so I doubt this has any long-term implications. Here are the stats for 9 major indices:

dotm

Day of the month seasonality in 2013

VIX:VXV Ratio

My studies on the implied volatility indices ratio turned out to work pretty badly. Returns when the VIX:VXV ratio was 5% above the 10-day SMA were -0.03%. There were no 200-day highs in the ratio in 2013!

 

Performance

Overall I would say it was a mixed bag for me this year. Returns were reasonably good, but a bit below my long-term expectations. It was a very good year for equities, and my results can’t compete with SPY’s 5.12 MAR ratio, which makes me feel pretty bad. Of course I understand that years like this one don’t represent the long-term, but it’s annoying to get beaten by b&h nonetheless.

Some strategies did really well:

stratgoodOthers did really poorly:
stratbad

The Good

Risk was kept under control and entirely within my target range, both in terms of volatility and maximum drawdown. Even when I was at the year’s maximum drawdown I felt comfortable…there is still “psychological room” for more leverage. Daily returns were positively skewed. My biggest success was diversifying across strategies and asset classes. A year ago I was trading few instruments (almost exclusively US equity ETFs) with a limited number of strategies. Combine that with a pretty heavy equity tilt in the GTAA allocation, and my portfolio returns were moving almost in lockstep with the indices (there were very few shorting opportunities in this year’s environment, so the choice was almost always between being long or in cash). Widening my asset universe combined with research into new strategies made a gigantic difference:

beta

The Bad

I made a series of mistakes that significantly hurt my performance figures this year. Small mistakes pile on top of each other and in the end have a pretty large effect. All in all I lost several hundred bp on these screw-ups. Hopefully you can learn from my errors:

  • Back in March I forgot the US daylight savings time kicks in earlier than it does here in Europe. I had positions to exit at the open and I got there 45 minutes late. Naturally the market had moved against me.
  • A bug in my software led to incorrectly handling dividends, which led to signals being calculated using incorrect prices, which led to a long position when I should have taken a short. Taught me the importance of testing with extreme caution.
  • Problems with reporting trade executions at an exchange led to an error where I sent the same order twice and it took me a few minutes to close out the position I had inadvertently created.
  • I took delivery on some FX futures when I didn’t want to, cost me commissions and spread to unwind the position.
  • Order entry, sent a buy order when I was trying to sell. Caught it immediately so the cost was only commissions + spread.
  • And of course the biggest one: not following my systems to the letter. A combination of fear, cowardice, over-confidence in my discretion, and under-confidence in my modeling skills led to some instances where I didn’t take trades that I should have. This is the most shameful mistake of all because of its banality. I don’t plan on repeating it in 2014.

Goals for 2014

  • Beat my 2013 risk-adjusted returns.
  • Don’t repeat any mistakes.
  • Make new mistakes! But minimize their impact. Every error is a valuable learning experience.
  • Continue on the same path in terms of research.
  • Minimize model implementation risk through better unit testing.

 

Most Popular

Finally, the most popular posts of the year:

  1. The original IBS post. Read the paper instead.
  2. Doing the Jaffray Woodriff Thing. I still need to follow up on that…
  3. Mining for Three Day Candlestick Patterns, which also spawned a short series of posts.

 

I want to wish you all a happy and profitable 2014!

Volatility-Based Position Sizing of SPY Swing Trades: Realized vs VIX vs GARCH

A simple post on position sizing, comparing three similar volatility-based approaches. In order test the different sizing techniques I’ve set up a long-only strategy applied to SPY, with 4 different signals:

On top of that sits an IBS filter, allowing long positions only when IBS is below 50%. A position is taken if any of the signals is triggered. Entries and exits at the close of the day, no stops or targets. Results include commissions of 1 cent per share.

Sizing based on realized volatility uses the 10-day realized volatility, and then adjusts the size of the position such that, if volatility remains unchanged, the portfolio would have an annualized standard deviation of 17%. The fact that the strategy is not always in the market decreases volatility, which is why to get close to the ~11.5% standard deviation of the fixed fraction sizing we need to “overshoot” by a fair bit.

The same idea is used with the GARCH model, which is used to forecast volatility 3 days ahead. That value is then used to adjust size. And again the same concept is used with VIX, but of course option implied volatility tends to be greater than realized volatility, so we need to overshoot by even more, in this case to 23%.

Let’s take a look at the strategy results with the simplest sizing approach (allocating all available capital):

fixed fraction

Top panel: equity curve. Middle panel: drawdown. Bottom panel: leverage.

Returns are the highest during volatile periods, and so are drawdowns. This results in an uneven equity curve, and highly uneven risk exposure. There is, of course, no reason to let the market decide these things for us. Let’s compare the fixed fraction approach to the realized volatility- and VIX-based sizing approaches:

comparison

These results are obviously unrealistic: nobody in their right mind would use 600% leverage in this type of trade. A Black Monday would very simply wipe you out. These extremes are rather infrequent, however, and leverage can be capped to a lower value without much effect.

With the increased leverage comes an increase in average drawdown, with >5% drawdowns becoming far more frequent. The average time to recovery is also slightly increased. Given the benefits, I don’t see this as a significant drawback. If you’re willing to tolerate  a 20% drawdown, the frequency of 5% drawdowns is not that important.

On the other hand, the deepest drawdowns naturally tend to come during volatile periods, and the decrease of leverage also results in a slight decrease of the max drawdown. Returns are also improved, leading to better risk-adjusted returns across the board for the volatility-based sizing approaches.

The VIX approach underperforms, and the main reason is obviously that it’s not a good measure of expected future volatility. There is also the mismatch between the VIX’s 30-day horizon and the much shorter horizon of the trades. GARCH and realized volatility result in very similar sizing, so the realized volatility approach is preferable due to its simplicity.

stats

Blueprint for a Backtesting and Trading Software Suite

Posting has been slow lately because I’ve been busy with a bunch of other stuff, including the CFA Level 3 exam last weekend. I’ve also begun work on a very ambitious project: a fully-featured all-in-one backtesting and live trading suite, which is what prompted this post.

Over the last half year or so I’ve been moving toward more complex tools (away from excel, R, and MATLAB), and generally just writing standalone backtesters in C# for every concept I wanted to try out, only using Multicharts for the simplest ideas. This approach is, of course, incredibly inefficient, but the software packages available to “retail” traders are notoriously horrible, and I have nowhere near the capital I’d need to afford “real” tools like QuantFACTORY or Deltix.

The good thing about knowing how to code is that if a tool doesn’t exist you can just write it, and that’s exactly what I’m doing. Proper portfolio-level backtesting and live trading that’ll be able to easily do everything from intraday pairs trading to long term asset allocation and everything in-between, all under the same roof. On the other hand it’s also tailored to my own needs, and as such contains no plans for things like handling fundamental data. Most importantly it’s my dream research platform that’ll let me go from idea, to robust testing & optimization, to implementation very quickly. Here’s what the basic design looks like:

QBS-design-chart

What’s the point of posting about it? I know there are many other people out there facing the same issues I am, so hopefully I can provide some inspiration and ideas on how to solve them. Maybe it’ll prompt some discussion and idea-bouncing, or perhaps even collaboration.

Most of the essential stuff has already been laid down, so basic testing is already possible. A simple example based on my previous post can showcase some essential features. Below you’ll find the code behind the PatternFinder indicator, which uses the Accord.NET library’s k-d tree and k nearest neighbor algorithm implementation to do candlestick pattern searches as discussed here. Many elements are specific to my system, but the core functionality is trivially portable if you want to borrow it.

Note the use of attributes to denote properties as inputs, and set their default values. Options can be serialized/deserialized for easy storage in files or a database. Priority settings allow the user to specify the order of execution, which can be very important in some cases. Indexer access works with [0] being the current bar, [1] being the previous bar, etc. Different methods for historical and real time bars allow for a ton of optimization to speed up processing when time is scarce, though in this case there isn’t much that can be done.

showcase_classes

The VariableSeries class is designed to hold time series, synchronize them across the entire parent object, prevent data snooping, etc. The Indicator and Signal classes are all derived from VariableSeries, which is the basis for the system’s modularity. For example, in the PatternFinder indicator, OHLC inputs can be modified by the user through the UI, e.g. to make use of the values of an indicator rather than the instrument data.

showcase_options

The backtesting analysis stuff is still in its early stages, but again the foundations have been laid. Here are some stats using a two-day PatternFinder combined with IBS, applied on SPY:

showcase_patternfinderresults2[1]

Here’s the first iteration of the signal analysis interface. I have added 3 more signals to the backtest: going long for 1 day at every 15 day low close, the set-up Rob Hanna posted yesterday over at Quantifiable Edges (staying in for 5 days after the set-up appears), and UDIDSRI. The idea is to be able to easily spot redundant set-ups, find synergies or anti-synergies between signals, and easily get an idea of the marginal value added by any one particular signal.

showcase_signals

And here’s some basic Monte Carlo simulation stuff, with confidence intervals for cumulative returns and PDF/CDF of the maximum drawdown distribution:

showcase_mc

Here’s the code for the PatternFinder indicator. Obviously it’s written for my platform, but it should be easily portable. The “meat” is all in CalcHistorical() and GetExpectancy().

/// <summary>
/// K nearest neighbor search for candlestick patterns
/// </summary>
public class PatternFinder : Indicator
{
    [Input(3)]
    public int PatternLength { get; set; }

    [Input(75)]
    public int MatchCount { get; set; }

    [Input(2000)]
    public int MinimumWindowSize { get; set; }

    [Input(false)]
    public bool VolatilityAdjusted { get; set; }

    [Input(false)]
    public bool Overnight { get; set; }

    [Input(false)]
    public bool WeighExpectancyByDistance { get; set; }

    [Input(false)]
    public bool Classification { get; set; }

    [Input(0.002)]
    public double ClassificationLimit { get; set; }

    [Input("Euclidean")]
    public string DistanceType { get; set; }

    [SeriesInput("Instrument.Open")]
    public VariableSeries<decimal> Open { get; set; }

    [SeriesInput("Instrument.High")]
    public VariableSeries<decimal> High { get; set; }

    [SeriesInput("Instrument.Low")]
    public VariableSeries<decimal> Low { get; set; }

    [SeriesInput("Instrument.Close")]
    public VariableSeries<decimal> Close { get; set; }

    [SeriesInput("Instrument.AdjClose")]
    public VariableSeries<decimal> AdjClose { get; set; }

    private VariableSeries<double> returns;
    private VariableSeries<double> stDev;
    private KDTree<double> _tree;

    public PatternFinder(QSwing parent, string name = "PatternFinder", int BarsCount = 1000)
        : base(parent, name, BarsCount)
    {
        Priority = 1;
        returns = new VariableSeries<double>(parent, BarsCount);
        stDev = new VariableSeries<double>(parent, BarsCount) { DefaultValue = 1 };
    }

    internal override void Startup()
    {
        _tree = new KDTree<double>(PatternLength * 4 - 1);
        switch (DistanceType)
        {
            case "Euclidean":
                _tree.Distance = Accord.Math.Distance.Euclidean;
                break;
            case "Absolute":
                _tree.Distance = AbsDistance;
                break;
            case "Chebyshev":
                _tree.Distance = Accord.Math.Distance.Chebyshev;
                break;
            default:
                _tree.Distance = Accord.Math.Distance.Euclidean;
                break;
        }
    }

    public override void CalcHistorical()
    {
        if (VolatilityAdjusted && CurrentBar > 0)
            returns.Value = (double)(AdjClose[0] / AdjClose[1] - 1);

        if (VolatilityAdjusted && CurrentBar > 11)
            stDev.Value = returns.StandardDeviation(10);

        if (CurrentBar < PatternLength + 1) return;

        if (CurrentBar > MinimumWindowSize)
            Value = GetExpectancy(GetCoords());

        double ret = Overnight ? (double)(Open[0] / Close[1] - 1) : (double)(AdjClose[0] / AdjClose[1] - 1);
        double adjret = ret / stDev[0];

        if (Classification)
            _tree.Add(GetCoords(1), adjret > ClassificationLimit ? 1 : 0);
        else
            _tree.Add(GetCoords(1), adjret);
    }

    public override void CalcRealTime()
    {
        if (VolatilityAdjusted && CurrentBar > 0)
            returns.Value = (double)(AdjClose[0] / AdjClose[1] - 1);

        if (VolatilityAdjusted && CurrentBar > 11)
            stDev.Value = returns.StandardDeviation(10);

        if (CurrentBar > MinimumWindowSize)
            Value = GetExpectancy(GetCoords());
    }

    private double GetExpectancy(double[] coords)
    {
        if (!WeighExpectancyByDistance)
            return _tree.Nearest(coords, MatchCount).Average(x => x.Node.Value) * stDev[0];
        else
        {
            var nodes = _tree.Nearest(coords, MatchCount);
            double totweight = nodes.Sum(x => 1 / Math.Pow(x.Distance, 2));
            return nodes.Sum(x => x.Node.Value * ((1 / Math.Pow(x.Distance, 2)) / totweight)) * stDev[0];
        }
    }

    private static double AbsDistance(double[] x, double[] y)
    {
        return x.Select((t, i) => Math.Abs(t - y[i])).Sum();
    }

    private double[] GetCoords(int offset = 0)
    {
        double[] coords = new double[PatternLength * 4 - 1];
        for (int i = 0; i < PatternLength; i++)
        {
            coords[4 * i] = (double)(Open[i + offset] / Close[i + offset]);
            coords[4 * i + 1] = (double)(High[i + offset] / Close[i + offset]);
            coords[4 * i + 2] = (double)(Low[i + offset] / Close[i + offset]);

            if (i < PatternLength - 1)
                coords[4 * i + 3] = (double)(Close[i + offset] / Close[i + 1 + offset]);
        }
        return coords;
    }
}

Coming up Soon™: a series of posts on cross validation, an in-depth paper on IBS, and possibly a theory-heavy paper on the low volatility effect.

The Predictive Value of the Number and Magnitude of Recent Up/Down Days: UDIDSRI

Rummaging through my bottomless “TO DO” list, I found this little comment:

# of up/dn days in period, then re-scale that using percentrank….with net % move?

An interesting way to spend Sunday afternoon, and another blog post!

After playing around with the concept for a while, I ended up creating a simple indicator that, as it turns out, is impressively good at picking out bottoms1, and has very strong 1-day and acceptable medium-term predictive power. In an attempt to come up with the most awkward-sounding acronym possible, I decided on the name “Up/Down and Intensity Day Summation Rank Indicator”, or UDIDSRI. Here’s what I did:

The first iteration

I started out with a very simple concept:

  • If the day closes up, movement = 1, otherwise movement = -1.
  • Sum movement over the last 20 days.
  • UDIDSRI is the % rank of today’s sum, compared to the last 50 days of sums.

The case that presents the most interest is when UDIDSRI is equal to zero (i.e. the lowest value in 50 days), and we’ll have a look at how this works further down. I felt that this indicator could be significantly improved by adding a bit of nuance. Not all down days are equal, so I thought it would be a good idea to take into account the magnitude of the moves as well as their direction.

The second iteration

The second version of the algorithm:

  • If the day closes up, movement = 1, otherwise movement = -1.
  • Multiply movement by (1 +abs( return))5
  • Sum the movements for the last 20 days.
  • UDIDSRI is the % rank of today’s sum, compared to the last 50 days of sums.

The choice of the 5th power is completely arbitrary and un-optimized (as are the 20-day summation, and 50-day ranking) and can probably be optimized for better performance.

Here’s a chart comparing the two versions on the last few months of SPY (yellow is 1st iteration, red is 2nd):

UDIDSRI comparison chart

You can clearly see that the 2nd iteration doesn’t like to stay at 0 for so long, and tends to respond a bit faster to movements. As such, the 2nd iteration gives far fewer signals, but they’re of much higher quality. I’ll be using the 2nd version for the rest of this post.

UDIDSRI iteration comparison

Note that this approach is completely useless for going short. The indicator hitting its maximum value provides no edge either for mean reversion or trend following.

A quick test around the globe to make sure I wasn’t curve fitting:

all country ETF statistics

That turned out rather well…

Thus far we have only looked at the short-term performance of UDIDSRI. Let’s see how it does over the medium term after hitting 0:

medium term UDIDSRI performance

There seems to be a period of about 30 trading days after UDIDSRI hits 0, during which you can expect above-average returns. Let’s take a look at a strategy that crudely exploits this:

  • Buy SPY at close if UDIDSRI = 0.
  • Use a 2% trailing stop (close-to-close) to exit.

The trailing stop makes us exit quickly if we haven’t entered at a bottom, but stays with the trend if the entry is good. Here are the stats and equity curve for this strategy applied to SPY, using $100k per trade, without commissions or slippage:

medium term UDIDSRI SPY equity curve

medium term UDIDSRI SPY statistics

Finally here are some trades this strategy would have taken in 2011 and early 2012:

medium term UDIDSRI SPY statistics

The most significant problem with the tight trailing stop is that it exits at pullbacks instead of tops (which is particularly painful during heavy bear markets), so one easy improvement would be to add an indicator for over-extension and use that to time the exit. But I’ll leave that as homework.

All in all I’m quite satisfied with the UDIDSRI. I was really surprised at how it manages to pick bottoms with such accuracy, and I will definitely add it to the repertoire of signals that I use for swing trading.

If you want to play with the UDIDSRI yourself, I have uploaded an excel worksheet as well as the indicator and signal for MultiCharts .NET.

Footnotes
  1. For SPY, UDIDSRI gave signals on both the 2002 and 2009 lowest days[]