Before you read this post, read the (Wagner award winning) Know Your System! – Turning Data Mining from Bias to Benefit through System Parameter Permutation by Dave Walton.
The concept is essentially to use all the results from a brute force optimization and pick the median as the best estimate of out of sample performance. The first step is:
Parameter scan ranges for the system concept are determined by the system developer.
And herein lies the main problem. The scan range will determine the median. If the range is too wide, the estimate will be too low and based on data that is essentially irrelevant because the trader would never actually pick that combination of parameters. If the range is too narrow, the entire exercise is pointless. But the author provides no way of picking the optimal range a priori (because no such method exists). And of course, as is mentioned in the paper, repeated applications of SPP with different ranges is problematic.
To illustrate, let’s use my UDIDSRI post from October 2012. The in sample period will be the time before that post and the “out of sample” period will be the time after it; the instrument is QQQ, and the strategy is simply to go long at the close when UDIDSRI is below X (the value on the x-axis below).
As you can see, the relationship between next-day returns and UDIDSRI is quite stable. The out of sample returns are higher across most of the range but that’s just an artifact of the giant bull market in the out of sample period. What would have been the optimal SPP range in October 2012? What is the optimal SPP range in hindsight? Would the result have been useful? Ask yourself these questions for each chart below.
Let’s have a look at SPY:
Whoa. The optimum has moved to < 0.05. Given a very wide range, SPP would have made a correct prediction in this case. But is this a permanent shift or just a result of a small sample size? Let’s see the results for 30 equity ETFs1:
Well, that’s that. What about SPP in comparison to other methods?
The use of all available market data enables the best approximation of the long-run so the more market data available, the more accurate the estimate.
This is not the case. The author’s criticism of CV is that it makes “Inefficient use of market data”, but that’s a bad way of looking at things. CV uses all the data (just not in “one go”) and provides us with actual estimates of out of sample performance, whereas SPP just makes an “educated guess”. A guess that is 100% dependent on an arbitrarily chosen parameter range. Imagine, for example, two systems: one has stable optimal parameters over time, while the other one does not. The implications in terms of out of sample performance are obvious. CV will accurately show the difference between the two, while SPP may not. Depending on the range chosen, SPP might severely under-represent the true performance of the stable system. There’s a lot of talk about “regression to the mean”, but what mean is that?
SPP minimizes standard error of the mean (SEM) by using all available market data in the historical simulation.
This is true, but again what mean? The real issue isn’t the error of the estimate, it’s whether you’re estimating the right thing in the first place. CV’s data splitting isn’t an arbitrary mistake done to increase the error! There’s a point, and that is measuring actual out of sample performance given parameters that would actually have been chosen.
tl;dr: for some systems SPP is either pointless or just wrong. For some other classes of systems where out of sample performance can be expected to vary across a range of parameters, SPP will probably produce reasonable results. Even in the latter case, I think you’re better off sticking with CV.Footnotes
- ASEA, DXJ, EEM, EFA, EIDO, EPP, EWA, EWC, EWD, EWG, EWH, EWI, EWJ, EWL, EWM, EWP, EWQ, EWS, EWT, EWU, EWY, EZA, FXI, ILF, IWM, QQQ, SPY, THD, VGK, VT