Before you read this post, read the (Wagner Award-winning) paper Know Your System! – Turning Data Mining from Bias to Benefit through System Parameter Permutation by Dave Walton.
The concept is essentially to use all the results from a brute force optimization and pick the median as the best estimate of out of sample performance. The first step is:
Parameter scan ranges for the system concept are determined by the system developer.
And herein lies the main problem: the scan range will determine the median. If the range is too wide, the estimate will be too low, based on data that is essentially irrelevant because the trader would never actually pick those parameter combinations. If the range is too narrow, the entire exercise is pointless. Yet the author provides no way of picking the optimal range a priori (because no such method exists). And of course, as the paper itself mentions, repeated applications of SPP with different ranges are problematic.
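To make the dependence on the scan range concrete, here is a minimal sketch of the SPP estimate; the `backtest` function and the toy performance curve are hypothetical, not from the paper:

```python
import numpy as np

def spp_estimate(backtest, param_range):
    """Run the system over every parameter value in the scan range
    and take the median result as the SPP performance estimate."""
    results = [backtest(p) for p in param_range]
    return np.median(results)

# Toy backtest: performance peaks at p = 0.2 (purely hypothetical).
toy_backtest = lambda p: 1.0 - abs(p - 0.2)

# The estimate depends entirely on the chosen scan range:
narrow = spp_estimate(toy_backtest, np.linspace(0.1, 0.3, 21))
wide = spp_estimate(toy_backtest, np.linspace(0.0, 1.0, 21))
```

With a range centered on the optimum, the median sits near peak performance; widen the range and the estimate is dragged down by parameter values no trader would ever actually pick.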
To illustrate, let’s use my UDIDSRI post from October 2012. The in sample period will be the time before that post and the “out of sample” period will be the time after it; the instrument is QQQ, and the strategy is simply to go long at the close when UDIDSRI is below X (the value on the x-axis below).
As you can see, the relationship between next-day returns and UDIDSRI is quite stable. The out of sample returns are higher across most of the range but that’s just an artifact of the giant bull market in the out of sample period. What would have been the optimal SPP range in October 2012? What is the optimal SPP range in hindsight? Would the result have been useful? Ask yourself these questions for each chart below.
Let’s have a look at SPY:
Whoa. The optimum has moved to < 0.05. Given a very wide range, SPP would have made a correct prediction in this case. But is this a permanent shift or just a result of a small sample size? Let’s see the results for 30 equity ETFs1:
Well, that’s that. What about SPP in comparison to other methods?
The use of all available market data enables the best approximation of the long-run so the more market data available, the more accurate the estimate.
This is not the case. The author’s criticism of CV is that it makes “Inefficient use of market data”, but that’s a bad way of looking at things. CV uses all the data (just not in “one go”) and provides us with actual estimates of out of sample performance, whereas SPP just makes an “educated guess”. A guess that is 100% dependent on an arbitrarily chosen parameter range. Imagine, for example, two systems: one has stable optimal parameters over time, while the other one does not. The implications in terms of out of sample performance are obvious. CV will accurately show the difference between the two, while SPP may not. Depending on the range chosen, SPP might severely under-represent the true performance of the stable system. There’s a lot of talk about “regression to the mean”, but what mean is that?
SPP minimizes standard error of the mean (SEM) by using all available market data in the historical simulation.
This is true, but again: which mean? The real issue isn't the error of the estimate, it's whether you're estimating the right thing in the first place. CV's data splitting isn't an arbitrary mistake made to increase the error! There's a point to it: measuring actual out of sample performance given the parameters that would actually have been chosen.
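To illustrate the difference, here is a minimal sketch of the kind of CV procedure described above (the `backtest` signature and the toy system are hypothetical): in each fold the parameter is chosen on the training data only, and performance is then measured out of sample with that chosen parameter.

```python
import numpy as np

def cv_estimate(backtest, params, n_obs, n_folds=5):
    """In each fold, pick the best parameter on the training slice,
    then record out-of-sample performance on the held-out test slice
    with that parameter -- the one that would actually have been chosen."""
    fold_size = n_obs // n_folds
    oos = []
    for k in range(n_folds):
        test = np.arange(k * fold_size, (k + 1) * fold_size)
        train = np.setdiff1d(np.arange(n_obs), test)
        best = max(params, key=lambda p: backtest(p, train))
        oos.append(backtest(best, test))
    return np.mean(oos)

# Toy system with a stable optimum at p = 0.2 on every data slice:
stable = lambda p, idx: -(p - 0.2) ** 2
est = cv_estimate(stable, np.linspace(0.0, 1.0, 11), n_obs=1000)
```

Because the parameter is re-selected in every fold, a system whose optimum drifts across folds will show degraded out of sample numbers, while a stable one will not -- exactly the distinction SPP's median can miss.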
tl;dr: for some systems SPP is either pointless or just wrong. For some other classes of systems where out of sample performance can be expected to vary across a range of parameters, SPP will probably produce reasonable results. Even in the latter case, I think you're better off sticking with CV.

Footnotes
- ASEA, DXJ, EEM, EFA, EIDO, EPP, EWA, EWC, EWD, EWG, EWH, EWI, EWJ, EWL, EWM, EWP, EWQ, EWS, EWT, EWU, EWY, EZA, FXI, ILF, IWM, QQQ, SPY, THD, VGK, VT[↩]
Your criticisms of the paper are valid, but the same arguments also apply to CV: estimates can change drastically depending on how the data is split. The paper's use of 'regression to the mean', in my reading, simply means the system's performance is more likely to be closer to the mean performance (or median in this case) than to the outliers. Agreed, this is an odd usage of the term, but by no means opaque.
All of this stuff just avoids the real issue, which is that at some point common sense has to be invoked. The best systems are robust over parameter ranges, which have to be picked from experience and a priori knowledge; there is no way of avoiding this. As soon as you find yourself selecting 'islands' of performance, that's a good time to stop.
>estimates can change drastically depending on how the data is split.
Well, that depends on your sample size. With enough observations and a reasonable number of folds, results will generally be stable.
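A quick illustration with synthetic data (purely hypothetical returns, not from any real system): the spread of per-fold means shrinks as the sample grows, so the CV estimate depends less and less on the particular split.

```python
import numpy as np

rng = np.random.default_rng(0)

def fold_spread(n_obs, n_folds=10):
    """Std. dev. across fold means: a rough measure of how much a
    CV estimate depends on the particular data split."""
    returns = rng.normal(0.0005, 0.01, n_obs)  # synthetic daily returns
    folds = np.array_split(returns, n_folds)
    return np.std([f.mean() for f in folds])

small, large = fold_spread(200), fold_spread(20000)
```

With 100x more observations, the spread across folds shrinks by roughly an order of magnitude, as the usual 1/sqrt(n) scaling suggests.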
My issue with the "regression to the mean" argument is that the paper implies the median of the optimization trials is somehow necessarily connected to the expected out of sample performance, but this is never justified, and in many cases it's completely untrue (again, depending on the range chosen).
I completely agree about the common sense, but common sense seems to be exceedingly uncommon.
>The best systems are robust over parameter ranges which have to be picked from experience and a priori knowledge, there is no way of avoiding this. As soon as you’re getting into the position of selecting ‘islands’ of performance, that’s a good time to stop.
I guess the problem is that market dynamics change. SPP assumes there is some correspondence between changes in market dynamics and the system's parameters; this must be proved, and as far as DMB (data mining bias) is concerned, it actually begs the question. Note this statement:
"In order to generate sampling distributions of system variant performance metrics, the set of parameter ranges under which the trading system is expected to function is determined ex ante in preparation for optimization. Methods to choose the parameter ranges and observation points are beyond the scope of this paper;"
This is what can introduce DMB. What does "under which the trading system is expected to function" mean? This is DMB because if the range is selected and the system does not function as expected, you have to change the range, right? The author admits this will be the case. And what if the range does not capture future market conditions?
Anyway, I think CV also does not avoid DMB, as noted in this article: http://www.priceactionlab.com/Blog/2012/06/fooled-by-randomness-through-selection-bias/
>I guess the problem is that market dynamics change.
Indeed, changing market dynamics is one side of the coin. The other is that the optimization is simply an estimate of the optimal parameters, an estimate that comes with error.
The core issue, I think, is that some market dynamics are more stable than others. And so some models will have more stable optimal parameters than others. This is the strength of CV: it can provide very good estimates of actual OOS performance, and will differentiate between “stable” and “unstable” models in a way that SPP cannot.
>This is DMB because if the range is selected and the system does not function as expected you have to change the range, right?
re: Harris, of course if you test "billions" of equity curves you've already fucked up. As cheesefunnel said below, at some point you must invoke common sense. CV _does_ protect against DMB, but (as Harris mentions in point #5) like all methods it can't do anything against data snooping bias (i.e. using the results of previous optimizations to guide future ones). Repeated application of CV until you find something good is obviously problematic. It's still a useful tool, I think.
For the SPY chart you say that “Whoa. The optimum has moved to < 0.05." What do you mean by that? What "optimum"?
Is SPP actually determining the optimum parameters, or just whether a system is sound in general? This is a bit confusing. Thanks,
My criticism of the SPP:
1. It doesn’t tell you which parameters to use in actual trading;
2. If there are many parameters, exhaustive optimization may not be tractable;
3. If there are too few parameters, the SPP stats don’t have sufficient significance;
4. The SPP isn’t applicable to adaptive/learning systems which derive their parameters from market data.
Finally, the fact that such a paper won the top NAAIM award makes their competence questionable to me.