I’ve been thinking a lot about candlestick patterns lately but grew tired of trying to generate ideas and instead decided to mine for them. I must confess I didn’t expect much from such a simplistic approach, so I was pleasantly surprised to see it working well. Unfortunately I wasn’t able to discover any short set-ups. The general bias of equity markets toward the upside makes it difficult to find enough instances of patterns that are followed by negative returns.
The idea is to mine past data for similar 3 day patterns, and then use that information to make trading decisions. There are several choices we must make:
- The size of the lookback window. I use an expanding window that starts at 2000 days.
- Once we find similar patterns, how do we choose which ones to use?
- How do we measure the similarity between the patterns?
To fully describe a three day candlestick pattern we need 11 numbers. The close-to-close percentage change from day 1 to day 2, and from day 2 to day 3, as well as the positions of the open, high, and low relative to the close for each day.
To measure the degree of similarity between any two 3-day patterns, I tried both the sum of absolute differences and the sum of the squared differences between those 11 numbers; the results were quite similar. It would be interesting to try to optimize individual weights for each number, as I imagine some are more important than others.
The final step is to select a number of the closest patterns we find, and simply average their next-day returns to arrive at an expected return.
How do we choose which patterns are “close enough” to use? Choose too few and the sample will be too small. Choose too many and you risk using irrelevant data. That’s a number that we’ll have to optimize.
When comparing the results we also run into another problem: the smaller the sample, the more spread out the expected return estimates will be, which means more trades will be chosen given a certain minimum limit for entry. My solution was to choose a different limit for trade entry, such that all sample sizes would generate the same number of days in the market (300 in this case). Here are the walk-forward results:
The trade-off between sample size and relevance is clear, and the “sweet spot” appears to be somewhere in the 50-150 range or so, for both the absolute difference and squared difference approaches. Depending on how selective you want to be, you can decrease the limit and trade off more trades for lower expected returns. For me, 30 bp is a reasonable area to aim for.
A nice little addition is to use IBS by filtering out any trades with IBS > 50%. Using squared differences, I select the 50 closest patterns. When their average next-day return is greater than 0.2%, a long position is taken. The results are predictably great:
The IBS filter removes close to 40% of days in the market yet maintains essentially the same CAGR, while also more than halving the maximum drawdown.
Let’s take a look at some of the actual patterns. Using squared differences, the 50 closest patterns, and a 0.2% limit, the last successful trade was on February 26, 2013. The expected return on that day was 0.307%. Here’s what those 3 days looked like, as well as the 5 closest historical patterns:
As you can see below, even the 50th closest pattern seems to be, based on visual inspection, rather close. The “main idea” of the pattern seems to be there:
Here are the stats from a bunch of different equity index ETFs, using square differences, the 50 closest patterns, 0.2% expected return limit and the IBS < 0.5 filter.
The 0.2% limit seems to be too low for some of them, producing too many trades. Perhaps setting an appropriate limit per-instrument would be a good idea.
The obvious path forward is to also produce 2-day, 4-day, 5-day, etc. versions, perhaps with optimized distance weighting and some outlier filtering, and combine them all in a nice little ensemble to get your predictions out of. The implementation is left as an exercise for the reader.
Bitcoin seems to be all the rage these days, and I’m jumping on the bandwagon. Quandl tweeted about their bitcoin data today so I decided I’d have a look at it. I have tested a bunch of popular/”standard” ideas, and the results aren’t really surprising, though they do illuminate the trend-y (bubbl-y) character of the bitcoin market. BTC prices do not revert like equities but show strong momentum, both in the short and medium term. IBS is useless, while trend following works like it does everywhere else. The (daily) data covers the period from 17/7/2010 to today.
Descriptive Stats
The mean simple daily return has been 1.012%, while the annualized standard deviation has been 121.70%. The distribution of returns is obviously fat-tailed (with a kurtosis of 8.62), though somewhat surprisingly (to me at least), slightly positively skewed (0.76).
Up/Down Streaks
Strong up streaks tend to be followed by high returns over the medium term, and there has been a surprisingly large number of these streaks given the small amount of data available.
IBS
IBS does not appear to have any predictive value when it comes to bitcoin returns.
RSI(3)
No mean reversion to be found here. Using a 3-period Cutler’s RSI, next-day bitcoin returns are 0.392% when RSI(3) is below 20, and 1.763% when it is above 80. The story is pretty much the same if you go for a medium term length for the RSI: high values beget high returns, with no mean reversion in sight.
Simple Trend Following
The strong trends that bitcoin has shown would have been very profitable to any trend followers. Going long at a new 50-day high close (with an exit at a new 25-day low close), and vice-versa for short positions, would have yielded these equity curves:
Day of the Week
Before you jump in, keep in mind that this sort of market can change character very quickly, especially after a big bubble pop. Also consider the fees: Mt. Gox, the most popular exchange, charges an obscene 120 basis points per roundtrip. There are some brokers that will allow you to short bitcoins, and there even appear to be some thinly-traded options and currency futures available…I imagine there are gigantic inefficiencies in the pricing of these instruments (though their legality is probably questionable).
1 Comment »Model risk is the risk that a model is, or will become, unable to perform the tasks it was designed to do. In terms of trading, this can be the risk that a set-up stops working, the risk that a variable loses its predictive power, etc. Ever-changing market conditions mean that model risk is a significant issue for most systematic traders: managing it is an integral part of adapting to new market environments.
Two heuristic rules are commonly used to handle this risk: drawdown-based position sizing, and a maximum drawdown cutoff. The former involves reducing exposure depending on drawdown (e.g. position sizes will be halved below 10% drawdown); the latter technique simply stops the strategy if it ever reaches a specified drawdown cutoff.
To investigate the efficacy of these rules, I’m going to use a simple Monte Carlo approach. The basic strategy has returns drawn from a normal distribution with mean 0.20% and standard deviation 1.5%. Model risk is represented by a small chance (0.05% per cycle) that the returns distribution will permanently change to having a mean of -0.05%.
The rules for dealing with the risk are as follows: equity curve-based position sizing will decrease positions by 25% if the drawdown is below 5%, and by 50% if the drawdown is below 10%. The cutoff simply stops trading if the drawdown ever reaches 25%.
Running 10,000 simulations with 1,000 steps each, the results are shown below:
The first thing to note is the obvious fact that, without model risk, these heuristics have a negative effect on risk-adjusted returns. Yes, maximum drawdowns are decreased on average, but at an unacceptable cost to returns. In the case of equity curve-based position sizing, the average drawdown is deeper and longer as well. The lesson should be obvious a priori but deserves to be stated anyway: if you are confident that a strategy will continue to work well in the future, you should abandon such rules.
Note that these results assume that the returns distribution remains constant; some real-world strategies such as trend following futures exhibit higher than average returns after drawdowns, so decreasing exposure at those times would be even more hurtful. The inverse may be true of other strategies.
Things change when we look at the results after including model risk. Both heuristics improve risk-adjusted returns, with the drawdown cutoff being particularly effective. While equity curve-based sizing improves on the vanilla case, it actually harms returns when combining it with the cutoff. This is presumably because the cutoff already takes care of all the failed strategies (and even more: while 38.5% of strategies failed, 42.6% of them hit the drawdown limit) and the variable sizing only serves to hurt the healthy ones.
Setting the drawdown limit for each particular strategy is a bit trickier. The maximum drawdown of a backtest should serve as a guide. This can be augmented either by assuming normal returns and using the results in On the Maximum Drawdown of a Brownian Motion, or through Monte Carlo simulation.
In the real world there are, of course, infinite states between the model working perfectly and it not working at all, so one must leave some room for deterioration and temporary changes by widening the cutoff point a bit. Finally, a more rigorous approach would perhaps use some sort of regime change detection and stop trading when the mean of the returns is determined to be below a hurdle, at a particular level of confidence.
No Comments »Jaffray Woodriff, who runs QIM, a highly successful systematic fund, has provided enough details about his data mining approach in various interviews (particularly the one in the excellent book Hedge Fund Market Wizards) that I think I can approximate it. Even though QIM has been lagging a bit the last few years, they have an excellent track record, so their approach is certainly worthy of imitation if possible. They trade commodities, currencies, etc. so the approach seems to be highly portable. And while they suffer from significant price impact issues (not to mention being forced into longer holding periods) due to their size, a small trader could probably do far better with the same strategies.
Introduction
The approach, as much as he has detailed it, goes as follows:
- Generate random data.
- Mine it for trading strategies.
- The best strategies resulting from the random data are now the benchmark that you have to beat using real data.
- Mine the real data, discard anything that isn’t better than the best models from the random data (this ensures that you have found an actual edge despite the excessive mining).
- Use cross validation to more accurately estimate the performance of the models and avoid curve fitting.
- Test the model out of sample, and retain it if it performs reasonably well compared to the in-sample results.
The point is essentially to generate an environment in which we know that we have no edge whatsoever, mine the data for the best possible results, and then use those as a benchmark that we have to clear in order to prove that an edge exists in the real data.
What they do after this is also quite interesting and important: checking the correlation between the newly discovered models and the models they already use. This ensures that any new “edge” they incorporate is a novel one and not simply a copy of something they already have. Supposedly this approach has yielded over 1500 different signals which they then use to trade, on medium-term horizons (if I remember correctly their average holding period is roughly one week). The issue of combining the predictions of 1500 signals into a decision to trade or not trade is beyond the scope of this post, but it’s a very interesting “ensemble model” problem.
It is clear that the approach requires not only rigorous statistical work, but also tons and tons of computing power (the procedure is highly parallelizable however, so you can just throw hardware at it to make it go faster). One potentially interesting way of tempering that requirement would be using genetic algorithms instead of brute force to search for new strategies. There are tricky issues with that approach, though: constructing the genome so that it can describe all possible trading models we want to look at, for example. How does one encode a wide array of chart patterns in a genome? There do not seem to be obvious/intuitive solutions.
Generating random data sets
There are several issues that have to be looked at here. Do we randomly sample the real data or do we use the parameters of that data and plug it into a known statistical distribution to generate completely new numbers? How many times do we repeat this procedure? In either case we are bound to lose some features of real financial time series, but this is probably a good thing since those features may result in true exploitable edges. It is important to generate a healthy number of data series. Some are simply going to be “better” than others for any one particular trading model, so testing over a single randomly generated series is not enough.
In general we want at least the semblance of a “real” data series. As such we can’t simply select random OHLC data; it would just result in a nonsensical time series with giant gaps all over the place. Instead I will use the following procedure:
- Start by selecting a random day’s OHLC points. This forms our first data point.
- Select any random day, and compute the day’s (close to close) percentage return from the previous day.
- Use this value to generate the next fake closing price.
- From that same (real) day, calculate the OHL prices in terms relative to the closing price.
- Use those relative prices to generate the fake OHL prices.
I find this approach gives rather good results, producing series that look realistic and give the appearance of trends, different volatility regimes, etc.
The models
Naturally I can’t test the billions upon billions of models that they test at QIM, and taking the model-agnostic approach is currently beyond my abilities. I can kind-of get around the issue by testing a very narrow range of models: moving average crossovers (another simple and interesting thing to test would be 1/2 day candlestick patterns). This still leaves a significant number of parameters to test:
- The type of moving average to use (simple, exponential, or Hull)
- The length of each moving average.
- The values that the moving averages will be based on (open, high, low, or close).
- The holding period. I’ll be using a technical entry, but a partially time-based exit. This may or may not be a good idea, but I’m running with it.
- Trend-following vs contrarian (i.e. trade in the direction of the “fast” moving average or against it).
Evaluating the results
An important question remains: what metric do we use to evaluate the models? The use of cross validation presents unique problems in performance measurement, and we have to take these into account from this stage, because these results will be used for comparison to the real ones later on.
Drawdown is a problematic measure because drawdown extremes tend to be rare. When dividing a set into N folds for cross validation, a set of parameters may be rejected simply because a certain period generated a high drawdown, despite this drawdown being consistent with long-term expectations.
Another issue arises with the use of annualized returns: they may be rather meaningless if the signal fires very frequently. If what we care about is short-term predictability, it may be more prudent to look at average daily returns after a signal, instead of CAGR. This could also be ameliorated by taking trading costs into account, as weak but frequent signals would be filtered out.
In the end, many of these decisions depend on the trader’s choice of style. Every trader must decide for him or her self what risks they care about, and in what proportion to each other. As an attempt at a balanced performance metric, I will be using my Trading System Consistency, Drawdown, Return Asymmetry, Volatility, and Profit Factor Combination Metric (or TRASYCODRAVOPFACOM for short), which is calculated as follows:
St. Dev. is the annualized standard deviation of daily returns, and the profit factor is calculated based on daily returns.
The TRASYCODRAVOPFACOM still has weaknesses: a set of parameters may pick only a tiny amount of trades over the years. If they’re successful enough, it can lead to a high score but a useless signal. To avoid this I’ll also be setting the minimum number of trades to 100, a reasonable hurdle given the 17 years long sample.
The random return results
Using a brute force approach, I collected approximately 704,000 results from 5 randomly generated series. It took several hours on my overclocked i5-2500K, so it’s definitely not a viable “real-world” approach (I am a terrible programmer, so some of the slowness is of my own making). The results look like you’d expect them to, with a few outliers at the top and to bottom:
Here are the best values achieved:
Note that this isn’t a “universal” hurdle: it’s a hurdle for this specific subset of moving average signals, on the GBPUSD pair. I am certain that a wide array of signals and data would generate higher hurdles.
Genetic Algorithm?
Brute force takes ages, even for just 5 return series, which is far too low to draw any conclusions. Are there any faster ways than brute force to find the best possible results from our random data? If this were a “normal” dataset, I would say yes, of course! However I was not sure about this case due to the randomly generated data that we are dealing with.
If the data is random, does it follow that the optimal strategy parameters are also randomly distributed? Are they uniformly distributed or are there “clusters” that, due to somehow exploiting the structure of the time series, perform better or worse than the average? The question is: is the performance slope around local maxima smooth, or not? A simple method to make this thing go faster is to throw the problem into a genetic algorithm, but a GA will offer no performance improvement if the performance is uniformly randomly distributed.
Testing this is simple: I just ran a GA search on the same 5 series I brute forced above. If the GA results are similar to the brute force results, we can use the GA and save a lot of time. As long as there are enough populations, and they are large enough (I settled on 4 populations with 40 chromosomes each), the results are “close enough”: roughly 3-20% lower than the brute force (max CAGR was 7.043%, max avg. daily return was 0.176%). It might be a good idea to scale the GA results by, say, an additional 10-20% in order to make up for this deficit.
I then generated 100 series and put the GA to use. Here are the results:
And here are the distributions of maximum values achieved for each individual series:
These results have set the bar rather high. One might imagine that throwing out everything below this hurdle will leave us with very little (nothing?) in the end. But if Woodriff is to be believed, he has found upwards of 1500 signals that perform better than the hurdle (and that’s 1500 signals that were uncorrelated enough with each other that they were added to their models). So there’s got to be a lot of interesting stuff to find!
In part 2 I will take a look at cross validation and what we can do with the real data.
20 Comments »The VXV is the VIX’s longer-term brother; it measures implied volatility 3 months out instead of 30 days out. The ratio between the VIX and the VXV captures the differential between short-term and medium-term implied volatility. Naturally, the ratio spends most of its time below 1, typically only spiking up during highly volatile times.
It is immediately obvious by visual inspection that, just like the VIX itself, the VIX:VXV ratio exhibits strong mean reverting tendencies on multiple timescales. It turns out that it can be quite useful in forecasting SPY, VIX, and VIX futures changes.
Short-term extremes
A simplistic method of evaluating short-term extremes is the distance of the VIX:VXV ratio from its 10-day simple moving average. When the ratio is at least 5% above the 10SMA, next-day SPY returns are, on average, 0.303% (front month VIX futures drop by -0.101%). Days when the ratio is more than 5% below the 10SMA are followed by -0.162% returns for SPY. The equity curve shows the returns on the long side:
Long-term extremes
When the ratio hits a 200-day high, next-day SPY returns have been 0.736% on average. Implied volatility does not fall as one might expect, however.
More interestingly, the picture is reversed if we look at slightly longer time frames. 200-day VIX:VXV ratio extremes can predict pullbacks in SPY quite well. The average daily SPY return for the 10 days following a 200-day high is -0.330%. This is naturally accompanied by increases in the VIX of 1.478% per day (the front month futures show returns of 1.814% per day in the same period). It’s not a fail-proof indicator (it picked the bottom in March 2011), but I like it as a sign that things could get ugly in the near future. We recently saw a new 200-day high on the 19th of December: since then SPY is down approximately 1%.
This is my last post for the year, so I leave you with wishes for a happy new year! May your trading be fun and profitable in 2013.
3 Comments »Various holiday effects are well documented for developed countries’ stock markets, typically showing abnormal returns around thanksgiving, Christmas, New Year, and Easter. Do similar effects exist in the Chinese stock market? In this post I’ll take a look at returns to the Shanghai Composite Index (SSECI) during the days surrounding the following holidays: New Year, Chinese New Year, Ching Ming Festival, Labor Day, Tuen Ng Festival, Mid-Autumn Festival. The index only has 22 years of history, so statistical significance is difficult to establish. Despite this, I believe the results are quite interesting^{1}.
The charts require a bit of explanation: the error bars are 1.65 standard errors wide on each side. As such, if an error bar does not cross the x-axis, the returns on that day are statistically significantly different from zero at the 5% level (by way of a one-tailed t-test). The most interesting holidays are the New Year, Chinese New Year, and Ching Ming Festival, all of which have several days of quite high returns around them.
New Year
Chinese New Year
Ching Ming Festival
The Ching Ming Festival occurs 15 days after the vernal equinox, which is either April 4th or April 5th.
Labor Day
Tuen Ng Festival
The Tuen Ng Festival (A.K.A. Dragon Boat Festival) occurs on the 5th day of the 5th lunar month in the Chinese calendar.
Mid-Autumn Festival
The Mid-Autumn Festival falls on the 15th day of the 8th lunar month.
Bonus: Day of the Month Effects
Since we’re looking at seasonality effects, why not the day of the month effect as well? Using the walk-forward methodology as in my previous day of the month effect posts (U.S., Europe, Asia), here are the results for the Shanghai Composite Index:
Finally, the average returns for each day of the month over the last 5000 days:
The standard turn of the month effect seems to be present, but only for the first days of the month instead of the last and first days.
And with that, I’d like to you wish you all happy holidays! In eggnog veritas.
1 Comment »I’m writing a paper on the IBS effect, but it’s taking a bit longer than expected so I thought I’d share some of the results in a blog post. The starting point is a paper by Levy & Lieberman: Overreaction of Country ETFs to US Market Returns, in which the authors find that country ETFs over-react to US returns during non-overlapping trading hours, which gives rise to abnormal returns as the country ETFs revert back the next day. In terms of the IBS effect, this suggests that a high SPY IBS would lead to over-reaction in the country ETFs and thus lower returns the next day, and vice versa.
To quickly recap, Internal Bar Strength (or IBS) is an indicator with impressive (mean reversion) predictive ability for equity indices. It is calculated as follows:
Using a selection of 32 equity index ETFs, let’s take a look at next-day returns after IBS extremes (top and bottom 20%), split up by SPY’s IBS (top and bottom half):
The results were the exact opposite of what I was expecting. Instead of over-reacting to a high SPY IBS, the ETFs instead under-react to it. A high SPY IBS is followed by higher returns for the ETFs, while a low SPY IBS is followed by lower returns. These results suggest a pair approach using SPY as the first leg of the pair, and ETFs at IBS extremes as the other. For a dollar-neutral strategy, the rules are the following:
- If SPY IBS <= 50% and ETF IBS > 80%, go long SPY and short the other ETF, in equal dollar amounts.
- If SPY IBS > 50% and ETF IBS < 20%, go short SPY and long the other ETF, in equal dollar amounts.
The results^{1}:
The numbers are excellent: high returns and relatively few trades with a high win rate. Let’s take a look at the alphas and betas from a regression of the excess returns to the pair strategy, using the Carhart 4 factor model:
On average, this strategy generates a daily alpha of 0.037%, or 9.28% annually, with essentially zero exposure to any of the factors. Transaction costs would certainly eat into this, but given the reasonable amount of trades (about 23 trades per year per pair on average) there should be a lot left over. The fact that over 90% of days consist of zero excess returns obscures the features of the actual returns to the strategy. Repeating the regression using only the days in which the strategy is in the market yields the following results:
Unfortunately, these results are pretty much a historical curiosity at this point. Most of the opportunity has been arbitraged away: during the last 4 years the average return per trade has fallen to 0.150%, less than half the average over the entire sample. The parameters haven’t been optimized, so there may be more profitable opportunities still left by filtering only for more extreme values, but it’s clear that there is relatively little juice left in the approach.
In fact if we take a closer look at the differences between the returns before and after 2008, the over-reaction hypothesis seems to be borne out by the data (another factor that may be at play here are the heightened correlations we’ve seen in the last years): low SPY IBS leads to higher next-day returns for the ETFs, and vice versa.
The lesson to take away from these numbers is that cross-market effects can be very significant, especially when global markets are in a state of high correlation. Accounting for the state of US markets in your models can add significant information (and returns) to your IBS approach.
8 Comments »I recently got a mail from IB touting their new lineup of equity index CFDs. As I trade a lot of equity index ETFs I thought I’d take a look at them, in case I could get away with lower margins and/or commissions. Here’s a quick summary of what they offer:
Pros:
- Granularity compared to futures, minimum size $1 x index value (for U.S. indices).
- Reasonable commissions.
- Low margin requirements compared to ETFs.
Cons:
- No MOC/LOC orders, or any way to not pay the spread.
- Local regular trading hours only, sometimes not even that. Can’t trade foreign index CFDs at the end of the US session.
- Trade in local currency, conversion costs if trading foreign indices.
- Pretty low size limits for any one order.
- Interest amounts to an additional cost of ~0.004% per day held.
- Found fills to be somewhat haphazard, even for small order sizes.
Assuming $0.005 per share in commissions and $0.01 in slippage for ETFs, and $1.64 in commission and 1 tick in slippage for futures, here’s how they stack up against each other:
S&P 500 | |||
CFD | ETF (SPY) | Futures (ES) | |
Market (1 tick slippage) | 0.023% | 0.011% | 0.020% |
Limit/On Close (no slippage) | N/A | 0.004% | 0.002% |
NASDAQ 100 | |||
CFD | ETF (QQQ) | Futures (NQ) | |
Market (1 tick slippage) | 0.019% | 0.023% | 0.012% |
Limit/On Close (no slippage) | N/A | 0.008% | 0.003% |
The costs are similar for all three instruments if you’re doing market orders, though CFDs can of course get costlier if held for longer periods of time. The main advantage of CFDs is the ability to control size compared to futures which are not particularly granular. For that privilege you give up the flexibility of trading outside RTH, which can be a significant disadvantage if you want to place protective stops, or want to trade foreign indices at the end of the day.
Another alternative, of course, are leveraged ETFs, which tend to offer a lower commission per unit of exposure, but have other costs associated with them: spreads for ETFs such as QLD (2x) and TQQQ (3x) are generally at 2-3 cents, management fees of around 1% p.a. (or about 0.0027% per day), as well as potential rebalancing costs. They also do not offer any advantage in terms of margin.
No Comments »Introduction
A quick intro to VIX ETPs (some are ETFs, others are ETNs)^{1} before we get to the meat: the VIX itself is not tradable, only futures on the VIX are. These futures do not behave like equity index futures which move in lockstep with the underlying, for a variety of reasons. A way to get exposure to these futures without holding them directly, is by using one or more VIX futures-based ETPs. These come in many varieties (long/short, various target average times to expiration, various levels of leverage).
The problem with them, and the reason they fail so badly at mirroring movements in the VIX, is that they have to constantly roll over their futures holdings to maintain their target average time to expiration. A 1-month long ETP will be selling front month futures and buying 2nd month futures every day, for example. Front month futures are usually priced lower than 2nd month futures, which means that the ETP will be losing value as it makes these trades (it’s selling the cheap futures and buying the expensive ones), and vice versa for short ETPs. This amounts to a transfer of wealth from hedgers to speculators willing to take opposite positions; this transfer can be predicted and exploited.
I’ll be looking at two such VIX futures-based instruments in this post: VIXY (which is long the futures), and XIV (which is short the futures). As you can see in the chart below, while the VIX is nearly unchanged over the period, VIXY has lost over 80% of its value. At the same time, XIV is up 50% (though it did suffer a gigantic drawdown).
There are many different approaches to trading these ETPs (for example Mike Brill uses realized VIX volatility to time his trades). The returns are driven by the complex relationships between the value of the index, the value of the index in relation to its moving average, the value of the futures in relation to the index, and the value of various future maturities in relation to each other. These relationships give rise to many strategies, and I’m going to present two of them below.
I’ll be using different approaches for the long and short sides of the trades. Short based on the ratio between the front and 2nd month contract, and long using the basis. Here are the rules:
Go long XIV at close (“short”) when:
- 2nd month contract is between 5% and 20% higher than the front month contract.
Go long VIXY at close (“long”) when:
- Front month future is at least 2.5% below the index.
Results
First let’s have a look at how these strategies perform without the hedge. Using data from January 2011 to November 2012, here are the daily return stats for these two approaches individually and when combined:
Equity curves & drawdowns:
The biggest issues with VIX ETN strategies are large drawdowns, and large sudden losses when the VIX spikes up (and to a lesser extent when it spikes down; these tend to be less violent though). A spike in implied volatility is almost always caused by large movements in the underlying index, in this case the S&P 500. We can use this relationship in our favor by utilizing SPY as a hedge.
- When long XIV, short SPY in an equal dollar amount.
- When long VIXY, go long SPY in an equal dollar amount.
The stats:
And the equity curves & drawdowns:
The results are quite good. The bad news is that we have to give up about 40% of CAGR. On a risk-adjusted basis, however, returns are significantly improved.
- CAGR / St. Dev. goes from 38.7 to 45.9.
- CAGR / Max Drawdown goes from 4.5 to 4.9.
All risk measures show significant improvement:
- The worst day goes from a painful -12% to a manageable -9%.
- Maximum drawdown goes from -36.5% to -25.7%.
- Daily standard deviation goes from 4.28% to 2.76%.
Of course, just because risk-adjusted returns are improved does not mean it’s necessarily a good idea. Holding SPY results in both direct costs (commissions, slippage, shorting costs) as well as the opportunity cost of the capital’s next-best use. The improvement may not be enough to justify taking away capital from another system, for example.
Another possibility would be to implement this idea using options, which have the benefit of requiring a small outlay. Especially when holding XIV, SPY puts could be a good idea as both implied volatility and price would move in our direction when the VIX spikes up. However, this must be weighted against theta burn. I don’t have access to a dataset for SPY options to test this out, unfortunately (anyone know where I can get EOD options data that is not absurdly expensive?).
If you want to play around with the data yourself, you can download the spreadsheet here.
8 Comments »Just a short post today. Jack Damn has been tweeting about consecutive up/down days lately, which inspired me to go looking for a potential edge. The NASDAQ 100 has posted 6 consecutive up days as of yesterday’s close. Is this a signal of over-extension? Unfortunately there are very few instances of such a high number of consecutive up days, so it’s impossible to speak with certainty about any of the numbers. Let’s take a look at QQQ returns (including dividends) after X consecutive up or down days:
There’s no edge on the short side here if you’re looking for a mean reversion trade. Looking out over longer horizons, it seems like many consecutive up days tend to be followed by above-average returns. Another argument in favor of the idea that the bottom of the current pullback has been reached, but nothing really useful in terms of actual trading.
Finally, a look at the probability of positive returns following X consecutive up or down days:
UPDATE: Here’s the spreadsheet.
7 Comments »
Recent Comments