Part 1 covered the relation between VIX/WVF extreme movements and SPY; here we take a wider look, covering a large number of international equity ETFs.

The main idea behind WVF is that it acts similarly to VIX in high-volatility situations, possibly enough to serve as an implied volatility substitute in cases where such an index does not exist. It can be useful to “confirm” signals based on implied volatility, or to replace them completely in cases where no implied volatility index exists. First of all let’s take a look at an updated VIX & SPY WVF chart:

The first post was a while ago, so let’s check how VIX and WVF have performed for SPY since then. The number of signals is very small, and WVF alone has underperformed compared to its historical results, but once again we see that the combination of VIX and WVF offered by far the best results:

Let’s take a look at how these signals work internationally:

It’s clear that using only VIX is pretty useless. Overall the returns are not significantly different from zero, and are even negative in many cases. Let’s check out WVF, which appears to work far better across most ETFs:

Finally, when VIX and WVF extreme movements coincide, the results look fantastic:

Note that even in cases where WVF alone did not show good results (the VT and WVF ETFs for example), combining VIX and WVF still results in great improvement. There is an important, general lesson here about using non-price data as trade set-ups. With few exceptions, implied volatility, breadth, seasonality, etc. need to be “confirmed” by price to actually be useful.

Half a year ago I posted about writing my own backtesting platform. While it has been even more challenging than I thought it would be, it’s going well: about 95% of “core” functionality has been implemented. Early on I realized I should design a completely separate, centralized, data management system that I could use with all my trading applications.

The QUSMA Data Management System (QDMS) works as a centralized data access point: it connects clients to external historical/real time data sources, manages metadata on instruments, and also provides local storage for historical data.

I was heavily influenced by the MultiCharts approach, though my own system is of course a bit less complex. I based a lot of the instrument metadata management as well as some of the UI design on the MC “QuoteManager” application as I think their approach is quite intuitive.

The system is designed in a modular fashion so it’s trivial to add additional data sources (as well as alternative local storage mechanisms…if I ever start storing tick data I will have to move away from my current relational database storage mechanism). The interfaces for writing external data source modules are very simple right now:

A couple screenshots of the server interface:

There’s also the client side of things, here’s the interface for selecting data series in the backtester:

The client/server approach lets multiple clients use the same data stream. For example, if computations are distributed over multiple boxes and each client needs access to the same real time data, only a single connection to the external data source is required: the data is then distributed by the broker to every client that has requested that stream.
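As a rough illustration of the fan-out idea, here is a minimal in-process sketch in Python. It is not the QDMS code (which uses ZeroMQ sockets between processes); the `DataBroker` class, its method names, and the callback-based clients are all hypothetical and only show how one upstream connection per stream can serve any number of clients.

```python
class DataBroker:
    """Toy broker: one upstream connection per symbol, many subscribers.

    `connect_upstream` is a hypothetical callable that opens the external
    real-time stream for a symbol; it is invoked at most once per symbol.
    """

    def __init__(self, connect_upstream):
        self._connect_upstream = connect_upstream
        self._subscribers = {}  # symbol -> list of client callbacks

    def subscribe(self, symbol, callback):
        # Only the first request for a symbol opens the external connection.
        if symbol not in self._subscribers:
            self._subscribers[symbol] = []
            self._connect_upstream(symbol)
        self._subscribers[symbol].append(callback)

    def on_bar(self, symbol, bar):
        # Called when the external source delivers data; fan out to clients.
        for callback in self._subscribers.get(symbol, []):
            callback(bar)
```

Two clients subscribing to the same symbol trigger a single call to `connect_upstream`, and both receive every bar passed to `on_bar`.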

There is also the ability to push data into the local storage. One possible use for this is saving results from a backtest, then using that equity curve as a benchmark in a performance evaluation application.

I’m probably going to open source this project eventually, but right now I’m using a couple of proprietary libraries that prevent me from distributing it. It’ll take a bit of work to “disentangle” those bits. In any case I’m striving to comment well and write in a good style so that opening up the code will be relatively painless.

I learned a ton writing the QDMS because it was an opportunity to use a bunch of interesting libraries and technologies that I had never touched before: ZeroMQ, Protocol Buffers, the Entity Framework, WPF, NLog, and Reactive Extensions. I was amazed at the performance of ZMQ: out of the box, in a simple test using a single socket and a single thread, it managed to transfer nearly 200 OHLC bars per millisecond.

There’s still a bit of work to be done: one major issue is that there is no way to construct lower-frequency bars from higher-frequency data (e.g. daily bars made from 1-minute data), and only time-based bars are possible. The biggest missing piece however is generating continuous futures data. It’s a much harder problem than it seems at first glance because it’s necessary to incorporate a great deal of flexibility both in terms of the futures expiration rules and the rollover rules.
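To make the bar-construction problem concrete, here is a minimal Python sketch of building daily bars from intraday ones. The tuple layout and the function name are assumptions for illustration, and it simply groups by calendar date, which would not be good enough for sessions that cross midnight.

```python
from datetime import datetime


def aggregate_to_daily(bars):
    """Collapse intraday OHLCV bars into one bar per calendar date.

    `bars` is an iterable of (timestamp, open, high, low, close, volume)
    tuples sorted by timestamp. This layout is hypothetical.
    """
    daily = {}  # date -> [open, high, low, close, volume], insertion-ordered
    for ts, o, h, l, c, v in bars:
        day = ts.date()
        if day not in daily:
            daily[day] = [o, h, l, c, v]  # first bar of the day sets the open
        else:
            bar = daily[day]
            bar[1] = max(bar[1], h)  # running high
            bar[2] = min(bar[2], l)  # running low
            bar[3] = c               # last bar seen sets the close
            bar[4] += v              # volume accumulates
    return [(day, *values) for day, values in daily.items()]
```

The same loop works for any time-based grouping by swapping `ts.date()` for a different bucketing key.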

I haven’t done any actual research in quite a while because I’ve been preoccupied with coding but I’ll be back soon! I’ve been accumulating a giant backlog of ideas that are waiting to be tested. Hopefully my new tools will be good enough to give some special insights. In any case, I can’t wait to get started.

Volatility- and drawdown-adjusted returns are the most commonly used values to judge the performance of a trader or a backtest. However, neither of those truly measures the consistency of the returns. Long periods of low-volatility, sideways movement in an equity curve are obviously undesirable, but are not shown in the Sharpe or MAR ratios. Instead, we need to look at specialized consistency (or “straightness”) metrics.

There are of course some “standard” straightness metrics. R-squared is the most popular, and it works pretty well. I like to raise it to the 4th power or so in order to magnify small differences and make it a bit more “readable”. Another popular metric is the K-Ratio, of which there are at least 3 different versions floating around. The K-Ratio also takes returns into account, so it’s not purely a straightness measure. I prefer the Zephyr version which is calculated as the slope of the equity curve divided by its standard error.
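For reference, a small Python sketch of these two numbers. It regresses the equity values on the observation index with ordinary least squares; whether you feed it raw or log equity is up to you, and the function and key names are my own for illustration. The K-Ratio here follows the description above (slope divided by its standard error) and does not apply any of the extra scaling some other versions use.

```python
import math


def straightness_stats(equity):
    """R-squared, R-squared^4, and slope / standard error for an equity curve."""
    n = len(equity)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean_x = (n - 1) / 2
    mean_y = sum(equity) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(equity))
    syy = sum((y - mean_y) ** 2 for y in equity)

    slope = sxy / sxx
    sse = max(syy - slope * sxy, 0.0)  # residual sum of squares
    r2 = 1 - sse / syy if syy > 0 else float("nan")
    slope_se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope

    if slope_se > 0:
        k_ratio = slope / slope_se
    elif slope == 0:
        k_ratio = 0.0
    else:
        k_ratio = math.copysign(float("inf"), slope)  # perfectly straight line

    return {"r2": r2, "r2_pow4": r2 ** 4, "k_ratio": k_ratio}
```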

Let’s see if we can construct some alternatives. To start out, we need a benchmark to measure straightness against. That is the “ideal line”. It is the straight line that connects the first and last points of the equity curve. The further away the equity curve is from the ideal, the less desirable it is.

There are certain obvious principles we can derive from this simple analysis:

- We want to minimize the area of deviation from the ideal line.
- The further away we are from the ideal, the worse (non-linearly).
- Being below the ideal is worse than being above it.
- Being below the ideal for long periods of time is undesirable.

We can easily quantify these ideas into a useful measure of equity curve straightness by using numbers such as the total area of deviation from the ideal, the volatility of the deviation, the length of time spent below the ideal, etc.
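A minimal Python sketch of a few such numbers follows. The specific statistics, their names, and the choice to standardize by the total change in equity are assumptions made for illustration; they are not the exact inputs of any metric defined in this post.

```python
import math


def ideal_line(equity):
    """Straight line joining the first and last points of the curve."""
    n = len(equity)
    step = (equity[-1] - equity[0]) / (n - 1)
    return [equity[0] + step * i for i in range(n)]


def deviation_stats(equity):
    """Simple deviation-from-ideal statistics, scaled by the curve's total move."""
    dev = [e - i for e, i in zip(equity, ideal_line(equity))]
    n = len(dev)
    scale = abs(equity[-1] - equity[0]) or 1.0  # avoid dividing by zero

    area_above = sum(d for d in dev if d > 0)
    area_below = -sum(d for d in dev if d < 0)
    mean_dev = sum(dev) / n
    volatility = math.sqrt(sum((d - mean_dev) ** 2 for d in dev) / n)

    return {
        "mean_abs_dev": (area_above + area_below) / n / scale,
        "mean_abs_dev_below": area_below / n / scale,
        "dev_volatility": volatility / scale,
        "fraction_below": sum(1 for d in dev if d < 0) / n,
        # squaring penalizes large shortfalls more than proportionally
        "squared_shortfall": sum(d * d for d in dev if d < 0) / n / scale ** 2,
    }
```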

An interesting heuristic to look at is the number of times the equity curve crosses the ideal line. The closer the equity tracks the ideal line, the more times it will cross it. This metric fails in idealized tests, but works well in real-world scenarios. It also tends to fail when there are few trades in the sample. Divide the number by the total number of observations in the sample to standardize it.
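The crossing count can be sketched in a few lines of Python. Treating points that sit exactly on the line as "no information" (rather than as a cross) is my own assumption; the function name is hypothetical.

```python
def crossing_rate(equity):
    """Number of crossings of the ideal line, divided by the sample size."""
    n = len(equity)
    step = (equity[-1] - equity[0]) / (n - 1)

    signs = []
    for i, value in enumerate(equity):
        deviation = value - (equity[0] + step * i)
        if deviation > 0:
            signs.append(1)
        elif deviation < 0:
            signs.append(-1)
        # points exactly on the ideal line are skipped (assumption)

    crosses = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return crosses / n
```

A perfectly straight curve never leaves the line, so it scores zero, which is exactly the idealized failure case mentioned above.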

A similar metric is the average drawdown length. Perhaps it is even more useful because presumably long deviations below the ideal are more important than deviations above it, and the number of crosses does not differentiate between the two.

Some other numbers I think may be interesting: the ratio between the area of difference above and below the ideal, the volatility of the difference, the volatility of the difference below the ideal, average absolute deviation, and average absolute deviation below the ideal (both standardized to the magnitude of the curve).

I created a metric that arbitrarily and haphazardly combines some of the above concepts, and I’m calling it the **Q**USMA **E**quity **C**urve **S**traightness, **D**ownward **D**eviation, and **S**tability **M**easure (QECSDDSM, pronounced /keɪks-du-sʌm/). It is intended purely as a straightness measure, and does not take into account returns or the slope of the equity curve. It is calculated as follows (see the Excel file at the bottom to make sense of it):

Let’s take a look at some extreme examples:

First of all, note that both the Sharpe ratio and the MAR ratio would select the “wrong” strategy if they were used naively: they both prefer Series 3 & 4 over Series 2. Both the K-Ratio and QECSDDSM correctly prefer the first two. Note that the number of crosses is a useless metric here because the most perfect line has very few of them, simply due to being “too straight”. The ratio of the areas above and below the idealized line is not very useful in these scenarios because they are so extreme.

In general most of the numbers roughly agree with each other in terms of ordering the curves from best to worst, so the actual formulation of QECSDDSM doesn’t really matter all that much.

Let’s look at a slightly more realistic assortment of equity curves:

In this case the intuition behind the number of crosses metric becomes obvious. Interestingly QECSDDSM is the only metric to prefer Series 4 to Series 3, which I think is undesirable. Series 4 highlights a problem with the metrics that measure volatility or focus on the area below the ideal: simply having very few trades “gets around” them and produces an overly-high score. Again the Sharpe and MAR ratios produce an “incorrect” ranking by preferring Series 3 to Series 2. The difference mainly comes from the fact that the curve is not very volatile and does not spend a lot of time below the ideal. Some fine tuning of the parameters should smooth things out pretty easily, though.

Another potentially interesting approach to the issue would be to do some sort of regime change detection on the returns (here’s one simple approach). A straight curve will obviously have fewer changes in the average of the returns.

Finally, here’s an excel file that you can play around with.

The .NET ecosystem is rich with excellent, free libraries that cover pretty much everything you need when writing trading software. So here’s a collection of libraries I use in my applications, mostly focused on stats, math, and machine learning but also including time handling, data structures, and calendars:

### Accord.NET

Based on the Aforge.NET library, it offers tons of useful stuff for traders: matrices, descriptive statistics, probability distributions, optimization methods, regression, PCA, as well as a wide array of machine learning algorithms. I use it all the time.

### ILNumerics

Tons of useful math and stats functions, matrices & linear algebra (very fast), PCA, regression, unsupervised machine learning.

### Math.NET Numerics

Probability distributions & random number generation, linear algebra, simple statistical analysis, various useful math functions.

### Wintellect Power Collections

Various extremely useful data structures: double ended queue, dictionary with multiple values per key, red-black tree, ordered dictionary/list.

### QLNet

A port of QuantLib to C#. It’s not just for derivatives: there’s a lot of useful stuff in here, such as calendars with holidays for a very wide array of markets. There’s also NQuantLib, which I haven’t tried.

### Noda Time

Time handling done right. You need this.

### R.NET

Lets you use R from your .NET applications. Slow, buggy as hell, hard to work with, but sometimes it’s very useful to have access to some of the more obscure/specialized R libraries.

### ALGLIB

Regression, some machine learning, PCA, optimization, linear algebra, and simple hypothesis testing.

There are several commercial options available as well, such as Extreme Optimization and NMath. I haven’t used either of them so I can’t comment on their quality.

I finally finished the first draft of my IBS paper. The results are quite interesting and extremely relevant if you trade equity ETFs. You can read it here.

### Abstract:

I investigate mean reversion in equity ETF prices at the daily frequency by employing a simple technical indicator, Internal Bar Strength (IBS). IBS is based on the position of the day’s close in relation to the day’s range. I use it to forecast close-to-close returns with statistically and economically significant results for most instruments. A simple strategy based on IBS generates an average alpha of over 30% p.a. before transaction costs. I show that equity index ETFs have had strong and consistent mean reverting tendencies since the 90s, and that these effects can be exploited as part of a profitable trading strategy. The IBS effect is stronger during times of high volatility, in bear markets, after high-range days, after high-volume days, and early in the week.
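For readers who have not seen the indicator before, IBS is the close's position within the day's range, scaled to run from 0 (close at the low) to 1 (close at the high). A minimal Python sketch; returning a neutral 0.5 for a zero-range bar is my own assumption for illustration.

```python
def ibs(high, low, close):
    """Internal Bar Strength: (close - low) / (high - low)."""
    day_range = high - low
    if day_range <= 0:
        return 0.5  # assumption: treat a zero-range day as neutral
    return (close - low) / day_range
```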

**Feedback is highly appreciated, either in the comments below or by email to qusmablog at gmail dot com.**

Some of the interesting things you’ll find within:

Update: the comparison chart for the Australian ETF now correctly uses EWA instead of EWO (the Austrian ETF).

The second, and probably final, followup to the Mining for Three Day Candlestick Patterns post. Previously, we improved performance by adding more data to the search. In this post we’ll try to improve the system further by combining multiple predictors. The central question is how to combine the forecasts. I test averaging, weighted averaging, regression, and a voting scheme and compare them against a baseline one-predictor strategy.

### Set-Up

Combining predictors is a standard tactic in machine learning, but the case of k-NN predictors is a bit of an outlier. Typical ensemble methods depend on generating variations in the data set in order to generate different and complementary predictors (as in the cases of boosting and bagging). This doesn’t work very well with nearest neighbor predictors, however, because they tend to be insensitive to variations in the data set. So what can we vary? The choice of *k*, the choice of inputs, the choice of distance measure for the nearest neighbors, and some pre-processing options such as whether to adjust for volatility or not.

I am not going to make any variation in outputs as that’s reserved for a post of its own. The idea is pretty simple: it’s essentially a random forest with k-NN predictors instead of decision trees (here’s an interesting paper on it).

So we’re left with k, sum of absolute or sum of square distances, and volatility adjustment. I picked 10 combinations of these options:

The k values were picked at random and I’m sure it’s possible to do better by optimizing them using cross validation.

The signals obviously overlap significantly, and have similar stats when used one-by-one:

The instrument traded is SPY. Additional data is taken from the following instruments for the pattern search: EWY, EWD, EWC, EWQ, EWU, EWA, EWP, EWH, EWL, EFA, EPP, EWM, EWI, EWG, EWO, IWM, QQQ, EWS, EWT, and EWJ. The thresholds in each case are adjusted to result in a similar length of time spent in the market. Position sizing is done based on the 10-day realized volatility of SPY, as described in this post: leverage is equal to 20% divided by 10-day realized annualized standard deviation, with a maximum leverage of 200%. Finally, an IBS filter is applied that allows long positions only when IBS < 0.5 and short positions only when IBS > 0.5.
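The sizing rule and the IBS filter are simple enough to sketch in Python. The 20% target, the 200% cap, and the 0.5 IBS cutoff come from the description above; annualizing with the square root of 252 and using the sample standard deviation of simple returns are my assumptions, as are the function names.

```python
import math
import statistics


def realized_vol(closes, window=10, periods_per_year=252):
    """Annualized standard deviation of the last `window` daily returns."""
    if len(closes) < window + 1:
        raise ValueError("not enough prices for the requested window")
    recent = closes[-(window + 1):]
    returns = [b / a - 1 for a, b in zip(recent, recent[1:])]
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


def target_leverage(annualized_vol, target=0.20, cap=2.0):
    """Leverage = target / realized volatility, capped at `cap`."""
    if annualized_vol <= 0:
        return cap
    return min(target / annualized_vol, cap)


def ibs_filter(direction, ibs_value):
    """Allow longs (+1) only when IBS < 0.5 and shorts (-1) only when IBS > 0.5."""
    if direction > 0 and ibs_value < 0.5:
        return direction
    if direction < 0 and ibs_value > 0.5:
        return direction
    return 0
```

With 40% realized volatility the position is run at 50% leverage; at 10% or below, the cap of 200% binds.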

The baseline is the PF3 predictor: k = 75, square distance measure, no volatility adjustment. Here’s the equity curve:

### Averaging

The simplest approach is obviously to just average the 10 forecasts and then use the average value to generate trades. A long position is taken when the average forecast is greater than 15 basis points, and a short position when the average is smaller than -12.5 basis points. Here’s what the equity curve looks like:

It’s interesting to note that the dispersion of forecasts is inversely related to the accuracy of the average: the smaller the standard deviation of the forecasts, the more accurate they are. Unfortunately the effect is marginal and thus not particularly useful for improving the strategy.
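In code, the averaging rule amounts to very little. A Python sketch, with forecasts expressed in basis points; the function name and the choice of population standard deviation for the dispersion are assumptions for illustration.

```python
import statistics


def average_signal(forecasts_bp, long_threshold=15.0, short_threshold=-12.5):
    """Return (signal, average forecast, dispersion) for one day's forecasts.

    signal is +1 (long), -1 (short), or 0 (flat). Forecasts are in basis points.
    """
    average = sum(forecasts_bp) / len(forecasts_bp)
    dispersion = statistics.pstdev(forecasts_bp)
    if average > long_threshold:
        signal = 1
    elif average < short_threshold:
        signal = -1
    else:
        signal = 0
    return signal, average, dispersion
```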

### Weighted Averaging

A simple extension that generates slightly better stats is to weight each forecast before averaging. There’s a wide array of stats one can use here (Sharpe/Sortino/MAR ratios are obvious candidates); I picked the mean square error. The inverse of the MSE becomes the forecast’s weight, so that smaller errors result in greater weights. The same thresholds as above are used to generate signals. The weights provide a slight improvement both in terms of Sharpe and MAR ratios. The equity curve:

### Voting

Using a threshold for each forecast (>5 basis points for a “long” vote, and <-10 basis points for a “short” vote), each predictor is assigned a long or short vote. The overlap between the votes is significant, between 88% and 97% for different estimators. How many votes should we require for a trade? It quickly becomes obvious that simple majority voting isn’t enough, as only near-unanimous decisions provide worthwhile predictions. The average next-day return when there are between 1 and 8 long votes is 0.4 basis points. The average return after 9 or 10 long votes is 23 basis points.
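A minimal Python sketch of the vote count, with forecasts in basis points. The vote thresholds are the ones given above; requiring at least 9 of 10 votes in both directions is an assumption based on the "near-unanimous" observation, and the function name is hypothetical.

```python
def vote_signal(forecasts_bp, long_bp=5.0, short_bp=-10.0, min_votes=9):
    """Return +1, -1, or 0 depending on how many predictors agree."""
    long_votes = sum(1 for f in forecasts_bp if f > long_bp)
    short_votes = sum(1 for f in forecasts_bp if f < short_bp)
    if long_votes >= min_votes:
        return 1
    if short_votes >= min_votes:
        return -1
    return 0
```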

The resulting equity curve looks like this:

### Ordinary Least Squares

It’s also possible to combine the forecasts using regression, with next-day returns as the dependent variable and the k-NN predictor forecasts as the independent ones.

The distribution of forecasts with OLS is very tightly clustered around 0, and for some reason higher forecasts are not associated with higher next-day returns (as they are for the 3 methods above). I don’t really understand why this is the case. The thresholds for trades are 0.5 basis points for a long trade, and -0.5 basis points for a short trade.

An issue here is, of course, multicollinearity due to the similarity of the independent variables. This can lead to, among other problems, overfitting (which is usually characterized by very large absolute values of the coefficients). Using ridge regression solves that issue by limiting the absolute value of coefficients.

A potentially interesting idea would be to constrain the coefficients to positive values, which might lessen the overfitting effects and also make much more sense on an intuitive level (after all, we know all the forecasts are similarly accurate, so negative coefficients don’t make much sense).

### Ridge Regression

If multicollinearity is a significant problem, we can use ridge regression to solve it. It offers a significant improvement over the OLS approach, but it still fares badly compared to the one-predictor case. The same thresholds as in the OLS approach are used. Here’s the equity curve:

### Stats

Here are the stats for the single-predictor base case and all the combination methods:

All of the combination methods other than voting failed horribly. I’m not sure why, but it’s good to know. The improvement provided by the voting system is sizable, however. Not only does the voting-based strategy achieve significantly higher risk-adjusted returns, it does it while spending 15% less time in the market. Those results are also easy to improve on by simply adding more predictors. The marginal gain from each new predictor will be diminishing, but there is definitely more value to wring out of it. And this is just with 3-day patterns: we can easily add 2 and 4 day patterns into the mix as well.

### Other Possibilities

A wide array of machine learning methods can be used to combine predictions. Especially if the number of forecasts grew larger, techniques such as random forests or ANNs would be interesting to investigate. As long as simpler methods work very well I think there is little reason to increase the complexity (not to mention the opaqueness) of the strategy.

This is a followup to the Mining for Three Day Candlestick Patterns post. If you haven’t read the original post, do so now because I’m not going to repeat the basic mechanics of the strategy. While the approach was somewhat fruitful, it also had some obvious problems: it only seems to work in bearish or high volatility market regimes, and it couldn’t produce good short signals. The main idea I had to resolve these issues was simply to get more data.

That is easier said than done. Could we use mutual funds or index values to extend the dataset backwards? No, because the daily high/low values are inaccurate. The only alternative we are left with is using data from other instruments. So I picked a broad selection of equity ETFs to include: EWY, EWD, EWC, EWQ, EWU, EWA, EWP, EWH, EWL, EFA, EPP, EWM, EWI, EWG, EWO, IWM, QQQ, EWS, EWT, and EWJ.

The selection was comprehensive and unoptimized. I think you could do some sort of walk-forward optimization that picks the best combination of securities to include in the data set. I’m not sure how much that would help.

The additional data worked fantastically well, resolving both problems. The number of opportunities to trade increased significantly, long signals work very nicely under all market conditions, and predicting negative returns works far better. There was also an unexpected benefit: far less time is needed before the forecasts become usable. In the original implementation I waited 2000 days before starting to use the forecasts. With the extended data set this can be cut to 500, thus letting the backtest cover a longer period.

Performance-wise there were no problems, as the Accord .NET k-d tree implementation that I use is very quick. Finding the nearest 75 points in a data set of approximately 100,000, in 11 dimensions, takes less than 2 milliseconds on my overclocked 2500K.

The settings used in the search are simple: the length of the patterns is 3 days, the 75 closest ones are used to construct a forecast by averaging their next-day returns, and distance is calculated as the sum of squared distances in every dimension. Trades are taken when the forecast is above/below a certain threshold. They are then passed through a filter which only allows long positions when IBS < 0.5 and short positions only when IBS > 0.5.
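The forecasting step itself can be sketched in a few lines of Python. This is a brute-force stand-in for the k-d tree search, and the way a 3-day pattern is encoded into a feature vector is left out entirely; the data layout and function name are hypothetical.

```python
import heapq


def knn_forecast(history, query, k=75):
    """Average next-day return of the k historical patterns closest to `query`.

    `history` is a list of (features, next_day_return) pairs, where `features`
    has the same length as `query`. Distance is the sum of squared differences.
    """
    if not history:
        raise ValueError("history is empty")

    def squared_distance(features):
        return sum((a - b) ** 2 for a, b in zip(features, query))

    nearest = heapq.nsmallest(k, history, key=lambda item: squared_distance(item[0]))
    return sum(ret for _, ret in nearest) / len(nearest)
```

Swapping `(a - b) ** 2` for `abs(a - b)` gives the sum-of-absolute-distances variant.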

It should be noted that using traditional measures of “fit” does not work very well with pattern matching. Adding the above instruments actually increases the RMSE, despite significantly increasing the trading performance of the forecasts.

A look at forecasts vs realized next-day returns:

An important aspect to note is that even marginally positive forecasts work very well. For example, with the extended dataset, forecasts between 5 and 10 basis points resulted in an average 21 bp return the next day. On the other hand, using SPY data only, the return for those forecasts was just 5 basis points. What this means is that there are many more trades to take, which is what allows the strategy to do well in all market environments. Here’s the long-only equity curve:

A couple of charts to analyze the sensitivity of the long-only strategy’s results to changes in inputs (IBS limit and minimum forecast limit):

The additional data also has the benefit of making shorting possible. The equity curve doesn’t look as good, but it’s still a giant improvement over zero predictive ability on the short side:

Finally, the long and short strategies combined, along with the stats:

The concept also seems to work for stocks. For example, I tested a long-only strategy on AAPL, using the same settings as above, both with and without the addition of MSFT data. The Microsoft data improved every aspect of the results, with surprisingly consistent performance over nearly 20 years:

It would be interesting to try to apply this on a more massive scale, by increasing the data set to something like all S&P 500 stocks. Some technical restrictions prevent me from doing that right now, but I’ll come back to the idea in the future.

I have gotten a couple of emails asking me about the topic of analyzing performance, so I decided to detail the tools I use. Measuring performance and attributing success or failure to the right factors is an extremely important part of the trading process. Actually trading a strategy will often reveal aspects that don’t come up in the research stage. Unexpected things happen, revealing previously hidden strengths or weaknesses. Strategies improve or deteriorate through time. Execution issues eat into returns. Patterns emerge that can be exploited to enhance returns or limit risk.

These situations, and performance evaluation in general, are a crucial part of the research/trading/performance loop:

A lack of attention to performance, and the underlying factors that drive it, will have a deleterious effect both on your long-term trading results and the things that you will discover in the research stage.

I’ll demonstrate the tools using two strategies, one of which has been going well, and the other not: 1) a rather generic GTAA momentum/trend-following strategy that has been running for a bit over a year, and 2) an AAPL swing trading strategy that’s been in “trial” mode for the last 6 months or so.

My performance analysis system, the QUSMA Portfolio and Trade Analytics Suite, is primarily based around the concept of a “trade”. A trade is a unit that can contain any number of orders and cash transactions (dividends, taxes, etc.), which are somehow related. A pair trade would include both legs in a single trade, for example. The underlying data is imported using IB’s flex queries which have a very simple and easy to handle XML structure.

Trades are assigned to a “strategy” and can also be assigned any number of tags. Some of the things that I use tags for are: trade direction (long/short/both), trade length, developed/developing country, asset class, etc. Notes with images can also be attached to trades, which is incredibly useful for reviews. Finally, the trades can be filtered on any number of criteria to produce reports, and compared against custom benchmarks.

There are some general principles that summarize my approach to performance measurement:

- Execution and commissions are extremely important.
- Separate timing from sizing.
  - Statistics on trades in both dollar terms and % terms.
- Separate capital allocations to strategies from total capital.
  - Statistics on returns both on capital allocated to a strategy (ROAC) and on total capital (ROTC).
- Always think probabilistically and in terms of expectations.
- The more ways you can find to look at the data, the better.

Simple visual inspection is my starting point, and I think it’s very important. The simple act of staring at charts often leads to new research ideas.

So let’s get started with the graphs and stats. At the top, the standard dollar PnL (daily and close-to-close) and equity curves (both in terms of ROAC and ROTC), which are also plotted against a benchmark:

Next up are the trade statistics. Commissions are right up there: it’s very important to keep in mind how much you are losing to those costs. A few basis points may not seem like much, but they can quickly eat up a significant portion of your profits. Note all the stats are given both in dollar and percentage terms, in order to separate timing effects from sizing effects.

Results by calendar month:

Probably the most important bit, statistics on daily returns, the standard ratios, and so forth. The MAR ratio is probably the most important number for me. The reason is simple: it determines my leverage constraints, and thus my returns. A high Sharpe ratio is meaningless if you can’t lever up. Note how the simple, static benchmark portfolio has destroyed the GTAA approach:

Some simple benchmarking stuff:

Histograms of daily returns, and returns per trade. Again, it’s important to look at both dollar and percentage results:

Also, holding period histogram:

Position sizing vs trade returns. Naive risk parity seems to be doing alright:

Trade length vs returns chart, the relationship here is pretty clear.

The movement capture stats measure how good the strategy is at capturing returns. GU is gross upside, or the gross positive returns during the period. UC% is the percentage of that movement that was captured by being long, UM% is the percentage of the movement that was missed by being flat, while UL% is the percentage of the movement that was lost due to being short. The calculations are repeated for downside movement.
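Here is a minimal Python sketch of the upside half of the calculation. It assumes one instrument return and one position sign per period and ignores position size, which is a simplification of my own; the function name is hypothetical.

```python
def upside_capture(returns, positions):
    """Gross upside (GU) and the share captured, missed, and lost.

    `returns` are the instrument's per-period returns; `positions` are
    +1 (long), 0 (flat), or -1 (short) for the same periods.
    """
    gross = captured = missed = lost = 0.0
    for ret, pos in zip(returns, positions):
        if ret <= 0:
            continue  # only upside movement counts here
        gross += ret
        if pos > 0:
            captured += ret
        elif pos < 0:
            lost += ret
        else:
            missed += ret

    if gross == 0:
        return {"GU": 0.0, "UC%": 0.0, "UM%": 0.0, "UL%": 0.0}
    return {
        "GU": gross,
        "UC%": 100 * captured / gross,
        "UM%": 100 * missed / gross,
        "UL%": 100 * lost / gross,
    }
```

The downside version is the mirror image: keep only negative returns and swap the roles of long and short.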

Cumulative percent returns, by instrument. A similar chart with dollar PnL by instrument also exists.

Autocorrelation and partial autocorrelation stats based on daily returns:

Standard value at risk calculations, based on resampled historical data. I’ll be adding the option to use parametric methods in the future.

Monte Carlo simulation. It simply uses historical data, either trades or daily returns (either ROAC or ROTC). Sampling can be done with replacement or without (the latter simply re-orders the existing equity curve). There is also an option to use N consecutive days/trades, which can capture volatility clustering and autocorrelation effects. The analysis returns confidence intervals for the equity curve, as well as the cumulative and point distributions of maximum drawdowns.
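The resampling logic can be sketched roughly as follows in Python. This is not the code from my application; the function names, the default of 1,000 runs, and the fixed seed are assumptions for illustration, and only the maximum drawdown distribution is shown (the equity curve confidence intervals are left out).

```python
import random


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity = peak = 1.0
    worst = 0.0
    for ret in returns:
        equity *= 1 + ret
        peak = max(peak, equity)
        worst = max(worst, 1 - equity / peak)
    return worst


def resample(returns, rng, block=1, replace=True):
    """One simulated return path of the same length as the input."""
    n = len(returns)
    if not replace:
        return rng.sample(returns, n)  # same returns, shuffled order
    if not 1 <= block <= n:
        raise ValueError("block must be between 1 and len(returns)")
    path = []
    while len(path) < n:
        start = rng.randrange(n - block + 1)
        path.extend(returns[start:start + block])  # keep consecutive periods together
    return path[:n]


def drawdown_distribution(returns, runs=1000, block=1, replace=True, seed=0):
    """Sorted maximum drawdowns across `runs` simulated paths."""
    rng = random.Random(seed)
    return sorted(
        max_drawdown(resample(returns, rng, block, replace)) for _ in range(runs)
    )
```

Percentiles of the returned list give the point distribution of maximum drawdowns; a `block` greater than 1 preserves short-run clustering inside each sampled chunk.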

Finally, some simple stats and charts on execution. All of my trades are either at the close or the open, so those are the prices I benchmark against. Below are stats from the AAPL strategy’s buy orders around the close.

I think that the biggest weakness in my toolset is the lack of interaction with backtesting results. These can be used in two main ways: 1) comparing theoretical results to real trading results, and 2) as an extended dataset for the risk management functions. Also, I don’t do any stock picking, but if I did that would entail several additions, mainly performance attribution by country, sector, etc. as well as analyzing value/size/momentum factor exposures.

Leave a comment and tell us what you like to use: is the standard stuff enough for you, or do you use any obscure ratios or unique charts?

A simple post on position sizing, comparing three similar volatility-based approaches. In order to test the different sizing techniques I’ve set up a long-only strategy applied to SPY, with 4 different signals:

- UDIDSRI.
- 2-day candlestick KNN search, going long if the expected return is > 0.125%.
- Cutler’s RSI(3): long if RSI <= 10, exit if > 50.
- Long at every 15 day low close.

On top of that sits an IBS filter, allowing long positions only when IBS is below 50%. A position is taken if any of the signals is triggered. Entries and exits at the close of the day, no stops or targets. Results include commissions of 1 cent per share.
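Two of the signals are simple enough to sketch in Python. Cutler's RSI differs from the usual Wilder version in that it uses plain (unsmoothed) averages of gains and losses over the lookback. Returning 50 when there is no price change at all is my own assumption, and the function names are hypothetical.

```python
def cutler_rsi(closes, period=3):
    """Cutler's RSI: 100 * gains / (gains + losses) over the last `period` changes."""
    if len(closes) < period + 1:
        raise ValueError("not enough prices for the requested period")
    recent = closes[-(period + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = -sum(c for c in changes if c < 0)
    if gains + losses == 0:
        return 50.0  # assumption: no movement is treated as neutral
    # Equivalent to 100 - 100 / (1 + RS) with RS = average gain / average loss.
    return 100 * gains / (gains + losses)


def is_n_day_low_close(closes, n=15):
    """True when the latest close is the lowest close of the last n days."""
    if len(closes) < n:
        return False
    return closes[-1] <= min(closes[-n:])
```

With these, the RSI leg goes long when `cutler_rsi(closes) <= 10` and exits when it rises above 50.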

Sizing based on realized volatility uses the 10-day realized volatility, and then adjusts the size of the position such that, if volatility remains unchanged, the portfolio would have an annualized standard deviation of 17%. The fact that the strategy is not always in the market decreases volatility, which is why to get close to the ~11.5% standard deviation of the fixed fraction sizing we need to “overshoot” by a fair bit.

The same idea is used with the GARCH model, which is used to forecast volatility 3 days ahead. That value is then used to adjust size. And again the same concept is used with VIX, but of course option implied volatility tends to be greater than realized volatility, so we need to overshoot by even more, in this case to 23%.

Let’s take a look at the strategy results with the simplest sizing approach (allocating all available capital):

Returns are the highest during volatile periods, and so are drawdowns. This results in an uneven equity curve, and highly uneven risk exposure. There is, of course, no reason to let the market decide these things for us. Let’s compare the fixed fraction approach to the realized volatility- and VIX-based sizing approaches:

These results are obviously unrealistic: nobody in their right mind would use 600% leverage in this type of trade. A Black Monday would very simply wipe you out. These extremes are rather infrequent, however, and leverage can be capped to a lower value without much effect.

With the increased leverage comes an increase in average drawdown, with >5% drawdowns becoming far more frequent. The average time to recovery is also slightly increased. Given the benefits, I don’t see this as a significant drawback. If you’re willing to tolerate a 20% drawdown, the frequency of 5% drawdowns is not that important.
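
Drawdown statistics like these can be computed directly from the equity curve. Here is a minimal Python sketch. The function names, the 5% default threshold, and the convention of measuring recovery in bars from the prior peak are assumptions:

```python
def drawdowns(equity):
    """Fractional drawdown from the running peak at each point."""
    peak = equity[0]
    result = []
    for value in equity:
        peak = max(peak, value)
        result.append(1.0 - value / peak)
    return result


def drawdown_episodes(equity, threshold=0.05):
    """(depth, bars_to_recover) for each completed drawdown deeper than
    threshold. A drawdown still open at the end of the curve is ignored."""
    episodes = []
    peak, start, depth = equity[0], 0, 0.0
    for i, value in enumerate(equity):
        if value >= peak:
            # Back at (or above) the old peak: close the episode, if any.
            if depth > threshold:
                episodes.append((depth, i - start))
            peak, start, depth = value, i, 0.0
        else:
            depth = max(depth, 1.0 - value / peak)
    return episodes
```

The maximum drawdown is `max(drawdowns(equity))`. The length of the list returned by `drawdown_episodes` gives the frequency of >5% drawdowns, and averaging the second element gives the time to recovery.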

On the other hand, the deepest drawdowns naturally tend to come during volatile periods, and the decrease of leverage also results in a slight decrease of the max drawdown. Returns are also improved, leading to better risk-adjusted returns across the board for the volatility-based sizing approaches.

The VIX approach underperforms, and the main reason is obviously that it’s not a good measure of expected future volatility. There is also the mismatch between the VIX’s 30-day horizon and the much shorter horizon of the trades. GARCH and realized volatility result in very similar sizing, so the realized volatility approach is preferable due to its simplicity.

Posting has been slow lately because I’ve been busy with a bunch of other stuff, including the CFA Level 3 exam last weekend. I’ve also begun work on a very ambitious project: a fully-featured all-in-one backtesting and live trading suite, which is what prompted this post.

Over the last half year or so I’ve been moving toward more complex tools (away from Excel, R, and MATLAB), and generally just writing standalone backtesters in C# for every concept I wanted to try out, only using MultiCharts for the simplest ideas. This approach is, of course, incredibly inefficient, but the software packages available to “retail” traders are notoriously horrible, and I have nowhere near the capital I’d need to afford “real” tools like QuantFACTORY or Deltix.

The good thing about knowing how to code is that if a tool doesn’t exist you can just write it, and that’s exactly what I’m doing: proper portfolio-level backtesting and live trading that’ll be able to easily do everything from intraday pairs trading to long-term asset allocation and everything in between, all under the same roof. On the other hand, it’s also tailored to my own needs, and as such contains no plans for things like handling fundamental data. Most importantly, it’s my dream research platform that’ll let me go from idea, to robust testing & optimization, to implementation very quickly. Here’s what the basic design looks like:

What’s the point of posting about it? I know there are many other people out there facing the same issues I am, so hopefully I can provide some inspiration and ideas on how to solve them. Maybe it’ll prompt some discussion and idea-bouncing, or perhaps even collaboration.

Most of the essential stuff has already been laid down, so basic testing is already possible. A simple example based on my previous post can showcase some essential features. Below you’ll find the code behind the PatternFinder indicator, which uses the Accord.NET library’s k-d tree and k nearest neighbor algorithm implementation to do candlestick pattern searches as discussed here. Many elements are specific to my system, but the core functionality is trivially portable if you want to borrow it.

Note the use of attributes to denote properties as inputs, and set their default values. Options can be serialized/deserialized for easy storage in files or a database. Priority settings allow the user to specify the order of execution, which can be very important in some cases. Indexer access works with [0] being the current bar, [1] being the previous bar, etc. Different methods for historical and real time bars allow for a ton of optimization to speed up processing when time is scarce, though in this case there isn’t much that can be done.

The VariableSeries class is designed to hold time series, synchronize them across the entire parent object, prevent data snooping, etc. The Indicator and Signal classes are all derived from VariableSeries, which is the basis for the system’s modularity. For example, in the PatternFinder indicator, OHLC inputs can be modified by the user through the UI, e.g. to make use of the values of an indicator rather than the instrument data.

The backtesting analysis stuff is still in its early stages, but again the foundations have been laid. Here are some stats using a two-day PatternFinder combined with IBS, applied on SPY:

Here’s the first iteration of the signal analysis interface. I have added three more signals to the backtest: going long for 1 day at every 15-day low close, the set-up Rob Hanna posted yesterday over at Quantifiable Edges (staying in for 5 days after the set-up appears), and UDIDSRI. The idea is to be able to easily spot redundant set-ups, find synergies or anti-synergies between signals, and easily get an idea of the marginal value added by any one particular signal.

And here’s some basic Monte Carlo simulation stuff, with confidence intervals for cumulative returns and PDF/CDF of the maximum drawdown distribution:
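
One common way to produce this kind of output is to resample the backtest’s returns with replacement and look at the distribution of outcomes. Here is a minimal Python sketch. Treating returns as independent draws, the function names, and the simple percentile rule are all assumptions, not a description of what the platform actually does:

```python
import random


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def bootstrap_paths(returns, n_paths=1000, seed=0):
    """Resample returns with replacement; one (cumulative return,
    max drawdown) pair per simulated path."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_paths):
        sample = [rng.choice(returns) for _ in range(len(returns))]
        total = 1.0
        for r in sample:
            total *= 1.0 + r
        results.append((total - 1.0, max_drawdown(sample)))
    return results


def percentile(values, q):
    """Simple empirical percentile (no interpolation), q in [0, 1]."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]
```

The 5th and 95th percentiles of the simulated cumulative returns give a 90% confidence band, and the sorted max drawdowns give the empirical CDF. Plain resampling discards any serial dependence in the returns, so a block bootstrap may be more appropriate when returns cluster.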

Here’s the code for the PatternFinder indicator. Obviously it’s written for my platform, but it should be easily portable. The “meat” is all in CalcHistorical() and GetExpectancy().

```csharp
using System;
using System.Linq;
// KDTree<T> and Accord.Math.Distance come from the Accord.NET library.

/// <summary>
/// K nearest neighbor search for candlestick patterns
/// </summary>
public class PatternFinder : Indicator
{
    [Input(3)] public int PatternLength { get; set; }
    [Input(75)] public int MatchCount { get; set; }
    [Input(2000)] public int MinimumWindowSize { get; set; }
    [Input(false)] public bool VolatilityAdjusted { get; set; }
    [Input(false)] public bool Overnight { get; set; }
    [Input(false)] public bool WeighExpectancyByDistance { get; set; }
    [Input(false)] public bool Classification { get; set; }
    [Input(0.002)] public double ClassificationLimit { get; set; }
    [Input("Euclidean")] public string DistanceType { get; set; }

    [SeriesInput("Instrument.Open")] public VariableSeries<decimal> Open { get; set; }
    [SeriesInput("Instrument.High")] public VariableSeries<decimal> High { get; set; }
    [SeriesInput("Instrument.Low")] public VariableSeries<decimal> Low { get; set; }
    [SeriesInput("Instrument.Close")] public VariableSeries<decimal> Close { get; set; }
    [SeriesInput("Instrument.AdjClose")] public VariableSeries<decimal> AdjClose { get; set; }

    private VariableSeries<double> returns;
    private VariableSeries<double> stDev;
    private KDTree<double> _tree;

    public PatternFinder(QSwing parent, string name = "PatternFinder", int BarsCount = 1000)
        : base(parent, name, BarsCount)
    {
        Priority = 1;
        returns = new VariableSeries<double>(parent, BarsCount);
        // Defaults to 1 so that dividing by stDev is a no-op when VolatilityAdjusted is off
        stDev = new VariableSeries<double>(parent, BarsCount) { DefaultValue = 1 };
    }

    internal override void Startup()
    {
        // 4 coordinates per bar, except the oldest bar, which has no close-to-close ratio
        _tree = new KDTree<double>(PatternLength * 4 - 1);
        switch (DistanceType)
        {
            case "Euclidean":
                _tree.Distance = Accord.Math.Distance.Euclidean;
                break;
            case "Absolute":
                _tree.Distance = AbsDistance;
                break;
            case "Chebyshev":
                _tree.Distance = Accord.Math.Distance.Chebyshev;
                break;
            default:
                _tree.Distance = Accord.Math.Distance.Euclidean;
                break;
        }
    }

    public override void CalcHistorical()
    {
        if (VolatilityAdjusted && CurrentBar > 0)
            returns.Value = (double)(AdjClose[0] / AdjClose[1] - 1);
        if (VolatilityAdjusted && CurrentBar > 11)
            stDev.Value = returns.StandardDeviation(10);

        if (CurrentBar < PatternLength + 1) return;

        // Forecast from the current pattern before the tree is updated
        if (CurrentBar > MinimumWindowSize)
            Value = GetExpectancy(GetCoords());

        double ret = Overnight
            ? (double)(Open[0] / Close[1] - 1)
            : (double)(AdjClose[0] / AdjClose[1] - 1);
        double adjret = ret / stDev[0];

        // Store the pattern that ended on the previous bar with the return that followed it
        if (Classification)
            _tree.Add(GetCoords(1), adjret > ClassificationLimit ? 1 : 0);
        else
            _tree.Add(GetCoords(1), adjret);
    }

    public override void CalcRealTime()
    {
        if (VolatilityAdjusted && CurrentBar > 0)
            returns.Value = (double)(AdjClose[0] / AdjClose[1] - 1);
        if (VolatilityAdjusted && CurrentBar > 11)
            stDev.Value = returns.StandardDeviation(10);
        if (CurrentBar > MinimumWindowSize)
            Value = GetExpectancy(GetCoords());
    }

    private double GetExpectancy(double[] coords)
    {
        if (!WeighExpectancyByDistance)
            return _tree.Nearest(coords, MatchCount).Average(x => x.Node.Value) * stDev[0];
        else
        {
            // Inverse squared-distance weights; note that an exact match
            // (distance 0) produces an infinite weight.
            var nodes = _tree.Nearest(coords, MatchCount);
            double totweight = nodes.Sum(x => 1 / Math.Pow(x.Distance, 2));
            return nodes.Sum(x => x.Node.Value * ((1 / Math.Pow(x.Distance, 2)) / totweight)) * stDev[0];
        }
    }

    private static double AbsDistance(double[] x, double[] y)
    {
        return x.Select((t, i) => Math.Abs(t - y[i])).Sum();
    }

    // Each bar is described by O/C, H/C and L/C, plus the ratio of its close to the prior bar's close
    private double[] GetCoords(int offset = 0)
    {
        double[] coords = new double[PatternLength * 4 - 1];
        for (int i = 0; i < PatternLength; i++)
        {
            coords[4 * i] = (double)(Open[i + offset] / Close[i + offset]);
            coords[4 * i + 1] = (double)(High[i + offset] / Close[i + offset]);
            coords[4 * i + 2] = (double)(Low[i + offset] / Close[i + offset]);
            if (i < PatternLength - 1)
                coords[4 * i + 3] = (double)(Close[i + offset] / Close[i + 1 + offset]);
        }
        return coords;
    }
}
```

Coming up Soon™: a series of posts on cross validation, an in-depth paper on IBS, and possibly a theory-heavy paper on the low volatility effect.
