I have gotten a couple of emails asking me about the topic of analyzing performance, so I decided to detail the tools I use. Measuring performance and attributing success or failure to the right factors is an extremely important part of the trading process. Actually trading a strategy will often reveal aspects that don’t come up in the research stage. Unexpected things happen, revealing previously hidden strengths or weaknesses. Strategies improve or deteriorate through time. Execution issues eat into returns. Patterns emerge that can be exploited to enhance returns or limit risk.
These situations, and performance evaluation in general, are a crucial part of the research/trading/performance loop:
A lack of attention to performance, and the underlying factors that drive it, will have a deleterious effect both on your long-term trading results and the things that you will discover in the research stage.
I’ll demonstrate the tools using two strategies, one of which has been going well, and the other not: 1) a rather generic GTAA momentum/trend-following strategy that has been running for a bit over a year, and 2) an AAPL swing trading strategy that’s been in “trial” mode for the last 6 months or so.
My performance analysis system, the QUSMA Portfolio and Trade Analytics Suite, is primarily based around the concept of a “trade”. A trade is a unit that can contain any number of orders and cash transactions (dividends, taxes, etc.), which are somehow related. A pair trade would include both legs in a single trade, for example. The underlying data is imported using IB’s flex queries which have a very simple and easy to handle XML structure.
Trades are assigned to a “strategy” and can also be assigned any number of tags. Some of the things that I use tags for are: trade direction (long/short/both), trade length, developed/developing country, asset class, etc. Notes with images can also be attached to trades, which is incredibly useful for reviews. Finally, the trades can be filtered on any number of criteria to produce reports, and compared against custom benchmarks.
A trade and its two associated orders.
There are some general principles that summarize my approach to performance measurement:
- Execution and commissions are extremely important.
- Separate timing from sizing.
- Statistics on trades in both dollar terms and % terms.
- Separate capital allocations to strategies from total capital.
- Statistics on returns both on capital allocated to a strategy (ROAC) and on total capital (ROTC).
- Always think probabilistically and in terms of expectations
- The more ways you can find to look at the data, the better.
Simple visual inspection is my starting point, and I think it’s very important. The simple act of staring at charts often leads to new research ideas.
GTAA strategy: a number of losing trades in TLT.
So let’s get started with the graphs and stats. At the top, the standard dollar PnL (daily and close-to-close) and equity curves (both in terms of ROAC and ROTC), which are also plotted against a benchmark:
GTAA strategy cumulative returns on allocated capital. Chart also comes in ROTC flavor.
AAPL strategy cumulative PnL.
Next up are the trade statistics. Commissions are right up there, it’s very important to keep in mind how much you are losing in those costs. A few basis points may not seem like much, but they can quickly eat up a significant portion of your profits. Note all the stats are given both in dollar and percentage terms, in order to separate timing effects from sizing effects.
Results by calendar month:
AAPL strategy. Also comes in ROTC flavor.
Probably the most important bit, statistics on daily returns, the standard ratios, and so forth. The MAR ratio is probably the most important number for me. The reason is simple: it determines my leverage constraints, and thus my returns. A high Sharpe ratio is meaningless if you can’t lever up. Note how the simple, static benchmark portfolio has destroyed the GTAA approach:
GTAA strategy. Benchmark is a 20/15/15/20/10/10/10 % mix of SPY/EFA/EEM/IEF/LQD/VNQ/DBC respectively. Stats are also available for ROTC.
Some simple benchmarking stuff:
GTAA strategy vs diversified benchmark.
Histograms of daily returns, and returns per trade. Again, it’s important to look at both dollar and percentage results:
Also, holding period histogram:
Position sizing vs trade returns. Naive risk parity seems to be doing alright:
Trade length vs returns chart, the relationship here is pretty clear.
The movement capture stats measure how good the strategy is at capturing returns. GU is gross upside, or the gross positive returns during the period. UC% is the percentage of that movement that was captured by being long, UM% is the percentage of the movement that was missed by being flat, while UL% is the percentage of the movement that was lost due to being short. The calculations are repeated for downside movement.
GTAA strategy. Being long-only, only upside movement has been captured.
Cumulative percent returns, by instrument. A similar chart with dollar PnL by instrument also exists.
Autocorrelation and partial autocorrelation stats based on daily returns:
GTAA strategy. High autocorrelation values can be exploited both to enhance returns and for risk management.
Standard value at risk calculations, based on resampled historical data. I’ll be adding the option to use parametric methods in the future.
GTAA strategy: 10-day value at risk.
Monte Carlo simulation. It simply uses historical data, either trades or daily returns (either ROAC or ROTC). Sampling can be done with replacement or without (the latter simply re-orders the existing equity curve). There is also an option to use N consecutive days/trades, which can capture volatility clustering and autocorrelation effects. The analysis returns confidence intervals for the equity curve, as well as the cumulative and point distributions of maximum drawdowns.
GTAA strategy: there is a 10% chance of a drawdown worse than 18% in the next 500 trading days.
Finally, some simple stats and charts on execution. All of my trades are either at the close or the open, so those are the prices I benchmark against. Below are stats from the AAPL strategy’s buy orders around the close.
Top: slippage vs time difference in seconds from benchmark. Middle: slippage by order type. Bottom: Slippage histogram.
I think that the biggest weakness in my toolset is the lack of interaction with backtesting results. These can be used in two main ways: 1) comparing theoretical results to real trading results, and 2) as an extended dataset for the risk management functions. Also, I don’t do any stock picking, but if I did that would entail several additions, mainly performance attribution by country, sector, etc. as well as analyzing value/size/momentum factor exposures.
Leave a comment and tell us what you like to use: is the standard stuff enough for you, or do you use any obscure ratios or unique charts?
Taking another look at portfolio optimization methods as they apply to relative strength- & momentum-based GTAA portfolios, this time I present a novel method that uses factor momentum in an attempt to tilt the allocation towards the factors behind the momentum, and away from factors that do not have strong momentum. The approach was inspired by a nice little paper by Bhansali et al.: The Risk in Risk Parity: A Factor Based Analysis of Asset Based Risk Parity.
My first idea was something I would have termed “Global Tactical Factor Allocation”: a relative strength & momentum approach based directly on risk factors instead of assets. It failed miserably, but the work showed some interesting alternative paths. One of these was to decompose returns into their principal components (factors) and base a weighting algorithm on factor momentum. Early testing shows promise.
A quick intro to principal component analysis (you can skip to the next section if you’re not interested) before we proceed. PCA is method to linear transform data in a potentially useful way. It re-expresses data such that the principal components are uncorrelated (orthogonal) to each other, and the first component expresses the direction of maximum variance (i.e. it explains the greatest possible part of the total variance of any orthogonal component), and so forth for the 2nd…nth components. It can be used as a method of dimensionality reduction by ignoring the lower-variance components, but that is not relevant to the present analysis.
The principal components can be obtained very easily. In MATLAB:
- Assume r are the de-meaned returns of our assets.
- [V D] = eig(cov(r)); will give us the eigenvectors (V) and eigenvalues (D) of the covariance matrix of r.
- The diagonal of D contains the variance of each principal component. To get the % of total variance explained by each component, simply: 100 * flipud(diag(D)) / sum(diag(D)).
- V contains the linear coefficients of our assets to each component; fliplr(V) to get them in the “right” order.
- Finally, to find the actual principal components we simply multiply the returns with V: r*V.
Data & Methodology
The assets used are the following:
The data covers the period from January 2002 to October 2012 and includes dividends. Some of the assets do not have data for the entire period; they simply become available when they have 252 days of data. The methodology is essentially the standard relative-strength & momentum GTAA approach:
- Rebalance every Friday.
- Rank assets by their 120-day returns and pick the top 5.
- Discard any assets that had negative returns during the period, even if they are in the top 5.
- Apply portfolio optimization algorithm.
- Trade on close.
Commissions are not taken into account (though I acknowledge they are a significant issue due to frequent rebalancing). The algorithms I will be benchmarking against are: equal weights, naïve risk parity (RP), equal risk contribution (ERC), and minimum correlation (MCA). See this post for more on these methods.
A look at the factors
When extracting principal components from asset returns there is typically an a posteriori identification of each component with a risk factor. Looking at stock returns, the first factor is typically identified as the so-called “market” factor. Bhansali et al. identify the two first factors as “growth” and “inflation”. I leave this identification process as homework to the reader.
Using the last 250 days in the data, here is the % of total variance explained by each factor:
And here are the factor loadings (for the top 3 factors) for each asset:
Factor momentum weighting
The general idea is to decompose the returns of our chosen assets into principal components, identify the factors that have relative strength and absolute momentum, and then tilt the weights towards them and away from the low- and negative-momentum factors. We are left with the non-trivial problem of constructing a portfolio that has exposure to the factor(s) we want, and are neutral to the rest. We will do this by setting target betas for the portfolio and then minimizing the difference between the target and realized betas given a set of weights (using a numerical optimizer).
- After selecting the assets using the above steps, decompose their returns into their principal components.
- Rank the factors on their 120-day returns and pick the top 3.
- Discard any factors that had negative returns during the period, even if they are in the top 3.
- Discarded factors have a target beta of 0.
- The other factors have a target beta of 1.
The objective then is to minimize:
where ti is the target portfolio beta to risk factor i, βi is the portfolio beta to risk factor i, and M is the number of risk factors.
Here are the performance metrics of the benchmarks and the factor momentum algorithm:
The selling point of the factor momentum approach is the consistency it manages to achieve, while also maintaining excellent volatility-adjusted and drawdown-adjusted returns. There are two long periods during which the other optimization methods did quite badly (2008-2009 and from the middle of 2011 to the present); factor momentum just keeps going.
One interesting point is that the factor momentum algorithm tends to allocate to fewer holdings than the other approaches (because all the other algorithms will always have non-zero weights for any assets selected, which is not the case for FM). There may be some low-hanging fruit here in terms of diversification.
Some other potentially interesting ideas for the future: is there any value in the momentum of residuals (in a regression against the factor returns), similar to the Blitz, Huij & Martens approach? An interesting extension would be to loosen the factor-neutral constraint to leave room for other objectives. Finding a smarter way to calculate target betas would also be an interesting and probably fruitful exercise; taking each factor’s volatility and momentum into account is probably the most obvious idea.
I was revisiting the choice of portfolio optimization algorithm for the GTAA portion of my portfolio and thought it was an excellent opportunity for another post. The portfolio usually contains 5 assets (though at times it may choose fewer than 5) picked from a universe of 17 ETFs and mutual funds, which are picked by relative and absolute momentum. The specifics are irrelevant to this post as we’ll be looking exclusively at portfolio optimization techniques applied after the asset selection choices have been made.
Tactical asset allocation portfolios present different challenges from optimizing portfolios of stocks, or permanently diversified portfolios, because the mix of asset classes is extremely important and can vary significantly through time. Especially when using methods that weigh directly on volatility, bonds tend to have very large weights. During the last couple of decades this has been working great due to steadily dropping yields, but it may turn out to be dangerous going forward. I aim to test a wide array of approaches, from the crude equal weights, to the trendy risk parity, and the very fresh minimum correlation algorithm. Standard mean-variance optimization is out of the question because of its many and well-known problems, but mainly because forecasting returns is an exercise in futility.
The only restriction on the weights is no shorting; there are no minimum or maximum values.
Risk parity (often confused with equal risk contribution) is essentially weighting proportional to the inverse of volatility (as measured by the 120-day standard deviation of returns, in this case). I will be using an unlevered version of the approach. I must admit I am still somewhat skeptical of the value of the risk parity approach for the bond-related reasons mentioned above.
Minimum volatility portfolios take into account the covariance matrix and have weights that minimize the portfolio’s expected volatility. This approach has been quite successful in optimizing equity portfolios, partly because it indirectly exploits the low volatility anomaly. You’ll need a numerical optimization algorithm to solve for the minimum volatility portfolio.
A note on shrinkage (not that kind of shrinkage!): one issue with algorithms that make use of the covariance matrix is estimation error. The number of covariances that must be estimated grows exponentially with the number of assets in the portfolio, and these covariances are naturally not constant through time. The errors in the estimation of these covariances have negative effects further down the road when we calculate the desired weightings. A partial solution to this problem is to “shrink” the covariance matrix towards a “target matrix”. For more on the topic of shrinkage, as well as a description of the shrinkage approach I use here, see Honey, I Shrunk the Sample Covariance Matrix by Ledoit & Wolf.
- Equal Risk Contribution (ERC)
The ERC approach is sort of an advanced version of risk parity that takes into account the covariance matrix of the assets’ returns (here‘s a quick comparison between the two). This difference results in significant complications when it comes to calculating weights, as you need to use a numerical optimization algorithm to minimize
subject to the standard restrictions on the weights, where xi is the weight of the ith asset, and (Σx)i denotes the ith row of the vector resulting from the product of Σ (the covariance matrix) and x (the weight vector). To do this I use MATLAB’s fmincon SQP algorithm.
For more on ERC, a good overview is On the Properties of Equally-Weighted Risk Contributions Portfolios by Maillard, et. al.
- Minimum Correlation Algorithm (MCA)
A new optimization algorithm, developed by David Varadi, Michael Kapler, and Corey Rittenhouse. The main object of the MCA approach is to under-weigh assets with high correlations and vice versa, though it’s a bit more complicated than just weighting by the inverse of assets’ average correlation. If you’re interested in the specifics, check out the paper: The Minimum Correlation Algorithm: A Practical Diversification Tool.
Moving on to the results, it quickly becomes clear that there isn’t much variation between the approaches. Most of the returns and risk management are driven by the asset selection process, leaving little room for the optimization algorithms to improve or screw up the results.
Predictably, the “crude” approaches such as equal weights or the inverse of maximum drawdown don’t do all that well. Not terribly by any means, but going up in complexity does seem to have some advantages. What stands out is that the minimum correlation algorithm outperforms the rest in both risk-adjusted return metrics I like to use.
Risk parity, despite its popularity, wallows in mediocrity in this test; its only redeeming feature being a bit of positive skew which is always nice to have.
The minimum volatility weights are an interesting case. They do what is says on the box: minimize volatility. Returns suffer consequently, but are excellent on a volatility-adjusted basis. On the other hand, the performance in terms of maximum drawdown is terrible. Some interesting features to note: the worst loss for the minimum volatility weights is by far the lowest of the pack: the worst day in over 15 years was -2.91%. This is accompanied by the lowest average time to recover from drawdowns, and an obscene (though also rather unimportant) longest winning streak of 22 days.
Finally, equal risk contribution weights almost match the performance of minimum volatility in terms of CAGR / St.Dev. while also giving us a lower drawdown. ERC also comes quite close to MCA; I would say it is the second-best approach on offer here.
A look at the equity curves below shows just how similar most of the allocations are. The results could very well be due to luck and not a superior algorithm.
To investigate further, I have divided the equity curves into three parts: 1996 – 2001, 2002-2007, and 2008-2012. Consistent results across these sub-periods would increase my confidence that the best algorithms actually provide value and weren’t just lucky.
As expected there is significant variation in results between sub-periods. However, I believe these numbers solidify the value of the minimum correlation approach. If we compare it to its closest rival, ERC, minimum correlation comes out ahead in 2 out of 3 periods in terms of volatility-adjusted returns, and in 3 out of 3 periods in terms of drawdown-adjusted returns.
The main lesson here is that as long as your asset selection process and money/risk management are good, it’s surprisingly tough to seriously screw up the results by using a bad portfolio optimization approach. Nonetheless I was happily surprised to see minimum correlation beat out the other, more traditional, approaches, even though the improvement is marginal.