Reverse Engineering DynamicHedge’s “Alpha Curves”, Part 2 of 3: K-Medoids Clustering

In the first part of the series we covered dynamic time warping. Here we look at clustering. K-means clustering is probably the most popular method, mainly due to its simplicity and intuitive algorithm. However it has some drawbacks that make it a bad choice when it comes to clustering time series. Instead, we’ll use K-medoids clustering.

The main conceptual difference between K-means and K-medoids is the distance used in the clustering algorithm. K-means uses the distance from a centroid (an average of the points in the cluster), while K-medoids uses distance from a medoid, which is simply a point selected from the data. The algorithms used for arriving at the final clusters are quite different. K-medoid clustering depends on distances from k (in this case 2) points drawn from the data. K-means distances are taken from a centroid which is created by averaging the points in that cluster.

How does K-medoids compare to K-means? It has several features that make it superior for most problems in finance.

• K-medoids is not sensitive to outliers that might “pull” the centroid to a disadvantageous position.
• K-medoids can handle arbitrary distance functions. Unlike K-means, there is no need for a mean to be defined.
• Unlike K-means, K-medoids has no problems with differently-sized clusters.
• K-medoids is also potentially less computationally intensive when the distance function is difficult to solve, as distances only need to be computed once for each pair of points.

Note that there’s no guarantee on the size of each cluster. If you want to, it’s trivial to add some sort of penalty function to force similarly-sized clusters.

The algorithm is simple:

1. Choose k points from the sample to be the initial medoids (see below for specific methods).
2. Give cluster labels to each point in the sample based on the closest medoid.
3. Replace the medoids with some other point in the sample. If the total cost (sum of distances from the closest medoid) decreases, keep this new confirugration.
4. Repeat until there is no further change in medoids.

Initialization, i.e. picking the initial clusters before the algorithm is run, is an important issue. The final result is sensitive to the initial set-up, as the clustering algorithm can get caught in local minimums. It is possible to simply assign labels at random and repeat the algorithm multiple times. I prefer a deterministic initialization method: I use a procedure based on Park et al., the code for which you can find further down. The gist of it is that it selects the first medoid to be the point with the smallest average distance to all other points, then selects the remaining medoids based on maximum distance from the previous medoids. It works best when k is set to (or at least close to) the number of clusters in the data.

An example, using two distinct groups (and two outliers):

The first medoid is selected due to its closeness to the rest of the points in the lower left cluster, then the second one is selected to be furthest away from the first one, thus “setting up” the two obvious clusters in the data.

In terms of practical applications, clustering can be used to group candlesticks and create a transition matrix (also here), to group and identify trading algorithms, or for clustering returns series for forecasting (also here). I imagine there’s some use in finding groups of similar assets for statistical arbitrage, as well.

Something I haven’t seen done but I suspect has potential is to cluster trades based on return, length, adverse excursion, etc. Then look at the average state of the market (as measured by some indicators) in each cluster, the most common industries of the stocks in each cluster, or perhaps simply the cumulative returns series of each trade at a reasonably high frequency. Differences between the “good trades” cluster(s) and the “bad trades” cluster(s) could then be used to create filters. The reverse, clustering based on conditions and then looking at the average returns in each cluster would achieve the same objective.

I wrote a simple K-medoids class in C#, which can handle arbitrary data types and distance functions. You can find it here. I believe there are packages for R and python if that’s your thing.

• 