Today we are releasing refinements that advance our machine learning algorithms' identification of mean-reverting bond pairs. Our model is now considerably more effective at discarding spuriously correlated pairs and disentangling pair motions from bulk market movements. These improvements are encapsulated in two key areas of the model:
- Determining the relevancy of the bonds that form a pair
- Identification of the top-performing ideas in the market
To improve the calculation of pair relevancy, Katana's algorithm now uses a combination of automated and manually engineered features to identify relevant pairs, including:
- (a) a relevancy identification model that has been trained with our partners
- (b) a word2vec model based on the pre-trained Google News vectors (GoogleNews-vectors-negative300), and
- (c) several statistical measures based on the pairs' past performance
These features are summarized by the new "Similarity" score on the Discovery page. This score takes values between 0 and 1, with 1 indicating a highly similar pair. The features are convolved with each bond's historic pricing data to produce a segmentation of “related bonds”: bonds that share a sufficiently large fraction of both static and dynamic characteristics that even professional asset managers agree they make a compelling tradable pair.
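As a rough illustration of how heterogeneous features could be folded into a single score between 0 and 1, the sketch below blends a text-embedding similarity with a price co-movement measure. It is not Katana's actual model: the function names, the equal default weighting, and the choice of cosine similarity and Pearson correlation are all assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # undefined for a zero vector; treat as "no similarity"
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


def to_unit_interval(value):
    """Map a value in [-1, 1] onto [0, 1], clipping floating-point overshoot."""
    return min(1.0, max(0.0, (value + 1.0) / 2.0))


def similarity_score(embedding_a, embedding_b, prices_a, prices_b, text_weight=0.5):
    """Hypothetical blend of a text feature and a statistical feature.

    embedding_a / embedding_b: vectors describing each bond (assumed inputs,
    e.g. averaged word vectors of the issuer description).
    prices_a / prices_b: overlapping price histories of equal length.
    """
    text_sim = to_unit_interval(cosine_similarity(embedding_a, embedding_b))

    # Correlate day-over-day price changes rather than price levels.
    changes_a = [b - a for a, b in zip(prices_a, prices_a[1:])]
    changes_b = [b - a for a, b in zip(prices_b, prices_b[1:])]
    stat_sim = to_unit_interval(pearson(changes_a, changes_b))

    return text_weight * text_sim + (1.0 - text_weight) * stat_sim
```

With this construction, two bonds with identical embeddings and perfectly correlated price changes score 1, while opposite embeddings and perfectly anti-correlated changes score 0. A production model would learn the weighting rather than fix it by hand.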
We have also improved the model's ability to identify reliably mean-reverting pairs by utilizing up to 5 years of overlapping price history between the bonds in any given pair. Over this period we calculate a wide variety of statistical measures in multiple windows and over a range of horizons of different durations. These metrics are aggregated across all intervals to construct a holistic measure of how any given pair of bonds has performed in tandem throughout its history. When a pair's dynamics evolve beyond specific thresholds on these metrics, we can be reasonably confident that one of the bonds in the recommended pair is over- or undervalued with respect to the other. We then use both the absolute z-score dislocation of the pair and its rate of change to identify the potential top-performing ideas.
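The dislocation test described above can be sketched in a few lines. This is a minimal example under stated assumptions, not the production model: the single trailing window, the threshold of 2, the 5-observation lookback for the rate of change, and the names `rolling_zscores` and `flag_dislocation` are all hypothetical.

```python
import statistics


def rolling_zscores(spread, window):
    """z-score of each observation against the `window` observations before it.

    Entries without a full trailing window are None.
    """
    zscores = []
    for i, value in enumerate(spread):
        if i < window:
            zscores.append(None)
            continue
        history = spread[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        # A flat history has no scale to measure against; report 0 in that case.
        zscores.append((value - mean) / stdev if stdev > 0 else 0.0)
    return zscores


def flag_dislocation(spread, window=60, z_threshold=2.0, lookback=5):
    """Return the latest z-score, its average change per step, and a flag.

    spread: time-ordered spread between the two bonds of a pair (e.g. in bps).
    Returns None when there is not enough history to compute both quantities.
    """
    zscores = rolling_zscores(spread, window)
    if len(zscores) <= lookback:
        return None
    latest, earlier = zscores[-1], zscores[-1 - lookback]
    if latest is None or earlier is None:
        return None
    return {
        "z": latest,
        "z_rate": (latest - earlier) / lookback,
        "dislocated": abs(latest) >= z_threshold,
    }
```

Measuring each observation against the window that precedes it, rather than one that includes it, keeps a sudden jump from inflating its own baseline. The text describes aggregating over multiple windows and horizons; repeating this calculation for several `window` values and combining the results would be one way to approximate that.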
Using a robust backtesting framework and a 30-day mean-reversion horizon, across IG and EM combined, we find that 91% of the pairs identified as “Top Ideas” revert to their mean separations 30 days later, with a median compression of -11 bps. The model also performs robustly across a wide spectrum of reversion timescales:
This refined approach also allows us to identify and present past mean reversions of a given pair, giving our users further insight into a pair's historical behavior. The figures below are just two examples of pairs our model has identified that exhibit regular mean-reverting behavior (as indicated by the regularly spaced, vertically shaded regions):
We're continuing to develop several new features that will give our users even more insight into the ideas generated by our models, along with more information about how those ideas can be expected to perform.