data Archives

Creating a Data Management System

Half a year ago I posted about writing my own backtesting platform. While it has been even more challenging than I thought it would be, it’s going well: about 95% of “core” functionality has been implemented. Early on I realized I should design a completely separate, centralized, data management system that I could use with all my trading applications.

The QUSMA Data Management System (QDMS) works as a centralized data access point: it connects clients to external historical/real time data sources, manages metadata on instruments, and also provides local storage for historical data.

I was heavily influenced by the MultiCharts approach, though my own system is of course a bit less complex. I based a lot of the instrument metadata management as well as some of the UI design on the MC “QuoteManager” application as I think their approach is quite intuitive.

The system is designed in a modular fashion so it’s trivial to add additional data sources (as well as alternative local storage mechanisms…if I ever start storing tick data I will have to move away from my current relational database storage mechanism). The interfaces for writing external data source modules are very simple right now:

Interfaces for data storage and external data sources.

A couple screenshots of the server interface:

Importing/exporting CSV files is already implemented.

Editing instrument metadata, including custom sessions. Instruments can have custom data sessions, or they can derive their sessions from their exchange, or a template.

There’s also the client side of things, here’s the interface for selecting data series in the backtester:

Selecting data series for a backtesting run.

The client/server approach lets multiple clients use the same data stream. For example, if computations are distributed over multiple boxes and each client needs access to the same real time data, only a single connection to the external data source is required: the data is then distributed by the broker to every client that has requested that stream.

There is also the ability to push data into the local storage. One possible use for this is saving results from a backtest, then using that equity curve as a benchmark in a performance evaluation application.

I’m probably going to open source this project eventually, but right now I’m using a couple of proprietary libraries that prevent me from distributing it. It’ll take a bit of work to “disentangle” those bits. In any case I’m striving to comment well and write in a good style so that opening up the code will be relatively painless.

I learned a ton writing the QDMS because it was an opportunity to use a bunch of interesting libraries and technologies that I had never touched before: ZeroMQ, Protocol Buffers, the Entity Framework, WPF, NLog, and Reactive Extensions. I was amazed at the performance of ZMQ: out of the box, in a simple test using a single socket and a single thread, it managed to transfer nearly 200 OHLC bars per millisecond.

There’s still a bit of work to be done: one major issue is that there is no way to construct lower-frequency bars from higher-frequency data (e.g. daily bars made from 1-minute data), and only time-based bars are possible. The biggest missing piece however is generating continuous futures data. It’s a much harder problem than it seems at first glance because it’s necessary to incorporate a great deal of flexibility both in terms of the futures expiration rules and the rollover rules.

Continuous futures class.

I haven’t done any actual research in quite a while because I’ve been preoccupied with coding but I’ll be back soon! I’ve been accumulating a giant backlog of ideas that are waiting to be tested. Hopefully my new tools will be good enough to give some special insights. In any case, I can’t wait to get started.

data

Tag: data

Creating a Data Management System