Creating a Data Management System

Half a year ago I posted about writing my own backtesting platform. While it has been even more challenging than I thought it would be, it’s going well: about 95% of “core” functionality has been implemented. Early on I realized I should design a completely separate, centralized, data management system that I could use with all my trading applications.

The QUSMA Data Management System (QDMS) works as a centralized data access point: it connects clients to external historical/real-time data sources, manages metadata on instruments, and provides local storage for historical data.

I was heavily influenced by the MultiCharts approach, though my own system is of course a bit less complex. I based a lot of the instrument metadata management, as well as some of the UI design, on the MC “QuoteManager” application, as I think their approach is quite intuitive.

Diagram of the QDMS architecture.

The system is designed in a modular fashion, so it’s trivial to add additional data sources (as well as alternative local storage mechanisms… if I ever start storing tick data I will have to move away from the current relational database storage). The interfaces for writing external data source modules are very simple right now:

Interfaces for data storage and external data sources.
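
To give a rough idea of their shape, here is a minimal sketch of what an external data source interface along those lines might look like (hypothetical names and members, not the actual QDMS code):

    using System;
    using System.Collections.Generic;

    // Hypothetical sketch -- names and members are illustrative only.
    public class OHLCBar
    {
        public DateTime DT;
        public decimal Open, High, Low, Close;
        public long Volume;
    }

    public class HistoricalDataRequest
    {
        public string Symbol;
        public DateTime StartDate;
        public DateTime EndDate;
        public int BarSizeInSeconds; // time-based bars only, for now
    }

    public class HistoricalDataEventArgs : EventArgs
    {
        public HistoricalDataRequest Request;
        public List<OHLCBar> Data;
    }

    public interface IHistoricalDataSource
    {
        string Name { get; }
        bool Connected { get; }

        void Connect();
        void Disconnect();

        // Requests are asynchronous: data arrives through the event below.
        void RequestHistoricalData(HistoricalDataRequest request);
        event EventHandler<HistoricalDataEventArgs> HistoricalDataArrived;
    }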


A couple of screenshots of the server interface:

Importing/exporting CSV files is already implemented.

Editing instrument metadata, including sessions. Instruments can have custom data sessions, derive their sessions from their exchange, or use a session template.
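
In data-model terms that last point is just a three-way choice per instrument, roughly along these lines (hypothetical sketch):

    // Hypothetical sketch of the per-instrument sessions choice.
    public enum SessionsSource
    {
        Custom,    // sessions defined directly on the instrument
        Exchange,  // sessions inherited from the instrument's exchange
        Template   // sessions copied from a reusable session template
    }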

There’s also the client side of things; here’s the interface for selecting data series in the backtester:

Selecting data series for a backtesting run.

The client/server approach lets multiple clients use the same data stream. For example, if computations are distributed over multiple boxes and each client needs access to the same real-time data, only a single connection to the external data source is required: the data is then distributed by the broker to every client that has requested that stream.
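
The distribution side is the classic publish/subscribe pattern. Here is a rough sketch of the idea using the NetMQ bindings (illustrative only: API details vary by version, and this is not the actual QDMS transport code):

    using System;
    using System.Threading;
    using NetMQ;
    using NetMQ.Sockets;

    class PubSubSketch
    {
        static void Main()
        {
            // In QDMS terms the publisher lives in the server and each
            // subscriber in a client; both are shown together here only
            // to keep the sketch self-contained.
            using (var pub = new PublisherSocket())
            using (var sub = new SubscriberSocket())
            {
                pub.Bind("tcp://*:5556");
                sub.Connect("tcp://localhost:5556");
                sub.Subscribe("EURUSD"); // receive only the requested stream
                Thread.Sleep(100);       // let the subscription propagate

                // Topic frame first so subscribers can filter, then the
                // payload (in practice a serialized bar, not a string).
                pub.SendMoreFrame("EURUSD").SendFrame("1.3550,1.3561,1.3548,1.3559");

                string topic = sub.ReceiveFrameString();
                string bar = sub.ReceiveFrameString();
                Console.WriteLine(topic + ": " + bar);
            }
        }
    }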

There is also the ability to push data into the local storage. One possible use for this is saving results from a backtest, then using that equity curve as a benchmark in a performance evaluation application.
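
From a client’s perspective that is just a single “push” call; hypothetically, something like this (all names invented for illustration):

    // Hypothetical client-side sketch -- all names are invented.
    var client = new QDMSClient("localhost", 5555);
    client.Connect();

    // Store the backtest's equity curve as daily bars under its own
    // symbol, so it can later be requested like any other data series.
    List<OHLCBar> equityCurve = backtest.GetEquityCurveAsBars();
    client.PushData("MYSTRAT-EQUITY", equityCurve);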

I’m probably going to open source this project eventually, but right now I’m using a couple of proprietary libraries that prevent me from distributing it. It’ll take a bit of work to “disentangle” those bits. In any case I’m striving to comment well and write in a good style so that opening up the code will be relatively painless.

I learned a ton writing the QDMS because it was an opportunity to use a bunch of interesting libraries and technologies that I had never touched before: ZeroMQ, Protocol Buffers, the Entity Framework, WPF, NLog, and Reactive Extensions. I was amazed at the performance of ZMQ: out of the box, in a simple test using a single socket and a single thread, it managed to transfer nearly 200 OHLC bars per millisecond.
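
For reference, the wire format for a bar can be as simple as a protobuf-net contract like the following (a sketch, not the exact QDMS message definition):

    using System;
    using System.IO;
    using ProtoBuf;

    // Sketch of a protobuf-net bar contract (not the exact QDMS one).
    [ProtoContract]
    public class OHLCBar
    {
        [ProtoMember(1)] public DateTime DT { get; set; }
        [ProtoMember(2)] public decimal Open { get; set; }
        [ProtoMember(3)] public decimal High { get; set; }
        [ProtoMember(4)] public decimal Low { get; set; }
        [ProtoMember(5)] public decimal Close { get; set; }
        [ProtoMember(6)] public long Volume { get; set; }
    }

    public static class BarSerializer
    {
        // With protobuf-net, serialization is a one-liner each way.
        public static byte[] Serialize(OHLCBar bar)
        {
            using (var ms = new MemoryStream())
            {
                Serializer.Serialize(ms, bar);
                return ms.ToArray();
            }
        }

        public static OHLCBar Deserialize(byte[] buffer)
        {
            using (var ms = new MemoryStream(buffer))
            {
                return Serializer.Deserialize<OHLCBar>(ms);
            }
        }
    }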

There’s still a bit of work to be done: one major issue is that there is no way to construct lower-frequency bars from higher-frequency data (e.g. daily bars made from 1-minute data), and only time-based bars are possible. The biggest missing piece, however, is generating continuous futures data. It’s a much harder problem than it seems at first glance, because it’s necessary to incorporate a great deal of flexibility both in terms of the futures expiration rules and the rollover rules.
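
The aggregation itself is simple enough for time-based bars. A minimal LINQ sketch of rolling 1-minute bars up into daily ones, assuming an OHLCBar type like the one above and ignoring session boundaries (which is exactly the part a real implementation has to get right):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class BarAggregator
    {
        // Roll sorted 1-minute bars up into daily bars. Sessions are
        // ignored here; bars that belong to the previous trading day
        // (e.g. overnight sessions) would be grouped incorrectly.
        public static List<OHLCBar> ToDaily(IEnumerable<OHLCBar> minuteBars)
        {
            return minuteBars
                .GroupBy(b => b.DT.Date)
                .Select(g => new OHLCBar
                {
                    DT = g.Key,
                    Open = g.First().Open,
                    High = g.Max(b => b.High),
                    Low = g.Min(b => b.Low),
                    Close = g.Last().Close,
                    Volume = g.Sum(b => b.Volume)
                })
                .ToList();
        }
    }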

Continuous futures class.
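
To make the flexibility problem concrete, here is a sketch of just the splicing step, using difference-based back-adjustment at given rollover dates (one of several reasonable variants, and not necessarily what QDMS will end up doing; it reuses the OHLCBar type from the sketches above). All the real difficulty, such as volume/open-interest-based rollover rules, expiration calendars, and ratio vs. difference adjustment, sits outside this function:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class ContinuousFuturesSketch
    {
        // contracts: front to back, each sorted by time.
        // rolloverDates: one date per adjacent pair of contracts.
        public static List<OHLCBar> Splice(
            List<List<OHLCBar>> contracts, List<DateTime> rolloverDates)
        {
            var result = new List<OHLCBar>();
            decimal adjustment = 0m; // cumulative gap applied to older bars

            // Walk from the newest contract backwards, accumulating the
            // price gap at each rollover and shifting earlier segments.
            for (int i = contracts.Count - 1; i >= 0; i--)
            {
                DateTime start = i == 0
                    ? DateTime.MinValue : rolloverDates[i - 1];
                DateTime end = i == contracts.Count - 1
                    ? DateTime.MaxValue : rolloverDates[i];

                var segment = contracts[i]
                    .Where(b => b.DT >= start && b.DT < end)
                    .Select(b => Shift(b, adjustment))
                    .ToList();
                result.InsertRange(0, segment);

                if (i > 0)
                {
                    DateTime roll = rolloverDates[i - 1];
                    adjustment += CloseAt(contracts[i], roll)
                                - CloseAt(contracts[i - 1], roll);
                }
            }
            return result;
        }

        static decimal CloseAt(List<OHLCBar> bars, DateTime dt)
        {
            return bars.Last(b => b.DT <= dt).Close;
        }

        static OHLCBar Shift(OHLCBar b, decimal d)
        {
            return new OHLCBar
            {
                DT = b.DT, Open = b.Open + d, High = b.High + d,
                Low = b.Low + d, Close = b.Close + d, Volume = b.Volume
            };
        }
    }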


I haven’t done any actual research in quite a while because I’ve been preoccupied with coding but I’ll be back soon! I’ve been accumulating a giant backlog of ideas that are waiting to be tested. Hopefully my new tools will be good enough to give some special insights. In any case, I can’t wait to get started.

Comments (9)

  • scott hodson says:

    Pretty cool. I’ve been doing something similar but haven’t gotten as far as you. I’m a C#/SQL/WPF/ASP.NET guy myself. I get data from Kinetick and ZenFire through NinjaTrader via a NinjaScript strategy that writes bar data via ADO into SQL, so I don’t have to deal with files, at least for prices. I have written a handful of SQL functions to act as indicators so I can backtest with nothing but SQL statements. It has its limits (comparing different values within a data series can be difficult in a SQL result set), but it’s something. I also have a set of pre-calculated indicator values stored in tables, so I can write something like

    select *
    from price [p]
    inner join sma50 [s] on s.instrument = p.instrument
        and s.date = p.date
    where p.instrument = 'aapl'
    and p.date = '2013-11-26'

    if I want the AAPL price and SMA(50) value on Nov 26th.

    This makes the database very large, but it makes querying and quickly modeling strategies in SQL very rich. Stored procs can do some heavy lifting too, but if I want to keep going down this path I’m going to have to write a .NET client for this.

    • qusma says:

      Hi Scott,

      Personally I don’t like the approach of saving indicators in the db… especially in the case of things like moving averages that are extremely “cheap” to compute on the fly. Of course you gain the possibility of testing models with SQL, but it seems like a very limited approach. Couldn’t you just pull the data into a pre-made Excel sheet that calculates the indicators?
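
      For an SMA it really is a one-liner; a quick LINQ sketch, assuming the closes sit in a List<decimal> with at least 50 entries:

      // Rolling SMA(50) computed on the fly (requires using System.Linq).
      var sma50 = Enumerable.Range(0, closes.Count - 49)
                            .Select(i => closes.Skip(i).Take(50).Average())
                            .ToList();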

      How do you get stats when you test a strat with SQL?

      • scott hodson says:

        I like it for some things, but, yes, it is a waste of storage. I’m thinking of moving away from storing indicator values and just leveraging my C# skills to compute them instead. I just like having the SQL-heavy approach available if I want to model an idea quickly (just SQL). Usually I just write it in NinjaTrader, but it’s so verbose compared to, say, EasyLanguage or other domain-specific trade strategy modeling languages.

        My SQL backtest stats are all computed using SQL functions. Some I created, some are built-in, others I got from the interwebs.

  • Kazai Mazai says:

    Hello.

    Great job!

    In fact I’m halfway down the same road… but I always get distracted by existing products. Then, as usual, I blame myself for reinventing the wheel, give up, try some 3rd-party software, and run into an architectural dead end or a missing essential feature. But I never give up hope of finding something suitable.

    Recently I thought about OpenQuant, but there the choice is either pay a lot or use a four-year-old version…
    By the way, some colleagues have had a very poor experience with OQ and the bugs that occurred in its real-time trading…

    Not long ago I came across RightEdge. It seems quite interesting and the price is moderate… I’m going to try it after some urgent bugfixes in my own home-grown automated trading system.

    Could you share some information about your experience with 3rd-party automated trading/backtesting/research software? Maybe some pros and cons.

    What played the major role in your final decision to develop your own enormous framework?

    Respectfully,
    Kazakov S. aka Kazai Mazai

    • qusma says:

      Hi Kazai,

      I have tried almost every backtesting platform available to “retail” traders and in the end I think they are all very similar. The two main problems are lack of flexibility and very limited possibilities for research.

      Unlike most solutions out there, I have abandoned the single “strategy” as the basic unit of computation. My own approach goes for maximum flexibility: indicators, signals, etc. are all stand-alone objects that can be freely plugged into each other. This allows for incredible flexibility in terms of meta-strategies, taking into account interactions between signals, other instruments, etc. Going from signal to trade passes through several mechanisms: filtering, trade generation (where signals can be rejected for whatever reason, timing and pricing decisions are made, etc.), and position sizing/risk management. There can be any number of trade generators, position sizers, etc., and each one can operate on multiple instruments, so there’s also the ability to create a 2nd level of meta-strategy by combining multiple meta-strategies.
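
      To sketch very roughly what those stand-alone pieces might look like (simplified and hypothetical, not the actual interfaces):

      // Simplified, hypothetical sketch of the composable pipeline.
      public class Order { /* placeholder */ }
      public class Portfolio { /* placeholder */ }

      public interface ISignal
      {
          double Value { get; } // signals can also feed other signals
      }

      public interface ITradeGenerator
      {
          // May reject the signal, make timing/pricing decisions, etc.
          Order ProcessSignal(ISignal signal);
      }

      public interface IPositionSizer
      {
          int GetOrderSize(Order order, Portfolio portfolio);
      }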

      This approach fits in perfectly with the research side as well. My approach there is heavily inspired by “business intelligence” applications: intensely data-driven, with a ton of different ways of slicing, dicing, combining, contrasting, correlating, etc. This is only possible because all the indicators, signals, etc. are stand-alone objects instead of being hidden inside a self-contained strategy. When I run a backtest I don’t just get a few pretty charts and aggregated statistics, but ALL the data, all the calculations, all the signals, etc.

  • QUSMA C#/.NET-based trading system now open source - NinjaTrader Programming | Big Mike Trading says:

    […] The guy(s) at QUSMA have been working on a trading data management and backtesting system in C#/.NET and they recently announced they are open-sourcing it. […]

  • The QUSMA Data Management System Is Now Open Source says:

    […] Creating a Data Management System […]

