14 June 2011 By Kirk Wylie
One of the most repeated comments that people have made regarding evaluating the OpenGamma Platform is that out of the box it doesn't do a whole heck of a lot. The problem and solution to that? Data, of course.
tl;dr: We're going to be shipping sample data for evaluation in a maintenance release on the 0.8.x codeline.
Any front-office or risk system is extraordinarily data-heavy, and heavy across a number of dimensions.
And all that? All that is before you even think about the lifeblood of modern quantitative finance: the market data itself. Getting market data, either live or historical, is pretty much a requirement for anything you'd want to use the OpenGamma Engine for.
And we're not giving any of it to you at the moment.
Sounds pretty daft, right?
We've had a number of suggestions on how we could have solved this problem; let me try to address those.
First, internally, we get the majority of our security reference data from Bloomberg (we're part of their developer program), and live and historical market data from our participation in the Bloomberg, Thomson-Reuters, and ACTIV Financial developer programs. So we have databases here with all the data you'd need to run through the same QA process we do daily. The first suggestion we got was that we just export parts of our internal databases as "sample" data.
The problem is that we don't actually own that data. Consider the simplest thing: reference data on American listed equities, where Bloomberg doesn't own the data themselves. They still own the way they've organized that into a database, and thus they have rights on the data in our development, test, and QA database instances, which was sourced through that connection. We can only do this if we can source the data (and prove we did) from a provider that permits this type of redistribution (see below; we're trying to do just that), and it never touches any of our other processes simulating a bank or hedge fund's data infrastructure.
We've also had people suggest we should just put some random data in for market data. And that's a great suggestion, but the simple fact is that it won't work; if the data is far outside "real" market parameters, the modern quantitative models used simply won't fit at all.
You actually can produce random data that will fit the models, but you have to back out from the models and make sure that your randomization process will move in consistent ways so that models still fit properly. Basically, you have to produce "random" data that still satisfies certain statistical parameters.
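As a rough illustration of what "random but consistent" means, here is a minimal Python sketch. It is not how OpenGamma generates its sample data; the starting curve, the function name `simulate_curves`, and every parameter value are assumptions made up for the example. The idea is that one mean-reverting shock moves all the tenors together, so each simulated curve keeps a plausible shape instead of every point jumping independently.

```python
import math
import random

# Hypothetical starting curve: tenor in years -> par swap rate.
# These values are illustrative only, not real market data.
BASE_CURVE = {1: 0.010, 2: 0.014, 5: 0.022, 10: 0.030, 30: 0.036}


def simulate_curves(base, steps, seed=0, reversion=0.05, vol=0.0005, floor=0.0001):
    """Generate `steps` perturbed curves that stay close to `base`.

    A single mean-reverting shock (a discretised Ornstein-Uhlenbeck
    process) is applied to every tenor, so the points move together.
    """
    rng = random.Random(seed)
    shift = 0.0
    curves = []
    for _ in range(steps):
        # Pull the shift back towards zero, then add Gaussian noise.
        shift += -reversion * shift + vol * rng.gauss(0.0, 1.0)
        curve = {}
        for tenor, rate in base.items():
            # Assumption: longer tenors move a little less than short ones.
            damping = math.exp(-tenor / 30.0)
            # Floor the rate so it never goes to zero or below.
            curve[tenor] = max(floor, rate + shift * damping)
        curves.append(curve)
    return curves


if __name__ == "__main__":
    for curve in simulate_curves(BASE_CURVE, steps=3, seed=42):
        print({tenor: round(rate, 5) for tenor, rate in curve.items()})
```

In practice the reversion speed and volatility would have to be backed out of the models you intend to fit, which is exactly the hard part described above; the constants here are placeholders.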
But none of this makes sense until you start to consider the roles of data sources and data providers in this market, and realize why people pay thousands of dollars a month for a Bloomberg or Reuters terminal.
The best way to understand what's going on here is to rethink the role of exchanges and how they make money.
Today, for most exchanges, actually processing trades has such razor-thin margins that in many cases it is a loss-making process. Simply put, the days of an exchange being able to make a significant profit from every trade done through that exchange are long gone.
So what do they make money on? The data. When you get stock market data from Bloomberg or Reuters or Yahoo Finance or anywhere, that data is actually owned by the exchange. Your data source (Bloomberg, Reuters, Yahoo) acts as a distributor for the data provider (the exchange). The data provider may allow the data source to give that away free of charge, it may charge the data source for that distribution (if it makes the data source money through advertising or something similar), or it may require that you have your own sell-through relationship directly with the data provider.
But that data is worth money, and the more up-to-date it is, the more valuable it is. The finer the resolution of the data (down to tick-by-tick in the Level 2 feeds), and the faster it's delivered, the more they can charge.
If we started giving you even a replay of an out-of-date tickstream, you can be sure they'd have their lawyers on us faster than you could say "Intellectual Property Violation."
If you're an exchange these days, data is your business, and therefore worth money. But that's only a small section of the quoted objects out there.
What do you need if you want to build yield curves beyond the next couple of years? Typically, IR Swap or OIS rates. Those aren't exchange traded (although they're soon to be cleared and/or settled); it's an OTC market. FX option trading? You need a good FX volatility surface. Still OTC. Fixed income optionality? While interest rate future options and bond future options are exchange traded, swaption volatilities are OTC yet again. Want to build credit curves? Welcome to OTC Credit Default Swap country.
These markets almost all trade in a very similar way: you have large "flow monsters" sitting in the middle of the market (think: Goldman Sachs, Deutsche Bank, Morgan Stanley, Bank of America Merrill Lynch), and trading with almost all the other counterparties. They're acting as very traditional market makers in quoted markets, and making money on small spreads on each trade. It's a volume business, where more trades in general means more profit.
Now let's say you're a hedge fund. You trade a lot of swaps. But you don't trade anywhere near as many as one of the large counterparties; you're one of hundreds of hedge funds they're trading swaps against. By virtue of being in the middle of so many trades, they have a view of the market that you simply can't get, and they can sharpen every single trade they make by clever use of that data.
And so they don't give it to you. They'll give you enough to enable you to trade, but not so much that you can compete with them. For example, you can get Bloomberg-provided composite par swap rates provided by the brokers, but not the full rates available to the brokers themselves. They're so protective of this data that Markit Partners has a special service just for sell-side institutions to get monthly consensus that internal marks aren't too far off market.
For the large sell-side institutions in these markets, the data isn't just money, it's the machine that drives their entire profit base.
Even if we had this data, do you think they'd like it if we gave it away?
So now that you know how we (as both a vendor and an industry) have gotten into this mess, what is OpenGamma doing to help you go from download-to-live-risk as fast as possible? How are we helping you evaluate the platform on your own without having to contact us?
We're going to ship sample data. It's too late to include it in 0.8.0 (we're code complete and in final release validation), but we're going to get it out as a maintenance release on the 0.8 codebase so you don't have to wait for 0.9.
So why in the world didn't we do that in the first place? We hadn't factored in the number of you who have told us that you're not allowed to play around with new technologies at work, or aren't allowed to hook up to any "official" data services without prior approval. So we know there are a lot of you who are doing your OpenGamma evaluations at home, or in stealth mode when nobody's looking over your shoulder. The moment you have to ask your friendly RMDS administrator for an account for a new experimental technology, the jig's up.
We hear you, and we've all been there ourselves trying to evaluate new technologies without a manager wondering why you're looking at something new that's unsanctioned.
Expect the double-dot on 0.8.x with the sample data in a few weeks!

Kirk is OpenGamma's Chief Executive Officer and Chief Technology Officer.
Prior to co-founding OpenGamma, Kirk was the head of software architecture for the Front Office Technology division of KBC Financial Products. While there, Kirk was responsible for developing integration and interoperability solutions across KBCFP's disparate lines of business (including Convertible Bonds, Equity Derivatives, Structured Credit and Fund Derivatives).