Monday, 24 December 2012

SVD of HFT

Can't believe it, the first post on this blog was exactly 3 years ago. Wow... seems like such a long time. 3 years ago I couldn't tell you the difference between buy, sell, bid, ask, long or short, yet somehow the projects and code I've written during this time now check or trade millions, tens of millions, hundreds of millions and sometimes billions of dollars every single day. The depressing part? All the time and effort in finance has not translated into George Washingtons in my bank account.... bugger.


So when your imagination and reality are so divergent it's a good time to step back and work out what you're missing, and not even consider the how or where of the equation - just focus on the what. Let's take a high level look at the game from the top down.

The HFT game has 5 major components

1) Speed / Deterministic / Technology
2) Commissions / cost structure
3) Influence with the exchange
4) Capital
5) Strategy

Speed / Deterministic / Technology 

This is the classic arms race: bigger, faster, better. It's very true that technology is a large component of any HF strategy, however the trap technologists (like myself) fall into is believing technology alone can beat the market. What we don't realize is our technology edge starts and ends at the cable plugged into our box/switch. There are so many other moving parts out of our control that tip the balance way into the red.
In Round 1 any technology edge I had ended when the order hit the broker's pre-trade risk checks, let alone the exchange's infra. In Round 2 I have best in class access to the market but the edge is gone once the packet has left the NIC. Net result: speed/technology is important but you can't win a trade with it alone.

Commissions / cost structure

HF strategies are unique because the per trade profit is very small, which makes them extremely sensitive to per trade and monthly recurring costs. For example, say your trade makes +100 cents on a win and -100 cents on a loss with 100 shares, e.g. 1 cent/share. Say your broker, the exchange and the SEC together take 25 cents per trade, or 1/4 of the profit.

Assuming those fees are paid on every trade, win or lose, a win nets +75 cents and a loss costs -125 cents. This means you need a 12.5% edge (62.5% of trades profitable) just to break even.

Yet if your total fees are 5 cents vs 25 cents you need only a 2.5% edge (52.5% of trades profitable) to break even.

A delta of 10% profitable trades means you need a really good signal, which is extremely difficult. That's just to break even!
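The break-even arithmetic can be sketched in a few lines. This is a minimal illustration, not anyone's production fee model: the function name is made up, and it assumes the fee is paid on every trade, win or lose, so break-even is where p * (win - fee) = (1 - p) * (loss + fee).

```python
def break_even_win_rate(win_cents: float, loss_cents: float, fee_cents: float) -> float:
    """Fraction of trades that must win for expected net P&L to be zero.

    Assumes the fee is paid on every trade, win or lose:
        p * (win - fee) - (1 - p) * (loss + fee) = 0
    which rearranges to p = (loss + fee) / (win + loss).
    """
    return (loss_cents + fee_cents) / (win_cents + loss_cents)


for fee in (25, 5):
    p = break_even_win_rate(win_cents=100, loss_cents=100, fee_cents=fee)
    print(f"fee={fee}c  break-even win rate={p:.1%}  edge over a coin flip={p - 0.5:.1%}")
```

Plugging in your own win/loss sizes and fee schedule shows quickly how much signal the cost structure eats before the strategy earns anything.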

Influence with the exchange

This one is probably a bit surprising to many, as in theory an exchange is equal access for all participants. That's the theory, yet in practice an exchange has finite manpower and equipment, which results in the dude trading $100 / day and the dude trading $100MM / day getting vastly different levels of service.

Aside from manpower there are always political / boys club games, and it really is a catch-22 situation. To join the club you need to trade massive volumes, but to trade massive volumes (profitably) you need A-class access into the matching engine.

Why is it important? Your technology edge ends at the BGP endpoint (your router/switch), with a shit ton of other crap between that endpoint and a trade that's in the exchange's control.

Capital

For the HF game capital is usually not a problem. The true believers are flat at the end of the day with no overnight / borrowing costs. Intraday your capital just limits the size of your positions. The one thing capital does give you is market access and influence with an exchange. Any tier one prime broker will want a few million dollars deposit before they will even talk to you. Next step down requires a few $100K and so on. For now capital is not an issue.

Strategy

And finally strategy, this is the true leveler. If you have an excellent strategy then all the above can become noise. Yet the inverse is not true - you can have the best technology in the world but it will never make a bad strategy good.

This is by far my biggest weakness and the skill I need to develop further, but what exactly needs to be learnt? The what is not an easy question. The first 6 months of this expedition were largely wasted doing essentially some form of technical analysis - an "if signal X crosses signal Y then buy/sell" kind of strategy development. I'm sure it works for someone but that's not the HF game to play.

So what's missing? It's how to mine the data and what to mine to gain a statistical edge. The how to mine is an easy technology problem, the what to mine is not. I'm convinced machine learning is the way forward as it's the next logical step beyond stat arb. After all, isn't stat arb a form of the most basic machine learning? Guess it all depends on your definition of learning.

What's next? Bayes, Markov, Kernels, Boosting and Kaggle mmmm Kaggle lol

Sunday, 2 December 2012

Top 5 reasons my (your) backtest is bullshit

A lot of the time when building/backtesting strategies you get a result that's just too-good-to-be-true. When this shows up, the best way to test your brand spanking new million *cough* dollar *cough* a day strategy is to

1) flip the sides/signals - if possible, e.g. when the strategy is not capturing the bid/ask spread.
2) random() is your friend: replace your secret sauce signal gen with random() and see how it goes.

My preference is to use a random() strategy or an "always trade" strategy as it's less intrusive. If it generates million dollar backtested profits... you know it's complete and utter bullshit.
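A rough sketch of the random() sanity check follows. Everything in it is made up for illustration: the toy `backtest` harness, the synthetic random-walk quotes and the integer-cent prices. The idea is only that the signal generator is a pluggable function, so swapping in random() is a one-line change.

```python
import random
from typing import Callable, Sequence


def backtest(bids: Sequence[int], asks: Sequence[int],
             signal: Callable[[int], int], qty: int = 100) -> float:
    """Toy backtest in integer cents.

    signal(i) returns +1 (buy), -1 (sell) or 0 (do nothing) at step i.
    Buys fill at the ask, sells fill at the bid, and whatever position is
    left at the end is marked at the final mid price.
    """
    position = 0
    cash = 0
    for i in range(len(bids)):
        side = signal(i)
        if side > 0:
            cash -= asks[i] * qty
            position += qty
        elif side < 0:
            cash += bids[i] * qty
            position -= qty
    final_mid = (bids[-1] + asks[-1]) / 2
    return cash + position * final_mid


def random_walk_quotes(n: int, seed: int, start: int = 10_000, spread: int = 2):
    """Synthetic bid/ask series: a +/-1 cent random walk with a fixed spread."""
    rng = random.Random(seed)
    bid = start
    bids, asks = [], []
    for _ in range(n):
        bid += rng.choice((-1, 0, 1))
        bids.append(bid)
        asks.append(bid + spread)
    return bids, asks


bids, asks = random_walk_quotes(n=1_000, seed=1)
pnls = []
for seed in range(200):
    rng = random.Random(seed)
    # The "secret sauce" is replaced by a coin flip.
    pnls.append(backtest(bids, asks, lambda i: rng.choice((-1, 0, 1))))

mean_pnl = sum(pnls) / len(pnls)
print(f"mean P&L of random() over {len(pnls)} runs: {mean_pnl / 100:.2f} dollars")
```

A random signal that pays the spread should lose money on average. If your real harness shows it comfortably in the green, the bug is in the harness, not the market.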

So in Dave Letterman top 10 style..




top 5 reasons my (your) backtested results are bullshit


number 5...  your cost structures are wrong

Part of the shock and horror of HF trading is the realization you're being screwed on all angles in all holes by everyone. When people think about trading it's all about (sell price - buy price) == money in your bank account. Sadly, because the gross margins are so low and there are so many fees and taxes along the way, the money in the bank isn't always that good..

Be sure to include

1) exchange fees / rebates
2) SEC fees/taxes
3) broker's fees
4) your (not) friendly capital gains tax man
5) [optional] exchange rates

Got totally screwed by 5) this year - Japanese Yen / USD really hurt
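One way to keep the checklist above honest is to push every backtested P&L number through a single cost function. The sketch below is a crude simplification with hypothetical names and rates: all fees are flattened to per-share numbers, tax is applied only to positive pre-tax profit, and the exchange rate is a single constant.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    exchange_fee_per_share: float    # negative value models a rebate
    regulatory_fee_per_share: float  # SEC-style fees, flattened to per share
    broker_fee_per_share: float
    tax_rate: float                  # applied only to positive pre-tax profit
    fx_rate: float                   # home-currency units per trading-currency unit


def net_pnl(gross_pnl: float, shares_traded: int, costs: Costs) -> float:
    """Gross trading P&L minus fees and tax, converted to the home currency."""
    per_share = (costs.exchange_fee_per_share
                 + costs.regulatory_fee_per_share
                 + costs.broker_fee_per_share)
    pre_tax = gross_pnl - per_share * shares_traded
    after_tax = pre_tax - max(pre_tax, 0.0) * costs.tax_rate
    return after_tax * costs.fx_rate


# Hypothetical numbers, purely for illustration.
costs = Costs(exchange_fee_per_share=0.003, regulatory_fee_per_share=0.0,
              broker_fee_per_share=0.002, tax_rate=0.25, fx_rate=2.0)
print(net_pnl(gross_pnl=100.0, shares_traded=10_000, costs=costs))
```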

number 4 ....   You have built a time machine, cool can I buy one?

Need to seriously change my trade idea scratchpad / research framework soon to completely eliminate this. For now I'm using essentially arrays of data, with each index being a time/event sample. Problem is it's all too easy to do an if (Price[t] > Price[t+1]) kind of compare, which takes a peek into the future and can heavily bias your results. The more painful and subtle variation is using multiple datasets/signals that are not perfectly synchronized, e.g. signal A is one time/event sample ahead of the others.

The only way around this is to make it impossible, say the strategy has no access to the array/precalced data, or to be very very careful. My approach for the moment is the latter. When I started I would make this mistake more often than I want to admit but I rarely screw it up these days - too many hours lost chasing "wtf, why is this random() strategy profitable".
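The "make it impossible" option could look something like the wrapper below. It is a hypothetical sketch, not the framework described in this post: the strategy only ever gets a `PastOnlyView`, which is indexed by lag (0 = now, 1 = previous sample) and raises if asked for anything in the future.

```python
from typing import Sequence


class PastOnlyView:
    """Read-only window over a series that refuses to look past 'now'."""

    def __init__(self, data: Sequence[float]):
        self._data = data
        self._now = -1  # nothing is visible until the first advance()

    def advance(self) -> None:
        """Move 'now' forward by one sample."""
        if self._now + 1 >= len(self._data):
            raise IndexError("end of data")
        self._now += 1

    def __getitem__(self, lag: int) -> float:
        """view[0] is the current sample, view[1] the previous one, and so on."""
        if lag < 0:
            raise IndexError("negative lag would peek into the future")
        if lag > self._now:
            raise IndexError("not enough history yet")
        return self._data[self._now - lag]


prices = [10.0, 10.1, 10.05, 10.2]
view = PastOnlyView(prices)
signals = []
for _ in prices:
    view.advance()
    try:
        # Compare now against the previous sample, never against t+1.
        signals.append(1 if view[0] > view[1] else -1)
    except IndexError:
        signals.append(0)  # first sample has no history yet
print(signals)
```

Driving every signal from the same `advance()` call also helps with the synchronization problem, since no series can get one sample ahead of the others.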


number 3.... EventSeries != TimeSeries

Most exchanges will pack multiple market data events into a single UDP packet, making the market data event stream appear to be a time series when it is not. A simple passive example strategy would be: if the BBO price level has less than X orders/qty then replace away/deeper into the book. If you mistake the event stream for a time series then the backtest will always avoid deep, aggressive trades. Why? Because when a massive aggressive trade whacks the book, an individual market data update is generated for each and every passive order that gets filled.

Now, if you treat each market data update as a discrete event, your strat/backtest can see each update individually until the level reaches the minimum X and you replace away, aka a perfect 0 latency system.

What's wrong? In real-time all the orders get whacked at the same time, including yours! The group of market data events is actually an atomic operation.
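A hedged sketch of the fix: batch the updates that arrived together and only let the strategy look at the book once the whole batch has been applied. The `Update` record and its `packet_seq` field are assumptions for illustration. A real capture might need to group on packet timestamp or on a feed-specific sequence number instead.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Update:
    packet_seq: int  # hypothetical: id of the UDP packet that carried this update
    price: int       # price level in cents
    qty_delta: int   # change in resting quantity at that level


def by_packet(updates: Iterable[Update]) -> Iterator[List[Update]]:
    """Yield consecutive updates that arrived in the same packet as one batch."""
    for _, batch in groupby(updates, key=lambda u: u.packet_seq):
        yield list(batch)


def qty_at_first_reaction(batches: Iterable[List[Update]],
                          start_qty: int, threshold: int) -> Optional[int]:
    """Level quantity the strategy sees when it first decides to replace away."""
    qty = start_qty
    for batch in batches:
        for u in batch:
            qty += u.qty_delta
        if qty < threshold:  # the strategy only gets to look between batches
            return qty
    return None


# One aggressive trade fills five resting 100-lots, all reported in packet 7.
updates = [Update(packet_seq=7, price=10_000, qty_delta=-100) for _ in range(5)]

per_event = qty_at_first_reaction(([u] for u in updates), start_qty=500, threshold=250)
per_packet = qty_at_first_reaction(by_packet(updates), start_qty=500, threshold=250)
print(per_event, per_packet)
```

Treated event by event, the backtest reacts while there is still quantity at the level and escapes the fill. Treated per packet, the level is already gone by the time the strategy can look, which is what happens live.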


number 2.... resetting position at start-of-day

When doing multi-day backtesting it's easy to forget the strategy's position is still open at the end of the day, e.g. the strategy's exit condition has not triggered, usually because it's in a loss state. The simplest example of this is: every 1 minute buy SPY aggressively, then recycle it passively at +X ticks, and start each day with no position. What happens is you have 100% winning trades! Why? Because when the market moves against you the strategy just sits there drifting away from the BBO (in a 10ft pool of red) waiting for that passive exit. Then when the market closes, your backtest resets the strat for the next day and that loss is simply ignored, disappeared and never existed. If there were a wizard of oz, I would certainly ask for such a button.

Always update the position/pnl after every trade - not just at the exit, and never reset between days.
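As a small illustration of that rule, here is a toy ledger in integer cents. The class, the fills and the closing prices are all invented for the example. The buggy loop books P&L only on days that end flat and then throws the ledger away, while the carried ledger marks the open position at every close.

```python
class Ledger:
    """Tracks cash and position so P&L can be marked at any price."""

    def __init__(self) -> None:
        self.position = 0
        self.cash = 0

    def fill(self, qty: int, price: int) -> None:
        """qty > 0 buys, qty < 0 sells; price in cents."""
        self.position += qty
        self.cash -= qty * price

    def pnl(self, mark_price: int) -> int:
        """Realised plus unrealised P&L with the open position marked at mark_price."""
        return self.cash + self.position * mark_price


fills_by_day = [
    [(+100, 10_000)],                # day 1: buy, the passive exit never fills
    [(+100, 9_900), (-100, 9_905)],  # day 2: buy, exit fills at +5 ticks
]
closes = [9_900, 9_905]

# Buggy: fresh ledger each day, open positions silently dropped at the close.
buggy_total = 0
for fills in fills_by_day:
    day = Ledger()
    for qty, price in fills:
        day.fill(qty, price)
    if day.position == 0:
        buggy_total += day.cash

# Correct: one ledger carried across days, marked to market at each close.
ledger = Ledger()
for fills, close in zip(fills_by_day, closes):
    for qty, price in fills:
        ledger.fill(qty, price)
    print("equity at close:", ledger.pnl(close))

print("buggy total:", buggy_total)
```

The buggy loop reports a profit with no losing trades, while the carried ledger is still sitting on the unrealised loss from the first day's position.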


number 1.... aggressively buying at the bid and aggressively selling at the ask.

It's so easy.. just aggressively buy at the bid, and aggressively sell at the ask for every tick, on every instrument for every exchange in the world and you, yourself, can bail out Greece, Spain and probably half of the world, no worries.

The problem is, in real life you're aggressively buying at the *ask*, aka paying the spread to fill now. I still screw this up all the time when testing out an idea. Usually by fiddling with different styles to enter/exit a trade then forgetting to revert the code back and presto.... $1M / day in pnl.
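One cheap guard against this is to route every simulated aggressive fill through a single function so the side logic lives in exactly one place. A minimal sketch, with a made-up function name and prices in integer cents:

```python
def aggressive_fill_price(side: str, bid: int, ask: int) -> int:
    """Price an aggressive (liquidity-taking) order fills at.

    Taking liquidity means crossing the spread: buys lift the ask,
    sells hit the bid.
    """
    if bid > ask:
        raise ValueError("crossed book: bid is above ask")
    if side == "buy":
        return ask
    if side == "sell":
        return bid
    raise ValueError(f"unknown side: {side!r}")


bid, ask = 9_999, 10_001
buy_px = aggressive_fill_price("buy", bid, ask)
sell_px = aggressive_fill_price("sell", bid, ask)
# Buying and immediately selling aggressively loses the full spread per share.
print("round trip per share (cents):", sell_px - buy_px)
```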

--------

By now you're either shaking your head saying how badly I suck, or quietly nodding about those indiscretions you never told anyone, or laughing your ass off saying yeah been there done that, will never do again.

oh the shame... lol