hacking NASDAQ @ 500 FPS: round trip -10us

Saturday, 23 January 2010

round trip -10us

Based on the previous post, our hypothesis is that most of the latency is in the switch from the softirq/tasklet to the callee context, i.e. it's a scheduler problem. If this is correct, a polling recv() instead of a blocking one should give a nice speedup, at the cost of higher CPU usage, meaning your HVAC and power bills go up.
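The change really is only a few lines. Below is a minimal sketch, assuming Linux and an already-connected TCP socket. The post doesn't show its code, so recv_exact(), recv_blocking() and recv_polling() are hypothetical names. The only difference between the two paths is the MSG_DONTWAIT flag, which makes recv() return EAGAIN immediately instead of putting the thread to sleep.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly len bytes from a stream socket.
 * flags == 0            -> ordinary blocking recv()
 * flags == MSG_DONTWAIT -> busy-poll: never sleep, just retry */
static ssize_t recv_exact(int fd, void *buf, size_t len, int flags)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, (char *)buf + got, len - got, flags);
        if (n > 0) {
            got += (size_t)n;
        } else if (n == 0) {
            break;                      /* peer closed: return what we have */
        } else if (errno == EINTR) {
            continue;                   /* interrupted by a signal: retry */
        } else if ((flags & MSG_DONTWAIT) &&
                   (errno == EAGAIN || errno == EWOULDBLOCK)) {
            continue;                   /* queue empty: spin, keep the CPU */
        } else {
            return -1;                  /* real socket error */
        }
    }
    return (ssize_t)got;
}

/* Blocking: the thread sleeps in the kernel until data arrives and then
 * has to be woken and scheduled back in (the cost the hypothesis blames). */
ssize_t recv_blocking(int fd, void *buf, size_t len)
{
    return recv_exact(fd, buf, len, 0);
}

/* Polling: recv() returns EAGAIN while the receive queue is empty, so the
 * thread never sleeps and never waits on the scheduler to run again. */
ssize_t recv_polling(int fd, void *buf, size_t len)
{
    return recv_exact(fd, buf, len, MSG_DONTWAIT);
}
```

Since TCP is a byte stream, both variants loop until the whole 128B message has arrived rather than trusting a single recv() to return all of it.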

TCP 128B A->B->A round trip latency. blocking recv() x2

TCP 128B A->B->A round trip latency. polling recv() x2

... and wow, what a difference with just a few lines of code! It also confirms that we need to hack on the Linux scheduler. The final speedup is around 10,000ns+, i.e. roughly 5,000ns on each side (A recv, B recv), with a very nice, small stddev. Woot.
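The post doesn't show its measurement harness, so here is a minimal sketch of one way round-trip samples like these could be timestamped and summarised. now_ns(), rtt_stats() and isqrt_u64() are hypothetical helpers; the sketch assumes integer nanosecond samples and uses an integer square root so it doesn't need libm.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in ns (unaffected by wall-clock adjustments). */
int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* floor(sqrt(x)) via integer Newton iteration. */
static uint64_t isqrt_u64(uint64_t x)
{
    if (x < 2)
        return x;
    uint64_t r = x / 2 + 1;             /* starting guess >= sqrt(x) */
    uint64_t y = (r + x / r) / 2;
    while (y < r) {                     /* decreases until it hits the floor */
        r = y;
        y = (r + x / r) / 2;
    }
    return r;
}

/* Mean and population standard deviation of n samples, in whole ns
 * (the mean is truncated to an integer). */
void rtt_stats(const int64_t *samples, size_t n,
               int64_t *mean_ns, int64_t *stddev_ns)
{
    assert(n > 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    int64_t mean = sum / (int64_t)n;

    uint64_t sq = 0;                    /* sum of squared deviations */
    for (size_t i = 0; i < n; i++) {
        int64_t d = samples[i] - mean;
        sq += (uint64_t)(d * d);
    }
    *mean_ns = mean;
    *stddev_ns = (int64_t)isqrt_u64(sq / n);
}
```

In a ping-pong test each sample would be now_ns() taken after A's recv() returns, minus now_ns() taken just before A's send(). The per-side figure is then the blocking-minus-polling difference split across the two recv() calls on the round trip: about 10,000ns / 2 = 5,000ns.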

The conventional wisdom is "polling is bad", and that polling makes you a bad programmer; you're meant to do something fancy/smart instead, on the assumption that the wakeup latency is small. If small means 100us, that's a reasonable assumption; however, 100us isn't small in HFT. Thus for low-latency environments, where we are counting nanoseconds and there are more cycles and cores than you can shake a stick at, you really should be using non-blocking, polling socket loops. Maybe ditch traditional interrupt-based device drivers too :)

... or hack on the kernel scheduler lol
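The other common way to build a non-blocking, polling socket loop is to set O_NONBLOCK on the socket once with fcntl() and then spin on plain recv()/send(). Below is a sketch of the B side of the A->B->A test under that assumption. set_nonblocking() and echo_spin_loop() are hypothetical names, and MSG_NOSIGNAL is Linux-specific.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Switch the socket to non-blocking mode once; afterwards plain recv()/send()
 * return -1 with EAGAIN/EWOULDBLOCK instead of sleeping. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* errno values that just mean "nothing to do yet, spin again". */
static int would_block(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK || err == EINTR;
}

/* B side of an A->B->A ping-pong on a non-blocking socket: spin until a full
 * msg_len-byte message has arrived, send it straight back, repeat count times.
 * Returns count on success, -1 on a socket error or early close. */
long echo_spin_loop(int fd, size_t msg_len, long count)
{
    char buf[4096];
    if (msg_len == 0 || msg_len > sizeof buf)
        return -1;
    for (long i = 0; i < count; i++) {
        size_t got = 0;
        while (got < msg_len) {                 /* busy-poll the receive side */
            ssize_t n = recv(fd, buf + got, msg_len - got, 0);
            if (n > 0)
                got += (size_t)n;
            else if (n == 0)
                return -1;                      /* peer closed mid-message */
            else if (!would_block(errno))
                return -1;
        }
        size_t sent = 0;
        while (sent < msg_len) {                /* spin on the send side too */
            /* MSG_NOSIGNAL: get EPIPE instead of SIGPIPE if A has gone away */
            ssize_t n = send(fd, buf + sent, msg_len - sent, MSG_NOSIGNAL);
            if (n > 0)
                sent += (size_t)n;
            else if (n < 0 && !would_block(errno))
                return -1;
        }
    }
    return count;
}
```

The loop never sleeps, so it keeps a core at 100% even when no data is arriving; that's the CPU and power-bill trade-off mentioned at the top.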

1 comment:

Anonymous, 22 April 2010 at 07:28
What is the code for polling recv?

O_NONBLOCK flag for socket and recv(3) in the loop?