TCP 128B round trip total
NIC Driver time
IP processing time
TCP processing time
Kernel -> User switch
This is the expected result: TCP processing time becomes the bottleneck. But what is it actually doing? Digging down a bit further, we get:
TCP top level processing + prequeue
This is rather surprising: the top-level processing in tcp_v4_rcv() appears to be where the bulk of the time goes! That is not what you would expect when tcp_rcv_established() is the main workhorse. However, it gets stranger.
TCP before prequeue -> tcp_rcv_established()
It turns out most of the time goes somewhere between pushing the packet onto the TCP prequeue and actually processing it in tcp_rcv_established(). I'm not sure what's going on there, but surprisingly, that's where all the action is.
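For anyone wanting to reproduce this kind of breakdown, here is a minimal sketch of the arithmetic, assuming you already have one timestamp (for example a cycle counter read) captured at each stage boundary of a single packet. The enum names and the `stage_deltas()` helper are hypothetical and purely illustrative; they are not kernel symbols, and this is not necessarily how the numbers above were gathered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stage boundaries (illustrative names, not kernel symbols).
 * Each value indexes the timestamp taken on entry to that stage. */
enum stage {
    STAGE_NIC_DRIVER,   /* entry to the NIC driver */
    STAGE_IP,           /* entry to IP processing */
    STAGE_TCP,          /* entry to TCP processing */
    STAGE_USER_SWITCH,  /* start of the kernel -> user switch */
    STAGE_DONE,         /* data visible to the user process */
    STAGE_COUNT
};

/* Compute the time spent in each stage as the difference between
 * consecutive timestamps. deltas[i] is the time between stamps[i] and
 * stamps[i + 1]. Returns the index of the slowest stage. */
size_t stage_deltas(const uint64_t stamps[STAGE_COUNT],
                    uint64_t deltas[STAGE_COUNT - 1])
{
    size_t worst = 0;

    for (size_t i = 0; i + 1 < STAGE_COUNT; i++) {
        /* Timestamps must be monotonic for the subtraction to make sense. */
        assert(stamps[i + 1] >= stamps[i]);
        deltas[i] = stamps[i + 1] - stamps[i];
        if (deltas[i] > deltas[worst])
            worst = i;
    }
    return worst;
}
```

Finer-grained probes (say, one just before the prequeue push and one on entry to tcp_rcv_established()) would just be extra entries in the enum; the subtraction stays the same.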