Previously I found that the Lisp bindings do not perform well enough in the throughput test when the message size is small. This drawback can be explained by the overhead introduced by the high-level wrappers around libzmq functions. At the very top of the bindings there are more or less lispish functions, which work with C data structures wrapped in CLOS instances, check the return values of low-level functions and signal error conditions when needed. CFFI adds one more wrapping level of its own. So the bindings contain up to 4 levels of pass-through calls between the high-level and low-level code.
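To make the layering concrete, here is a minimal sketch of such a wrapper stack in plain Common Lisp. It is not the cl-zmq source: every name in it (%raw-send, checked-send, demo-socket, demo-message, demo-send) is hypothetical, and %raw-send is only a stand-in for a CFFI-generated foreign function such as zmq_send().

```lisp
;; Hypothetical stand-in for a CFFI-generated foreign function.
;; Like a C call, it returns 0 on success and -1 on failure.
(defun %raw-send (handle size)
  (declare (ignore handle))
  (if (minusp size) -1 0))

(define-condition demo-send-error (error)
  ((code :initarg :code :reader demo-send-error-code)))

;; Next layer: check the return value, signal a condition on failure.
(defun checked-send (handle size)
  (let ((rc (%raw-send handle size)))
    (when (minusp rc)
      (error 'demo-send-error :code rc))
    rc))

;; Next layer: C-level data wrapped in CLOS instances.
(defclass demo-socket ()
  ((handle :initarg :handle :reader socket-handle)))

(defclass demo-message ()
  ((size :initarg :size :reader message-size)))

;; Top layer: lispish generic function. Every call pays for generic
;; dispatch and two slot reads before it reaches the raw function.
(defgeneric demo-send (socket message)
  (:method ((s demo-socket) (m demo-message))
    (checked-send (socket-handle s) (message-size m))))
```

Each layer does very little, but for a tiny message that little is paid on every single send.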
Obviously, under certain conditions throughput is literally equal to how many times per second the code can execute zmq_send(). If every call has to break through layers of wrappers, throughput suffers a lot.
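A rough way to see this is to count how many times per second a given call can be executed. The helper below is only an illustrative sketch: calls-per-second is a hypothetical name, and the thunk stands for whatever send path you want to measure (a high-level wrapper or a direct low-level call).

```lisp
;; Call THUNK COUNT times and return the observed calls per second.
;; Uses the standard wall-clock timer; MAX guards against a zero
;; interval when the loop finishes within one timer tick.
(defun calls-per-second (thunk count)
  (let ((start (get-internal-real-time)))
    (dotimes (i count)
      (funcall thunk))
    (let ((ticks (max 1 (- (get-internal-real-time) start))))
      (float (/ (* count internal-time-units-per-second) ticks)))))
```

For example, (calls-per-second (lambda () nil) 1000000) gives a ceiling for the machine; any real send path can only be slower than that.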
Fortunately, such behaviour (many calls to zmq_send() in blocking mode with small messages) is not typical. But even in this situation it is possible to use the low-level libzmq functions and manage everything by hand.
I ran another test with message sizes of 1, 2, 4, 8, 16, 32 and 64 bytes. Messages of 30 bytes or less are Very Small Messages (VSM), and the zeromq library embeds the VSM payload directly in the message header. The results weren't stable, so every test was run 10 times.
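Since single runs are noisy, it helps to look at the range of the repeated runs rather than at one number. A minimal sketch of such a summary follows; summarize-runs is a hypothetical helper, not part of the test sources, and it expects a non-empty list of throughput figures.

```lisp
;; Reduce a non-empty list of per-run throughput figures to a
;; property list with the minimum, maximum and arithmetic mean.
(defun summarize-runs (samples)
  (list :min (reduce #'min samples)
        :max (reduce #'max samples)
        :mean (/ (reduce #'+ samples) (length samples))))
```

Feed it the 10 results measured for one message size; the :min and :max values give the range for that size.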
Here's the difference between the "normal" and "optimized" versions of the bindings and test sources. "Optimized" means there's (declaim (optimize speed (safety 1))) in the cl-zmq sources, and both local-thr.lisp and remote-thr.lisp call the CFFI-generated functions directly.
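For readers who have not used optimization declarations: a bare quality name such as speed means the maximum level (speed 3), and the declaim applies to the code compiled after it in the same file. The snippet below is only a sketch of how such a declaration combines with type declarations; sum-bytes is a hypothetical function, not something from cl-zmq.

```lisp
;; Global policy: favour speed, keep only basic safety checks.
(declaim (optimize speed (safety 1)))

;; With the argument type declared, the compiler can use specialized
;; array access and fixnum arithmetic instead of generic operations.
(defun sum-bytes (buffer)
  (declare (type (simple-array (unsigned-byte 8) (*)) buffer))
  (let ((sum 0))
    (declare (type fixnum sum))
    (dotimes (i (length buffer) sum)
      (incf sum (aref buffer i)))))
```

How much this gains depends on the implementation and on how much of the time is spent in Lisp code rather than inside libzmq.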
ERRATA: There's a bug in pictures 1 and 3: the second "Optimized Lisp ranges" should be "Normal Lisp ranges". Unfortunately, I lost the original data and can't regenerate the pictures easily.
If your software uses VSM messages (30 bytes or less) and requires maximum throughput, it makes sense to write it in a low-level C style... Perhaps it's better to redesign the software to use larger messages, but it's up to you ;)
To compare C and the optimized version of Lisp:
As you can see, throughput is almost equal for message sizes of 16 bytes and more.
Finally, all the mess: "normal" Lisp, "optimized" Lisp and C in one place.
Thursday, December 10, 2009




3 comments:
From personal experience I've learned that optimising messages of up to 32 bytes is pretty tricky. The results tend to differ on different hardware, as minutiae such as alignment come into play :(
Have you tried inlining the CFFI functions as well?
No, I haven't. I don't think anybody in real life will write an app that sends *a lot* of small messages in blocking mode. This is how a network server should never be written: context switch overhead is still significant on modern h/w.