Monday, December 21, 2009

Qpidc trampoline

Now I know what to do during the long three weeks of my days off. Earlier today I started typing a C wrapper for libqpidclient -- the client part of AMQP from the Qpidc project. "Typing" is exactly the word: it's mostly boring copy-and-paste work, no room for brains. I suspect it will take quite a lot of time to implement every class from the Qpidc horde...


In the end I want to have something ready to bind from Lisp. Maybe other languages can also benefit from this work.

Thursday, December 10, 2009

Throughput test using small messages

Previously I found that the Lisp bindings' performance in the throughput test is not good enough when the message size is small. This drawback can be explained by the overhead introduced by the high-level wrappers for libzmq functions. At the very top of the bindings there are more or less Lispish functions, which work with C data structures wrapped in CLOS instances, check the return values of low-level functions and raise error conditions when needed. CFFI itself adds one more wrapping level. So the bindings contain up to 4 levels of dummy calls from high- to low-level.

Obviously, under certain conditions throughput is literally equal to how many times per second the code can execute zmq_send(). If every call has to break through tons of wrappers, throughput suffers a lot.

Fortunately, such behaviour (many calls to zmq_send() in blocking mode with a small message size) is not typical. But even in this situation it is possible to use the low-level libzmq functions directly and manage all the stuff by hand.

I did another test for message sizes of 1, 2, 4, 8, 16, 32 and 64 bytes. Messages smaller than 30 bytes are Very Small Messages (VSM), and the zeromq library embeds VSM content directly into the message header. The results were not stable, so each test was run 10 times.

Here's the difference between the "normal" and "optimized" versions of the bindings and test sources. "Optimized" means there's (declaim (optimize speed (safety 1))) in the cl-zmq sources, and both local-thr.lisp and remote-thr.lisp use the CFFI-ized functions directly.

ERRATA: There's a bug in pictures 1 and 3: the second "Optimized Lisp ranges" should read "Normal Lisp ranges". Unfortunately, I lost the original data and can't regenerate the pictures easily.


If your software uses VSM messages (30 bytes or less) and requires maximum throughput, it makes sense to write it in low-level C style... Perhaps it's better to redesign the software to use large messages, but that's up to you ;)

Comparing C and the optimized version of Lisp:


As you can see, throughput is almost equal for message sizes of 16+ bytes.

Finally, all the mess in one place: "normal" Lisp, "optimized" Lisp and C.


Friday, December 4, 2009

Performance.... Performance...



Just did some brief latency and throughput testing.

I used two 64-bit machines connected by a 1Gb switchless network. The machines are:
1. Core2Duo T9600 2.8GHz running Fedora 12
2. Core2Duo T9400 2.53GHz running RHEL-5.4 with a realtime kernel.

Here are the latencies compared to C:


Latencies are fine. Throughput in terms of messages per second:





Not bad either, except for small messages. Throughput in megabits per second:

Small messages are a problem. Network latency and the cost of context switches can't compensate for the overhead added by the bindings. Fortunately, I haven't even thought about optimization yet, so there's a large field for improvement :)

Thursday, December 3, 2009

Wednesday, December 2, 2009

Common Lisp is fast for real business.

I'm reading the ØMQ whitepapers, especially the bits related to performance measurements of the Java and Python bindings. Java is something like 20% slower compared to C, and Python works almost 2.5x slower (very much expected).


SBCL kicks ass: I got the same latencies over a 1Gb network as in C. Thanks to the CFFI library and the decent compiler in SBCL.



Currently I'm having a conversation with Martin Sustrik (FastMQ CEO and architect of ØMQ) about merging Common Lisp support into the main tree.