Dynamic linking costs two cycles: kristiannielsen

kristiannielsen

Dynamic linking costs two cycles

Oct 03, 2010 18:55

My blog has moved! Read it in the new location here.

It turns out that the overhead of dynamic linking on Linux amd64 is 2 CPU cycles per cross-module call. I usually take forever to get to the point in my writing, so I thought I would change this for once :-)

In MySQL, there has been a historical tendency to favour static linking, in part because to avoid the overhead (in execution efficiency) associated with dynamic linking. However, on modern systems there are also very serious drawbacks when using static linking.
The particular issue that inspired this article is that I was working on MWL#74, building a proper shared libmysqld.so library for the MariaDB embedded server. The lack of a proper libmysqld.so in MySQL and MariaDB has caused no end of grief for packaging Amarok for the various Linux distributions. My patch increases the amount of dynamic linking (in a default build), so I did a quick test to get an idea of the overhead of this.
ELF dynamic linking

The overhead comes from the way dynamic linking works in ELF-based systems like Linux (and many other POSIX-like operating systems). Code in shared libraries must be compiled to be position-independent, achieved with the -fPIC compiler switch. This allows the loader to simply mmap() the image of a shared library into the process address space at whatever free memory space is available, and the code can run without any need for the loader to do any kind of relocations of the code. For a much more detailed explanation see for example this article.

When generating position-independent code for a function call into another shared object, the compiler cannot generate a simple absolute call instruction, as the destination address is not known until run-time. Instead, the call goes via an indirect jump is generated, fetching the destination address from a table called the PLT, short for Procedure Linkage Table. For example:

callq 0x400680 ) ... : jmpq *0x200582(%rip)
The indirect jump resolves at runtime into the address of the real function to be called, so that is the overhead of the call when using dynamic linking: one indirect jump instruction.
Micro-benchmarking

To measure this one-instruction overhead in terms of execution time, I used the following code:
for (i= 0; i < count; i++) v= mylib_myfunc(v);
The function mylib_myfunc() is placed in a library, with the following code:
int mylib_myfunc(int v) {return v+1;}
I tested this with both static and dynamic linking on a Core 2 Duo 2.4 GHz machine running Linux amd64. Here are the results from running the loop for 1,000,000,000 (one billion) operations:

total time (sec.)CPU cycles/iteration Static linking2.546 Dynamic linking3.388
So that is the two CPU cycles of overhead per call that I referred to at the start of this post.

Incidentally, if you try stepping through the call with a debugger, you will see a much larger overhead for the very first call. Do not be fooled by this, this is just because the loader fills in the PLT lazily, computing the correct address of the destination only on the first time the call is made (so addresses of functions that are never called by a process need never be calculated). See above-referenced article for more details.

(Note that this is for 64-bit amd64. For 32-bit x86, the mechanism is similar, but the actual overhead may be somewhat larger, since that architecture lacks program-counter-relative addressing and so must reserve one register %ebx (out of its already quite limited register bank) for this purpose. I did not measure the 32-bit case, I think it is of little interest nowadays for high-performance MySQL or MariaDB deployments (and the overhead of function calls on x86 32-bit is significantly higher anyway, dynamic linking or not, due to the need to push and pop all arguments to/from the stack)).
Conclusion

Two cycles per call is, in my opinion, a very modest overhead. It is hard to imagine high-performance code where this will have a real-life noticeable effect. Modern systems rely heavily on dynamic linking, and static linking is nowadays causing much more problems that it solves. And I think it is also time to put the efficiency argument for static linking to rest.

freesoftware, mariadb, mysql, performance, programming