In my previous entry on AtomicInteger I showed timings on my single-CPU ThinkPad, which has a 1.7 GHz Pentium M. Today I investigated the performance on my iMac, which has a 1.83 GHz Core Duo and, like the ThinkPad, runs Windows XP SP2.
I've also modified AtomicInteger to take advantage of Interlocked.Increment, Decrement and Exchange, instead of following the reference implementation, which builds everything on top of the compareAndSet operation.
Running the same test produces the following numbers:
| Runtime | Time (ms) |
|---|---|
| JDK 1.5 HotSpot Server VM | 2282 |
| JDK 1.5 HotSpot Client VM | 2922 |
| IKVM | 2406 |
| C# | 1922 |
Note that the test is significantly slower than on the Pentium M. This is most likely because of the bus locking overhead.
It gets even more interesting when we modify the test to have two threads that concurrently increment the same field:
| Runtime | Time (ms) |
|---|---|
| JDK 1.5 HotSpot Server VM | 22563 |
| JDK 1.5 HotSpot Client VM | 25109 |
| IKVM | 15016 |
| C# | 12000 |
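For readers who want to try this themselves, a two-thread variant of such a test might look roughly like the sketch below. This is not the original test code: the class name, the helper method and the iteration count are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness, not the test used for the numbers above.
final class TwoThreadIncrementTest {
    // Starts `threads` workers that each increment one shared AtomicInteger
    // `perThread` times, waits for all of them, and returns the final value.
    static int runWorkers(int threads, final int perThread) throws InterruptedException {
        final AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        counter.incrementAndGet();
                    }
                }
            });
        }
        for (Thread w : workers) w.start();
        for (Thread w : workers) w.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int perThread = 50000000; // assumed value; the original iteration count is not stated here
        long start = System.nanoTime();
        int total = runWorkers(2, perThread);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println(total + " increments in " + elapsedMs + " ms");
    }
}
```

Passing 1 instead of 2 to runWorkers gives the uncontended case for comparison.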
The first thing to note is the fact that the test now takes an order of magnitude more time. Presumably, this is caused by the communication overhead between the two cores.
The second is that here IKVM significantly outperforms HotSpot Server. The reason for this is simple: IKVM uses Interlocked.Increment which ultimately uses the XADD x86 instruction that atomically performs the load/increment/store. By contrast, the AtomicInteger reference implementation uses the following loop to increment the value:
    public final int incrementAndGet() {
        for (;;) {
            int current = get();
            int next = current + 1;
            if (compareAndSet(current, next))
                return next;
        }
    }
When another thread updates the field between the get and the compareAndSet, the compareAndSet fails and the loop has to run another iteration. When multiple CPUs are continuously trying to increment the value, this is likely to happen relatively frequently.
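To get a feel for how often the retry actually happens, one could instrument the same loop to count failed compareAndSet calls. The following is only a sketch under my own assumptions: the class and method names are made up, and the retry count will vary from run to run and from machine to machine.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical instrumentation of the compareAndSet loop shown above.
final class CasRetryDemo {
    // Same loop as the reference incrementAndGet(), but returns how many
    // times compareAndSet failed before the increment succeeded.
    static int incrementCountingRetries(AtomicInteger value) {
        int retries = 0;
        for (;;) {
            int current = value.get();
            if (value.compareAndSet(current, current + 1)) {
                return retries;
            }
            retries++; // another thread changed the value between get and compareAndSet
        }
    }

    // Returns { final counter value, total number of failed compareAndSet calls }.
    static long[] measure(int threads, final int perThread) throws InterruptedException {
        final AtomicInteger value = new AtomicInteger();
        final long[] retriesPerThread = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    long retries = 0;
                    for (int i = 0; i < perThread; i++) {
                        retries += incrementCountingRetries(value);
                    }
                    retriesPerThread[id] = retries; // each worker writes only its own slot
                }
            });
        }
        for (Thread w : workers) w.start();
        for (Thread w : workers) w.join(); // join makes the workers' writes visible here
        long total = 0;
        for (long r : retriesPerThread) total += r;
        return new long[] { value.get(), total };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] result = measure(2, 5000000); // assumed iteration count
        System.out.println("final value: " + result[0] + ", failed CAS attempts: " + result[1]);
    }
}
```

With a single thread the failure count is always zero, since nothing can change the value between the get and the compareAndSet. With two threads on two cores it should be noticeably higher, and every failed attempt is a wasted round trip that an atomic XADD-style increment avoids.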
Standard disclaimer: As always these microbenchmarks are designed to magnify a particular effect. Be careful about drawing overly large conclusions from them.