# Thursday, 31 January 2008
How to Disassemble an AtomicReferenceFieldUpdater

It's been a while since I've done an in depth investigation of a microbenchmark and the recent work I did on AtomicReferenceFieldUpdater to make it work in partial trust also has a nice performance impact. So let's investigate that.

The Microbenchmark

import java.util.concurrent.atomic.*;

class Test {
  volatile Object field;

  public static void main(String[] args) {
    AtomicReferenceFieldUpdater upd =
      AtomicReferenceFieldUpdater.newUpdater(Test.class, Object.class, "field");
    Test obj = new Test();
    for (int j = 0; j < 5; j++) {
      long start = System.currentTimeMillis();
      for (int i = 0; i < 10000000; i++)
        upd.compareAndSet(obj, null, null);
      long end = System.currentTimeMillis();
      System.out.println(end - start);
    }
  }
}

The Results

       IKVM 0.34      IKVM 0.36      IKVM 0.37      JDK 1.6
.NET 1.1 36808 41453    
.NET 2.0 / x86       75647 5776 561 321
.NET 2.0 / x64   5081 512 245

The Differences

The first thing that jumps out is that the IKVM 0.34 results show that .NET 2.0 reflection is much slower than .NET 1.1 reflection. On IKVM 0.36 the reflection implementation changed to take advantage of DynamicMethod when running on .NET 2.0, so there we see a big improvement in performance when running on .NET 2.0.

IKVM 0.37 has the new AtomicReferenceFieldUpdate optimization that no longer uses reflection (if it can figure out at compile time what to do), this again yields a big performance improvement.

Finally, HotSpot manages to beat IKVM be a factor of two. There is no difference between HotSpot client and server modes for this benchmark (on JDK 1.6).

The Compiler

Let's look at some C# pseudo code that shows what ikvmc 0.37 generates for the above benchmark:

using java.util.concurrent.atomic;
using System.Threading;

class Test {
  volatile object field;

  private sealed class __ARFU_fieldLjava/lang/Object; : AtomicReferenceFieldUpdater {
    public override bool compareAndSet(object obj, object expect, object update) {
      return expect == Interlocked.CompareExchange(ref ((Test)obj).field,
                                                   (object)update, (object)expect);
    }
    // ...other methods omitted...
  }

  static void main(string[] args) {
    AtomicReferenceFieldUpdater udp = new __ARFU_fieldLjava/lang/Object;();
    // ...rest of method omitted...
  }
}

The bytecode compiler only does this optimization if the arguments to newUpdater are constants and match up with a volatile instance reference field in the current class.

The reason this optimization only first showed up in IKVM 0.37 is that it requires the generic version of Interlocked.CompareExchange. In this particular example the non-generic version would have worked, but in the real world nearly all uses of AtomicReferenceFieldUpdater are on fields that have a more specific type than Object.

The Assembly

So why is HotSpot twice as fast? I modified the test slightly to make the generated assembly code easier to read by making it an infinite loop. Here's the x64 code for the loop:

00000000028C2690   mov          r11,qword ptr [r8+10h]
00000000028C2694   mov          r10,1026DD08h
00000000028C269E   cmp          r11,r10
00000000028C26A1   jne          00000000028C2773
00000000028C26A7   mov          r10,qword ptr [r8+20h]
00000000028C26AB   test         r10,r10
00000000028C26AE   jne          00000000028C273B
00000000028C26B4   mov          r10,qword ptr [r8+28h]
00000000028C26B8   mov          r11,r9
00000000028C26BB   add          r11,r10
00000000028C26BE   xor          eax,eax
00000000028C26C0   xor          r10d,r10d
00000000028C26C3   lock cmpxchg qword ptr [r11],r10
00000000028C26C8   sete         r12b
00000000028C26CC   movzx        r12d,r12b
00000000028C26D0   mov          r10,r11
00000000028C26D3   shr          r10,9
00000000028C26D7   mov          r11,589FF80h
00000000028C26E1   mov          byte ptr [r11+r10],0
00000000028C26E6   test         dword ptr [160000h],eax
00000000028C26EC   jmp          00000000028C2690

HotSpot did it's thing and was able to inline the virtual compareAndSet method. I'm pretty sure that HotSpot doesn't have special support for AtomicReferenceFieldUpdater, but this is simply the normal HotSpot devirtualization optimization at work. The lock cmpxchg instruction is the result of HotSpot having intrinsic support for sun.misc.Unsafe.compareAndSwapObject.

Let's go over the assembly instructions in detail:

00000000028C2690   mov          r11,qword ptr [r8+10h]
00000000028C2694   mov          r10,1026DD08h
00000000028C269E   cmp          r11,r10
00000000028C26A1   jne          00000000028C2773

This looks like a HotSpot virtual method inline guard. It's checking to make sure that the object is of the expected type (if it isn't, the inlined virtual method may not be correct anymore).

00000000028C26A7   mov          r10,qword ptr [r8+20h]
00000000028C26AB   test         r10,r10
00000000028C26AE   jne          00000000028C273B

I'm not sure. Some field in the AtomicReferenceFieldUpdater object is tested for null.

00000000028C26B4   mov          r10,qword ptr [r8+28h]

The offset to the field is loaded from the AtomicReferenceFieldUpdater object.

00000000028C26B8   mov          r11,r9

The passed in object reference is moved from r9 to r11.

00000000028C26BB   add          r11,r10

Add the field offset to the object reference. We now have the address of the memory location we want to update in r11.

00000000028C26BE   xor          eax,eax

Clear rax to represent the passed in null value of the expect argument. I'm not sure why the disassembler shows the register as eax, but this instruction clears the full 64 bit rax register.

00000000028C26C0   xor          r10d,r10d

r10 is cleared and represents the passed in null value of the update argument.

00000000028C26C3   lock cmpxchg qword ptr [r11],r10

The actual interlocked compare and exchange instruction. The qword at memory location r11 is compared with rax and if it matches r10 is written to it. Since I'm on a dual core machine, the lock prefix is applied. Locking the bus is expensive, so HotSpot omits it when running on a single core machine.

00000000028C26C8   sete         r12b
00000000028C26CC   movzx        r12d,r12b

The cmpxchg instruction sets the zero flag if it was successful. These two instruction copy the zero flag into the r12 register (it is set to 0 or 255 to represent either false or true). Since the result isn't actually used in this case, this could have been optimized away.

00000000028C26D0   mov          r10,r11
00000000028C26D3   shr          r10,9
00000000028C26D7   mov          r11,589FF80h
00000000028C26E1   mov          byte ptr [r11+r10],0

This is a little interesting. It takes the address of the field that was just (potentially) updated and shifts it to the right by 9 bits and uses that value to index a static table and clear the corresponding byte. This is a GC write barrier. The GC consults the table (known as a card table) to know what objects in older generations it needs to scan when doing a GC of a younger generation.

00000000028C26E6   test         dword ptr [160000h],eax

This seemingly useless test is part of a mechanism used by the VM to suspend the thread at this instruction (a safepoint). When the VM wants to suspend all threads (for a GC) it unmaps the safepoint polling memory page (in this case at 0x160000) and waits for all threads to suspend. Each thread running compiled Java code will eventually run this instruction and cause a page fault, inside the page fault handler it is detected that a safepoint thread suspend is requested and the thread calls the VM to suspend itself.

00000000028C26EC   jmp          00000000028C2690

Branch to the top and start over again.

The Conclusion

The .NET Framework JIT doesn't inline virtual methods and Interlocked.CompareExchange is not a JIT instrisic, so there the story is pretty straightforward. Each loop iteration calls Interlocked.CompareExchange which in turn calls the GC write barrier function. This is why HotSpot is able to beat IKVM 0.37 by a factor of two.

Of course, when you're coding in C# you can write the microbenchmark to call Interlocked.CompareExchange directly:

using System;
using System.Threading;

class Test {
  volatile object field;

  static void Main(string[] args) {
    Test obj = new Test();
    for (int j = 0; j < 5; j++) {
      int start = Environment.TickCount;
      for (int i = 0; i < 10000000; i++)
        Interlocked.CompareExchange(ref obj.field, null, null);
      int end = Environment.TickCount;
      Console.WriteLine(end - start);
    }
  }
}

This runs in 265 milliseconds which goes to show that in this case all the fancy footwork that HotSpot does can almost be matched simply by having by ref argument passing in your language. Of course, the CLR JIT isn't perfect. When you change the field type to string the running time increases to 436 milliseconds because the invocation of a generic method goes through a stub that makes sure that the method instantiation exists. Here it would probably pay to to teach the JIT about the generic methods in System.Threading.Interlocked.

Thursday, 31 January 2008 10:50:35 (W. Europe Standard Time, UTC+01:00)  #    Comments [3]
# Friday, 25 January 2008
IKVM 0.36 Update 1 Release Candidate 2

One more bug fix.

Changes:

  • Changed version to 0.36.0.7.
  • Fix bug in DynamicMethod based serialization for fields typed as ghost interfaces.

Binaries available here: ikvmbin-0.36.0.7.zip
Sources (+ binaries):ikvm-0.36.0.7.zip

The external sources are the same as the previous rc.

Friday, 25 January 2008 07:38:42 (W. Europe Standard Time, UTC+01:00)  #    Comments [0]
# Wednesday, 23 January 2008
How To Get an Explicit Interface Implementation Method

Suppose you have a Type that you know implements IEnumerable, how do you get the GetEnumerator method?

Both the developer(s) at Microsoft and the Mono developer(s) that wrote System.Xml.Serialization managed to find the same wrong answer to this question. Here's the code from Mono, but Microsoft does somethig very similar:

MethodInfo met = type.GetMethod ("GetEnumerator", Type.EmptyTypes);
if (met == null) {
  // get private implemenation
  met = type.GetMethod ("System.Collections.IEnumerable.GetEnumerator",
                        BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance,
                        null, Type.EmptyTypes, null);
}

The reason this is wrong is that the name of the private method is an implementation detail of the compiler that was used to compile the code. C# happens to name the private method this way, but other languages may not.

For example, try the following VB webservice:

<%@ WebService Language="VB" Class="Service1" %>

Imports System.Collections
Imports System.Web.Services

Public Class Service1
  Inherits WebService

  <WebMethod()> Public Function HelloWorld() As Frob
    Return New Frob()
  End Function
End Class

Public Class Frob
  Implements IEnumerable

  ' Note that this method is private
  Private Function GetEnumerator() As IEnumerator Implements IEnumerable.GetEnumerator
    Return "frob".GetEnumerator()
  End Function

  Public Sub Add(ByVal obj As Object)
    Throw New NotSupportedException()
  End Sub
End Class

If you run this webservice on .NET, you'll an exception inside System.Xml.Serialization.TypeScope.GetTypeDesc(), because it expects to find a public GetEnumerator method or a private System.Collections.IEnumerable.GetEnumerator method. However, VB allows you to pick the method name (in this case simply GetEnumerator). If you make the GetEnumerator method public, the webservice will work.

The Right Way

Here's the right way to get the method:

MethodInfo getEnumerator = typeof(IEnumerable).GetMethod("GetEnumerator", Type.EmptyTypes);
InterfaceMapping map = type.GetInterfaceMap(typeof(IEnumerable));
int index = Array.IndexOf(map.InterfaceMethods, getEnumerator);
MethodInfo meth = map.TargetMethods[index];

Meta Question

Of course, the real question remains unanswered. Why do we need the MethodInfo in the first place? Wouldn't Xml serialization be better off by simply using the IEnumerable interface?

Wednesday, 23 January 2008 07:37:35 (W. Europe Standard Time, UTC+01:00)  #    Comments [8]
# Monday, 14 January 2008
IKVM 0.36 Update 1 Release Candidate 1

As I've said when I released 0.36, this is a version that will be maintained for a while because it is the last version to support .NET 1.1. This means that I will strive to release updates relatively soon after bugs are reported (in the supported areas). This is the first of such update releases.

Changes:

  • Changed version to 0.36.0.6.
  • Fix for reflection bug on .NET generic nested types that caused this.
  • Fix for bug #1865922.
  • java.awt.image.Raster fix.

Binaries available here: ikvmbin-0.36.0.6.zip
Sources (+ binaries):ikvm-0.36.0.6.zip

The external sources are the same as the ones with the first 0.36 release + java.awt.image.Raster patch referenced above.

Monday, 14 January 2008 06:27:59 (W. Europe Standard Time, UTC+01:00)  #    Comments [0]
# Monday, 07 January 2008
New Development Snapshot

The major change is that I dropped support for .NET 1.1 and started taking advantage of .NET 2.0 specific features. The most significant result of that is that it is now possible to run (some) applications in partial trust and that IKVM.OpenJDK.ClassLibrary.dll and IKVM.Runtime.dll can be put into the GAC and called from partially trusted code. Note that if you want to put the binaries in the GAC you'll have to build your own binaries (to get them strong named) as the development snapshot binaries are not strong named.

Changes:

  • .NET 1.1 is no longer supported.
  • Removed .NET 2.0 warnings (except for the "unreachable code" ones).
  • Removed most .NET 1.1 specific code.
  • Removed conditional compilation of .NET 2.0 specific code.
  • Updated FileChannelImpl to use SafeFileHandle and GC.Add|RemoveMemoryPressure.
  • Added GC.KeepAlive to "native" methods of MappedByteBuffer.
  • Improved VFS.
  • The VFS file "lib/security/cacerts" is now dynamically generated from the .NET X509 store.
  • Implemented native methods of java.io.Console (some Win32 only).
  • Implemented support for InternalsVisibleToAttribute.
  • Moved JNI implementation into a separate assembly (IKVM.Runtime.JNI.dll) to make IKVM.Runtime.dll verifiable.
  • Made all "native" method classes internal.
  • Restructured VM <-> Library interface to take advantage of InternalsVisibleTo to remove public methods and reflection usage.
  • Added support for adding constructors in map.xml.
  • Changed <clinit> method to SmartConstructorMethodWrapper to support calling it from map.xml
  • Removed call to unnecessary call to MatchTypes (to support partial trust)
  • Ignore SecurityException in CanonicalizePath.
  • Moved some calls to methods with a LinkDemand (that fails in partial trust) to a separate methods.
  • Added stuff to map.xml to remove the need for reflection in VM / Library bootstrap.
  • Inverted IKVM.Runtime.JNI dependency in stack walking code to avoid needlessly loading the JNI assembly.
  • Fixed bug that caused nested .NET generic types to report a declaring type.
  • Added -skiperror option to ikvmstub (by Kevin Grigorenko).
  • Replaced GCHandle in java.lang.ref.Reference with WeakReference to support partial trust.
  • Added check to prevent sun.misc.Unsafe.objectFieldOffset() from working on static fields.
  • Added support for registering a delegate with a DynamicTypeWrapper that gets called after the type is baked.
  • Intrinsified AtomicReferenceFieldUpdater.newUpdater().
  • Added virtual opcode to explicitly trigger class initialization from map.xml.
  • Added accessor methods for "slot" to Method & Constructor.
  • Implemented System.setIn0, setOut0, setErr0 in map.xml.
  • Hacked sun.misc.SharedSecrets in map.xml to replace Unsafe.ensureClassInitialize() with direct calls.
  • Replaced java.nio.Bits.byteOrder() in map.xml with simple System.BitConver.IsLittleEndian based implementation.
  • Disabled DynamicMethodSupport when running in partial trust.
  • Don't trigger load of JNI assembly when "loading" a fake system library.
  • Added ikvmc -platform option.
  • Added SecurityCritical and AllowPartiallyTrustedCallers annotation to IKVM.OpenJDK.ClassLibrary.dll.
  • Added SecurityCritical and AllowPartiallyTrustedCallers attributes to IKVM.Runtime.dll.

WARNING: THIS IS A DEVELOPMENT SNAPSHOT, NOT AN OFFICIAL RELEASE.

Development snapshots are intended for evaluating and keeping track of where the project is going, not for production usage. The binaries have not been extensively tested and are not strong named.

Binaries available here: ikvmbin-0.37.2928.zip

Monday, 07 January 2008 08:56:57 (W. Europe Standard Time, UTC+01:00)  #    Comments [4]