# Monday, 25 October 2010
« MS10-077 Vulnerability Details | Main | How to Hack Your Own JIT Intrinsic »
Memory Model Fix

In last week's update I fixed a memory model bug. In retrospect it was a pretty dumb bug, but in my defence, even the easy parts of memory models are still pretty subtle and when I started IKVM.NET both the Java and CLR memory models were not very well documented.

First of all, I should thank Staffan Ulfberg for filing an exceptionally high quality bug report.

Finding the Problem

After spending some quality time in Windbg looking at the crash dump and with the java.util.concurrent sources, I was eventually able to reproduce the problem on a single CPU quad core Xeon (but not on my Core 2 laptop). After it was reproducable I started pruning the repro to find the troublespot and eventually found this method:

public final boolean release(int arg) {
  if (tryRelease(arg)) {
    Node h = head;
    if (h != null && h.waitStatus != 0)
    return true;
  return false;

My pruned version looked like this:

public final void release() {
  state = 0;
  Node h = head;
  if (h != null && h.waitStatus != 0)

Both state and head are volatile fields. With this modified version the hang usually happened with in a second and I had also added a timeout to park so the problem could be seen repeatedly without restarting the process. After seeing this and thinking about it for a bit, I realized that if the read of head could be reordered with the write of state, that could cause the observed hang. To test that theory, I used Windbg to patch the running code to add an sfence instruction after the state = 0 store. That did indeed make the problem go away and replacing the sfence with three nop instructions made it reappear.

Time to read up on the memory models.

Java Memory Model

For the Java memory model, the JSR 133 Cookbook is a good place to start. It has a nice table that confirms that the reordering isn't allowed in Java:

Can Reorder 2nd operation
1st operation Normal Load
Normal Store
Volatile Load
Volatile Store
Normal Load
Normal Store

Volatile Load
No No No
Volatile store

No No

The yellow box clearly says No :-)

CLR Memory Model

The CLR memory model is summarized nicely in this blog post by Joe Duffy and he explicitly calls out: "With this model, the only true case where you’d truly need the strength of a full-barrier provided by Rule 4 is to prevent reordering in the case where a store is followed by a volatile load. Without the barrier, the instructions may reorder."

So that confirmed that I did indeed need to emit a memory barrier between a volatile store and a subsequent load. I modified the bytecode compiler to emit a call to Thread.MemoryBarrier() after every volatile store.

CLI Memory Model

I explicitly chose not to support the CLI memory model, because it is not very well specified and is impossible to test against. I don't know what memory model Mono implements, but I did find that Thread.MemoryBarrier() was not implemented for x86.

Thread.MemoryBarrier() Implementation Issues

I picked Thread.MemoryBarrier() for 0.44 because it was the easiest and lowest risk fix. An alternative would be to use Interlocked.Exchange() to write to a volatile field. On .NET 2.0, Thread.MemoryBarrier() is implemented as a native method that does more work than just the memory barrier, it also polls for GC and acts as a safe point. On .NET 4.0 it has been turned into a JIT intrinsic and the overhead is lower. Unfortunately, on my system, the .NET 4.0 memory barrier is still significantly slower than the HotSpot memory barrier (which uses mfence), but apparently the trade-off between mfence and a locked instruction is not trivial.


In the 0.45 code (where there is now an MSIL optimization step), I added an optimization to remove redundant memory barriers. If multiple volatile stores are done in succession, only the last one will get a memory barrier.


Finally, here's a small test that reproduces the problem (on my Core 2 laptop):

class Rendezvous extends java.util.concurrent.atomic.AtomicInteger {
  private static final int PARTIES = 2;

  public final void await() {
    if (incrementAndGet() == PARTIES) {
      compareAndSet(PARTIES, 0);
    while (get() != 0) ;

public class test {
  static volatile int p1;
  static volatile int p2;
  static volatile int r1;
  static volatile int r2;
  static final Rendezvous rv1 = new Rendezvous();
  static final Rendezvous rv2 = new Rendezvous();

  public static void main(String[] args) {
    Thread t = new Thread() {
      public void run() {
        for (; ; ) {
          p1 = 0;

          p1 = 1;
          r1 = p2;


    for (int i = 0; i < 1000000; i++) {
      p2 = 0;

      p2 = 1;
      r2 = p1;


      if (r1 == 0 && r2 == 0)
        System.out.println("Oops! i = " + i);


Monday, 25 October 2010 08:58:57 (W. Europe Daylight Time, UTC+02:00)  #    Comments [1]