# Tuesday, 12 June 2007
« OpenJDK Library/VM Integration | Main | New Hybrid Snapshot »
Array Initialization

One of the many limitations of the Java class file format is that it doesn't support an efficient way of initializing an array. Take the following class for example:

class Foo
{
  private static final int[] values = { 1, 2, 3, 4 };
}

This looks harmless enough, but it is compiled almost exactly the same as this:

class Foo
{
  private static final int[] values;

  static
  {
    int[] temp = new int[4];
    temp[0] = 1;
    temp[1] = 2;
    temp[2] = 3;
    temp[3] = 4;
    values = temp;
  }
}

If you have an array with a few thousand entries, this is starting to look pretty inefficient. Here's how C# compiles the equivalent code (but with an array with more elements, for small arrays the C# compiler does the same as the javac):

using System.Runtime.CompilerServices;

class
Foo
{
  private static readonly int[] values;

  static Foo()
  {
    int[] temp = new int[128];
    RuntimeHelpers.InitializeArray(temp, LDTOKEN($$field-0))
    values = temp;
  }
}

This is pseudo C#, because there's no C# language construct that allows you to directly load a field's RuntimeFieldHandle as the ldtoken CIL instruction does. What happens here is that the C# compiler emits a global read-only data field and then uses RuntimeHelpers.InitializeArray() to copy this field into the array. If you're on a little endian machine, this is simply a memcpy (after checking that the sizes match).

On occasion I've toyed with the idea of making ikvmc recognize array initialization sequences and then converting them into this more efficient form. The reason that I never got around to it is that many people know that array initialization is inefficient and use a workaround: String literals.

Here's a fragment from OpenJDK's java.lang.CharacterData00 class:

static final char X[] = (
  "\000\020\040\060\100\120\140\160...
  ...
  ...
  ...\u1110\u1120\u1130").toCharArray();

I've omitted most of the data, but the string has 2048 characters (and it's not the only one in this class). String literals can be efficiently stored in the class file format, so this is more efficient than explicitly initializing the array elements.

However, on IKVM there's a downside to this approach. Java requires that all string literals are interned and on .NET the string intern table keeps strong references to the interned strings (the Sun JVM uses weak references to keep track of interned strings). So that means that the 2048 character string above that is only used once will stay in memory for the entire lifetime of the application.

Fortunately, this particular scenario is easily detected in the bytecode compiler and can be optimized to use RuntimeHelper.InitializerArray().

Unfortunately, it turns out that the Reflection.Emit API that takes care of creating these read-only global data fields is a little too helpful and hence broken. What I didn't mention earlier is that these read-only global data fields need to have a type and that type has to be an explicit layout value type of the right size. ModuleBuilder.DefineInitializedData() automatically defines this type, but instead of making it private it creates a public type.

The type created by ModuleBuilder.DefineInitializedData() is named $ArrayType$<n> where <n> is the size of the value in bytes. The size is encoded in the type name to make it easy to reuse the same type for other fields that have the same size. It turns out we can abuse this knowledge to work around the limitations of ModuleBuilder.DefineInitializedData() by pre-creating the value type with the right name. Now we can have full control over the properties of the generated type (as long as it's compatible with the requirements of the read-only global field, of course).

Final note: While writing this blog entry I found that Mono's implementation of RuntimeHelpers.InitializeArray() leaves something to be desired.

Tuesday, 12 June 2007 08:45:48 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]