# Wednesday, 04 July 2007
New Hybrid Snapshot

I decided to take a break from integrating new OpenJDK packages and instead focus on running some test cases and work on stabilization. Contrary to the previous hybrid snapshots, this version should be fairly usable. Feedback is appreciated.

Changes:

  • OpenJDK: Implemented java.util.concurrent.locks.LockSupport.
  • OpenJDK: Fixed race condition in Thread.interrupt() that could cause cli.System.Threading.ThreadInterruptedException to be thrown from interruptable waits/sleep.
  • OpenJDK: Imported and modified java.util.concurrent.locks.AbstractQueuedSynchronizer to make it more efficient and to remove the use of ReflectionFactory & Unsafe to reduce initialization order dependencies.
  • OpenJDK: Changed unsafe to use more efficient internal helper class to copy java.lang.reflect.Field and make it accessible (this also reduces initialization order dependencies).
  • OpenJDK: Added lib/logging.properties to VFS and implemented an additional VFS operation required for reading it.
  • OpenJDK: Changed VFS ZipEntryStream.Read() to always try to read the requested number of bytes, instead of returning earlier. For maximum compatibility with real file i/o.
  • OpenJDK: Commented out system property setting in sun.misc.Version that was setting some version properties to bogus values.
  • OpenJDK: Fixed ObjectInputStream.latestUserDefinedLoader() to skip mscorlib stack frames.
  • OpenJDK: Added support to VFS for the VFS root directory.
  • OpenJDK: Fixed FieldAccessorImpl to check the type of the object passed in.
  • OpenJDK: "Implemented" ClassLoader.retrieveDirectives() by returning empty AssertionStatusDirectives object. Enabling assertions on the ikvm.exe command line is still not implemented.
  • OpenJDK: Switched to javac compiler for building OpenJDK sources.
  • Added workaround for .NET 1.1 reflection bug that causes methods that explicitly override a method to show up twice.
  • Changed handling of ldfld & lfsfld opcodes in remapper to bypass "magic".
  • Restructured handling of fields defined in map.xml to enable referencing them from map.xml method bodies.
  • Fixed java.security.VMAccessController to make sure that if there is only system code on the stack, the resulting AccessControlContext isn't empty.
  • Updated to compile with GNU Classpath HEAD.

Binaries available here: ikvmbin-hybrid-0.35.2741.zip.

Wednesday, 04 July 2007 15:04:10 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]
# Wednesday, 27 June 2007
New Hybrid Snapshot

I'm getting tired of writing the disclaimers and warnings, but they still apply.

The most important change in this snapshot is that there's now a virtual file system for the java.home directory. This has been a long time coming, but the proverbial straw was the fact that the OpenJDK timezone code reads files from the java.home/lib/zi/ directory (IMHO they really should be using resources for these things).

Currently the virtual java.home directory is C:\.virtual-ikvm-home\ on Windows and /.virtual-ikvm-home/ on Unix, but this is subject to change (please let me know if you have thoughts on this). The only contents in there so far is the /lib/zi/ directory tree and only a few file operations are supported (notably the ones required by the timezone code), but expect that eventually all (read-only) file system operations will be supported and more virtual files to appear in there.

Why a Virtual File System Instead Of a Real Java Home Directory?

The main reason is that I want IKVM to behave like a .NET library as much as possible. That means it should be possible to install it into the GAC and support the versioning and side-by-side capabilities of .NET, that's very hard to do when you have to manage real directories.

Changes:

  • OpenJDK: Integrated java.util.spi, java.util.prefs and java.util.logging packages.
  • OpenJDK: Integrated java.text and java.text.spi packages (except for java.text.Bidi class, for which Sun uses native code, so we'll continue to use GNU Classpath's pure Java version.)
  • OpenJDK: Changed build script to include all resources from OpenJDK generated resources.jar.
  • OpenJDK: Integrated java.rmi package.
  • OpenJDK: Changed system/extension class loader creation to make sure that an extension class loader always exists if there is a non-assembly system class loader.
  • OpenJDK: Improved exception handling in java.io.FileDescriptor.
  • OpenJDK: Removed AccessController.doPrivileged() call in Unsafe.fieldOffset(), to work around Mauve brokenness.
  • OpenJDK: Implemented the beginnings of a virtual file system for the java.home directory.
  • Changed JVM.IsUnix to use Environment.OSVersion.Platform.

Binaries available here: ikvmbin-hybrid-0.35.2734.zip.

Wednesday, 27 June 2007 08:57:35 (W. Europe Daylight Time, UTC+02:00)  #    Comments [3]
# Thursday, 21 June 2007
New Hybrid Snapshot

Another hybrid snapshot update. Just a reminder again: These snapshots have not been tested extensively and are known to be broken (and are much more broken than the non-hybrid snapshots I used to release). They are only intended to be used for testing and getting a feel of how things are going. If you find a bug specific to this snapshot, please don't file a bug on SourceForge, simply send a message to the ikvm-developers list or to me directly.

I'm not going to list everything that's known to be broken, but I will say that Eclipse 3.2 still runs, so at least some parts do work ;-)

This build also includes several GNU Classpath fixes that I haven't yet checked in, so if you're trying to do a hybrid build from cvs you'll end up with something even more broken than this build. [Update: I checked them in.]

Changes:

  • OpenJDK: Upgraded to OpenJDK bundle b13.
  • OpenJDK: Added more resources.
  • OpenJDK: Integrated java.lang.annotation and java.lang.ref packages.
  • OpenJDK: Integrated java.io and java.util packages.
  • Added -serialver option to ikvmstub that forces stub generator to include serialVersionUID field for all serializable classes (to make Japi results more accurate).
  • Fixed GetParameterAnnotations() to return the correct array length for instancehelper__ methods (static methods that represent instance methods on remapped types).
  • Fixed ikvm.io.InputStreamWrapper.available() to return non-zero when more data is available (as suggested by Mark Reinhold). Removed the NormalizerDataReader specific fix from map.xml.
  • Fixed ikvmstub to better handle private interface implementations. (Fixes System.Web.UI.Control subclassing issue).

Binaries available here: ikvmbin-hybrid-0.35.2728.zip.

Thursday, 21 June 2007 07:23:42 (W. Europe Daylight Time, UTC+02:00)  #    Comments [1]
# Tuesday, 19 June 2007
Five Years

Wow. Today it's five years ago that I started blogging about IKVM. I'd write more, but I'm having too much fun working on integrating the OpenJDK libraries :-)

Tuesday, 19 June 2007 06:32:18 (W. Europe Daylight Time, UTC+02:00)  #    Comments [3]
# Thursday, 14 June 2007
First OpenJDK Bug

Yesterday I encountered the first OpenJDK "bug". Sun uses IBM's ICU library for Unicode and Globalization support and it does a couple of DataInputStream.read(byte[]) calls without checking the return value. On IKVM the InputStream that ClassLoader.getSystemResourceAsStream() returns is an ikvm.io.InputStreamWrapper and that always returns 0 from available() when it is wrapping a non-seekable stream and the LZInputStream that decompresses compressed resources is a non-seekable stream. The reason this is significant is that ICU wraps a BufferedInputStream around the InputStreamWrapper and BufferedInputStream uses available() to figure out if it should read more data from the underlying stream if there's still room in the byte array passed in to read(byte[]) after the data from the buffer has been copied. So as long as the underlying stream has an implementation of available() that never returns 0, ICU works fine. I filed a bug report.

Issues like this demonstrate why cloning a large platform is so incredibly difficult. Real world code is bound to have dependencies on implementation details like this. So simply reimplementing a spec isn't sufficient.

For the time being I've added a workaround to map.xml (using the new <replace-method-call /> support) to modify the ICU method to call readFully(byte[]) instead of read(byte[]), but I'm considering rewriting the resource decompression to have available() return the number of bytes still available to read from the stream. There may be other code out there that depends on it.

Thursday, 14 June 2007 12:26:19 (W. Europe Daylight Time, UTC+02:00)  #    Comments [1]
# Wednesday, 13 June 2007
New Hybrid Snapshot

A new snapshot. I integrated all classes in the java.lang package (but not the sub-packages) and this means I have now resolved most VM/Library initialization issues that I was a bit worried about before embarking on this. I had to include one new hackish (or AOP if you will) feature in map.xml: The ability to replace method invocations with arbitrary CIL code sequences. This was in particular necessary to "comment out" the loadLibrary("zip") call in System.initializeSystemClass(). Here's what that looks like in map.xml:

<class name="java.lang.System">
  <method name="initializeSystemClass" sig="()V">
    <replace-method-call class="java.lang.System" name="loadLibrary" sig="(Ljava.lang.String;)V">
      <code>
        <pop />
      </code>
    </replace-method-call>
  </method>
</class>

This simply means that all System.loadLibrary() invocations in System.initializeClass() will be replaced with a pop instruction (to discard the string argument).

Changes:

  • OpenJDK: Integrated java.lang package..
  • OpenJDK: Integrated java.util.regex package.
  • OpenJDK: Integrated java.text.Normalizer and support classes.
  • OpenJDK: New StringHelper.java based on OpenJDK's String.java.
  • OpenJDK: Implemented using the entry assembly's class loader as system class loader.
  • OpenJDK: Replaced zip library loading hack with "replace-method-call" hack.
  • OpenJDK: Implemented the hooks to set the system class loader to the entry assembly's class loader if java.class.path and java.ext.dirs properties aren't set.
  • OpenJDK: Various fixes.
  • Added leave CIL opcode support to remapper.
  • Added support for replacing constructor and static initializer method bodies in map.xml.
  • Added support for locally (i.e. per method) replacing method calls in map.xml.
  • Added optimization to ikvmc for large string literals that are only used to call .toCharArray() on.

Binaries available here: ikvmbin-hybrid-0.35.2720.zip.

Wednesday, 13 June 2007 08:35:28 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]
# Tuesday, 12 June 2007
Array Initialization

One of the many limitations of the Java class file format is that it doesn't support an efficient way of initializing an array. Take the following class for example:

class Foo
{
  private static final int[] values = { 1, 2, 3, 4 };
}

This looks harmless enough, but it is compiled almost exactly the same as this:

class Foo
{
  private static final int[] values;

  static
  {
    int[] temp = new int[4];
    temp[0] = 1;
    temp[1] = 2;
    temp[2] = 3;
    temp[3] = 4;
    values = temp;
  }
}

If you have an array with a few thousand entries, this is starting to look pretty inefficient. Here's how C# compiles the equivalent code (but with an array with more elements, for small arrays the C# compiler does the same as the javac):

using System.Runtime.CompilerServices;

class
Foo
{
  private static readonly int[] values;

  static Foo()
  {
    int[] temp = new int[128];
    RuntimeHelpers.InitializeArray(temp, LDTOKEN($$field-0))
    values = temp;
  }
}

This is pseudo C#, because there's no C# language construct that allows you to directly load a field's RuntimeFieldHandle as the ldtoken CIL instruction does. What happens here is that the C# compiler emits a global read-only data field and then uses RuntimeHelpers.InitializeArray() to copy this field into the array. If you're on a little endian machine, this is simply a memcpy (after checking that the sizes match).

On occasion I've toyed with the idea of making ikvmc recognize array initialization sequences and then converting them into this more efficient form. The reason that I never got around to it is that many people know that array initialization is inefficient and use a workaround: String literals.

Here's a fragment from OpenJDK's java.lang.CharacterData00 class:

static final char X[] = (
  "\000\020\040\060\100\120\140\160...
  ...
  ...
  ...\u1110\u1120\u1130").toCharArray();

I've omitted most of the data, but the string has 2048 characters (and it's not the only one in this class). String literals can be efficiently stored in the class file format, so this is more efficient than explicitly initializing the array elements.

However, on IKVM there's a downside to this approach. Java requires that all string literals are interned and on .NET the string intern table keeps strong references to the interned strings (the Sun JVM uses weak references to keep track of interned strings). So that means that the 2048 character string above that is only used once will stay in memory for the entire lifetime of the application.

Fortunately, this particular scenario is easily detected in the bytecode compiler and can be optimized to use RuntimeHelper.InitializerArray().

Unfortunately, it turns out that the Reflection.Emit API that takes care of creating these read-only global data fields is a little too helpful and hence broken. What I didn't mention earlier is that these read-only global data fields need to have a type and that type has to be an explicit layout value type of the right size. ModuleBuilder.DefineInitializedData() automatically defines this type, but instead of making it private it creates a public type.

The type created by ModuleBuilder.DefineInitializedData() is named $ArrayType$<n> where <n> is the size of the value in bytes. The size is encoded in the type name to make it easy to reuse the same type for other fields that have the same size. It turns out we can abuse this knowledge to work around the limitations of ModuleBuilder.DefineInitializedData() by pre-creating the value type with the right name. Now we can have full control over the properties of the generated type (as long as it's compatible with the requirements of the read-only global field, of course).

Final note: While writing this blog entry I found that Mono's implementation of RuntimeHelpers.InitializeArray() leaves something to be desired.

Tuesday, 12 June 2007 08:45:48 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]
# Monday, 04 June 2007
OpenJDK Library/VM Integration

In Merging the First OpenJDK Code and Another x64 CLR Bug I briefly mentioned the three different approaches that are available for integrating the OpenJDK classes with IKVM:

  1. Use existing native interface
  2. Use map.xml tricks
  3. Change the code (i.e. fork a specific source file)

This triggered a couple of questions in the comments.

Morten asked:

about your work of [i]ntegrating the whole of OpenJDK and [y]our 3 options. Could you provide more details? From what you write, it sounds like using aspectj to inject annotations could be useful. Codegeneration might also be useful?

Yes, that's essentially what option 2 is. The map.xml infrastructure allows me to add methods and fields to existing classes, or allows me to replace method bodies.

Albert Strasheim asked:

It's too bad that the Sun and Classpath code aren't compatible down to the native methods called by the pure Java code.

As Andrew pointed out, it would not have been possible or practical to reverse engineer the Sun native interface. Besides that, the Sun and GNU Classpath libraries have very different goals, so it actually makes a lot of sense for the interfaces to differ.

I figured that you'd go the "native methods in C#" route, so I'd be interested to know why you think that the "IKVM specific modifications" route is cleaner and more efficient (efficient in terms of performance or time to implement?).

It would seem that implementing the native methods in C# would be the best solution, but that's not always the case. Hopefully that will become clear below.

Examples

Let's look at some examples and the downsides and cost associated with each of the three options:

1. Use existing native interface

In terms of long term development cost, this is easily the best option. Once you understand the ikvmc native method mapping that is used by the class library, it is very easy to find to corresponding native method. Sun is not very likely to change the native method semantics without also modifying the method name or signature. When a new native method is added, the build process will fail due to the fact that the new native method isn't implemented (and hence generates an unverifiable JNI method stub).

In terms of runtime performance, this method can be very efficient, but only if the method parameters contains all the required data to work on, or if the data can be easily obtained in another efficient way. This is often not the case, because many of the Sun native methods use JNI reflection to access private fields in the object. Here are two real world examples, one where the native methods works really well and another one where it's not as efficient as I'd like:

namespace IKVM.NativeCode.java.lang.reflect
{
  public sealed class Array
  {
    public static bool getBoolean(object arrayObj, int index)
    {
      if (arrayObj == null)
      {
        throw new jlNullPointerException();
      }
      bool[] arr = arrayObj as bool[];
      if (arr != null)
      {
        if (index < 0 || index >= arr.Length)
        {
          throw new jlArrayIndexOutOfBoundsException();
        }
        return arr[index];
      }
      throw new jlIllegalArgumentException("argument type mismatch");
    }
  }
}

Clearly this is very efficient (and it makes you wonder why it's a native method at all). Here's one that's less efficient:

namespace IKVM.NativeCode.java.lang
{
  public sealed class Class
  {
    public static bool isInterface(object thisClass)
    {
      return TypeWrapper.FromClass(thisClass).IsInterface;
    }
  }
}

The thisClass parameter is a reference to the java.lang.Class object, but the VM wants to use its internal representation of a class (TypeWrapper), so here I need to use TypeWrapper.FromClass() to get the corresponding TypeWrapper object. Currently that's implemented by using reflection to access a private field in java.lang.Class (that was added through map.xml).

To contrast, here's the equivalent method for use with the GNU Classpath library:

namespace IKVM.NativeCode.java.lang
{
  public sealed class VMClass
  {
    public static bool IsInterface(object wrapper)
    {
      return ((TypeWrapper)wrapper).IsInterface;
    }
  }
}

Here only a downcast is required, because Class.isInterface() calls VMClass.isInterface(), which reads the vmdata field from Class and passes that to the native IsInterface method.

2. Use map.xml tricks

This is very powerful, but also much more expensive because it is much less obvious what's going on. In the case of a native method that is implemented, if you can't find it in the C# code, you'll probably eventually find it in map.xml, but when replacing an existing method it's very easy to miss the fact that the method is replaced. I've used method replacement in java.lang.ClassLoader to make bootstrap resource loading work. ClassLoader has two private methods getBootstrapResource() and getBootstrapResources() that are used to load resources from the boot class path. These methods are implemented by parsing the sun.boot.class.path system property and then using internal classes to load from these jars or directories. Obviously that won't do for IKVM, because the boot classes and resources are contained in the core library .NET assembly (currently named IKVM.Hybrid.GNU.Classpath.OpenJDK.dll). So I've had to replace these two methods:

<class name="java.lang.ClassLoader">
  <field name="wrapper" sig="Ljava.lang.Object;" modifiers="private" />
  <method name="getBootstrapResource" sig="(Ljava.lang.String;)Ljava.net.URL;">
    <body>
      <ldarg_0 />
      <call class="java.lang.LangHelper"
            name="getBootstrapResource" sig="(Ljava.lang.String;)Ljava.net.URL;" />
      <ret />
    </body>
  </method>
  <method name="getBootstrapResources" sig="(Ljava.lang.String;)Ljava.util.Enumeration;">
    <body>
      <ldarg_0 />
      <call class="java.lang.LangHelper"
            name="getBootstrapResources" sig="(Ljava.lang.String;)Ljava.util.Enumeration;" />
      <ret />
    </body>
  </method>
</class>

These method bodies simply call the real implementations in LangHelper, because it's obviously easier to implement them in Java than in CIL.

3. Change the code (i.e. fork a specific source file)

In some cases, the changes required are so substantial that the map.xml trickery doesn't work or isn't practical. For an example of that we need to look at how reflection is implemented in OpenJDK.

Here's a fragment of java.lang.reflect.Method (not the actual code, but it gets the idea across):

package java.lang.reflect;

public final class Method
{
  private static final ReflectionFactory reflectionFactory = (ReflectionFactory)
    AccessController.doPrivileged(new sun.reflect.ReflectionFactory.GetReflectionFactoryAction());
  private MethodAccessor methodAccessor;

  public Object invoke(Object obj, Object... args)
  {
    // ... access checks omitted ...
    if (methodAccessor == null)
    {
      methodAccessor = reflectionFactory.newMethodAccessor(this);
    }
    return methodAccessor.invoke(obj, args);
  }
}

So the actual method invocation is delegated to an object that implements the MethodAccessor interface. By default Sun's ReflectionFactory returns an object that counts the number of invocations and delegates to another MethodAccessor object that uses JNI to use the VM reflection infrastructure, if the invocation counter crosses a certain value the inner object is replaced by an instance of a dynamically generated class that contains custom generated byte codes to efficiently call the target method (and this generated bytecode is not verifiable, so it won't work as-is on IKVM). In this case I've decided to fork ReflectionFactory and implement my own version that simply returns MethodAccessor instances that reuse the IKVM reflection infrastructure.

The cost of forking a source file should be obvious. When improvements are made to the original, you don't automatically inherit these. In this particular case ReflectionFactory is a fairly small and simple class (all of the complexity is in the classes it calls), so the risk seems low.

Structural Issues

Besides the issues already pointed out, there are also things of a more structural nature. A good example can again be found in java.lang.reflect.Method. Here's its constructor:

Method(Class declaringClass,
       String name,
       Class[] parameterTypes,
       Class returnType,
       Class[] checkedExceptions,
       int modifiers,
       int slot,
       String signature,
       byte[] annotations,
       byte[] parameterAnnotations,
       byte[] annotationDefault)
{
  // ...
}

The constructor expects the generics signature and annotations to be passed in. This poses a dillema. On IKVM looking up the custom attributes that contain this data is relatively expensive and in many cases a method object may be constructed while the code is not interested in generics information or annotations at all, so it might be more desireable to lazily get this data. Another problem is that the annotations are passed in as byte arrays that contain the annotation data as it exists in the class file. Code compiled with ikvmc obviously doesn't have a class file. Despite these two issues, I've chosen to not fork Method. At the moment I think that the performance cost of eagerly getting the generics signature and the annotations is an acceptable trade-off for not having to fork Method. Currently, the way annotations are handled is pretty horrible, because I've left the existing annotation mechanisms in place and reuse the stub class generator code to recreate the annotation byte arrays (and the corresponding constant pool entries) on the fly to pass them into the Method constructor. I've got some ideas to change annotation storage in ikvmc compiled code to use a mechanism that's more compatible with this approach and that may ultimately turn out to be more space efficient as well.

Compatibility

One aspect I haven't talked about yet is compatibility. For example, there's code out there that directly uses ReflectionFactory. This is evil, but it's also reality. So I try to weigh the compability impact as well when I'm trying to find the right approach.

Monday, 04 June 2007 10:46:34 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]
# Thursday, 31 May 2007
First Hybrid OpenJDK / GNU Classpath Snapshot

The first step on a long journey. I've integrated OpenJDK's java.lang.reflect.* a few classes from java.lang.* and a whole bunch of sun.* support classes. Take a look at allsources.lst to see which sources are used from where. I'll try to write a detailed blog entry discussing the various integration points (and answering the questions asked here) over the weekend.

Other changes:

  • Added some conditionally compiled code to work around .NET 2.0 obsoletion warnings.
  • Optimized lcmp, fcmpl, fcmpg, dcmpl and dcmpg bytecodes (by Dennis Ushakov).
  • Fixed ikvmc not to crash if map.xml doesn't contain any <class> entries.
  • Various refactorings to make OpenJDK integration easier.

Known regressions:

  • System class loader is not set correct for ikvmc compiled code.

This is based on OpenJDK drop b12, which appears not to be available for download anymore. The ikvm changes are in cvs. The snapshot binaries are available here: ikvmbin-hybrid-0.35.2707.zip.

As always, snapshots are not intended to be used in production, only for testing and getting a flavor of where development is going.

Thursday, 31 May 2007 16:08:38 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]
# Wednesday, 30 May 2007
IKVM 0.34 Update

I've made an updated 0.34 release that includes the most important fixes I've done in the last couple of weeks.

Changes:

  • Fixed exception mapping memory leak.
  • Fixed verifier/compiler to support dup_x2 form 2.
  • Fixed for ArrayIndexOutOfBoundsException in generic metadata reflection on dynamically compiled types.
  • Fixed infinite recursion bug when DotNetTypeWrapper.GetName() was used on DerivedType where DerivedType extends BaseType<DerivedType>.
  • Fixed ikvmc generic .NET type loading bug.
  • Fixed Throwable.printStackTrace() to call Throwable.printStackTrace(OutputStream) to support exception classes that only override printStackTrace(OutputStream).
  • Fixed Throwable.printStackTrace(…) to use PrintWriter/PrintStream.println() to trigger flushing on auto-flush writers/streams.
  • Fixed Throwable constructor to set cause correctly if an exception was instantiated but not thrown immediately.

Files are available here: ikvm-0.34.0.3.zip (source + binaries) and ikvmbin-0.34.0.3.zip (binaries).

Wednesday, 30 May 2007 14:33:14 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]