# Sunday, 29 February 2004
Object Model Mapping

After suffering from coder's block (if that's the programmer's equivalent of writer's block) for weeks, I finally got started this past week on the new object model remapping infrastructure. I spent most of the week going in circles: writing code, deleting it, writing more code, deleting it again. However, I think I've figured it out now. To test that theory I decided to write a blog entry about it. Explaining something is usually a good way to find holes in your own understanding.

I'm going to start out by describing the problem. Then I'll look at the existing solution and its limitations. Finally I'll explain the new approach I came up with. If everything goes well, by the end of the entry you'll be wondering "why did it take him so long to come up with this, it's obvious!". That means I came up with the right solution and I explained it well.

Note that I'm ignoring ghost interfaces in this entry. The current model will stay the same. For details, see the previous entries on ghost interfaces.

What are the goals?

  • We need a way to have the Java object model fit into the .NET model
  • We would like to enable full interoperability between the two models
  • Performance should be acceptable
  • Implementation shouldn't be overly complex

The .NET model (relevant types only):

For comparison, here is the Java model that we want to map to the .NET model:

There are several possible ways to go (I made up some random names):

  • Equivalence
    This is the current model. The Java classes are mapped to the equivalent .NET types. This works because System.Object has mostly the same virtual methods as java.lang.Object. Mapping java.lang.Throwable to System.Exception takes a little more work, and that is where the java.lang.Throwable$VirtualMethods interface comes in. When a virtual method on Throwable is called through a System.Exception reference, the compiler calls a static helper method in Throwable$VirtualMethodsHelper that checks whether the passed object implements the Throwable$VirtualMethods interface. If so, it calls the interface method; if not, it calls the Throwable implementation of the method (i.e. it considers the method not overridden by a subclass). A downside of using an interface for this is that all interface methods must be public. At the moment this isn't a problem, because all virtual methods in Throwable (except for clone and finalize, inherited from Object) are public, but it could become a problem later on.
  • Extension
    This is a fairly straightforward approach where java.lang.Object extends System.Object and the Java array, String and Throwable classes are simply subclasses of java.lang.Object. It is easy to implement. The obvious downsides are that arrays will be slow (extra indirection), Strings need to be wrapped/unwrapped when doing interop with .NET code, and Throwable is not a subclass of System.Exception (the CLI supports this, but once again it is not a good idea for interop).
  • Wrapping
    I apologise in advance, because I probably can't explain this one very well (because it doesn't make any sense to me). Many people have actually suggested this model. In this model java.lang.Object extends System.Object, but arrays, String and Throwable do not extend java.lang.Object. Instead, whenever an instance of those types is assigned to a java.lang.Object reference, it is wrapped in an instance of a special java.lang.Object wrapper class. The downside of this model is that wrapping and unwrapping is expensive and (this is why I don't like this approach at all) that the expense is paid in ways that are very unexpected to the Java programmer (who expects simple assignment to be expensive?).
  • Mixed
    This is the new model. Explanation follows below.
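
The helper dispatch described under Equivalence can be sketched in plain Java. This is only a simulation under stated assumptions: `NetException` stands in for System.Exception, and `VirtualMethods`, `VirtualMethodsHelper` and `JavaThrowable` are hypothetical stand-ins for illustration, not the actual IKVM types.

```java
// Hypothetical stand-in for System.Exception: it has no Java-style virtual methods.
class NetException {
    private final String message;
    NetException(String message) { this.message = message; }
    String netMessage() { return message; }
}

// Hypothetical stand-in for Throwable$VirtualMethods.
// Note that interface methods are implicitly public, which is the downside mentioned above.
interface VirtualMethods {
    String getMessage();
}

// A "Java" exception class that overrides the virtual method.
class JavaThrowable extends NetException implements VirtualMethods {
    JavaThrowable(String message) { super(message); }
    @Override public String getMessage() { return "overridden: " + netMessage(); }
}

// Hypothetical stand-in for Throwable$VirtualMethodsHelper.
final class VirtualMethodsHelper {
    private VirtualMethodsHelper() {}

    static String getMessage(NetException e) {
        if (e instanceof VirtualMethods) {
            // The object may override the method: dispatch through the interface.
            return ((VirtualMethods) e).getMessage();
        }
        // Not overridden by a subclass: use the default implementation.
        return e.netMessage();
    }
}

class EquivalenceDemo {
    public static void main(String[] args) {
        System.out.println(VirtualMethodsHelper.getMessage(new NetException("plain")));
        System.out.println(VirtualMethodsHelper.getMessage(new JavaThrowable("java")));
    }
}
```

Every call site that only has a `NetException` reference goes through the static helper, which is the extra indirection the equivalence model pays for.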

What's wrong with equivalence?

Both J# and the current version of IKVM use equivalence (although many of the details differ and J# doesn't consider Throwable and System.Exception to be equivalent) and it works well. So why change it? There are four advantages to the mixed model:

  • Interop works better
    In the current model, if you subclass a Java class from C# and you want to override Object.equals, you need to override either Equals or equals, depending on whether any of the base classes overrides Object.equals. If you want to call Throwable.getStackTrace() from C# on a reference of type System.Exception, there is no obvious way to do that.
  • More efficient representation of remapping model
    Currently every subclass of java.lang.Object overrides toString() to call ObjectHelper.toStringSpecial, which is needless code bloat. More importantly, before any classes can be resolved, the map.xml remapping information needs to be parsed and interpreted (to know about java.lang.Object, java.lang.String and java.lang.Throwable). In the new model, java.lang.Object, java.lang.String and java.lang.Throwable will be in classpath.dll, so they can be resolved on demand. The classes will be decorated with custom attributes to explain to the runtime that they are special, but no other metadata will need to be parsed or interpreted.
  • Cleaner model
    java.lang.ObjectHelper and java.lang.StringHelper no longer need to be public and the various $VirtualMethod helper types aren't needed anymore.
  • Easier to get right
    There are a few subtle bugs in the current implementation. Try the following for example:
    class ThrowableToString
    {
      public static void main(String[] args) throws Exception
      {
        String s = "cli.System.NotImplementedException";
        Object o = Class.forName(s).newInstance();
        Throwable t = (Throwable)o;
        System.out.println(o.toString());
        System.out.println(t.toString());
      }
    }
    
    It prints out:
    System.NotImplementedException: The method or operation is not implemented.
    cli.System.NotImplementedException: The method or operation is not implemented.
     

    Obviously, both lines should be the same. Another (at the moment theoretical) problem is that it is legal for code in the java.lang package to call Object.clone or Object.finalize (both methods are protected, but in Java, protected also implies package access). Currently that wouldn't work.

Here is the mixed model I ended up with:

I called it mixed because it combines some features of equivalence and extension. For example, references of type java.lang.Object are still compiled as System.Object (like in the equivalence model), but the non-remapped Java classes extend java.lang.Object (like in the extension model).

java.lang.Object will contain all methods of the real java.lang.Object, plus a bunch of static helper methods that allow you to call java.lang.Object instance methods on System.Object references. Each helper method will test whether the passed object is either a java.lang.Object or a java.lang.Throwable (for virtual methods). If so, it will downcast and call the appropriate method on that class; if not, it will perform an alternative action (that was specified in map.xml when this classpath.dll was compiled).
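
A minimal sketch of such a static helper, simulated in plain Java. All names here are hypothetical: `SysObject` and `SysException` stand in for System.Object and System.Exception, `JObject` and `JThrowable` stand in for the mixed-model java.lang.Object and java.lang.Throwable, and the "remapped:" fallback is just a placeholder for whatever alternative action map.xml would specify.

```java
// Hypothetical stand-in for System.Object.
class SysObject {
    String sysToString() { return "SysObject"; }
}

// Hypothetical stand-in for System.Exception.
class SysException extends SysObject {
    @Override String sysToString() { return "SysException"; }
}

// Hypothetical stand-in for java.lang.Throwable: it extends the .NET exception type.
class JThrowable extends SysException {
    String javaToString() { return "JThrowable"; }
}

// Hypothetical stand-in for java.lang.Object in the mixed model.
class JObject extends SysObject {
    // The real virtual method that non-remapped Java classes can override.
    String javaToString() { return "JObject"; }

    // Static helper: call javaToString() on any SysObject reference.
    static String instanceToString(SysObject o) {
        if (o instanceof JObject) {
            return ((JObject) o).javaToString();
        }
        if (o instanceof JThrowable) {
            return ((JThrowable) o).javaToString();
        }
        // Alternative action for everything else (placeholder behaviour).
        return "remapped:" + o.sysToString();
    }
}

// A non-remapped Java class: extends JObject and overrides the virtual method.
class MyClass extends JObject {
    @Override String javaToString() { return "MyClass (overridden)"; }
}

class MixedModelDemo {
    public static void main(String[] args) {
        SysObject[] refs = { new MyClass(), new JThrowable(), new SysObject() };
        for (SysObject ref : refs) {
            System.out.println(JObject.instanceToString(ref));
        }
    }
}
```

The references stay typed as the root type (as in equivalence), while overriding works through ordinary virtual dispatch once the helper has downcast (as in extension).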

Object.finalize requires some special treatment since we don't want java.lang.Object.finalize to override System.Object.Finalize because that would cause all Java objects to end up on the finalizer queue and that's very inefficient. So the compiler will contain a rule to override System.Object.Finalize when a Java class overrides java.lang.Object.finalize.

I glossed over a lot of details, but those will have to wait for next time.

FOSDEM 2004

Finally a short note on FOSDEM (Free and Open Source Software Developers' European Meeting). Last weekend I visited FOSDEM in Brussels. I enjoyed seeing Dalibor, Chris, Mark, Sascha and Patrik again and I also enjoyed meeting gcj hackers Tom Tromey and Andrew Haley for the first time. Mark wrote up a nice report about it. If you haven't read it yet, go read it now. All in all a very good and productive get-together.

Sunday, 29 February 2004 15:43:01 (W. Europe Standard Time, UTC+01:00)
# Wednesday, 18 February 2004
F.A.Q.

Stuart pointed out the F.A.Q. was out of date, so I updated it a little bit. He also asked:

Speaking of which, I noticed while perusing the FAQ that the JIT compiler is included in IK.VM.NET.dll which means it's required for running even statically-compiled code. For apps that don't use clever classloading tricks, the JIT isn't needed at all when everything's been statically compiled. Would it be possible to separate the JIT out into a different DLL to reduce the necessary dependencies for a statically-compiled Java app?

Sure, the 275K of IK.VM.NET.dll is miniscule compared to the 3Mb of classpath.dll, but it's the principle of the thing ;)

This is definitely something I want to do. In fact, I would also like to have the option to only include parts of the Classpath code when statically compiling a library. So instead of having a dependency on classpath.dll, you'd suck in only the required classes.

Wednesday, 18 February 2004 09:36:11 (W. Europe Standard Time, UTC+01:00)
# Monday, 16 February 2004
Jikes 1.19, Bytecode Bug, Serialization and a New Snapshot
 

Jikes

I upgraded to Jikes 1.19, which was released recently. It didn't like the netexp generated stub jars (which is good, because it turns out they were invalid), so I fixed netexp to be better behaved in what it emits. Jikes didn't like the fact that the inner interfaces that I created had the ACC_STATIC modifier set at the class level, and rightly so, but the error message it came up with was very confusing. Along the way I also discovered that it is illegal for constructors to be marked native (kind of odd, I don't really see why you couldn't have a native constructor). So I made them non-native and gave them a simple method body that only contains a return. That isn't correct either (from the verifier's point of view) and I guess I should change it to throw an UnsatisfiedLinkError. That would also be clearer in case anyone tries to run the stubs on a real JVM.

Jikes 1.19 has a bunch of new pedantic warnings (some enabled by default). I don't think this is a helpful feature at the moment. Warnings are only useful if you can make sure you don't get any (by keeping your code clean), but when you already have an existing codebase, this is very hard, and in the case of Classpath, where you have to implement a spec, you often don't have the option to do the right thing. So I would like to have the option to use lint-like comment switches to disable specific warnings in a specific part of the code.

Bytecode Bug

I also did some work to reduce the number of failing Mauve testcases on IKVM and that caused me to discover that the bit shifting instructions were broken (oops!). On the JVM the shift count is always masked with the number of bits minus one of the integral type you're shifting (31 for int, 63 for long). So for example:

int i = 3;
System.out.println(i << 33);

This prints out 6 (3 << (33 & 31)). On the CLI, if the shift count is greater than or equal to the number of bits in the integral type, the result is unspecified. I had to fix the bytecode compiler to explicitly do the mask operation.
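
As a small illustration of the masking rule, here is a sketch in Java. The helper names `shlInt` and `shlLong` are made up for this example; they spell out the mask that the JVM applies implicitly and that the compiler now has to emit explicitly for the CLI.

```java
class ShiftMaskDemo {
    // Explicit form of an int shift: only the low 5 bits of the count are used.
    static int shlInt(int value, int count) {
        return value << (count & 31);
    }

    // Explicit form of a long shift: only the low 6 bits of the count are used.
    static long shlLong(long value, int count) {
        return value << (count & 63);
    }

    public static void main(String[] args) {
        int i = 3;
        System.out.println(i << 33);         // 6, same as 3 << 1
        System.out.println(shlInt(i, 33));   // 6
        System.out.println(1L << 65);        // 2, same as 1L << 1
        System.out.println(shlLong(1L, 65)); // 2
        System.out.println(-8 >> 33);        // -4, same as -8 >> 1
    }
}
```

On the JVM the explicit and implicit forms always agree, so adding the mask changes nothing there; it only pins down the behaviour on a platform that leaves large shift counts unspecified.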

Serialization

Brian J. Sletten reported on the mailing list that deserialization was extremely slow. That was caused by the fact that reflection didn't cache the member information for statically compiled Java classes or .NET types. I fixed that and after that I also made some improvements to GNU Classpath's ObjectInputStream to speed it up even more. It's still marginally slower than the Sun JRE, but the difference shouldn't cause any problems.
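
To illustrate the general idea of caching member information, here is a minimal Java sketch. It is not the IKVM fix (that lives in the C# runtime); `MemberCache` and its lookup counter are hypothetical names used only to show the technique of doing the expensive reflective lookup once per class.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class MemberCache {
    // One cached Field[] per class; filled in lazily on first request.
    private static final Map<Class<?>, Field[]> FIELDS = new ConcurrentHashMap<>();

    // Counts how often reflection is actually consulted (for illustration only).
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    private MemberCache() {}

    static Field[] declaredFields(Class<?> type) {
        return FIELDS.computeIfAbsent(type, t -> {
            LOOKUPS.incrementAndGet();
            return t.getDeclaredFields();
        });
    }
}

class MemberCacheDemo {
    static class Point {
        int x;
        int y;
    }

    public static void main(String[] args) {
        // Simulate a deserializer that asks for the same class's fields repeatedly.
        for (int i = 0; i < 1000; i++) {
            MemberCache.declaredFields(Point.class);
        }
        System.out.println("reflective lookups: " + MemberCache.LOOKUPS.get());
    }
}
```

The trade-off is the usual one for caches: a little memory per class in exchange for avoiding the repeated lookup on every object that is deserialized.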

Snapshot

I made a new snapshot. Here's what's new:

  • Changed classpath.build to disable jikes warnings (I know it's lame, but I grew tired of the useless warnings). I also added the -noTypeInitWarning option to ikvmc, to get rid of all the warnings about running type initializers.
  • Implemented accessibility checks for Java reflection.
  • Cleaned up socket code and implemented all of the socket options (well, except for IP_MULTICAST_IF2).
  • Implemented Get###ArrayRegion/Set###ArrayRegion and GetPrimitiveArrayCritical/ReleasePrimitiveArrayCritical JNI functions.
  • Added all the 1.4 functions to the JNIEnv vtable.
  • Implemented support for field name overloading (a single class can have several different fields with the same name, if the types are different).
  • Changed the class format errors thrown by ClassFile.cs to be .NET exceptions instead of Java exceptions, to have better error handling in ikvmc.
  • Changed VMClass.getWrapperFromClass to use a delegate instead of reflection, to speed up reflection.
  • Fixed the compiler to mask shift counts for ishl, lshl, iushr, lushr, ishr, lshr bytecodes.
  • Fixed a bug in ghost handling (the bug could cause a "System.NotSupportedException: The invoked member is not supported in a dynamic module." exception).
  • Added EmitCheckcast and EmitInstanceOf virtual functions to TypeWrapper.
  • Added a LazyTypeWrapper base class for DotNetTypeWrapper and CompiledTypeWrapper, to cache the member information to speed up reflection.
  • Improved error handling in ikvmc.
  • Fixed netexp to generate valid (or less invalid) classes.
  • Regenerated (and checked in) mscorlib.jar, System.jar and System.Xml.jar with the new (fixed) version of netexp.

I haven't gotten around yet to removing the "virtual helpers" and introducing base classes for non-final remapped types (java.lang.Object and java.lang.Throwable).

New snapshots: just the binaries and source plus binaries.

Monday, 16 February 2004 16:33:14 (W. Europe Standard Time, UTC+01:00)