# Tuesday, 28 October 2003
Whidbey

Whidbey was unveiled this morning. It has lots of nice new features, but I'm particularly interested in some of the improvements to reflection.

  • Finally, there is support for custom modifiers in Reflection.Emit
  • DynamicMethod allows methods to be generated that "implement" a delegate and the code gets garbage collected along with the delegate when it is no longer reachable.
  • There is (some) support for doing more of Reflection.Emit based on tokens (a token is an Int32 that refers to metadata in a module), instead of the more heavy weight MethodInfo/FieldInfo/etc. objects.
  • Reflection objects (e.g. MethodInfo) are now kept in a weak reference cache, so that they can be garbage collected if they aren't needed anymore.

I need to do some rewriting to take advantage of the last two, but once I do, the memory usage of IKVM should (hopefully) go down significantly. I'm not entirely sure that the changes they made are powerful enough, but it is an important step in the right direction.

I know I promised I wasn't going to move to Whidbey anytime soon, but I am going to introduce some conditional compilation for Whidbey specific features, so that I can play around with them. Mainstream development will continue to be on 1.1

Tuesday, 28 October 2003 21:23:21 (W. Europe Standard Time, UTC+01:00)  #    Comments [0]
# Monday, 27 October 2003
Eclipse Reception

Last night I visisted the Eclipse reception at OOPSLA 2003. Chris Laffra from IBM Canada had prepared a poster on Alternative Eclipse Execution Models and he included IKVM on it. He also invited me to the Eclipse Reception. There was a fair amount of interest in IKVM and I talked to a bunch of people, including Erich Gamma and Doug Lea. That was cool.

This morning I atttended the PDC keynote by Bill Gates and Jim Allchin. See the other PDC blogs for details ;-)

If you're at the PDC, I'm wearing a red/white/blue (vertical bars, as in the Dutch flag) polo shirt with a small Java logo in the front. If you see me, say hi.

Monday, 27 October 2003 21:15:03 (W. Europe Standard Time, UTC+01:00)  #    Comments [0]
# Sunday, 26 October 2003
PDC

I arrived in LA last Friday. No internet access in the hotel room, but fortunately the conference center has excellent connectivity.

I'm not going to be PDC blogging, but expect more non-technical entries this week than usual.

Tonight I'm going to the Eclipse reception at OOPSLA 2003. It'll be fun to chat with some of the Eclipse developers. May be I can get them to fix their class loading ;-)

Sunday, 26 October 2003 18:07:35 (W. Europe Standard Time, UTC+01:00)  #    Comments [0]
# Wednesday, 22 October 2003
Private Super Classes

While trying to get Stuart Ballard's Japitools to work on a netexp generated export of IKVM's classpath.dll, I encountered java.awt.BufferCapabilities.FlipContents. This public (inner) class has non-public base class (java.awt.AttributeValue). What kind of a bizarre design is that?

I had never realised that the Java language allowed this. Fortunately, the C# designers didn't make the same mistake.

Note that a similar issue exists for fields (and method arguments). In Java it is legal to have a public field of a non-public type, or a public method that takes an argument of a non-public type. C# prohibits both of these.

I fixed netexp to export non-public base classes (and I should do the same for interfaces, fields and method arguments/return type).

Wednesday, 22 October 2003 18:36:30 (W. Europe Daylight Time, UTC+02:00)  #    Comments [4]
# Friday, 17 October 2003
Ghosts

It took a bit longer than I anticipated, but I finally managed to put together a new snapshot and check in the changes I made in last month and a half.

What's new?

  • Fixed bug in class load failure diagnostics code.
  • Instead of having a single Type property on TypeWrapper, we now have different properties for different usages. I already know this isn't the final way I'm going to handle things, but for now it is a nice improvement that makes it easier to treat types differently based on where they are appearing (e.g. field, local, argument, base type).
  • Simple ghost references are translated as value type wrappers. See below for details.
  • Reflection support for ghost types.
  • Fixed a few ILGenerator.Emit() bugs where incorrect argument types were passed (int instead of short or byte).
  • Added a NoPackagePrefixAttribute to allow .NET types to not be prefixed with the "cli" prefix. (Suggested by Stuart Ballard)
  • Fix to prevent non-public delegates from being visible.
  • Several fixes in the handling of unloadable types.
  • Fixed java.lang.Comparable interface and method attributes to be identitical to the real interface.
  • Fixed java.lang.Runtime.exec() support for quoted arguments.

Ghost References

First of all, why did I feel the need to change it? Imagine that java.lang.StringBuffer had the following methods:

public void append(Object o) { ... }
public void append(CharSequence cs) { ... }

Previously, CharSequence would be erased to Object, so you'd have two methods with an identical signature, thus requiring name mangling. Using the method from C# would become very inconvenient, both because of the unexpected name, but also because of the fact that it isn't at all clear what argument type is expected (and passing an incorrect type will give odd results, like ClassCastException or IncompatibleClassChangeError).

Enter the value type wrapper. Here is how the java.lang.CharSequence interface is compiled now (pseudo code):

public struct CharSequence {
  public interface __Interface {
    char charAt(int i);
    // ... other methods ...
  }
  public object __ref;
  public static CharSequence Cast(object o) {
    CharSequence s;
    if(o is string) {
      s.__ref = o;
      return s;
    }
    s.__ref = (__Interface)o;
    return s;
  }
  public static bool IsInstance(object o) {
    return o is string || o is __Interface;
  }
  public object ToObject() {
    return __ref;
  }
  public static
    implicit operator CharSequence(string s) {
    CharSequence seq;
    seq.__ref = s;
    return seq;
  }
  public char charAt(int i) {
    if(__ref is string) {
      return StringHelper.charAt((string)__ref, i);
    }
    return ((__Interface)__ref).charAt(i);
  }
  // ... other methods ...
}


All types that implement CharSequence will be compiled to implement CharSequence.__Interface and will also get an implicit conversion operator to convert to CharSequence. This allows C# code to easily call any methods that take a CharSequence argument, because any valid type will be implicitly convertible to the CharSequence value type.

What Are The Downsides?

There are a few cons:

  • This trick doesn't work for arrays. So a CharSequence[] is still compiled as Object[]. Hopefully arrays are far less common, so this will not be a big issue.
  • C# code implementing CharSequence, needs to implement CharSequence.__Interface and also needs to provide an implicit conversion operator to the CharSequence value type.
  • The static methods Cast and IsInstance need to be used instead of the normal language features.
  • It is very easy to accidentally box the CharSequence value type in C#, when you pass this to Java things will get very confusing. The proper way to convert CharSequence to Object is by calling ToObject() on it.

Update: I forgot to mention that the reason that you have to implement the implicit conversion on all types that implement the interface is because C# doesn't allow implicit conversion from/to interfaces, otherwise the CharSequence value type could just have an implicit conversion from CharSequence.__Interface and we'd be done. Funnily enough, while C# doesn't allow you to define them, it turns out it does consume them, but I don't know if this is guaranteed behavior or a bug. I need to find this out. If it turns out to be correct, then I'll add the implicit conversion operator to the value types, that makes life for C# implementers a tiny bit easier.

All in all, I think this is an improvement, but obviously not perfect. As always, feedback is appreciated.

The new snapshots: binaries and complete.

Friday, 17 October 2003 15:57:05 (W. Europe Daylight Time, UTC+02:00)  #    Comments [2]
# Thursday, 16 October 2003
GNU Classpath Meeting

We had a small GNU Classpath meeting on Tuesday at the Linux Kongress in Saarbr├╝cken, Germany. I drove down there for just that day. There were 6 others there: Dalibor Topic (Kaffe), Chris Gray (Wonka), Mark Wielaard (GNU Classpath), Sascha Brawer (GNU Classpath), Jean Daniel Fekete (Agile2D), Patrik Reali (JAOS).

I enjoyed it very much. The discussions were interesting and it was nice to finally meet some of the people involved with GNU Classpath.

We talked about how to implement Java2D. Jean Daniel has worked on Agile2D a Java2D implementation based on OpenGL. This is a nice starting point and OpenGL is available on virtually every platform. This would work for IKVM.NET as well. Sascha pointed out that I should really implement Java2D on top of the .NET System.Drawing classes (based on GDI+). That would be cool, but I think it would be too much work (for me).

I also had an interesting discussion with Patrik about JAOS, a JVM on top of Oberon (a sort of object oriented OS, if I understand it correctly). He faces some of the same problems as I do, with the integration of the two object models.

One of the conclusions of the meeting was that GNU Classpath is alive and kicking (and doing very well in terms of progress being made), but we need a little more publicity. So consider this my small drop in the bucket. If you are a Java developer and want to improve the state of free JVMs, please visit the GNU Classpath web site and consider helping out. In the near future there will be a new task list detailing some of the work that needs to be done (and it will include estimated required effort and skill set).

Thursday, 16 October 2003 14:36:50 (W. Europe Daylight Time, UTC+02:00)  #    Comments [1]
# Friday, 03 October 2003
Vacation

I'm going on vacation again :-) I'll be back on the 12th.

I just realised I haven't blogged for a whole month. It's not because there hasn't been any progress. I've been experimenting with a way to make ghost interfaces more usable from other languages. More on that when I get back.

Friday, 03 October 2003 21:53:33 (W. Europe Daylight Time, UTC+02:00)  #    Comments [8]
# Monday, 01 September 2003
Constructing Inner Classes

Alan Macek pointed out an interesting bug in the verifier.

First, some background. Let's look at this code fragment:

class Test {
    Test() {
      new Runnable() {
          public void run() {}
      };
    }
    public static void main(String[] args) {
      new Test();
    }
}

Inner classes aren't supported by the VM, it's a compiler fiction, so when you compile the above code you get the equivalent of the following:

class Test {
    Test() {
      new Test$1(this);
    }
    public static void main(String[] args) {
      new Test();
    }
}
 
class Test$1 implements Runnable {
    private Test outer;
    Test$1(Test outer) {
      this.outer = outer;
    }
    public void run() {}
}

The interesting bit here is that the outer field is initialized after the base class constructor runs. In this case the base class is Object so this isn't significant, but when you have a base class that calls virtual methods it is:

abstract class Base {
    Base() {
      run();
    }
    public abstract void run();
}
 
class Test {
    Test() {
      new Base() {
          public void run() {
            System.out.println(Test.this);
          }
      };   
    }
    public static void main(String[] args) {
      new Test();
    }
}

When you compile this and run it, it prints out: null. The outer this reference is not yet initialized when the Base constructor runs.

However, when you compile it with the -target 1.4 option, it prints out: Test@17182c1 (or something similar). So clearly an interesting change was made to the compiler.

Let's take a look at the bytecode of the constructor of the original code fragment:

<init>(LTest;)V
0  aload_0
1  invokespecial java/lang/Object/<init>()V
4  aload_0
5  aload_1
6  putfield Test$1/this$0 Test
9  return

First, the base class constructor is called and then the this$0 field is initialized.

Now take a look at the same method, but now as compiled with the -target 1.4 option:

<init>(LTest;)V
0  aload_0
1  aload_1
2  putfield Test$1/this$0 Test
5  aload_0
6  invokespecial java/lang/Object/<init>()V
9  return

The order of the two steps is reversed. Now it first initializes the this$0 field and then calls the base class constructor.

I had missed the fact that it is legal to initialize fields in an unitialized object. So when the latter code was run, the verifier compiled that an uninitialized object reference was used.

The big question is why the compiler only does this when the -target 1.4 option is specified. This rule of allowing instance fields of unitialized object to be written has always been in the JVM specification, as far as I can tell.

New Snapshots

Friday, I put up new snapshots. Source and binaries and binaries only.

Changes since the last (on the blog announced) snapshot:

  • New version of netexp based on Java reflection (all .NET to Java mapping is now in the IKVM reflection runtime).
  • To support netexp, the .NET type reflection is now much better. This includes support for delegates and byref arguments. Instead of lower casing the .NET namespaces, all .NET types are now prefixed with the "cli." package name. Remapped this are also available and appear as final classes without constructors, but with static methods for all instance methods (taking the remapped type as the first argument). Constructors are available as static __new methods.
  • Bootstrap class loader is now a flat type space. All loaded assemblies are searched for bootstrap types when Class.forName is used.
  • Enums are now treated as other value types (i.e. boxed). This allows better accesses to overloaded .NET methods.
  • Lots of clean up and refactoring to support the new reflection capabilities.
  • Various bug fixes.
  • Updated to work with latest GNU Classpath CVS.
Monday, 01 September 2003 16:32:56 (W. Europe Daylight Time, UTC+02:00)  #    Comments [9]
# Sunday, 17 August 2003
More discussion on netexp and Class.forName

Stuart commented:

Well, I think it's really ugly, but I've run out of better proposals ;) I have lots of questions and few answers - here are some...

Is it possible for the same class to have multiple names? (There must already be *some* concessions to this, because, for example, "System.Exception" is also known as "java.lang.Throwable" - similarly for "Object" and "String").

That's right. In general it is not possible for a class to have multiple names, but the three classes you name are special cases. They can have multiple names because Java code will never encounter instances of them (instances will always appear as java.lang.Object, java.lang.String and java.lang.Throwable). The IKVM reflection code knows this and can make it appear so that System.Object, System.String and System.Exception are final classes without constructors and with only static methods. That way Java code will be able to call (almost) all .NET methods on these types.

I wonder if there could be some concessions for "well-known" assemblies, such as corlib, System, and the various System.* and Microsoft.* assemblies that ship with the framework.

How does this whole thing work with Mono which doesn't support strong names? What about unsigned assemblies?

It really isn't about strong names. The real issue is that in Java class identities are resolved based on the class name and class loader hierarchy, while in .NET type identities are resolved based on type name, assembly name and binding policy. Those two models are very different and the trick is to find a way to map one onto the other.

How does this "round-trip"? In other words, if I use ikvmc to compile some Java code into a .NET DLL, use netexp to export that DLL, and try to use its classes, do I now need to fully-qualify them?

No, you wouldn't need to fully qualify them, but you would need to make sure that the first DLL gets loaded into the AppDomain before you do any Class.forName() on it. This is essentially the problem I'm trying to solve for .NET types, but for round-tripped code I don't think it is solvable.

I guess my primary feeling is that all these solutions seem to make things worse than they currently are, where for the *most* part things "just work", and I don't have to understand strong naming or any of the other details of the .NET assembly loading model to be able to seamlessly interoperate between the two languages. I wish there were some way to solve the internal issues while still preserving that niceness-to-use.

I agree. Also my previous proposal also has a problem, for one thing, it still has problems with assembly versioning. If you run on a later version of the .NET framework the system assembly version no longer match the ones in the Java class name and thus Class.getName() will return a different name then the one used to load the class.

Obmoloc wrote:

A agree with Stuart. I think that the following code must work always:

assert Class.forName(x).getName() == x.intern();

Assuming you mean: Class.forName(x).getName().equals(x), I agree, but that is the easy one, it should also hold that for a Class c with a no-arg constructor: c.newInstance().getClass() == c

Given that, Class.forName("NET.System.String") should throw a ClassNotFoundException, as for Class.forName("NET.System.Exception"), and for Class.forName("NET.System.Object").

I believe that there is little or no need to call those methods, and so to import such classes.

I already explained how in response to Stuart's question, let me address the why here. In order to implement java.lang.String I have to write some code (System.String doesn't have equivalent methods of everything java.lang.String has). It's easiest to write this code in Java, because if I write it in C# I have to manually handle some special cases that the compiler handles for Java code. Hence I need to have access to the System.String methods. See StringHelper.java for how it currently works. This isn't the right way, while writing that I thought of the remapping of instance methods (and constructors) to static methods, but I haven't implemented that.

If one needs to know, in a generic way, wich Java class represents a .NET class, a new method should be enough for that. For example, Class.NETforName("NET.System.String") should return the same value that Class.forName("java.lang.String"). Another method could be Class.javaNameFromNETName, such as Class.javaNameFromNETName("NET.System.Exception") returns "java.lang.Throwable".

This wasn't the problem I was trying to solve and I don't think there is any need to know this.

So, I agree about using a prefix, but I suggest not using "NET.", and use "org.cli." instead, or something like this. "NET." is just too generic.

Using someone else's domain name isn't that great either. How about simply "cli"?

Jonathan Pierce commented:

I guess I don't fully understand the problem but I really dislike the idea of requiring prefixes when referencing or importing classes.

Do you mean prefixing in general or just the very long assembly name goo?

Why does the netexp implementation require that the class literal (which is compiled using Class.forName()) for statically linked netexp generated classes be in the classpath?

Currently, the class name doesn't contain enough information to resolve to a type, only when the netexp generated class is loaded IKVM notices the attribute in there that tells it what assembly the type lives in. I didn't want to search all loaded assemblies because that makes the behavior non-deterministic (sort of). If you run the class literal before the assembly gets loaded it would fail, but after it gets loaded it would work.

An Alternate Proposal

I've pretty much given up hope that there is a perfect solution to this problem, but I would like to suggest this simplified solution and see how everyone feels about this.

Basic idea: As soon as an assembly gets loaded into the AppDomain it becomes part of the boot classpath and they are "loaded" by the bootstrap class loader (to be clear, each assembly does not live in its own class loader).

Pros:

  • Simple model.
  • It mostly just works.
  • Consistent with the way precompiled Java code is treated.
  • It's basically the existing model.

Cons:

  • No way to deal with class name clashes.
  • Non-deterministic. In case of name clashes, the first one encountered is returned. Class.forName() only works if the assembly was already loaded somehow. However, to make it more usable, Class.forName() will also allow the class name to be an assembly qualified type name. In this case the class returned will have a different name from the one requested.

Why not have a class loader per assembly?

It turns out that having a class loader per assembly doesn't really solve anything. For the system to be usable, all assembly class loaders would have to be linked together lineairly. Here is a diagram:

            bootstrap class loader
                      |
                   mscorlib
                      |
                   System
                      |
              (other assemblies)
                      |
             extension class loader
                      |
            application class loader

Each class loader must always ask its parent to load a particular class first, so when a type is defined in mscorlib there can never be another type with that same name, even if it would be loaded by a different class loader. As an aside, as some of you may know, some (most?) J2EE application servers get around this problem by violating the class loader rules (not calling the parent class loader first), in this case however that wouldn't work (and arguably in the J2EE case it doesn't work either).

Using a tree shaped class loader hierarchy does allow multiple classes with the same name, but it only works when each leaf of the hierarchy is basically a separate application.

Comments?

Sunday, 17 August 2003 13:41:13 (W. Europe Daylight Time, UTC+02:00)  #    Comments [2]
# Friday, 15 August 2003
More on interaction with .NET types

In yesterday's item I didn't do a good job of explaining the issue, so today I'll try again and respond to the comments as well.

First, let me start by explaining the current situation. There are two ways to get access to a .NET type:

  • netexp
    When you run netexp on a .NET assembly it generates Java classes for all public .NET types in the assembly. The Java classes contain no code, they're just stubs so that you can use a regular Java compiler to code against .NET types. When IKVM.NET loads such a stub, it notices that the class file contains a special attribute that says "this is a netexp stub class, please redirect to .NET type such-and-such". Note that the attribute contains an assembly qualified type name, because in general you can only load a type in .NET when you have its assembly qualified name.
  • Class.forName()
    Calling Class.forName() with an assembly qualified name will cause a .NET type to be loaded. The name of the class that is returned is the full name (i.e. not including the assembly part).

There are several problems with the current approach:

  • The name of a class loaded doesn't match with the name it was loaded with.
  • A type can be loaded under different names.
  • The class literal (which is compiled using Class.forName()) for netexp generated classes only works if the netexp generated class is in the classpath (even if the code was statically compiled).

Stuart commented:

Have you considered only lowercasing the "system" namespace and no others? Or perhaps (to be really safe) only lowercasing top-level namespaces that match the name of a class in java.lang?

Also, would it be possible for the implementation of Class.forName to do remapping of classnames from normal java ones to assembly qualified ones? If so, would that solve the problem? (I'm not entirely sure I understand what the problem is, so I'm not sure)

Only lowercasing the system namespace isn't really a generic solution, because any .NET namespace that is the same as a Java class in the java.lang package would cause problems. Treating all names in the java.lang package specially feels like a kludge. Using a special prefix like Brian suggets would be a better idea.

In regards to the second point, in .NET you really need to have an assembly qualified name for a type. Searching all available assemblies isn't feasible.

I responded to Stuart:

One of the problems is that all .NET types are loaded by the bootstrap class loader, so they have to have distinct names.

I just had an another idea. It might be a good idea to encode the assembly qualified name in the package. For example for the System.String type:

assembly.mscorlib.1.0.5000.0.neutral.b77a5c561934e089.System.String

This has the advantage that you can use "import" in most cases (although not for String) to use the short name.

I have to think about it, but I think I like it.

After thinking about it some more, I still really like this idea. It has three huge advantages: 1) There is a natural bi-directional mapping between the class name and the .NET type name, 2) Both netexp and Class.forName can use the same naming scheme and 3) The netexp generated classes aren't needed at runtime (or ikvmc compile time).

In response to my response Stuart commented:

Hmmm... how fundamental a rearchitecting would it take to have a classloader per assembly? Then there would be a trivial isomorphism between Java's "names are unique within a classloader" and .NET's "names are unique within a strongly-named assembly", if I'm understanding correctly.

Having a class loader per assembly is tempting, but it doesn't solve the issue of finding the .NET type (you still need to know in what assembly a type lives). This would be solvable by introducing a new API to get a class loader corresponding to an assembly, but that obviously has the downside that it doesn't interoperate very well with existing Java code. For example, the IKVM.NET AWT implementation lives in a .NET assembly (written in C#) and is loaded by GNU Classpath using Class.forName(). This only works if Class.forName() has enough information about the assembly.

Brian Sullivan commented:

I don't care for the occasional lowercasing ideas.

It could be that all .NET classes would begin with NET. from the Java perspective.

NET.System.String
NET.Some.Other.Class

import NET.*; // import all of .NET

This makes it clear where the class is defined and clearly separates all of .NET into its own package.

The prefixing is a good idea to get around the System name clash. BTW, import NET.* doesn't hierarchically import all packages, but only the classes in the NET package.

Does anyone have anything against the newly proposed name mangling scheme? Obviously the details need to be worked out (my proposed package name above doesn't actually compile), but what about the general idea?

Friday, 15 August 2003 15:09:27 (W. Europe Daylight Time, UTC+02:00)  #    Comments [3]