# Sunday, 17 August 2003
More discussion on netexp and Class.forName

Stuart commented:

Well, I think it's really ugly, but I've run out of better proposals ;) I have lots of questions and few answers - here are some...

Is it possible for the same class to have multiple names? (There must already be *some* concessions to this, because, for example, "System.Exception" is also known as "java.lang.Throwable" - similarly for "Object" and "String").

That's right. In general it is not possible for a class to have multiple names, but the three classes you name are special cases. They can have multiple names because Java code will never encounter instances of them (instances will always appear as java.lang.Object, java.lang.String and java.lang.Throwable). The IKVM reflection code knows this and can make it appear so that System.Object, System.String and System.Exception are final classes without constructors and with only static methods. That way Java code will be able to call (almost) all .NET methods on these types.

I wonder if there could be some concessions for "well-known" assemblies, such as corlib, System, and the various System.* and Microsoft.* assemblies that ship with the framework.

How does this whole thing work with Mono which doesn't support strong names? What about unsigned assemblies?

It really isn't about strong names. The real issue is that in Java class identities are resolved based on the class name and class loader hierarchy, while in .NET type identities are resolved based on type name, assembly name and binding policy. Those two models are very different and the trick is to find a way to map one onto the other.

How does this "round-trip"? In other words, if I use ikvmc to compile some Java code into a .NET DLL, use netexp to export that DLL, and try to use its classes, do I now need to fully-qualify them?

No, you wouldn't need to fully qualify them, but you would need to make sure that the first DLL gets loaded into the AppDomain before you do any Class.forName() on it. This is essentially the problem I'm trying to solve for .NET types, but for round-tripped code I don't think it is solvable.

I guess my primary feeling is that all these solutions seem to make things worse than they currently are, where for the *most* part things "just work", and I don't have to understand strong naming or any of the other details of the .NET assembly loading model to be able to seamlessly interoperate between the two languages. I wish there were some way to solve the internal issues while still preserving that niceness-to-use.

I agree. My previous proposal also has a problem: it still doesn't handle assembly versioning. If you run on a later version of the .NET Framework, the system assembly versions no longer match the ones in the Java class name, and thus Class.getName() will return a different name than the one used to load the class.

Obmoloc wrote:

I agree with Stuart. I think that the following code must always work:

assert Class.forName(x).getName() == x.intern();

Assuming you mean Class.forName(x).getName().equals(x), I agree, but that is the easy one. It should also hold that, for a Class c with a no-arg constructor, c.newInstance().getClass() == c.
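Both invariants can be written down as small checks. The following is only a sketch that runs against ordinary JDK classes on any JVM; the class and method names are made up for illustration, and it uses getDeclaredConstructor().newInstance() as the modern spelling of c.newInstance().

```java
import java.util.ArrayList;

// Checks the two round-trip invariants on ordinary JDK classes.
// ClassNameInvariants and its methods are hypothetical names, not part of IKVM.
final class ClassNameInvariants {

    // Class.forName(x).getName() must equal x (compared with equals, not ==).
    static boolean nameRoundTrips(String name) {
        try {
            return Class.forName(name).getName().equals(name);
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // For a class with an accessible no-arg constructor,
    // a fresh instance must report exactly that class.
    static boolean instanceRoundTrips(Class<?> c) {
        try {
            Object instance = c.getDeclaredConstructor().newInstance();
            return instance.getClass() == c;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(nameRoundTrips("java.lang.String"));
        System.out.println(instanceRoundTrips(ArrayList.class));
    }
}
```

Any naming scheme for .NET types would have to keep both checks true for the names it hands out.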

Given that, Class.forName("NET.System.String") should throw a ClassNotFoundException, as for Class.forName("NET.System.Exception"), and for Class.forName("NET.System.Object").

I believe that there is little or no need to call those methods, and so to import such classes.

I already explained the how in response to Stuart's question; let me address the why here. In order to implement java.lang.String I have to write some code (System.String doesn't have equivalents for everything java.lang.String has). It's easiest to write this code in Java, because if I write it in C# I have to manually handle some special cases that the compiler handles for Java code. Hence I need to have access to the System.String methods. See StringHelper.java for how it currently works. That isn't the right way. While writing it I thought of remapping instance methods (and constructors) to static methods, but I haven't implemented that.

If one needs to know, in a generic way, which Java class represents a .NET class, a new method should be enough for that. For example, Class.NETforName("NET.System.String") should return the same value as Class.forName("java.lang.String"). Another method could be Class.javaNameFromNETName, such that Class.javaNameFromNETName("NET.System.Exception") returns "java.lang.Throwable".

This wasn't the problem I was trying to solve and I don't think there is any need to know this.

So, I agree about using a prefix, but I suggest not using "NET.", and use "org.cli." instead, or something like this. "NET." is just too generic.

Using someone else's domain name isn't that great either. How about simply "cli"?

Jonathan Pierce commented:

I guess I don't fully understand the problem but I really dislike the idea of requiring prefixes when referencing or importing classes.

Do you mean prefixing in general or just the very long assembly name goo?

Why does the netexp implementation require that the class literal (which is compiled using Class.forName()) for statically linked netexp generated classes be in the classpath?

Currently, the class name doesn't contain enough information to resolve to a type. Only when the netexp generated class is loaded does IKVM notice the attribute in there that tells it what assembly the type lives in. I didn't want to search all loaded assemblies, because that makes the behavior non-deterministic (sort of): if you run the class literal before the assembly gets loaded it would fail, but after it gets loaded it would work.

An Alternate Proposal

I've pretty much given up hope that there is a perfect solution to this problem, but I would like to suggest this simplified solution and see how everyone feels about this.

Basic idea: As soon as an assembly gets loaded into the AppDomain, it becomes part of the boot classpath and its types are "loaded" by the bootstrap class loader (to be clear, each assembly does not live in its own class loader).

Pros:

  • Simple model.
  • It mostly just works.
  • Consistent with the way precompiled Java code is treated.
  • It's basically the existing model.

Cons:

  • No way to deal with class name clashes.
  • Non-deterministic. In case of name clashes, the first one encountered is returned. Class.forName() only works if the assembly was already loaded somehow. However, to make it more usable, Class.forName() will also allow the class name to be an assembly qualified type name. In this case the class returned will have a different name from the one requested.
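To make the trade-off concrete, here is a toy model of the proposal. It is a sketch only: BootClasspathModel, loadAssembly and resolve are hypothetical names, the assemblies "A" and "B" and the type "Acme.Widget" are invented, and nothing here touches real assemblies or class loaders.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model: every loaded assembly joins one flat, ordered lookup space.
// All names here are hypothetical; no real assemblies are involved.
final class BootClasspathModel {
    // Assembly name -> type names it defines, kept in load order.
    private final Map<String, Set<String>> loaded = new LinkedHashMap<>();

    void loadAssembly(String assemblyName, Set<String> typeNames) {
        loaded.putIfAbsent(assemblyName, typeNames);
    }

    // Returns the first loaded assembly that defines the type,
    // or null if no such assembly has been loaded yet.
    String resolve(String typeName) {
        for (Map.Entry<String, Set<String>> entry : loaded.entrySet()) {
            if (entry.getValue().contains(typeName)) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        BootClasspathModel model = new BootClasspathModel();
        System.out.println(model.resolve("Acme.Widget")); // prints null: nothing loaded yet
        model.loadAssembly("A", Collections.singleton("Acme.Widget"));
        model.loadAssembly("B", Collections.singleton("Acme.Widget"));
        System.out.println(model.resolve("Acme.Widget")); // prints A: first one wins
    }
}
```

The same lookup gives a different answer depending on what has been loaded and in which order, which is exactly the non-determinism listed under the cons.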

Why not have a class loader per assembly?

It turns out that having a class loader per assembly doesn't really solve anything. For the system to be usable, all assembly class loaders would have to be linked together linearly. Here is a diagram:

            bootstrap class loader
                      |
                   mscorlib
                      |
                   System
                      |
              (other assemblies)
                      |
             extension class loader
                      |
            application class loader

Each class loader must always ask its parent to load a particular class first, so when a type is defined in mscorlib there can never be another type with that same name, even if it would be loaded by a different class loader. As an aside, as some of you may know, some (most?) J2EE application servers get around this problem by violating the class loader rules (not calling the parent class loader first), in this case however that wouldn't work (and arguably in the J2EE case it doesn't work either).
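The parent-first rule can be illustrated with a small model of the chain in the diagram. This is a sketch that only models delegation; ChainLoader is a hypothetical class, it loads no real classes, and the "Other" assembly with its clashing System.String is invented for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy parent-first loader chain. It models delegation only and loads no real classes.
final class ChainLoader {
    final String name;
    final ChainLoader parent;   // null for the top of the chain
    final Set<String> defines;  // type names this loader could define itself

    ChainLoader(String name, ChainLoader parent, String... defines) {
        this.name = name;
        this.parent = parent;
        this.defines = new HashSet<>(Arrays.asList(defines));
    }

    // Ask the parent first; fall back to this loader's own types
    // only if no ancestor knows the name.
    String definingLoader(String typeName) {
        if (parent != null) {
            String found = parent.definingLoader(typeName);
            if (found != null) {
                return found;
            }
        }
        return defines.contains(typeName) ? name : null;
    }

    public static void main(String[] args) {
        ChainLoader mscorlib = new ChainLoader("mscorlib", null, "System.Object", "System.String");
        ChainLoader system = new ChainLoader("System", mscorlib, "System.Uri");
        // Hypothetical assembly that defines a clashing name.
        ChainLoader other = new ChainLoader("Other", system, "System.String");
        System.out.println(other.definingLoader("System.String")); // prints mscorlib
    }
}
```

The clashing type in "Other" can never be reached through the chain, so a loader per assembly buys nothing over one flat namespace.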

Using a tree shaped class loader hierarchy does allow multiple classes with the same name, but it only works when each leaf of the hierarchy is basically a separate application.

Comments?

Sunday, 17 August 2003 13:41:13 (W. Europe Daylight Time, UTC+02:00)
# Friday, 15 August 2003
More on interaction with .NET types

In yesterday's item I didn't do a good job of explaining the issue, so today I'll try again and respond to the comments as well.

First, let me start by explaining the current situation. There are two ways to get access to a .NET type:

  • netexp
    When you run netexp on a .NET assembly it generates Java classes for all public .NET types in the assembly. The Java classes contain no code, they're just stubs so that you can use a regular Java compiler to code against .NET types. When IKVM.NET loads such a stub, it notices that the class file contains a special attribute that says "this is a netexp stub class, please redirect to .NET type such-and-such". Note that the attribute contains an assembly qualified type name, because in general you can only load a type in .NET when you have its assembly qualified name.
  • Class.forName()
    Calling Class.forName() with an assembly qualified name will cause a .NET type to be loaded. The name of the class that is returned is the full name (i.e. not including the assembly part).

There are several problems with the current approach:

  • The name of a loaded class doesn't match the name it was loaded with.
  • A type can be loaded under different names.
  • The class literal (which is compiled using Class.forName()) for netexp generated classes only works if the netexp generated class is in the classpath (even if the code was statically compiled).

Stuart commented:

Have you considered only lowercasing the "system" namespace and no others? Or perhaps (to be really safe) only lowercasing top-level namespaces that match the name of a class in java.lang?

Also, would it be possible for the implementation of Class.forName to do remapping of classnames from normal java ones to assembly qualified ones? If so, would that solve the problem? (I'm not entirely sure I understand what the problem is, so I'm not sure)

Only lowercasing the system namespace isn't really a generic solution, because any .NET namespace that is the same as a Java class in the java.lang package would cause problems. Treating all names in the java.lang package specially feels like a kludge. Using a special prefix like Brian suggests would be a better idea.

In regards to the second point, in .NET you really need to have an assembly qualified name for a type. Searching all available assemblies isn't feasible.

I responded to Stuart:

One of the problems is that all .NET types are loaded by the bootstrap class loader, so they have to have distinct names.

I just had another idea. It might be a good idea to encode the assembly qualified name in the package. For example for the System.String type:

assembly.mscorlib.1.0.5000.0.neutral.b77a5c561934e089.System.String

This has the advantage that you can use "import" in most cases (although not for String) to use the short name.

I have to think about it, but I think I like it.

After thinking about it some more, I still really like this idea. It has three huge advantages: 1) There is a natural bi-directional mapping between the class name and the .NET type name, 2) Both netexp and Class.forName can use the same naming scheme and 3) The netexp generated classes aren't needed at runtime (or ikvmc compile time).
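As a rough sketch of the forward direction of this mapping, the code below turns an assembly qualified type name into the package-style name from the example. PackageEncodedNames and toJavaName are hypothetical names, and the parsing assumes the simple "Type, Assembly, Version=..., Culture=..., PublicKeyToken=..." layout. The resulting name is only a string here; segments that start with a digit are not legal Java package identifiers.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the forward mapping only; all names are hypothetical.
// Assumes the layout "Type, Assembly, Version=..., Culture=..., PublicKeyToken=...".
final class PackageEncodedNames {

    static String toJavaName(String assemblyQualifiedName) {
        String[] parts = assemblyQualifiedName.split(",");
        if (parts.length < 2) {
            throw new IllegalArgumentException("not an assembly qualified name: " + assemblyQualifiedName);
        }
        String typeName = parts[0].trim();
        String assembly = parts[1].trim();

        // Collect the key=value components that follow the assembly name.
        Map<String, String> props = new HashMap<>();
        for (int i = 2; i < parts.length; i++) {
            String[] keyValue = parts[i].trim().split("=", 2);
            if (keyValue.length == 2) {
                props.put(keyValue[0], keyValue[1]);
            }
        }
        return "assembly." + assembly
                + "." + required(props, "Version")
                + "." + required(props, "Culture")
                + "." + required(props, "PublicKeyToken")
                + "." + typeName;
    }

    private static String required(Map<String, String> props, String key) {
        String value = props.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toJavaName(
                "System.String, mscorlib, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"));
    }
}
```

The reverse direction is harder than it looks, because assembly names and type names can themselves contain dots, so a real scheme would need an unambiguous separator.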

In response to my response Stuart commented:

Hmmm... how fundamental a rearchitecting would it take to have a classloader per assembly? Then there would be a trivial isomorphism between Java's "names are unique within a classloader" and .NET's "names are unique within a strongly-named assembly", if I'm understanding correctly.

Having a class loader per assembly is tempting, but it doesn't solve the issue of finding the .NET type (you still need to know in what assembly a type lives). This would be solvable by introducing a new API to get a class loader corresponding to an assembly, but that obviously has the downside that it doesn't interoperate very well with existing Java code. For example, the IKVM.NET AWT implementation lives in a .NET assembly (written in C#) and is loaded by GNU Classpath using Class.forName(). This only works if Class.forName() has enough information about the assembly.

Brian Sullivan commented:

I don't care for the occasional lowercasing ideas.

It could be that all .NET classes would begin with NET. from the Java perspective.

NET.System.String
NET.Some.Other.Class

import NET.*; // import all of .NET

This makes it clear where the class is defined and clearly separates all of .NET into its own package.

The prefixing is a good idea to get around the System name clash. BTW, import NET.* doesn't hierarchically import all packages, but only the classes in the NET package.

Does anyone have anything against the newly proposed name mangling scheme? Obviously the details need to be worked out (my proposed package name above doesn't actually compile), but what about the general idea?

Friday, 15 August 2003 15:09:27 (W. Europe Daylight Time, UTC+02:00)
# Thursday, 14 August 2003
Workarounds

This week I finished the support for missing classes. This allows Eclipse to run without the -Xbootclasspath workaround.

To run Eclipse (on Windows), you can now simply go to the Eclipse directory and type:

   eclipse -vm \ikvm\bin\ikvm.exe

The verifier and compiler had to be changed to support what I call in the code unloadable classes (I probably should have called them missing classes). Whenever the verifier encounters a reference to a type that cannot be loaded, it treats it as a special type (kind of like the null-type) and allows (almost) any operation to succeed. When the compiler encounters this type, instead of generating CIL instructions to implement a particular bytecode instruction, it generates a call to a helper method in ByteCodeHelper that implements the instruction using reflection. At the moment the reflection results are not cached, so it could be made more efficient by adding caching, but since this shouldn't happen that often, this is not a high priority.
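To give an idea of what such a helper does, here is a sketch in Java of a late-bound method call with the caching mentioned above. It is not the actual ByteCodeHelper code (which is part of the IKVM.NET runtime); LateBoundCalls, invokeVirtual and the cache key format are assumptions made for illustration.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a reflection-based call helper with a lookup cache.
// LateBoundCalls is a hypothetical name, not the real ByteCodeHelper.
final class LateBoundCalls {
    // Caches resolved methods. Keying on the class name is a simplification:
    // it would confuse same-named classes from different class loaders.
    private static final Map<String, Method> CACHE = new ConcurrentHashMap<>();

    static Object invokeVirtual(Object target, String methodName, Class<?>[] paramTypes, Object... args) {
        try {
            Class<?> type = target.getClass();
            String key = type.getName() + "#" + methodName + Arrays.toString(paramTypes);
            Method method = CACHE.get(key);
            if (method == null) {
                // Resolve at run time what could not be resolved at compile time.
                method = type.getMethod(methodName, paramTypes);
                CACHE.put(key, method);
            }
            return method.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("late-bound call failed: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeVirtual("abc", "length", new Class<?>[0]));                    // prints 3
        System.out.println(invokeVirtual("abc", "concat", new Class<?>[] {String.class}, "def")); // prints abcdef
    }
}
```

The cache only saves the method lookup; the reflective invoke itself stays slower than a direct call, which is acceptable as long as this path is rare.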

java.lang.CharSequence

I also finally implemented support for (what I've termed) ghost interfaces. Ghost interfaces are interfaces that are implemented by remapped types. So, for example, java.lang.String (really System.String) appears to implement java.lang.CharSequence, thus java.lang.CharSequence is a ghost interface. When a reference of a ghost interface type is passed around, it really is an object reference (so that it can contain references to types that actually implement the interface, as well as types that only appear to implement the interface).

Here is an example:

CharSequence toUpperCase(CharSequence seq) {
  StringBuffer buf = new StringBuffer();
  int len = seq.length();
  for(int i = 0; i < len; i++) {
    char c = seq.charAt(i);
    c = Character.toUpperCase(c);
    buf.append(c);
  }
  return buf;
}

This is compiled as:

Object toUpperCase(Object seq) {
  StringBuffer buf = new StringBuffer();
  int len;
  if(seq instanceof String) {
    len = ((String)seq).length();
  } else {
    len = ((CharSequence)seq).length();
  }
  for(int i = 0; i < len; i++) {
    char c;
    if(seq instanceof String) {
      c = ((String)seq).charAt(i);
    } else {
      c = ((CharSequence)seq).charAt(i);
    }
    c = Character.toUpperCase(c);
    buf.append(c);
  }
  return buf;
}

There are some downsides to this approach:

  • Performance cost of the type check and the cast. BTW, since System.String is a sealed type, the type check can (theoretically) be very efficient.
  • Type is erased in the method signature, causing problems if an identical signature already exists (the CLR supports a nice mechanism to work around this, but unfortunately Reflection.Emit currently doesn't expose this functionality).
  • When this method is statically compiled and called from, for example, C#, the signature is confusing.

An alternative approach would be to wrap each String object when it needs to be treated as a CharSequence, but that is harder to implement (object identity has to be preserved) and it isn't clear to me that it would be more efficient or elegant.

On the upside, this is a totally generic solution, there is no special support for String, any remapped type (i.e. type in map.xml) can declare to support any interface, and the compiler will automatically do the right thing. It is also used to make java.lang.Throwable (i.e. System.Exception) appear to implement java.io.Serializable. At the moment it isn't used for arrays (which should appear to implement java.io.Serializable and java.lang.Cloneable), but it would be easy to add this.
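The behavior that Java code expects here can be checked on any JVM. The snippet below is just a small sketch of those expectations (ExpectedInterfaces is a made-up name); it says nothing about how the ghost interface mechanism produces them.

```java
import java.io.Serializable;

// What Java code expects to observe, regardless of how the runtime implements it.
final class ExpectedInterfaces {
    static boolean looksSerializable(Object o) {
        return o instanceof Serializable;
    }

    static boolean looksCloneable(Object o) {
        return o instanceof Cloneable;
    }

    static boolean looksLikeCharSequence(Object o) {
        return o instanceof CharSequence;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeCharSequence("abc"));          // prints true
        System.out.println(looksSerializable(new Throwable()));    // prints true
        System.out.println(looksSerializable(new int[0]));         // prints true
        System.out.println(looksCloneable(new int[0]));            // prints true
    }
}
```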

Oh, and since java.lang.String now implements CharSequence, I can now use the StringBuffer implementation from GNU Classpath instead of remapping it to System.Text.StringBuilder.

Reflecting on .NET types

Another thing I've been working on is integration with .NET types. When you want to use a .NET type from within Java you have two options: 1) use netexp to generate a jar containing stubs for the .NET classes, so you can statically compile against the .NET types, or 2) use reflection against the .NET type.

Something I dislike about netexp is that it includes remapping logic to map .NET types and signatures to Java compatible stuff. For example, when it encounters an enum, it generates a final Java class with public static final members or when it encounters a signature with a byref argument, it turns that into an array argument. Now this is all very nice, because it allows Java code to use most of the .NET features that Java doesn't really support, but the part I don't like is that this remapping is also done in the IKVM.NET runtime, because when it compiles Java code that was compiled against a netexp generated jar, it needs to do some of the same translations. This duplication of the remapping logic is obviously not a good thing. So I want to rewrite netexp in Java and make it use Java reflection to interrogate the .NET types, that way all the remapping is done in one place, the IKVM.NET runtime.
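To show what the two remappings mentioned above look like from the Java side, here is a hand-written sketch. It is not actual netexp output: the class names are invented, the use of int for the enum members is an assumption, and real stubs contain no code, whereas tryParse has a body here only so the calling convention can be run.

```java
// Hypothetical shape of a stub for a .NET enum:
// a final class with public static final members (member type assumed to be int).
final class DayOfWeekStub {
    public static final int Sunday = 0;
    public static final int Monday = 1;

    private DayOfWeekStub() {
    }
}

// Hypothetical stand-in for a .NET method with a byref int parameter.
final class ByRefDemo {
    // The byref argument becomes a one-element array that the callee writes into.
    static boolean tryParse(String text, int[] result) {
        try {
            result[0] = Integer.parseInt(text.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int[] value = new int[1];
        if (tryParse("42", value)) {
            System.out.println(value[0]); // prints 42
        }
        System.out.println(DayOfWeekStub.Monday); // prints 1
    }
}
```

Whatever produces the stubs and whatever compiles code against them both have to agree on these shapes, which is why having the logic in two places is a liability.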

However, while thinking about this I realized that there is a problem with respect to type identity. There are two ways a .NET type can become visible in Java, netexp and Class.forName. Both have problems with the class name. Netexp converts namespaces to lowercase, because Java compilers don't like the System namespace (System binds to the java.lang.System class and is not considered a package name), and Class.forName requires the assembly qualified type name (e.g. "System.String, mscorlib, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"). There are several ways to handle this:

  1. Only allow .NET types to be visible through netexp generated classes. Obviously, I don't like this, because it would be very limiting and my plan to rewrite netexp in Java wouldn't work.
  2. Allow .NET types to be visible through netexp with one name (e.g. system.String) and through Class.forName with another name (i.e. the assembly qualified name). The problem this causes is that once you create an instance of a class and call getClass on that instance, which of the two Class objects should it then return?
  3. Use a universal name mangling scheme. Both netexp and Class.forName would represent System.String as, for example, "System_String__mscorlib__Version_1_0_5000_0__Culture_neutral__PublicKeyToken_b77a5c561934e089". I think this would work pretty well, but the obvious downside is that it makes the Java source code totally unreadable. Something that still would be an issue is how Java code reacts when it tries to load one class and then gets another (in the face of .NET binding policy). For example, it would be possible to load the above String, but then get a Class object that returns "..._Version_2_0_..." as its name, because the app is running on some future version of the CLR.
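For option 3, the mangling in the example can be reproduced with three substitutions. This is a sketch inferred from that single example (", " becomes "__", while "." and "=" become "_"); MangledNames and mangle are hypothetical names, and a real scheme would also need escaping rules.

```java
// Sketch of the mangling inferred from the one example above; names are hypothetical.
final class MangledNames {

    static String mangle(String assemblyQualifiedName) {
        return assemblyQualifiedName
                .replace(", ", "__")  // component separator
                .replace(".", "_")    // namespace and version dots
                .replace("=", "_");   // key=value pairs
    }

    public static void main(String[] args) {
        System.out.println(mangle(
                "System.String, mscorlib, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"));
    }
}
```

As written, the mapping cannot be reversed reliably: a type or assembly name that already contains an underscore would be indistinguishable from a mangled dot.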

Maybe there are other solutions I haven't thought of, but at the moment I'm thinking of going with number 2.

I updated the snapshots on Tuesday. Binaries and source.

Thursday, 14 August 2003 12:06:29 (W. Europe Daylight Time, UTC+02:00)
# Wednesday, 06 August 2003
PDC 2003

I'm attending the PDC in October. I'd love to have a chat with any readers of my blog, so if you'll be there as well, send me an e-mail.

Thanks to Jeff Sandquist for the graphic.

Wednesday, 06 August 2003 18:01:59 (W. Europe Daylight Time, UTC+02:00)