# Sunday, 17 August 2003
More discussion on netexp and Class.forName

Stuart commented:

Well, I think it's really ugly, but I've run out of better proposals ;) I have lots of questions and few answers - here are some...

Is it possible for the same class to have multiple names? (There must already be *some* concessions to this, because, for example, "System.Exception" is also known as "java.lang.Throwable" - similarly for "Object" and "String").

That's right. In general it is not possible for a class to have multiple names, but the three classes you name are special cases. They can have multiple names because Java code will never encounter instances of them (instances will always appear as java.lang.Object, java.lang.String and java.lang.Throwable). The IKVM reflection code knows this and can make it appear so that System.Object, System.String and System.Exception are final classes without constructors and with only static methods. That way Java code will be able to call (almost) all .NET methods on these types.

I wonder if there could be some concessions for "well-known" assemblies, such as corlib, System, and the various System.* and Microsoft.* assemblies that ship with the framework.

How does this whole thing work with Mono which doesn't support strong names? What about unsigned assemblies?

It really isn't about strong names. The real issue is that in Java class identities are resolved based on the class name and class loader hierarchy, while in .NET type identities are resolved based on type name, assembly name and binding policy. Those two models are very different and the trick is to find a way to map one onto the other.

How does this "round-trip"? In other words, if I use ikvmc to compile some Java code into a .NET DLL, use netexp to export that DLL, and try to use its classes, do I now need to fully-qualify them?

No, you wouldn't need to fully qualify them, but you would need to make sure that the first DLL gets loaded into the AppDomain before you do any Class.forName() on it. This is essentially the problem I'm trying to solve for .NET types, but for round-tripped code I don't think it is solvable.

I guess my primary feeling is that all these solutions seem to make things worse than they currently are, where for the *most* part things "just work", and I don't have to understand strong naming or any of the other details of the .NET assembly loading model to be able to seamlessly interoperate between the two languages. I wish there were some way to solve the internal issues while still preserving that niceness-to-use.

I agree. My previous proposal also still has a problem with assembly versioning. If you run on a later version of the .NET Framework, the system assembly versions no longer match the ones in the Java class names, and thus Class.getName() will return a different name than the one used to load the class.

Obmoloc wrote:

I agree with Stuart. I think that the following code must always work:

assert Class.forName(x).getName() == x.intern();

Assuming you mean Class.forName(x).getName().equals(x), I agree, but that is the easy one. It should also hold that, for a Class c with a no-arg constructor, c.newInstance().getClass() == c
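As a minimal sketch, the two round-trip properties can be written down as checks against ordinary JVM classes. The class and method names here (RoundTripCheck, nameRoundTrips, instanceRoundTrips) are made up for illustration and are not part of IKVM:

```java
// Sketch only: expresses the two invariants as boolean checks.
class RoundTripCheck {
    // Property 1: Class.forName(x).getName().equals(x)
    static boolean nameRoundTrips(String name) {
        try {
            return Class.forName(name).getName().equals(name);
        } catch (ClassNotFoundException e) {
            return false; // a name that cannot be loaded does not round-trip
        }
    }

    // Property 2: for a class c with a no-arg constructor,
    // a freshly created instance reports c as its class.
    static boolean instanceRoundTrips(Class<?> c) {
        try {
            Object instance = c.getDeclaredConstructor().newInstance();
            return instance.getClass() == c;
        } catch (ReflectiveOperationException e) {
            return false; // no accessible no-arg constructor
        }
    }

    public static void main(String[] args) {
        System.out.println(nameRoundTrips("java.lang.StringBuilder"));
        System.out.println(instanceRoundTrips(StringBuilder.class));
    }
}
```

Any naming scheme for .NET types would have to keep both checks true for the names it hands out.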

Given that, Class.forName("NET.System.String") should throw a ClassNotFoundException, as for Class.forName("NET.System.Exception"), and for Class.forName("NET.System.Object").

I believe that there is little or no need to call those methods, and so to import such classes.

I already explained how in response to Stuart's question, so let me address the why here. In order to implement java.lang.String I have to write some code (System.String doesn't have equivalents of everything java.lang.String has). It's easiest to write this code in Java, because if I write it in C# I have to manually handle some special cases that the compiler handles for Java code. Hence I need to have access to the System.String methods. See StringHelper.java for how it currently works. This isn't the right way; while writing it I thought of remapping instance methods (and constructors) to static methods, but I haven't implemented that.

If one needs to know, in a generic way, which Java class represents a .NET class, a new method should be enough for that. For example, Class.NETforName("NET.System.String") should return the same value as Class.forName("java.lang.String"). Another method could be Class.javaNameFromNETName, such that Class.javaNameFromNETName("NET.System.Exception") returns "java.lang.Throwable".

This wasn't the problem I was trying to solve and I don't think there is any need to know this.

So, I agree about using a prefix, but I suggest not using "NET.", and use "org.cli." instead, or something like this. "NET." is just too generic.

Using someone else's domain name isn't that great either. How about simply "cli"?

Jonathan Pierce commented:

I guess I don't fully understand the problem but I really dislike the idea of requiring prefixes when referencing or importing classes.

Do you mean prefixing in general or just the very long assembly name goo?

Why does the netexp implementation require that the class literal (which is compiled using Class.forName()) for statically linked netexp generated classes be in the classpath?

Currently, the class name doesn't contain enough information to resolve to a type. Only when the netexp generated class is loaded does IKVM notice the attribute in there that tells it what assembly the type lives in. I didn't want to search all loaded assemblies because that makes the behavior non-deterministic (sort of). If you run the class literal before the assembly gets loaded it would fail, but after it gets loaded it would work.

An Alternate Proposal

I've pretty much given up hope that there is a perfect solution to this problem, but I would like to suggest this simplified solution and see how everyone feels about this.

Basic idea: As soon as an assembly gets loaded into the AppDomain it becomes part of the boot classpath and its types are "loaded" by the bootstrap class loader (to be clear, each assembly does not live in its own class loader).

Pros:

  • Simple model.
  • It mostly just works.
  • Consistent with the way precompiled Java code is treated.
  • It's basically the existing model.

Cons:

  • No way to deal with class name clashes.
  • Non-deterministic. In case of name clashes, the first one encountered is returned. Class.forName() only works if the assembly was already loaded somehow. However, to make it more usable, Class.forName() will also allow the class name to be an assembly qualified type name. In this case the class returned will have a different name from the one requested.
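To make the "first one encountered is returned" behavior concrete, here is a small model of the proposal. It is only a sketch: BootClassPathModel, the assembly names AcmeA and AcmeB, and the type name Acme.Widget are hypothetical, and a plain map stands in for whatever the runtime would really do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: every loaded assembly contributes its type names to
// one shared namespace, and the first assembly to supply a name keeps it.
class BootClassPathModel {
    private final Map<String, String> typeToAssembly = new LinkedHashMap<>();

    // Called when an assembly is loaded; existing names are never replaced.
    void assemblyLoaded(String assembly, String... typeNames) {
        for (String name : typeNames) {
            typeToAssembly.putIfAbsent(name, assembly);
        }
    }

    // Returns the owning assembly, or null if no loaded assembly defines the name.
    String resolve(String typeName) {
        return typeToAssembly.get(typeName);
    }

    public static void main(String[] args) {
        BootClassPathModel model = new BootClassPathModel();
        System.out.println(model.resolve("Acme.Widget")); // nothing loaded yet
        model.assemblyLoaded("AcmeA", "Acme.Widget");
        model.assemblyLoaded("AcmeB", "Acme.Widget");      // name clash with AcmeA
        System.out.println(model.resolve("Acme.Widget"));
    }
}
```

The result of resolve depends on load order, which is exactly the non-determinism listed under the cons.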

Why not have a class loader per assembly?

It turns out that having a class loader per assembly doesn't really solve anything. For the system to be usable, all assembly class loaders would have to be linked together linearly. Here is a diagram:

            bootstrap class loader
                      |
                   mscorlib
                      |
                   System
                      |
              (other assemblies)
                      |
             extension class loader
                      |
            application class loader

Each class loader must always ask its parent to load a particular class first, so when a type is defined in mscorlib there can never be another type with that same name, even if it would be loaded by a different class loader. As an aside, as some of you may know, some (most?) J2EE application servers get around this problem by violating the class loader rules (not calling the parent class loader first), in this case however that wouldn't work (and arguably in the J2EE case it doesn't work either).
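The parent-first rule can be sketched with a toy loader chain. This is not a real java.lang.ClassLoader; ModelLoader and the assembly name OtherAssembly are invented for the example, and a set of strings stands in for the types an assembly defines.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of parent-first delegation in a linear loader chain.
class ModelLoader {
    private final String name;
    private final ModelLoader parent;
    private final Set<String> ownTypes;

    ModelLoader(String name, ModelLoader parent, String... ownTypes) {
        this.name = name;
        this.parent = parent;
        this.ownTypes = new HashSet<>(Arrays.asList(ownTypes));
    }

    // Returns the name of the loader that ends up defining the type,
    // always asking the parent before looking at this loader's own types.
    String findDefiningLoader(String typeName) {
        if (parent != null) {
            String found = parent.findDefiningLoader(typeName);
            if (found != null) {
                return found;
            }
        }
        return ownTypes.contains(typeName) ? name : null;
    }

    public static void main(String[] args) {
        ModelLoader mscorlib = new ModelLoader("mscorlib", null, "System.String");
        ModelLoader other = new ModelLoader("OtherAssembly", mscorlib, "System.String", "Other.Type");
        // The child's own System.String is shadowed by the parent's.
        System.out.println(other.findDefiningLoader("System.String"));
        System.out.println(other.findDefiningLoader("Other.Type"));
    }
}
```

Because every lookup walks to the top of the chain first, the second definition of System.String can never be reached, which is why the linear chain buys nothing over a single loader.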

Using a tree shaped class loader hierarchy does allow multiple classes with the same name, but it only works when each leaf of the hierarchy is basically a separate application.

Comments?

Sunday, 17 August 2003 13:41:13 (W. Europe Daylight Time, UTC+02:00)  #    Comments [2]
# Friday, 15 August 2003
More on interaction with .NET types

In yesterday's item I didn't do a good job of explaining the issue, so today I'll try again and respond to the comments as well.

First, let me start by explaining the current situation. There are two ways to get access to a .NET type:

  • netexp
    When you run netexp on a .NET assembly it generates Java classes for all public .NET types in the assembly. The Java classes contain no code, they're just stubs so that you can use a regular Java compiler to code against .NET types. When IKVM.NET loads such a stub, it notices that the class file contains a special attribute that says "this is a netexp stub class, please redirect to .NET type such-and-such". Note that the attribute contains an assembly qualified type name, because in general you can only load a type in .NET when you have its assembly qualified name.
  • Class.forName()
    Calling Class.forName() with an assembly qualified name will cause a .NET type to be loaded. The name of the class that is returned is the full name (i.e. not including the assembly part).

There are several problems with the current approach:

  • The name of a class loaded doesn't match with the name it was loaded with.
  • A type can be loaded under different names.
  • The class literal (which is compiled using Class.forName()) for netexp generated classes only works if the netexp generated class is in the classpath (even if the code was statically compiled).

Stuart commented:

Have you considered only lowercasing the "system" namespace and no others? Or perhaps (to be really safe) only lowercasing top-level namespaces that match the name of a class in java.lang?

Also, would it be possible for the implementation of Class.forName to do remapping of classnames from normal java ones to assembly qualified ones? If so, would that solve the problem? (I'm not entirely sure I understand what the problem is, so I'm not sure)

Only lowercasing the system namespace isn't really a generic solution, because any .NET namespace that is the same as a Java class in the java.lang package would cause problems. Treating all names in the java.lang package specially feels like a kludge. Using a special prefix like Brian suggests would be a better idea.

In regards to the second point, in .NET you really need to have an assembly qualified name for a type. Searching all available assemblies isn't feasible.

I responded to Stuart:

One of the problems is that all .NET types are loaded by the bootstrap class loader, so they have to have distinct names.

I just had another idea. It might be a good idea to encode the assembly qualified name in the package. For example for the System.String type:

assembly.mscorlib.1.0.5000.0.neutral.b77a5c561934e089.System.String

This has the advantage that you can use "import" in most cases (although not for String) to use the short name.

I have to think about it, but I think I like it.

After thinking about it some more, I still really like this idea. It has three huge advantages: 1) There is a natural bi-directional mapping between the class name and the .NET type name, 2) Both netexp and Class.forName can use the same naming scheme and 3) The netexp generated classes aren't needed at runtime (or ikvmc compile time).
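A rough sketch of the forward direction of this mapping, from an assembly qualified name to the package-style name in the example above. AssemblyPackageName and its methods are hypothetical, and the code assumes the parts always appear in the order type, assembly, Version, Culture, PublicKeyToken. The resulting name is not a legal Java package name (the numeric segments are not valid identifiers), so this only shows the general shape, not a workable final scheme.

```java
// Sketch: builds "assembly.<assembly>.<version>.<culture>.<token>.<type>"
// from a .NET assembly qualified type name.
class AssemblyPackageName {
    static String toJavaName(String assemblyQualifiedName) {
        String[] parts = assemblyQualifiedName.split(",");
        if (parts.length != 5) {
            throw new IllegalArgumentException("expected 5 comma-separated parts: " + assemblyQualifiedName);
        }
        String typeName = parts[0].trim();
        String assembly = parts[1].trim();
        String version = valueOf(parts[2], "Version");
        String culture = valueOf(parts[3], "Culture");
        String token = valueOf(parts[4], "PublicKeyToken");
        return "assembly." + assembly + "." + version + "." + culture + "." + token + "." + typeName;
    }

    // Extracts the value from a "Key=value" part, checking the key.
    private static String valueOf(String part, String key) {
        String trimmed = part.trim();
        String prefix = key + "=";
        if (!trimmed.startsWith(prefix)) {
            throw new IllegalArgumentException("expected " + key + " in: " + part);
        }
        return trimmed.substring(prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(toJavaName(
            "System.String, mscorlib, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"));
    }
}
```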

In response to my response Stuart commented:

Hmmm... how fundamental a rearchitecting would it take to have a classloader per assembly? Then there would be a trivial isomorphism between Java's "names are unique within a classloader" and .NET's "names are unique within a strongly-named assembly", if I'm understanding correctly.

Having a class loader per assembly is tempting, but it doesn't solve the issue of finding the .NET type (you still need to know in what assembly a type lives). This would be solvable by introducing a new API to get a class loader corresponding to an assembly, but that obviously has the downside that it doesn't interoperate very well with existing Java code. For example, the IKVM.NET AWT implementation lives in a .NET assembly (written in C#) and is loaded by GNU Classpath using Class.forName(). This only works if Class.forName() has enough information about the assembly.

Brian Sullivan commented:

I don't care for the occasional lowercasing ideas.

It could be that all .NET classes would begin with NET. from the Java perspective.

NET.System.String
NET.Some.Other.Class

import NET.*; // import all of .NET

This makes it clear where the class is defined and clearly separates all of .NET into its own package.

The prefixing is a good idea to get around the System name clash. BTW, import NET.* doesn't hierarchically import all packages, but only the classes in the NET package.

Does anyone have anything against the newly proposed name mangling scheme? Obviously the details need to be worked out (my proposed package name above doesn't actually compile), but what about the general idea?

Friday, 15 August 2003 15:09:27 (W. Europe Daylight Time, UTC+02:00)  #    Comments [3]
# Thursday, 14 August 2003
Workarounds

This week I finished the support for missing classes. This allows Eclipse to run without the -Xbootclasspath workaround.

To run Eclipse (on Windows), you can now simply go to the Eclipse directory and type:

   eclipse -vm \ikvm\bin\ikvm.exe

The verifier and compiler had to be changed to support what I call in the code unloadable classes (I probably should have called them missing classes). Whenever the verifier encounters a reference to a type that cannot be loaded, it treats it as a special type (kind of like the null-type) and allows (almost) any operation to succeed. When the compiler encounters this type, instead of generating CIL instructions to implement a particular bytecode instruction, it generates a call to a helper method in ByteCodeHelper that implements the instruction using reflection. At the moment the reflection results are not cached, so it could be made more efficient by adding caching, but since this shouldn't happen that often, this is not a high priority.
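As an analogy in Java (the real helper lives in the IKVM runtime and its signatures are not shown here), a reflection-based fallback with the caching mentioned above might look roughly like this. DynamicCallHelper and invokeNoArg are hypothetical names, and the sketch only handles public no-argument instance methods:

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a late-bound call: the method is looked up by name at run time
// instead of being bound when the code was compiled.
class DynamicCallHelper {
    // Cache of resolved methods, keyed by "className#methodName".
    private static final Map<String, Method> CACHE = new ConcurrentHashMap<>();

    static Object invokeNoArg(Object target, String methodName) {
        try {
            Class<?> type = target.getClass();
            String key = type.getName() + "#" + methodName;
            Method method = CACHE.get(key);
            if (method == null) {
                method = type.getMethod(methodName); // reflective lookup, done once per key
                CACHE.put(key, method);
            }
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("dynamic call failed: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeNoArg("hello", "length"));
        System.out.println(invokeNoArg("hello", "length")); // second call hits the cache
    }
}
```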

java.lang.CharSequence

I also finally implemented support for (what I've termed) ghost interfaces. Ghost interfaces are interfaces that are implemented by remapped types. So, for example, java.lang.String (really System.String) appears to implement java.lang.CharSequence, thus java.lang.CharSequence is a ghost interface. When a reference of a ghost interface type is passed around, it really is an object reference (so that it can contain references to types that actually implement the interface, as well as types that only appear to implement the type).

Here is an example:

CharSequence toUpperCase(CharSequence seq) {
  StringBuffer buf = new StringBuffer();
  int len = seq.length();
  for(int i = 0; i < len; i++) {
    char c = seq.charAt(i);
    c = Character.toUpperCase(c);
    buf.append(c);
  }
  return buf;
}

This is compiled as:

Object toUpperCase(Object seq) {
  StringBuffer buf = new StringBuffer();
  int len;
  if(seq instanceof String) {
    len = ((String)seq).length();
  } else {
    len = ((CharSequence)seq).length();
  }
  for(int i = 0; i < len; i++) {
    char c;
    if(seq instanceof String) {
      c = ((String)seq).charAt(i);
    } else {
      c = ((CharSequence)seq).charAt(i);
    }
    c = Character.toUpperCase(c);
    buf.append(c);
  }
  return buf;
}

There are some downsides to this approach:

  • Performance cost of the type check and the cast. BTW, since System.String is a sealed type, the type check can (theoretically) be very efficient.
  • Type is erased in the method signature, causing problems if an identical signature already exists (the CLR supports a nice mechanism to work around this, but unfortunately Reflection.Emit currently doesn't expose this functionality).
  • When this method is statically compiled and called from, for example, C#, the signature is confusing.

An alternative approach would be to wrap each String object when it needs to be treated as a CharSequence, but that is harder to implement (object identity has to be preserved) and it isn't clear to me that it would be more efficient or elegant.

On the upside, this is a totally generic solution, there is no special support for String, any remapped type (i.e. type in map.xml) can declare to support any interface, and the compiler will automatically do the right thing. It is also used to make java.lang.Throwable (i.e. System.Exception) appear to implement java.io.Serializable. At the moment it isn't used for arrays (which should appear to implement java.io.Serializable and java.lang.Cloneable), but it would be easy to add this.

Oh, and since java.lang.String now implements CharSequence, I can now use the StringBuffer implementation from GNU Classpath instead of remapping it to System.Text.StringBuilder.

Reflecting on .NET types

Another thing I've been working on is integration with .NET types. When you want to use a .NET type from within Java you have two options: 1) use netexp to generate a jar containing stubs for the .NET classes, so you can statically compile against the .NET types, or 2) use reflection against the .NET type.

Something I dislike about netexp is that it includes remapping logic to map .NET types and signatures to Java compatible stuff. For example, when it encounters an enum, it generates a final Java class with public static final members or when it encounters a signature with a byref argument, it turns that into an array argument. Now this is all very nice, because it allows Java code to use most of the .NET features that Java doesn't really support, but the part I don't like is that this remapping is also done in the IKVM.NET runtime, because when it compiles Java code that was compiled against a netexp generated jar, it needs to do some of the same translations. This duplication of the remapping logic is obviously not a good thing. So I want to rewrite netexp in Java and make it use Java reflection to interrogate the .NET types, that way all the remapping is done in one place, the IKVM.NET runtime.
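To give an idea of what the two remappings mentioned above look like from the Java side, here is a hand-written sketch. The names (Color, ByRefDemo, increment) are invented, the choice of int for the enum members is an assumption, and real netexp stubs contain no code at all; the method body is only there so the example runs.

```java
// Hypothetical shape of a remapped .NET enum: a final class that only
// exposes public static final members.
final class Color {
    public static final int Red = 0;
    public static final int Green = 1;

    private Color() {} // not instantiable from Java code
}

class ByRefDemo {
    // Stand-in for a .NET method with a byref int parameter: the caller
    // passes a one-element array and reads the updated value back from it.
    static void increment(int[] valueRef) {
        valueRef[0]++;
    }

    public static void main(String[] args) {
        int[] box = { Color.Green };
        increment(box);
        System.out.println(box[0]);
    }
}
```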

However, while thinking about this I realized that there is a problem with respect to type identity. There are two ways a .NET type can become visible in Java: netexp and Class.forName. Both have problems with the class name. Netexp converts namespaces to lowercase, because Java compilers don't like the System namespace (System binds to the java.lang.System class and is not considered a package name), and Class.forName requires the assembly qualified type name (e.g. "System.String, mscorlib, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"). There are several ways to handle this:

  1. Only allow .NET types to be visible through netexp generated classes. Obviously, I don't like this, because it would be very limiting and my plan to rewrite netexp in Java wouldn't work.
  2. Allow .NET types to be visible through netexp with one name (e.g. system.String) and through Class.forName with another name (i.e. the assembly qualified name). The problem this causes is that once you create an instance of a class and call getClass on that instance, which of the two Class objects should it then return?
  3. Use a universal name mangling scheme. Both netexp and Class.forName would represent System.String as, for example, "System_String__mscorlib__Version_1_0_5000_0__Culture_neutral__PublicKeyToken_b77a5c561934e089". I think this would work pretty well, but the obvious downside is that it makes the Java source code totally unreadable. Something that still would be an issue is how Java code reacts when it tries to load one class and then gets another (in the face of .NET binding policy). For example, it would be possible to load the above String, but then get a Class object that returns "..._Version_2_0_..." as its name, because the app is running on some future version of the CLR.
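The mangled name in option 3 can be produced from the assembly qualified name with three plain replacements. This is just a sketch of one way to get that exact string (NameMangler is a made-up name), and note that this naive version is not reversible in general, since an underscore that was already part of a type name can't be told apart from a mangled character.

```java
// Sketch: ", " becomes "__", while "." and "=" each become "_".
class NameMangler {
    static String mangle(String assemblyQualifiedName) {
        return assemblyQualifiedName
            .replace(", ", "__")   // separator between the name parts
            .replace('.', '_')     // namespace and version dots
            .replace('=', '_');    // key=value pairs
    }

    public static void main(String[] args) {
        System.out.println(mangle(
            "System.String, mscorlib, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"));
    }
}
```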

Maybe there are other solutions I haven't thought of, but at the moment I'm thinking of going with number 2.

I updated the snapshots on Tuesday. Binaries and source.

Thursday, 14 August 2003 12:06:29 (W. Europe Daylight Time, UTC+02:00)  #    Comments [5]
# Wednesday, 06 August 2003
PDC 2003

I'm attending the PDC in October. I'd love to have a chat with any readers of my blog, so if you'll be there as well, send me an e-mail.

Thanks to Jeff Sandquist for the graphic.

Wednesday, 06 August 2003 18:01:59 (W. Europe Daylight Time, UTC+02:00)  #    Comments [1]
# Thursday, 24 July 2003
GridBagLayout and more AWT

I did some work on the GNU Classpath GridBagLayout and it is now almost done (a large part was already done by Michael Koch). Obviously, writing the GridBagLayout code requires a good understanding of how it works. I never bothered to really understand the GridBagLayout, but I did use it quite frequently, a couple of years ago. I remember it was mostly a process of trial and error. After taking the time to figure out how it works, I must say that I actually really like it. It has a somewhat bad reputation in the Java community, but I think it's quite nice.

I can't leave it on that positive note, of course ;-) I have to criticize it as well. Sun has this really nasty habit of renaming virtual methods (adding aliases). This is not a good idea, to put it mildly. In JDK 1.4 they introduced: getLayoutInfo, adjustForGravity, getMinSize and arrangeGrid. These methods offer no new functionality, they are "replacements" for GetLayoutInfo, AdjustForGravity, GetMinSize and ArrangeGrid. So just because someone didn't abide by the method naming rules, some idiot now decides to enormously complicate the class.

You might think, what's the big deal? The problem is that when you're overriding a virtual method that has two names, you don't know which one you should override. Overriding both usually doesn't work either because you usually want to call the super class implementation one of which in turn usually calls the other version of the method, thus resulting in infinite recursion.
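Here is one way the recursion can come about, as a minimal sketch. AliasedLayout and MyLayout are invented stand-ins (this is not the GridBagLayout source), and the assumption is that the new-style name forwards to the legacy name in the base class while the subclass author tries to keep both names consistent:

```java
// Base class with one method under two names.
class AliasedLayout {
    // New-style name forwards to the legacy name (assumed direction).
    int getMinSize() { return GetMinSize(); }
    int GetMinSize() { return 10; }
}

// Subclass overrides both names.
class MyLayout extends AliasedLayout {
    @Override
    int getMinSize() { return super.getMinSize() + 5; } // base calls GetMinSize() virtually

    @Override
    int GetMinSize() { return getMinSize(); }           // forwards back to the new name
}

class AliasRecursionDemo {
    // Returns true if calling getMinSize() on the subclass never terminates
    // normally and blows the stack instead.
    static boolean overflows() {
        try {
            new MyLayout().getMinSize();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows());
    }
}
```

Each override looks reasonable on its own, but together with the base class forwarding they form a cycle: getMinSize, base getMinSize, GetMinSize, getMinSize, and so on.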

Anyway, I made new snapshots. Source and binaries.

What's new?

  • More AWT support, still nowhere near usable. But see these screenshots for some non-trivial stuff that is (partially) working.
  • Statically compiled classes are now annotated with a custom attribute for each interface they implement. This enables the Class.getInterfaces() to work correctly for statically compiled classes.
  • Fixed a bug in Method.getModifiers() for constructors in statically compiled classes, that caused them to appear final.
Thursday, 24 July 2003 17:38:07 (W. Europe Daylight Time, UTC+02:00)  #    Comments [2]
# Monday, 21 July 2003
Abstract Windowing Toolkit

Originally I had titled this item "Awful Windowing Toolkit", but then I decided to look up the dictionary definition of abstract. It's one of those words, like its close relative virtual, that we (programmers) like to use a lot using our own private (to our group) definition, but what else does it mean?

Here are some definitions courtesy of yourDictionary.com

ab·stract
adj.

  1. Considered apart from concrete existence: an abstract concept.
  2. Not applied or practical; theoretical. See Synonyms at theoretical.
  3. Difficult to understand; abstruse: abstract philosophical problems.
  4. Thought of or stated without reference to a specific instance: abstract words like truth and justice.
  5. Impersonal, as in attitude or views.
  6. Having an intellectual and affective artistic content that depends solely on intrinsic form rather than on narrative content or pictorial representation: abstract painting and sculpture.

I always thought that the abstract in AWT referred to meaning 4, which is closest to our programmer definition, but maybe the joke's on us and the original designers (and I use the term loosely) of AWT had meanings 2, 3 and 6 in mind ;-)

As you probably guessed from the above, I did a little work on AWT support. Here are two trivial test apps that I got working:


AwtTest.java. Getting FontMetrics, Graphics and Frame insets working. 

 
FlowLayoutTest.java. Button, TextField, Label, Panel, FlowLayout and BorderLayout. Also helped me figure out how ambient properties (don't) work.

Please note that only a very small percentage of AWT is implemented at the moment, so don't expect any useful applications to actually work.

What else is new?

  • ikvmc now uses local variable name debugging information (if available) to name method arguments.
  • Updated to work with current GNU Classpath CVS.
  • Added SO_TIMEOUT and SO_REUSEADDR support to PlainDatagramSocketImpl.

BTW, the included classpath.dll contains a few Classpath AWT patches that haven't been committed to Classpath CVS yet.

Updated the binaries and source snapshots.

Monday, 21 July 2003 14:13:45 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]
# Thursday, 10 July 2003
Benchmarks, strictfp and other floating point stuff

Mark Wielaard did some benchmarks of some of the free JVMs available.

I noted on the Classpath mailing list that the IKVM floating point performance is probably overstated, because IKVM doesn't implement FP correctly. It uses the .NET framework FP instructions and methods and those in turn are implemented using the x86 FP instructions.

The original JVM specification was very strict in specifying floating point operations. Basically, it mapped 100% onto the Sparc floating point model and this caused spec compliant FP code to be slow on Intel. I don't think any of the early VMs correctly implemented the spec so this wasn't really a problem. I don't know why the early VM implementers didn't implement the spec; maybe they felt the performance cost was too high or maybe they just didn't find it an interesting issue (the "problem" is that the Intel FP results are actually too accurate).

In JDK 1.2 Sun introduced the strictfp keyword and corresponding JVM method and class access flag. Interestingly, they loosened the default FP requirements and specified the original FP behavior for methods (or classes) marked with the strictfp keyword.

IKVM doesn't implement strictfp at the moment, but this isn't the whole story. For trigonometric* functions (e.g. Math.sin()) IKVM uses the equivalent .NET Math functions, and those are (probably) implemented using the x86 FP instructions, which are not compliant with the JVM specification. I think that in this case the Intel instructions are not more accurate than required, but actually less accurate. So this is a real problem that should be fixed at some point in the future.

*This probably also applies to other Math functions, like Math.exp, Math.log, Math.sqrt, etc.
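One way to get a feel for this kind of accuracy issue on any given VM is to compare Math.sin with StrictMath.sin, which the Java API documentation defines in terms of the fdlibm algorithms, while Math.sin is documented as having to stay within 1 ulp of the exact result. The sketch below (SinComparison and ulpsApart are made-up names) reports the distance between the two in ulps; a large argument is used because argument reduction is where hardware sine instructions tend to be weakest.

```java
// Sketch: measures how far Math.sin is from StrictMath.sin, in ulps.
class SinComparison {
    static double ulpsApart(double x) {
        double fast = Math.sin(x);         // may be a VM intrinsic
        double strict = StrictMath.sin(x); // reference result
        return Math.abs(fast - strict) / Math.ulp(strict);
    }

    public static void main(String[] args) {
        System.out.println("x = 1.0    -> " + ulpsApart(1.0) + " ulps apart");
        System.out.println("x = 1.0e22 -> " + ulpsApart(1.0e22) + " ulps apart");
    }
}
```

On a compliant VM the two should differ by a couple of ulps at most; a much larger number would point at the kind of problem described above.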

Thursday, 10 July 2003 11:08:38 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]
# Thursday, 19 June 2003
One Year Ago Today

It's been exactly one year since I started blogging about the development of IKVM.NET. I actually started development about a month earlier on the 22nd of May.

To celebrate this, I've created a time-line of significant (or interesting) events that happened in the past year.

2002-05-22

Started development on a project called "bytecode". I started by porting my Java class file reader from Java to C#.

2002-06-19

Started my IK<<VM.NET Radio weblog about "The Development of a Java VM for .NET"

2002-06-27

"Hello, World!" runs for the very first time. Replaced the << in the name with a dot, because Radio doesn't escape the title properly.

2002-08-07

First version of netexp is released. Java code can now directly use .NET types.

2002-08-12

The static compiler makes its appearance. For the first time the binary snapshot contains a statically compiled classpath.dll.

2002-09-03

Very basic AWT support is introduced.

2002-09-09

Class loader support makes its first appearance.

2002-10-25

James runs!

2002-11-01

Zoltan Varga starts to work on getting IKVM to run on Mono.

2002-12-19

Created the IKVM.NET SourceForge project. Dropped the first dot from the name.

2002-12-28

Eclipse starts up!

2003-01-17

Zoltan Varga gets IKVM to run HelloWorld on Mono.

2003-03-13

Moved blog from Radio to BlogX.

2003-04-05

Added support for using value types from Java.

2003-05-10

Zoltan Varga gets IKVM to run Eclipse on Mono.

2003-06-19

The IKVM.NET blog celebrates its first birthday.

Thanks to everyone who contributed to IKVM.NET in the past year. It's been great fun and I hope the coming year will be as productive as this past year has been.

Thursday, 19 June 2003 11:22:49 (W. Europe Daylight Time, UTC+02:00)  #    Comments [4]
# Monday, 02 June 2003
Invokespecial

One of the more interesting bytecode instructions is invokespecial. During Java's early days this instruction was called invokenonvirtual, but in JDK 1.0.2 it was renamed to invokespecial to indicate it has some very special semantics.

Invokespecial is used in three ways: to call instance initializers (constructors), to call base class methods non-virtually and to call private methods. It's worth mentioning that, unlike the CLR call instruction, invokespecial cannot be used to call arbitrary methods. It can only call methods in the current class or in a base class of the current class (and then only on references of the type of the current class, or subclasses of it). The JVM's invokevirtual is very similar to the CLR's callvirt instruction. In addition to invokespecial and invokevirtual, the JVM also has invokestatic and invokeinterface to invoke static methods and interface methods, respectively. The CLR has no special instructions for that; it uses the call and callvirt instructions to call static and interface methods.
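In source form, the three uses look like this. The classes Base and Derived are made up for the example, and the comments describe what javac emitted at the time of writing; as far as I know, newer javac versions targeting recent class file versions may use invokevirtual for the private call instead.

```java
class Base {
    protected String describe() { return "base"; }
}

class Derived extends Base {
    private String secret() { return "secret"; }

    @Override
    protected String describe() {
        // super.describe(): non-virtual call to the base class method (invokespecial)
        // secret(): private method call (invokespecial with the javac of that era)
        return super.describe() + "+" + secret();
    }

    static String demo() {
        Derived d = new Derived(); // constructor call: invokespecial Derived.<init>
        return d.describe();       // ordinary virtual dispatch: invokevirtual
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running javap -c on the compiled classes shows which invoke instruction was actually chosen for each call.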

Prior to JDK 1.1, invokespecial called the exact method specified in the instruction. In JDK 1.1 this behavior was changed, because it caused versioning problems. Here is an example of what could go wrong:

Component A - version 1

public class GrandParent {
  protected void myMethod() {
    // ...
  }
}
public class Parent extends GrandParent {
}

Component B

public class Child extends Parent {
  protected void myMethod() {
    // ...
    super.myMethod();
  }
}

The compiler would compile the super.myMethod() call in Child to invokespecial GrandParent.myMethod(). Now suppose a new version of Component A is released:

Component A - version 2

public class GrandParent {
  protected void myMethod() {
    // ...
  }
}
public class Parent extends GrandParent {
  protected void myMethod() {
    // ...
    super.myMethod();
  }
}

When Component B is used (without recompiling) with this new version of Component A, the super.myMethod call in Child will still go directly to GrandParent.myMethod and this is probably not what the author of Component A had intended.

To fix this, invokespecial was changed to search the class hierarchy if the called class is a base class of the caller (from the caller's base class on up), but only if the caller's class has the ACC_SUPER bit set in the class' access_flags mask. All Java compilers since JDK 1.1 always set the ACC_SUPER flag. It's interesting to note that the current Sun JRE 1.4.1 still honors a cleared ACC_SUPER flag.
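The ACC_SUPER lookup can be imitated with reflection. This is only a model of the search order, not how a VM implements it; SuperLookup and resolveSuper are hypothetical names, the sketch only handles no-argument methods, and the three classes mirror version 2 of Component A together with Component B.

```java
class GrandParent {
    protected void myMethod() {}
}

class Parent extends GrandParent {
    @Override
    protected void myMethod() { super.myMethod(); }
}

class Child extends Parent {
    @Override
    protected void myMethod() { super.myMethod(); }
}

class SuperLookup {
    // Starts at the caller's direct superclass and walks up the hierarchy,
    // returning the first class that declares the no-argument method.
    static Class<?> resolveSuper(Class<?> caller, String methodName) {
        for (Class<?> c = caller.getSuperclass(); c != null; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod(methodName);
                return c;
            } catch (NoSuchMethodException e) {
                // not declared in this class, keep walking up
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Even though Child was compiled against GrandParent.myMethod,
        // the search from Child's superclass finds Parent first.
        System.out.println(resolveSuper(Child.class, "myMethod").getName());
    }
}
```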

Why doesn't the CLR have an equivalent of the ACC_SUPER flag? The reason is that it isn't needed. When, for example, the C# compiler compiles a base method call, it emits a call instruction to the immediate base class of the caller, even if that class isn't the one that implements the called method[1]. When the JIT is resolving the call it searches up the class hierarchy to find the actual method.

Comparison JVM and CLR call instructions

JVM              CLR       Notes
---------------  --------  -----
invokespecial    call      invokespecial checks the object reference for null, call doesn't.
invokevirtual    callvirt
invokeinterface  callvirt  Like in other places where the JVM consumes interface references, the object reference doesn't have to statically implement the called interface type.
invokestatic     call

One final issue relevant to IKVM here is that invokespecial requires the object reference to be checked for null, while the CLR call instruction doesn't do this (for this reason the C# compiler uses callvirt when calling non-virtual methods, except of course when explicitly calling a base class method, so that the object reference is always checked for null). The IKVM compiler has to insert an explicit check for null references when it is compiling invokespecial. It took me a while to come up with an efficient way of doing this. Javac faces a similar issue when it compiles the explicit instantiation of an inner class:

public class Foo {
  public class Inner {}
  static void method() {
    // next line throws a NullPointerException
    ((Foo)null).new Inner();
  }
}

Why does this throw a NullPointerException? The answer may be surprising, but it is because Javac inserts a call to ((Foo)null).getClass() to make it throw a NullPointerException if the outer this is null. If it didn't do this, you'd run into a NullPointerException later on when one of Inner's methods tried to use the outer this and this would be a very surprising and hard to find bug. However, I didn't want to use this trick because it isn't particularly efficient. What I came up with is the following trick:

ldvirtftn instance string System.Object::ToString()
pop

The JIT compiles this to:

mov   eax,dword ptr [ecx]
mov   eax,dword ptr [eax+28h]

If the reference in question is null, the first instruction causes an x86 trap that the CLR translates into a NullReferenceException. The second instruction is overhead and hopefully a future version of the JIT will stop emitting it, but it isn't very expensive anyway.

Note to self: Consider adding an optimization to the IKVM compiler to detect Javac's null reference check:

invokevirtual java/lang/Object.getClass()
pop

and replace it with the more efficient ldvirtftn based check above. This could actually be an important optimization, because getClass() on IKVM is pretty expensive.

[1] When the base classes are in the same module, the C# compiler actually emits a call to the class that defines the method, instead of to the direct base class. This is an optimization, because it saves the JIT from having to search for the method in the class hierarchy. When all classes are in the same module, there obviously aren't any versioning issues, since the module is always built as one unit.

Monday, 02 June 2003 17:56:11 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]
# Friday, 30 May 2003
Update

After a long hiatus, finally another update. Many changes, mostly clean up. I moved some of the custom attributes to a new assembly OpenSystem.Java.dll. Hopefully this will be useful for the dotGNU Java compiler.

I made new source and binaries snapshots available, these are based on the current GNU Classpath cvs code + one patch.

Friday, 30 May 2003 14:17:50 (W. Europe Daylight Time, UTC+02:00)  #    Comments [0]