# Monday, 01 September 2003
Constructing Inner Classes

Alan Macek pointed out an interesting bug in the verifier.

First, some background. Let's look at this code fragment:

class Test {
    Test() {
      new Runnable() {
          public void run() {}
      };
    }
    public static void main(String[] args) {
      new Test();
    }
}

Inner classes aren't supported by the VM; they are a compiler fiction. So when you compile the above code, you get the equivalent of the following:

class Test {
    Test() {
      new Test$1(this);
    }
    public static void main(String[] args) {
      new Test();
    }
}
 
class Test$1 implements Runnable {
    private Test outer;
    Test$1(Test outer) {
      this.outer = outer;
    }
    public void run() {}
}

The interesting bit here is that the outer field is initialized after the base class constructor runs. In this case the base class is Object so this isn't significant, but when you have a base class that calls virtual methods it is:

abstract class Base {
    Base() {
      run();
    }
    public abstract void run();
}
 
class Test {
    Test() {
      new Base() {
          public void run() {
            System.out.println(Test.this);
          }
      };   
    }
    public static void main(String[] args) {
      new Test();
    }
}

When you compile this and run it, it prints out: null. The outer this reference is not yet initialized when the Base constructor runs.
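The same effect can be reproduced without any inner class by writing the older compiler output by hand. In the sketch below, HandWrittenInner plays the role of the generated class and stores its outer reference only after super() returns. All class and field names are illustrative assumptions, not what the compiler emits.

```java
// Hand-written stand-in for the older compilation of the anonymous class.
// All names here are illustrative; only the ordering mirrors the text above.
abstract class CallingBase {
    CallingBase() {
        run(); // virtual call made before the subclass constructor body runs
    }
    public abstract void run();
}

class OuterThing {
}

class HandWrittenInner extends CallingBase {
    // Records what run() observed, so the result can be inspected afterwards.
    static boolean runWasCalled;
    static OuterThing lastSeenOuter;

    private OuterThing outer;

    HandWrittenInner(OuterThing outer) {
        super();            // CallingBase() invokes run() right here
        this.outer = outer; // too late: run() has already read the field
    }

    public void run() {
        runWasCalled = true;
        lastSeenOuter = outer; // still null during the base constructor
    }
}

class LateOuterDemo {
    public static void main(String[] args) {
        new HandWrittenInner(new OuterThing());
        System.out.println(HandWrittenInner.lastSeenOuter); // prints null
    }
}
```

Java source does not allow the field assignment to be moved above super(), which is why the newer ordering can only come from the compiler.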

However, when you compile it with the -target 1.4 option, it prints out: Test@17182c1 (or something similar). So clearly an interesting change was made to the compiler.
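One way to check which behavior a given compiler produces is to record what the outer this looks like while the base constructor is still running. The sketch below does that. RecordingBase, OuterProbe, and the two static fields are hypothetical names, and the comments assume a compiler that uses the -target 1.4 ordering; with the older ordering the recorded value would be null instead.

```java
abstract class RecordingBase {
    RecordingBase() {
        run(); // virtual call during base class construction
    }
    public abstract void run();
}

class OuterProbe {
    // Static so they can be written even if the outer reference is still null.
    static boolean runWasCalled;
    static Object seenOuter;

    OuterProbe() {
        new RecordingBase() {
            public void run() {
                runWasCalled = true;
                seenOuter = OuterProbe.this; // read while RecordingBase() is running
            }
        };
    }

    public static void main(String[] args) {
        OuterProbe probe = new OuterProbe();
        System.out.println(seenOuter == probe
            ? "outer this was set before the base constructor ran"
            : "outer this was still null in the base constructor");
    }
}
```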

Let's take a look at the bytecode of the Test$1 constructor from the original code fragment:

<init>(LTest;)V
0  aload_0
1  invokespecial java/lang/Object/<init>()V
4  aload_0
5  aload_1
6  putfield Test$1/this$0 Test
9  return

First, the base class constructor is called and then the this$0 field is initialized (this$0 is the compiler-generated name for the field shown as outer above).

Now take a look at the same method, but now as compiled with the -target 1.4 option:

<init>(LTest;)V
0  aload_0
1  aload_1
2  putfield Test$1/this$0 Test
5  aload_0
6  invokespecial java/lang/Object/<init>()V
9  return

The order of the two steps is reversed. Now it first initializes the this$0 field and then calls the base class constructor.

I had missed the fact that it is legal to initialize fields in an uninitialized object. So when the latter code was run, the verifier complained that an uninitialized object reference was used.

The big question is why the compiler only does this when the -target 1.4 option is specified. The rule that allows instance fields of an uninitialized object to be written has always been in the JVM specification, as far as I can tell.
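As an illustration of the rule, here is a toy checker for constructor bodies that accepts both instruction orders shown above. It is a sketch under stated assumptions, not the actual verifier: the five-opcode instruction set and all class and method names are invented, and the restriction to fields declared in the current class reflects one reading of the JVM specification.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model of a single verifier rule for constructors. The instruction set
// and every name below are made up for this sketch.
class ToyInitVerifier {
    enum Op { ALOAD_THIS, ALOAD_ARG, PUTFIELD, INVOKESPECIAL_INIT, RETURN }

    static final class Insn {
        final Op op;
        final String fieldOwner; // class declaring the field; only used by PUTFIELD

        Insn(Op op, String fieldOwner) {
            this.op = op;
            this.fieldOwner = fieldOwner;
        }
    }

    static Insn insn(Op op) { return new Insn(op, null); }
    static Insn putfield(String fieldOwner) { return new Insn(Op.PUTFIELD, fieldOwner); }

    // Returns null when the constructor body is accepted, otherwise a message.
    static String verify(String currentClass, List<Insn> code) {
        boolean thisInitialized = false;
        Deque<String> stack = new ArrayDeque<>(); // holds "this" or "arg" markers
        for (Insn i : code) {
            switch (i.op) {
                case ALOAD_THIS:
                    stack.push("this");
                    break;
                case ALOAD_ARG:
                    stack.push("arg");
                    break;
                case PUTFIELD: {
                    if (stack.size() < 2) return "stack underflow";
                    String value = stack.pop();
                    String target = stack.pop();
                    boolean uninit = !thisInitialized;
                    if (uninit && value.equals("this")) {
                        return "uninitialized this stored as a value";
                    }
                    // The rule in question: writing a field of uninitialized this
                    // is accepted when the field belongs to the current class.
                    if (uninit && target.equals("this") && !currentClass.equals(i.fieldOwner)) {
                        return "putfield on uninitialized this for a field of " + i.fieldOwner;
                    }
                    break;
                }
                case INVOKESPECIAL_INIT:
                    if (stack.isEmpty() || !stack.pop().equals("this")) {
                        return "constructor call needs this on the stack";
                    }
                    thisInitialized = true;
                    break;
                case RETURN:
                    return thisInitialized ? null : "return before any constructor call";
            }
        }
        return "missing return";
    }

    public static void main(String[] args) {
        List<Insn> newOrder = Arrays.asList(
            insn(Op.ALOAD_THIS), insn(Op.ALOAD_ARG), putfield("Test$1"),
            insn(Op.ALOAD_THIS), insn(Op.INVOKESPECIAL_INIT), insn(Op.RETURN));
        System.out.println(verify("Test$1", newOrder)); // prints null (accepted)
    }
}
```

A checker that instead rejected every use of uninitialized this before the constructor call would refuse the second listing, which matches the bug described here.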

New Snapshots

Friday, I put up new snapshots: source and binaries, and binaries only.

Changes since the last snapshot announced on the blog:

  • New version of netexp based on Java reflection (all .NET to Java mapping is now in the IKVM reflection runtime).
  • To support netexp, the .NET type reflection is now much better. This includes support for delegates and byref arguments. Instead of lowercasing the .NET namespaces, all .NET types are now prefixed with the "cli." package name. Remapped types are also available and appear as final classes without constructors, but with static methods for all instance methods (taking the remapped type as the first argument). Constructors are available as static __new methods.
  • Bootstrap class loader is now a flat type space. All loaded assemblies are searched for bootstrap types when Class.forName is used.
  • Enums are now treated like other value types (i.e. boxed). This allows better access to overloaded .NET methods.
  • Lots of clean up and refactoring to support the new reflection capabilities.
  • Various bug fixes.
  • Updated to work with latest GNU Classpath CVS.
Monday, 01 September 2003 16:32:56 (W. Europe Daylight Time, UTC+02:00)
Tuesday, 02 September 2003 17:11:50 (W. Europe Daylight Time, UTC+02:00)
The "cli." prefixing breaks round-tripping, doesn't it? :(
Stuart
Tuesday, 02 September 2003 17:48:51 (W. Europe Daylight Time, UTC+02:00)
No, it doesn't! When an assembly has the OpenSystem.Java.JavaAssembly attribute the prefix is omitted.
Tuesday, 02 September 2003 18:46:20 (W. Europe Daylight Time, UTC+02:00)
Cool! :)
Stuart
Tuesday, 02 September 2003 19:38:16 (W. Europe Daylight Time, UTC+02:00)
Actually I just realized there's still a problem. It's impossible for Java code to "not care" whether an API is implemented in Java or natively in .NET.

Java code can explicitly put itself in the cli.* package so that other Java code doesn't care that it's in Java, but .NET code will then see it in the cli namespace, which isn't what's desired.

The desired behavior, for me, would be that the visible name for a given class A to another class B would only depend on what language B is written in, not what language A is written in. The only way I can think of to achieve that without breaking round-tripping is the way that you were doing it before, ugly "system" hacks and all.
Stuart
Wednesday, 03 September 2003 09:00:54 (W. Europe Daylight Time, UTC+02:00)
I guess I don't understand what you mean. Qualitatively, isn't the cli prefix the same as the namespace lowercasing? In both cases the name of the type as seen from Java is different from the "published" name of the type. I think the cli prefix is clearer (and doesn't suffer from the [theoretical] problem that namespaces that only differ by case cause problems).
Friday, 05 September 2003 16:41:09 (W. Europe Daylight Time, UTC+02:00)
It's not the fact that the name seen by Java is different from the name seen by C#, but the fact that Java code "cares" whether a particular class was implemented in Java or not (ie, the difference in behavior based on the JavaAssembly attribute).

If I'm writing Java code to be used with IKVM.NET and I want to use an API that lives in the Foo.Bar namespace, I can't just say "import Foo.Bar;" OR "import cli.Foo.Bar;" - I first have to determine whether the API in question was written in Java or in some other .NET language, and only *then* can I decide which of those import statements is appropriate.

And if the author of the Foo.Bar namespace later decides to reimplement in a different language, Java code referring to it may break because it now has a cli prefix that shouldn't be there, or is missing one that it didn't need before.

The lowercasing did, admittedly, have a similar problem, but at least an API author could avoid it by using a name that was all lowercase in the first place, because then the lowercasing transform is a no-op. There's no situation where prepending "cli." is a no-op.

And remember, I was opposed to lowercasing too, except for special cases where it was unavoidable ;) I still believe that, ugly or not, the only solution that doesn't have fatal drawbacks is to only lowercase namespaces that correspond to classes in java.lang.
Stuart
Friday, 05 September 2003 16:46:20 (W. Europe Daylight Time, UTC+02:00)
Oh, and just for completeness, arrange that if a Java class is put in a package like "character.Foo" it gets translated to .NET as "Character.Foo" so that Java code can put stuff into the System namespace and any other that's a java.lang class name. Then the only thing that *can't* be expressed is a .NET namespace that's equal to the *lowercased* name of a java.lang class, and since by convention .NET doesn't use lowercase names, that shouldn't be a problem.
Stuart
Friday, 05 September 2003 17:03:43 (W. Europe Daylight Time, UTC+02:00)
Oh, a few other possible namespaces couldn't be expressed either. Byte.*, Class.*, Double.*, Float.*, Long.*, Short.*, Void.*

Thinking about it a bit, I've figured out a concise way to express what I think the problem is, and I also propose a solution below. The problem is that all the proposals so far, including the old lowercasing one, don't provide a full bidirectional mapping between Java names and .NET names. With lowercasing, the problem is that the relationship is one-many. With the cli. prefix, the problem is that the relationship is many-one (or rather, two-one): cli.foo and foo can both refer to the same .NET name, depending on whether the referred-to object has the JavaAssembly attribute. In order to be a first-class .NET language, I think the relationship must be one-one.

I propose another possibility that I think works better than any I've suggested so far. Forget about lowercasing and use the "cli" prefix only for java.lang class names. And have that mangling done in both directions and *regardless* of whether the JavaAssembly attribute is present. Apart from the ugliness of special-casing the few dozen names in java.lang (and the problems that come up if Sun adds any new names to java.lang), it addresses all the issues I can think of. There's a direct bidirectional mapping between Java names and .NET names for *every* case except if the namespace is a keyword in one language but not the other (which .NET doesn't address between languages anyway).

(Java) <=> (.NET)
cli.System <=> System
Foo.Bar <=> Foo.Bar
cli.Foo.Bar <=> cli.Foo.Bar
system <=> system
cli.Class <=> Class
cli.system <=> cli.system
Stuart
Friday, 05 September 2003 17:13:40 (W. Europe Daylight Time, UTC+02:00)
Oh and just to be *really* theoretical (but necessary to preserve the full bidirectionality of the mapping):

cli.cli.System <=> cli.System
cli.cli.cli.System <=> cli.cli.System
cli.cli.system <=> cli.cli.system
Stuart
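The mapping proposed in the comments above can be sketched in a few lines, which makes it easier to check against the example tables. This is only an illustration of the proposal, not IKVM code: the CliNameMapping class, its methods, and the small set of java.lang names are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of the naming scheme proposed in the comments. The class, its
// methods, and the name set are illustrative and not part of IKVM.
class CliNameMapping {
    // Small illustrative subset of java.lang class names, not the full list.
    static final Set<String> JAVA_LANG_NAMES = new HashSet<>(Arrays.asList(
        "System", "Class", "Character", "Byte", "Double", "Float", "Long", "Short", "Void"));

    // True when the first segment after any run of leading "cli." prefixes
    // is a java.lang class name.
    static boolean needsMangling(String name) {
        String rest = name;
        while (rest.startsWith("cli.")) {
            rest = rest.substring(4);
        }
        int dot = rest.indexOf('.');
        String first = dot < 0 ? rest : rest.substring(0, dot);
        return JAVA_LANG_NAMES.contains(first);
    }

    // .NET name -> Java name: add one "cli." in front of clashing names.
    static String toJava(String dotNetName) {
        return needsMangling(dotNetName) ? "cli." + dotNetName : dotNetName;
    }

    // Java name -> .NET name: strip one "cli." from mangled names.
    // Names that start directly with a java.lang class name are passed
    // through unchanged; they cannot be used as Java package names anyway.
    static String toDotNet(String javaName) {
        boolean mangled = javaName.startsWith("cli.") && needsMangling(javaName);
        return mangled ? javaName.substring(4) : javaName;
    }

    public static void main(String[] args) {
        System.out.println(toJava("System"));           // prints cli.System
        System.out.println(toDotNet("cli.cli.System")); // prints cli.System
    }
}
```

Because the check looks past any number of leading "cli." segments, each extra prefix on the .NET side costs exactly one extra prefix on the Java side, which is what keeps the mapping one-to-one.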