This week I finished the support for missing
classes. This allows Eclipse to run without the -Xbootclasspath
workaround.
To run Eclipse (on Windows), you can now simply go to the Eclipse directory and
type:
eclipse -vm \ikvm\bin\ikvm.exe
The verifier and compiler had to be changed to support what I call in the code unloadable
classes (I probably should have called them missing classes). Whenever the verifier
encounters a reference to a type that cannot be loaded, it treats it as a special
type (kind of like the null-type) and allows (almost) any operation to succeed. When
the compiler encounters this type, instead of generating CIL instructions to implement
a particular bytecode instruction, it generates a call to a helper method in ByteCodeHelper
that implements the instruction using reflection. At the moment the reflection results
are not cached, so this could be made more efficient, but since it
shouldn't happen that often, this is not a high priority.
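To make the idea concrete, here is a minimal sketch in plain Java of what such a reflection-based helper could look like. It is not the actual ByteCodeHelper code: the class name DynamicCallSketch, the method invokeVirtual and the cache are all made up for illustration, and it only handles public no-argument instance methods.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DynamicCallSketch {
    // Hypothetical cache keyed by "className#methodName". The helper described
    // in the text does not cache yet; this shows where a cache could go.
    private static final Map<String, Method> CACHE = new ConcurrentHashMap<>();

    // Resolves the class by name at run time and invokes a public
    // no-argument instance method on the given target object.
    static Object invokeVirtual(String className, String methodName, Object target) {
        String key = className + "#" + methodName;
        try {
            Method m = CACHE.get(key);
            if (m == null) {
                // Fails here if the class is still missing at run time.
                Class<?> c = Class.forName(className);
                m = c.getMethod(methodName);
                CACHE.put(key, m);
            }
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            // Assumption: a real runtime would throw the error the JVM spec
            // prescribes; the sketch just wraps the failure.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Calls String.length() without a compile-time reference to it.
        System.out.println(invokeVirtual("java.lang.String", "length", "hello"));
    }
}
```

The point of the sketch is only that the lookup is deferred until the instruction actually executes, so code that never reaches it runs fine even when the class is absent.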
java.lang.CharSequence
I also finally implemented support for (what I've termed) ghost interfaces.
Ghost interfaces are interfaces that are implemented by remapped types. So, for
example, java.lang.String (really System.String) appears to implement java.lang.CharSequence,
thus java.lang.CharSequence is a ghost interface. When a reference of a ghost interface
type is passed around, it is really typed as a plain object reference (so that it can hold
references to types that actually implement the interface, as well as remapped types that
only appear to implement it).
Here is an example:
    CharSequence toUpperCase(CharSequence seq) {
        StringBuffer buf = new StringBuffer();
        int len = seq.length();
        for(int i = 0; i < len; i++) {
            char c = seq.charAt(i);
            c = Character.toUpperCase(c);
            buf.append(c);
        }
        return buf;
    }
This is compiled as:
    Object toUpperCase(Object seq) {
        StringBuffer buf = new StringBuffer();
        int len;
        if(seq instanceof String) {
            len = ((String)seq).length();
        } else {
            len = ((CharSequence)seq).length();
        }
        for(int i = 0; i < len; i++) {
            char c;
            if(seq instanceof String) {
                c = ((String)seq).charAt(i);
            } else {
                c = ((CharSequence)seq).charAt(i);
            }
            c = Character.toUpperCase(c);
            buf.append(c);
        }
        return buf;
    }
There are some downsides to this approach:
- Performance cost of the type check and the cast. BTW, since System.String is a sealed
  type, the type check can (theoretically) be very efficient.
- The type is erased in the method signature, causing problems if an identical signature
  already exists (the CLR supports a nice mechanism to work around this, but
  unfortunately Reflection.Emit currently doesn't expose this functionality).
- When this method is statically compiled and called from, for example, C#, the signature
  is confusing.
An alternative approach would be to wrap each String object when it needs to be treated
as a CharSequence, but that is harder to implement (object identity has to be preserved)
and it isn't clear to me that it would be more efficient or elegant.
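The identity problem with wrapping can be shown in a few lines of ordinary Java. This is only an illustration: StringWrapper and WrapperIdentitySketch are hypothetical names, and in real Java String already implements CharSequence, so the wrapper merely stands in for what a runtime would have to generate for a remapped type.

```java
// Hypothetical wrapper that makes a String usable as a CharSequence.
final class StringWrapper implements CharSequence {
    private final String value;

    StringWrapper(String value) { this.value = value; }

    public int length() { return value.length(); }
    public char charAt(int index) { return value.charAt(index); }
    public CharSequence subSequence(int start, int end) {
        return value.subSequence(start, end);
    }
    public String toString() { return value; }
}

class WrapperIdentitySketch {
    // Wraps the same String twice and compares the two references.
    static boolean wrappedTwiceIsSame(String s) {
        CharSequence a = new StringWrapper(s);
        CharSequence b = new StringWrapper(s);
        return a == b;
    }

    public static void main(String[] args) {
        String s = "abc";
        // Passing the String as a plain Object keeps identity intact.
        Object o1 = s;
        Object o2 = s;
        System.out.println(o1 == o2);
        // Two wrappers around the same String are distinct objects.
        System.out.println(wrappedTwiceIsSame(s));
    }
}
```

Java code is entitled to expect that the same object compares equal to itself with ==, so a wrapping runtime would need to unwrap on every comparison or cache one wrapper per String, and that is the extra work the paragraph above refers to.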
On the upside, this is a totally generic solution. There is no special support
for String: any remapped type (i.e. a type in map.xml) can declare that it supports any interface,
and the compiler will automatically do the right thing. It is also used to make java.lang.Throwable
(i.e. System.Exception) appear to implement java.io.Serializable. At the moment it
isn't used for arrays (which should appear to implement java.io.Serializable and java.lang.Cloneable),
but it would be easy to add this.
Oh, and since java.lang.String now implements CharSequence, I can now use the StringBuffer
implementation from GNU Classpath instead of remapping it to System.Text.StringBuilder.
Reflecting on .NET types
Another thing I've been working on is integration with .NET types. When you want
to use a .NET type from within Java you have two options: 1) use netexp to
generate a jar containing stubs for the .NET classes, so you can statically compile
against the .NET types, or 2) use reflection against the .NET type.
Something I dislike about netexp is that it includes remapping logic to map
.NET types and signatures to Java compatible stuff. For example, when it encounters
an enum, it generates a final Java class with public static final members or when
it encounters a signature with a byref argument, it turns that into an array argument.
Now this is all very nice, because it allows Java code to use most of the .NET features
that Java doesn't really support, but the part I don't like is that this remapping
is also done in the IKVM.NET runtime, because when it compiles Java code that was
compiled against a netexp generated jar, it needs to do some of the same
translations. This duplication of the remapping logic is obviously not a good thing.
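For readers who haven't seen netexp output, the two remappings mentioned above look roughly like the following when written out as plain Java. This is a hand-written sketch, not actual netexp output: the Color enum, the increment method and the choice of int for the enum members are all assumptions made for illustration.

```java
// Hypothetical Java view of a .NET enum such as: enum Color { Red = 0, Green = 1 }
// Assumption: members are shown as int constants; the real stub may differ.
final class Color {
    public static final int Red = 0;
    public static final int Green = 1;

    private Color() {}
}

class ByRefSketch {
    // Hypothetical Java view of a .NET method: static void Increment(ref int value)
    // The byref argument becomes a one-element array that the callee updates.
    static void increment(int[] value) {
        value[0]++;
    }

    public static void main(String[] args) {
        int[] box = { 41 };
        increment(box);
        System.out.println(box[0]);
        System.out.println(Color.Green);
    }
}
```

Whatever the exact shape, the runtime has to know that the array parameter really means a byref argument when it compiles a call against such a stub, which is why the same mapping rules end up in two places.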
So I want to rewrite netexp in Java and make it use Java reflection to interrogate
the .NET types. That way all the remapping is done in one place: the IKVM.NET runtime.
However, while thinking about this I realized that there is a problem with respect
to type identity. There are two ways a .NET type can become visible in Java: netexp and
Class.forName. Both have problems with the class name. Netexp converts
namespaces to lowercase, because Java compilers don't like the System namespace (System
binds to the java.lang.System class and is not considered a package name), and Class.forName
requires the assembly qualified type name (e.g. "System.String, mscorlib, Version=1.0.5000.0,
Culture=neutral, PublicKeyToken=b77a5c561934e089"). There are several ways to handle
this:
- Only allow .NET types to be visible through netexp generated classes. Obviously,
  I don't like this, because it would be very limiting and my plan to rewrite netexp in
  Java wouldn't work.
- Allow .NET types to be visible through netexp with one name (e.g. system.String)
  and through Class.forName with another name (i.e. the assembly qualified name). The
  problem this causes is that once you create an instance of a class and call getClass
  on that instance, which of the two Class objects should it then return?
- Use a universal name mangling scheme. Both netexp and Class.forName would
  represent System.String as, for example,
  "System_String__mscorlib__Version_1_0_5000_0__Culture_neutral__PublicKeyToken_b77a5c561934e089".
  I think this would work pretty well, but the obvious downside is that it makes the
  Java source code totally unreadable.
Something that still would be an issue is how Java code reacts when it tries to load
one class and then gets another (in the face of .NET binding policy). For example,
it would be possible to load the above String, but then get a Class object that returns
"..._Version_2_0_..." as its name, because the app is running on some future version
of the CLR.
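As a rough sketch of what the name mangling option could mean in practice, the rule below replaces every character that is not an ASCII letter or digit with an underscore, which happens to reproduce the example name given for that option. The class NameManglingSketch and the method mangle are hypothetical, and the rule itself is my guess from that single example, not a worked-out scheme.

```java
class NameManglingSketch {
    // Replaces every character that is not an ASCII letter or digit with '_'.
    // Assumption: this rule is inferred from one example and is lossy, since a
    // name that already contains '_' cannot be mapped back unambiguously.
    static String mangle(String assemblyQualifiedName) {
        StringBuilder sb = new StringBuilder(assemblyQualifiedName.length());
        for (int i = 0; i < assemblyQualifiedName.length(); i++) {
            char c = assemblyQualifiedName.charAt(i);
            boolean keep = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9');
            sb.append(keep ? c : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle(
            "System.String, mscorlib, Version=1.0.5000.0, "
            + "Culture=neutral, PublicKeyToken=b77a5c561934e089"));
    }
}
```

Because the version number is baked into the mangled name, the binding policy issue described above shows up directly: the name you ask for and the name the loaded class reports can differ.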
Maybe there are other solutions I haven't thought of, but at the moment I'm thinking
of going with number 2.
I updated the snapshots on Tuesday. Binaries and source.