Friday, 19 December 2003
Building IKVM.NET on Mono/Linux

I installed Debian 3.0r1 in VMware to work on getting IKVM.NET to build on Mono.

I put together a new snapshot, this time including a GNU Classpath snapshot. Because of the compromise of the FSF CVS server, my Classpath sources differ from what's available in CVS.

Here are the steps required to build IKVM.NET on Mono on Debian:

• Comment out the line
[assembly: AssemblyKeyFile("..\\..\\bytefx.snk")]
in mcs-0.29/class/ByteFX.Data/AssemblyInfo.cs.
• Make the HasShutdownStarted property in mcs-0.29/class/corlib/System/Environment.cs static and change it to return false instead of throwing a NotImplementedException.
• cd mono-0.29
• ./configure && make && make install
• cd ../mcs-0.29
• make && make install
• cd jikes-1.18
• ./configure && make && make install
• mkdir nant-0.84
• cd nant-0.84
• unzip ../nant-0.84.zip
• Comment out the class constructor (static CompilerBase() { ... }) in src/NAnt.DotNet/Tasks/CompilerBase.cs.
• make clean && make
• Create a nant shell script in /usr/local/bin that contains:
#!/bin/sh
/usr/local/bin/mono /usr/local/src/nant-0.84/bin/NAnt.exe "$@"
• Create a dummy peverify shell script that contains:
#!/bin/sh
• Download and unzip classpath.zip (don't run any of the scripts)
• Download and unzip ikvm.zip
• cd ikvm
• nant clean
• nant

Note: I have not yet integrated Zoltan Varga's JNI provider for Mono and the (broken) Windows Forms based AWT is not built on Mono.

Here is what's new since the last snapshot:

• Changed build process to work on Mono/Linux.
• Added flag to bytecode metadata table to see if an instruction can throw an exception. The compiler can use this to optimize away useless exception blocks.
• Changed constant pool constant handling to stop boxing the values and use type based accessors instead.
• Fixed handling of ConstantValue attribute (now works for static fields regardless of whether they are final or not).
• Exception remapping is now defined in map.xml. This allows more efficient exception handlers, because the compiler now understands the exception hierarchy (including the constraints imposed by the remapping).
• Changed handling of netexp exported classes to be more robust in the face of errors (on Mono some of the mscorlib.jar classes are not (yet) present in mscorlib.dll).
• Fixed emitting of DebuggableAttribute (the attribute was attached to the module, but it should be attached to the assembly).
• Moved most of ExceptionHelper.cs to ExceptionHelper.java and changed the runtime to generate the exception mapping method from map.xml.
• Fixed some ghost related bugs.
• Added a test to suppress the type initializer bug warning (during netexp) on runtimes that are not broken.
• Moved common conversion emit operations to TypeWrapper (EmitConvStackToParameterType & EmitConvParameterToStackType).
• Added test in JavaTypeImpl.Finish to make sure that we are not in "phase 1" of static compilation. During phase 1, classes are being loaded and no compilation should occur. If it does get triggered, it is because of a bug in the compiler or a compilation error during compilation of the bootstrap classes.
• Changed loading of java.lang.Throwable$VirtualMethods so that the ClassNotFound warning doesn't occur any more.
• Added (partial) support for private interface method implementations to reflection. This fixes a bug in netexp, that caused classes that use private interface implementation to be unusable from Java (because they appear abstract, because of the missing method).
• Removed WeakHashtable.cs. Exception mapping code is now written in Java and uses java.util.WeakHashMap.
• Removed StackTraceElement class from classpath.cs. Exception mapping code is now written in Java and uses the GNU Classpath StackTraceElement.java.
• Moved java.lang.Runtime native methods to Java (except for getVersion and nativeLoad). This is based on a new split of java.lang.Runtime and java.lang.VMRuntime that hasn't been checked into Classpath yet.
• Many changes to the bytecode compiler to emit more efficient (actually less inefficient) code for exception handlers.
• Added workaround to bytecode compiler to improve debugging line number information.
• Various bug fixes and some clean up of bytecode compiler.
• Made ikvmc more user-friendly. It now guesses all options based on the input. You can now do "ikvmc HelloWorld.class" and it will generate HelloWorld.exe (if HelloWorld.class has a main method, if not it will generate HelloWorld.dll).
• Fixed DotNetProcess (that implements Runtime.exec) to handle environment variable duplicates properly.
• Removed support for throwing arbitrary exceptions using Thread.stop(Throwable). You can now only throw arbitrary exceptions on the current thread or ThreadDeath exceptions on other threads.
• Implemented shutdown hooks.
• Changed ikvm.exe to use a more compatible way of finding the main method and to always run the static initializer of the main class (even if the main method is missing).
• The ikvm -Xsave option is now implemented using a shutdown hook. This allows it to work even if the application terminates with System.exit().
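The shutdown hook items above rely on the standard Runtime.addShutdownHook mechanism. As a minimal sketch of why a hook based save survives System.exit() (this is not IKVM's -Xsave code; the class name and the "saved" log entry are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

class ShutdownHookSketch {
    // Hypothetical stand-in for the real "save" work: it only records that it ran.
    static Thread newSaveHook(List<String> log) {
        return new Thread(() -> log.add("saved"), "save-hook");
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Runtime.getRuntime().addShutdownHook(newSaveHook(log));
        // A second hook that prints, so the effect is visible on the console.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println("shutdown hook ran")));
        // Registered hooks are started by the runtime as part of System.exit().
        System.exit(0);
    }
}
```

Hooks run in an unspecified order and concurrently, so anything they share needs its own synchronization.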

New snapshots: just the binaries, source plus binaries and GNU Classpath.

Friday, 19 December 2003 14:56:15 (W. Europe Standard Time, UTC+01:00)      Comments [2]
Thursday, 18 December 2003
Asynchronous Exceptions

I've been working on some low hanging fruit optimizations for the code generator and I came across the following peculiar bytecode pattern used by Jikes to compile synchronized blocks:

0  goto 6
3  aload_1
4  monitorexit
5  athrow
6  aload_0
7  dup
8  astore_1
9  monitorenter
10  aload_1
11  monitorexit
12  return
Exception table:
start_pc = 3
end_pc = 5
handler_pc = 3
catch_type = java/lang/Throwable
start_pc = 10
end_pc = 12
handler_pc = 3
catch_type = java/lang/Throwable

This code results from this method:

static void main(String[] args) {
synchronized(args) {
}
}


I was confused by the first entry in the exception table. It protects part of the exception handler and points to itself. Why would you do this?

Luckily, Jikes is open source, so I went and looked at the source code (see ByteCode::EmitSynchronizedStatement). The comment in the function explains that the additional protection of the exception handler is there to deal with asynchronous exceptions. The Jikes bug database contains a better explanation of the issue.

So, it turns out that this somewhat strange looking construct is actually a perfect way to make sure that locks are always released (and only once) even if an asynchronous exception occurs. (Note that this assumes that monitorexit is atomic with respect to asynchronous exceptions. This isn't in the JVM specification[1], but it is a reasonable assumption.)

At the moment, IKVM compiles this code as follows (pseudo code):

  object local = args;
object exception = null;
Monitor.Enter(local);
try {
// this is where the body of the synchronized block would be
Monitor.Exit(local);
} catch(System.Exception x) {
exception = x;
ExceptionHelper.MapExceptionFast(x);
goto handler;
}
return;
handler:
try {
Monitor.Exit(local);
} catch(System.Exception x) {
exception = x;
ExceptionHelper.MapExceptionFast(x);
goto handler;
}
throw exception;


This is obviously pretty inefficient, but more importantly, it is incorrect. If an asynchronous exception occurs at the right (or rather, wrong) moment the lock will be released twice.

The right way to compile this would be (pseudo code again):

  object local = args;
Monitor.Enter(local);
try {
// this is where the body of the synchronized block would be
} finally {
Monitor.Exit(local);
}

Of course, in the current version of the CLR, this still wouldn't be safe in the face of asynchronous exceptions, but Chris Brumme assures us that in future versions it will be.
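For readers who want to see the try/finally shape in runnable form, here is a rough Java sketch. It uses java.util.concurrent.locks.ReentrantLock as a stand-in for the monitor, which is an assumption made purely for illustration, and it does not model asynchronous exceptions. It only shows that the lock is released exactly once when the body throws, and that a second release is an error.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockReleaseSketch {
    static final ReentrantLock LOCK = new ReentrantLock();

    // Mirrors the try/finally shape: the lock is released once on every path.
    static void runLocked(Runnable body) {
        LOCK.lock();
        try {
            body.run();
        } finally {
            LOCK.unlock();
        }
    }

    // A double release means unlocking a lock this thread no longer holds,
    // which ReentrantLock reports with IllegalMonitorStateException.
    static boolean secondUnlockFails() {
        LOCK.lock();
        LOCK.unlock();
        try {
            LOCK.unlock();
            return false;
        } catch (IllegalMonitorStateException expected) {
            return true;
        }
    }
}
```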

BTW, an alternative way to compile it (which, presumably, would work correctly even in today's CLR) is to move the synchronized block into a new method that is marked with MethodImplOptions.Synchronized.

The tricky part of both of these solutions is recognizing the code sequences that need to be compiled as a try finally clause. Various Java compilers can use different patterns (although a pretty firm clue is provided by the two exception blocks that must end exactly after the monitorexit instruction).
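The "firm clue" could be checked along these lines. This is a hypothetical heuristic, not what IKVM does: the Entry class and method names are invented for the sketch, and it looks at raw bytes, so a real implementation would have to work on decoded instructions to avoid mistaking an operand byte for an opcode.

```java
import java.util.List;

class MonitorExitPattern {
    // One row of a method's exception table (bytecode offsets, end is exclusive).
    static final class Entry {
        final int startPc, endPc, handlerPc;

        Entry(int startPc, int endPc, int handlerPc) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
        }
    }

    static final int MONITOREXIT = 0xC3; // JVM opcode for monitorexit

    // True when the protected range ends immediately after a monitorexit.
    static boolean endsAfterMonitorExit(byte[] code, Entry e) {
        int last = e.endPc - 1;
        return last >= 0 && last < code.length && (code[last] & 0xFF) == MONITOREXIT;
    }

    // Looks for two entries that share a handler and both end right after a monitorexit.
    static boolean looksLikeSynchronizedBlock(byte[] code, List<Entry> table) {
        for (int i = 0; i < table.size(); i++) {
            for (int j = i + 1; j < table.size(); j++) {
                Entry a = table.get(i), b = table.get(j);
                if (a.handlerPc == b.handlerPc
                        && endsAfterMonitorExit(code, a)
                        && endsAfterMonitorExit(code, b)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Applied to the Jikes listing above, the two table rows (3, 5, 3) and (10, 12, 3) both end one byte past a monitorexit (offsets 4 and 11) and share handler 3, so the check succeeds.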

This is one situation where compiling bytecode instead of Java source makes it a lot harder to do the right thing.

[1] The JVM specification actually contains an incorrect example of how to compile a synchronized block. Not only does this example not use the above protection against asynchronous exceptions, it also doesn't protect the aload_2 and monitorexit instructions at offset 8 and 9.

Thursday, 18 December 2003 11:33:18 (W. Europe Standard Time, UTC+01:00)      Comments [0]
Friday, 12 December 2003

(This entry is totally unrelated to IKVM.NET)

Last week I got my new ThinkPad T41p and it has a cool feature, a built-in accelerometer. It is used to park the hard disk when the system detects a shock or a fall ("IBM Hard Drive Active Protection System").

I reverse engineered one of the IOCTLs that can be used to read the accelerometer data and built a simple C# application that displays an artificial horizon. It includes a simple reusable class that encapsulates the communication with the device driver (the standard IBM device driver).

Source can be found here.

Not very useful, but fun stuff anyway. In Longhorn this could be used to keep the desktop level.

Friday, 12 December 2003 14:10:43 (W. Europe Standard Time, UTC+01:00)      Comments [13]
Monday, 01 December 2003
Finalization and the JIT

One of the subtle differences between the JVM bytecode instruction set and the CLR instruction set is that the JVM splits object instantiation into two steps: 1) allocation and 2) initialization. The CLR combines both steps in a single instruction.

This usually isn't a big difference. The CLR way makes the verifier a little easier because it doesn't need to keep track of uninitialized references. However, in the face of finalization, the difference is actually detectable.

Here is a Java example:

class foo {
static int throwException() {
throw new Error();
}

foo(int i) {
}

public static void main(String[] args) {
try {
new foo(throwException());
} catch(Error x) {
System.gc();
System.runFinalization();
throw x;
}
}

protected void finalize() {
System.out.println("finalize");
}
}

When this class is run, it prints "finalize" and a stack trace. Looking at the bytecode of the main method, it is obvious why:

0  new foo
3  invokestatic <Method foo throwException()I>
6  invokespecial <Method foo <init>(I)V>
9  goto 21
12  astore_1
13  invokestatic <Method java/lang/System gc()V>
16  invokestatic <Method java/lang/System runFinalization()V>
19  aload_1
20  athrow
21  return

The first instruction is a new that allocates the foo instance. Even if the following call to throwException throws an exception, the foo instance already exists and will be finalized.

Here is the (approximately) corresponding C# code:

class Foo {
Foo(int i) {
}

~Foo() {
System.Console.WriteLine("Finalize");
}

static int ThrowException() {
throw new System.Exception();
}

static void Main() {
try {
new Foo(ThrowException());
} catch(System.Exception x) {
System.Console.WriteLine(x);
}
}
}

When this code is run, the output should be just the stack trace, but it turns out that it will also print "Finalize" (on .NET, not on Mono). This is a bug in the .NET JIT. It moves the object allocation up to be able to more efficiently construct the call stack for the constructor invocation, but in doing so it subtly changes the semantics. When the above code is compiled with debugging enabled, it works as expected.

Monday, 01 December 2003 21:17:03 (W. Europe Standard Time, UTC+01:00)      Comments [1]
Sunday, 23 November 2003
More On Finalization

After posting my previous entry about C# destructors, I came up with a better implementation of them.

What if, instead of depending on the derived class to be well behaved we could enforce that the base class destructor is always called?

It turns out that this is possible, thanks to the extremely powerful capabilities the CLR has for dealing with methods.

Here is an example C# class:

  class Foo {
~Foo() {
// release unmanaged resource
}
} 

The current C# compiler compiles this as:

  class Foo {
protected void Finalize() {
try {
// release unmanaged resource
} finally {
base.Finalize();
}
}
}

As I pointed out in my previous entry, the problem with this construct is that it depends on any derived class to have a similar Finalize method (that ensures the base class Finalize gets called). However, not all languages have such a feature built in the way C# does. For example, in VB.NET it is trivial (and an easy mistake to make) to override Finalize and to forget to call the base class implementation.

It occurred to me that you could also compile the example class as follows (pseudo code):

class Foo {
private void .dtor() overrides Object.Finalize {
try {
Finalize();
} finally {
// release unmanaged resource
}
}
protected new virtual void Finalize() {
}
}

Note that this uses the CLR's ability to override a method even though the method name is different and the ability to introduce a new virtual method with the same name, but a different vtable slot.

Now, you don't rely on the derived class to have a correct Finalize method, because even if the derived class' Finalize method doesn't call the base class method, the finally block in .dtor will still run.
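Java cannot override a method under a different name, so the CLR trick itself does not translate. The underlying idea (a non-overridable wrapper whose finally block does the real cleanup, around an overridable hook) can still be sketched in Java. All names below are hypothetical, and an explicit cleanup() call stands in for the finalizer:

```java
class GuardedCleanup {
    static class Base {
        boolean released;

        // Plays the role of .dtor: final, so subclasses cannot replace it.
        final void cleanup() {
            try {
                onCleanup();      // subclass hook, may misbehave
            } finally {
                released = true;  // stands in for releasing the unmanaged resource
            }
        }

        // Plays the role of the new virtual Finalize.
        protected void onCleanup() {
        }
    }

    // A subclass that neither calls the base hook nor completes normally.
    static class Careless extends Base {
        @Override
        protected void onCleanup() {
            throw new IllegalStateException("subclass cleanup failed");
        }
    }
}
```

Even though Careless throws from its hook, the base class release still runs, because it lives in the finally block of the final wrapper.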

For a few minutes I was happy with myself for coming up with this clever trick, but I soon realized that while it was a little better, it still did nothing to prevent malicious code from calling Thread.Sleep() in their overridden Finalize method.

It turns out that there is an even bigger problem with this solution, though. The current design guidelines for cleaning up unmanaged resources actually suggest that you call a virtual method from your destructor that does the actual cleanup. This, of course, reintroduces the problem that the derived class can override that method and not call the base class.

In summary, it's a big mess. My original (Java) suggestions for writing a safe class that wraps an unmanaged resource are here and they apply to .NET classes in the same way. My suggestions to Microsoft are:

• Fix the guidelines. Managed and unmanaged cleanup are two very different things and should not be mixed. Also, add a rule not to call any virtual methods from a Finalize method.
• Make overriding Object.Finalize require full trust. There is no need for a finalizer unless you are wrapping an unmanaged resource and only fully trusted code can do that.
• Change the C# compiler to make the Finalize method generated for the destructor sealed by default.

These are all compatibility breaking changes, so they are probably not going to happen. Resource leaks and denial of service attacks are not on the agenda for the mainstream CLR. It's interesting to note that Yukon apparently prohibits untrusted code from overriding the Finalize method. I wonder how they'll deal with the dispose pattern.

Sunday, 23 November 2003 15:48:06 (W. Europe Standard Time, UTC+01:00)      Comments [0]
Monday, 17 November 2003
C# Destructor Considered Harmful

After last week's Java finalization bashing, it turns out that C# is even more broken.

A C# Destructor Cannot Be Sealed

This is really bad! If you recall my example of a proper class that uses finalization correctly, you might remember that the class was final. I still highly recommend this, but in some scenarios it might be preferable to allow others to extend your class. In such cases it is highly recommended that you make your finalize method final. Otherwise the subclasser might override finalize and forget to call your finalize method.

If you wrap an unmanaged resource and your class is non-final and it is exposed to untrusted code, you must make your finalize method final.

If you don't, the untrusted code can (intentionally or not) create a subclass of your class, override finalize (not call super.finalize()) and start leaking unmanaged resources that will never be cleaned up as long as the JVM is running.
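A small Java sketch of that failure mode, under loud assumptions: the class names are invented, a boolean field stands in for the unmanaged resource, and finalize is called directly instead of waiting for the collector (which is nondeterministic). Recent JDKs deprecate finalize, hence the suppressed warnings.

```java
import java.lang.reflect.Modifier;

class FinalizeOverrideSketch {
    // Wraps a pretend unmanaged resource; finalize is overridable here.
    @SuppressWarnings({"deprecation", "removal"})
    static class OpenResource {
        boolean released;

        @Override
        protected void finalize() {
            released = true; // stands in for freeing the native resource
        }
    }

    // A subclass that overrides finalize and forgets super.finalize().
    @SuppressWarnings({"deprecation", "removal"})
    static class Careless extends OpenResource {
        @Override
        protected void finalize() {
            // no super.finalize(): the base class cleanup never runs
        }
    }

    // Same resource, but finalize is final, so a Careless-style override does not compile.
    @SuppressWarnings({"deprecation", "removal"})
    static class SealedResource {
        boolean released;

        @Override
        protected final void finalize() {
            released = true;
        }
    }

    // Calls finalize directly to stand in for the collector, then reports the leak.
    static boolean carelessSubclassLeaks() {
        OpenResource r = new Careless();
        r.finalize();
        return !r.released;
    }

    // Reports whether a class declares a final finalize method.
    static boolean finalizeIsFinal(Class<?> type) {
        try {
            return Modifier.isFinal(type.getDeclaredMethod("finalize").getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```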

Back To C#

To see why the C# design is a problem, we only need to look at System.WeakReference. It is a public non-sealed class that wraps an unmanaged resource (a GCHandle), its destructor is obviously not sealed, and it does not require any privileges to use. This is a recipe for disaster. Untrusted code can leak GCHandles that will never be reclaimed as long as the CLR is running [1]. Not even when the AppDomain is unloaded!

IMO, the destructor syntax should be deprecated and Finalize should be treated like any other method. This current design is hardly a pit of success.

[1] While the C# destructor is nice enough to always call the base class Finalize method, other languages (e.g. ILASM or VB.NET) don't require this.

Monday, 17 November 2003 14:19:49 (W. Europe Standard Time, UTC+01:00)      Comments [10]
New Snapshot

As usual it took a mighty long time, but I finally managed to check in my changes and create a new snapshot.

Here is what's new:

• Support for @deprecated attribute in ikvmc. Compiled as System.ObsoleteAttribute (IsError = false).
• Support for declared exceptions. Reflection (Method.getExceptionTypes) will now return all declared exceptions that the method throws. C# (or VB.NET or whatever) code can use the OpenSystem.Java.ThrowsAttribute attribute to declare exceptions.
• Changed value type handling to treat null reference as default instance when unboxing (instead of throwing an unexpected NullPointerException). Added (experimental) support for customized boxing in map.xml.
• Remapped types can now have constant (non-blank static final) fields.
• java.lang.String and java.lang.Throwable now have the correct serialVersionUID.
• Fixed a bug (introduced with the ghost interfaces change) that caused virtual call to Object.clone() to fail with a CloneNotSupportedException.
• Changed static initializer support to use a special field (__<clinit>) in the base class to trigger the base class static initializer to run (instead of using RuntimeHelpers.RunClassConstructor). This should make static initialization a little faster.
• Made all introduced methods in ghost wrappers HideFromReflection.
• Re-enabled the warning for missing native methods on static compilation.
• Fixed reflection on .NET types to hide static methods on interfaces (Java doesn't allow them).
• Added dummy "native" methods for unused methods in java.lang.reflect.Proxy (to avoid warnings from ikvmc when compiling classpath.dll).
• Fixed InetAddress.getLocalHostname() method name typo.
• Added skeleton implementation of "native" methods for java.nio.channels.FileChannelImpl.
• Changed DupHelper in compiler to support "uninitialized this" references.
• Changed compiler to emit ldc_i4_0 / conv_i8 instead of ldc_8 0 and various similar (small) optimizations.
• Added support for declaring exceptions on methods of remapped types and added appropriate declaration to map.xml.
• Added experimental remapping of gnu.classpath.RawData to System.IntPtr and custom boxing operator for gnu.classpath.RawData to box IntPtr.Zero to a null reference.
• Added support for ldnull and box opcodes to remapping instruction set.
• Added optional constant attribute to field element of remapping xml.
• Added ikvm\classpath\java\lang\ref\Reference.java (copied from GNU Classpath and modified to implement (most of) the required functionality).
• Added exclusion list to compilation of classpath.dll to remove unused classes (VMObject, VMString and VMThrowable).
• Fixed netexp to create classes for non-public base classes.
• Fixed netexp to export the interfaces a class implements.
• Fixed netexp to export declared exceptions.
• Some refactoring and a few small other changes.
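For reference, the declared exceptions item in the list above is about what Method.getExceptionTypes reports. A minimal Java sketch of that reflection call (the load method is a made-up example, not part of IKVM or GNU Classpath):

```java
import java.io.IOException;
import java.lang.reflect.Method;

class DeclaredExceptionsSketch {
    // Hypothetical method whose throws clause we want to read back.
    static void load(String name) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("empty name");
        }
    }

    // Returns the exception types declared in load's throws clause.
    static Class<?>[] declaredExceptions() {
        try {
            Method m = DeclaredExceptionsSketch.class.getDeclaredMethod("load", String.class);
            return m.getExceptionTypes();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```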

I used Stuart's excellent japitools to test the remapping, reflection and round-tripping and found a lot of bugs this way. I ran japize on a classpath.jar generated by netexp from classpath.dll. This way, the whole tool chain is tested (ikvmc -> reflection -> netexp).

Note that @deprecated doesn't round-trip, because Java reflection doesn't expose the fact that something is deprecated.

The full patch is here (excluding the new files, Reference.java and exclude.lst).

Updated binary and source snapshots are available.

Monday, 17 November 2003 13:04:39 (W. Europe Standard Time, UTC+01:00)      Comments [2]
Saturday, 15 November 2003
Weak References

I finally implemented (partial) support for weak references. Java has three types of weak references:

• SoftReference
• WeakReference
• PhantomReference

Only WeakReference is implemented 100% correctly (barring any bugs), but I'm pretty sure that the difference between the IKVM and JVM implementations of PhantomReference is undetectable. SoftReference is currently implemented as a WeakReference. This means that soft references will be cleared too soon, which could adversely affect the performance of caches that depend on SoftReference.
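To make the cache concern concrete, here is a sketch of a typical SoftReference based cache. It is an illustration only (the class and its simulateClear helper are invented): every time a reference has been cleared, the loader has to run again, so clearing early costs extra loads.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SoftCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;
    int loads; // how often the (presumably expensive) loader ran

    SoftCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) { // never cached, or cleared by the collector
            value = loader.apply(key);
            loads++;
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    // Illustration helper: simulates the collector clearing the entry.
    void simulateClear(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref != null) {
            ref.clear();
        }
    }
}
```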

ReferenceQueue

ReferenceQueue is fully supported, but it isn't very efficient. Every Reference instance that is associated with a queue has a corresponding object that watches for GC activity (by having a finalize method and being unreachable). Each time the finalizer runs, it checks the associated Reference to see if it has been cleared. If it has, the Reference is inserted into the ReferenceQueue; if it hasn't, a new watcher object instance is created.
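From the consumer's side, the observable contract is simply that a cleared Reference eventually shows up on its queue. The sketch below shows that contract only; it does not reproduce the watcher object technique, and the explicit enqueue() call stands in for what the collector (or, in IKVM, the watcher) would do.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class ReferenceQueueSketch {
    // Returns true if the reference can be polled from its queue after being enqueued.
    static boolean enqueueAndPoll() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);
        // Explicit enqueue stands in for the collector noticing the cleared reference.
        ref.enqueue();
        return queue.poll() == ref;
    }
}
```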

PhantomReference

The difference between the IKVM and JVM PhantomReference is that the IKVM implementation doesn't actually prevent the object from being collected, but since you cannot possibly get a reference to a phantom reachable object (apart from using non-portable reflection hacks), I don't understand the point of the specified behavior for PhantomReference.

Source

Source is not yet in CVS. I made many other changes as well and I'll try to check them in and create a new snapshot tomorrow.

Saturday, 15 November 2003 22:03:33 (W. Europe Standard Time, UTC+01:00)      Comments [0]
Friday, 14 November 2003
Generic Algorithms

Anders Hejlsberg came up with a clever trick to implement generic algorithms with the current C# generics implementation.

According to the C# Version 2.0 Language Specification, interface method invocations on value types (used as generic type parameters) will not cause the value to be boxed. This gave me the idea to use a value type for the calculator to avoid the virtual function dispatch.

using System;
using System.Collections.Generic;

interface ICalculator<T>
{
    T Add(T t1, T t2);
}

struct IntCalculator : ICalculator<int>
{
    public int Add(int t1, int t2)
    {
        return t1 + t2;
    }
}

class AlgorithmLibrary<C, T>
    where C : ICalculator<T>
{
    static C calculator = C.default;

    public static T Sum(List<T> items)
    {
        T sum = T.default;

        for(int i = 0; i < items.Count; i++)
        {
            sum = calculator.Add(sum, items[i]);
        }

        return sum;
    }
}

public class Class4
{
    static void Main()
    {
        List<int> foo = new List<int>();

        int sum = AlgorithmLibrary<IntCalculator, int>.Sum(foo);

        Console.WriteLine(sum);
    }
}

Depending on how the JIT decides to compile this, this could be very efficient (the JIT could decide to generate specialized x86 code for every type that Sum() is used on).
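For comparison, the same calculator shape can be written in Java. This is only a structural analog under stated assumptions: Java generics are erased and Integer values are boxed, so none of the specialization benefit discussed above applies, and the zero() method is an addition for the sketch because Java has no default value expression for a type parameter.

```java
import java.util.List;

class GenericSumSketch {
    // The arithmetic the algorithm needs, supplied per element type.
    interface Calculator<T> {
        T zero();

        T add(T a, T b);
    }

    static final class IntCalculator implements Calculator<Integer> {
        public Integer zero() {
            return 0;
        }

        public Integer add(Integer a, Integer b) {
            return a + b;
        }
    }

    // Generic algorithm written once against the Calculator interface.
    static <T> T sum(List<T> items, Calculator<T> calc) {
        T total = calc.zero();
        for (T item : items) {
            total = calc.add(total, item);
        }
        return total;
    }
}
```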

Friday, 14 November 2003 11:53:43 (W. Europe Standard Time, UTC+01:00)      Comments [4]
Sunday, 09 November 2003
Finalize Considered Harmful

Last week at the PDC I talked to Anders Hejlsberg (btw, he's a really nice person). I asked him about having a way to prevent boxing and he told me they had already considered it and found too many problems with it (reflection, generics, more on that in another post). I also told him that I felt that C# had only one design flaw: The destructor syntax. I was reminded of this when I was working on some java.nio classes yesterday.

The Java Community Doesn't Understand Finalization

How's that for a section title? It's obviously a generalization, but I don't think it is too far from the truth. To prepare for writing this entry I looked at finalization in three books:

All three are excellent books. Highly recommended.

Addison-Wesley put online the relevant section of Peter's book here. First, let me say that Peter is a very smart guy (I know him from the Colorado Software Summit) and that this doesn't reflect on the quality of the rest of the book, but I'm picking on him because part of this particular "praxis" demonstrates the, IMO, typical misunderstanding of finalization in the Java community (and because his book is one of the few Java books I own).

So, what is wrong with his code? He uses the finalize method to clean up managed objects that already have their own finalizer!
This practice is widely used in the Java libraries as well and it is a really bad idea. At best it doesn't help (the ServerSocket and FileInputStream will have already been finalized (see JLS 12.6.2 Finalizer Invocations are Not Ordered), or will be finalized soon anyway) and at worst you create APIs that are very difficult to use (or code) because multiple objects "own" the same resource.

The JLS doesn't say anything useful about finalization, but does mention that you should always call super.finalize() in your finalizer (like Peter does in his code).

THIS IS USELESS!

Why? All finalizable classes should extend java.lang.Object and be final, because there is no reason for them not to be. Here is what a finalizable class should look like.

final class FileHandle {
private int nativeHandle;

FileHandle(String filename) {
nativeHandle = nativeOpen(filename);
}
private static native int nativeOpen(String filename);

synchronized void close() {
if(nativeHandle != 0) {
nativeClose(nativeHandle);
nativeHandle = 0;
}
}
private static native void nativeClose(int handle);

protected void finalize() {
close();
}

// example operation, the others are omitted
private static native int nativeRead(int handle, ...);
} 

Other than exposing the primitive operations that can be performed on the unmanaged resource (through the handle), the class should have no functionality. The class is package private and used only by the public classes that actually implement the exposed API. The class should never have any references to other objects.

This pattern of factoring out the finalizable resource into a separate class is discussed in the Jones/Lins book. It seems that not enough people read it.

The above discussion is specific to Java and .NET finalization. There are other implementations of finalization that do finalize in order. In such an implementation it can actually be useful to use references to other objects in your finalize method, because you know that the object you're using hasn't already been finalized.

Back To C#

So what's wrong with the C# destructor syntax? I think it encourages the mistake of thinking of Finalize as a destructor (and thus using it to clean up other managed objects). Also, a "feature" of the C# destructor is that it always calls the base class Finalize method. I hope I've shown that this isn't very useful.

Anders agreed with me. "In retrospect we probably shouldn't have done that." (paraphrased from memory).

Sunday, 09 November 2003 15:53:01 (W. Europe Standard Time, UTC+01:00)      Comments [1]