# Monday, 15 August 2005
Revenge of the Non-Public Base Class

I think I've previously written about the stupidity of allowing public classes to extend non-public classes and implement non-public interfaces, but unfortunately in Java we have to live with this.

Last week rnaylor filed a bug: ikvmc generated invalid code when a class in another package called public methods that were inherited from a non-public base class.

A good example that is (sort of) equivalent to the problem in the bug report is the set of fields in the java.util.zip.ZipConstants interface. I'd link to the Javadoc for this interface, but it is a non-public (package-private) type so there is no public documentation. However, this type still leaves its mark on the public API, because several of the zip classes implement this interface. For example, ZipFile implements this interface and as a consequence it inherits a whole bunch of constants (i.e. public static final fields) that, as far as I can tell, serve no purpose whatsoever in the public API.
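To make the pattern concrete, here is a minimal Java sketch. The types HiddenConstants and PublicArchive and the constant HEADER_SIZE are hypothetical stand-ins for ZipConstants and ZipFile, not the real JDK code. It shows that the constant is reachable through the public class, that it can still be used as a case label, and that reflection reports the non-public interface as the declaring type.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class InheritedConstantsDemo {
    // Hypothetical stand-in for a non-public interface such as ZipConstants.
    interface HiddenConstants {
        int HEADER_SIZE = 46; // interface fields are implicitly public static final
    }

    // Hypothetical stand-in for a public class such as ZipFile.
    public static class PublicArchive implements HiddenConstants {
    }

    // Simple name of the type that really declares the named public field.
    static String declaringTypeOf(Class<?> type, String fieldName) {
        return lookup(type, fieldName).getDeclaringClass().getSimpleName();
    }

    // True if the field is visible through 'type' but declared in a non-public type.
    static boolean leaksFromNonPublicType(Class<?> type, String fieldName) {
        Class<?> owner = lookup(type, fieldName).getDeclaringClass();
        return owner != type && !Modifier.isPublic(owner.getModifiers());
    }

    // The inherited field is a compile-time constant, so it works as a case label.
    static String describe(int value) {
        switch (value) {
            case PublicArchive.HEADER_SIZE:
                return "header size";
            default:
                return "unknown";
        }
    }

    private static Field lookup(Class<?> type, String fieldName) {
        try {
            return type.getField(fieldName); // also searches superinterfaces
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(PublicArchive.HEADER_SIZE);
        System.out.println(declaringTypeOf(PublicArchive.class, "HEADER_SIZE"));
        System.out.println(leaksFromNonPublicType(PublicArchive.class, "HEADER_SIZE"));
        System.out.println(describe(PublicArchive.HEADER_SIZE));
    }
}
```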

Before I checked in my fix yesterday, ikvmc didn't do anything special with these inherited fields or methods. As a result, if you referenced them from another assembly, ikvmc would generate invalid code that accessed the members directly on the non-public base class, and the .NET runtime would complain about this (by throwing System.MethodAccessException or System.FieldAccessException).

The fix was quite involved (the diff was about a thousand lines), because stub methods and field accessors need to be added to any public class that extends a non-public class or implements a non-public interface. However, these stubs should not be visible to Java reflection, because that could potentially change the semantics of the code (e.g. for things like calculating the serial version UID). Additionally, while most fields should be exposed as properties (that read/write the field in the base class), constant fields need to be copied to retain their "constantness" when accessing them from another language (e.g. so that you can say case ZipFile.CENATT: in C#). Another non-obvious consequence of these stub methods is that the stack walking code (in the security system and the code that generates the stack traces) needs to filter out these methods, because the system should function as if these methods weren't there.
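The stack-trace filtering idea can be sketched in a few lines of Java. This is only an illustration, not ikvm's actual implementation: it assumes that stub frames can be recognized by a name prefix, and both the prefix `__stub_` and the class StubFrameFilter are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class StubFrameFilter {
    // Hypothetical marker: this sketch assumes stubs carry a recognizable name prefix.
    static final String STUB_PREFIX = "__stub_";

    static boolean isStub(StackTraceElement frame) {
        return frame.getMethodName().startsWith(STUB_PREFIX);
    }

    // Returns the frames in their original order, with stub frames dropped,
    // so callers see the stack as if the stubs did not exist.
    static List<StackTraceElement> withoutStubs(StackTraceElement[] frames) {
        List<StackTraceElement> visible = new ArrayList<>();
        for (StackTraceElement frame : frames) {
            if (!isStub(frame)) {
                visible.add(frame);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        StackTraceElement[] raw = {
            new StackTraceElement("pkg.HiddenBase", "close", "HiddenBase.java", 10),
            new StackTraceElement("pkg.PublicChild", "__stub_close", null, -1),
            new StackTraceElement("app.Main", "main", "Main.java", 5),
        };
        for (StackTraceElement frame : withoutStubs(raw)) {
            System.out.println(frame);
        }
    }
}
```

The same predicate would have to be applied wherever frames are counted or inspected (for example in security checks that look at the caller), not only when a trace is printed.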

The lesson here is that you have to be very careful when designing a class library in Java. I believe that the fields that the ZipConstants interface exposes on ZipFile, ZipEntry, etc. were actually an accident that the Sun developers failed to spot before shipping the original JDK. The general advice should be: don't have public classes extend non-public base classes and don't implement non-public interfaces on any of your public classes. Or at least have some tools in place that check for these inherited public members, to make you aware of them before shipping your library.
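Such a check does not need to be elaborate. The following Java sketch walks the supertypes of a class and reports the public members declared by any non-public supertype. All type names in it (LeakedMemberCheck, HiddenConstants, HiddenBase, Exposed) are hypothetical, and the check is deliberately simple: it does not account for members that the public class overrides or redeclares itself.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Member;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class LeakedMemberCheck {
    // Hypothetical sample types used to exercise the check.
    interface HiddenConstants { int LIMIT = 8; }
    static class HiddenBase { public void ping() { } }
    public static class Exposed extends HiddenBase implements HiddenConstants { }

    // Returns "Owner.member" for every public member declared by a non-public supertype.
    static SortedSet<String> leakedMembers(Class<?> type) {
        SortedSet<String> leaked = new TreeSet<>();
        Set<Class<?>> seen = new HashSet<>();
        visitSupertypes(type, seen, leaked);
        return leaked;
    }

    private static void visitSupertypes(Class<?> type, Set<Class<?>> seen, SortedSet<String> leaked) {
        List<Class<?>> supertypes = new ArrayList<>(Arrays.asList(type.getInterfaces()));
        if (type.getSuperclass() != null) {
            supertypes.add(type.getSuperclass());
        }
        for (Class<?> supertype : supertypes) {
            if (!seen.add(supertype)) {
                continue; // already visited via another path
            }
            if (!Modifier.isPublic(supertype.getModifiers())) {
                for (Field field : supertype.getDeclaredFields()) {
                    record(supertype, field, leaked);
                }
                for (Method method : supertype.getDeclaredMethods()) {
                    record(supertype, method, leaked);
                }
            }
            visitSupertypes(supertype, seen, leaked);
        }
    }

    private static void record(Class<?> owner, Member member, SortedSet<String> leaked) {
        // Skip compiler-generated members; only real public members matter here.
        if (Modifier.isPublic(member.getModifiers()) && !member.isSynthetic()) {
            leaked.add(owner.getSimpleName() + "." + member.getName());
        }
    }

    public static void main(String[] args) {
        System.out.println(leakedMembers(Exposed.class));
    }
}
```

Run over every public class of a library as part of the build, a check along these lines would flag accidental API surface before it ships.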

Finally, this problem doesn't occur in C#, because C# doesn't allow you to create a public class that extends a non-public base class. It does allow implementing non-public interfaces, but since interfaces can't have fields in C# that isn't a problem. An amusing note is that when you call an inherited public method (in a non-public base class) in an assembly that was generated with the broken version of ikvmc, the C# compiler will happily compile the call for you and the generated (invalid) code will again fail at runtime with a System.MethodAccessException.

Monday, 15 August 2005 19:09:27 (W. Europe Daylight Time, UTC+02:00)
# Saturday, 06 August 2005
Frozen Strings are Cool

First of all, sorry for the bad pun, but I couldn't resist. Once Whidbey ships, one of the areas where .NET will be light years ahead of Java is the ability to share memory between different instances of the runtime. Microsoft did lots of work in Whidbey to enable sharing of memory pages (e.g. see Rico's post). Sun did a little work in J2SE 5.0 to allow rt.jar to be shared across VM instances, but that's really not much compared with the sharing that NGEN enables on Whidbey.

Frozen Strings

One aspect that hasn't been written about much is the ability to pre-create string instances in NGENed images. What this means is that string literals are laid out in the .data section of the NGEN image exactly like they would be laid out when they are dynamically created by the CLR. So whenever you use a frozen string literal in your managed code you're simply passing around a pointer to static data in the NGEN image and not to an object in the GC heap. Since these strings live in the .data section of the image, the standard copy-on-write page sharing that the operating system uses for initialized data sections in images applies, so unless you modify the object somehow (more about this in a bit) all applications using that image will be sharing the same physical memory pages.

To get NGEN to create frozen strings for your string literals, you have to mark your assembly with the StringFreezingAttribute. Note that the downside of doing this is that your assembly will not be unloadable: because the frozen string instances that live in your image aren't tracked by the GC, the CLR needs to keep the image loaded for the lifetime of the process.

Copy-on-Write

Strings are immutable, so why did I mention modifying the object earlier? One obvious way to modify a string is to use unsafe (or native) code to poke inside the string (a really bad idea!), but there are other ways of "modifying" immutable objects. The first is to use an object as a monitor (using Monitor.Enter or the C# lock() construct) and the second is to get the object's identity hashcode by calling System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode() or doing a non-virtual call to Object.GetHashCode() on the object. Using an object as a monitor will cause the object header to be used as a lightweight lock or as an index into the syncblock table that contains the heavyweight lock, so this can mutate the object (header). Locking on string literals was always a bad idea, because they're probably interned, so they may be shared by other pieces of code that you don't know about, and they can also be passed across AppDomain boundaries. In Whidbey there is the additional (potential) cost of having to take a page fault and make a private copy of the page containing the string's object header, if the string is frozen. The second issue (identity hashcode) turns out not to be an issue for frozen strings, because NGEN pre-computes an identity hashcode for frozen strings, so RuntimeHelpers.GetHashCode() will simply return the value that was pre-computed and stored in the object header.
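The hazard of locking on interned string literals is not specific to .NET. As an illustration in Java, where identical string literals are also guaranteed to refer to the same instance, the sketch below has two unrelated (hypothetical) components that each believe they own a private lock object, while in fact they share a single monitor.

```java
final class SharedLiteralLock {
    // Two hypothetical, unrelated components that each pick a string literal as a lock.
    static class ComponentA {
        static final String LOCK = "lock";
    }

    static class ComponentB {
        static final String LOCK = "lock";
    }

    // Identical literals are interned, so both fields refer to the same object.
    static boolean literalsShareIdentity() {
        return ComponentA.LOCK == ComponentB.LOCK;
    }

    // While ComponentA's lock is held, ComponentB's "own" lock is held as well.
    static boolean holdingOneHoldsTheOther() {
        synchronized (ComponentA.LOCK) {
            return Thread.holdsLock(ComponentB.LOCK);
        }
    }

    public static void main(String[] args) {
        System.out.println(literalsShareIdentity());
        System.out.println(holdingOneHoldsTheOther());
    }
}
```

A dedicated `new Object()` per component avoids the accidental sharing, and in the frozen-string case on .NET it would also avoid touching the header of an object that lives in a shared page.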

Saturday, 06 August 2005 18:52:03 (W. Europe Daylight Time, UTC+02:00)