Saturday, August 06, 2005

Frozen Strings are Cool

First of all, sorry for the bad pun, but I couldn't resist. Once Whidbey ships, one of the areas where .NET will be light years ahead of Java is the ability to share memory between different instances of the runtime. Microsoft did lots of work in Whidbey to enable sharing of memory pages (e.g. see Rico's post). Sun did a little work in J2SE 5.0 to allow rt.jar to be shared across VM instances, but that's really not much compared with the sharing that NGEN enables on Whidbey.

Frozen Strings

One aspect that hasn't been written about much is the ability to pre-create string instances in NGENed images. What this means is that string literals are laid out in the .data section of the NGEN image exactly like they would be laid out when they are dynamically created by the CLR. So whenever you use a frozen string literal in your managed code, you're simply passing around a pointer to static data in the NGEN image and not to an object in the GC heap. Since these strings live in the .data section of the image, the standard copy-on-write page sharing that the operating system uses for initialized data sections in images applies, so unless you modify the object somehow (more about this in a bit), all applications using that image will be sharing the same physical memory pages.

To get NGEN to create frozen strings for your string literals, you have to mark your assembly with the StringFreezingAttribute. Note that the downside of doing this is that your assembly will not be unloadable: because the frozen string instances that live in your image aren't tracked by the GC, the CLR needs to keep the image loaded for the lifetime of the process.
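As a minimal sketch of what that marking looks like, the attribute is applied at the assembly level, typically in a file such as AssemblyInfo.cs (the file name is just a convention I'm assuming here, any source file in the assembly will do). The attribute lives in System.Runtime.CompilerServices and takes no arguments.

```csharp
using System.Runtime.CompilerServices;

// Asks NGEN to pre-create (freeze) the string literals of this assembly
// in the native image. This only matters when the assembly is NGENed.
[assembly: StringFreezing]
```

Keep in mind the trade-off described above: once the image contains frozen strings, the assembly can't be unloaded.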

Copy-on-Write

Strings are immutable, so why did I mention modifying the object earlier? One obvious way to modify a string is to use unsafe (or native) code to poke inside the string (a really bad idea!), but there are other ways of "modifying" immutable objects. The first is to use an object as a monitor (using Monitor.Enter or the C# lock() construct) and the second is to get the object's identity hashcode by calling System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode() or doing a non-virtual call to Object.GetHashCode() on the object.

Using an object as a monitor will cause the object header to be used as a lightweight lock or as an index into the syncblock table that contains the heavyweight lock, so this can mutate the object (header). Locking on string literals was always a bad idea, because they're probably interned, so they may be shared by other pieces of code that you don't know about, and they can also be passed across AppDomain boundaries. In Whidbey there is the additional (potential) cost of having to take a page fault and having to make a private copy of the page containing the string's object header, if the string is frozen. The second issue (identity hashcode) turns out not to be an issue for frozen strings, because NGEN pre-computes an identity hashcode for frozen strings, so RuntimeHelpers.GetHashCode() will simply return the value that was pre-computed and stored in the object header.
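To make the two cases a bit more concrete, here is a small sketch. The class and member names (LockingDemo, Gate, IncrementSafely, IdentityHashIsStable) are made up for illustration; only lock and RuntimeHelpers.GetHashCode() are real. It shows the usual alternative to locking on a string literal (a private lock object) and how to ask for the identity hashcode of a string. Nothing in it can tell you whether a given string is actually frozen, it only exercises the two operations discussed above.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class LockingDemo
{
    // Hypothetical private lock object. Nobody else can see it, so nobody
    // else can lock on it, and it is an ordinary GC heap object rather
    // than a (possibly frozen, possibly shared) string literal.
    private static readonly object Gate = new object();

    private static int counter;

    public static int IncrementSafely()
    {
        // Avoid: lock ("some literal") { ... }
        // That would use the literal's object header for the lock.
        lock (Gate)
        {
            return ++counter;
        }
    }

    public static bool IdentityHashIsStable(string s)
    {
        // RuntimeHelpers.GetHashCode() bypasses String's own GetHashCode()
        // override and returns the identity hashcode of the object.
        int first = RuntimeHelpers.GetHashCode(s);
        int second = RuntimeHelpers.GetHashCode(s);
        return first == second;
    }

    public static void Main()
    {
        Console.WriteLine(IncrementSafely());
        Console.WriteLine(IdentityHashIsStable("frozen literal"));
    }
}
```

Locking on a private object like this sidesteps both the sharing problem and the potential copy-on-write page fault, since the lock never touches the string's header.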

8/6/2005 6:52:03 PM (W. Europe Daylight Time, UTC+02:00)