# Tuesday, 10 November 2009
« New Development Snapshot | Main | Running Eclipse with NGEN »
Going Crazy with Generics, or The Story of ThreadLocal(Of T)

Jon Skeet recently blogged about the performance of [ThreadStatic] versus the new .NET 4.0 ThreadLocal<T>. I was surprised to see that ThreadLocal<T> was faster than [ThreadStatic], because ThreadLocal<T> uses [ThreadStatic] as the underlying primitive.

How do you go from a static field to a per instance field? It's simple, once you think of it. You (ab)use generic types. Here's a simplified ThreadLocal<T>:

public class ThreadLocal<T>
{
  HolderBase holder;
  static int count;
  static Type[] types = new Type[] { typeof(C1), typeof(C2), typeof(C3) };

  abstract class HolderBase
  {
    internal abstract T Value { get; set; }
  }

  class C1 { }
  class C2 { }
  class C3 { }

  class Holder : HolderBase
  {
    [ThreadStatic]
    static T val;

    internal override T Value
    {
      get { return val; }
      set { val = value; }
    }
  }

  public ThreadLocal()
  {
    holder = MakeHolder(Interlocked.Increment(ref count) - 1);
  }

  HolderBase MakeHolder(int index)
  {
    Type t1 = types[index % 3];
    Type t2 = types[(index / 3) % 3];
    Type t3 = types[index / 9];
    Type type = typeof(Holder<,,>).MakeGenericType(typeof(T), t1, t2, t3);
    return (HolderBase)Activator.CreateInstance(type);
  }

  public T Value
  {
    get { return holder.Value; }
    set { holder.Value = value; }
  }
}

The real ThreadLocal<T> type in .NET 4.0 beta 2 is much more complex, because it has to deal with recycling the types and protecting against returning a value from a recycled type. It also uses a higher base counting system  to number the types, the maximum number of types generated (per T) is 4096 in beta 2. After you allocate more than that, it falls back to using a holder type that uses Thread.SetData().

I'm not sure what to make of this. It's a clever trick, but I think it ultimately is too clever. I benchmarked a simpler approach using arrays (where each ThreadLocal<T> simply allocated an index in the [ThreadStatic] array) and it was a little bit faster and doesn't suffer from the downsides of creating a gazillion types (which probably take more memory and those types stay around until the AppDomain is destroyed).

Finally a tip for Microsoft, move the Cn types out of ThreadLocal, because currently they are also generic (due to the fact that C# automatically makes nested types generic based on the outer type's generic type parameters) and that is unnecessarily wasteful.

Tuesday, 10 November 2009 07:03:31 (W. Europe Standard Time, UTC+01:00)  #    Comments [6]