# Monday, 24 March 2008
« IKVM 0.36 Update 1 Release Candidate 5 | Main | IKVM 0.36 Update 1 Released »
Generic Algorithms Revisited (Again)

Back in 2003 I described how you could theoretically use value types to efficiently inject behavior into a generic type. Recently I was thinking about this again and decided to look at the code generated by the JIT.

Let's look at a very simple case:

interface IOperation
{
  void DoIt();
}

struct Dummy : IOperation
{
  public void DoIt() { }
}

public class Program
{
  static void Gen<T>() where T : IOperation
  {
    default(T).DoIt();
  }

  static void Main()
  {
    for (; ; )
    {
      Gen<Dummy>();
    }
  }
}

The .NET 2.0 SP1 x86 JIT generates the following code for the Gen<Dummy>() method:

sub    esp,8
xor    eax,eax
mov    dword ptr [esp],eax
mov    dword ptr [esp+4],eax
mov    byte ptr [esp],0
movsx  eax,byte ptr [esp]
mov    byte ptr [esp+4],al
add    esp,8
ret

The good news is that the Dummy.DoIt() method has been inlined. The bad news is that every single instruction apart from the ret is unnecessary.

In order to understand what is going on, we need to look at the IL generated by the C# compiler for the Gen<T>() method and we also need to know that the CLR treats zero length types as having a length of one byte.

Here's the IL for Gen<T>():

.maxstack 1
.locals init ([0] !!T CS$0$0000, [1] !!T CS$0$0001)
ldloca.s CS$0$0000
initobj !!T
ldloc.0
stloc.1
ldloca.s CS$0$0001
constrained. !!T
callvirt instance void IOperation::DoIt()
ret

For some reason the default(T).Doit() construct results in two local variables being created. Now the JIT code above is starting to make a little sense (although it still isn't an excuse why it isn't optimized away):

sub    esp,8
xor    eax,eax
mov    dword ptr [esp],eax
mov    dword ptr [esp+4],eax
.locals init ([0] !!T CS$0$0000, [1] !!T CS$0$0001)
Reserve and initialize space on the stack for the two local variables. Note that they are both unnecessarily padded, so we need 8 bytes intsead of 4.
 
mov    byte ptr [esp],0 ldloca.s CS$0$0000
initobj !!T
Initialize the first local variable.
 
movsx  eax,byte ptr [esp] ldloc.0
Read the value of the first local variable.
 
mov    byte ptr [esp+4],al stloc.1
Store the value we just read in the second local variable.
 
  ldloca.s CS$0$0001
constrained. !!T
callvirt instance void IOperation::DoIt()
Inlined (empty) DoIt method.
 
add    esp,8
ret
ret
Unreserve stack space and return to caller.


We can improve the code a little by changing the Gen<T>() method body as follows:

T t = default(T);
t.DoIt();

Now the C# compiler won't generate the extra local variable and the x86 code looks a little better:

push   eax
xor    eax,eax
mov    dword ptr [esp],eax
mov    byte ptr [esp],0
pop    ecx
ret

For completeness here's the x64 code for the original Gen<Dummy>() method:

sub    rsp,38h
xor    eax,eax
mov    byte ptr [rsp+21h],al
xor    eax,eax
mov    byte ptr [rsp+20h],al
xor    eax,eax
mov    byte ptr [rsp+21h],al
movzx  eax,byte ptr [rsp+21h]
mov    byte ptr [rsp+20h],al
lea    rcx,[rsp+20h]
mov    rax,6428001C078h
call   rax
nop
add    rsp,38h
rep ret

This is even worse than the x86 code, because the Dummy.DoIt() method isn't even inlined.

NGEN

The standard excuse of why a JIT doesn't do some "obvious" optimizations is of course that it runs while your program is supposed to be running, so it can't take a lot of time to analyze and optimize the code. This is true to some extent (and JIT performance is one reason why running your managed code on x64 takes so much longer to start up), but it doesn't apply to NGEN and from what I understand today's NGEN doesn't do any more optimizations than the JIT compiler. I tested this specific example (for x64) and the NGEN code is indeed identical to the JIT code. This is unfortunate and hopefully, someday we'll see more sophisticated optimizations specific to NGEN as well.

Conclusion

This trick of using value types in generic algorithms appears to work reasonably well on x86, but there is still room for improvement in the JIT. Rumour has it that in the next .NET 2.0 service pack (due out this summer?) there will be an update of the JIT that finally supports inlining of methods that have value type arguments, let's hope that this JIT update will also include other optimizations as well. Let's also hope that this update includes a better version of the x64 JIT, because as this example shows yet again the x64 JIT still considerably lags behind the x86 JIT.

P.S. Here's the Mono 1.9 x86 JIT code for the original Gen<Dummy>():

push   ebp
mov    ebp,esp
sub    esp,8
mov    byte ptr [ebp-8],0
mov    byte ptr [ebp-4],0
mov    byte ptr [ebp-8],0
movsx  eax,byte ptr [ebp-8]
mov    byte ptr [ebp-4],al
lea    eax,[ebp-4]
push   eax
call   02A20278
pop    ecx
leave
ret

Monday, 24 March 2008 09:23:29 (W. Europe Standard Time, UTC+01:00)  #    Comments [2]