# Monday, 24 March 2008
Generic Algorithms Revisited (Again)

Back in 2003 I described how you could theoretically use value types to efficiently inject behavior into a generic type. Recently I was thinking about this again and decided to look at the code generated by the JIT.

Let's look at a very simple case:

interface IOperation
{
  void DoIt();
}

struct Dummy : IOperation
{
  public void DoIt() { }
}

public class Program
{
  static void Gen<T>() where T : IOperation
  {
    default(T).DoIt();
  }

  static void Main()
  {
    for (; ; )
    {
      Gen<Dummy>();
    }
  }
}

The .NET 2.0 SP1 x86 JIT generates the following code for the Gen<Dummy>() method:

sub    esp,8
xor    eax,eax
mov    dword ptr [esp],eax
mov    dword ptr [esp+4],eax
mov    byte ptr [esp],0
movsx  eax,byte ptr [esp]
mov    byte ptr [esp+4],al
add    esp,8
ret

The good news is that the Dummy.DoIt() method has been inlined. The bad news is that every single instruction apart from the ret is unnecessary.

In order to understand what is going on, we need to look at the IL generated by the C# compiler for the Gen<T>() method and we also need to know that the CLR treats zero length types as having a length of one byte.

Here's the IL for Gen<T>():

.maxstack 1
.locals init ([0] !!T CS$0$0000, [1] !!T CS$0$0001)
ldloca.s CS$0$0000
initobj !!T
ldloc.0
stloc.1
ldloca.s CS$0$0001
constrained. !!T
callvirt instance void IOperation::DoIt()
ret

For some reason the default(T).Doit() construct results in two local variables being created. Now the JIT code above is starting to make a little sense (although it still isn't an excuse why it isn't optimized away):

sub    esp,8
xor    eax,eax
mov    dword ptr [esp],eax
mov    dword ptr [esp+4],eax
.locals init ([0] !!T CS$0$0000, [1] !!T CS$0$0001)
Reserve and initialize space on the stack for the two local variables. Note that they are both unnecessarily padded, so we need 8 bytes intsead of 4.
 
mov    byte ptr [esp],0 ldloca.s CS$0$0000
initobj !!T
Initialize the first local variable.
 
movsx  eax,byte ptr [esp] ldloc.0
Read the value of the first local variable.
 
mov    byte ptr [esp+4],al stloc.1
Store the value we just read in the second local variable.
 
  ldloca.s CS$0$0001
constrained. !!T
callvirt instance void IOperation::DoIt()
Inlined (empty) DoIt method.
 
add    esp,8
ret
ret
Unreserve stack space and return to caller.


We can improve the code a little by changing the Gen<T>() method body as follows:

T t = default(T);
t.DoIt();

Now the C# compiler won't generate the extra local variable and the x86 code looks a little better:

push   eax
xor    eax,eax
mov    dword ptr [esp],eax
mov    byte ptr [esp],0
pop    ecx
ret

For completeness here's the x64 code for the original Gen<Dummy>() method:

sub    rsp,38h
xor    eax,eax
mov    byte ptr [rsp+21h],al
xor    eax,eax
mov    byte ptr [rsp+20h],al
xor    eax,eax
mov    byte ptr [rsp+21h],al
movzx  eax,byte ptr [rsp+21h]
mov    byte ptr [rsp+20h],al
lea    rcx,[rsp+20h]
mov    rax,6428001C078h
call   rax
nop
add    rsp,38h
rep ret

This is even worse than the x86 code, because the Dummy.DoIt() method isn't even inlined.

NGEN

The standard excuse of why a JIT doesn't do some "obvious" optimizations is of course that it runs while your program is supposed to be running, so it can't take a lot of time to analyze and optimize the code. This is true to some extent (and JIT performance is one reason why running your managed code on x64 takes so much longer to start up), but it doesn't apply to NGEN and from what I understand today's NGEN doesn't do any more optimizations than the JIT compiler. I tested this specific example (for x64) and the NGEN code is indeed identical to the JIT code. This is unfortunate and hopefully, someday we'll see more sophisticated optimizations specific to NGEN as well.

Conclusion

This trick of using value types in generic algorithms appears to work reasonably well on x86, but there is still room for improvement in the JIT. Rumour has it that in the next .NET 2.0 service pack (due out this summer?) there will be an update of the JIT that finally supports inlining of methods that have value type arguments, let's hope that this JIT update will also include other optimizations as well. Let's also hope that this update includes a better version of the x64 JIT, because as this example shows yet again the x64 JIT still considerably lags behind the x86 JIT.

P.S. Here's the Mono 1.9 x86 JIT code for the original Gen<Dummy>():

push   ebp
mov    ebp,esp
sub    esp,8
mov    byte ptr [ebp-8],0
mov    byte ptr [ebp-4],0
mov    byte ptr [ebp-8],0
movsx  eax,byte ptr [ebp-8]
mov    byte ptr [ebp-4],al
lea    eax,[ebp-4]
push   eax
call   02A20278
pop    ecx
leave
ret

Monday, 24 March 2008 09:23:29 (W. Europe Standard Time, UTC+01:00)  #    Comments [2]
# Friday, 14 March 2008
IKVM 0.36 Update 1 Release Candidate 5

More fixes. Since I used version 0.36.0.10 for some interim fixes, this rc is 0.36.0.11.

Changes:

  • Changed version to 0.36.0.11.
  • Fixed ikvmc to include zero length resource files.
  • Implemented SocketOptions.IP_MULTICAST_IF and SocketOptions.IP_MULTICAST_IF2.
  • Fixed assembly class loader to ignore codebase for dynamic assemblies (previously it would throw an exception).
  • Fixed exception stack trace code to return the .NET name of a type when a method in a primitive type is on the stack.
  • Fixed JNI reflection to filter out HideFromReflection members.
  • Fixed java.net.NetworkInterface to work on pre-Win2K3 systems.
  • Fixed java.lang.Thread to set context class loader for threads started from .NET.
  • Added workaround for .NET 1.1 JIT bug exposed by bytecode optimizations introduced in 0.36.0.9.

Binaries available here: ikvmbin-0.36.0.11.zip
Sources (+ binaries):ikvm-0.36.0.11.zip

Friday, 14 March 2008 09:05:21 (W. Europe Standard Time, UTC+01:00)  #    Comments [0]