# Friday, 09 May 2008
« IKVM 0.36 Update 2 Release Candidate 1 | Main | New Development Snapshot »
Compiler Intrinsics

Most compilers have some (or in some cases many) intrinsic functions. HotSpot has a number of them (see here, search for "intrinsics known to the runtime") as does the CLR JIT. IKVM has had a couple as well (System.arraycopy(), AtomicReferenceFieldUpdater.newUpdater(), String.toCharArray()). These were sort of hacked into the compiler and I finally decided to clean that up a little and add more scalable support for adding intrinsincs. The trigger to do this was that I added four more intrinsics: Float.floatToRawIntBits(), Float.intBitsToFloat(), Double.doubleToRawLongBits() and Double.longBitsToDouble().


Here's a micro benchmark:

public class test {
  public static void main(String[] args) {
    long sum = 1;
    long start = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
      sum += Double.doubleToRawLongBits(sum);
    long end = System.currentTimeMillis();
    System.out.println(end - start);

Here are the results:

         x86 (aligned)     x86 (unaligned)                      x64
JDK 1.6 HotSpot Server VM    287   109
JDK 1.6 HotSpot Client VM 335    
IKVM 0.36 .NET 1.1 479 565  
IKVM 0.36 .NET 2.0 570 704 124
IKVM 0.37 338 468 101

Since the x86 .NET results are highly sensitive as to whether the double on the stack happens to be aligned or not, I included both results.


Here's the MSIL that IKVM generates for the loop:

IL_000b:   ldloc.2
IL_000c:   ldc.i4    0x989680
IL_0011:   bge       IL_0028
IL_0016:   ldloc.0
IL_0017:   ldloc.0
IL_0018:   conv.r8
IL_0019:   ldloca.s  V_3
IL_001b:   call      int64 [IKVM.Runtime]IKVM.Runtime.DoubleConverter::ToLong(float64,
                     valuetype [IKVM.Runtime]IKVM.Runtime.DoubleConverter&)
IL_0020:   add
IL_0021:   stloc.0
IL_0022:   ldloc.2
IL_0023:   ldc.i4.1
IL_0024:   add
IL_0025:   stloc.2
IL_0026:   br.s      IL_000b

The conversion isn't actually inlined, but instead a local variable of value type IKVM.Runtime.DoubleConverter is added to the method and a static method on that type that takes the value to be converted and a reference to the local variable is called. Here's the code for IKVM.Runtime.DoubleConverter:

public struct DoubleConverter
  private double d;
  private long l;

  public static long ToLong(double value, ref DoubleConverter converter)
    converter.d = value;
    return converter.l;

  public static double ToDouble(long value, ref DoubleConverter converter)
    converter.l = value;
    return converter.d;

It uses the .NET feature that allows you to explicitly control the layout of a struct  to overlay the double and long fields. Note that this construct is fully verifiable.

For comparison, the standard System.BitConverter.DoubleToInt64Bits() uses unsafe code and looks something like this:

public static unsafe long DoubleToInt64Bits(double value)
  return *((long*)&value);

For some reason (probably because it isn't verifiable) the JIT doesn't like this so much and doesn't inline this method.

JIT Code

Here's the x86 code generated by the .NET 2.0 SP1 JIT:

049E15CE  cmp    ebx,989680h
049E15D4  jge    049E1600
049E15D6  lea    ecx,[esp+8]
049E15DA  mov    dword ptr [esp+10h],esi
049E15DE  mov    dword ptr [esp+14h],edi
049E15E2  fild   qword ptr [esp+10h]
049E15E6  fstp   qword ptr [esp+10h]
049E15EA  fld    qword ptr [esp+10h]
049E15EE  fstp   qword ptr [ecx]
049E15F0  mov    eax,dword ptr [ecx]
049E15F2  mov    edx,dword ptr [ecx+4]
049E15F5  add    eax,esi
049E15F7  adc    edx,edi
049E15F9  mov    esi,eax
049E15FB  mov    edi,edx
049E15FD  inc    ebx
049E15FE  jmp    049E15CE

Here's the x64 code generated by the .NET 2.0 SP1 JIT:

00000642805B8A90  cmp        ecx,989680h
00000642805B8A96  jge        00000642805B8AB1
00000642805B8A98  cvtsi2sd   xmm0,rdi
00000642805B8A9D  lea        rax,[rsp+20h]
00000642805B8AA2  movsd      mmword ptr [rax],xmm0
00000642805B8AA6  mov        rax,qword ptr [rax]
00000642805B8AA9  add        rdi,rax
00000642805B8AAC  add        ecx,1
00000642805B8AAF  jmp        00000642805B8A90

In both cases the construct is inlined properly. It is also obvious why the x64 code is so much faster, it uses SSE (as we've seen before) and only uses one memory store/load combination.


For completeness, here's the code generated by HotSpot x64:

0000000002772EA0  cvtsi2sd   xmm0,r11
0000000002772EA5  add        ebp,10h
0000000002772EA8  movsd      mmword ptr [rsp+20h],xmm0
0000000002772EAE  mov        r10,qword ptr [rsp+20h]
0000000002772EB3  add        r10,r11
0000000002772EB6  cvtsi2sd   xmm0,r10
0000000002772EBB  movsd      mmword ptr [rsp+20h],xmm0
0000000002772EC1  mov        r11,qword ptr [rsp+20h]
0000000002772EC6  add        r11,r10
0000000002772EC9  cvtsi2sd   xmm0,r11
0000000002772ECE  movsd      mmword ptr [rsp+20h],xmm0
0000000002772ED4  mov        r10,qword ptr [rsp+20h]
0000000002772ED9  add        r10,r11
0000000002772FC0  cvtsi2sd   xmm0,r10
0000000002772FC5  movsd      mmword ptr [rsp+20h],xmm0
0000000002772FCB  mov        r11,qword ptr [rsp+20h]
0000000002772FD0  add        r11,r10
0000000002772FD3  cmp        ebp,r9d
0000000002772FD6  jl         0000000002772EA0

It actually unrolled the loop 16 times (which appears not be helping in the case), but otherwise the code generated is pretty similar to what we saw on the CLR. Of course, in HotSpot Double.doubleToRawIntBits() is also an intrinsic because in Java the only alternative would be to write it in native code and the JNI transition would add significant overhead in this case.

Friday, 09 May 2008 11:27:51 (W. Europe Daylight Time, UTC+02:00)  #    Comments [3]