Mar 08, 2010
 

I have started this series of posts to explore the internals of the .NET Framework and runtime. To start probing the internals of .NET, we need a debugging extension called SOS (Son of Strike). The .NET Framework ships with this assembly (SOS.dll), and Visual Studio can load it. The first post is about one of the most important features of .NET 2.0 over 1.1: Generics, or “Parametric Polymorphism”. Wikipedia defines Generics as “a style of computer programming in which algorithms are written in terms of to-be-specified-later types that are then instantiated when needed for specific types provided as parameters”.

First, we need to set up Visual Studio to use Son of Strike. This takes a few simple steps:

  • Enable Unmanaged Debugging in the Project Properties

  • Set the symbol server to *C:\localcache*http://msdl.microsoft.com/download/symbols. The symbol server can be set under Tools -> Options -> Debugging. This step allows Visual Studio to download the symbols needed for debugging from the Microsoft server.
  • Set a breakpoint in the code and press F5 to start the program in debug mode. Once the application hits the breakpoint, load the SOS.dll using the command .load SOS in the Immediate Window.

Now we are ready to use the debugging extension. SOS also works with debugging tools such as WinDbg, if you are more comfortable with those. Click here for a list of SOS commands.

Now, let's look at the code we will use to see how generics work. It consists of a generic type Customer<T>, which contains a list of the generic type parameter and a public integer field.

using System.Collections.Generic;

class Employee
{
    public int EmployeeID;
    public static int age;
    public static string Name;
}

// Generic type whose instantiations are examined in the debugger below.
class Customer<T>
{
    public int CustomerID;
    List<T> _innerList = new List<T>();

    public List<T> ListItems { get { return _innerList; } }

    public void AddItems(T _addObject)
    {
        _innerList.Add(_addObject);
    }
}

The Main method that calls this code is below. We instantiate Customer<T> with three type arguments: string, int and a custom Employee type. Of these, string and Employee are reference types, while int is a primitive value type.

class Program
{
    static void Main(string[] args)
    {
        // One instantiation per type argument: two reference types and one value type.
        Customer<string> _cxStringObj = new Customer<string>();
        Customer<int> _cxIntObj = new Customer<int>();
        Customer<Employee> _cxEmpObj = new Customer<Employee>();

        _cxStringObj.CustomerID = 34;
        _cxStringObj.AddItems("Ganesh");

        _cxIntObj.CustomerID = 23;
        _cxIntObj.AddItems(43);

        _cxEmpObj.CustomerID = 34;
        _cxEmpObj.AddItems(new Employee());
    }
}

Every .NET class has two main data structures that maintain its type identity: the MethodTable and the EEClass. Every object has a pointer to its MethodTable, and the MethodTable contains a pointer to the EEClass. In .NET 1.1, the EEClass was the strongest identifier of an object's type. With .NET 2.0 and the implementation of Generics, however, generic types can share the same EEClass, so the MethodTable takes on the mantle of being the unique type identifier.
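The same identity can be glimpsed from managed code. The sketch below is only an illustration: TypeIdentityDemo and HandleOf are names I made up, and the idea that RuntimeTypeHandle.Value matches the MethodTable address SOS reports is an assumption about the desktop CLR, since the documentation treats the value as opaque.

```csharp
using System;
using System.Collections.Generic;

class Employee { public int EmployeeID; }

class Customer<T>
{
    private readonly List<T> _innerList = new List<T>();
    public void AddItems(T item) { _innerList.Add(item); }
}

static class TypeIdentityDemo
{
    // Hypothetical helper: returns the opaque runtime handle of a type.
    // Assumption: on the desktop CLR this value is the MethodTable address.
    public static IntPtr HandleOf(Type t)
    {
        return t.TypeHandle.Value;
    }

    static void Main()
    {
        Type[] types =
        {
            typeof(Customer<string>),
            typeof(Customer<int>),
            typeof(Customer<Employee>)
        };

        foreach (Type t in types)
        {
            // Each closed generic type prints its own handle value.
            Console.WriteLine("{0,-40} {1:X8}", t, HandleOf(t).ToInt64());
        }

        // Distinct instantiations are distinct types...
        Console.WriteLine(types[0] == types[2]); // False
        // ...even though they come from the same generic definition.
        Console.WriteLine(types[0].GetGenericTypeDefinition()
                          == types[2].GetGenericTypeDefinition()); // True
    }
}
```

If the assumption holds, the handle values printed here can be compared with the MT column that SOS reports for the same process.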

The researchers at Microsoft who worked on Generics proposed two main ways of implementing them (this paper provides details of the implementation).

  • Expansion: Whenever the runtime encounters a generic type, it generates specialized code on the fly for the type argument being used. For example, in our code, when Customer<string> is encountered, the runtime would generate a string-specific version of the class. When Customer<int> is encountered, a similar expansion would be done for int.
  • Code sharing: Only a single instantiation is generated for the generic type, and it is reused for all type arguments.

The method chosen in the end was a mixture of both approaches. Generic types share as much code as possible. Compatible instantiations share the same EEClass, as mentioned before, but each has its own distinct MethodTable. Even though the MethodTables are distinct, code is shared between compatible types (all reference types).
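To make the two strategies concrete, here is a hand-written analogy in C#. It is only a conceptual sketch: the CLR does this at the native-code level, not by generating C# source, and the class names (CustomerOfInt, CustomerOfBool, CustomerOfAnyReference) are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// "Expansion": one specialised copy per type argument (hypothetical names).
class CustomerOfInt
{
    private readonly List<int> _items = new List<int>();
    public void AddItems(int item) { _items.Add(item); }
    public int Count { get { return _items.Count; } }
}

class CustomerOfBool
{
    private readonly List<bool> _items = new List<bool>();
    public void AddItems(bool item) { _items.Add(item); }
    public int Count { get { return _items.Count; } }
}

// "Code sharing": a single copy serves every reference type, because all
// references have the same size and can be handled by the same code.
class CustomerOfAnyReference
{
    private readonly List<object> _items = new List<object>();
    public void AddItems(object item) { _items.Add(item); }
    public int Count { get { return _items.Count; } }
}

static class StrategySketch
{
    static void Main()
    {
        CustomerOfInt ints = new CustomerOfInt();
        ints.AddItems(43);

        CustomerOfAnyReference shared = new CustomerOfAnyReference();
        shared.AddItems("Ganesh");          // a string
        shared.AddItems(new Version(1, 0)); // an unrelated reference type

        Console.WriteLine(ints.Count);   // 1
        Console.WriteLine(shared.Count); // 2
    }
}
```

The mixture described above roughly corresponds to expanding for value types such as int and bool while sharing one copy across reference types.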

Fire up the debugger and load SOS.dll. To see the instances of a particular type on the heap, use the command DumpHeap -type <TypeName>. Once all three objects have been created, let's see the result of this command.

!DumpHeap -type Customer
 Address       MT     Size
01ed35cc 002331d8       16
01ed3604 00233284       16
01ed3638 0023336c       16
total 3 objects
Statistics:
      MT    Count    TotalSize Class Name
0023336c        1           16 GenericsTry.Customer`1[[GenericsTry.Employee, GenericsTry]]
00233284        1           16 GenericsTry.Customer`1[[System.Int32, mscorlib]]
002331d8        1           16 GenericsTry.Customer`1[[System.String, mscorlib]]
Total 3 objects

As you can see from the MT column, all three objects have different MethodTables. Let's look at the MethodTable for each of the three type instantiations.

The command is DumpMT -MD <MethodTable address>. First, the Customer<Employee> instantiation. Think of the MethodTable as an extension of a conventional vtable, which simply contains method pointers. The MethodTable's vtable has 7 slots, four of which are inherited from the parent Object class and three of which are defined by the type itself. The JIT column shows PreJIT, JIT or NONE. PreJIT means the code was precompiled, JIT means the method has been compiled by the JIT compiler, and NONE means the method has not been compiled yet, so its entry points to a temporary stub. The first time the method is called, it is compiled on the fly and the temporary stub is replaced with the actual method pointer. To optimize performance, the runtime defers compilation of a method until as late as possible.

!DumpMT -MD 0023336c
EEClass: 00231420
Module: 00232c5c
Name: GenericsTry.Customer`1[[GenericsTry.Employee, GenericsTry]]
mdToken: 02000004  (Projects\GenericsTry\GenericsTry\bin\Debug\GenericsTry.exe)
BaseSize: 0x10
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 7
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
70766aa0   705e4a34   PreJIT System.Object.ToString()
70766ac0   705e4a3c   PreJIT System.Object.Equals(System.Object)
70766b30   705e4a6c   PreJIT System.Object.GetHashCode()
707d7550   705e4a90   PreJIT System.Object.Finalize()
0023c058   0023313c     NONE GenericsTry.Customer`1[[System.__Canon, mscorlib]].get_ListItems()
0023c060   00233148     NONE GenericsTry.Customer`1[[System.__Canon, mscorlib]].AddItems(System.__Canon)
0023c068   00233154      JIT GenericsTry.Customer`1[[System.__Canon, mscorlib]]..ctor()
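The deferred compilation described above can also be bypassed from code. As a minimal sketch, RuntimeHelpers.PrepareMethod asks the runtime to compile a method before its first call; PrepareDemo and Precompile are names I made up, and I have not verified how the JIT column in DumpMT -MD changes afterwards, so treat that part as an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

class Customer<T>
{
    private readonly List<T> _innerList = new List<T>();
    public List<T> ListItems { get { return _innerList; } }
    public void AddItems(T item) { _innerList.Add(item); }
}

static class PrepareDemo
{
    // Hypothetical helper: requests compilation of Customer<string>.AddItems
    // ahead of its first call.
    public static void Precompile()
    {
        MethodInfo addItems = typeof(Customer<string>).GetMethod("AddItems");

        // For a method on a generic type, the type arguments of the
        // declaring type are passed as the instantiation.
        RuntimeHelpers.PrepareMethod(
            addItems.MethodHandle,
            new RuntimeTypeHandle[] { typeof(string).TypeHandle });
    }

    static void Main()
    {
        Precompile();

        // This first call should no longer need to trigger compilation.
        Customer<string> customer = new Customer<string>();
        customer.AddItems("Ganesh");
        Console.WriteLine(customer.ListItems.Count); // 1
    }
}
```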

Let's look at the MethodTable for the int instantiation. As you can see, the EEClass for the int and Employee instantiations is not the same. This is because they are not compatible types, so no code sharing can take place.

!DumpMT -MD 00233284
EEClass: 002314b4
Module: 00232c5c
Name: GenericsTry.Customer`1[[System.Int32, mscorlib]]
mdToken: 02000004  (\GenericsTry\GenericsTry\bin\Debug\GenericsTry.exe)
BaseSize: 0x10
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 7
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
70766aa0   705e4a34   PreJIT System.Object.ToString()
70766ac0   705e4a3c   PreJIT System.Object.Equals(System.Object)
70766b30   705e4a6c   PreJIT System.Object.GetHashCode()
707d7550   705e4a90   PreJIT System.Object.Finalize()
0023c080   00233254     NONE GenericsTry.Customer`1[[System.Int32, mscorlib]].get_ListItems()
0023c088   00233260     NONE GenericsTry.Customer`1[[System.Int32, mscorlib]].AddItems(Int32)
0023c090   0023326c      JIT GenericsTry.Customer`1[[System.Int32, mscorlib]]..ctor()

The last one is the string instantiation, and it leads to an interesting conclusion. Compare the first MethodTable with the last one. Both type arguments are reference types and hence compatible with each other. The string and Employee instantiations have the same EEClass, 00231420, while the int instantiation has the EEClass 002314b4. Primitive value types cannot share code with each other, since they differ in size. For example, if we created a Customer<bool>, it would still get its own EEClass and MethodTable.

!DumpMT -MD 002331d8
EEClass: 00231420
Module: 00232c5c
Name: GenericsTry.Customer`1[[System.String, mscorlib]]
mdToken: 02000004  (\GenericsTry\GenericsTry\bin\Debug\GenericsTry.exe)
BaseSize: 0x10
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 7
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
70766aa0   705e4a34   PreJIT System.Object.ToString()
70766ac0   705e4a3c   PreJIT System.Object.Equals(System.Object)
70766b30   705e4a6c   PreJIT System.Object.GetHashCode()
707d7550   705e4a90   PreJIT System.Object.Finalize()
0023c058   0023313c     NONE GenericsTry.Customer`1[[System.__Canon, mscorlib]].get_ListItems()
0023c060   00233148     NONE GenericsTry.Customer`1[[System.__Canon, mscorlib]].AddItems(System.__Canon)
0023c068   00233154      JIT GenericsTry.Customer`1[[System.__Canon, mscorlib]]..ctor()

One more thing to notice is the MethodDesc column for the first and third instantiations. The AddItems method of the Customer class has the same MethodDesc (00233148) in both the string and the Employee instantiation. So even though the MethodTable is duplicated, both entries still point to the same method descriptor. Even the name shows the type System.__Canon, a placeholder that stands for whichever reference type the method is called on. This is an illustration of the second approach, code sharing, described by the researchers from Microsoft.
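A rough way to look for this sharing from managed code is to compare reflection method handles. This is a sketch under a loud assumption: that RuntimeMethodHandle.Value corresponds to the MethodDesc address. The documentation treats the value as opaque, so I deliberately do not state what the comparisons print. SharedCodeDemo and HandleOfAddItems are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

class Employee { public int EmployeeID; }

class Customer<T>
{
    private readonly List<T> _innerList = new List<T>();
    public void AddItems(T item) { _innerList.Add(item); }
}

static class SharedCodeDemo
{
    // Hypothetical helper: opaque handle value of AddItems on a closed type.
    // Assumption: on the desktop CLR this is the MethodDesc address.
    public static IntPtr HandleOfAddItems(Type closedCustomer)
    {
        MethodInfo method = closedCustomer.GetMethod("AddItems");
        return method.MethodHandle.Value;
    }

    static void Main()
    {
        IntPtr forString = HandleOfAddItems(typeof(Customer<string>));
        IntPtr forEmployee = HandleOfAddItems(typeof(Customer<Employee>));
        IntPtr forInt = HandleOfAddItems(typeof(Customer<int>));

        // If reference-type instantiations share a method descriptor,
        // the first comparison should differ from the second.
        Console.WriteLine("string vs Employee: {0}", forString == forEmployee);
        Console.WriteLine("string vs int:      {0}", forString == forInt);
    }
}
```

Whatever the handles show, the MethodInfo objects themselves stay distinct, because each one remembers its own declaring type.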

Once the AddItems method is called for the first time, it is JIT-compiled and assigned a code address. The MethodDesc holds this code address, and the native code at that address can be viewed as assembly with the SOS command !u. Until the method is jitted, this field of the MethodDesc holds a junk value.

!DumpMD 00233148
Method Name: GenericsTry.Customer`1[[System.__Canon, mscorlib]].AddItems(System.__Canon)
Class: 00231420
MethodTable: 0023316c
mdToken: 06000005
Module: 00232c5c
IsJitted: yes
CodeAddr: 00aa0298

Running !u on the code address shows the assembly code that is executed when the method is called.

!u 00aa0298
Normal JIT generated code
GenericsTry.Customer`1[[System.__Canon, mscorlib]].AddItems(System.__Canon)
Begin 00aa0298, size 42
>>> 00AA0298 55               push        ebp
00AA0299 8BEC             mov         ebp,esp
00AA029B 57               push        edi
00AA029C 56               push        esi
00AA029D 53               push        ebx
00AA029E 83EC34           sub         esp,34h
00AA02A1 33C0             xor         eax,eax
00AA02A3 8945F0           mov         dword ptr [ebp-10h],eax
00AA02A6 33C0             xor         eax,eax
00AA02A8 8945E4           mov         dword ptr [ebp-1Ch],eax
00AA02AB 894DC4           mov         dword ptr [ebp-3Ch],ecx
00AA02AE 8955C0           mov         dword ptr [ebp-40h],edx
00AA02B1 833D142E230000   cmp         dword ptr ds:[00232E14h],0
00AA02B8 7405             je          00AA02BF
00AA02BA E8BABC3571       call        71DFBF79 (JitHelp: CORINFO_HELP_DBG_IS_JUST_MY_CODE)
00AA02BF 90               nop
00AA02C0 8B45C4           mov         eax,dword ptr [ebp-3Ch]
00AA02C3 8B4804           mov         ecx,dword ptr [eax+4]
00AA02C6 8B55C0           mov         edx,dword ptr [ebp-40h]
00AA02C9 3909             cmp         dword ptr [ecx],ecx
00AA02CB E8E0F1D16F       call        707BF4B0 (System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon), mdToken: 060019f1)
00AA02D0 90               nop
00AA02D1 90               nop
00AA02D2 8D65F4           lea         esp,[ebp-0Ch]
00AA02D5 5B               pop         ebx
00AA02D6 5E               pop         esi
00AA02D7 5F               pop         edi
00AA02D8 5D               pop         ebp
00AA02D9 C3               ret

In this post, we have looked at how the CLR treats Generics and how it handles the performance implications of generic code: it shares as much code as possible, without needlessly boxing and unboxing the values passed to generic classes.
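As a small illustration of the boxing point, compare the non-generic ArrayList with List<int>. This is a minimal sketch; BoxingDemo and its two helper methods are names I made up for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class BoxingDemo
{
    // Pre-generics style: the int is boxed into an object when added
    // and has to be unboxed with a cast when read back.
    public static int RoundTripThroughArrayList(int value)
    {
        ArrayList untyped = new ArrayList();
        untyped.Add(value);     // boxing happens here
        return (int)untyped[0]; // unboxing happens here
    }

    // Generic style: List<int> stores the int directly, with no boxing.
    public static int RoundTripThroughGenericList(int value)
    {
        List<int> typed = new List<int>();
        typed.Add(value);
        return typed[0];
    }

    static void Main()
    {
        int a = RoundTripThroughArrayList(43);
        int b = RoundTripThroughGenericList(43);
        Console.WriteLine(a + b); // 86
    }
}
```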