I have started this series of posts dedicated to exploring the internals of the .NET framework and runtime. To start probing the internals of .NET, we need a debugging extension called SOS (Son of Strike). Visual Studio comes packaged with this assembly (SOS.dll). The first post is about one of the most important features of .NET 2.0 over 1.1: Generics, or "Parametric Polymorphism". Wikipedia defines Generics as "a style of computer programming in which algorithms are written in terms of to-be-specified-later types that are then instantiated when needed for specific types provided as parameters".
First, we need to set up Visual Studio to use Son of Strike. This takes a few simple steps:
- Enable Unmanaged Debugging in the Project Properties
- Set the symbol server to *C:\localcache*http://msdl.microsoft.com/download/symbols. The symbol server can be set by going to Tools -> Options -> Debugging. This step allows Visual Studio to download the symbols needed for debugging from the Microsoft server.
- Set a breakpoint in the code and press F5 to start the program in debug mode. Once the application hits the breakpoint, load the SOS.dll using the command .load SOS in the Immediate Window.
Now we are ready to use the debugging extension. SOS also works with debugging tools such as WinDbg, if you are more comfortable with those. Microsoft's documentation for SOS.dll lists the available SOS commands.
Now, let's look at the code we will use to see how generics work. It consists of a generic type Customer, which contains a list of the generic type parameter and a public integer field.
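If you would rather not set the breakpoint by hand, the program can ask for one itself. The following is only a sketch: the class BreakpointDemo and the helper BreakIfDebugging are hypothetical names made up for illustration, while Debugger.IsAttached and Debugger.Break() are existing members of System.Diagnostics.

```csharp
using System;
using System.Diagnostics;

static class BreakpointDemo
{
    // Hypothetical helper: pauses only when a debugger is attached,
    // so the program still runs normally outside the debugger.
    public static bool BreakIfDebugging()
    {
        if (!Debugger.IsAttached)
        {
            return false;
        }

        // Execution pauses here; this is a convenient point to run .load SOS.
        Debugger.Break();
        return true;
    }

    static void Main()
    {
        bool stopped = BreakIfDebugging();
        Console.WriteLine("Stopped in debugger: {0}", stopped);
    }
}
```

Placing such a call after the objects of interest have been created means they are already on the heap when the debugger stops.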
```csharp
using System.Collections.Generic;

class Employee
{
    public int EmployeeID;
    public static int age;
    public static string Name;
}

class Customer<T>
{
    public int CustomerID;
    List<T> _innerList = new List<T>();

    // Exposes the items added so far.
    public List<T> ListItems
    {
        get { return _innerList; }
    }

    public void AddItems(T _addObject)
    {
        _innerList.Add(_addObject);
    }
}
```
The Main method that calls this code is below. We instantiate the Customer class with three type arguments: a string, an integer and a custom Employee type. Of these, string and Employee are reference types, while integer is a primitive value type.
```csharp
static void Main(string[] args)
{
    Customer<string> _cxStringObj = new Customer<string>();
    Customer<int> _cxIntObj = new Customer<int>();
    Customer<Employee> _cxEmpObj = new Customer<Employee>();

    _cxStringObj.CustomerID = 34;
    _cxStringObj.AddItems("Ganesh");

    _cxIntObj.CustomerID = 23;
    _cxIntObj.AddItems(43);

    _cxEmpObj.CustomerID = 34;
    _cxEmpObj.AddItems(new Employee());
}
```
Every .NET class has two main data structures that it uses to maintain type identity: the MethodTable and the EEClass. Every object has a pointer to its MethodTable, and this MethodTable contains a pointer to the EEClass. In .NET 1.1, the EEClass was the strongest identifier of the type identity of an object. With .NET 2.0 and the implementation of Generics, however, generic types can share the same EEClass, and the MethodTable takes on the mantle of being the unique type identifier.
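The distinct type identity of each instantiation can also be observed from managed code. The sketch below is an illustration under assumptions, not something taken from the debugging session: TypeIdentityDemo and HandleOf are hypothetical names, Employee and Customer are cut-down copies of the classes above, and it assumes that the value returned by RuntimeTypeHandle.Value corresponds to the MethodTable address that SOS prints.

```csharp
using System;

class Employee { }

class Customer<T>
{
    public int CustomerID;
}

static class TypeIdentityDemo
{
    // Returns the raw runtime type handle for T.
    // Assumption: on the CLR this corresponds to the MethodTable address.
    public static IntPtr HandleOf<T>()
    {
        return typeof(T).TypeHandle.Value;
    }

    static void Main()
    {
        // The printed addresses vary from run to run.
        Console.WriteLine("Customer<string>:   {0:X8}", HandleOf<Customer<string>>().ToInt64());
        Console.WriteLine("Customer<int>:      {0:X8}", HandleOf<Customer<int>>().ToInt64());
        Console.WriteLine("Customer<Employee>: {0:X8}", HandleOf<Customer<Employee>>().ToInt64());

        // All three instantiations come from the same generic definition.
        Console.WriteLine(typeof(Customer<string>).GetGenericTypeDefinition() == typeof(Customer<>));
    }
}
```

Each instantiation reports a different handle, which fits the idea that the MethodTable, rather than the EEClass, identifies the type.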
The researchers working at Microsoft proposed two main ways of implementing Generics (the paper "Design and Implementation of Generics for the .NET Common Language Runtime" by Andrew Kennedy and Don Syme provides details of the implementation).
- Expansion: In this approach, whenever the runtime encounters a generic type, it generates code on the fly for the specific instantiation being used. For example, in our code, when Customer<string> is encountered, the runtime could generate a copy of the class specialized for string. When Customer<int> is encountered, a similar expansion could be done for int.
- Code sharing: In this approach, only a single version of the code is generated for the generic type, and it is reused by all instantiations.
The approach chosen in the end was a mixture of the two. Generic instantiations share as much code as possible. Every instantiation gets its own MethodTable, but compatible instantiations (those whose type arguments are all reference types) share the same EEClass, as mentioned before, and the same compiled code.
Fire up the debugger and load SOS.dll. To see the instances of a particular type on the heap, use the command !DumpHeap -type followed by the type name. Once all three objects have been created, let's see the result of this command.
```
!DumpHeap -type Customer
 Address       MT     Size
01ed35cc 002331d8       16
01ed3604 00233284       16
01ed3638 0023336c       16
total 3 objects
Statistics:
      MT    Count    TotalSize Class Name
0023336c        1           16 GenericsTry.Customer`1[[GenericsTry.Employee, GenericsTry]]
00233284        1           16 GenericsTry.Customer`1[[System.Int32, mscorlib]]
002331d8        1           16 GenericsTry.Customer`1[[System.String, mscorlib]]
Total 3 objects
```
As you can see from the MT column, all three objects have different MethodTables. Let's look at the MethodTable for each of the three type instantiations.
The command is !DumpMT -MD followed by the MethodTable address. We start with the Employee instantiation. Think of the MethodTable as an extension of a conventional vtable, which contains only method pointers. The MethodTable's vtable has 7 slots: 4 are inherited from the parent Object class and 3 are defined by the type itself. The JIT column shows PreJIT, JIT or NONE. PreJIT means the method comes from a precompiled native image, JIT means it has already been compiled by the JIT compiler, and NONE means it has not been compiled yet, so its entry points to a temporary stub. The first time such a method is called, it is compiled on the fly and the temporary stub is replaced with the actual method pointer. To optimize performance, the CLR defers compilation until as late as possible.
```
!DumpMT -MD 0023336c
EEClass: 00231420
Module: 00232c5c
Name: GenericsTry.Customer`1[[GenericsTry.Employee, GenericsTry]]
mdToken: 02000004 (Projects\GenericsTry\GenericsTry\bin\Debug\GenericsTry.exe)
BaseSize: 0x10
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 7
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
70766aa0   705e4a34   PreJIT System.Object.ToString()
70766ac0   705e4a3c   PreJIT System.Object.Equals(System.Object)
70766b30   705e4a6c   PreJIT System.Object.GetHashCode()
707d7550   705e4a90   PreJIT System.Object.Finalize()
0023c058   0023313c     NONE GenericsTry.Customer`1[[System.__Canon, mscorlib]].get_ListItems()
0023c060   00233148     NONE GenericsTry.Customer`1[[System.__Canon, mscorlib]].AddItems(System.__Canon)
0023c068   00233154      JIT GenericsTry.Customer`1[[System.__Canon, mscorlib]]..ctor()
```
Let's look at the MethodTable for the integer instantiation. As you can see, the EEClass for the integer instantiation is not the same as the one for Employee. This is because they are not compatible types, and hence no code sharing can take place.
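The deferred compilation reflected in the JIT column can be short-circuited from code. As a hedged sketch: JitDemo, Square and Precompile are hypothetical names invented for this example, while RuntimeHelpers.PrepareMethod is an existing framework API that asks the runtime to prepare a method before its first call. Whether the corresponding slot then shows JIT in !DumpMT is something to verify in your own debugging session.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

static class JitDemo
{
    public static int Square(int x)
    {
        return x * x;
    }

    // Asks the runtime to prepare the named public static method now,
    // instead of waiting for its first call.
    public static void Precompile(string methodName)
    {
        MethodInfo method = typeof(JitDemo).GetMethod(
            methodName, BindingFlags.Public | BindingFlags.Static);
        RuntimeHelpers.PrepareMethod(method.MethodHandle);
    }

    static void Main()
    {
        Precompile("Square");
        Console.WriteLine(Square(7)); // prints 49
    }
}
```

For methods on generic types, PrepareMethod has a second overload that also takes the instantiation's type handles; this sketch sticks to a non-generic method to stay simple.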
```
!DumpMT -MD 00233284
EEClass: 002314b4
Module: 00232c5c
Name: GenericsTry.Customer`1[[System.Int32, mscorlib]]
mdToken: 02000004 (\GenericsTry\GenericsTry\bin\Debug\GenericsTry.exe)
BaseSize: 0x10
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 7
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
70766aa0   705e4a34   PreJIT System.Object.ToString()
70766ac0   705e4a3c   PreJIT System.Object.Equals(System.Object)
70766b30   705e4a6c   PreJIT System.Object.GetHashCode()
707d7550   705e4a90   PreJIT System.Object.Finalize()
0023c080   00233254     NONE GenericsTry.Customer`1[[System.Int32, mscorlib]].get_ListItems()
0023c088   00233260     NONE GenericsTry.Customer`1[[System.Int32, mscorlib]].AddItems(Int32)
0023c090   0023326c      JIT GenericsTry.Customer`1[[System.Int32, mscorlib]]..ctor()
```
The last one is the string instantiation, and it leads to some interesting conclusions. Compare the first MethodTable with the last one. Both type arguments are reference types and hence compatible with each other. The string and Employee tables have the same EEClass, 00231420, whereas the integer instantiation has the EEClass 002314b4. There is no way for primitive value types to share code with each other, since they differ in size. For example, if we created a Customer<bool>, it would still get its own EEClass and MethodTable.
```
!DumpMT -MD 002331d8
EEClass: 00231420
Module: 00232c5c
Name: GenericsTry.Customer`1[[System.String, mscorlib]]
mdToken: 02000004 (\GenericsTry\GenericsTry\bin\Debug\GenericsTry.exe)
BaseSize: 0x10
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 7
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
70766aa0   705e4a34   PreJIT System.Object.ToString()
70766ac0   705e4a3c   PreJIT System.Object.Equals(System.Object)
70766b30   705e4a6c   PreJIT System.Object.GetHashCode()
707d7550   705e4a90   PreJIT System.Object.Finalize()
0023c058   0023313c     NONE GenericsTry.Customer`1[[System.__Canon, mscorlib]].get_ListItems()
0023c060   00233148     NONE GenericsTry.Customer`1[[System.__Canon, mscorlib]].AddItems(System.__Canon)
0023c068   00233154      JIT GenericsTry.Customer`1[[System.__Canon, mscorlib]]..ctor()
```
One more thing to notice is the Method Descriptors of the first and third types. The AddItems method of the Customer class has the same Method Descriptor (00233148) for both the string and Employee instantiations. So even though the MethodTable is duplicated, both entries still point to the same Method Descriptor. Even the method name carries the type System.__Canon, a placeholder that stands for whichever reference type the method is called on. This illustrates the second approach, code sharing, described by the Microsoft researchers.
The first time the AddItems method is called, it is JIT-compiled and assigned a code address. The Method Descriptor holds this address, and the native code at that address can be viewed with the SOS command !u. Until the method is jitted, this field in the Method Descriptor does not hold a valid code address.
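The size argument is easy to check in code. In this small sketch, SizeDemo and its two methods are hypothetical names; sizeof on the built-in types and IntPtr.Size are standard C# and framework features. It shows that primitive value types occupy different numbers of bytes, whereas a reference is always pointer-sized no matter which class it refers to.

```csharp
using System;

static class SizeDemo
{
    // Sizes in bytes of a few primitive value types: bool, int, long.
    public static int[] ValueTypeSizes()
    {
        return new int[] { sizeof(bool), sizeof(int), sizeof(long) };
    }

    // Size in bytes of any object reference in the current process.
    public static int ReferenceSize()
    {
        return IntPtr.Size;
    }

    static void Main()
    {
        int[] sizes = ValueTypeSizes();
        Console.WriteLine("bool: {0}, int: {1}, long: {2}", sizes[0], sizes[1], sizes[2]);
        Console.WriteLine("any reference: {0}", ReferenceSize());
    }
}
```

Because a string reference and an Employee reference have the same size and are handled the same way, one compiled body can serve both, which is not possible for bool and int.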
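Even though the debugger shows System.__Canon in the shared method names, managed code still sees the exact type argument. The sketch below illustrates this; DescribeItemType and CanonDemo are hypothetical additions made only for this example and are not part of the original Customer class.

```csharp
using System;
using System.Collections.Generic;

class Employee { }

class Customer<T>
{
    List<T> _innerList = new List<T>();

    public void AddItems(T _addObject)
    {
        _innerList.Add(_addObject);
    }

    // Hypothetical addition: reports the exact type argument at runtime.
    public string DescribeItemType()
    {
        return typeof(T).FullName;
    }
}

static class CanonDemo
{
    static void Main()
    {
        Console.WriteLine(new Customer<string>().DescribeItemType());   // System.String
        Console.WriteLine(new Customer<Employee>().DescribeItemType()); // Employee
        Console.WriteLine(new Customer<int>().DescribeItemType());      // System.Int32
    }
}
```

The string and Employee instantiations report their own type arguments rather than System.__Canon, so the sharing seen in the debugger stays an implementation detail that calling code does not observe.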
```
!DumpMD 00233148
Method Name: GenericsTry.Customer`1[[System.__Canon, mscorlib]].AddItems(System.__Canon)
Class: 00231420
MethodTable: 0023316c
mdToken: 06000005
Module: 00232c5c
IsJitted: yes
CodeAddr: 00aa0298
```
Running !u on the code address gives you the assembly code that is executed when the method is called.
```
!u 00aa0298
Normal JIT generated code
GenericsTry.Customer`1[[System.__Canon, mscorlib]].AddItems(System.__Canon)
Begin 00aa0298, size 42
>>> 00AA0298 55              push    ebp
00AA0299 8BEC            mov     ebp,esp
00AA029B 57              push    edi
00AA029C 56              push    esi
00AA029D 53              push    ebx
00AA029E 83EC34          sub     esp,34h
00AA02A1 33C0            xor     eax,eax
00AA02A3 8945F0          mov     dword ptr [ebp-10h],eax
00AA02A6 33C0            xor     eax,eax
00AA02A8 8945E4          mov     dword ptr [ebp-1Ch],eax
00AA02AB 894DC4          mov     dword ptr [ebp-3Ch],ecx
00AA02AE 8955C0          mov     dword ptr [ebp-40h],edx
00AA02B1 833D142E230000  cmp     dword ptr ds:[00232E14h],0
00AA02B8 7405            je      00AA02BF
00AA02BA E8BABC3571      call    71DFBF79 (JitHelp: CORINFO_HELP_DBG_IS_JUST_MY_CODE)
00AA02BF 90              nop
00AA02C0 8B45C4          mov     eax,dword ptr [ebp-3Ch]
00AA02C3 8B4804          mov     ecx,dword ptr [eax+4]
00AA02C6 8B55C0          mov     edx,dword ptr [ebp-40h]
00AA02C9 3909            cmp     dword ptr [ecx],ecx
00AA02CB E8E0F1D16F      call    707BF4B0 (System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon), mdToken: 060019f1)
00AA02D0 90              nop
00AA02D1 90              nop
00AA02D2 8D65F4          lea     esp,[ebp-0Ch]
00AA02D5 5B              pop     ebx
00AA02D6 5E              pop     esi
00AA02D7 5F              pop     edi
00AA02D8 5D              pop     ebp
00AA02D9 C3              ret
```
In this post, we have looked at how the CLR treats Generics and how it addresses the performance implications of generic code: it shares as much code as possible, without needlessly boxing and unboxing the values passed to generic classes.