May 08 2010
 

When I looked at the first post in this series, I realized I had jumped the gun a bit by going straight to generics and hadn’t done enough justice to the fundamentals. So in this post, I have made an effort to go back to the basics.

The CLR (Common Language Runtime) is the heart of .NET. It’s the virtual execution system responsible for converting the platform-neutral CIL (Common Intermediate Language) into platform-specific, optimized code. The CLR provides services like memory management, garbage collection, exception handling and type verification. This lets language designers concentrate solely on emitting good CIL, and it provides a uniform API that makes language interoperability possible.

The only truth in .NET is the assembly code, which is the final product. All the rest are virtual constraints enforced by the execution system in a very robust manner. For example, a memory location declared as int cannot take a string value, not because the memory is unable to hold the value, but because the CLR makes sure that values conform to their declared types, a feature called type verification. Usually this happens at compile time, but it is rechecked at runtime.
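One place where the runtime side of this checking is easy to see from C# is unboxing. The sketch below is only an illustration (the TypeCheckDemo class and its method are made-up names, not anything from the CLR): the compiler accepts the cast because the static type is object, and the runtime rejects it when the value turns out to be a string.

```csharp
using System;

static class TypeCheckDemo
{
    // Tries to unbox a value as int. The cast compiles for any object,
    // so it is the runtime that decides whether it is legal.
    public static bool TryUnboxAsInt(object value, out int result)
    {
        try
        {
            result = (int)value;   // checked when this line executes
            return true;
        }
        catch (InvalidCastException)
        {
            result = 0;
            return false;
        }
    }

    static void Main()
    {
        int n;
        Console.WriteLine(TryUnboxAsInt(42, out n));       // True
        Console.WriteLine(TryUnboxAsInt("hello", out n));  // False
    }
}
```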

Microsoft released the code of the CLI (Common Language Infrastructure) under the name SSCLI (Shared Source CLI). It can be downloaded here. Joel Pobar and others wrote a great book about it. Unfortunately the 2.0 version is still a draft.

Type safety is the most important aspect of .NET programming and a lot of thought went into it. A type can be thought of as a contract which objects need to conform to. For example, in the following code, the Person class is a type, which is supposed to have five public fields and one method. Any object that claims to be of the Person type must necessarily fulfill this contract, or the CLR will reject it at runtime. With the more mature compilers, like those for C# and VC++, these checks are already done while converting the code to CIL.

using System;
 
class Person
{
    public string _name;
    public int _ssn;
    public char _middleName;
    public decimal _phoneNumber;
    public char _bloodGroup;
 
    public Person(string name, int ssn, char middleName, decimal phoneNumber, char bloodGroup)
    {
        this._name = name;
        this._ssn = ssn;
        this._middleName = middleName;
        this._phoneNumber = phoneNumber;
        this._bloodGroup = bloodGroup;
    }
 
    public string GetSomeDetails()
    {
        return String.Empty;
    }
}
 
class Program
{
    static void Main(string[] args)
    {
        Person _p = new Person("John Doe", 4454353, 'B', 324242432, 'O');
        _p.GetSomeDetails();
    }
}

The code for the object class can be found at sscli20\clr\src\vm\object.h in the SSCLI code, and the TypeHandle class at sscli20\clr\src\vm\typehandle.h. The Type class, which is used extensively in Reflection for reading type metadata, is a wrapper for this TypeHandle class, so every object that declares itself to be of a given type indirectly points to the data structure that defines it. Let’s look at the declarations of some familiar TypeHandle methods, several of which you also see in the Type class.

    BOOL IsEnum() const;
    BOOL IsFnPtrType() const;
    inline MethodTable* AsMethodTable() const;
    inline TypeDesc* AsTypeDesc() const;
    BOOL IsValueType() const;
    BOOL IsInterface() const;
    BOOL IsAbstract() const;
    WORD GetNumVirtuals() const;
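For comparison, here is how the similarly named properties look from the managed side through the Type class. This is just a sketch: the BloodGroup, IDescribable and Shape types are hypothetical and exist only to give each property something to report on, and Person is cut down to a single method.

```csharp
using System;

class Person { public string GetSomeDetails() { return String.Empty; } }
enum BloodGroup { A, B, AB, O }   // hypothetical, for illustration only
interface IDescribable { }        // hypothetical
abstract class Shape { }          // hypothetical

static class TypeFlagsDemo
{
    // Summarizes the Type properties whose names mirror the TypeHandle members.
    public static string Describe(Type t)
    {
        return String.Format(
            "{0}: IsEnum={1} IsValueType={2} IsInterface={3} IsAbstract={4}",
            t.Name, t.IsEnum, t.IsValueType, t.IsInterface, t.IsAbstract);
    }

    static void Main()
    {
        Console.WriteLine(Describe(typeof(Person)));
        Console.WriteLine(Describe(typeof(BloodGroup)));
        Console.WriteLine(Describe(typeof(IDescribable)));
        Console.WriteLine(Describe(typeof(Shape)));
        Console.WriteLine(Describe(typeof(decimal)));
    }
}
```

Note that an enum reports itself as a value type as well, and an interface reports itself as abstract.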

The MethodTable that you see is the data structure which contains the frequently used fields needed to call methods. Along with another structure called EEClass, it defines the type identity of an object in .NET. The difference is that the MethodTable contains data that is frequently accessed by the runtime, while the EEClass is a larger store of type metadata. This metadata helps in querying type information and dynamically invoking methods through the Reflection API. Using the DumpHeap command of the SOS debugger extension, the addresses of the objects of a given type can be obtained and then used to inspect the MethodTable and EEClass. Let’s examine the Person type from the example above.

.load SOS
extension C:\Windows\Microsoft.NET\Framework\v2.0.50727\SOS.dll loaded
 
!DumpHeap -type Person
PDB symbol for mscorwks.dll not loaded
 Address       MT     Size
020a34d4 001530f0       36
total 1 objects
Statistics:
      MT    Count    TotalSize Class Name
001530f0        1           36 DebugApp.Person
Total 1 objects
 
//Getting the address and using the DumpObj command
 
!DumpObj 020a34d4
Name: DebugApp.Person
MethodTable: 001530f0
EEClass: 001513d0
Size: 36(0x24) bytes
 (D:\Ganesh Ranganathan\Documents\Visual Studio 2005\Projects\DebugApp\DebugApp\bin\Debug\DebugApp.exe)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
70d00b68  4000001        4        System.String  0 instance 020a34b0 _name
70d02db4  4000002        8         System.Int32  1 instance  4454353 _ssn
70d01848  4000003        c          System.Char  1 instance       42 _middleName
70cd7f00  4000004       10       System.Decimal  1 instance 020a34e4 _phoneNumber
70d01848  4000005        e          System.Char  1 instance       4f _bloodGroup

Let’s dissect this output. First, the DumpObj command lists both the MethodTable and the EEClass addresses, and then proceeds to list the fields. See how the Value column lists the direct value for the int and char fields, while an address is listed for the reference type string. However, the bigger decimal type, which is actually a struct and hence a value type, also displays an address. Though SOS displays an address, we can observe that it is just the object’s own address plus the field offset (020a34d4 + 0x10 = 020a34e4), which means the decimal is still stored by value inside the object and not by reference. Looking at the memory window at the addresses of the string and decimal fields gives their original values.
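The address arithmetic can be checked with a few lines of code. This is only a worked calculation using the object address and the offsets from the DumpObj output above; the FieldAddressDemo class is a made-up helper, and nothing here reads real memory.

```csharp
using System;

static class FieldAddressDemo
{
    // Address of a field = address of the object + the field's offset.
    public static uint FieldAddress(uint objectAddress, uint fieldOffset)
    {
        return objectAddress + fieldOffset;
    }

    static void Main()
    {
        const uint person = 0x020a34d4;  // object address from !DumpHeap

        Console.WriteLine("{0:x8}", FieldAddress(person, 0x10)); // _phoneNumber: 020a34e4
        Console.WriteLine("{0:x8}", FieldAddress(person, 0x0c)); // _middleName:  020a34e0
        Console.WriteLine("{0:x8}", FieldAddress(person, 0x0e)); // _bloodGroup:  020a34e2
    }
}
```

The first result matches the value SOS printed for _phoneNumber, and the last two show that the char fields sit two bytes apart.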


Viewing the object in the memory window shows a pattern in how the runtime stores the values in memory. The object starts with a reference to the MethodTable, and then the fields are lined up. It can be observed that the runtime does not store the fields in the order in which we defined them. For example, the two char fields are packed together (offsets c and e) even though they were not declared next to each other. This is done to save memory, and the runtime can manage it because all it stores is the memory offset of each field from the header. To avoid this behavior, types can be decorated with the [StructLayout(LayoutKind.Sequential)] attribute, which is often used while marshalling data out of managed code, because unmanaged code can’t deal with such vagaries. You should also pin your objects, especially references, while passing them to unmanaged code, because the runtime keeps moving memory blocks around.
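Here is a minimal sketch of both ideas, sequential layout and pinning. The Record struct and the LayoutDemo helper are hypothetical names invented for this example. Marshal.OffsetOf reports offsets in the marshalled (unmanaged) view of the type, and C# structs are already sequential by default, so the attribute is spelled out only to make the intent explicit.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct: the small fields are deliberately split by an int,
// so a sequential layout has to insert padding instead of reordering.
[StructLayout(LayoutKind.Sequential)]
struct Record
{
    public byte Flag;
    public int Id;
    public byte Kind;
}

static class LayoutDemo
{
    // Offset of a field in the marshalled layout of Record.
    public static int OffsetOf(string field)
    {
        return Marshal.OffsetOf(typeof(Record), field).ToInt32();
    }

    // Pins a buffer so the GC cannot move it, reads the first byte
    // through its fixed address, then releases the pin.
    public static byte ReadFirstByteWhilePinned(byte[] buffer)
    {
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            return Marshal.ReadByte(address);
        }
        finally
        {
            handle.Free();  // always unpin, or the buffer stays fixed in place
        }
    }

    static void Main()
    {
        Console.WriteLine(OffsetOf("Flag"));  // 0
        Console.WriteLine(OffsetOf("Id"));    // 4
        Console.WriteLine(OffsetOf("Kind"));  // 8
        Console.WriteLine(ReadFirstByteWhilePinned(new byte[] { 7, 8, 9 }));  // 7
    }
}
```

The fields stay in declaration order, at the cost of padding after Flag. Pins are best kept short-lived, since a pinned object gets in the way of the garbage collector compacting the heap.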

Now let’s look at the MethodTable through SOS. As you can see, every type also inherits the methods of its parent, in this case System.Object. The MethodTable also contains a pointer to the EEClass. When the table is laid out during type creation, each method entry points to a temporary piece of code called a thunk. The thunk in turn calls the JIT compiler and asks it to compile the method. This lazy compilation works wonders for performance and the memory footprint. Once the method is compiled, the JIT updates the entry to point to the compiled code instead of the thunk.

!DumpMT -MD 001230f0
EEClass: 001213d0
Module: 00122c5c
Name: DebugApp.Person
mdToken: 02000005  (D:\Ganesh Ranganathan\Documents\Visual Studio 2005\Projects\DebugApp\DebugApp\bin\Debug\DebugApp.exe)
BaseSize: 0x24
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 6
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
70c56aa0   70ad4a34   PreJIT System.Object.ToString()
70c56ac0   70ad4a3c   PreJIT System.Object.Equals(System.Object)
70c56b30   70ad4a6c   PreJIT System.Object.GetHashCode()
70cc7550   70ad4a90   PreJIT System.Object.Finalize()
0012c030   001230c4      JIT DebugApp.Person..ctor(System.String, Int32, Char, System.Decimal, Char)
0012c038   001230d4     NONE DebugApp.Person.GetSomeDetails()

You can see that the JIT column says NONE for the GetSomeDetails method, and that’s because it hasn’t been called yet. After it’s called for the first time, the method is JIT compiled and the MethodDesc shows the code address where the compiled code can be found. Note, however, that the runtime does not normally go through the MethodDesc to execute a method; the call is made directly. The MethodDesc is required only when the method is invoked by its name.

!DumpMT -MD 001230f0
EEClass: 001213d0
Module: 00122c5c
Name: DebugApp.Person
mdToken: 02000005  (D:\Ganesh Ranganathan\Documents\Visual Studio 2005\Projects\DebugApp\DebugApp\bin\Debug\DebugApp.exe)
BaseSize: 0x24
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 6
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
70c56aa0   70ad4a34   PreJIT System.Object.ToString()
70c56ac0   70ad4a3c   PreJIT System.Object.Equals(System.Object)
70c56b30   70ad4a6c   PreJIT System.Object.GetHashCode()
70cc7550   70ad4a90   PreJIT System.Object.Finalize()
0012c030   001230c4      JIT DebugApp.Person..ctor(System.String, Int32, Char, System.Decimal, Char)
0012c038   001230d4      JIT DebugApp.Person.GetSomeDetails()
 
!DumpMD 001230d4
Method Name: DebugApp.Person.GetSomeDetails()
Class: 001213d0
MethodTable: 001230f0
mdToken: 06000007
Module: 00122c5c
IsJitted: yes
CodeAddr: 009f01c8
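The thunk mechanism is easier to picture with a toy model. The sketch below is emphatically not how the CLR implements it; LazySlot is a made-up class that uses a delegate to stand in for a method entry, and a caller-supplied "compile" function to stand in for the JIT. It only mimics the behaviour seen in the two dumps: the entry starts out uncompiled, the first call compiles and patches it, and later calls go straight to the compiled code.

```csharp
using System;

// Toy model of one method entry that starts out pointing at a thunk.
class LazySlot
{
    private Func<string> _entry;   // what the entry currently points to

    public int CompileCount { get; private set; }
    public bool IsCompiled { get; private set; }

    public LazySlot(Func<Func<string>> compile)
    {
        // The thunk: "compile" on first use, patch the entry, run the result.
        _entry = () =>
        {
            Func<string> compiled = compile();
            CompileCount++;
            IsCompiled = true;
            _entry = compiled;     // later calls skip the thunk entirely
            return compiled();
        };
    }

    public string Invoke()
    {
        return _entry();
    }
}

static class JitModelDemo
{
    static void Main()
    {
        var slot = new LazySlot(() =>
        {
            Console.WriteLine("compiling GetSomeDetails...");
            return () => "details";
        });

        Console.WriteLine(slot.IsCompiled);    // False (the NONE state)
        slot.Invoke();                         // prints the "compiling" line once
        slot.Invoke();                         // goes straight to the compiled code
        Console.WriteLine(slot.CompileCount);  // 1
    }
}
```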

In this post we saw the basic functioning of the CLR and how it creates and stores internal data structures to facilitate code execution, while abstracting away all the gory details from developers and letting them concentrate solely on their applications. Under the hood, everything is simply memory addresses pointing to each other and a bunch of assembly code. Giving it such a high degree of structure and definition is by no means an easy task. Hats off to the developers in the .NET and Java teams! I hope to reach their skill level one day. 🙂

Nov 03 2009
 

Ever wondered how .NET actually works? What goes on behind the scenes with the CLR and JIT compilation? Microsoft now gives you a chance to see all of that by releasing the SSCLI (Shared Source Common Language Infrastructure), popularly called Rotor. This is a shared source implementation of the CLI, the specification underlying .NET.

For Microsoft, which is usually seen as the scourge of the open source crowd and doesn’t believe in giving code away, Rotor is a huge step forward. For one, it shows its sincerity in getting more and more developers to code for and extend its products. After all, with its huge financial muscle, MSFT has the best developer talent to build the framework and the IDE, but making the source available goes one step further: revealing the internal workings encourages developers to improve the core product rather than just remain end users.

Setting up Rotor is fairly simple. Here are the steps:

  • Download Rotor here. The download is a 21.6 MB tar.gz file.
  • Decompress the tar.gz file to a local disk.
  • Rotor requires Perl to be installed as well, so download ActivePerl from here.
  • The Perl environment variables need to be set. In Vista, this required a reboot.
  • In the directory where Rotor was extracted, run env.bat. This batch file sets the various environment variables.
  • Once done, run buildall.cmd from the same location. It takes forever to complete, and when it’s done, you end up with 1.2 GB of code you can go through.

After it completes, go to the Samples\Hello directory and execute csc Hello.cs from the command prompt. The file should compile. After this, run clix Hello.exe. If your program says “Hello World”, you are all set!

Once done, this is a good time to start looking at the source code. A good place would be sscli20\clr\src\bcl to get the source code for the BCL without Reflector. More on Rotor later.