Ganesh Ranganathan

Aug 11 2015

If there is one place where developers feel least guilty about allocating large amounts of memory, it’s in the local variables of a method. After all, local variables are short-lived: once method execution is over, the call stack is unwound and the return value is popped. This frees everything they referenced for garbage collection. This assumption holds true for most methods, but not for all.

Let’s consider a simple method first. It allocates a large, one-million-element integer array and returns the length of the array.

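    using System.Linq;
    class Program {
        static void Main(string[] args) {
            var a = new Foo();
            a.Bar();
        }
    }
    class Foo {
        public int Bar() {
            // Allocate a one-million-element array that is only used locally
            var arr = Enumerable.Repeat(1, 1000000).ToArray();
            return arr.Length;
        }
    }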

Let’s compile the program and open it in WinDbg. The commands for doing that are

  • .symfix (fixes the symbol path)
  • sxe ld:clrjit.dll (tells the debugger to break when clrjit.dll is loaded)
  • g (continues execution until clrjit.dll is loaded)
  • .loadby sos clr (loads the SOS managed debugging extension)
  • !bpmd <module> Program.Main (sets a breakpoint that is hit when Program.Main is executed)

Looking at the IL code of the Foo.Bar method, it’s pretty straightforward. A 1 million element array is created and then the ldloc.0 instruction loads the local variable on the stack. After the method returns, the local variable pointer no longer exists and the garbage collector is free to reclaim the memory for other objects.


This works quite well, but imagine a scenario where you need access to the local variable even after the method execution is over. One such scenario is when the method returns a Func delegate instead of an integer.

class Foo {
    public Func<int> BarFunc() {
        var arr = Enumerable.Repeat(1, 1000000).ToArray();
        // The lambda captures arr, so the array must outlive this method call
        return () => arr.Length;
    }
}

Though the delegate returned by this method will always produce the same result as the previous method, the CLR cannot make that assumption and mark the integer array for collection. Because there is no guarantee that the returned delegate will be executed immediately, or even just once, the array has to stay reachable after the method execution is completed. The compiler resolves this dilemma by hoisting the local variable into a field of an autogenerated type, an instance of which is allocated on the heap. Let’s see the IL generated when this new method is called.



This IL is considerably different. The newobj instruction creates an object of a new type, c__DisplayClass1, which we never wrote. That is the type which the compiler autogenerated and used for storing the local variable. Since the instance of the new type lives on the heap, its lifetime is guaranteed for as long as the calling method holds on to the returned delegate. We can verify this by examining the managed heap.



…and the object of the autogenerated type shows our local variable now as a field.



If we modify our Main method a bit and store the resulting delegate in a class-level (static) field, we can see that the GC maintains an explicit root to the object. In essence, the object, along with the array it references, lives until the application exits. This is unnecessary memory usage by the application.

class Program {
    // A static field is a GC root for as long as the application runs
    private static Func<int> classLevelVariable;

    static void Main(string[] args) {
        var a = new Foo();
        classLevelVariable = a.BarFunc();
    }
}

Finding the GC roots for the object, we see that the garbage collector cannot collect it for as long as the static field holds the delegate.


This particular scenario might seem trivial, but in a LINQ-heavy production application it is very easy to lose track of the methods that are creating closures.  Awareness about the promotion of local variables can help prevent memory leaks and improve application performance.


Aug 07 2015

Microsoft recently open sourced the CLR and the framework libraries and published them on GitHub. Though a non-production version has been available for a long time under the name Rotor or SSCLI, this time there were no half measures. It gives the community the opportunity to raise issues and also fix them by creating pull requests.

The journey from source to executable code has two phases – first the compiler compiles the source code into the Intermediate Language (MSIL) and then the execution engine (CLR) converts the IL to machine specific assembly instructions. This allows .NET code to be executable across platforms and also be language agnostic as the runtime only understands MSIL.

When the program is executed, the CLR reads the type information from the assembly and creates in-memory structures to represent them. The main structures that represent a type at runtime are the MethodTable and the EEClass. The MethodTable contains “hot” data which is frequently accessed by the runtime to resolve method calls and for garbage collection. The EEClass, on the other hand, is a cold structure which has detailed structural information about the type, including its fields and methods. This is used in Reflection. The main reason for splitting these structures is to optimize performance by keeping the frequently accessed fields in as small a data structure as possible. Every non-generic type has its own copy of the MethodTable and the EEClass, and the pointer to the MethodTable is stored in the first memory location of each object. We can observe this by loading the SOS managed debugging extension in WinDbg.



The DumpHeap command gives us information about our type along with the addresses of all the objects of that type. Using the WinDbg command dq to read the memory at an object's address, we see that the first value stored there points to its MethodTable. There is another structure called the SyncBlock, which is referenced at a negative offset from the MethodTable pointer in memory. This structure handles the thread synchronization information for the object.

This diagram from the SSCLI Essentials Book explains the relationship between various data structures very clearly.


As you can see, the object header points to the MethodTable, which in turn points to the EEClass. Since the EEClass is not frequently used during runtime, this extra level of indirection doesn’t hurt performance. The MethodTable itself is followed by a call table, a table which contains the addresses of the virtual and non-virtual methods to be executed for the type. Since this table is laid out at a fixed offset from the MethodTable, there is no pointer indirection to access the right method to call. One more thing to note about the CLR is that everything is loaded only when it needs to be executed. This holds true for both types and methods. When the CLR executes a method which creates another type, it creates the memory structures for the new type. However, even then the methods themselves are not compiled until the last moment, when they actually need to be executed.

In the above diagram, you can see the MethodTable vtable pointing to a thunk, which is called a prestub in .NET. When the method is first called, the prestub calls the JIT compiler. The JIT compiler is responsible for reading the MSIL opcodes and generating the processor-specific machine code. Once the JIT compilation is done, the address at which the compiled code resides is backpatched onto the call table. Subsequent calls to the method are executed directly without having to go through the compilation phase.

We can dump the MethodTable for our Calculator type using the !DumpMT command with the -MD switch, which also lists the MethodDescriptors.


At this stage in the application execution, the object for the Calculator class has been created but the AddTwoNumbers method hasn’t been executed yet. So the MethodDesc table shows that only the constructor has been jitted, not the AddTwoNumbers method. We can inspect the MethodDescriptors for both methods using the !DumpMD command.



The Constructor method now contains a code address, but the AddTwoNumbers doesn’t have code yet. Let’s step forward and see what happens after the method is jitted. Now the code address is replaced by an actual memory address which contains our machine specific assembly code. The next time this method is called, this assembly code will be directly executed.


To view the assembly, use the !u command followed by the code address. As in most x86 code, two registers, ebp and esp, keep track of each stack frame. During a method call a new stack frame is created and ebp holds a pointer to the base of that frame. As code executes, the esp register keeps track of how the stack grows, and once execution completes, the frame is cleared and the saved ebp value is popped.



Now let's look at this at the code level. Detailed building and debugging instructions are given in the coreclr repo. The MethodTableBuilder class contains the method which loads the types. You could put a breakpoint there, but it will keep breaking while system types are loading. To avoid this, put a breakpoint in the RunMain method in assembly.cpp, and once it breaks, put the breakpoint in the CreateTypeHandle method. It will then start breaking on the creation of your custom types.


Below is the simple Calculator class code that we are calling. I just used the name of the executable as a Command Argument to run CoreRun.exe in the coreclr solution (Detailed instructions given in Github repo)



Now for the fun part – we start debugging the solution. The first step (after loading allocators) is to make sure all parent types are loaded. Since our type doesn’t inherit from any class, its parent is System.Object. Once the parent type is found (it can’t be an interface, only a concrete type), its method table is returned to the MethodTableBuilder.



Then there are some additional checks to handle cases like enums, generic methods, explicit layouts etc. I’ll skip over them for brevity. At this point we have started to build the MethodTable but not the EEClass. That is done in the next step.



At this stage, the CLR checks whether the type implements any interfaces. Interface calls are a bit more complex because there needs to be a relationship from the interface vtable to the implementing type: the calls are mapped using a slot map maintained on the implementing type’s MethodTable, which maps it to the vtable slot on the interface. Since our Calculator class doesn’t implement any interfaces, it skips this block entirely.


Now we go into the final and most crucial method which will finally return the TypeHandle. If this method succeeds, then our type has been successfully loaded into memory.


The first thing the BuildMethodTableThrowing method does is walk up the inheritance hierarchy and load the parent type. This holds for all types except interfaces. An interface’s vtable does not contain System.Object’s methods, so the builder simply sets the parent type to null if the type being loaded is an interface.


After this, the method makes sure the type in question is not a value type, enum, remoting type, or being called via COM Interop. All of these would be loaded differently than simple reference types deriving directly from System.Object. Then the MethodImpl attributes are checked since they impact how a type is loaded. Our Calculator class just skips over these checks. The next method is EnumerateClassMethods, which iterates through all the methods and adds them to the MethodTable.

Now that the implemented methods are added to the MethodTable, we also need to add the parent type’s methods to the current vtable. This is done by the methods ImportParentMethods, AllocateWorkingSlotTables and CopyParentVtable in the MethodTableBuilder class. Here virtual methods have to be handled differently since they can be overridden by the current type. For non-virtual methods, a direct entry to the methods implemented by the parent type should suffice.

First the maximum possible vtable size is computed. Next a temporary table is allocated for the maximum possible size of the table


Then the parent vTable methods are loaded to the Calculator type.


After the Parent methods are added, the current type’s methods are added. We just have two methods – the Constructor and the AddTwoNumbers method. Here first the Virtual Methods are added and then the Non-Virtual ones. Since we didn’t define a custom constructor, it will just inherit the Default constructor and add it in the vtable. Once all virtual methods are added, the remaining methods will get the non vtable slots.


Now that the type’s methods have been completely loaded, the MethodDescriptors are created. However, none of the methods has been called even once, so each simply points to a stub, waiting to be JIT compiled on first execution. After this stage the remaining fields are placed in the MethodTable and some additional integrity checks are done. Finally the type is loaded and is ready to be published.



Jun 12 2015

My general approach to extracting data from any API is to load the data into a relational database and then write SQL queries on top of it to get the required information. Though this works, it's often tedious and involves running multiple applications.

The R programming language works great for statistical computation and plotting graphics, and I have been tinkering around with it for the last few weeks. While learning R, I thought of using it to extract data from the API as well. This would allow me to pull the latest data from the API and compute stats with a single script. And though the XML package in R doesn’t make for the most intuitive parsing code, the vectorized operations reduce the need for frequent loops and keep the code concise and readable.

And though this code is written for the Socialcast API, it can be easily tweaked to pull data from any social API like Facebook, Yammer etc. The first step is to pull the data from the API – the RCurl package gets us the data which can then be parsed using the XML package.


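library(RCurl)  # getURL
library(XML)    # xmlParse, getNodeSet, xpathSApply, xmlValue

page <- 1
finalDataFrame <- NULL
continueLoading <- TRUE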

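getInnerText <- function(inputData,parentNode,childNode) {
  test <- xpathSApply(inputData,parentNode,function(x){
    if(is.null(x[childNode][[childNode]])) {
      ""  # assumed placeholder when the child node is missing
    }else {
      xmlValue(x[childNode][[childNode]])
    }
  })
  test
}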

while(continueLoading) {

  messagesData <- getURL(paste("",page,sep=""),
                         userpwd="", ssl.verifypeer = FALSE, httpauth = 1L)
  print(paste("LOADING PAGE:",page))
  data <- xmlParse(messagesData)
  totalMessages <- length(getNodeSet(data,"//messages/message"))

The totalMessages variable holds the number of messages returned by the API. When it’s zero, the while loop exits; otherwise execution continues. The xmlParse function gives us an in-memory structure of the document which can be iterated over. We use the sapply function, which applies a function to each element of a list and returns a vector. I’ll come to the getUserNodeValue function later.

  if (totalMessages == 0){
    continueLoading = FALSE
  } else {
    tempDataFrame <- data.frame(
      InteractionType = "Message",
      ID = sapply(getNodeSet(data, "//messages/message/id"),xmlValue),
      Author = sapply(getNodeSet(data,"//messages/message/user/name"),xmlValue),
      Body = sapply(getNodeSet(data,"//messages/message/body"),xmlValue),
      Url = sapply(getNodeSet(data,"//messages/message/permalink-url"),xmlValue),
      Type = sapply(getNodeSet(data,"//messages/message/message-type"),xmlValue),
      CreatedAt = sapply(getNodeSet(data,"//messages/message/created-at"),xmlValue),
      Location = sapply(getNodeSet(data,"//messages/message/user/id"),function(x){getUserNodeValue(x,"Location")}),
      Country = sapply(getNodeSet(data,"//messages/message/user/id"),function(x){getUserNodeValue(x,"Country")}),
      Sector = sapply(getNodeSet(data,"//messages/message/user/id"),function(x){getUserNodeValue(x,"Sector")}),
      Title = sapply(getNodeSet(data,"//messages/message/user/id"),function(x){getUserNodeValue(x,"Title")}),
      Department = sapply(getNodeSet(data,"//messages/message/user/id"),function(x){getUserNodeValue(x,"Department")})
    )

    if (is.null(finalDataFrame)) {
      finalDataFrame <- tempDataFrame
    } else {
      finalDataFrame <- rbind(finalDataFrame,tempDataFrame)
    }

Now we have a data frame with all the Messages from the API. However, we also need the comments and likes. This is the only place where I needed to use a for loop to iterate through each individual message node and select their comments. The xpathSApply function reduces our code further by being able to query each node of the NodeSet with the given XPath expression and applying a function on it. Furthermore it returns a vector which fits in nicely into our existing data frame.

   for( i in 1:length(getNodeSet(data,"//messages/message"))) {
      if(length(getNodeSet(data,paste("//messages/message[position()=",i,"]/comments/comment"))) > 0){

        allComments <- getNodeSet(data,paste("//messages/message[position()=",i,"]/comments"))[[1]]


        commentFrame <-  data.frame(
          InteractionType = "Comment",
          ID = xpathSApply(allComments,"comment/id",xmlValue),
          Author = xpathSApply(allComments,"comment/user/name",xmlValue),
          Body = xpathSApply(allComments,"comment/text",xmlValue),
          Url = xpathSApply(allComments,"comment/permalink-url",xmlValue),
          Type = "",
          CreatedAt = xpathSApply(allComments,"comment/created-at",xmlValue),
          Location = xpathSApply(allComments,"comment/user/id",function(x){getUserNodeValue(x,"Location")}),
          Country = xpathSApply(allComments,"comment/user/id",function(x){getUserNodeValue(x,"Country")}),
          Sector = xpathSApply(allComments,"comment/user/id",function(x){getUserNodeValue(x,"Sector")}),
          Title = xpathSApply(allComments,"comment/user/id",function(x){getUserNodeValue(x,"Title")}),
          Department = xpathSApply(allComments,"comment/user/id",function(x){getUserNodeValue(x,"Department")})
        )

        finalDataFrame <- rbind(finalDataFrame,commentFrame)
      }

      if(length(getNodeSet(data,paste("//messages/message[position()=",i,"]/likes/like"))) > 0){

        allLikes <- getNodeSet(data,paste("//messages/message[position()=",i,"]/likes"))[[1]]

        likeFrame <-  data.frame(
          InteractionType = "Like",
          ID = xpathSApply(allLikes,"like/id",xmlValue),
          Author = xpathSApply(allLikes,"like/user/name",xmlValue),
          Body = "",
          Url = "",
          Type ="",
          CreatedAt = xpathSApply(allLikes,"like/created-at",xmlValue),
          Location = xpathSApply(allLikes,"like/user/id",function(x){getUserNodeValue(x,"Location")}),
          Country = xpathSApply(allLikes,"like/user/id",function(x){getUserNodeValue(x,"Country")}),
          Sector = xpathSApply(allLikes,"like/user/id",function(x){getUserNodeValue(x,"Sector")}),
          Title = xpathSApply(allLikes,"like/user/id",function(x){getUserNodeValue(x,"Title")}),
          Department = xpathSApply(allLikes,"like/user/id",function(x){getUserNodeValue(x,"Department")})
        )

        finalDataFrame <- rbind(finalDataFrame,likeFrame)
      }
    }  # end of the for loop over messages
  }    # end of the else branch
  page <- page + 1
}      # end of the while loop



Now we come to the getUserNodeValue function. This is simply a performance optimization, since calling the API to get the user details each time becomes very time-consuming. So I generally keep the user data in a database, load it into a data frame (users), and use the id in the XML response to query that data frame and fetch the correct user record. This step, however, is purely optional and you could just as easily call the API to get each user’s details and parse the response.

getUserNodeValue <- function(inputNode,queryNode){
  # No matching user: return an empty string (assumed fallback value)
  if (nrow(users[users$ID == xmlValue(inputNode),]) == 0) return("")
  users[users$ID == xmlValue(inputNode),][[queryNode]]
}

At this point we have all the API information parsed into a data frame (finalDataFrame). Now for the fun part! Though you can subset and count easily using the built in language functions, a package called dplyr makes this code more readable and intuitive. With dplyr you can perform multiple data manipulation operations like filter, select, order, group by etc and chain them together to get the final result

So to group the data frame by a column and count, the code is as simple as

library(dplyr)

#############Type of Activity in the Group#####################################

interactionType <- group_by(finalDataFrame,InteractionType) %>%
                   summarise(count = n())



#############Active Day of Week#############################

activeDay <- group_by(finalDataFrame,weekdays(as.Date(CreatedAt))) %>%
 summarise(count = n())


The top 5 users by total Activity

activeUsers <- group_by(finalDataFrame,Author) %>%
 summarize(TotalActivity=n()) %>%
 arrange(-TotalActivity) %>%
 head(5)


The Type of Messages being created

messageTypes <- filter(finalDataFrame,InteractionType == "Message") %>%
 group_by(Type) %>%
 summarize(count = n())


The stats shown here barely scratch the surface of what R is capable of computing.

Aug 23 2014

When using Omniauth’s oauth strategy for authenticating to any oauth enabled website, we often run into the problem of expired access tokens.

To refresh the access token, another call must be made to oauth2/token endpoint with the client id, client secret and the refresh token. Since this is not available out of the box in Omniauth-OAuth, I wrote some additional code in the User model file.

This call will return another JSON with a new access token and an updated expiry time. We need to save the access token and update the expiry time in our model. Keep in mind that this flow works only till the refresh token is valid. Once that expires, the entire oauth authorization workflow needs to be repeated.
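To make the shape of that response concrete, here is a minimal sketch of turning the token endpoint's JSON reply into the two values the model has to store. The helper name parse_refresh_response and the sample payload are made up for illustration; only the access_token and expires_in keys come from the flow described above.

```ruby
require 'json'

# Hypothetical helper (not part of Omniauth): extracts the new access token
# and computes the absolute expiry time from the relative expires_in value.
# `now` is a parameter so the calculation can be checked deterministically.
def parse_refresh_response(body, now = Time.now)
  data = JSON.parse(body)
  {
    token: data.fetch('access_token'),
    expires_at: now + data.fetch('expires_in').to_i
  }
end

# Made-up sample payload in the shape a token endpoint typically returns
sample_body = '{"access_token":"new-token","token_type":"bearer","expires_in":3600}'
issued_at   = Time.at(1_000_000)
refreshed   = parse_refresh_response(sample_body, issued_at)

puts refreshed[:token]
puts refreshed[:expires_at] > issued_at
```

Keeping the parsing separate from the HTTP call makes it easy to check the expiry arithmetic without hitting the provider.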

This is the code I added in the user.rb model class of my rails application

def refresh_token_if_expired
  if token_expired?
    # Ask the provider's token endpoint for a new access token (uses the rest-client gem)
    response    = RestClient.post "#{ENV['DOMAIN']}oauth2/token", :grant_type => 'refresh_token', :refresh_token => self.refresh_token, :client_id => ENV['APP_ID'], :client_secret => ENV['APP_SECRET']
    refreshhash = JSON.parse(response.body)

    self.token     = refreshhash['access_token']
    self.expiresat = Time.now + refreshhash["expires_in"].to_i.seconds
    self.save
    puts 'Saved'
  end
end

def token_expired?
  expiry = Time.at(self.expiresat)
  return true if expiry < Time.now # expired token, so we should quickly return
  token_expires_at = expiry
  save if changed?
  false # token not expired. :D
end

ENV[‘DOMAIN’] is the base URL of the OAuth provider. The client_id and client_secret are provided when the application is registered with the provider. Before making any authenticated call, simply call refresh_token_if_expired, which checks whether the access token has already expired and refreshes it if it has.

#refresh the token if it has expired

P.S: There is a dependency on rest-client gem.

P.P.S: The OAuth 2.0 specification (RFC 6749) describes the refresh token and its workflow.

Jul 25 2013

Here is a script that I wrote to back up a Trello board to my organization’s Fogbugz wiki. The code is available on GitHub and the script can be set up as a cronjob to take a regular back up of your Trello Board. The main advantage is that you don’t have to maintain the same information in two different places.

If you want to change the HTML formatting, just edit the get_output_html method. The file has instructions on how to get the API tokens and the list of dependencies that you need to install before running the ruby script.
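As a rough idea of what such a formatting change involves, here is a small stand-alone sketch of the same kind of HTML generation. It does not use the Trello gem: SketchCard, SketchList and render_lists are hypothetical names, and escaping the text with CGI.escapeHTML is my own addition rather than something the script does.

```ruby
require 'cgi'

# Hypothetical stand-ins for the List and Card wrappers used by the script
SketchCard = Struct.new(:name, :description)
SketchList = Struct.new(:name, :cards)

# Renders each list as a heading followed by its cards.
# Text is HTML-escaped and newlines in descriptions become <br /> tags.
def render_lists(lists)
  html = +''
  lists.each do |list|
    html << "<h2>#{CGI.escapeHTML(list.name)}</h2>"
    list.cards.each do |card|
      html << "<h3>#{CGI.escapeHTML(card.name)}</h3>"
      html << "Description: " << CGI.escapeHTML(card.description).gsub("\n", '<br />')
      html << "<hr />"
    end
  end
  html
end

board_lists = [SketchList.new('To Do', [SketchCard.new('Fix <bug>', "line1\nline2")])]
output = render_lists(board_lists)
puts output
```

Changing the tags inside render_lists is all it takes to restyle the page, which is the same idea as editing get_output_html in the real script.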

require 'trello'
require 'open-uri'
require 'htmlentities'
require 'uri'
require 'net/http'
require 'net/smtp'
require 'fogbugz'
require 'json'

# Author : Ganesh Ranganathan
# Description: The script copies all the information on a Trello Board to a
# fogbugz wiki page. It can be set as a cron job to copy the latest state of
# your trello board and avoid having to duplicate the information
# to your organization's fogbugz wiki

module Constants

	module Fogbugz
		URI = '<Fogbugz_URL>' #The URI Endpoint of your fogbugz deployment
		API_URL = '<FOGBUGZ_API_URL>' #The API url of your fogbugz deployment. Usually ends with api.asp
		FOGBUGZ_TOKEN = "<Enter API Token>" #The API Token
		WIKI_ARTICLE_ID = 0 #Wiki Article ID where the information has to be copied.WARNING: Existing info will be deleted
		WIKI_PAGE_TITLE = 'Sample Trello Board Title' #Title of the Wiki page

	module Trello
		#Fill Trello OAuth Key, AppSecret and token and the board ID
		#only use a read only token for this script. Since we dont want to delete data even by mistake

	module Email
		#Email Details to notify user via Email
		SMTP_SERVER = 'Enter SMTP Server'
		FROM_EMAIL_ADDRESS = 'From Email Address'

module TrelloModule

 class Board

 	attr_accessor :title 
 	attr_accessor :members
 	attr_accessor :lists

 	def initialize(trello_board)
 		#initialize Arrays
 		@members = []
 		@lists = []
 		#populate title, members and lists
 		@title = trello_board.name
 		trello_board.members.each{ |member| @members.push(User.new(member)) }
 		trello_board.lists.each{ |list| @lists.push(List.new(list)) }
 	end
 end

 class List
 	attr_accessor :cards
 	attr_accessor :name

 	def initialize(trello_list)
 		#initialize Arrays
 		@cards = []
 		#populate basic variable
 		@name = trello_list.name
 		#Populate Cards Array
 		trello_list.cards.each{ |card| @cards.push(Card.new(card)) }
 	end
 end

 class Card
 	attr_accessor :comments
 	attr_accessor :name 
 	attr_accessor :description
 	attr_accessor :members

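 	def initialize(trello_card)
 		#Initialize Arrays
 		@comments = []
 		@members = []

 		@name = trello_card.name
 		@description = trello_card.description

 		if trello_card.members.count > 0
 			#populate Users
 			trello_card.members.each{ |member| @members.push(User.new(member)) }
 		end
 		#populate Comments from the card's "commentCard" actions (assumes the gem's Card#actions)
 		trello_card.actions.select{ |action|
 			action.type == "commentCard"
 		}.reverse.each {|comment| @comments.push(Comment.new(comment)) }
 	end
 end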

 class User
 	attr_accessor :full_name

 	def initialize(trello_member)
 		@full_name  = trello_member.full_name


 class Comment
 	attr_accessor :text
 	attr_accessor :creator

 	def initialize(trello_comment)
 		@text =["text"]
 		@creator =


 class Helper

 	def self.get_trello_member(member_id)

 	#this method generates the output html
 	def self.get_output_html(board)
 		body_html = "<h2>Members</h2>"
 		board.members.each{ |member| body_html << member.full_name << "<br />" }
 		body_html << "<br /><h2>Lists</h2>"
 		board.lists.each { |list| body_html << list.name << "<br />" }
 		body_html << "<br /><h2>Cards</h2>"
 		board.lists.each { |list| list.cards.each {|card|
 			body_html << "<h3>" << card.name << "</h3>"
 			body_html << "Description: " << card.description.gsub(/\n/, '<br />')
 			body_html << "<br />Assigned To:"
 			card.members.each { |member| body_html << member.full_name << ", "  }
 			body_html << "<br />Comments:<br /><ul>"
 			card.comments.each { |comment|
 				body_html << "<li><span style='font-size:14px; line-height:14px'><b>" << comment.creator.full_name << "</b>: " << comment.text << "</span></li>"
 			}
 			body_html << "</ul><hr />"
 		} }
 		#return the generated html
 		body_html
 	end

 	def self.write_to_fogbugz(body_html)

 		#The fogbugz-ruby gem doesn't work for large wiki pages because it tries to send the body in the URL and fails when the size limit is breached:
		#fogbugz = Fogbugz::Interface.new(:token => Constants::Fogbugz::FOGBUGZ_TOKEN, :uri => Constants::Fogbugz::URI)
		#response = fogbugz.command(:editArticle, :sBody => body_html, :ixWikipage => Constants::Fogbugz::WIKI_ARTICLE_ID, :sHeadLine => Constants::Fogbugz::WIKI_PAGE_TITLE)
		#puts response
		#So the article body is posted as form data with Net::HTTP instead
		uri = URI.parse("#{Constants::Fogbugz::API_URL}?cmd=editArticle&token=#{Constants::Fogbugz::FOGBUGZ_TOKEN}&ixWikipage=#{Constants::Fogbugz::WIKI_ARTICLE_ID}&sHeadLine=#{URI.encode_www_form_component(Constants::Fogbugz::WIKI_PAGE_TITLE)}")
		http = Net::HTTP.new(uri.host, uri.port)
		http.use_ssl = false
		http.verify_mode = OpenSSL::SSL::VERIFY_NONE
		request = Net::HTTP::Post.new("#{uri.path}?#{uri.query}")
		request.set_form_data('sBody' => body_html)
		response = http.request(request)
		puts response.body
	end

 	def self.send_email(output, recipient)
 		message = <<MESSAGE_END
From: Trello Admin <#{Constants::Email::FROM_EMAIL_ADDRESS}>
To: A Test User <#{recipient}>
MIME-Version: 1.0
Content-type: text/html
Subject: #{Constants::Fogbugz::WIKI_PAGE_TITLE} backup

#{output}
MESSAGE_END

		Net::SMTP.start(Constants::Email::SMTP_SERVER) do |smtp|
			smtp.send_message message, Constants::Email::FROM_EMAIL_ADDRESS, recipient
		end
	end

class Main

	def initialize
		board =
		output = Helper.get_output_html(board)

	def init_trello_api
		Trello::Authorization.const_set :AuthPolicy, Trello::Authorization::OAuthPolicy
		Trello::Authorization::OAuthPolicy.consumer_credential =  Constants::Trello::TRELLO_OAUTH_KEY, Constants::Trello::TRELLO_OAUTH_APPSECRET
		Trello::Authorization::OAuthPolicy.token = Constants::Trello::TRELLO_OAUTH_APPTOKEN , nil

Dec 21 2012

In my previous post, I blogged about how to access the Socialcast community data without using the API. This is usually necessary when the API doesn't support a particular piece of functionality which is provided by the site.

This is true of updating the user’s profile avatar. Though there is a way to update the user profile through the API, there is no obvious method of updating the user’s avatar. I asked Socialcast on Twitter, but they didn’t answer, so I went ahead and tried using Mechanize to log in to the site.

I was finally able to update the profile avatar using the below script. Works like a charm.

require 'mechanize'
agent = Mechanize.new
agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
form =
puts "Please enter user email id"
form.email = gets.chomp
puts "Please enter password. caution: it is not masked"
form.password= gets.chomp
puts "Please enter username"
agent.get ("")
form ={ |f| f.file_upload_with(:name => "profile_photo[data]") }
puts "Please enter file path of the image to replace"
form.file_uploads.first.file_name = gets.chomp

Dec 21 2012

The Socialcast REST API provides programmatic access to the Socialcast community data with XML and JSON endpoints. The API provides most of the information one would require to extract out of the site but there are still gaps where the API is not up to date.

This made me look into the possibility of scraping the site directly using cURL and parsing the generated HTML. However, Socialcast is built on Rails, which has a security feature to prevent cross-site request forgery: an authenticity token, a random value that is generated and sent with every response, embedded in a hidden form field. When the form is posted back, this token is checked and an error is generated if it’s not found. This makes direct scraping of the page difficult, and cURL fails. Googling gave me a few articles which explained how to use cURL with sites protected by the authenticity token (Link1, Link2), but unfortunately none of them seemed to work.
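The token round trip described above can be sketched with nothing but the standard library. The HTML below is a made-up login form, and the field names email and password as well as the helper names are assumptions for illustration; the regular expression also assumes the name attribute comes before value, which a real page may not guarantee.

```ruby
require 'uri'

# Made-up login form resembling what a Rails app renders
LOGIN_HTML = <<~HTML
  <form action="/login" method="post">
    <input type="hidden" name="authenticity_token" value="abc123==" />
    <input type="text" name="email" />
    <input type="password" name="password" />
  </form>
HTML

# Pulls the token out of the hidden form field (nil when it is absent)
def extract_authenticity_token(html)
  match = html.match(/name="authenticity_token"\s+value="([^"]*)"/)
  match && match[1]
end

# Builds the form-encoded body that has to be posted back with the token
def build_login_body(html, email, password)
  URI.encode_www_form(
    'authenticity_token' => extract_authenticity_token(html),
    'email' => email,
    'password' => password
  )
end

token = extract_authenticity_token(LOGIN_HTML)
body  = build_login_body(LOGIN_HTML, 'user@example.com', 'secret')
puts token
puts body
```

A real client also has to send back the session cookie it received with the form, since the token is tied to the session.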

Then I came across a suggestion to use Mechanize, a ruby library to automate interaction with websites. Mechanize works like a charm with sites protected by an authenticity token. Here is the ruby script to login to the Socialcast Demo site.

require 'mechanize'
agent = Mechanize.new
agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
form = = ""
form.password= "demo"

In Interactive Ruby, we can see that the authenticity token is returned when the first GET is called on the login page. And when the form is submitted the token is posted back to the server and we are redirected to the home page.


From here on, we can automate any interaction with the site just as a normal user would, without worrying about the authenticity token restriction. In my next post, I will explain how to automatically update a user’s avatar without relying on the API.

Jul 12 2012

Oh how I wish there was an out-of-the-box solution to do this, where you could just enter the source and destination URLs and voila! the entire blog was migrated. Unfortunately there isn't 🙁 (or at least I am not aware of one).

So when I set out to migrate around 1000 posts from 16 different SharePoint blogs to WordPress, there were quite a few challenges on the way. The first step is to read the SharePoint data. This would have been far easier with proper access to the SharePoint database, but unfortunately there was way too much red tape wrapped around it to get access quickly enough to start immediately. Hence I took the dirty way out: scraping the HTML pages and retrieving the data out of them. Scraping HTML is usually the worst possible way to extract data from any source, since the code is rarely ever reusable, bloated to handle nested tags and often filled with branching statements to handle special cases. Only go for the scraping approach if you don’t have any other way to access the data. What makes scraping bearable is the wonderful HTML Agility Pack, which I had blogged about earlier. Its XML-like approach to traversing HTML makes this activity quite easy.

This is the object that I used to represent each blog to be imported. It contains the source URL, destination blog, author Id. Each object represents the source site as well as the destination information needed to write to WordPress.

    class BlogsToBeMigrated
    {
        public string Url { get; set; }
        public int SiteID { get; set; }
        public int AuthorId { get; set; }
        public string DestinationBlogName { get; set; }

        public BlogsToBeMigrated(string url, int siteId, int authorID, string destination)
        {
            Url = url;
            SiteID = siteId;
            AuthorId = authorID;
            DestinationBlogName = destination;
        }
    }

This is the code to read each blog entry. Note that it takes in a blog object and extracts information from it. The concatenation step depends on how your SharePoint site is structured. Just make sure the URL points to the list view of the blog, where all posts are listed in tabular form. This lets us take out each link and fetch the content to add to WordPress. To find the XPath expressions that navigate to the right nodes, I used the Chrome extension XPath Helper. The expressions could be different for every SharePoint site, so just play around with the XPath until you get the required information.

private static void MigrateSingleBlog(BlogsToBeMigrated blog)
{
    // List view of the blog, where every post appears as a row in a table
    var siteUrl = String.Concat("YOUR SITE URL HERE", blog.Url, "/Lists/Posts/AllPosts.aspx");
    string siteList = ReadWebPage(siteUrl);
    var listDoc = new HtmlDocument();
    listDoc.LoadHtml(siteList);
    var siteListNodes = listDoc.DocumentNode.SelectNodes("//td[@class='ms-vb-title']/table/tr/td/a");
    if (siteListNodes == null)
        return; // SelectNodes returns null when nothing matches

    foreach (var site in siteListNodes)
    {
        // Follow the link to the individual post and parse that page
        var postUrl = String.Concat("YOUR SITE URL HERE", site.GetAttributeValue("href", "href"));
        var PageDump = ReadWebPage(postUrl);
        var postDoc = new HtmlDocument();
        postDoc.LoadHtml(PageDump);
        var siteContent = postDoc.DocumentNode.SelectSingleNode("//div[@class='ms-PostWrapper']");

        string postDate = siteContent.ChildNodes[0].InnerText;
        string postTitle = siteContent.ChildNodes[1].ChildNodes[0].ChildNodes[0].InnerText.Trim();
        var postContent = postDoc.DocumentNode.SelectSingleNode("//div[@class='ms-PostWrapper']/div[@class='ms-PostBody']/div");

        var postHtml = postContent.InnerHtml;

        MoveToWordPress(postDate, postTitle, postHtml, blog.Url, blog.SiteID, blog.AuthorId, blog.DestinationBlogName);
        Console.WriteLine(postDate + postTitle);
    }
}
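For readers without HTML Agility Pack at hand, the link-collection step can be sketched with Python's standard library. The standard parser has no XPath support, so this version tracks open `td` cells by hand. The sample markup and the base URL are hypothetical stand-ins for a SharePoint list view.

```python
from html.parser import HTMLParser


class PostLinkCollector(HTMLParser):
    """Collects href values of <a> tags nested inside <td class="ms-vb-title">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._td_stack = []  # one bool per open <td>: is it a title cell?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            self._td_stack.append(attrs.get("class") == "ms-vb-title")
        elif tag == "a" and any(self._td_stack) and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "td" and self._td_stack:
            self._td_stack.pop()


# Hypothetical fragment of a list view: one title cell, one unrelated cell.
SAMPLE_LIST_VIEW = """
<table><tr>
  <td class="ms-vb-title"><table><tr><td>
    <a href="/blog/Lists/Posts/Post.aspx?ID=1">First post</a>
  </td></tr></table></td>
  <td class="ms-vb2"><a href="/other">not a post</a></td>
</tr></table>
"""

BASE_URL = "https://sharepoint.example.com"  # placeholder, not a real site

collector = PostLinkCollector()
collector.feed(SAMPLE_LIST_VIEW)
post_urls = [BASE_URL + href for href in collector.links]
```

The hand-rolled cell tracking is exactly the kind of bookkeeping that an XPath query hides, which is why a parser with XPath support is the more comfortable choice for this job.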


Also remember to put the line below in the constructor of your class, before any of the HTML Agility Pack code is executed. Forms can be tricky elements in HTML because they are allowed to overlap with other tags, which makes the markup difficult to parse. For this reason HTML Agility Pack parses form tags as empty elements by default, and removing the flag lets you look inside them.

    HtmlNode.ElementsFlags.Remove("form");

Now that I had access to all the SharePoint data, the challenge was to enter it into the right WordPress blogs. I took a look at CSV Importer, which allows bulk import of posts from CSV files. This didn't work properly at all, since the post content was way too large for the CSV file to be parsed properly. Even after numerous attempts to sanitize the CSV and escape each line break and comma, the import still ignored many valid posts and filled others with gibberish. Then I thought of inserting the data directly into the WordPress database. Initially I was skeptical, since WordPress might fill some related tables when a post is published, but I found that there were no such problems. Directly inserting the data worked like a charm. It also allowed me to rerun the migration each time I noticed an issue with improperly rendered markup.

This is the method that inserts directly into the WordPress posts table. Note that a multisite installation has a separate posts table per blog, with the blog's number in the table name, e.g. wp_2_posts, wp_3_posts and so on. The index for the table name is in the BlogsToBeMigrated object, as I had manually created each WordPress blog corresponding to a SharePoint blog. This step is fairly simple. It just creates a connection to the MySQL database using the connector DLLs, gets the maximum ID, increments it and uses that to insert a new post. The code isn't production standard, but this isn't really something that I am looking to maintain for a long time. Until the migration is done right, we can just keep repeating it with the required fixes, and once it's finished you have a functioning site with no need to migrate anymore. Pragmatism wins.

private static void MoveToWordPress(string postDate, string postTitle, string postContent, string postUrl, int blogID, int authorID, string destinationBlogName)
{
    // Remember to include ConvertZeroDateTime=true in the connection string
    MySqlConnection wordpressConn = new MySqlConnection("DATABASE_CONNECTION STRING; ConvertZeroDateTime=true");

    using (wordpressConn)
    {
        wordpressConn.Open();

        // Next free ID: 1 for an empty table, otherwise MAX(ID) + 1
        int maxID = 0;
        var id = new MySqlCommand(String.Format("Select Max(ID) from wp_{0}_posts", blogID), wordpressConn).ExecuteScalar();
        if (id.GetType() == typeof(System.DBNull))
            maxID = 1;
        else
            maxID = Convert.ToInt32(id) + 1;

        string SQLCommandText = "Insert into wp_{0}_posts (id,post_author,post_date,post_content,post_title,post_excerpt,post_status,comment_status,ping_status,post_name,to_ping,pinged,post_modified,post_modified_gmt,post_content_filtered,post_parent,guid,menu_order,post_type,comment_count)";
        SQLCommandText += " values (?ID,{1},?postDate,?postBody,?postTitle,'','publish','open','open',?postName,'','',?postDate,?postDate,'',0,?postGuid,0,'post',0)";
        MySqlCommand insertPost = new MySqlCommand(String.Format(SQLCommandText, blogID, authorID), wordpressConn);

        // Point links and thumbnails of migrated attachments at their new location
        var siteName = "YOUR_BLOG_URL_HERE/{0}/files/{1}";
        HtmlDocument span = new HtmlDocument();
        span.LoadHtml(postContent);
        var imagenodes = span.DocumentNode.SelectNodes("//a/img");
        if (imagenodes != null)
        {
            foreach (var image in imagenodes)
            {
                var imageUrl = image.ParentNode.GetAttributeValue("href", "href");
                var imageThumbUrl = image.GetAttributeValue("src", "src");
                if (imageUrl.Contains("/sites"))
                {
                    var migratedFileName = imageUrl.Replace(String.Concat(postUrl, "/Lists/Posts/Attachments/"), string.Empty);
                    postContent = postContent.Replace(string.Format("href=\"{0}", imageUrl), String.Format("href=\"{0}", String.Format(siteName, destinationBlogName, migratedFileName)));
                    postContent = postContent.Replace(string.Format("src=\"{0}", imageThumbUrl), String.Format("src=\"{0}", String.Format(siteName, destinationBlogName, migratedFileName)));
                }
                else
                {
                    // Not a migrated attachment: show the full-size image instead of the thumbnail
                    postContent = postContent.Replace(string.Format("src=\"{0}", imageThumbUrl), String.Format("src=\"{0}", imageUrl));
                }
            }
        }

        // Replace an embedded-media placeholder with the URL stored in its id attribute
        span = new HtmlDocument();
        span.LoadHtml(postContent);
        var htmlNode = span.DocumentNode.SelectSingleNode("//span[@class='erte_embed']");
        if (htmlNode != null)
        {
            string urlValue = htmlNode.GetAttributeValue("id", "id");
            urlValue = HttpUtility.UrlDecode(urlValue);
            postContent = postContent.Replace(htmlNode.OuterHtml, urlValue);
        }

        insertPost.Parameters.AddWithValue("?ID", maxID);
        insertPost.Parameters.AddWithValue("?postDate", DateTime.Parse(postDate));
        insertPost.Parameters.AddWithValue("?postBody", postContent);
        insertPost.Parameters.AddWithValue("?postTitle", postTitle);
        insertPost.Parameters.AddWithValue("?postName", postTitle.Replace('#', '-').Replace(' ', '-'));
        insertPost.Parameters.AddWithValue("?postGuid", "YOUR_BLOG_URL_HERE/?p=" + maxID);
        insertPost.ExecuteNonQuery();
    }
}
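The core of the method above is "find the highest ID, add one, insert with bound parameters". Here is a minimal Python sketch of just that pattern, using an in-memory SQLite database in place of MySQL so that it runs anywhere. The cut-down table layout is an assumption made for illustration and is much smaller than the real WordPress posts table.

```python
import sqlite3


def next_post_id(conn, blog_id):
    """Return MAX(id) + 1 for the blog's posts table, or 1 if it is empty."""
    table = "wp_%d_posts" % int(blog_id)  # int() keeps the table name safe
    (max_id,) = conn.execute("SELECT MAX(id) FROM %s" % table).fetchone()
    return 1 if max_id is None else max_id + 1


def insert_post(conn, blog_id, author_id, post_date, title, body):
    """Insert one post, binding every value as a parameter."""
    post_id = next_post_id(conn, blog_id)
    table = "wp_%d_posts" % int(blog_id)
    conn.execute(
        "INSERT INTO %s (id, post_author, post_date, post_title, post_content)"
        " VALUES (?, ?, ?, ?, ?)" % table,
        (post_id, author_id, post_date, title, body),
    )
    return post_id


# Simplified stand-in for the posts table of blog number 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_2_posts (id INTEGER PRIMARY KEY, post_author INTEGER,"
    " post_date TEXT, post_title TEXT, post_content TEXT)"
)

tricky_body = '<p>Commas, "quotes"\nand line breaks</p>'
first_id = insert_post(conn, 2, 7, "2012-07-12 10:00:00", "Hello", tricky_body)
second_id = insert_post(conn, 2, 7, "2012-07-12 11:00:00", "Again", "<p>More</p>")
stored = conn.execute(
    "SELECT post_content FROM wp_2_posts WHERE id = ?", (first_id,)
).fetchone()[0]
```

Bound parameters are what make this approach robust where the CSV route was not: commas, quotes and line breaks in the post body reach the database untouched. Computing the next ID by hand is only safe because a single process does the migration.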


Note the additional processing around the image tags. This was because I had migrated the image files separately and wanted to update the image tags' src attributes to reflect the new path. If you plan on keeping your previous SharePoint installation up, this step is optional, since the attachments will be loaded from the SharePoint site anyway. But I would recommend migrating the images as well, just for easier maintenance of the content.
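The path update can be illustrated with a small Python sketch. It is a simplification of the logic above: it rewrites every href and src that starts with the old attachment prefix, and does not swap thumbnails for full-size images. Both prefixes are made-up placeholders.

```python
def rewrite_attachment_urls(post_html, old_prefix, new_prefix):
    """Point href/src attributes under old_prefix at new_prefix instead."""
    for attr in ("href", "src"):
        post_html = post_html.replace(
            '%s="%s' % (attr, old_prefix), '%s="%s' % (attr, new_prefix)
        )
    return post_html


# Placeholder prefixes; substitute the real SharePoint and WordPress paths.
OLD_PREFIX = "/sites/team/blog/Lists/Posts/Attachments/"
NEW_PREFIX = "https://blogs.example.com/team/files/"

sample = (
    '<a href="/sites/team/blog/Lists/Posts/Attachments/12/pic.png">'
    '<img src="/sites/team/blog/Lists/Posts/Attachments/12/pic.png" /></a>'
    '<a href="https://example.org/">elsewhere</a>'
)
rewritten = rewrite_attachment_urls(sample, OLD_PREFIX, NEW_PREFIX)
```

Matching on the attribute name plus the opening quote, as the original code does, avoids touching the same path when it merely appears in the visible text of a post.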

The code to migrate a single image is below. Getting all the image tags works much like retrieving each blog's content. The only difference is that instead of inserting into the database, we use HTML Agility Pack to extract all image tags in the post content and call the method below to download each file. The files are then saved in the wp-content directory and the path is updated in the migration logic.

private static void DownloadImage(string url, string postUrl, string destination)
{
    string saveDir = String.Format("C:\\Wordpress_Images\\{0}\\", destination);
    string filename = string.Concat(saveDir, url.Replace(string.Concat("YOUR_SHAREPOINT_URL", postUrl, "/Lists/Posts/Attachments/"), string.Empty)).Replace("/", "\\");
    FileInfo fi = new FileInfo(filename);
    if (!fi.Directory.Exists)
        fi.Directory.Create(); // create the target folder on first use

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.UseDefaultCredentials = true;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        // Only save the response if the request succeeded and it really is an image
        if ((response.StatusCode == HttpStatusCode.OK ||
            response.StatusCode == HttpStatusCode.Moved ||
            response.StatusCode == HttpStatusCode.Redirect) &&
            response.ContentType.StartsWith("image", StringComparison.OrdinalIgnoreCase))
        {
            using (Stream inputStream = response.GetResponseStream())
            using (Stream outputStream = File.OpenWrite(filename))
            {
                // Copy the response to disk in 4 KB chunks
                byte[] buffer = new byte[4096];
                int bytesRead;
                do
                {
                    bytesRead = inputStream.Read(buffer, 0, buffer.Length);
                    outputStream.Write(buffer, 0, bytesRead);
                } while (bytesRead != 0);
            }
        }
    }
}
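The copy loop at the end of the method is a general pattern: read a fixed-size chunk, write it out, stop when a read returns nothing. A minimal Python sketch of the same loop follows, with in-memory streams standing in for the HTTP response and the output file.

```python
import io


def copy_stream(source, destination, chunk_size=4096):
    """Copy source to destination in fixed-size chunks; return bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# 10,240 bytes of dummy data in place of a downloaded image.
payload = bytes(range(256)) * 40
output = io.BytesIO()
copied = copy_stream(io.BytesIO(payload), output)
```

Copying in chunks keeps memory use flat however large the attachment is. In Python, `shutil.copyfileobj` provides the same behaviour out of the box.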

Jul 08 2012

I made some changes to the plugin from the last post, including an option in the admin dashboard to enable or disable shortcodes and the ability to add buttons directly to the post without having to modify the theme's PHP file. Future changes planned are to allow admins to add the widgets through the options screen as well.

The plugin is available here and the code is below. Caution: some more testing is required.

/**
 * Returns major.minor WordPress version.
 */
function sc_reach_get_wp_version() {
  return (float) substr(get_bloginfo('version'), 0, 3);
}

function sc_add_author_stream($email, $style='width:300px;height:400px') {
  return get_div_email($email, 'profile_container_id', $style, get_option('sc_profile_token'));
}

function get_div_email($email, $id, $style, $token) {
	$socialcast_url = get_option('sc_host');
	if ($id != '' && $token != '') {
		return '<div id="' . $id . '" style="' . $style .
		'"></div><script type="text/javascript">_reach.push({container: "' . $id . '", domain: "https://'
		. $socialcast_url . '", token: "' . $token . '", email:"'. $email . '"});</script>';
	} else {
		return '';
	}
}

function sc_reach_content_handle($content, $sidebar = false) {
	// Place the button according to the "Show Button" setting
	switch (get_option('sc_show_button')) {
		case 'Dont Show':
			break;
		case 'Top':
			$content = sc_add_button() . $content;
			break;
		case 'Bottom':
			$content = $content . sc_add_button();
			break;
	}
	if (get_option('sc_use_microdata') == 'true') {
		$purl = get_permalink();
		$content = "<div itemscope=itemscope itemtype=\"\"><a itemprop='url' href='" . $purl . "' /></a>" . $content . "</div>";
	}

	return $content;
}

function get_div($id, $style, $token) {
	$socialcast_url = get_option('sc_host');
	if ($id != '' && $token != '') {
		return '<div id="' . $id . '" style="' . $style .
		'"></div><script type="text/javascript">_reach.push({container: "' . $id . '", domain: "https://'
		. $socialcast_url . '", token: "' . $token . '"});</script>';
	} else {
		return '';
	}
}

/* Name: Ganesh Ranganathan
   Date: 7th July 2012
This function has been added to generate the div from the shortcode,
which can be added to any text widget or in the post itself. It includes
an additional parameter - display - which lets users insert shortcodes without
knowing the token, as long as the tokens are specified in the Plugin options screen.
Please make sure do_shortcode is called in your theme for widget text if
you want to include this in a widget */
function get_shortcode_div($id, $style, $token, $display) {
	$tokenInOptions = '';
	if ($display != '') {
		switch ($display) {
			case 'button':
				// this assignment is missing from the listing; the button token is the evident match
				$tokenInOptions = get_option('sc_button_token');
				break;
			case 'discussion':
				$tokenInOptions = get_option('sc_discussion_token');
				break;
			case 'profile':
				$tokenInOptions = get_option('sc_profile_token');
				break;
			case 'trends':
				$tokenInOptions = get_option('sc_trends_token');
				break;
		}
		if ($tokenInOptions != '')
			return get_div($id, $style, $tokenInOptions);
	}
	// Fall back to the token given explicitly in the shortcode
	return get_div($id, $style, $token);
}

function add_reach($atts) {
	extract( shortcode_atts( array(
			'id' => 'reach_container_id',
			'style' => '',
			'token' => '',
			'display' => ''
		), $atts ) );

  return get_shortcode_div($id, $style, $token, $display);
}

function reach_init_method() {

  if (sc_reach_get_wp_version() >= 2.7) {
    if (is_admin ()) {
      add_action('admin_init', 'sc_reach_register_settings');
    }
  }
  add_filter('the_content', 'sc_reach_content_handle');
  add_filter('admin_menu', 'sc_reach_admin_menu');
  add_option('sc_host', '');
  add_option('sc_button_token', '');
  add_option('sc_discussion_token', '');
  add_option('sc_profile_token', '');
  add_option('sc_use_microdata', 'true');
  add_option('sc_trends_token', '');
  add_option('sc_show_button','Dont Show');
  add_action('wp_head', 'sc_reach_header_meta');
  add_action('wp_footer', 'sc_reach_add_js');
  add_shortcode( 'reach', 'add_reach' );
}


function sc_reach_header_meta() {
  echo '<script type="text/javascript">var _reach = _reach || [];</script>';
}

function sc_reach_register_settings() {
  register_setting('sc_reach', 'sc_host');
  register_setting('sc_reach', 'sc_button_token');
  register_setting('sc_reach', 'sc_discussion_token');
  register_setting('sc_reach', 'sc_trends_token');
  register_setting('sc_reach', 'sc_profile_token');
  register_setting('sc_reach', 'sc_use_microdata');
  register_setting('sc_reach', 'sc_enableShortcode');
  // registered so that the "Show Button" dropdown on the options page is saved
  register_setting('sc_reach', 'sc_show_button');
}

function sc_add_button($style='width:300px;height:30px') {
	return get_div('like_container_id', $style, get_option('sc_button_token'));
}

function sc_add_discussion($style='width:300px;height:400px') {
	return get_div('discussion_container_id', $style, get_option('sc_discussion_token'));
}

// Renamed from sc_add_token to match the name shown on the options page
function sc_add_trends($style='width:300px;height:400px') {
	return get_div('trends_container_id', $style, get_option('sc_trends_token'));
}

function sc_reach_admin_menu() {
  add_options_page('REACH Plugin Options', 'Socialcast REACH',  'activate_plugins', __FILE__, 'sc_reach_options');
}

function sc_reach_options() {
?>
  <div class="wrap">
    <h2>Reach Extensions by <a href="" target="_blank">Socialcast</a></h2>

    <form method="post" action="options.php">
    <?php
    // The bodies of these two branches are missing from the listing; the
    // standard WordPress call for each case is assumed here.
    if (sc_reach_get_wp_version() < 2.7) {
      wp_nonce_field('update-options');
    } else {
      settings_fields('sc_reach');
    }
    ?>

      <p>If you are not logged in to Socialcast, please do so with a user that has administrative credentials.
        Once there, either create an HTML Extension or select one from the list. For more information please visit the <a href="">plugin page</a>.</p>
      <h3>Socialcast Community</h3>
      <table>
			<tr><td>https://</td><td><input style="width:400px" type="text" name="sc_host" value="<?php echo get_option('sc_host'); ?>" /></td></tr>
			<tr><td>Add HTML Microdata ?</td><td><input style="width:400px" type="text" name="sc_use_microdata" value="<?php echo get_option('sc_use_microdata'); ?>" />Type 'true' to use</td></tr>
			<tr><td>Button Token:</td><td><input style="width:400px" type="text" name="sc_button_token" value="<?php echo get_option('sc_button_token'); ?>" />Function: sc_add_button()</td></tr>
			<tr><td>Discussion Token:</td><td><input style="width:400px" type="text" name="sc_discussion_token" value="<?php echo get_option('sc_discussion_token'); ?>" />Function sc_add_discussion()</td></tr>
			<tr><td>Profile Token:</td><td><input style="width:400px" type="text" name="sc_profile_token" value="<?php echo get_option('sc_profile_token'); ?>" />Function sc_add_author_stream()</td></tr>
			<tr><td>Trends Token:</td><td><input style="width:400px" type="text" name="sc_trends_token" value="<?php echo get_option('sc_trends_token'); ?>" />Function sc_add_trends()</td></tr>
			<tr><td>Enable Shortcode: </td><td><input name="sc_enableShortcode" type="checkbox" value="1" <?php checked( '1', get_option( 'sc_enableShortcode' ) ); ?> /></td></tr>
			<tr><td>Show Button:</td> <td>
			<select style="width:400px" name="sc_show_button" id="sc_show_button">
<option value="Top" <?php if (get_option('sc_show_button')=='Top') echo 'selected="selected"';?>>Top</option>
<option value="Bottom" <?php if (get_option('sc_show_button')=='Bottom') echo 'selected="selected"';?>>Bottom</option>
<option value="Dont Show" <?php if (get_option('sc_show_button')=='Dont Show') echo 'selected="selected"';?>>Dont Show</option>
</select>Make sure the button token is set for this to work</td></tr>
      </table>
    <?php if (sc_reach_get_wp_version() < 2.7) : ?>
      <input type="hidden" name="action" value="update" />
      <input type="hidden" name="page_options" value="sc_host" />
    <?php endif; ?>
      <p class="submit">
        <input type="submit" name="Submit" value="<?php _e('Save Changes') ?>" />
      </p>
    </form>
    <iframe width="100%" height="600px" src=""></iframe>
  </div>
<?php
}


function sc_reach_add_js() {
?>
  <script type="text/javascript">
    // Load the Reach extension script asynchronously
    var e = document.createElement('script');
    e.async = true;
    e.src = document.location.protocol + '//<?php echo get_option('sc_host') ?>/services/reach/extension.js';
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(e, s);
  </script>
<?php
}

Jul 07 2012

I installed WordPress at work and have been trying to make a multisite installation work as an enterprise blogging platform. During one of the discussions around it, a colleague asked me if it was possible to integrate the blog comments system with Socialcast (the microblogging tool which is quite popular at work). I initially thought this would not be straightforward and would require development of a custom plugin from scratch.

However, I later did some searching and found that Socialcast already provides an infrastructure called Reach, which can be used to integrate comments, posts and trends with a variety of third-party sites. For an organization, this integration is extremely valuable, as it introduces a common store for all social interactions, be it SharePoint, blogs, intranet pages or anything else. Since Reach is written in JavaScript, it doesn't place any restrictions on the server-side technology used for the sites.

So the primary goal was to make Reach work with WordPress. Initially I looked at options like HTML Javascript Adder, which lets you add the Reach code directly into a widget on the site. However, this posed too many issues, given the lack of control over when the scripts were loaded and the difficulty of configuring it. Since all Reach scripts look exactly the same except for a token, which Socialcast generates when the script is created in the admin panel, it is pointless to keep replicating the same code everywhere.

Then I came across a plugin written by Monica Wilkinson of Socialcast. However, it was last updated a year ago, and both WordPress and Socialcast have since changed in ways that required some updates. So I forked it and made a few minor tweaks to suit my requirements. The plugin provides an options page to configure the tokens and the URL of your Socialcast community. I added the PHP file to my plugins directory and Network Activated it (this was a multisite installation). Once this is done, you get an options page on the dashboard.

The options page takes the tokens that need to be entered, along with the URL of your Socialcast community. Remember to be careful when sharing the tokens, as they allow access to the community without proper login credentials. I am not sure whether Socialcast provides the option of revoking these tokens periodically and issuing fresh ones, but such an option should exist to protect company data.

There are four main kinds of Reach extensions:

  • Discussion – A comments system which would be shared with the Socialcast community
  • Stream – Any group or company stream
  • Trends – Trending topics, people etc.
  • Button – Like, Recommend, Share buttons. The exact verb can be configured on the admin screen.

All of these are rendered in exactly the same way: a script (services/reach/extension.js) is loaded asynchronously, and an object containing the container ID, the domain and the token is pushed onto the _reach array. In the plugin, the get_div function generates the container div together with that script tag:

function get_div($id, $style, $token) {
	$socialcast_url = get_option('sc_host');
	if ($id != '' && $token != '') {
		return '<div id="' . $id . '" style="' . $style .
		'"></div><script type="text/javascript">_reach.push({container: "' . $id . '", domain: "https://'
		. $socialcast_url . '", token: "' . $token . '"});</script>';
	} else {
		return '';
	}
}
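To make the generated markup concrete, here is a rough Python rendition of what get_div produces. It is a sketch rather than a port: the escaping of the attribute values and the use of JSON for the pushed object are additions of mine that the plugin does not have, and the host name and token are invented.

```python
import html
import json


def render_reach_div(container_id, style, token, host):
    """Return the container div plus the script that registers it with Reach."""
    if not container_id or not token:
        return ""
    # The object pushed onto the _reach array, serialized as JSON.
    config = json.dumps({
        "container": container_id,
        "domain": "https://" + host,
        "token": token,
    })
    return (
        '<div id="%s" style="%s"></div>'
        '<script type="text/javascript">_reach.push(%s);</script>'
        % (html.escape(container_id, quote=True), html.escape(style, quote=True), config)
    )


# Invented values for illustration only.
markup = render_reach_div(
    "like_container_id", "width:300px;height:30px", "abc", "example.socialcast.com"
)
```

As in the plugin, an empty ID or token yields an empty string, so a page with an unconfigured extension simply renders nothing.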

There are two main ways of rendering the appropriate Reach control on your page:

  • Call the function in PHP code
  • Use the shortcode [reach]

Let's look at the first option. The function to call from PHP code is shown next to each token on the options screen. Let’s say I want to display the button just below the title. I go to the theme’s postheader.php file and call the sc_add_button function in PHP code. Note: the call to sc_add_button will only work if you have the button token configured in the plugin options. This step may differ from theme to theme.

<header class='post-header title-container fix'>
	<div class="title">
		<<?php echo $header_tag;?> class="posttitle"><?php echo suffusion_get_post_title_and_link(); ?></<?php echo $header_tag;?>>
	<?php echo sc_add_button('width:300px;height:30px'); ?>
	<?php if ($post_meta_position == 'corners') { ?>
	<div class="postdata fix">

Or, if you want the comments system to be replaced by the Socialcast discussion system, go to the comment-template.php file in the wp-includes directory and replace the comment markup with a call to sc_add_discussion(). Remember that you can pass a style to this function to override the plugin's default style.

		<?php if ( comments_open( $post_id ) ) : ?>
			<?php do_action( 'comment_form_before' ); ?>
			<?php echo sc_add_discussion(''); ?>
			<?php do_action( 'comment_form_after' ); ?>
		<?php else : ?>
			<?php do_action( 'comment_form_comments_closed' ); ?>
		<?php endif; ?>

The resulting page looks like this

Now for the shortcode way. The plugin initially required the user to enter the token in the shortcode itself, but I wasn't too happy with that, as revealing the tokens to non-admin users seems risky. So I wrote a new function in the plugin that lets the user name what should be displayed, and the token is then read from the options. The previous way of specifying a token still works as well.

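function get_shortcode_div($id, $style, $token, $display) {
	$tokenInOptions = '';
	if ($display != '') {
		switch ($display) {
			case 'button':
				// this assignment is missing from the listing; the button token is the evident match
				$tokenInOptions = get_option('sc_button_token');
				break;
			case 'discussion':
				$tokenInOptions = get_option('sc_discussion_token');
				break;
			case 'profile':
				$tokenInOptions = get_option('sc_profile_token');
				break;
		}
		if ($tokenInOptions != '')
			return get_div($id, $style, $tokenInOptions);
	}
	// Fall back to the token given explicitly in the shortcode
	return get_div($id, $style, $token);
}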
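The lookup rule in this function is easy to state on its own: if a display name is given and a token is stored for it, use the stored token, otherwise fall back to the token passed in the shortcode. Below is a minimal Python sketch of that rule, with a plain dictionary standing in for the WordPress options table. The sample token values are made up.

```python
# Maps the shortcode's display value to the option that stores its token.
OPTION_FOR_DISPLAY = {
    "button": "sc_button_token",
    "discussion": "sc_discussion_token",
    "profile": "sc_profile_token",
}


def resolve_token(display, explicit_token, options):
    """Prefer the token stored for `display`; fall back to the explicit one."""
    option_name = OPTION_FOR_DISPLAY.get(display)
    stored = options.get(option_name, "") if option_name else ""
    return stored or explicit_token


# Stand-in for the saved plugin options: only the button token is configured.
saved_options = {"sc_button_token": "btn-123"}

from_options = resolve_token("button", "", saved_options)
from_shortcode = resolve_token("discussion", "typed-in-token", saved_options)
no_display = resolve_token("", "typed-in-token", saved_options)
```

With this rule, an author only ever writes the display name in the shortcode, and the tokens stay on the admin-only options page.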

This system makes it really easy for users to add the button to their blogs. Just insert a text widget in the sidebar containing the [reach] shortcode, and the extension is rendered when the page loads. Just make sure the theme runs do_shortcode on the widget text. If it doesn't, a single line such as add_filter('widget_text', 'do_shortcode'); in the theme's functions.php should do it.

Once the widget is saved, the reach extension is rendered on the page. The modified plugin can be downloaded here and the original version written by Monica can be downloaded here. I will make changes for the trends extension soon once I get the token to test it out.