Dec 24 2011

I have been spending some of my free time trying to build a complete cricket statistics database by parsing records from Cricinfo. However, scraping HTML pages is an arduous task. There is simply no standard way of doing it, and it often becomes a struggle with regular expressions. A good solution to this problem is the Html Agility Pack. It's a library that standardizes the parsing of HTML pages and converts them into an XML-style DOM object that you can extract data from. It also has a good number of options for error checking (for HTML which is not XHTML compliant).

The API is very similar to the XmlDocument class in the System.Xml namespace, so there is hardly any learning curve. You can search for nodes using an XPath expression for the element you want. Getting the XPath right can be a bit tricky, so an easier way is to use a Chrome extension called XPath Helper. Once the extension is installed, press Ctrl+Shift+X to activate it and then hold Shift to see the XPath of whichever element the mouse is hovering over. The given XPath can easily be tailored to select the whole set of data we need to extract.

Now it's time to start scraping. Download the Html Agility Pack from Codeplex and add a reference to the DLL. The code is pretty simple: it fetches the web page as a string, loads it into the Html Agility Pack and lets it build the DOM structure. The XPath is then used to get the list of rows in the table, and each row is translated into an innings object and added to a collection. At the end, the collection is written to a CSV file that can be opened as an Excel spreadsheet. The code is pretty rough and I did it more as a trial. Building the complete database will be much more difficult, since it would involve parsing different kinds of pages and ensuring the integrity of the data.

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        new ReadText().StartParsing();
    }
}

class ReadText
{
    public void StartParsing()
    {
        string TestUrl = ";filter=advanced;page={0};orderby=start;size=200;template=results;type=batting;view=innings;wrappertype=print";
        Console.WriteLine("Extracting Tests\n\n");
        ExtractInningsView(TestUrl, "..\\..\\AllTestInnings.csv", 404);
        Console.WriteLine("Extracting ODIs\n\n");
        string ODIUrl = ";filter=advanced;page={0};orderby=start;size=200;template=results;type=batting;view=innings;wrappertype=print";
    }

    private void ExtractInningsView(string statUrl, string fileName, int pageCount)
    {
        List<InningsPlayed> AllInnings = new List<InningsPlayed>();
        for (int j = 1; j < pageCount; j++)
        {
            Console.WriteLine("Reading Page: " + j.ToString());
            string pageText = ReadWebPage(String.Format(statUrl, j));
            // Let the Html Agility Pack build the DOM from the raw page text
            var htmlDoc = new HtmlDocument();
            htmlDoc.LoadHtml(pageText);

            // XPath positions are 1-based and each page holds up to 200 rows
            for (int i = 1; i <= 200; i++)
            {
                string inningsXpath = "//tbody/tr[@class='data1'][{0}]/td";
                var nodeList = htmlDoc.DocumentNode.SelectNodes(String.Format(inningsXpath, i));

                // SelectNodes returns null when no row exists at this position
                if (nodeList != null)
                {
                    AllInnings.Add(new InningsPlayed()
                    {
                        Name = nodeList[0].InnerText,
                        Runs = nodeList[1].InnerText,
                        Minutes = nodeList[2].InnerText,
                        BallsFaced = nodeList[3].InnerText,
                        Fours = nodeList[4].InnerText,
                        Sixes = nodeList[5].InnerText,
                        StrikeRate = nodeList[6].InnerText,
                        Innings = nodeList[7].InnerText,
                        Opposition = nodeList[9].InnerText,
                        Ground = nodeList[10].InnerText,
                        StartDate = nodeList[11].InnerText
                    });
                }
            }
        }
        DumpToFile(AllInnings, fileName);
    }

    private void DumpToFile(List<InningsPlayed> AllInnings, string fileName)
    {
        // The using block flushes and closes the file once all rows are written
        using (StreamWriter writer = new StreamWriter(fileName))
        {
            foreach (var inning in AllInnings)
            {
                writer.WriteLine(string.Format("{0},{1},{2},{3},{4},{5},{6},{7},{8},{9},{10}", inning.Name, inning.Runs, inning.Minutes, inning.BallsFaced, inning.Fours, inning.Sixes, inning.StrikeRate, inning.Innings, inning.Opposition, inning.Ground, inning.StartDate));
            }
        }
    }

    private string ReadWebPage(string Url)
    {
        // Request the page and read the whole response body as a string
        WebRequest request = WebRequest.Create(Url);
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}

class InningsPlayed
{
    public string Name { get; set; }
    public string Runs { get; set; }
    public string BallsFaced { get; set; }
    public string Minutes { get; set; }
    public string Fours { get; set; }
    public string Sixes { get; set; }
    public string StrikeRate { get; set; }
    public string Innings { get; set; }
    public string Opposition { get; set; }
    public string Ground { get; set; }
    public string StartDate { get; set; }
}
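
One caveat with the comma-joined output above: a field that itself contains a comma or a quote (a ground name, say) would shift the columns in the CSV file. Here is a minimal sketch of the usual CSV quoting rule, written in JavaScript like the other scripts on this blog. The names csvField and csvRow are hypothetical helpers for illustration, not part of the scraper.

```javascript
// Quote a single CSV field only when it needs it: wrap the value in
// double quotes if it contains a comma, a quote or a line break, and
// double any embedded quotes.
function csvField(value) {
    var text = String(value);
    if (/[",\r\n]/.test(text)) {
        return '"' + text.replace(/"/g, '""') + '"';
    }
    return text;
}

// Join already-extracted cell values into one CSV line.
function csvRow(values) {
    return values.map(csvField).join(",");
}
```

The same rule could be applied in the C# DumpToFile method before the fields are joined.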
May 23 2011

At the recently concluded I/O developer conference, Google made a much-awaited announcement: the Google Places API has been opened to everyone (it had been in beta testing for some time). For the uninitiated, Google Places is a Google application for searching local businesses like hotels, ATMs etc. Places fits in beautifully with Google Maps, both on the web and on Android.

Now that the API has been opened up, any location-aware tool or website can use the Places search to add this functionality. Check out the documentation here. I integrated the Places API search into my previous geolocation example, explained in a blog post here. The application returns 20 places near the user’s current location and adds a marker to the map for each.

Web Workers

Web Workers are one of the most interesting concepts of HTML5. They are a standard for running JS scripts on a background thread rather than on the main UI thread. This is extremely important, since more time-consuming scripts (like complex mathematical calculations) can be offloaded to a secondary thread instead of freezing up your application, which has huge applications in graphics-intensive work. I used Web Workers in the current application to call a server-side method, which in turn calls the Google Places API to search for a list of places near the user’s location.

Here are some limitations of Web Workers in their current implementation:

  • Not supported on all browsers (Most notably Internet Explorer).
  • We cannot access any DOM object in the Web Worker script. All communication needs to be to the main thread using the postMessage function.
  • Because a worker cannot access DOM objects, any script that refers to the DOM cannot be loaded in it either, which renders most JS libraries like jQuery and Prototype unusable.

This example has the following components.

  • An ASP.NET MVC server side in order to call the Google Places API. It consists of a controller method which calls the API URL and a class which holds the data. On the client side we have a similar object in JSON with the same properties. The ASP.NET MVC model binder converts the JSON object to a CLR object and passes it to the action method. Client-side JavaScript cannot call the Google Places API directly, because it would be a cross-site request and is not allowed by Google. Hence the server’s broker method becomes necessary here.
  • Client-side main script which uses Geolocation to determine the user’s location. It then passes this location to the MVC action. Before calling the action, it checks whether the browser supports Web Workers. If so, the call is offloaded to a secondary thread; otherwise it is made on the main thread itself.
  • Worker script which makes the AJAX call to the server using the XMLHttpRequest object (jQuery cannot be used here) 🙁
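
The "use a worker if available, otherwise stay on the main thread" decision described above can be sketched on its own. This is only an illustration under stated assumptions: dispatchSearch and the deps object (supportsWorkers, createWorker, searchInline, onResult) are hypothetical names that do not appear in the actual page script. The dependencies are passed in so that the logic can be tried out without a browser.

```javascript
// Hypothetical helper: run the search on a worker when one is available,
// otherwise fall back to the main thread. Returns which path was taken.
function dispatchSearch(query, deps) {
    // Workers are sent a string, so the query object is serialized first
    var payload = JSON.stringify(query);
    if (deps.supportsWorkers) {
        var worker = deps.createWorker();
        // Register the handler before posting so no reply is missed
        worker.onmessage = function (event) {
            deps.onResult(event.data);
        };
        worker.postMessage(payload);
        return "worker";
    }
    // No worker support: make the call directly on the main thread
    deps.searchInline(payload, deps.onResult);
    return "inline";
}
```

In the page itself, supportsWorkers would come from Modernizr.webworkers and createWorker would return a new Worker for the worker script.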

Here is the code.

ASP.NET Server Side

        public ActionResult GoogleSearchAPI(SearchQuery query)
        {
            //Base URL for calling the Google Places API
            string BaseAPIURL = String.Format("{0},{1}&radius={2}", query.Latitude, query.Longitude, query.Radius);
            if (!string.IsNullOrEmpty(query.Name))
            {
                //Append the name parameter only if data is sent from the client side.
                //The value is escaped so that spaces or '&' cannot break the query string.
                BaseAPIURL = String.Concat(BaseAPIURL, String.Format("&name={0}", Uri.EscapeDataString(query.Name)));
            }
            //Include the API key, which is required
            BaseAPIURL = String.Concat(BaseAPIURL, String.Format("&sensor=false&key={0}", GetAPIKey()));
            //Get the XML result data from Google Places using a helper method which makes the call.
            string _response = MakeHttpRequestAndGetResponse(BaseAPIURL);
            //Wrap the XML in a ContentResult and pass it back to the JavaScript
            return Content(_response);
        }

        //Helper method to call the URL and send the response back.
        private string MakeHttpRequestAndGetResponse(string BaseAPIURL)
        {
            var request = (HttpWebRequest)WebRequest.Create(BaseAPIURL);
            request.Method = WebRequestMethods.Http.Get;
            request.Accept = "application/json";
            string text;

            //Dispose the response and the reader once the body has been read
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var sr = new StreamReader(response.GetResponseStream()))
            {
                text = sr.ReadToEnd();
            }

            return text;
        }

    /// Our data object, which carries the data from the client side to the
    /// server. The model binder takes care of conversion between JSON and
    /// CLR objects.
    public class SearchQuery
    {
        public string Latitude { get; set; }
        public string Longitude { get; set; }
        public string Radius { get; set; }
        public string Type { get; set; }
        public string Name { get; set; }
    }
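
For comparison, here is a minimal sketch of the same URL assembly in JavaScript, with every value URL-encoded. The function name buildPlacesUrl is hypothetical, and baseUrl is an assumed placeholder for the part of the Places endpoint that comes before the coordinates, which is not shown in this post.

```javascript
// Hypothetical helper mirroring the action method's URL assembly.
// baseUrl is assumed to end right where the coordinates are appended.
function buildPlacesUrl(baseUrl, query, apiKey) {
    var url = baseUrl +
        encodeURIComponent(query.Latitude) + "," +
        encodeURIComponent(query.Longitude) +
        "&radius=" + encodeURIComponent(query.Radius);
    // The name filter is optional, so append it only when it is present
    if (query.Name) {
        url += "&name=" + encodeURIComponent(query.Name);
    }
    // The API key is always required
    url += "&sensor=false&key=" + encodeURIComponent(apiKey);
    return url;
}
```

Encoding each value keeps a name such as "coffee & tea" from being read as two separate query parameters.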

Client Side Main script

Most of the Geolocation code is the same as in my previous example. This is the additional code that runs after the geolocation data is found: the coordinates are passed on to another method, which uses them to retrieve places data and mark it on the map.

var spawnWorkerThread = function (position) {
    //This is executed if getCurrentPosition is successful. Moving the map to the user's location
    map.panTo(new google.maps.LatLng(position.coords.latitude, position.coords.longitude));
    var coordinates = new google.maps.LatLng(position.coords.latitude, position.coords.longitude);
    //Create a JSON object with the details of the search query.
    var placesQuery = { Latitude: position.coords.latitude, Longitude: position.coords.longitude, Type: "establishment", Radius: "500" };
    //Check if the browser supports Web Workers
    if (Modernizr.webworkers) {
        printMsg("Web Workers are supported on your browser. Searching for places near your location");
        //Load the Worker Script
        var myWorker = new Worker("/files/webworkersmvc/Scripts/worker.js");
        // receive a message from the worker
        myWorker.onmessage = function (event) {
            //Send the returned data to the processPlacesData method
            processPlacesData(event.data);
        };
        //Send the JSON object to the Worker thread after serializing it to string
        myWorker.postMessage(JSON.stringify(placesQuery));
    }
    else {
        //Make the call in a standard way and not using Web Workers
        printMsg("Web Workers aren't supported on your browser. Calling Places API the conventional way");
        var xhr = new XMLHttpRequest();
        //Calling the controller method.
        xhr.open("POST", "");
        xhr.setRequestHeader("Content-Type", "application/json");
        xhr.onreadystatechange = function () {
            if (xhr.readyState == 4 && (xhr.status == 200 || xhr.status == 0)) {
                processPlacesData(xhr.responseText);
            }
        };
        xhr.send(JSON.stringify(placesQuery));
    }
};

var processPlacesData = function (data) {
    //Parse the XML result into an XML object
    var places = $.parseXML($.trim(data));
    var xmldoc = $(places);
    var resultstring = "";
    //Iterate through each result object
    $("result", places).each(function () {
        var typestring = "";
        $("type", this).each(function () {
            typestring += $(this).text() + " , ";
        });
        //Create a mapResult object for each result.
        //The arguments follow the constructor's order, including the types string.
        var resObj = new mapResult($("name", this).text(),
                                      $('vicinity', this).text(),
                                      typestring,
                                      $('lat', this).text(),
                                      $('lng', this).text(),
                                      $('icon', this).text());
        //Create a Google Maps Marker and use the result object's latitude and longitude
        var marker = new google.maps.Marker({
            position: new google.maps.LatLng(resObj.latitude, resObj.longitude),
            animation: google.maps.Animation.DROP
        });
        //If the screen is smaller then zoom less, else zoom in closer.
        //This is to make the markers visible
        if (screen.width < 1000) {
            // map.setZoom(...) for smaller screens
        }
        else {
            // map.setZoom(...) for larger screens
        }
        //Set each marker on the map
        marker.setMap(map);
        //This is for the window to show information when the marker is clicked.
        //A single infowindow is reused in order to display only one at a time
        google.maps.event.addListener(marker, 'click', function () {
            infowindow.setContent(resObj.getMarkerHTML());
            infowindow.open(map, marker);
        });
    });
};

//JavaScript object to hold the map data and get the HTML required for the marker.
function mapResult(name, vicinity, types, latitude, longitude, icon) {
    this.name = name;
    this.vicinity = vicinity;
    this.types = types;
    this.latitude = latitude;
    this.longitude = longitude;
    this.iconpath = icon;
}
//Prototype method to avoid creating separate copies of the method
//for each object
mapResult.prototype.getMarkerHTML = function () {
    var htmlstring = "<div style=\"color: blue; font-weight: bold;\">";
    htmlstring += "Name: " + this.name + "";
    htmlstring += "Types: " + this.types + "";
    htmlstring += "Location: " + this.latitude + "," + this.longitude + "";
    htmlstring += "Vicinity: " + this.vicinity + "</div>";
    return htmlstring;
};

Worker Side Script

The worker-side script is pretty straightforward. It just calls the controller and passes the data on to the main thread using the postMessage function.

// receive a message from the main JavaScript thread
onmessage = function (event) {
    // the serialized search query sent by the main thread
    var info = event.data;
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "");
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.onreadystatechange = function () {
        if (xhr.readyState == 4 && (xhr.status == 200 || xhr.status == 0)) {
            //Send message back to Main thread
            postMessage(xhr.responseText);
        }
    };
    xhr.send(info);
};

Demo Page

Demo Pics


On Android

May 12 2011

Gone are the days when the web was a one-size-fits-all kind of global information vending machine. In the past few years, continuous efforts have been made towards making the web as local as possible, with websites being able to present content suited to the users browsing them.

One huge component of this new wave is the development of location-aware websites: websites which give relevant information without the user having to explicitly search for it. Take, for example, a movie ticket booking website which can be used to book tickets in theatres all over the country. Very few people, if any, would travel more than 20 km to watch a movie. Hence it makes business sense to display only the theatres within a 20 km radius of the user’s location. Instead of subjecting the user to an information overload, the website is smart enough to simplify the entire process and cut down the time taken.
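
To make the 20 km idea concrete, here is a rough sketch of how such a filter could work once the user's coordinates are known. It uses the haversine formula for the great-circle distance. The function names and the shape of the place objects (lat and lng properties) are assumptions made for this illustration only.

```javascript
var EARTH_RADIUS_KM = 6371; // mean radius of the Earth

function toRadians(degrees) {
    return degrees * Math.PI / 180;
}

// Great-circle distance in km between two latitude/longitude points,
// computed with the haversine formula.
function distanceKm(lat1, lng1, lat2, lng2) {
    var dLat = toRadians(lat2 - lat1);
    var dLng = toRadians(lng2 - lng1);
    var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
            Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) *
            Math.sin(dLng / 2) * Math.sin(dLng / 2);
    return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Keep only the places that lie within radiusKm of the user.
function withinRadius(user, places, radiusKm) {
    return places.filter(function (place) {
        return distanceKm(user.lat, user.lng, place.lat, place.lng) <= radiusKm;
    });
}
```

A call such as withinRadius(userLocation, theatres, 20) would then return only the nearby theatres.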

Geolocation isn’t a new concept. Previously, websites used the IP address and did a lookup to get a rough idea of the user’s location. There were other methods like the locale setting too. So while framing the HTML5 spec, the W3C decided to arrive at a standard for providing the client’s location to the server which abstracts out the actual method used by the browser to determine it (cell tower triangulation, IP address, GPS etc). Since revealing one’s location is a privacy concern, the whole geolocation API depends on explicit permission, granted separately for each website.

I created an example page which integrates the Google Maps API and geolocation and displays a map centered on the user’s location. Since not every browser supports geolocation yet, I used the JS library Modernizr, which detects whether the feature is supported by the browser without us having to do browser sniffing using the user agent. Here is the script code.

/*
 * Title: Geolocation API access
 * Author: Ganesh Ranganathan
 */

var map; //global variable for the map object
$(document).ready(function () {
	//this is the default latitude/longitude to be set on the map
	var latlng = new google.maps.LatLng(-34.397, 150.644);
	var myOptions = {zoom: 8,
			center: latlng,
			mapTypeId: google.maps.MapTypeId.ROADMAP};

	//Map constructor which creates the map
	map = new google.maps.Map($('#map_canvas')[0],myOptions);

	//Modernizr is a JS library which lets us check if
	//the target browser supports these features
	if (Modernizr.geolocation) {
		printMsg('Waiting for permission from you',false);
		//getCurrentPosition takes in two callbacks - one for
		//success and one for error. Both functions are defined anonymously
		//inside the method
		navigator.geolocation.getCurrentPosition(function (position) {
			printMsg('Permission Granted: Your Coordinates are '+position.coords.latitude+
					 ' & '+position.coords.longitude);
			//This is executed if getCurrentPosition is successful. Moving the map to the user's location
			map.panTo(new google.maps.LatLng(position.coords.latitude,position.coords.longitude));
		}, function (error) {
			//This is executed if getCurrentPosition is unsuccessful.
			//Show the error reason to the user
			switch (error.code) {
			case error.TIMEOUT:
				printMsg ("Error: Timeout",false);
				break;
			case error.POSITION_UNAVAILABLE:
				printMsg ("Error: Position unavailable",false);
				break;
			case error.PERMISSION_DENIED:
				printMsg ("Error: Permission denied",false);
				break;
			default:
				printMsg ("Error: Unknown error",false);
			}
		});
	}
	else //Print this message if geolocation itself isn't supported by the browser
		printMsg("geolocation is not supported on your browser",false);
});

//Helper function to set the status message to
//the span value
var printMsg = function(txt,append){
	if (append) {
		var existingText = $('#statusMsg').text();
		$('#statusMsg').text(existingText+'  '+txt);
	}
	else {
		$('#statusMsg').text(txt);
	}
};

The getCurrentPosition method of the navigator.geolocation object takes two functions as parameters. The first is the success callback and the second is the error callback. The success callback is passed the position object, which contains location information like latitude and longitude. This is the simple markup for the page. Please note that the Modernizr, jQuery and Google Maps scripts are referenced in the order they are needed.

<script src="Scripts/modernizr-1.7.min.js" type="text/javascript"></script>
<script src="Scripts/jquery-1.6.min.js" type="text/javascript"></script>
<script src="" type="text/javascript"></script>
<script src="Scripts/scripts.js" type="text/javascript"></script>
<h1>Geolocation API with HTML5</h1>
<div id="status">
 <span id="statusMsg"> </span></div>
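
Coming back to the error callback for a moment: the object it receives carries a numeric code, and the W3C Geolocation API defines three values for it (1 for permission denied, 2 for position unavailable, 3 for timeout). A small sketch of turning that code into a status message follows. The function name describeGeoError is a hypothetical helper, not something used on the demo page.

```javascript
// Map a geolocation error object to a readable message.
// Codes 1, 2 and 3 are the ones defined by the W3C Geolocation API;
// anything else falls through to a generic message.
function describeGeoError(error) {
    switch (error.code) {
    case 1:
        return "Error: Permission denied";
    case 2:
        return "Error: Position unavailable";
    case 3:
        return "Error: Timeout";
    default:
        return "Error: Unknown error";
    }
}
```

The error callback passed to getCurrentPosition could then simply hand the result of describeGeoError(error) to printMsg.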

The demo page for this application can be seen at –

Here is the application when it’s run for the first time in Chrome. As you can see, it asks the user for permission. Meanwhile the map is centered on the default location.

If we allow the application to access our location, then the map moves to the user’s coordinates – Bangalore in my case. Here is a screenshot – this time in Firefox

Since it’s a standardized API, the same code works on mobile browsers as well. Here are some screenshots of the page running on the Android stock browser, which also supports geolocation.

After granting permission

Dec 27 2010

In my previous post I drew a heart shape using the HTML5 canvas. After that I tried my hand at simple animation by making the heart move within constrained boundaries.

After reading quite a bit on Google, the suggested way to animate a shape on a canvas seems to be to keep clearing the canvas and redrawing it. When done at a short enough interval, this gives the illusion of animation. Though I’m not sure of the performance implications of this approach, it works quite well in simpler animation scenarios.

One more thing that had to change from the previous post is that I had hardcoded all the coordinates for the shape. With continuously changing coordinates in animation, this was no longer possible. So I modified the code to take a reference point X,Y and draw the shape relative to it. For the animation logic, X and Y (which are global variables) are repeatedly incremented or decremented by small values (dx,dy). Since the image is drawn relative to the reference point every time, we are able to use the same drawing code.

The code is below.

<html xmlns="">
<head>
    <script type="text/javascript">
        var WIDTH = 400;
        var HEIGHT = 450;
        var X = 50;
        var Y = 50;
        var dx = 1;
        var dy = 1;
        var context;

        function init() {
            var drawingCanvas = document.getElementById('drawing');
            // Check the element is in the DOM and the browser supports canvas
            if (drawingCanvas && drawingCanvas.getContext) {
                context = drawingCanvas.getContext('2d');
                // Redraw every 20 ms to give the illusion of movement
                setInterval(draw, 20);
            }
        }

        function draw() {
            // Wipe the previous frame before drawing the next one
            context.clearRect(0, 0, WIDTH, HEIGHT);
            context.lineCap = "round";
            context.lineWidth = 8;
            context.strokeStyle = "red";
            context.fillStyle = "red";
            // Start a fresh path so old frames are not redrawn
            context.beginPath();
            context.moveTo(X, Y);
            context.quadraticCurveTo(X + 10, Y + 50, X + 75, Y + 100);
            context.quadraticCurveTo(X + 140, Y + 50, X + 150, Y);
            context.quadraticCurveTo(X + 112.5, Y - 50, X + 75, Y);
            context.quadraticCurveTo(X + 37.5, Y - 50, X, Y);
            context.stroke();
            context.fill();
            // Reverse direction when the shape reaches an edge of the canvas
            if (X + 150 + dx >= WIDTH || X + dx <= 0)
                dx = -dx;
            if (Y + 100 + dy >= HEIGHT || Y + dy <= 0)
                dy = -dy;

            X += dx;
            Y += dy;
        }
    </script>
</head>
<body onload="init()">
<canvas style="border:solid 1px black" id="drawing" width="400" height="450">
</canvas>
</body>
</html>
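
The bounce check in draw() is the same on both axes, so it can be pulled out into one small function that is easy to try in isolation. This is just a sketch: stepAxis and its parameter names are made up for the illustration and are not part of the page above.

```javascript
// Advance one axis by its velocity, reversing direction at the walls.
// pos is the shape's reference coordinate, delta its per-frame movement,
// size the shape's extent on this axis and limit the canvas width or height.
function stepAxis(pos, delta, size, limit) {
    if (pos + size + delta >= limit || pos + delta <= 0) {
        delta = -delta;
    }
    return { pos: pos + delta, delta: delta };
}
```

For the heart above, the X axis would use a size of 150 with a limit of WIDTH, and the Y axis a size of 100 with a limit of HEIGHT.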

Dec 25 2010

The canvas element in HTML5 is akin to a drawing board on a webpage. It allows dynamic drawing of images and animation through JavaScript. The programming model is relatively simple and easy to learn. I just started playing around with it and tried to draw a heart shape.

<html xmlns="">
<head>
    <script type="text/javascript">
        // Named init rather than onload so it cannot clash with the
        // body's own onload handler
        function init() {
            var drawingCanvas = document.getElementById('drawing');

            // Check the element is in the DOM and the browser supports canvas
            if (drawingCanvas && drawingCanvas.getContext) {
                var context = drawingCanvas.getContext('2d');
                context.lineCap = "round";
                context.lineWidth = 8;
                context.strokeStyle = "red";
                context.fillStyle = "red";
                // Trace the heart outline as four quadratic curves
                context.beginPath();
                context.moveTo(100, 90);
                context.quadraticCurveTo(105, 150, 200, 215);
                context.quadraticCurveTo(290, 150, 300, 90);
                context.quadraticCurveTo(250, 30, 200, 90);
                context.quadraticCurveTo(150, 30, 100, 90);
                // Outline the path and fill it with the current colors
                context.stroke();
                context.fill();
            }
        }
    </script>
</head>
<body onload="init()">
<canvas style="margin:10px" id="drawing" width="500" height="500">
</canvas>
</body>
</html>

The whole shape is called a path. You can draw lines, arcs, curves etc. to make up a path and then fill it with a color. Canvas also allows for more advanced stuff like animation, but that is fodder for later posts. Below is the demo. The page is available here. (It might not work in somewhat older browsers. Use Chrome.)
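
For the curious, each quadraticCurveTo call draws a quadratic Bezier curve from the current point, bending towards a control point and ending at the given end point. A minimal sketch of the underlying formula is below. The function quadraticPoint is a hypothetical helper for illustration, since canvas does this calculation internally.

```javascript
// Point on a quadratic Bezier curve at parameter t between 0 and 1:
// B(t) = (1-t)^2 * P0 + 2(1-t)t * P1 + t^2 * P2
// where P0 is the start point, P1 the control point and P2 the end point.
function quadraticPoint(p0, p1, p2, t) {
    var u = 1 - t;
    return {
        x: u * u * p0.x + 2 * u * t * p1.x + t * t * p2.x,
        y: u * u * p0.y + 2 * u * t * p1.y + t * t * p2.y
    };
}
```

With the first segment of the heart, which starts at (100, 90), uses the control point (105, 150) and ends at (200, 215), t = 0 gives the start point and t = 1 gives the end point.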