Dec 21, 2012
 

In my previous post, I blogged about how to access the Socialcast community data without using the API. This is usually necessary when the API doesn't support a particular feature that the site itself provides.

Updating a user's profile avatar is one such case. Although the API offers a way to update the user profile, there is no obvious method for updating the user's avatar. I asked Socialcast on Twitter, but they didn't answer, so I went ahead and used Mechanize to log in to the site.

I was finally able to update the profile avatar using the script below. Works like a charm.

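require 'mechanize'   # lowercase: 'Mechanize' fails to load on case-sensitive file systems

agent = Mechanize.new
# Skips SSL certificate verification; acceptable for the demo site only
agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE

# Log in through the regular login form
agent.get("https://demo.socialcast.com/login")
form = agent.page.forms.first
puts "Please enter user email id"
form.email = gets.chomp
puts "Please enter password. caution: it is not masked"
form.password = gets.chomp
form.submit

# Open the profile edit page of the given user (for example, emilyjames)
puts "Please enter username"
username = gets.chomp
agent.get("https://demo.socialcast.com/users/#{username}/edit")

# Find the form that holds the avatar upload field and attach the image
form = agent.page.forms.detect { |f| f.file_upload_with(:name => "profile_photo[data]") }
puts "Please enter file path of the image to replace"
form.file_upload_with(:name => "profile_photo[data]").file_name = gets.chomp
form.submit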
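
Since the script warns that the password is not masked, here is a minimal sketch of one way to hide it while typing, using Ruby's standard `io/console` library. The helper name `read_secret` and its keyword arguments are my own invention for illustration and are not part of Mechanize or Socialcast. It falls back to a plain `gets` when the input is not a terminal.

```ruby
require 'io/console'

# Hypothetical helper: prompt for a value without echoing it to the terminal.
# When the input is not a terminal (a pipe, a test double), it reads normally.
def read_secret(prompt, input: $stdin, output: $stdout)
  output.print prompt
  line = if input.respond_to?(:noecho) && input.tty?
           input.noecho(&:gets)   # echo is switched off only for this read
         else
           input.gets
         end
  output.puts                      # move to a new line after the hidden input
  line.to_s.chomp                  # to_s covers nil at end of input
end

# Possible use in the script above:
#   form.password = read_secret("Please enter password: ")
```

Keeping the input and output streams as parameters makes the helper easy to try out without a real terminal.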

Dec 21, 2012
 

The Socialcast REST API provides programmatic access to the Socialcast community data through XML and JSON endpoints. The API covers most of the information one would want to extract from the site, but there are still gaps where the API is not up to date.

This made me look into the possibility of scraping the site directly using cURL and parsing the generated HTML. However, Socialcast is built on Rails, which has a security feature to prevent cross-site request forgery: an authenticity token, a random value that is generated by the server and embedded in a hidden field of every form it sends. When the form is posted back, the token is checked and an error is raised if it is missing. This makes direct scraping of the page difficult, and cURL fails. Googling gave me a few articles that explained how to use cURL with sites protected by the authenticity token (Link1, Link2), but unfortunately none of them seemed to work.
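
To make the token check concrete, here is a minimal Ruby sketch of the idea. It is a simplified model and not Rails' actual implementation; the class `TokenProtectedForm` and its methods are hypothetical names made up for illustration.

```ruby
require 'securerandom'

# Simplified, hypothetical model of a form protected by an authenticity token.
class TokenProtectedForm
  attr_reader :token

  def initialize
    @token = SecureRandom.hex(16)   # random value the server remembers for the session
  end

  # HTML the server would send on GET: the token sits in a hidden field.
  def render
    %(<form action="/login" method="post">) +
      %(<input type="hidden" name="authenticity_token" value="#{@token}">) +
      %(<input type="text" name="email"><input type="password" name="password">) +
      %(</form>)
  end

  # Check done on POST: the request is rejected unless the token comes back unchanged.
  def accept?(params)
    params["authenticity_token"] == @token
  end
end

form = TokenProtectedForm.new
puts form.render

# A client that echoes the token back is accepted; one that omits it is not.
puts form.accept?("email" => "emily@socialcast.com", "authenticity_token" => form.token) # prints true
puts form.accept?("email" => "emily@socialcast.com")                                     # prints false
```

A client that posts the form fields without first fetching the page never sees the token, which is why a naive cURL POST gets rejected.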

Then I came across a suggestion to use Mechanize, a Ruby library that automates interaction with websites. Mechanize works like a charm with sites protected by an authenticity token. Here is the Ruby script to log in to the Socialcast demo site.

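require 'mechanize'   # lowercase: 'Mechanize' fails to load on case-sensitive file systems

agent = Mechanize.new
# Skips SSL certificate verification; acceptable for the demo site only
agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE

# Fetch the login page, fill in the first form and submit it
agent.get("https://demo.socialcast.com/login")
form = agent.page.forms.first
form.email = "emily@socialcast.com"
form.password = "demo"
form.submit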

In Interactive Ruby, we can see that the authenticity token is returned when the first GET is issued on the login page. When the form is submitted, the token is posted back to the server and we are redirected to the home page.

[Image: login]
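
What Mechanize does for us here can be approximated by hand with the standard library alone. The sketch below works on a made-up HTML sample, not the real Socialcast login page: `hidden_fields`, `LOGIN_HTML` and the token value `abc123` are all assumptions for illustration. It collects the hidden fields from the fetched form and sends them back together with the credentials.

```ruby
require 'uri'

# Collect every hidden <input> in an HTML fragment as a name => value hash.
# A regex is enough for this fixed sample; real pages need a proper HTML parser.
def hidden_fields(html)
  html.scan(/<input[^>]*type="hidden"[^>]*>/).each_with_object({}) do |tag, fields|
    name  = tag[/name="([^"]*)"/, 1]
    value = tag[/value="([^"]*)"/, 1]
    fields[name] = value if name
  end
end

# Hypothetical markup standing in for the response to the first GET.
LOGIN_HTML = <<~HTML
  <form action="/login" method="post">
    <input type="hidden" name="authenticity_token" value="abc123">
    <input type="text" name="email">
    <input type="password" name="password">
  </form>
HTML

# Echo the hidden fields back along with the values a user would type in.
fields = hidden_fields(LOGIN_HTML).merge("email" => "emily@socialcast.com", "password" => "demo")
body = URI.encode_www_form(fields)
puts body
# prints authenticity_token=abc123&email=emily%40socialcast.com&password=demo
```

A real client would also have to send back the session cookie it received with the login page. Mechanize keeps a cookie jar and handles both steps, which is likely why it succeeds where the hand-rolled cURL attempts did not.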

From here on, we can automate any interaction with the site just as a normal user would, without worrying about the authenticity token restriction. In my next post, I will explain how to automatically update a user's avatar without relying on the API.