August 2005


This is one of my pet peeves. A lot of sites of late show up in Google’s search results when you are looking for something, but the content itself is not available unless you register or subscribe. During one of my recent searches, I ended up on a page that looked very promising. What ticked me off was that when I clicked through to that page, the search term I had used was nowhere in the visible part of the article. I was told to sign up for a monthly online pass if I wanted the rest of the content. Somehow, this did not sit well with me at all. They should either make the content available or not have it indexed at all. This has happened to me in the past (especially with searches that take me to Experts Exchange), but I never thought twice about it.

When I thought more about how these sites managed to let the Google robot index their pages without a subscription but didn’t let me view them, the light bulb went on! It was simple: they were just looking at the browser’s user-agent (an HTTP header that identifies the requesting browser) to let Google’s robot through but not me. So all I had to do to see this content was pretend to be Google’s robot.
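I have no idea what these sites actually run on their servers, but the kind of check I am imagining would look roughly like the Python sketch below. The function names, the teaser length, and the simple substring match are all my own assumptions, made up purely for illustration.

```python
def is_allowed(headers, has_subscription=False):
    """Hypothetical gate: let subscribers through, plus anything that
    merely claims to be Googlebot in its User-Agent header."""
    user_agent = headers.get("User-Agent", "")
    return has_subscription or "googlebot" in user_agent.lower()


def render_article(headers, full_text, teaser_length=40, has_subscription=False):
    """Return the whole article to allowed visitors, a teaser to everyone else."""
    if is_allowed(headers, has_subscription):
        return full_text
    return full_text[:teaser_length] + "... [subscribe to read more]"


if __name__ == "__main__":
    article = "The answer you searched for is buried far below the visible teaser text."
    print(render_article({"User-Agent": "Mozilla/5.0"}, article))
    print(render_article({"User-Agent": "Googlebot/2.1"}, article))
```

The point is that the header is supplied by the client, so a check like this trusts whatever the visitor says about itself.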

Changing the user-agent in IE is possible but very cumbersome, and I would not recommend it, because a lot of other things, like Windows Update and sites that use browser detection instead of object detection in JavaScript, will get very confused. I would instead suggest doing this in Firefox (shame on you if you don’t also have Firefox on your desktop). There is a wonderful User Agent Switcher extension for Firefox that lets you set up your own user-agent. After you download and install it, restart Firefox, go to Tools -> User Agent Switcher -> Options -> Options, open the User Agents tab, add a new user agent, and set:


* Description ==> Google Bot
* User Agent ==> Googlebot/2.1
* App Name ==> Googlebot
* App version ==> 2.1

Now go to the Tools menu and select Tools -> User Agent Switcher -> Google Bot. If you go back to the same URL I mentioned above, you will now see the entire article! That is all I do now when I see sites using this technique: I simply switch my user-agent to Googlebot. Some may contend this is borderline hacking, but I am sorry, I think these sites deserve it, considering how much of my time I have wasted wading through search results because of them.
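If you would rather do the same thing from a script than from the browser, here is a minimal Python sketch using only the standard library. The `build_request` and `fetch` helpers and the example URL are hypothetical names of my own; the user-agent string is the same `Googlebot/2.1` value entered in the switcher above. Whether a given site actually responds to it is something I can only assume.

```python
import urllib.request

# Same value as the "User Agent" field set in the switcher.
GOOGLEBOT_UA = "Googlebot/2.1"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a request whose User-Agent header claims to be Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Send the request and return the response body as text."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder URL; substitute the page you are trying to read.
    request = build_request("http://example.com/article")
    print(request.get_header("User-agent"))
```

Calling `fetch(url)` performs the actual download; `build_request` alone is handy for checking which header will be sent before touching the network.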

I have been playing with Rails whenever I have found time over the past few weeks, which isn’t a lot. But I have to say that, from the little I have tried, it is actually very elegant and easy to use, especially for a database-driven website. I was up and running in a few minutes. For the benefit of people who don’t know what Ruby or Rails is about: Rails is a web application framework built on top of the programming language Ruby. Ruby was invented by a Japanese programmer and has recently become very popular thanks to English-language articles and books on the language, the most interesting being the Ruby book by the Pragmatic Programmers.