Friday, January 11, 2008

VoilaBot, behave!

My websites are in the local advertising space: employment, online dating, business listings. My audience is in the USA, Canada, and Mexico, so traffic from Europe, China, or Nigeria will never earn me any revenue.

So when I see a spider called VoilaBot, from Voila.fr, pounding my server farm, that's a spider I'd like to disallow. No problem: just add it to robots.txt. The only thing I need to know is the user-agent name the bot looks for when it reads that file.

Typically, a spider will identify its user agent and, parenthetically, give a link to info regarding the bot. VoilaBot does not; it just points to their homepage:


2008-01-09 05:00:12 GET /robots.txt - - 193.252.149.16 Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+fr;+rv:1.8.1)+VoilaBot+BETA+1.2+(http://www.voila.com/) - - 200 1100 358 546


which redirects to http://www.voila.fr/, which sucks for me since I don't know French. I was able to find a page on their site about robots.txt, and how to block *all* spiders from visiting my site -- no thanks! I heart Google -- but nothing that mentions what user-agent to specify to block VoilaBot.

The Google index, interestingly, includes the robots.txt files they scan. Examining these, some people specify Voila and just as many specify VoilaBot. Other Google results include rants like mine -- apparently this bot has been in beta since 2001.
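Since nobody seems to know which name the bot honors, one option is to list both. Here is a rough Python sketch that checks how such a robots.txt is interpreted by Python's standard `urllib.robotparser`. It only shows how that one parser matches the names; whether VoilaBot itself obeys either token is still unknown. The `/jobs` path is a made-up example, and the bot is identified here by its short name rather than the full user-agent string from the log.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that names both tokens seen in other people's files.
# Which one (if any) VoilaBot actually honors is an open question.
ROBOTS_TXT = """\
User-agent: Voila
Disallow: /

User-agent: VoilaBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "/jobs" is a hypothetical path used only for illustration.
voilabot_allowed = parser.can_fetch("VoilaBot", "/jobs")
googlebot_allowed = parser.can_fetch("Googlebot", "/jobs")

print("VoilaBot allowed:", voilabot_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

Listing both names costs nothing, and spiders that match neither name are unaffected because there is no catch-all `*` group.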

So Voila.fr Webmaster: please tell us how to block your bot. Thanks!

Update: 1/15/08
No response from Voila.fr, so I'm now denying them at the firewall. If you don't want this bot, I encourage you to do the same.

We've seen it from

81.52.143.15
81.52.143.16

193.252.149.15
193.252.149.16
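
The bot may show up from other addresses on your servers, so it is worth checking your own logs. Below is a minimal Python sketch that counts the IPv4 addresses on every log line mentioning VoilaBot. It assumes a space-separated text log like the line quoted above, where the client IP is the only IPv4-looking token; if your log format also records the server's IP, that address will be counted as well. The function name `bot_ips` and the Googlebot sample line (with a documentation-range IP) are made up for illustration.

```python
import re
from collections import Counter

# Four dot-separated groups of 1-3 digits; loose on purpose (no 0-255 check).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def bot_ips(lines, needle="voilabot"):
    """Count IPv4 addresses on log lines whose text contains `needle`."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip header/directive lines
            continue
        if needle in line.lower():
            counts.update(IPV4.findall(line))
    return counts


sample = [
    # The VoilaBot request quoted earlier in this post.
    "2008-01-09 05:00:12 GET /robots.txt - - 193.252.149.16 "
    "Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+fr;+rv:1.8.1)+"
    "VoilaBot+BETA+1.2+(http://www.voila.com/) - - 200 1100 358 546",
    # Hypothetical non-VoilaBot line, included only to show it is ignored.
    "2008-01-09 05:00:13 GET /robots.txt - - 192.0.2.10 "
    "Googlebot/2.1 - - 200 1100 358 546",
]

result = bot_ips(sample)
for ip, hits in result.most_common():
    print(ip, hits)
```

To run it against a real file, pass an open file object instead of `sample`, for example `bot_ips(open(path))` with `path` pointing at your own log.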

4 comments:

Anonymous said...

Can't believe you would give these jokers a free link (i.e. in their user agent reference).

Oh well, we had the same problem and did the same, blocking their IPs.

Anonymous said...

I also am getting hammered by this bot, but I can't find where in my logs the IP address is...

Any tips or ideas on where to try and find it to better block it?

Thanks, yours are the first real tips on blocking it too!

Elliott
blogexplosion.net

Steve Bywater said...

Elliott: block the following IP addresses...

81.52.143.15
81.52.143.16
193.252.149.15
193.252.149.16

Electronics Appliances said...

Does it give up after crawling all the content?