Are bots scraping your classic ASP site, hammering it with requests, or otherwise misbehaving?
This is a common problem. The trick is to separate the good bots from the bad. The first step is to look at the user agent, and the easiest way to do that is a function that returns true or false.
Function isGoodBot()
    Dim userAgent
    ' Get the user agent from the request variables, upper-cased once for comparison
    userAgent = UCase(Request.ServerVariables("HTTP_USER_AGENT"))
    isGoodBot = False
    ' Check the user agent for likeable bot signatures
    If InStr(userAgent, "GOOGLE") > 0 Or InStr(userAgent, "FACEBOOK") > 0 _
        Or InStr(userAgent, "YSEARCH/SLURP") > 0 _
        Or InStr(userAgent, "MSNBOT") > 0 Or InStr(userAgent, "BINGBOT") > 0 Then
        isGoodBot = True
    End If
End Function
The next thing to do is to figure out a tolerance level that you are comfortable with. The idea is to block people who are abusing your website, but allow those who are using your site properly to continue. I decided that 20 pages per minute or more is more aggressive than I would like a user to be. I emailed myself every time I blocked a visitor so that I could research the IP address, and I kept doing so until I was comfortable that I was not blocking valid humans or bots. One site that I use to research IP addresses is: http://www.ip-address.org/lookup/ip-locator.php
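As a rough sketch, the notification email could be sent with CDO, which ships with classic ASP on Windows servers. The NotifyBlocked name and both email addresses are placeholders of mine, not part of the original code, and the server is assumed to already have working SMTP settings for CDO.

```vbscript
' Sketch only: NotifyBlocked and the addresses are hypothetical, and
' CDO.Message relies on SMTP settings assumed to be configured on the server.
Sub NotifyBlocked(ip, userAgent, hitCount)
    Dim mail
    Set mail = Server.CreateObject("CDO.Message")
    mail.From = "noreply@example.com"
    mail.To = "webmaster@example.com"
    mail.Subject = "Blocked " & ip
    ' Include enough detail to research the visitor later
    mail.TextBody = "IP: " & ip & vbCrLf & _
                    "User agent: " & userAgent & vbCrLf & _
                    "Hits in window: " & hitCount
    mail.Send
    Set mail = Nothing
End Sub
```

Calling it just before the block is issued keeps the email and the block in step, which makes it easier to judge whether the tolerance is too strict.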
Here is the logic I used, applied only when isGoodBot returns false:
- Log the IP to a database table. Do this for at least a day before enabling the next steps, which allows for some research.
- Look up the IP in the database, considering only the last 3 minutes.
- Get the average number of hits per minute during the last 3 minutes for the given IP address.
- If the number of hits per minute exceeds the 20 pages per minute tolerance, return a 403 error.
- Instead of a 403 you can redirect to Google or any other site.
- Create a SQL job to delete database entries older than a day (the table will fill fast).
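The steps above could be sketched roughly as follows. This is not the original code: the BotHits table (IpAddress, HitTime), the Application("BotDbConnectionString") setting, and the ExceedsTolerance and CheckVisitor names are all assumptions made for illustration, and the SQL assumes SQL Server (GETDATE, DATEADD).

```vbscript
' Sketch only: table, connection setting and helper names are hypothetical.
Const WINDOW_MINUTES = 3
Const MAX_PAGES_PER_MINUTE = 20

' True when the average hits per minute over the window exceeds the tolerance.
Function ExceedsTolerance(hitCount, windowMinutes, maxPerMinute)
    ExceedsTolerance = (hitCount / windowMinutes) > maxPerMinute
End Function

Sub CheckVisitor()
    Dim conn, cmd, rs, ip, hitCount
    ip = Request.ServerVariables("REMOTE_ADDR")

    Set conn = Server.CreateObject("ADODB.Connection")
    conn.Open Application("BotDbConnectionString") ' assumed app setting

    ' Log this hit
    Set cmd = Server.CreateObject("ADODB.Command")
    Set cmd.ActiveConnection = conn
    cmd.CommandText = "INSERT INTO BotHits (IpAddress, HitTime) VALUES (?, GETDATE())"
    cmd.Parameters.Append cmd.CreateParameter("ip", 200, 1, 45, ip) ' 200 = adVarChar, 1 = adParamInput
    cmd.Execute , , 128 ' 128 = adExecuteNoRecords

    ' Count this IP's hits in the last WINDOW_MINUTES minutes
    Set cmd = Server.CreateObject("ADODB.Command")
    Set cmd.ActiveConnection = conn
    cmd.CommandText = "SELECT COUNT(*) FROM BotHits WHERE IpAddress = ? " & _
                      "AND HitTime >= DATEADD(minute, -" & WINDOW_MINUTES & ", GETDATE())"
    cmd.Parameters.Append cmd.CreateParameter("ip", 200, 1, 45, ip)
    Set rs = cmd.Execute
    hitCount = CLng(rs(0))
    rs.Close
    conn.Close

    ' Refuse the request when over the tolerance
    If ExceedsTolerance(hitCount, WINDOW_MINUTES, MAX_PAGES_PER_MINUTE) Then
        ' Alternative to the 403: Response.Redirect "https://www.google.com/"
        Response.Status = "403 Forbidden"
        Response.End
    End If
End Sub

' Run near the top of each page, before any output is sent
If Not isGoodBot() Then CheckVisitor
```

With a 3-minute window and a tolerance of 20, a visitor is blocked once more than 60 hits are recorded for their IP in that window. Under the same assumptions, the cleanup job would run a statement along the lines of DELETE FROM BotHits WHERE HitTime < DATEADD(day, -1, GETDATE()).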
A 403 error in ASP is given like this:
Response.Status = "403 Forbidden"
Response.End
I wrote this in ASP because that was the project I was working in. It can be translated to PHP or any other language fairly easily.