"Bomb-proofing" Plans |
Dealing With Other "Plans" Calendar Annoyances

The previous page on Spam-proofing Plans tells how to use .htaccess to prevent your Pending Calendars from being spammed and to cut down on the bandwidth used by such attempts. The following information has nothing to do with spam appearing on your calendar. So far these things are really only an annoyance to me as the webmaster, but you may find the fixes useful. All they really do is keep those annoyances from wasting your bandwidth, and for me that's worth the satisfaction of knowing the offenders are getting nothing for their efforts.

"URL injections"

Recently there have been quite a number of what I call "URL injection" attempts on our calendar. About once a week there will be 30-40 lines in the access log showing a variety of URLs, with "escaped" slashes and dots, placed in various spots in the Plans calendar URL. You can add the following lines to your .htaccess file to serve a "403 - Forbidden" error page for those requests:

RewriteCond %{QUERY_STRING} http
RewriteRule .* - [F]
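If matching the bare string http ever turns out to be too broad (it also forbids a legitimate query string that merely contains those letters), a narrower condition is possible. The following is only a sketch, not what troop53.net runs: it assumes every injected URL includes the scheme separator, written either literally as :// or percent-encoded as %3A%2F%2F, so check your own access log before relying on it. Apache does not decode %{QUERY_STRING}, so the escaped forms have to be matched as they are written.

```apache
# Sketch only: forbid query strings that contain an http:// or https:// URL,
# with the colon and each slash either literal or percent-encoded.
# Assumption: every injection attempt includes the scheme separator.
# [NC] makes the match case-insensitive (HTTP, %3a, %2f also match).
RewriteCond %{QUERY_STRING} https?(:|%3A)(/|%2F)(/|%2F) [NC]
RewriteRule .* - [F]
```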
The common term in all of those URL injections is "http", so a condition that matches on it catches every one of them.

"Bots" and "Scrapers" requesting calendars from stupid years

On April Fool's Day of 2008 our calendar was hit by a bot or scraper that ate up more than 51 megabytes of bandwidth within 30 minutes. The vast majority of that bandwidth went to the bot requesting every month's calendar for every year from December 1901 through December 2038. A similar occurrence happened in early March of the same year.

Most organizations probably don't have valid information in their calendar for much more than 2-4 years before or after the present. I figured that anything or anybody requesting a calendar from before 2000 or after 2019 was either up to no good or wasn't human, or both. Whatever the case, there aren't any Troop 53 events to view in those years. So I added the following to our .htaccess file:

RewriteCond %{QUERY_STRING} cal_start_year=19
RewriteRule .* - [F]
RewriteCond %{QUERY_STRING} cal_start_year=202
RewriteRule .* - [F]
RewriteCond %{QUERY_STRING} cal_start_year=203
RewriteRule .* - [F]
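The same intent (serve only the years that can hold real events) can also be written as an allow-list instead of a list of banned decades. This is a sketch under stated assumptions rather than the rule set actually in use: it assumes query parameters are separated by &, that cal_start_year carries a four-digit year, and that 2000 through 2019 is the window worth serving. Consecutive RewriteCond lines are ANDed by default, so the rule fires only when the parameter is present and its value falls outside that window.

```apache
# Sketch only: forbid any cal_start_year outside 2000-2019.
# Condition 1: the query string carries a cal_start_year parameter.
RewriteCond %{QUERY_STRING} (^|&)cal_start_year=
# Condition 2: its value is NOT a year from 2000 through 2019.
RewriteCond %{QUERY_STRING} !(^|&)cal_start_year=20[01][0-9](&|$)
RewriteRule .* - [F]
```

Unlike decade-by-decade rules, this form also covers years such as 1899 or 2040 without further edits, but the window has to be moved forward by hand as the real calendar advances.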
These rules serve a "403 - Forbidden" error to any request for a calendar from the 1900s, the 2020s, or the 2030s.

Follow-up on this: It worked better than expected! I would have been happy with 100 years of 403s for the sake of the reduced bandwidth, but another bot/scraper came through on April 3 and it stopped after getting a 403 for January 2020 and another for December 1999. So the 403s actually put an end to further requests for banned years.

Cleaning it up

If you've been following along (And why not? This is riveting prose!), you've now got quite a number of new lines in your .htaccess file. If you've done everything on the last 2 pages, it'll look something like the listing that follows. Leave out the comments at the ends of the lines when you copy it: Apache only accepts a comment on a line of its own, so a trailing comment is read as part of the directive and typically triggers a server error.

RewriteEngine On # This line turns on the Rewrite Engine
RewriteCond %{QUERY_STRING} action=add # These 2 lines prevent spamming
RewriteRule .* - [F] # of your Pending Calendars
RewriteCond %{QUERY_STRING} http # These lines serve 403s to
RewriteRule .* - [F] # URL injection requests
RewriteCond %{QUERY_STRING} cal_start_year=19 # Serves 403s to requests for
RewriteRule .* - [F] # calendars from 1900-1999
RewriteCond %{QUERY_STRING} cal_start_year=202 # Serves 403s to requests for
RewriteRule .* - [F] # calendars from 2020-2029
RewriteCond %{QUERY_STRING} cal_start_year=203 # Serves 403s to requests for
RewriteRule .* - [F] # calendars from 2030-2039
Remember that the server has to parse each line of the .htaccess file on every request. So to help out the server we'll combine all of the above into one Condition/Rule set:

RewriteEngine On
RewriteCond %{QUERY_STRING} (action=add|http|cal_start_year=19|cal_start_year=202|cal_start_year=203) [NC]
RewriteRule .* - [F]
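Because three of the alternatives share the cal_start_year= prefix, the pattern can be shortened a little by nesting a second group. The sketch below should match exactly the same query strings as the single combined rule; it is offered as a readability variation, not as something the server requires.

```apache
RewriteEngine On
# Same behaviour as the single combined rule, with the shared
# cal_start_year= prefix factored out of the three year alternatives.
RewriteCond %{QUERY_STRING} (action=add|http|cal_start_year=(19|202|203)) [NC]
RewriteRule .* - [F]
```

In this form, banning another decade means adding one more alternative to the inner group rather than repeating the parameter name.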
The combined Condition/Rule set says "If a query string contains any of the following things, serve a 403". The "pipe" character between the alternatives is RegEx for "or", and the [NC] flag makes the match ignore case. You can add more conditions for the query string by adding another pipe and then the condition. For example, to ban requests for calendars from the 2040s, add |cal_start_year=204 inside the parentheses.

In conclusion

Why go to all this trouble to deal with a few issues that seem to bother only me? Well, if left unchecked they may start to affect others. Most websites nowadays are on shared hosting systems, and troop53.net is included in that "most". By wasting bandwidth and server cycles, these useless requests use up resources that could be put to use elsewhere. Blocking them is all part of being a good neighbor to the other sites hosted on the same server.
Copyright © 2002-12 BSA Troop 53