My new software keeps adding new spammers to my htaccess list. It is getting very large. The spam, however, keeps arriving. I have blocked several dozen Chinese networks and dozens of hosting sites, but new spammers keep cropping up.
I am writing a program that takes my spam list, queries whois.lacnic.net for each entry, and returns the range of the network where the IP originates, plus the country and the name of the network. I will let it run and then edit the results.
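A lookup like this might be sketched in Python as follows. This is only an illustration, not the program described above: the function names are made up, and the field labels (`inetnum`, `owner`, `country` and the alternatives) are assumptions about what a registry's reply contains, since each registry labels its output a little differently. The query itself is the plain WHOIS protocol: open port 43, send the address, read until the server hangs up.

```python
import socket

def query_whois(ip, server="whois.lacnic.net", port=43, timeout=10):
    """Send one WHOIS query and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall((ip + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

# Assumed field labels; adjust to whatever the registry really returns.
FIELDS = {
    "range": ("inetnum", "netrange", "cidr"),
    "country": ("country",),
    "name": ("owner", "netname", "orgname", "org-name"),
}

def parse_whois(text):
    """Pull the network range, country and network name out of a reply."""
    result = {}
    for line in text.splitlines():
        if line.startswith(("%", "#")) or ":" not in line:
            continue  # skip comments and lines without a label
        key, _, value = line.partition(":")
        key = key.strip().lower()
        value = value.strip()
        for label, names in FIELDS.items():
            # keep the first non-empty value seen for each label
            if key in names and label not in result and value:
                result[label] = value
    return result
```

Calling `parse_whois(query_whois(ip))` for each address on the spam list would give one small record per IP to review by hand before anything is banned.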
I don’t want to completely block European or North American networks such as COX, Verizon or other ISPs, because that is where I make my money. I do want to block hosts like Nobis who allow spammers free rein. Looking up hundreds of networks by hand takes too long, so I need to finish the automation. This could knock off about two thirds of the IPs on the list and replace them with a single line of “deny” code for an entire network.
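The replacement step could look something like this sketch. It assumes the network ranges have already been looked up and approved by hand, and that the list uses the older `deny from` syntax; the function names and the sample addresses are hypothetical.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Turn a first/last address pair into the CIDR blocks that cover it."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return list(ipaddress.summarize_address_range(start, end))

def compress_deny_list(ips, blocked_ranges):
    """Replace every IP inside a blocked range with one line per network."""
    networks = []
    for first, last in blocked_ranges:
        networks.extend(range_to_cidrs(first, last))
    lines = [f"deny from {net}" for net in networks]
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        # keep the single-IP line only if no blocked network covers it
        if not any(addr in net for net in networks):
            lines.append(f"deny from {ip}")
    return lines
```

For example, `compress_deny_list(["192.0.2.5", "192.0.2.77", "198.51.100.9"], [("192.0.2.0", "192.0.2.255")])` produces one line for the whole 192.0.2.0/24 network and keeps the single unrelated address.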
I want to organize the list so that people who want to allow access from Chinese networks or Russian networks can edit them easily but still leave the scummy hosting companies, and some other hot spots.
The code for this is event driven and plugs into the WordPress architecture. It is based on a plugin class I wrote. I am trying to make the plugin class generic, with generic calls to many WP API functions, so that I can translate them for use with Joomla or other forums and content managers. I hope to port it to C and create an Apache handler so hosting companies can just plug it in for their users.
I am 63 years old and I hope to retire soon. My job takes most of my energy so I don’t have much time to work on this stuff. It is going very slowly. It won’t be long before this becomes a full time job for me.
Any ideas?
My hosting company is good, but it limits my “program executions” and I had to pay more for my WordPress sites because of this.
I started watching my logs and realized that there were IP addresses that hit me tens of thousands of times. Yandex robots hit one site 89,000 times in a four-hour period. I tried slowing them down with robots.txt, and then just banned them. I don’t need a Russian language search engine, and legitimate traffic on my sites from Russia is minuscule.
I then found that dictionary attacks on admin passwords were hitting me thousands of times per hour. I altered Stop Spammers so I could easily add modules and created one that added spammers to htaccess automatically. I wrote others to detect robots, referrer spam and repeated hits to login, and set them all to kick off the htaccess update.
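The repeated-hits-to-login check can be illustrated with a short Python sketch. It is not the module itself, which lives inside the plugin; it assumes an Apache-style access log where each line starts with the client IP and contains the quoted request, and the login path, the threshold and the function name are all placeholders.

```python
import re
from collections import Counter

# Assumed log layout: IP, two fields, [timestamp], then "METHOD /path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)')

def repeat_login_offenders(log_lines, path="/wp-login.php", threshold=20):
    """Return the IPs that POSTed to the login page at least `threshold` times."""
    hits = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m.group(2) == "POST" and m.group(3).startswith(path):
            hits[m.group(1)] += 1
    return sorted(ip for ip, n in hits.items() if n >= threshold)
```

The list it returns is what would then be handed to the htaccess update.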
When the htaccess file started to grow, I noticed that the IP addresses of spammers were coming from the same subnets, so I started banning spammers with a CIDR of /24. This was not enough and I started banning networks. I have automated the network look-up with a new set of programs, but I like to inspect the results because I don’t want to ban all of Verizon or Cox, which have paying customers. I am on the verge, however, of banning all of China and Russia.
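Grouping by /24 can be sketched like this. It is an illustration only: it handles IPv4 addresses, the function name is made up, and the rule of banning a subnet once it contains three distinct spammers is an assumption chosen for the example, not a figure from my setup.

```python
import ipaddress
from collections import defaultdict

def collapse_to_slash24(ips, min_hits=3):
    """Ban a whole /24 once it holds `min_hits` distinct spammer IPs (IPv4 only)."""
    buckets = defaultdict(set)
    for ip in ips:
        # strict=False lets us pass a host address and get its /24 back
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        buckets[net].add(ip)
    lines = []
    for net in sorted(buckets):
        members = buckets[net]
        if len(members) >= min_hits:
            lines.append(f"deny from {net}")
        else:
            # too few offenders: keep the individual addresses
            lines.extend(f"deny from {ip}"
                         for ip in sorted(members, key=ipaddress.ip_address))
    return lines
```

A lower `min_hits` shrinks the file faster but raises the odds of locking out a legitimate visitor who shares a subnet with a spammer.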
An amazing thing has happened. At the start of this, as many as 92% of the hits on my sites were from banned subnets. Now, however, the number is about 10%. The spammers have stopped trying to hit my site. Through all of this, my Quantcast statistics have remained steady and even gone up a little. It was like a switch was turned on and the spam sites all realized at the same time that hitting my sites is a waste of bandwidth.
I looked into a Bayesian solution, and it looks like the perfect solution, but it requires an “execution” so it will not work for me as just a plugin change. I found some code that looks easy enough and I was thinking about adding a new module to the altered Stop Spammers, but only to ban more hits with the deny IP in htaccess.
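For anyone wondering what the Bayesian idea amounts to, here is a toy Python sketch. It is not the code I found and not a finished filter: the class name, the tokenizer and the add-one smoothing are all choices made for the example. It counts how often each word appears in known spam and known good comments, then scores a new comment by which set of counts explains it better.

```python
import math
import re
from collections import Counter

class TinyBayes:
    """Toy naive Bayes comment scorer with add-one smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record one example; label is "spam" or "ham"."""
        self.words[label].update(self.tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        """Estimate the probability (0..1) that `text` is spam."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_docs = self.docs["spam"] + self.docs["ham"]
        scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.words[label].values())
            # smoothed prior plus smoothed per-word likelihoods, in log space
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in self.tokens(text):
                score += math.log((self.words[label][tok] + 1)
                                  / (total_words + vocab + 1))
            scores[label] = score
        # clamp so math.exp cannot overflow on very lopsided scores
        diff = max(min(scores["ham"] - scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

A module built on this would only add an IP to the deny list when the score is well above some cutoff, since a wrong guess bans a real visitor.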
I looked into writing an Apache “MOD” which does all that Stop Spammers does, plus all of the above, but I decided that I would never have time to write it. (Also, I like PHP and haven’t written C in about 20 years.) I even thought about starting a Kickstarter project for $15,000 and taking off a month to write the MOD, but I don’t think that I could raise the money and I can’t afford to give up a good day job. At my age you don’t quit a job that pays $150k. I’m 63 years old, and I want to retire and keep bees, not start on difficult new projects.
Thanks for all of your support.
Keith
Hi Keith,
Have you thought about changing the approach from blocking based on IPs to a Bayesian solution?
There was a very promising antispam product on the market 3-4 years ago, but it was developed by someone way too busy to keep the development going, so it just faded away and was discontinued even though I offered to “buy” – it was that good.
Have you ever looked into a Bayesian anti-spam solution? If you’re curious, email me and I’ll tell you what I know about that product.