bogofilter and the corrupt wordlist.db

Looks like something burped on my mail server and my bogofilter wordlist got too big. Probably something to do with limits anyhow. In any case, I was looking for a way to recover from the issue and came across this pearl in the Bogofilter FAQ. Well, the advice is incomplete. If you really hose up the database then bogoutil -d will stop printing entries before the end of the database. The next recovery step is to use the Berkeley DB utilities, db_dump and db_load, to fix the database. db_dump -r (on FreeBSD, db_dump-<version>) salvages the database into a text file, and db_load rebuilds a database from that text file. The problem is that the advice in the Bogofilter FAQ is out of date. It looks like there are some parameters that have to be specified in a header before db_load will take the file. My solution: run db_dump without the -r, which produces a broken dump but one with a default header. Copy that header into a new text file and then append the output of db_dump -r to it. Et voilà!
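For what it's worth, the header splice can be sketched in a few lines of Python. This is only a sketch under assumptions: both dumps have already been written to text files, the header of the plain dump is everything up to and including its HEADER=END line, and the salvage output carries no header of its own. The function name splice_header is made up for illustration.

```python
def splice_header(plain_dump: str, salvage_dump: str) -> str:
    """Prefix the salvaged records with the header from a plain db_dump.

    Assumes the header is everything up to and including the
    'HEADER=END' line, and that salvage_dump has no header of its own.
    """
    header = []
    for line in plain_dump.splitlines(keepends=True):
        header.append(line)
        if line.strip() == "HEADER=END":
            # Header is complete: keep it, drop the broken records after it.
            return "".join(header) + salvage_dump
    raise ValueError("no HEADER=END line found in the plain dump")
```

Write the returned text to a new file (say fixed.txt, a hypothetical name) and hand it to db_load with -f fixed.txt and the name of the new database.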

Alright, it’s no longer 1998!

One thing that really ticks me off is the web designer conversation where your web design guy insists on designing for an 800×600 screen resolution to ensure that your pages will be accessible to everyone on the web. Today I ran across this nugget (opens in a new window). I’ve always said that this is so 1998, yet I’ve had this conversation as recently as 2007. Well, if you dissect the table you come up with this:

Sorted by width (cumulative share; -- means tied with the rows below):

1920×1200    2.27%
1680×1050    8.72%
1440×900    18.37%
1366×768    20.76%
1280×1024      --
1280×800       --
1280×768    58.09%
1152×864    61.04%
1024×768    94.94%
800×600    100.00%

Sorted by height (cumulative share; -- means tied with the rows below):

1920×1200    2.27%
1680×1050    8.72%
1280×1024   21.97%
1440×900    31.62%
1152×864    34.57%
1280×800    56.92%
1366×768       --
1280×768       --
1024×768    94.94%
800×600    100.00%

That’s right. If you design for 1024×768 you are reaching nearly 95% of all the web browsers that participated in this survey. Now web designers can party like it’s 2004!
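If you want to play with other design widths, the lookup is easy to sketch in Python. The numbers are the width-sorted cumulative shares from the list above; the names CUMULATIVE_BY_WIDTH and coverage are mine, and the sketch only looks at width, not height.

```python
# Cumulative share (%) of surveyed screens at least this wide,
# taken from the width-sorted list in the post.
CUMULATIVE_BY_WIDTH = {
    1920: 2.27,
    1680: 8.72,
    1440: 18.37,
    1366: 20.76,
    1280: 58.09,
    1152: 61.04,
    1024: 94.94,
    800: 100.00,
}


def coverage(design_width: int) -> float:
    """Share of surveyed screens wide enough for a design_width layout."""
    wide_enough = [w for w in CUMULATIVE_BY_WIDTH if w >= design_width]
    if not wide_enough:
        # Wider than anything the survey lists.
        return 0.0
    # The narrowest listed width that still fits carries the cumulative total.
    return CUMULATIVE_BY_WIDTH[min(wide_enough)]


for width in (800, 1024, 1280):
    print(f"{width}px wide: {coverage(width):.2f}% of screens")
```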


In Dante’s Inferno there were circles in hell designed to separate the ordinary sinner, say the guy who designed the keyboard I’m working with (which provides no feedback when a key has been struck, for example), from the guy who deliberately put the “global nuclear war” button right next to the “toast apple poptarts” button. My “9th circle of hell award” goes to the guys who designed the firewall that I’m working with lately. It appears that in their wisdom they’ve chosen to implement the “Red Alert, all hands on deck” alarm for the following scenario.

You have a server connected to a client via TCP. The server is a fairly recent Linux box that can do RFC 1323 extensions. The client is a boring Windows XP box with a TCP RWin size of 65536 bytes. Between them is a Comcast business class cable connection. In this scenario the Windows box is trying to download a file from the server over the Comcast connection. The problem is literally that the connection is too fast for the Windows XP box to fully cope. Nowadays when I test Comcast cable connections I’m surprised to see anything less than 25 Mbit/s. In whole numbers that’s 25,000,000 bits/s. In more familiar units that’s 3,125 kBytes/s.

The problem is that I’m starting to see firewalls that see this as an issue because they have been programmed with very conservative specifications about what constitutes a denial of service attack. I’m seeing firewalls that scream DoS when they are connected to a business cable modem line and have clients with a TCP receive window size of 65536 bytes. Why? It’s simple. On a business cable line with a 25 Mbit/s download rate you have to be able to buffer 96 kbytes in TCP windows just to keep up with a server (or client) at the other end of a fast line. These firewalls are calling DoS because the other end can fill their TCP window and then some.

The right thing to do is to watch. If the other side wants to DoS you, he’ll send many packets after your RWin is filled. If he’s just a really fast server on a really fast pipe, he’ll respect your RWin and quit sending. If your firewall decides to be aggressive and drop the connection (by proactively sending a TCP RST) then you should probably act accordingly.
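The window arithmetic is easy to check with a few lines of Python. The line rate and RWin come from the scenario above; the 30 ms round-trip time is purely my assumption for illustration, not a measured value, and the function names are made up.

```python
def bytes_per_second(rate_bps: int) -> float:
    """Convert a line rate in bits/s to bytes/s."""
    return rate_bps / 8


def bdp_bytes(rate_bps: int, rtt_ms: int) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the line full."""
    return rate_bps * rtt_ms / 8000


def window_limited_bps(window_bytes: int, rtt_ms: int) -> float:
    """Best-case rate when only window_bytes may be unacknowledged per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


LINE_RATE_BPS = 25_000_000  # 25 Mbit/s, from the post
RWIN_BYTES = 65536          # Windows XP receive window, from the post
RTT_MS = 30                 # ASSUMPTION: illustrative round-trip time

print(f"line rate:      {bytes_per_second(LINE_RATE_BPS):,.0f} bytes/s")
print(f"BDP at {RTT_MS} ms:   {bdp_bytes(LINE_RATE_BPS, RTT_MS):,.0f} bytes")
print(f"RWin-limited:   {window_limited_bps(RWIN_BYTES, RTT_MS) / 1e6:.2f} Mbit/s")
```

Under that assumed round-trip time the line wants 93,750 bytes in flight, which is more than a 65536 byte window can hold, so a well-behaved fast sender will fill the window completely before the first acknowledgement comes back. A different round-trip time moves the numbers but not the conclusion for anything much above 20 ms.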

My thanks to Chuck Skuba on this post. I have to be 100% honest and fess up that I gathered the data but he did the homework.

— Chris