Monday, August 23, 2010

Don't Filter Without Being Aware of the Scunthorpe Problem

I happened to stumble across this question today while browsing Stack Overflow. In it, a user is looking for a library that will filter profanity. Several suggestions are offered, along with a problem with one of them: the Scunthorpe problem. The gist is that naive filtering implementations often block innocent words that happen to contain a sequence of letters considered offensive on its own.
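To make the failure concrete, here is a minimal sketch of the kind of naive substring check that runs into this. The blocklist and the function name are made up for illustration (and kept deliberately mild); this is not taken from any of the libraries suggested in the question.

```python
# Hypothetical, deliberately mild blocklist used only for illustration.
BLOCKLIST = {"ass"}


def naive_is_profane(text: str) -> bool:
    """Flag text if a blocked term appears anywhere, even inside a longer word."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


# Innocent words get flagged because they merely contain the blocked letters.
for word in ["classic", "passenger", "hello"]:
    print(word, naive_is_profane(word))
```

Under these assumptions, "classic" and "passenger" are both flagged even though neither is offensive, which is the Scunthorpe problem in miniature.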

I highly recommend reading the full Wikipedia page; it is very entertaining.

If you ever need to implement a profanity filter, keep the Scunthorpe problem in mind, or you may cause your users unnecessary strife.

My favorite word resulting from the Scunthorpe problem: buttbuttinate (what "assassinate" becomes when a filter blindly swaps in "butt").
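That word falls straight out of a blind search-and-replace. The sketch below reproduces it and contrasts it with one common mitigation, matching whole words only. The replacement table and both function names are hypothetical, and whole-word matching is just one possible approach, not a complete fix: it reduces false positives like this one but will not catch deliberate misspellings.

```python
import re

# Hypothetical replacement table used only for illustration.
REPLACEMENTS = {"ass": "butt"}


def naive_censor(text: str) -> str:
    """Replace every occurrence of a blocked term, even inside longer words."""
    for bad, mild in REPLACEMENTS.items():
        text = text.replace(bad, mild)
    return text


def word_boundary_censor(text: str) -> str:
    """Replace a blocked term only when it stands alone as a whole word."""
    for bad, mild in REPLACEMENTS.items():
        text = re.sub(rf"\b{re.escape(bad)}\b", mild, text)
    return text


print(naive_censor("assassinate"))          # buttbuttinate
print(word_boundary_censor("assassinate"))  # assassinate
```

The `\b` anchors in the regular expression are what keep the second version from reaching into the middle of "assassinate".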
