Monday, August 23, 2010

Don't Filter Without Being Aware of the Scunthorpe Problem

I stumbled across this question today while browsing Stack Overflow. In it, a user is searching for a library that will filter profanity. Some suggestions are offered, along with a problem with one of them: the Scunthorpe problem. The gist is that naive filtering implementations often censor innocent words that happen to contain a sequence of letters considered dirty on its own.

I highly recommend reading the full Wikipedia page - it is very entertaining.

If you ever find you need to implement a profanity filter - make sure you don't forget about the Scunthorpe problem, or you may cause your users unnecessary strife.

My favorite word resulting from the Scunthorpe problem: buttbuttinate.
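
To make the failure mode concrete, here is a minimal sketch in Python. The one-entry word list and its replacement are assumptions chosen purely to reproduce the word above, and both function names are hypothetical. The first function replaces the listed string anywhere it appears; the second only replaces it when it stands alone as a whole word.

```python
import re

# Hypothetical word list and replacement, chosen only for illustration.
REPLACEMENTS = {"ass": "butt"}


def naive_filter(text):
    """Replace every occurrence of a listed string, even inside other words."""
    for bad, clean in REPLACEMENTS.items():
        text = re.sub(re.escape(bad), clean, text, flags=re.IGNORECASE)
    return text


def word_boundary_filter(text):
    """Replace a listed string only when it appears as a whole word."""
    for bad, clean in REPLACEMENTS.items():
        pattern = r"\b" + re.escape(bad) + r"\b"
        text = re.sub(pattern, clean, text, flags=re.IGNORECASE)
    return text


print(naive_filter("assassinate"))          # buttbuttinate
print(word_boundary_filter("assassinate"))  # assassinate
```

Word boundaries only address the substring case. They do nothing for deliberate misspellings or for place names like Scunthorpe that appear on a blocklist by accident, so treat this as a starting point rather than a complete filter.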

Thursday, August 5, 2010

More on e-reading, and the Pragmatic Bookshelf

I ranted a bit the other day about my first experiment with e-reading, but afterwards I learned something cool to offset the potential downer. I had completely forgotten that I have PDF versions of lots of books from the Pragmatic Bookshelf.

I love the Pragmatic Bookshelf. Not only is it full of awesome books, but books I bought 5 years ago in print+PDF I can now convert and put on my iPad at no additional charge. This is the definition of sweet. All I had to do was log into my account, tell the gerbils which books I wanted in which formats, and wait a minute. There are a couple of old ones that are still only available in PDF - I can still view those on my iPad, it just isn't as convenient yet. The Pickaxe is a notable example... but I'd say more than 75% of my books had e-reader formats available. Again, iOS 4 for iPad can't come soon enough - in this case for iBooks PDF reading.

At a bare minimum, I know that I can enjoy a good set of books on my iPad going forward. Exciting. Especially since there are still a couple I haven't read yet, and some I intend to re-read soon. There are countless more that I desire...

It's high time I create a bookshelf on this site, come to think of it... I'll put one up in the next week.

Tuesday, August 3, 2010

Adventures in E-reading

I read my first book and a half on the Kindle reader for iPad last weekend.

Specifically I enjoyed:
  • Masters of Doom - a great book primarily about John Carmack and John Romero. If you are a fan of the genre, it's a great read. It was so engrossing that I stayed up past 4am reading. I remember being on the Software Creations BBS, which is mentioned in the book.
  • Peopleware - Productive Projects and Teams (Second Edition) - a highly recommended book. I'm only halfway done, so a full write-up will have to wait, but it is a very insightful (and also quick) read.
Why did I choose the Kindle reader over iBooks? Kindle has a much better selection of books. So far iBooks is 0 for 3 on books I have wanted. Also - the Kindle reader is available on other platforms.

I love reading on the iPad. I can read with a light on - or without. The text was very readable, and my eyes were not put off by the backlit screen even after reading for 5 hours straight as I tore through Masters of Doom. Even though I already owned Peopleware in print, I paid the 10 bucks for the Kindle version anyway. That's a testament to how much I enjoyed reading on the iPad.

All is not well in the land of e-reading though. While I had no quality issues with Masters of Doom, Peopleware seems to be full of missing punctuation, at least one blatant typo, and a chapter that is in the wrong place. I did a quick search, and apparently I am not the only one. Since I have the print version, I have verified that none of the issues I have seen occur in my print copy. What is going on? Could it be that physical books are being OCR'd, or worse, re-typed by hand? What on earth??? Shouldn't digital copies already exist at the publisher?

Here is an example from the Table of Contents in Peopleware:
Actual TOC from the Paperback copy
Kindle version
In the Kindle version, the Intermezzo chapter appears in the wrong place in the book. It should be between chapters 9 and 10, not between chapters 8 and 9. Also, less importantly, the titles seem to have been truncated, and in the case of #8, the quotes are missing.

I have found several instances of missing punctuation while reading so far, as well as one glaring typo:
"The only acceptable interruption there was a fire alarm, and it had to be for a real Tire."
Somehow a lowercase f turned into a capital T...

Maybe some more veteran e-readers can tell me if they run into this a lot. I find it seriously distracting when sentences are incomplete or I find typos in books. Especially if they are an artifact of the e-book translation, and not something the original editors missed.

Is this the state of ebooks these days? I hope not, or my adventures in e-reading will be short lived.

Monday, August 2, 2010

Stage. Stage. Stage. Always Stage.

It doesn't matter what software we're talking about.
  • MySQL
  • Hudson
  • Java releases
  • third-party libraries
  • Subversion client
  • you name it - if it can be upgraded or replaced, it can and needs to be staged.
Always stage. ALWAYS.

If you don't stage, you're just punishing yourself (and potentially your coworkers and/or users). So be smart - always stage your upgrades. You'll get burned if you don't. It's just Murphy's Law applied to technology, really.

This isn't in response to recently getting burned myself. It's a post that's been brewing for a while.

Many years ago, a friend of mine wondered why an admin wouldn't upgrade PINE (a popular terminal email client at the time) as soon as a new version came out. Now I know why - because that admin was wise. They wanted to be sure it worked and didn't cause any regressions before subjecting their users to it.

I am continually surprised not just by how many things can go wrong, but also by how frequently they do, despite the best efforts and intentions of developers. Just the other week, the tiniest name change by Oracle in JDK 1.6.0 Update 21 caused Eclipse to fail to launch. Who'd have thought? And yet it happened. There are numerous other examples. I am sure you can think of plenty in whatever software you use. I can certainly think of several, just in the past couple of months.

Before your whole office upgrades to the latest version of Visual Studio, or the latest Subversion client, or any other software, make sure someone tests it out first.

Stage, because anything that can go wrong will go wrong.