Regarding the multiple-word checking, I found a way to do it, but in Ruby I have no idea how to get the REAL length of a (I guess) UTF-8 string. The point is that for s = "Fähre", s.length returns 6, not 5. I tried setting $KCODE = 'UTF-8' in the script, but it doesn't work.
Take a look at the new Rails Multibyte character support. I think they've tackled this problem already, and the patches might point you in the right direction. Hell, you might be able to swipe their string extensions outright ;)
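In the meantime, here is a rough sketch of one way to count characters instead of bytes. It is not the Rails code, and utf8_length is just a name I made up for illustration. It assumes the string really is valid UTF-8, and it relies only on String#unpack with the "U*" directive, which decodes the bytes into Unicode code points. The 6 you are seeing is the byte count, since "ä" takes two bytes in UTF-8.

```ruby
# Hypothetical helper (not part of Ruby or Rails): counts Unicode
# code points in a string that is assumed to be valid UTF-8.
def utf8_length(str)
  # "U*" decodes the whole string as UTF-8 into an array of code points.
  str.unpack("U*").length
end

# "Fähre" written with octal escapes so the example does not depend on
# the encoding of the source file; "\303\244" is the UTF-8 form of "ä".
s = "F\303\244hre"

puts s.unpack("C*").length  # byte count: 6
puts utf8_length(s)         # code point count: 5
```

Note that this counts code points, so a letter written as a base character plus a combining accent would count as two. For plain text like "Fähre" that should not matter.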
- Ben