MySQL encoding (was: [TxMt] mysql bundle, password issues, utf8 encoding)

Allan Odgaard throw-away-1 at macromates.com
Wed Jul 18 12:42:47 UTC 2007


On 17. Jul 2007, at 20:17, Chris Adams wrote:

> [...]
> I believe there's a simple fix for this problem. MySQL has a number  
> of places where you set the character set - the server, database,  
> table and individual columns can all be set individually - but  
> there's another less-publicized issue: the MySQL client library  
> defaults to latin1 for communication, which mangles your otherwise  
> clean UTF-8 path. If you run "SHOW VARIABLES LIKE '%character%'"  
> it'll show something like this: [...]

For me ‘character_set_server’ shows as latin1 (on both my remote
and local database; it seems to be the default, and prior to MySQL
4.1.1 UTF-8 was not supported AFAIK).

So if I issue a ‘SET NAMES utf8;’ it will actually break things,
because I store UTF-8 text in my tables. With the client and
connection encoding set to something other than latin1 (i.e. utf8),
MySQL will re-encode my table values from latin1 to UTF-8, when it
should just transfer them “verbatim”, since my client interprets them
as UTF-8 regardless of the MySQL encoding variables.

So my database setup is wrong, i.e. my server encoding does not match
the one actually used. But even ignoring that I can’t change it (my.cnf
is off limits on the server), if I did change it, I would break all
clients which presently ignore MySQL’s encoding settings (these just
transfer bytes back and forth): they do not tell MySQL that they are
sending/expecting UTF-8, so MySQL would convert the data received/sent
to what it thinks it is/expects (latin1).

Here “clients” includes third-party web applications.

I am not experienced with this stuff, so I would love to be told I’m
wrong, but I think there are a lot of databases out there which have a
server encoding of ‘latin1’ but actually store ‘utf8’, and none of the
clients issue any commands to tell the server what encoding they
use/expect. This presently works fine because, as long as server and
client have the same setting, MySQL does no conversion, and MySQL is
8-bit clean (so it is safe to store UTF-8 as latin1 etc.).
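The "no conversion as long as both sides agree" behavior can be modeled with a toy transfer function. This is a hypothetical sketch, not MySQL's actual code: it passes bytes through untouched when the two character sets match and otherwise decodes with one and re-encodes with the other. Python's "latin1" codec name is used to mirror MySQL's naming.

```python
def transfer(data: bytes, from_charset: str, to_charset: str) -> bytes:
    """Hypothetical model of MySQL's character set conversion on the wire.

    If both sides agree, bytes pass through untouched (MySQL is 8-bit clean);
    otherwise the server decodes with one charset and re-encodes with the other.
    """
    if from_charset == to_charset:
        return data
    return data.decode(from_charset).encode(to_charset)


original = "naïve ☃".encode("utf-8")

# Client and server both left at latin1: the UTF-8 bytes are stored...
stored = transfer(original, "latin1", "latin1")
# ...and read back without any conversion.
fetched = transfer(stored, "latin1", "latin1")

print(fetched == original)      # True
print(fetched.decode("utf-8"))  # naïve ☃

# Once the two settings differ, the same bytes get rewritten.
print(transfer(original, "latin1", "utf-8") == original)  # False
```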

Your fix (for the ideal setup) breaks this. If the majority of
servers out there have the proper encoding set, it seems we should take
the fix; otherwise I think a more practical solution would be to
ensure that we set the same encoding as the server uses, and let the
user configure a per-connection encoding, which we then convert to and
from ourselves (I believe this is what CocoaMySQL does).
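The per-connection approach could look roughly like the following Python sketch. It is not CocoaMySQL's code or any existing bundle API: the `ConnectionSettings` class and helper functions are hypothetical. The idea is to declare the server's own charset with `SET NAMES` so MySQL performs no conversion, then encode queries and decode results with the encoding the user picked for that connection.

```python
from dataclasses import dataclass


@dataclass
class ConnectionSettings:
    """Hypothetical per-connection settings (not a real TextMate/CocoaMySQL API)."""
    server_charset: str = "latin1"  # assumed to match the server's character_set_server
    text_encoding: str = "utf-8"    # what the user says the stored bytes really are


def init_statement(settings: ConnectionSettings) -> str:
    # Declare the server's own charset so MySQL performs no conversion.
    return f"SET NAMES {settings.server_charset}"


def encode_query(settings: ConnectionSettings, sql: str) -> bytes:
    # MySQL passes these bytes through verbatim, so we encode them ourselves.
    return sql.encode(settings.text_encoding)


def decode_value(settings: ConnectionSettings, raw: bytes) -> str:
    # Column values also arrive verbatim; decode with the user's encoding.
    return raw.decode(settings.text_encoding)


settings = ConnectionSettings()
print(init_statement(settings))  # SET NAMES latin1
query = encode_query(settings, "SELECT * FROM posts WHERE title = 'café'")
print(decode_value(settings, b"caf\xc3\xa9"))  # café
```

Because the server never converts anything, existing clients that just shuffle bytes keep working, and only this client needs to know the real encoding.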

Comments?

