On 17. Jul 2007, at 20:17, Chris Adams wrote:
[...] I believe there's a simple fix for this problem. MySQL has a number of places where you set the character set - the server, database, table and individual columns can all be set individually - but there's another less-publicized issue: the MySQL client library defaults to latin1 for communication, which mangles your otherwise clean UTF-8 path. If you run "SHOW VARIABLES LIKE '%character%'" it'll show something like this: [...]
For me, ‘character_set_server’ shows as latin1 on both my remote and local databases. It seems to be the default, and AFAIK UTF-8 was not supported prior to MySQL 4.1.1.
So if I issue a ‘SET NAMES utf8;’, it will actually break things. I store UTF-8 text in my tables, and with the client and connection encoding set to something other than latin1 (here utf8), MySQL will re-encode my table values from latin1 to UTF-8. It should just transfer them “verbatim”, since my client interprets the bytes as UTF-8 regardless of the MySQL encoding variables.
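To make the re-encoding concrete, here is a small sketch in Python. It only models the conversion and does not talk to MySQL: `server_convert` is a hypothetical helper, and Python's "latin-1" codec stands in for MySQL's latin1, which is an approximation.

```python
# Sketch of the read path: UTF-8 bytes sit in a column that the server
# believes is latin1. Nothing here talks to a real MySQL server.

def server_convert(raw: bytes, column_charset: str, connection_charset: str) -> bytes:
    """Hypothetical model of the server converting a value for the connection."""
    if column_charset == connection_charset:
        # Same charset on both sides: the bytes pass through untouched.
        return raw
    # Different charsets: interpret the bytes as column_charset, then re-encode.
    return raw.decode(column_charset).encode(connection_charset)

original = "é"
stored_bytes = original.encode("utf-8")  # what the client wrote: b'\xc3\xa9'

# Default situation: connection is latin1 like the column, so no conversion.
verbatim = server_convert(stored_bytes, "latin-1", "latin-1")

# After SET NAMES utf8: each stored byte is treated as one latin1 character.
reencoded = server_convert(stored_bytes, "latin-1", "utf-8")

print(verbatim.decode("utf-8"))   # é
print(reencoded.decode("utf-8"))  # Ã©
```

The second result is the familiar double-encoded form: the two bytes of the original character come back as two separate characters.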
So my database setup is wrong, i.e. the declared server encoding does not match the one actually used. But even ignoring that I can’t change it (as my.cnf is off limits on the server), if I actually did change it, I would break all clients which presently pay no attention to MySQL’s encoding settings (these just transfer bytes back and forth). They would not tell MySQL that they are sending/expecting UTF-8, so MySQL would convert the data received/sent from/to what it thinks the client uses (latin1).
Here, DB clients include third-party web applications.
I am not experienced with this stuff, so I would love to be told I am wrong, but I think there are a lot of databases out there which have a server encoding of ‘latin1’ but actually store UTF-8, and where none of the clients issue any commands to tell the server what encoding they use/expect. This presently works fine because as long as server and client have the same setting, no conversion is done by MySQL, and MySQL is 8-bit clean (so it is safe to store UTF-8 as latin1 etc.).
Your fix (for the ideal setup) breaks this. If the majority of servers out there have the proper encoding set, it seems we should take the fix. Otherwise I think a more practical solution would be to ensure that we set the same encoding as the server uses, and let the user configure a per-connection encoding, which we then convert to ourselves (I believe this is what CocoaMySQL does).
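A rough sketch of that alternative, in Python and without a real connection. All three function names are hypothetical, and the server charset is assumed to have been read beforehand from the ‘character_set_server’ variable:

```python
# Sketch: keep the connection charset equal to the server's so that MySQL
# converts nothing, and do the text <-> bytes conversion on the client side
# using an encoding the user picks per connection.

def set_names_statement(server_charset: str) -> str:
    """Statement that makes the connection charset match the server's."""
    return "SET NAMES " + server_charset

def to_wire(text: str, user_encoding: str = "utf-8") -> bytes:
    """Encode text ourselves before handing the bytes to the server."""
    return text.encode(user_encoding)

def from_wire(raw: bytes, user_encoding: str = "utf-8") -> str:
    """Decode bytes received from the server with the user's encoding."""
    return raw.decode(user_encoding)

server_charset = "latin1"  # assumed value, as reported by the server
statement = set_names_statement(server_charset)

sent = to_wire("é")          # bytes the server stores unchanged
received = from_wire(sent)   # same bytes come back and decode cleanly
```

With this arrangement the user-configured encoding (UTF-8 by default in the sketch) is only ever applied on the client, so existing data and other byte-transparent clients keep working.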
Comments?