MySQL encoding (was: [TxMt] mysql bundle, password issues, utf8 encoding)
Allan Odgaard
throw-away-1 at macromates.com
Wed Jul 18 12:42:47 UTC 2007
On 17. Jul 2007, at 20:17, Chris Adams wrote:
> [...]
> I believe there's a simple fix for this problem. MySQL has a number
> of places where you set the character set - the server, database,
> table and individual columns can all be set individually - but
> there's another less-publicized issue: the MySQL client library
> defaults to latin1 for communication, which mangles your otherwise
> clean UTF-8 path. If you run "SHOW VARIABLES LIKE '%character%'"
> it'll show something like this: [...]
For me ‘character_set_server’ shows as latin1 on both my remote and
local database. It seems to be the default, and AFAIK UTF-8 was not
supported prior to MySQL 4.1.1.
So if I issue a ‘SET NAMES utf8;’ then it will actually break things
because I store utf-8 text in my tables, and with the client and
connection encoding set to non-latin1 (utf-8) MySQL will re-encode my
table values (from latin1) to utf-8 (it should just transfer them
“verbatim”, since my client interprets it as utf-8 regardless of the
MySQL encoding variables).
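
To illustrate the re-encoding described above, here is a minimal Python sketch. It does not talk to MySQL: `server_send` is a hypothetical stand-in for the server's conversion step, and Python's "latin-1" codec is assumed to be close enough to MySQL's latin1 (which is nearer to cp1252) for this character.

```python
# Simulate a column declared latin1 that actually holds UTF-8 bytes.
# server_send() is a made-up helper, not part of any MySQL API.

def server_send(stored: bytes, column_charset: str, connection_charset: str) -> bytes:
    """Return the bytes the server would put on the wire."""
    if column_charset == connection_charset:
        return stored  # same charset on both sides: bytes pass through untouched
    # Different charsets: interpret the bytes as column_charset, re-encode.
    return stored.decode(column_charset).encode(connection_charset)

stored = "é".encode("utf-8")  # the UTF-8 bytes sitting in the "latin1" column

# Connection left at latin1: the client gets its UTF-8 bytes back verbatim.
verbatim = server_send(stored, "latin-1", "latin-1")

# After SET NAMES utf8: the server "helpfully" converts latin1 -> utf8.
converted = server_send(stored, "latin-1", "utf-8")

print(verbatim.decode("utf-8"))   # é
print(converted.decode("utf-8"))  # Ã©
```

The second result is the familiar double-encoded form: each byte of the original UTF-8 sequence was treated as a separate latin1 character.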
So my database setup is wrong, i.e. my server encoding does not match
the one actually used. But even ignoring that I can’t change it (as
my.cnf is off limits on the server), if I did change it, I would
break all clients which presently pay no attention to MySQL’s
encoding settings (these just transfer bytes back and forth), because
they will not tell MySQL that they are sending/expecting utf-8, and
so MySQL will convert the data received/sent based on what it thinks
the client sends/expects (latin1).
Here db clients include third party web applications.
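
A rough Python sketch of that breakage, assuming the tables have been switched to utf8 while a legacy client still connects with the latin1 default. `convert` is a hypothetical helper standing in for the server's conversion, and replacing unmappable characters with "?" is a simplification of how the server handles them.

```python
# Simulate a utf8 column being accessed by a client whose connection
# charset is still latin1. convert() is a made-up helper, not a MySQL API.

def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between charsets the way the server would."""
    if source == target:
        return data
    return data.decode(source).encode(target, errors="replace")

# Reading: the column holds proper UTF-8, the server converts to latin1.
stored = "é".encode("utf-8")
sent = convert(stored, "utf-8", "latin-1")       # a single latin1 byte
shown = sent.decode("utf-8", errors="replace")   # client still assumes UTF-8

# Writing: the client sends UTF-8 bytes, the server assumes they are latin1.
received = "é".encode("utf-8")
written = convert(received, "latin-1", "utf-8")  # double-encoded in the table

print(repr(sent), repr(shown), repr(written))
```

Reads hand the client a byte sequence that is not valid UTF-8, and writes end up double-encoded in the table.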
I am not experienced with this stuff, so I would love to be told I am
wrong, but I think there are a lot of databases out there which have a
server encoding of ‘latin1’ but actually store ‘utf8’, and none of the
clients issue any commands to tell the server what encoding they
use/expect -- and this presently works fine because as long as server
and client have the same setting, no conversion is done by MySQL, and
MySQL is 8-bit clean (so it is safe to store utf-8 as latin1 etc.).
Your fix (for the ideal setup) breaks this -- if the majority of
servers out there have the proper encoding set, it seems we should take
the fix, otherwise I think a more practical solution would be to
ensure that we set the same encoding as the server uses, and let the
user configure a per-connection encoding, which we then convert to
ourselves (I believe this is what CocoaMySQL does).
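
A minimal Python sketch of that alternative, under stated assumptions: the server charset has already been read (e.g. from `character_set_server`), and the function names here are invented for illustration, not taken from the bundle or from CocoaMySQL.

```python
import re

def set_names_statement(server_charset: str) -> str:
    """Build a SET NAMES matching the server, so the server converts nothing."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", server_charset):
        raise ValueError(f"unexpected charset name: {server_charset!r}")
    return f"SET NAMES {server_charset}"

def from_wire(raw: bytes, user_encoding: str) -> str:
    """Decode bytes received verbatim using the user's per-connection setting."""
    return raw.decode(user_encoding, errors="replace")

def to_wire(text: str, user_encoding: str) -> bytes:
    """Encode text for sending using the user's per-connection setting."""
    return text.encode(user_encoding)

# Server says latin1; the user has configured this connection as utf-8.
statement = set_names_statement("latin1")
shown = from_wire(b"\xc3\xa9", "utf-8")
sent = to_wire("é", "utf-8")

print(statement, shown, sent)
```

With this split the server never re-encodes anything, and the only conversion happens on our side, where the user can correct it.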
Comments?