This repository has been archived by the owner on May 19, 2020. It is now read-only.

Support full Unicode in database. #812

Open
Zegnat opened this issue Jan 31, 2015 · 7 comments

@Zegnat

Zegnat commented Jan 31, 2015

See How to support full Unicode in MySQL databases by Mathias Bynens:

> Turns out MySQL’s utf8 charset only partially implements proper UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported.

Currently Anchor only lets me choose a utf8-based collation, but it would be better to offer utf8mb4-based collations.
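To make the three-byte limit concrete, here is a minimal Python sketch that measures how many bytes a character needs in UTF-8 and checks whether a string would fit in a charset limited to three bytes per character. The function names are hypothetical and only illustrate the point from the quoted article; nothing here is part of Anchor.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_byte_utf8(text: str) -> bool:
    """True if every character encodes to at most three UTF-8 bytes,
    which is the limit described for MySQL's utf8 charset."""
    return all(utf8_length(ch) <= 3 for ch in text)


# ASCII letter, accented Latin letter, euro sign, emoji
samples = ["a", "é", "€", "😎"]
lengths = {ch: utf8_length(ch) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} needs {n} byte(s)")
```

Only the emoji needs four bytes, which is why it is the kind of character that gets lost while ordinary accented text survives.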

@CraigChilds94
Member

Thanks for your input. We haven't encountered this as a problem yet (as far as I'm aware).

Because this would require changing the database configuration, we'll have to look into a way to migrate utf8-collated databases to utf8mb4, or whatever collation we end up choosing in the future. I'm pretty sure it would cause problems if we didn't.

@Zegnat
Author

Zegnat commented Apr 3, 2015

True, I didn’t think about migrating existing installations. I simply edited $vars['collations'] (s/utf8_/utf8mb4_/g) and DB::factory’s charset setting before installing Anchor and it seemed to have no problems setting up the clean database.
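The substitution described above can be sketched in Python as follows. The list of collation names is an assumption for illustration (the actual contents of $vars['collations'] are not shown in this thread), and `to_utf8mb4` is a hypothetical helper, not an Anchor function.

```python
import re


def to_utf8mb4(collation: str) -> str:
    """Rewrite a utf8_* collation name to its utf8mb4_* counterpart.
    Names that do not start with 'utf8_' are returned unchanged."""
    return re.sub(r"^utf8_", "utf8mb4_", collation)


# Hypothetical sample of collation names an installer might offer.
collations = ["utf8_general_ci", "utf8_unicode_ci", "utf8_bin", "latin1_swedish_ci"]
converted = [to_utf8mb4(name) for name in collations]

for old, new in zip(collations, converted):
    print(f"{old} -> {new}")
```

Anchoring the pattern at the start of the name keeps unrelated collations untouched, and names that are already utf8mb4 do not match at all.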

@profi248

profi248 commented Aug 6, 2015

This is pretty important. Have you heard about this? 😔😚😅😊😆😐😅😈😐😓😠😉😈😋😔😠
Unicode emoji are removed from posts.

@TheBrenny
Member

Try making a span element with the class theme-emoji, and set its content to an emoji code: <span class="theme-emoji" content="\1F60E"></span>

Why the class? So you can fix spacing issues in CSS. I'm only guessing that this may work; I saw a theme implement this kind of idea with right arrows: instead of the usual &rarr;, the theme used the Unicode hex code.
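A related way to keep emoji out of the stored bytes is to replace them with HTML numeric character references before saving, so that only ASCII reaches the database for those characters. The following is a minimal Python sketch of that idea, not something Anchor does; `escape_astral` is a hypothetical helper, and it assumes the post body is rendered as HTML.

```python
import re

# Code points above U+FFFF are the ones that need four bytes in UTF-8.
ASTRAL = re.compile(r"[\U00010000-\U0010FFFF]")


def escape_astral(text: str) -> str:
    """Replace each four-byte character with an HTML hex character reference,
    leaving all other characters (including accented letters) as they are."""
    return ASTRAL.sub(lambda m: f"&#x{ord(m.group()):X};", text)


escaped = escape_astral("cool 😎")
print(escaped)
```

This is only a workaround: the stored text no longer contains the real characters, so searching and length limits behave differently than with a proper utf8mb4 column.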

@profi248

profi248 commented Aug 7, 2015

Thanks for the response, but nothing shows up 😠
I think that you should really use proper Unicode.

ALTER TABLE anchor_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci

Running this for every table will convert it transparently on upgrade.
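"Running this for every table" could be sketched as a small Python generator of ALTER TABLE statements. Only anchor_posts is named in this thread; the other table names below are assumptions for illustration, and `convert_statement` is a hypothetical helper. The sketch only builds the SQL strings and does not connect to a database.

```python
import re


def convert_statement(table: str,
                      charset: str = "utf8mb4",
                      collation: str = "utf8mb4_general_ci") -> str:
    """Build the conversion statement for one table.
    Table names are restricted to letters, digits and underscores so that
    nothing unexpected ends up inside the backticks."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError(f"unexpected table name: {table!r}")
    return (f"ALTER TABLE `{table}` "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};")


# anchor_posts comes from the comment above; the rest are hypothetical.
tables = ["anchor_posts", "anchor_comments", "anchor_users"]
statements = [convert_statement(t) for t in tables]

for sql in statements:
    print(sql)
```

In a real upgrade the table list would presumably come from the database itself rather than a hard-coded list, and a backup beforehand would be prudent.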

@TheBrenny
Member

Do you have a working test for this?

@Zegnat
Author

Zegnat commented Aug 8, 2015

> Running this for every table will convert it transparently on upgrade.

According to Mathias’ article you need to run a little more than that. But yes, that is the main gist of it.
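One extra step commonly associated with this migration is checking indexed string columns, because a character can now take four bytes instead of three. The arithmetic is sketched below in Python; the 767-byte figure is the InnoDB index key prefix limit for older row formats and is stated here as an assumption, since this thread does not list the additional steps.

```python
# Assumed InnoDB index key prefix limit for older row formats, in bytes.
INDEX_LIMIT_BYTES = 767


def max_indexed_chars(bytes_per_char: int, limit: int = INDEX_LIMIT_BYTES) -> int:
    """Longest fully indexable string column, in characters, when every
    character may need up to `bytes_per_char` bytes."""
    return limit // bytes_per_char


utf8_chars = max_indexed_chars(3)     # three-byte utf8
utf8mb4_chars = max_indexed_chars(4)  # four-byte utf8mb4

print(f"utf8: {utf8_chars} chars, utf8mb4: {utf8mb4_chars} chars")
```

Under that assumption, an indexed VARCHAR(255) column that fit under utf8 would have to be shortened to 191 characters, or indexed by prefix, after conversion.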

> Do you have a working test for this?

I can only say that I haven’t seen any weird things happening with a database that was set to utf8mb4 from day one using my slightly modified Anchor installer. Probably also because utf8mb4 is backwards-compatible with utf8.

Big parts of Anchor are already UTF-8 aware, e.g. the slug function uses both mb_strtolower and htmlentities with their optional encoding parameter set to UTF-8. This issue is all about getting the database storage in line with that.
