[2.1] Upgrader - Fix issues with json conversion #9271
Signed-off-by: Shawn Bulen <bulens@pacbell.net>

Found an issue, need to do some retests...

The issue I have found is specific to MariaDB, and specific to older 2.0 DBs that were not converted to utf8. MySQL is fine. In fact, any forum that has $db_character_set properly specified is fine, including on MariaDB.

Note that 2.0 didn't set $db_character_set upon install; 2.0 would only set it upon utf8 conversion. 2.0 supported other charsets, e.g., latin5. If a forum used one, my understanding is that it was up to the admin to set $db_character_set properly and set the DB defaults properly before install. (I can't find this documented anywhere, but it seems to work...) So I am pretty sure most non-latin1 forums are OK, because they'd have $db_character_set set (or they wouldn't work...).

So I believe the issue is effectively confined to forums where all of these hold: (1) latin1, (2) MariaDB, (3) no $db_character_set, (4) forum DB upgraded & somewhat current.

Without $db_character_set, SMF never executes a SET NAMES. Without SET NAMES, character_set_results, character_set_connection, and character_set_client all default to character_set_server, which is typically utf8mb4 these days. So when the upgrader is communicating with the DB, there is unwanted translation occurring. MySQL seems to just roll with the punches. MariaDB is very, very picky...

It seems like there'd be an obvious fix: just set $db_character_set internally under the proper conditions. But this isn't working for some reason, and I haven't figured out why yet...

Also note that MariaDB seems to be ignoring SMF's specified collation. Things aren't ending up utf8mb3_general_ci at the end of the upgrade (it's all utf8mb3_uca1400_ai_ci)... So there are other things going on here.

Note things are far better with this PR: no more crashes, the serialized string fix logic works, and there are fewer blank/garbled strings. Just not 100%. Still testing.
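To make the "unwanted translation" concrete, here is a minimal Python sketch, not SMF code. It assumes a latin1 column and a connection whose results charset has defaulted to a UTF-8 charset; Python codecs stand in for the server-side conversion, and `server_convert` and the sample string are purely illustrative.

```python
def server_convert(stored: bytes, column_charset: str, results_charset: str) -> bytes:
    """Mimic the server re-encoding a column value into the results charset."""
    return stored.decode(column_charset).encode(results_charset)

# Assumption: Python's "latin-1" stands in for MySQL/MariaDB latin1.
# (The server's latin1 is closer to cp1252; the two agree for this sample.)
stored = "café".encode("latin-1")                  # bytes as stored in the latin1 column
wire = server_convert(stored, "latin-1", "utf-8")  # bytes sent to the client

# A client that still believes it is handling latin1 misreads the result.
misread = wire.decode("latin-1")

print(stored, wire, misread)
```

Note that the byte length changes as well (4 bytes stored, 5 on the wire). PHP's serialize() format records string lengths in bytes, so a translation like this can also invalidate serialized strings.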

Hmmm... Remaining issues are specifically with log_actions, which gets multiple rounds of data fixes during the database changes steps, prior to json conversion...

Got it... If you check the 'Migrate to a new Settings file' box and there is no value for $db_character_set, then in step 2, Upgrade Options, a new file is written with $db_character_set = 'utf8'. All subsequent steps load the new Settings.php and issue a SET NAMES to utf8. But the DB is not utf8 yet, so this causes issues. There are a few updates to log_actions during the DB updates. This is where the issue lies: the strings don't even make it to the json conversion. To confirm, I just ran a conversion without the 'Migrate to a new Settings file' box checked and it ran just fine; the data is all preserved. Should have a fix tomorrow.
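As a rough illustration of why declaring utf8 on the connection is fragile while the data in play is still latin1, the Python sketch below checks whether a byte string is valid UTF-8. It is not SMF code and `is_valid_utf8` is a hypothetical helper. How the server actually reacts to invalid input (error, truncation, replacement) depends on its settings and is not modeled here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

latin1_bytes = "café".encode("latin-1")  # what a latin1 forum holds
utf8_bytes = "café".encode("utf-8")      # what a utf8 connection expects
ascii_bytes = b"cafe"                    # pure ASCII is the same in both

for label, data in [("latin1", latin1_bytes), ("utf8", utf8_bytes), ("ascii", ascii_bytes)]:
    print(label, data, is_valid_utf8(data))
```

Pure ASCII strings pass either way, which is consistent with only some strings being affected: the ones containing non-ASCII characters.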



Partial for #9259
The json conversion step had some issues.
The net result is that many strings were either not converted or blanked out. Settings might be affected, e.g., memberlist_cache. Most of the issues were with various logs, e.g., log_actions, plus qanda: places where user-entered text ended up in a serialized string.
I use a simple "cheezy 2.1 encoding" technique to preserve the non-utf8 strings across json_encode. (Trademark pending.) If you've got a better idea, let me know... But it works. The UTF8 conversion, which comes later, works as desired & the content is properly migrated to UTF8.
This one is ready to go. If approved, I'll work on the 3.0 version.
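For readers unfamiliar with the underlying problem: PHP's json_encode rejects strings that are not valid UTF-8, so non-UTF-8 text has to be protected somehow before encoding. The Python sketch below shows one generic way to carry arbitrary bytes through a JSON round trip unchanged. It is an assumption-laden illustration and NOT the "cheezy 2.1 encoding" used in this PR, whose details are not given here; both function names are hypothetical.

```python
import json

def json_encode_bytes(value: bytes) -> bytes:
    """Wrap raw bytes in a JSON string without altering bytes >= 0x80.

    latin-1 maps every byte to exactly one code point, so decoding with it
    is lossless regardless of what charset the bytes are really in.
    """
    text = value.decode("latin-1")
    return json.dumps(text, ensure_ascii=False).encode("latin-1")

def json_decode_bytes(doc: bytes) -> bytes:
    """Reverse json_encode_bytes and return the original bytes."""
    return json.loads(doc.decode("latin-1")).encode("latin-1")

original = "café".encode("latin-1")   # non-UTF-8 input
encoded = json_encode_bytes(original)
restored = json_decode_bytes(encoded)

print(original, encoded, restored)
```

Because the non-ASCII bytes sit untouched inside the JSON text, a later charset conversion of the whole value would treat them like any other stored text.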