[2.1] Upgrader - Fix issues with json conversion #9271

Draft
sbulen wants to merge 3 commits into SimpleMachines:release-2.1 from sbulen:21_upgr_fix_unserialize

sbulen commented Jun 17, 2026

Contributor

Partial for #9259

The json conversion step had some issues.

  • Due to a typo, the safe_unserialize() string fixes were never applied.
  • json_encode() works ONLY on UTF-8 input. Nothing else. Any time a non-UTF-8 character was passed, json_encode() would fail & return false.
  • The result was not checked, & false was passed to the updates, which, for example, might tell updateSettings() to decrement the value of the setting. MariaDB would crash rather than comply; MySQL would change affected settings to -1, e.g., memberlist_cache.
  • The json conversion occurs before the UTF-8 conversion, so this would happen even in latin1 databases (umlauts, accents, etc.).
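The failure mode described in the bullets above can be reproduced in isolation. This is a minimal sketch, not code from this PR: the helper name `encode_or_null()` is hypothetical and only illustrates checking the return value before handing it to an update call.

```php
<?php
// "Müller" as latin1 bytes: 0xFC on its own is not valid UTF-8.
$latin1 = "M\xFCller";

// json_encode() rejects the whole structure and returns false.
$encoded = json_encode(array('name' => $latin1));
$is_utf8_error = (json_last_error() === JSON_ERROR_UTF8);

// Hypothetical guard (illustrative name, not an SMF function):
// never let a boolean false reach code where false has a special meaning.
function encode_or_null($value)
{
    $json = json_encode($value);

    return $json === false ? null : $json;
}

$checked = encode_or_null(array('name' => $latin1));
```

Here `$encoded` is `false`, `$is_utf8_error` is `true`, and `$checked` is `null`, so the caller can skip the update or report the row instead of writing a bogus value.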

The net result is that many strings were either not converted or blanked out. Settings might be affected, e.g., memberlist_cache. Most of the issues were with various logs, e.g., log_actions, and with qanda: places where user-entered text ended up in a serialized string.

I use a simple "cheezy 2.1 encoding" technique to preserve the non-utf8 strings across json_encode. (Trademark pending.) Got a better idea, let me know... But it works. The UTF8 conversion, which comes later, works as desired & the content is properly migrated to UTF8.
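For readers wondering what such a technique could look like, here is one possible sketch. It is an assumption for illustration only and not necessarily what this PR does: every function name and the `[[xHH]]` token format are hypothetical. The idea is to swap each byte at or above 0x80 for an ASCII token before json_encode(), then put the raw bytes back into the resulting JSON text so the later charset conversion still sees the original bytes.

```php
<?php
// Hypothetical: replace every high byte with an ASCII token like [[xfc]].
function protect_high_bytes(string $s): string
{
    return preg_replace_callback('/[\x80-\xFF]/', function ($m) {
        return '[[x' . bin2hex($m[0]) . ']]';
    }, $s);
}

// Hypothetical: turn the tokens back into the original raw bytes.
function restore_high_bytes(string $json): string
{
    return preg_replace_callback('/\[\[x([0-9a-f]{2})\]\]/', function ($m) {
        return hex2bin($m[1]);
    }, $json);
}

// Encode an array whose string values may hold non-UTF-8 bytes.
// Array keys are not protected in this sketch.
function json_encode_preserving_bytes(array $data)
{
    array_walk_recursive($data, function (&$value) {
        if (is_string($value)) {
            $value = protect_high_bytes($value);
        }
    });

    $json = json_encode($data);

    return $json === false ? false : restore_high_bytes($json);
}

$json = json_encode_preserving_bytes(array('name' => "M\xFCller"));
```

The result is JSON text that still contains the raw latin1 byte 0xFC. A known weakness of this sketch: input that already contains a literal `[[xHH]]` sequence would be altered on restore, so a real implementation would need a token that cannot occur in the data.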

This one is ready to go. If approved, I'll work on the 3.0 version.

sbulen added 3 commits June 14, 2026 14:43
Signed-off-by: Shawn Bulen <bulens@pacbell.net>
Signed-off-by: Shawn Bulen <bulens@pacbell.net>
Signed-off-by: Shawn Bulen <bulens@pacbell.net>

sbulen commented Jun 19, 2026

Contributor Author

Examples of the types of issues you might see...

In smf_settings:

[Image: nax_mysql_settings2]

In smf_log_actions:

[Image: nax_mysql_log_actions]

sbulen commented Jun 19, 2026

Contributor Author

If json_encode DID find proper unicode, its default was to escape it, which was bad for SMF:

json_unicode_goof
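A minimal sketch of the default escaping behavior, using only standard json_encode() flags. Whether this PR uses `JSON_UNESCAPED_UNICODE` is an assumption; the snippet only shows what the flag changes.

```php
<?php
// "Grüße" as valid UTF-8, written with PHP 7+ unicode escapes.
$text = "Gr\u{00FC}\u{00DF}e";

// Default: non-ASCII characters become \uXXXX escape sequences.
$escaped = json_encode($text);

// With the flag: characters are kept as literal UTF-8 bytes.
$literal = json_encode($text, JSON_UNESCAPED_UNICODE);
```

`$escaped` holds `"Gr\u00fc\u00dfe"` while `$literal` holds `"Grüße"`. Both decode to the same string, but only the second stores the characters themselves rather than escape sequences.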

sbulen marked this pull request as draft June 19, 2026 07:05

sbulen commented Jun 19, 2026

Contributor Author

Found an issue, need to do some retests...

sbulen commented Jun 23, 2026

Contributor Author

The issue I have found is specific to MariaDB, and specific to older 2.0 DBs that were not converted to utf8.

MySQL is fine. And in fact, any forum that has $db_character_set properly specified is fine, including MariaDB.

Note that 2.0 didn't set $db_character_set upon install; 2.0 would only set it upon utf8 conversion. 2.0 supported other charsets, e.g., latin5. If a forum used one, my understanding is that it was up to the admin to set $db_character_set properly and set the db defaults properly before install. (I can't find this documented anywhere, but it seems to work...) So I am pretty sure most non-latin1 forums are OK, because they'd have $db_character_set set (or they wouldn't work...). So I believe the issue is effectively confined to forums that meet all of these: (1) latin1, (2) MariaDB, (3) no $db_character_set, (4) forum DB upgraded & somewhat current.

Without $db_character_set, SMF never executes a SET NAMES. Without SET NAMES, character_set_results, character_set_connection, character_set_client all default to character_set_server, which is typically utf8mb4 these days. So when the upgrader is communicating with the DB, there is unwanted translation occurring. MySQL seems to just roll with the punches. MariaDB is very, very picky...
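The dependency described above can be sketched as a tiny decision function. This is an illustration only, not SMF's actual connection code; the function name `charset_init_query()` is hypothetical.

```php
<?php
// Hypothetical helper: return the statement to run after connecting,
// or null when no charset is configured (the problem case above, where
// the connection keeps whatever the server defaults to).
function charset_init_query($db_character_set)
{
    if (empty($db_character_set)) {
        return null;
    }

    // Accept only simple charset names so nothing else reaches the SQL.
    if (!preg_match('/^[a-z0-9_]+$/i', $db_character_set)) {
        return null;
    }

    return 'SET NAMES ' . $db_character_set;
}

$with_charset = charset_init_query('latin1');
$without_charset = charset_init_query(null);
```

`$with_charset` is `SET NAMES latin1`, while `$without_charset` is `null`, meaning no statement is issued. Running `SHOW VARIABLES LIKE 'character_set_%'` on the live connection is a quick way to see which values are actually in effect.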

Seems like it'd be an obvious fix: just set $db_character_set internally under the proper conditions. But this isn't working for some reason, and I haven't figured out why yet... Also note that MariaDB seems to be ignoring SMF's specified collation. Things aren't ending up utf8mb3_general_ci at the end of the upgrade (it's all utf8mb3_uca1400_ai_ci)... So there are other things going on here.

Note things are far better with this PR - no more crashes, serialized string fix logic works, fewer blank/garbled strings. Just not 100%.

Still testing.

sbulen commented Jun 24, 2026

Contributor Author

Hmmm... Remaining issues are specifically with log_actions, which gets multiple rounds of data fixes during the database changes steps, prior to json conversion...

sbulen commented Jun 24, 2026

Contributor Author

Got it...

If you check the box 'Migrate to a new Settings file', with no value for $db_character_set, then in step 2, Upgrade Options, a new file is written with $db_character_set = 'utf8'.

All subsequent steps load the new Settings.php and issue a SET NAMES to utf8. But the DB is not utf8 yet, so this causes issues. There are a few updates to log_actions during the DB updates. This is where the issue lies - the strings don't even make it to the json conversion.

To confirm, I just ran a conversion without the 'Migrate to a new Settings file' box checked & it ran just fine, the data is all preserved.

Should have a fix tomorrow.
