Merge pbs-0.2.4 from 0.2 to main#143

Merged
percona-ysorokin merged 25 commits into Percona-Lab:main from percona-ysorokin:merge_0_2_4
Jun 4, 2026

Conversation

@percona-ysorokin

Collaborator

No description provided.

percona-ysorokin and others added 25 commits March 17, 2026 13:52
…te (Percona-Lab#103)

https://perconadev.atlassian.net/browse/PS-10622

Version bump to '0.2.0':
* Updated 'app_version.hpp'
* Updated 'README.md'
* Updated Deb packaging files
* Updated RPMs packaging files

Enabled GitHub Actions for the '0.2' branch.

Added the missing 'search_by_gtid_set' command to the 'help' message of the console application.
…#108)

https://perconadev.atlassian.net/browse/PS-10910

Fixed an issue with forming URLs in search results for 'search_by_timestamp' /
'search_by_gtid_set' operations: for S3 storage backends with custom endpoint
URLs, the name of the bucket was omitted.

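The bucket-name bug is easiest to see in the URL layout itself. The sketch below assumes path-style addressing ('<endpoint>/<bucket>/<key>') for custom endpoints; 'make_object_url()' is a hypothetical helper, not the PBS code.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: builds a path-style object URL for an S3-compatible
// custom endpoint. The bucket name must be the first path segment; omitting
// it produces a URL that does not point at the stored object.
std::string make_object_url(std::string_view endpoint, std::string_view bucket,
                            std::string_view key) {
  std::string url{endpoint};
  // Avoid a double slash when the endpoint already ends with '/'.
  if (!url.empty() && url.back() == '/') {
    url.pop_back();
  }
  url += '/';
  url += bucket;
  url += '/';
  url += key;
  return url;
}
```

For example, an endpoint of "http://localhost:9000" and bucket "binlogs" yield "http://localhost:9000/binlogs/binlog.000001" for the key "binlog.000001".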
https://perconadev.atlassian.net/browse/PS-11002

Having just a single 'metadata.json' in the storage is no longer considered an
integrity violation by the 'storage' initialization logic. In this case we simply
validate this storage metadata file and return early (essentially treating this
layout as an empty storage).
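A minimal sketch of the initialization rule, assuming the storage listing is available as a list of object names; 'storage_layout' and 'classify_storage()' are hypothetical, and validating the metadata file itself is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classification of a storage listing during initialization.
enum class storage_layout { empty, metadata_only, populated };

storage_layout classify_storage(const std::vector<std::string> &object_names) {
  if (object_names.empty()) {
    return storage_layout::empty;
  }
  // A lone 'metadata.json' is not an integrity violation: validate it and
  // then proceed as if the storage were empty.
  if (object_names.size() == 1U && object_names.front() == "metadata.json") {
    return storage_layout::metadata_only;
  }
  return storage_layout::populated;
}
```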
…a-Lab#114)

https://perconadev.atlassian.net/browse/PS-10281

'gtid_set_test' (BOOST_TEST_MODULE GtidSetTests) unit test extended with a
new test case 'GtidSetSimulateSearchByGTIDSet' which simulates the logic of
the 'search_by_gtid_set' operation mode of the main application.

Updated the unit tests 'CMakeLists.txt' file. We now run unit tests with the
'--no_color_output' command line option, which improves integration with
VS Code.
…ercona-Lab#115)

https://perconadev.atlassian.net/browse/PS-10911

Fixed a problem where artificial ROTATE events could not be properly processed
after reconnection in GTID-based replication mode (in the 'pull' operation mode).
The original check of the 'position' field in the artificial ROTATE event
post-header makes sense only for position-based replication mode.

Extended 'log_storage_config_info()' - we now also print the masked storage URI
(the one with hidden 'user' / 'password' components) to the log.
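One way to produce a masked URI for logging, assuming credentials appear as a 'user:password@' prefix of the authority part; 'mask_uri_credentials()' and the '***' placeholder are hypothetical and may differ from what 'log_storage_config_info()' actually prints.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical sketch: hides the 'user:password' part of a URI such as
// "s3://user:secret@bucket/path" before the URI is written to the log.
std::string mask_uri_credentials(const std::string &uri) {
  const auto scheme_end = uri.find("://");
  if (scheme_end == std::string::npos) {
    return uri;
  }
  const auto authority_begin = scheme_end + 3U;
  // The authority ends at the first '/' after the scheme (or at the end).
  const auto authority_end =
      std::min(uri.find('/', authority_begin), uri.size());
  const std::string authority =
      uri.substr(authority_begin, authority_end - authority_begin);
  const auto at = authority.rfind('@');
  if (at == std::string::npos) {
    return uri; // no credentials present
  }
  // Keep the scheme and everything from '@' onwards; mask user and password.
  return uri.substr(0, authority_begin) + "***:***" +
         uri.substr(authority_begin + at);
}
```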

Minor improvements in the log message formatting - now all 'trace' / 'debug' / 'info' / 'warning' / 'error' / 'fatal' log severity labels are padded to the same width.
…grading server from 8.0 to 8.4 (Percona-Lab#116)

https://perconadev.atlassian.net/browse/PS-10321

Fixed the 'binlog_streaming.data_directory_8_0_to_8_4_upgrade' MTR test case,
which started failing 30 days after it was created. The problem turned out to be
related to enabled automatic purging of binary log files (set to 30 days), which
was triggered upon MySQL Server restart during the 8.0 -> 8.4 data directory
upgrade. Fixed by adding the
"--log-error=$MYSQLD_LOG --binlog_expire_logs_auto_purge=OFF" command
line options to the restart command.

Stabilized MTR runs in GitHub Actions by adding timeout parameters to the
"./mtr ..." commands:
* 'testcase-timeout' -  30 minutes
* 'suite-timeout'    -  60 minutes
* 'shutdown-timeout' - 600 seconds
…ary logs (Percona-Lab#118)

https://perconadev.atlassian.net/browse/PS-11054

It is now possible to start replication from MySQL servers that have a non-empty
purged GTID set ('@@global.gtid_purged').
Internally, if we identify that the specified storage is empty, we try to extract the
set of GTIDs that were purged on the server side via
'SELECT @@global.gtid_purged' and pass this info to the storage object. This
ensures that the very first binlog will have this purged GTID set
stored in the 'previous_gtids' field of its metadata file (before this change, the
'previous_gtids' in the very first metadata file was always empty). Because of
this change, when we switch the connection to replication mode, we also
pass this purged GTID set as the initial GTID state to the 'mysql_binlog_open()'
client API call and no longer get
'Cannot replicate because the source purged required binary logs'
errors.

'easymysql::connection' class extended with the new
'execute_select_query_string_result()' method that can be used for executing
single-value (single row + single column) queries returning a string value.
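The single-row / single-column contract of 'execute_select_query_string_result()' can be sketched independently of the MySQL client library. The in-memory 'result_set' type and 'extract_single_string()' below are hypothetical stand-ins; the real method works on an actual query result.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory result set: rows of nullable string cells.
using row_type = std::vector<std::optional<std::string>>;
using result_set = std::vector<row_type>;

// Returns the only cell of a 1x1 result set; any other shape is an error.
std::string extract_single_string(const result_set &rows) {
  if (rows.size() != 1U) {
    throw std::runtime_error("expected exactly one row");
  }
  if (rows.front().size() != 1U) {
    throw std::runtime_error("expected exactly one column");
  }
  const auto &cell = rows.front().front();
  if (!cell.has_value()) {
    throw std::runtime_error("unexpected NULL value");
  }
  return *cell;
}
```

A query such as 'SELECT @@global.gtid_purged' fits this shape: one row, one column, one string value.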

'binsrv::storage' class extended with a new 'purged_gtids_' member that is used to
store information about purged GTIDs identified when the storage was initialized
for the first time.

Raised log message severity from 'info' to 'error' for some connection exceptions.

Added new MTR test case 'binlog_streaming.gtid_purged' that checks if PBS can
start replicating from a server that has the very first binlog file purged.

Fixed README.md: the correct JSON field name in the 'result' section of the
query responses is 'previous_gtids' (not 'initial_gtids').
…s; recovery requires manual intervention (Percona-Lab#120)

https://perconadev.atlassian.net/browse/PS-11033

Fixed a problem with initializing a non-empty S3 storage that has more than 500
binlog files / binlog metadata files (more than 1000 objects in total).

Implemented pagination for the 'ListObjectsV2()' AWS SDK C++ API call in the
's3_storage_backend::aws_context::list_objects()' method. S3 API has a hard limit
that sets the max number of items in the 'ListObjectsV2' response to be no
greater than 1000. When the storage has more than 1000 objects, we now
perform several 'ListObjectsV2' calls with the continuation token.
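The pagination loop can be sketched against a fake service. The real AWS SDK for C++ exposes the same idea through 'ListObjectsV2Request::SetContinuationToken()' and 'ListObjectsV2Result::GetIsTruncated()' / 'GetNextContinuationToken()'; the 'fake_bucket' and 'list_page' types here are hypothetical and only mimic the 1000-keys-per-page limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One page of a ListObjectsV2-like response (hypothetical type mirroring the
// 'Contents' / 'IsTruncated' / 'NextContinuationToken' response fields).
struct list_page {
  std::vector<std::string> keys;
  bool is_truncated{false};
  std::string next_continuation_token;
};

// Hypothetical stand-in for the S3 service: returns at most 'max_keys' keys per
// call and uses the index of the next key as the opaque continuation token.
class fake_bucket {
public:
  explicit fake_bucket(std::vector<std::string> keys) : keys_(std::move(keys)) {}

  list_page list_objects_v2(const std::optional<std::string> &token,
                            std::size_t max_keys = 1000U) const {
    const std::size_t begin = token ? std::stoul(*token) : 0U;
    const std::size_t end = std::min(begin + max_keys, keys_.size());
    list_page page;
    page.keys.assign(keys_.begin() + static_cast<std::ptrdiff_t>(begin),
                     keys_.begin() + static_cast<std::ptrdiff_t>(end));
    page.is_truncated = end < keys_.size();
    if (page.is_truncated) {
      page.next_continuation_token = std::to_string(end);
    }
    return page;
  }

private:
  std::vector<std::string> keys_;
};

// Follows continuation tokens until a response is no longer truncated.
std::vector<std::string> list_all_objects(const fake_bucket &bucket) {
  std::vector<std::string> result;
  std::optional<std::string> token;
  for (;;) {
    const list_page page = bucket.list_objects_v2(token);
    result.insert(result.end(), page.keys.begin(), page.keys.end());
    if (!page.is_truncated) {
      break;
    }
    token = page.next_continuation_token;
  }
  return result;
}
```

With 2500 stored keys, 'list_all_objects()' performs three calls (1000 + 1000 + 500 keys).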

Added the 'binlog_streaming.thousand_binlogs' MTR test case that executes
'FLUSH BINARY LOGS' 1000 times and runs PBS twice: on an empty and on a
populated storage.

Co-authored-by: Copilot <copilot@github.com>
…ication mode when network timeout interrupts a transaction (Percona-Lab#123)

https://perconadev.atlassian.net/browse/PS-11137

Always trim the in-memory event buffer to the last completed transaction
boundary on connection termination, regardless of the replication mode
(GTID-based or position-based).

Previously, 'storage::discard_incomplete_transaction_events()' was invoked
only when 'storage::is_in_gtid_replication_mode()' returned true, based on
the assumption that in position-based mode it is safe to resume streaming
from an arbitrary mid-transaction byte offset. This assumption turned out
to be incorrect: on reconnect, 'reader_context' always expects the first
logical event delivered after the pseudo-preamble (FDE + optional rotate)
to be one of 'anonymous_gtid_log' / 'gtid_log' / 'gtid_tagged_log' (see
'reader_context::process_event_in_gtid_log_expected_state'). If a
'mysql_binlog_fetch()' timeout / network error fires after the source
delivered 'anonymous_gtid_log' but before delivering the corresponding
'BEGIN' query event, the persisted stream offset would point at 'BEGIN',
and the next 'mysql_binlog_open()' attempt would resume the stream
mid-transaction, immediately triggering the
"expected gtid_log_event-like event" assertion in 'reader_context' and
terminating PBS.

Calling 'discard_incomplete_transaction_events()' unconditionally rewinds
the in-memory buffer (and therefore the offset persisted to the metadata
file) back to the previous transaction boundary, so reconnection in
position-based mode always restarts the stream from a position where the
next event is a GTID-style event, matching the reader state machine
expectations.
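A minimal sketch of the unconditional trimming, assuming the buffer records the byte offset at which the last complete transaction ended; 'event_buffer' is a hypothetical simplification of the 'storage' internals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event buffer that remembers where the last complete transaction
// ended, so a partially received transaction can be dropped on disconnect.
class event_buffer {
public:
  void append_event(const std::vector<std::uint8_t> &event_bytes,
                    bool ends_transaction) {
    data_.insert(data_.end(), event_bytes.begin(), event_bytes.end());
    if (ends_transaction) {
      last_boundary_ = data_.size();
    }
  }

  // Drops every byte after the last transaction boundary. This is called on
  // connection termination regardless of the replication mode, so the next
  // stream always restarts at a GTID-style event.
  void discard_incomplete_transaction_events() { data_.resize(last_boundary_); }

  std::size_t size() const noexcept { return data_.size(); }

private:
  std::vector<std::uint8_t> data_;
  std::size_t last_boundary_{0U};
};
```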
…part 1) (Percona-Lab#124)

https://perconadev.atlassian.net/browse/PS-11080

This is a prerequisite commit required to implement rewriting
'sequence_number' / 'last_committed' / 'transaction_length' in GTID events.

'gtid_log' / 'anonymous_gtid_log' / 'gtid_tagged_log' event body and post-header
classes from 'binsrv::events' extended with additional functionality that makes it
possible to perform the following operations:
* manually construct
* modify (currently only 'sequence_number' and 'last_committed' fields)
* serialize
* deserialize

GTID_TAGGED_LOG event body deserialization (reading TLV pairs) extended with
checking that all non-optional fields were deserialized. Currently only
'original_commit_timestamp', 'original_server_version' and 'commit_group_ticket'
can be omitted.

'event_test.cpp' unit test extended with additional checks for encoding / parsing
GTID_LOG, ANONYMOUS_GTID_LOG and GTID_TAGGED_LOG events.

'byte_span_encoding_test.cpp' unit test extended with a new test case that checks
"packed int" conversion roundtrips.

Introduced a new "common_types.hpp" include file in the "binsrv::events"
namespace that is supposed to define common types used between events.
Currently only 'seq_no_t' is defined there.

Added a new "timestamp_helpers.hpp" helper include file in the "util" namespace
that provides functions to convert between
'std::chrono::high_resolution_clock::time_point' (aka 'high_resolution_time_point')
and an unsigned integer representing microseconds.
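A sketch of the conversions, assuming a plain microseconds-since-epoch representation; the function names are hypothetical. Note that the epoch of 'high_resolution_clock' is implementation-defined (in libstdc++ it is an alias for 'system_clock').

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using high_resolution_time_point =
    std::chrono::high_resolution_clock::time_point;

// time_point -> microseconds since the clock's epoch (truncating).
std::uint64_t to_microseconds(high_resolution_time_point tp) {
  return static_cast<std::uint64_t>(
      std::chrono::duration_cast<std::chrono::microseconds>(
          tp.time_since_epoch())
          .count());
}

// Microseconds since the clock's epoch -> time_point. The value is assumed to
// fit the clock's duration type (about 292 years for 64-bit nanoseconds).
high_resolution_time_point from_microseconds(std::uint64_t value) {
  const std::chrono::microseconds duration(
      static_cast<std::chrono::microseconds::rep>(value));
  return high_resolution_time_point(
      std::chrono::duration_cast<high_resolution_time_point::duration>(
          duration));
}
```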

Fixed a problem with 'immediate_server_version' / 'original_server_version' being
serialized / deserialized in the wrong order inside GTID events.

Fixed a problem in the 'util::insert_packed_int_to_byte_span_checked()' insertion
function with missing early returns, which caused invalid encoded data.
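For reference, a sketch of MySQL-style packed (length-encoded) integers, assuming this is the format the 'util' helpers implement; 'append_packed_int()' / 'read_packed_int()' are hypothetical names. Each encoding branch returns early; without those returns, a small value would fall through and also be appended in a wider form, which is one way missing returns corrupt the output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace {
// Appends the lowest 'bytes' bytes of 'value' in little-endian order.
void append_le(std::vector<std::uint8_t> &out, std::uint64_t value,
               std::size_t bytes) {
  for (std::size_t i = 0; i < bytes; ++i) {
    out.push_back(static_cast<std::uint8_t>(value >> (8U * i)));
  }
}
} // namespace

// < 251: 1 byte; < 2^16: 0xFC + 2 bytes; < 2^24: 0xFD + 3 bytes;
// otherwise: 0xFE + 8 bytes.
void append_packed_int(std::vector<std::uint8_t> &out, std::uint64_t value) {
  if (value < 251U) { out.push_back(static_cast<std::uint8_t>(value)); return; }
  if (value < (1ULL << 16U)) { out.push_back(0xFC); append_le(out, value, 2U); return; }
  if (value < (1ULL << 24U)) { out.push_back(0xFD); append_le(out, value, 3U); return; }
  out.push_back(0xFE);
  append_le(out, value, 8U);
}

// Decodes one packed integer starting at 'pos' and advances 'pos'.
std::uint64_t read_packed_int(const std::vector<std::uint8_t> &in,
                              std::size_t &pos) {
  auto read_le = [&](std::size_t bytes) {
    if (pos + bytes > in.size()) {
      throw std::out_of_range("truncated packed int");
    }
    std::uint64_t v = 0U;
    for (std::size_t i = 0; i < bytes; ++i) {
      v |= static_cast<std::uint64_t>(in[pos + i]) << (8U * i);
    }
    pos += bytes;
    return v;
  };
  const auto first = static_cast<std::uint8_t>(read_le(1U));
  if (first < 251U) {
    return first;
  }
  switch (first) {
    case 0xFC: return read_le(2U);
    case 0xFD: return read_le(3U);
    case 0xFE: return read_le(8U);
    default: throw std::invalid_argument("invalid packed int prefix");
  }
}
```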

Fixed invalid 'BOOST_TEST_MODULE' names in 'uuid_test.cpp' and
'byte_span_encoding_test.cpp'.

Co-authored-by: Kamil Holubicki <kamil.holubicki@percona.com>
…on, bypassing size/interval checkpointing (Percona-Lab#127)

https://perconadev.atlassian.net/browse/PS-11136

Problem:
In non-GTID (anonymous transaction) replication mode, PBS flushes its
in-memory event buffer to the storage backend on every transaction
boundary, ignoring the configured 'checkpoint_size_bytes' and
'checkpoint_interval_seconds' thresholds. For object-store backends
this turns into one PUT per transaction.

Cause:
'storage::write_event()' had a fast-path keyed on
'at_transaction_boundary && transaction_gtid.is_empty()' whose intent
was "flush regardless of thresholds because this is the file-final
ROTATE/STOP event". The condition wasn't tight enough: anonymous
transactions also satisfy it (they never populate 'transaction_gtid_'),
so every XID terminating an anonymous transaction was misidentified as
a file terminator and forced a synchronous flush.

Solution:
Removed the fast-path. The file-final ROTATE/STOP event is still
flushed - just through the already-existing 'storage::close_binlog()'
call on the 'process_rotate_or_stop_event()' / artificial-rotate
rename paths, which is the natural place for a file-boundary flush.
GTID-mode behavior is unchanged.
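With the fast path removed, the flush decision at a transaction boundary depends only on the thresholds, whether or not the transaction was anonymous. A minimal sketch, assuming a zero value disables a threshold (an assumption, not documented PBS behavior); 'checkpoint_policy' is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical checkpoint policy: at a transaction boundary, the buffer is
// flushed only when one of the configured thresholds has been reached. A zero
// value is assumed here to disable the corresponding threshold.
struct checkpoint_policy {
  std::uint64_t checkpoint_size_bytes{0U};
  std::chrono::seconds checkpoint_interval_seconds{0};

  bool should_flush(std::uint64_t buffered_bytes,
                    std::chrono::steady_clock::duration since_last_flush) const {
    const bool size_reached = checkpoint_size_bytes != 0U &&
                              buffered_bytes >= checkpoint_size_bytes;
    const bool interval_reached =
        checkpoint_interval_seconds.count() != 0 &&
        since_last_flush >= checkpoint_interval_seconds;
    return size_reached || interval_reached;
  }
};
```

The file-final ROTATE / STOP flush is deliberately absent here, since it goes through 'storage::close_binlog()'.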
…part 2) (Percona-Lab#125)

https://perconadev.atlassian.net/browse/PS-11080

This is a prerequisite commit required to implement rewriting
'sequence_number' / 'last_committed' / 'transaction_length' in GTID events.

Along with the last seen transaction GTID, 'binsrv::events::reader_context' now also
keeps track of the last seen GTID event sequence_number.

At the same time, 'binsrv::storage' expects one extra
'transaction_sequence_number' argument passed to its 'write_event()' method
(which is expected to be taken from
'binsrv::events::reader_context::get_transaction_sequence_number()'). The storage
tracks the transaction sequence_number for both the "ready to be flushed"
buffered event data and the "incomplete transaction" buffered event data, using
the same logic as for 'timestamps' and 'gtids'.
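The "ready" vs "incomplete" bookkeeping for sequence numbers can be sketched as follows. 'sequence_number_tracker' is hypothetical; it only illustrates why the value persisted as 'last_sequence_number' must come from completed transactions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tracker: a sequence_number seen inside an unfinished
// transaction is kept aside and promoted only at a transaction boundary.
class sequence_number_tracker {
public:
  explicit sequence_number_tracker(std::uint64_t restored_last_sequence_number)
      : ready_{restored_last_sequence_number},
        incomplete_{restored_last_sequence_number} {}

  // Every buffered event carries the sequence_number of its transaction.
  void on_event(std::uint64_t transaction_sequence_number,
                bool at_transaction_boundary) {
    incomplete_ = transaction_sequence_number;
    if (at_transaction_boundary) {
      ready_ = incomplete_;
    }
  }

  // Dropping a partial transaction rolls the in-flight value back.
  void discard_incomplete() { incomplete_ = ready_; }

  // Value to store as 'last_sequence_number' when ready data is flushed.
  std::uint64_t last_sequence_number_for_flush() const noexcept {
    return ready_;
  }

private:
  std::uint64_t ready_;
  std::uint64_t incomplete_;
};
```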

Binlog metadata file extended with one more value 'last_sequence_number' that
holds the 'sequence_number' value of the last GTID event (either GTID_LOG,
ANONYMOUS_GTID_LOG or GTID_TAGGED_LOG events) written to the binlog file.

Co-authored-by: Kamil Holubicki <kamil.holubicki@percona.com>
…part 3) (Percona-Lab#128)

https://perconadev.atlassian.net/browse/PS-11080

Implemented a generic event rewriting mechanism
('binsrv::events::rewriter::rewrite()' static method) that performs the following
operations:
* For GTID_LOG / ANONYMOUS_GTID_LOG it changes the 'sequence_number' /
  'last_committed' fields in the post-header and relocates the event (changes
  'next_event_position' in the common header).
* For GTID_TAGGED_LOG it changes 'sequence_number' / 'last_committed' in the
  serializer-encoded event body, which because of the variable length encoding
  of individual elements in the archive may change its size and may require
  updating 'transaction_length' in the event body and 'event_size' in the common
  header. In addition, it also performs event relocation.
* For every other event it simply performs event relocation.
All these changes are performed under a special guard
('binsrv::events::event_updatable_view::write_proxy') which guarantees that
the event checksum will be recalculated / added upon finalizing all field value
updates. We also make sure that all rewritten events have a footer with a
properly calculated checksum (even if it was not present in the original event).
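A minimal sketch of the write-guard idea, assuming a v4 common header (19 bytes, 'next_event_position' at byte offset 13) and an event that already carries a 4-byte CRC32 footer; 'checksum_write_guard' and 'relocate()' are hypothetical and much simpler than 'event_updatable_view::write_proxy'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum MySQL uses
// for binlog events when binlog_checksum=CRC32.
std::uint32_t crc32_ieee(const std::uint8_t *data, std::size_t size) {
  std::uint32_t crc = 0xFFFFFFFFU;
  for (std::size_t i = 0; i < size; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1U) ^ (0xEDB88320U & (0U - (crc & 1U)));
    }
  }
  return ~crc;
}

constexpr std::size_t common_header_size = 19U; // v4 common header
constexpr std::size_t log_pos_offset = 13U;     // 'next_event_position'
constexpr std::size_t checksum_size = 4U;

// Hypothetical RAII write guard: fields are modified through the guard and the
// CRC32 footer is recomputed exactly once, when the guard is destroyed.
class checksum_write_guard {
public:
  explicit checksum_write_guard(std::vector<std::uint8_t> &event)
      : event_(event) {
    // This sketch assumes the event already carries a 4-byte footer.
    assert(event_.size() >= common_header_size + checksum_size);
  }
  checksum_write_guard(const checksum_write_guard &) = delete;
  checksum_write_guard &operator=(const checksum_write_guard &) = delete;
  ~checksum_write_guard() {
    const std::size_t payload_size = event_.size() - checksum_size;
    store_le32(payload_size, crc32_ieee(event_.data(), payload_size));
  }

  // Writes a 32-bit little-endian value at 'offset'.
  void store_le32(std::size_t offset, std::uint32_t value) {
    for (std::size_t i = 0; i < 4U; ++i) {
      event_[offset + i] = static_cast<std::uint8_t>(value >> (8U * i));
    }
  }

private:
  std::vector<std::uint8_t> &event_;
};

// Relocation: 'next_event_position' becomes the new start offset plus the
// event size; the guard then refreshes the checksum.
void relocate(std::vector<std::uint8_t> &event, std::uint32_t new_offset) {
  checksum_write_guard guard{event};
  guard.store_le32(log_pos_offset,
                   new_offset + static_cast<std::uint32_t>(event.size()));
}
```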

The logic for updating 'sequence_number' / 'last_committed' in GTID events
is encapsulated in the 'fix_sequence_number_and_last_committed()' function.
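One plausible renumbering scheme, shown only to illustrate the idea: when transactions from a new source binlog are appended after 'last_sequence_number' transactions already written to the PBS binlog, both fields are shifted by that offset. This is an assumption covering the remote-rotation case only; it is not necessarily what 'fix_sequence_number_and_last_committed()' does, and 'logical_clock' / 'renumber()' are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Logical-clock fields from a GTID event post-header.
struct logical_clock {
  std::int64_t last_committed;
  std::int64_t sequence_number;
};

// Hypothetical offset-based renumbering: source sequence numbers start at 1
// in every binlog file, so appending them after 'last_sequence_number'
// existing transactions shifts both fields by that amount.
logical_clock renumber(logical_clock original,
                       std::int64_t last_sequence_number) {
  return {original.last_committed + last_sequence_number,
          original.sequence_number + last_sequence_number};
}
```

With this scheme, a source transaction with 'last_committed' = 0 (no dependency inside its own file) ends up depending on every transaction written before it, which is conservative for a logical-clock applier.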

Refactored the way several event fields can be modified in a single run via
'event_updatable_view' with a guarantee that the event's checksum will be properly
recalculated. Introduced the 'rewriter::generic_materialize()' template function that
accepts a generic field modification functor. Also introduced two simplified helper
functions (which use the generic one underneath) for the most typical cases:
* 'rewriter::materialize()'
* 'rewriter::materialize_and_relocate()'

Co-authored-by: Kamil Holubicki <kamil.holubicki@percona.com>
…part 4) (Percona-Lab#129)

https://perconadev.atlassian.net/browse/PS-11080

Added MTR test cases for the 'sequence_number' / 'last_committed' field
rewrite logic:
* 'gtid_renumbering' - checks for renumbering when remote rotation occurs.
* 'gtid_renumbering_local_rotation' - checks for renumbering when local rotation
  occurs.
* 'gtid_renumbering_resume' - checks that 'last_sequence_number' is restored
  properly from the binlog metadata file upon PBS restart.
* 'gtid_renumbering_resume_after_partial' - checks that 'last_sequence_number'
  in the binlog metadata file always corresponds to the actual data file (not the
  last seen value currently in the storage buffer). This helps with resuming PBS
  after a crash.

Currently, the 'gtid_renumbering_resume_after_partial' MTR test case requires
DEBUG_SYNC functionality to be available in the MySQL Server. Unfortunately,
at the moment we use release tarballs of the MySQL Server in GitHub Actions
workers, so this test will always be skipped there.

Co-authored-by: Kamil Holubicki <kamil.holubicki@percona.com>
…r directory is created in the current binary path rather than in /tmp (Percona-Lab#135)

https://perconadev.atlassian.net/browse/PS-11205

* The default buffer directory for the S3 storage backend is now a unique subdirectory
  under the OS temp directory (e.g. /tmp on Linux) if the '<storage.fs_buffer_directory>'
  configuration parameter is not set.
* Auto-created temp directory is removed on storage backend destruction;
  user-provided directories are never deleted.
* Updated README to document new behavior.
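The ownership rule for the buffer directory can be sketched with 'std::filesystem'. The 'buffer_directory' class and the 'pbs_buffer_' name prefix are hypothetical; the sketch also assumes that a user-provided directory already exists.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <random>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical sketch: an auto-created directory is owned (and removed on
// destruction); a user-provided one is never deleted.
class buffer_directory {
public:
  explicit buffer_directory(const std::optional<fs::path> &configured) {
    if (configured) {
      path_ = *configured;
      return;
    }
    std::random_device rd;
    std::mt19937_64 gen{rd()};
    // Retry until an unused name is found; create_directory() returns false
    // if the directory already existed.
    do {
      path_ = fs::temp_directory_path() /
              ("pbs_buffer_" + std::to_string(gen()));
    } while (!fs::create_directory(path_));
    owned_ = true;
  }
  buffer_directory(const buffer_directory &) = delete;
  buffer_directory &operator=(const buffer_directory &) = delete;
  ~buffer_directory() {
    if (owned_) {
      std::error_code ec; // never throw from a destructor
      fs::remove_all(path_, ec);
    }
  }

  const fs::path &path() const noexcept { return path_; }

private:
  fs::path path_;
  bool owned_{false};
};
```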
…part 5) (Percona-Lab#137)

https://perconadev.atlassian.net/browse/PS-11080

We now require that if "rewrite" is enabled in GTID-based replication mode, all the
events coming from the MySQL server include checksums (i.e. are generated on the server with '@@global.binlog_checksum' set to 'CRC32').
We detect a violation of this rule during the inspection of the
FORMAT_DESCRIPTION event, report an error, and terminate the process with an
error status code.

This limitation is required for the scenario in which the MySQL server has one binlog file
with checksums enabled and another one with checksums disabled, and we need
to combine them on the PBS side into a single one. To do this we need to add /
remove checksums from one group of events to make sure that all the events in
the rewritten binlog file have the same footer (either with or without checksum).
However, this cannot be easily done, because adding / removing footers to events
changes their sizes which in turn should be reflected in the 'transaction_length'
field of the GTID event preceding them. In other words, we cannot recalculate
this 'transaction_length' on the fly without receiving all events in its transaction.

Added new 'binlog_streaming.gtid_rewrite_enforce_checksums' MTR test case
with 2 combinations:
* we start with checksums enabled and rotate binlog to a new file with checksums disabled
* we start with checksums disabled and rotate binlog to a new file with checksums enabled
In both cases, we check that Binlog Server utility identifies the problem, logs an
error and terminates with an error status code.
…ransaction, bypassing size/interval checkpointing (Percona-Lab#139)

https://perconadev.atlassian.net/browse/PS-11136

Fixed a problem with the 'binlog_streaming.binlog_flush' MTR test case, which used
to pass an absolute path instead of a relative one to the AWS CLI 's3api head-object' invocation.

Co-authored-by: Vadim Yalovets <vadim.yalovets@percona.com>
percona-ysorokin merged commit 82a93cc into Percona-Lab:main Jun 4, 2026
15 of 16 checks passed

3 participants