Merge pbs-0.2.4 from 0.2 to main#143
Merged
…te (Percona-Lab#103) https://perconadev.atlassian.net/browse/PS-10622 Version bump to '0.2.0': * Updated 'app_version.hpp' * Updated 'README.md' * Updated Deb packaging files * Updated RPMs packaging files Enabled GitHub Actions for the '0.2' branch. Added missing 'search_by_gtid_set' command to the console application 'help' message.
…#108) https://perconadev.atlassian.net/browse/PS-10910 Fixed an issue with forming URLs in search results for 'search_by_timestamp' / 'search_by_gtid_set' operations: for S3 storage backends with custom endpoint URLs, the bucket name was omitted from the resulting URL.
https://perconadev.atlassian.net/browse/PS-11002 Having just single 'metadata.json' in the storage is no longer considered an integrity violation for the 'storage' initialization logic. In this case we simply validate this storage metadata file and return early (basically, consider this layout an empty storage).
…a-Lab#114) https://perconadev.atlassian.net/browse/PS-10281 'gtid_set_test' (BOOST_TEST_MODULE GtidSetTests) unit test extended with a new test case 'GtidSetSimulateSearchByGTIDSet' which simulates the logic of the 'search_by_gtid_set' main application operation mode. Updated the unit tests 'CMakeLists.txt' file. We now run unit tests with the '--no_color_output' command line option, which improves integration with VS Code.
…ercona-Lab#115) https://perconadev.atlassian.net/browse/PS-10911 Fixed problem with not being able to properly process ROTATE (artificial) events after reconnection in GTID-based replication mode (in the 'pull' operation mode). Original 'position' field checking in the artificial ROTATE event post-header makes sense only for the position-based replication mode. Extended 'log_storage_config_info()' - we now also print the masked storage URI (the one with hidden 'user' / 'password' components) to the log. Minor improvements in the log message formatting - now all 'trace' / 'debug' / 'info' / 'warning' / 'error' / 'fatal' log severity labels are padded to the same width.
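The label padding described above can be illustrated with a minimal sketch. The label table and the names 'max_label_width()' / 'padded_label()' are hypothetical and are not taken from the PBS sources; the sketch only shows one way to derive a common width from the six severities and pad each label to it.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical label table; the names mirror the severities listed above.
inline constexpr std::array<std::string_view, 6> severity_labels{
    "trace", "debug", "info", "warning", "error", "fatal"};

// Width of the longest label, usable at compile time.
constexpr std::size_t max_label_width() {
  std::size_t width = 0;
  for (const auto label : severity_labels) {
    width = std::max(width, label.size());
  }
  return width;
}

// Returns the label right-padded with spaces to the common width.
inline std::string padded_label(std::string_view label) {
  std::string result{label};
  if (result.size() < max_label_width()) {
    result.append(max_label_width() - result.size(), ' ');
  }
  return result;
}
```

With this table the common width is the length of 'warning', so every log line starts its message text in the same column.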
…grading server from 8.0 to 8.4 (Percona-Lab#116) https://perconadev.atlassian.net/browse/PS-10321 Fixed the 'binlog_streaming.data_directory_8_0_to_8_4_upgrade' MTR test case that started to fail 30 days after it was created. The problem turned out to be connected with enabled automatic purging of binary log files (set to 30 days) that started to happen upon MySQL Server restart during the 8.0 -> 8.4 data directory upgrade. Fixed by adding the "--log-error=$MYSQLD_LOG --binlog_expire_logs_auto_purge=OFF" command line options to the restart command. Stabilized running MTR in GitHub Actions by adding timeout parameters to the "./mtr ..." commands: * 'testcase-timeout' - 30 minutes * 'suite-timeout' - 60 minutes * 'shutdown-timeout' - 600 seconds
…ary logs (Percona-Lab#118) https://perconadev.atlassian.net/browse/PS-11054 It is now possible to start replication from MySQL servers that have a non-empty purged GTID set ('@@global.gtid_purged'). Internally, if we identify that the specified storage is empty, we try to extract the set of GTIDs that were purged on the server side via 'SELECT @@global.gtid_purged' and pass this info to the storage object. This helps to make sure that the very first binlog will have this purged GTID set stored in the 'previous_gtids' field of its metadata file (before this change, the 'previous_gtids' in the very first metadata file was always empty). Because of this change, when we switch the connection to the replication mode we also pass this purged GTID set as the initial GTID state to the 'mysql_binlog_open()' client API call and don't get the 'Cannot replicate because the source purged required binary logs' errors anymore. 'easymysql::connection' class extended with the new 'execute_select_query_string_result()' method that can be used for executing single-value (single row + single column) queries returning a string value. 'binsrv::storage' class extended with the new 'purged_gtids_' member that is used to store information about purged GTIDs identified when the storage was initialized for the first time. Raised log message severity from 'info' to 'error' for some connection exceptions. Added new MTR test case 'binlog_streaming.gtid_purged' that checks if PBS can start replicating from a server that has the very first binlog file purged. Fixed README.md - the correct JSON field name in the 'result' section of the query responses is 'previous_gtids' (not 'initial_gtids').
…s; recovery requires manual intervention (Percona-Lab#120) https://perconadev.atlassian.net/browse/PS-11033 Fixed problem with initializing a non-empty S3 storage that has more than 500 binlog files / binlog metadata files (1000 total). Implemented pagination for the 'ListObjectsV2()' AWS SDK C++ API call in the 's3_storage_backend::aws_context::list_objects()' method. The S3 API has a hard limit that caps the number of items in a 'ListObjectsV2' response at 1000. When the storage has more than 1000 objects, we now perform several 'ListObjectsV2' calls with the continuation token. Added 'binlog_streaming.thousand_binlogs' MTR test case that executes 'FLUSH BINARY LOGS' 1000 times and executes PBS two times, on an empty and populated storage. Co-authored-by: Copilot <copilot@github.com>
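The continuation-token loop can be sketched independently of the AWS SDK. Everything below is hypothetical: 'list_page', 'fetch_page_fn', 'list_all_keys()' and the in-memory 'make_fake_bucket()' stand in for a single 'ListObjectsV2' request and its response, and none of these names come from the PBS sources. The sketch assumes a page size greater than zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shape of one listing response: a page of keys plus a token
// that is present only when more pages remain.
struct list_page {
  std::vector<std::string> keys;
  std::optional<std::string> next_token;
};

// Hypothetical callback standing in for one 'ListObjectsV2' request.
using fetch_page_fn =
    std::function<list_page(const std::optional<std::string> &token)>;

// Keeps requesting pages until a response carries no continuation token.
inline std::vector<std::string> list_all_keys(const fetch_page_fn &fetch_page) {
  std::vector<std::string> all_keys;
  std::optional<std::string> token;
  do {
    list_page page = fetch_page(token);
    all_keys.insert(all_keys.end(), page.keys.begin(), page.keys.end());
    token = std::move(page.next_token);
  } while (token.has_value());
  return all_keys;
}

// In-memory stand-in for a bucket that returns at most 'page_size' keys per
// call; the token is simply the index of the next key (an assumption made
// for this sketch only).
inline fetch_page_fn make_fake_bucket(std::vector<std::string> keys,
                                      std::size_t page_size) {
  return [keys = std::move(keys),
          page_size](const std::optional<std::string> &token) {
    const std::size_t begin = token ? std::stoul(*token) : 0;
    const std::size_t end = std::min(begin + page_size, keys.size());
    list_page page;
    page.keys.assign(keys.begin() + static_cast<std::ptrdiff_t>(begin),
                     keys.begin() + static_cast<std::ptrdiff_t>(end));
    if (end < keys.size()) {
      page.next_token = std::to_string(end);
    }
    return page;
  };
}
```

Keeping the page fetcher behind a callback makes the loop testable without network access, which is the only reason for that design choice here.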
…ication mode when network timeout interrupts a transaction (Percona-Lab#123) https://perconadev.atlassian.net/browse/PS-11137 Always trim the in-memory event buffer to the last completed transaction boundary on connection termination, regardless of the replication mode (GTID-based or position-based). Previously, 'storage::discard_incomplete_transaction_events()' was invoked only when 'storage::is_in_gtid_replication_mode()' returned true, based on the assumption that in position-based mode it is safe to resume streaming from an arbitrary mid-transaction byte offset. This assumption turned out to be incorrect: on reconnect, 'reader_context' always expects the first logical event delivered after the pseudo-preamble (FDE + optional rotate) to be one of 'anonymous_gtid_log' / 'gtid_log' / 'gtid_tagged_log' (see 'reader_context::process_event_in_gtid_log_expected_state'). If a 'mysql_binlog_fetch()' timeout / network error fires after the source delivered 'anonymous_gtid_log' but before delivering the corresponding 'BEGIN' query event, the persisted stream offset would point at 'BEGIN', and the next 'mysql_binlog_open()' attempt would resume the stream mid-transaction, immediately triggering the "expected gtid_log_event-like event" assertion in 'reader_context' and terminating PBS. Calling 'discard_incomplete_transaction_events()' unconditionally rewinds the in-memory buffer (and therefore the offset persisted to the metadata file) back to the previous transaction boundary, so reconnection in position-based mode always restarts the stream from a position where the next event is a GTID-style event, matching the reader state machine expectations.
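A minimal sketch of the trimming idea, under simplifying assumptions: each buffered event is reduced to its size and a flag saying whether it completes a transaction. 'buffered_event' and 'discard_incomplete_tail()' are hypothetical names and do not reproduce the real 'storage::discard_incomplete_transaction_events()' implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffered event: only the properties needed for trimming.
struct buffered_event {
  std::size_t size_bytes;  // encoded size of the event
  bool ends_transaction;   // true for the event that completes a transaction
};

// Drops trailing events that belong to an unfinished transaction and returns
// the number of bytes still buffered, i.e. the buffer-relative offset that is
// safe to persist.
inline std::size_t discard_incomplete_tail(std::vector<buffered_event> &buffer) {
  while (!buffer.empty() && !buffer.back().ends_transaction) {
    buffer.pop_back();
  }
  std::size_t kept_bytes = 0;
  for (const auto &event : buffer) {
    kept_bytes += event.size_bytes;
  }
  return kept_bytes;
}
```

Applied unconditionally on connection termination, the persisted offset always lands right after a completed transaction, so the next event read after reconnection is a GTID-style event.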
…part 1) (Percona-Lab#124) https://perconadev.atlassian.net/browse/PS-11080 This is a prerequisite commit required to implement rewriting 'sequence_number' / 'last_committed' / 'transaction_length' in GTID events. 'gtid_log' / 'anonymous_gtid_log' / 'gtid_tagged_log' event body and post-header classes from 'binsrv::events' extended with additional functionality that allows performing the following operations: * manually construct * modify (currently only 'sequence_number' and 'last_committed' fields) * serialize * deserialize GTID_TAGGED_LOG event body deserialization (reading TLV pairs) extended with checking that all non-optional fields were deserialized. Currently only 'original_commit_timestamp', 'original_server_version' and 'commit_group_ticket' can be omitted. 'event_test.cpp' unit test extended with additional checks for encoding / parsing GTID_LOG, ANONYMOUS_GTID_LOG and GTID_TAGGED_LOG events. 'byte_span_encoding_test.cpp' unit test extended with a new test case that checks "packed int" conversion roundtrips. Introduced new "common_types.hpp" include file in the "binsrv::events" namespace that is supposed to define common types shared between events. Currently only 'seq_no_t' is defined there. Added new "timestamp_helpers.hpp" helpers include file in the "util" namespace that provides functions to convert between 'std::chrono::high_resolution_clock::time_point' (aka 'high_resolution_time_point') and an unsigned integer representing microseconds. Fixed problem with 'immediate_server_version' / 'original_server_version' being serialized / deserialized in invalid order inside GTID events. Fixed problem in the 'util::insert_packed_int_to_byte_span_checked()' insertion function with missing early returns causing invalid encoded data. Fixed problem with invalid 'BOOST_TEST_MODULE' name in the 'uuid_test.cpp'. Fixed problem with invalid 'BOOST_TEST_MODULE' name in the 'byte_span_encoding_test.cpp'. Co-authored-by: Kamil Holubicki <kamil.holubicki@percona.com>
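The time point / microseconds conversion mentioned for "timestamp_helpers.hpp" could look roughly like the sketch below. The function names 'to_microseconds()' and 'from_microseconds()' are assumptions made for illustration; only the 'high_resolution_time_point' alias is named in the commit message. The roundtrip is exact only when the clock's native duration is at least as fine as one microsecond.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using high_resolution_time_point =
    std::chrono::high_resolution_clock::time_point;

// Microseconds elapsed since the clock's epoch, as an unsigned integer.
inline std::uint64_t to_microseconds(high_resolution_time_point tp) {
  const auto since_epoch =
      std::chrono::duration_cast<std::chrono::microseconds>(
          tp.time_since_epoch());
  return static_cast<std::uint64_t>(since_epoch.count());
}

// Inverse conversion: builds a time point from a microsecond count.
inline high_resolution_time_point from_microseconds(std::uint64_t us) {
  const std::chrono::microseconds since_epoch{
      static_cast<std::chrono::microseconds::rep>(us)};
  return high_resolution_time_point{
      std::chrono::duration_cast<high_resolution_time_point::duration>(
          since_epoch)};
}
```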
…on, bypassing size/interval checkpointing (Percona-Lab#127) https://perconadev.atlassian.net/browse/PS-11136 Problem: In non-GTID (anonymous transaction) replication mode, PBS flushes its in-memory event buffer to the storage backend on every transaction boundary, ignoring the configured 'checkpoint_size_bytes' and 'checkpoint_interval_seconds' thresholds. For object-store backends this turns into one PUT per transaction. Cause: 'storage::write_event()' had a fast-path keyed on 'at_transaction_boundary && transaction_gtid.is_empty()' whose intent was "flush regardless of thresholds because this is the file-final ROTATE/STOP event". The condition wasn't tight enough: anonymous transactions also satisfy it (they never populate 'transaction_gtid_'), so every XID terminating an anonymous transaction was misidentified as a file terminator and forced a synchronous flush. Solution: Removed the fast-path. The file-final ROTATE/STOP event is still flushed - just through the already-existing 'storage::close_binlog()' call on the 'process_rotate_or_stop_event()' / artificial-rotate rename paths, which is the natural place for a file-boundary flush. GTID-mode behavior is unchanged.
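With the fast-path removed, the flush decision depends only on the transaction boundary and the two thresholds. The sketch below illustrates that decision; 'checkpoint_config' and 'should_flush()' are hypothetical names, and the real 'storage::write_event()' logic may differ (for example in how disabled thresholds are represented).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical thresholds named after the configuration parameters.
struct checkpoint_config {
  std::uint64_t checkpoint_size_bytes;
  std::chrono::seconds checkpoint_interval_seconds;
};

// Flush only at a transaction boundary, and only when one of the thresholds
// has been reached. Whether a transaction carries a GTID plays no role.
inline bool should_flush(const checkpoint_config &config,
                         bool at_transaction_boundary,
                         std::uint64_t buffered_bytes,
                         std::chrono::seconds since_last_flush) {
  if (!at_transaction_boundary) {
    return false;
  }
  const bool size_reached = buffered_bytes >= config.checkpoint_size_bytes;
  const bool interval_reached =
      since_last_flush >= config.checkpoint_interval_seconds;
  return size_reached || interval_reached;
}
```

Under this rule a small anonymous transaction that ends well below both thresholds stays in the buffer instead of producing its own PUT.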
…part 2) (Percona-Lab#125) https://perconadev.atlassian.net/browse/PS-11080 This is a prerequisite commit required to implement rewriting 'sequence_number' / 'last_committed' / 'transaction_length' in GTID events. Along with the last seen transaction GTID, 'binsrv::events::reader_context' now also keeps track of the last seen GTID event sequence_number. At the same time 'binsrv::storage' expects one extra 'transaction_sequence_number' argument passed to its 'write_event()' method (which is expected to be taken from 'binsrv::events::reader_context::get_transaction_sequence_number()'). It implements the same logic of storing the transaction's sequence_number for both "ready to be flushed" buffered event data and "incomplete transaction" buffered event data as for 'timestamps' and 'gtids'. Binlog metadata file extended with one more value 'last_sequence_number' that holds the 'sequence_number' value of the last GTID event (either GTID_LOG, ANONYMOUS_GTID_LOG or GTID_TAGGED_LOG) written to the binlog file. Co-authored-by: Kamil Holubicki <kamil.holubicki@percona.com>
…part 3) (Percona-Lab#128) https://perconadev.atlassian.net/browse/PS-11080 Implemented generic event rewriting mechanism ('binsrv::events::rewriter::rewrite()' static method) that performs the following operations. * For GTID_LOG / ANONYMOUS_GTID_LOG it changes 'sequence_number' / 'last_committed' fields in the post header and relocates the event (changes 'next_event_position' in the common header). * For GTID_TAGGED_LOG it changes 'sequence_number' / 'last_committed' in the serializer-encoded event body, which because of the variable length encoding of individual elements in the archive may change its size and may require updating 'transaction_length' in the event body and 'event_size' in the common header. In addition, it also performs event relocation. * For every other event it simply performs event relocation. All these changes are performed under a special guard ('binsrv::events::event_updatable_view::write_proxy') which guarantees that the event checksum will be recalculated / added upon finalizing all field value updates. We also make sure that all rewritten events will have a footer with a properly calculated checksum (even if it was not present in the original event). The logic for updating 'sequence_number' / 'last_committed' in GTID events is encapsulated inside the 'fix_sequence_number_and_last_committed()' function. Refactored how several event fields can be modified in a single run via 'event_updatable_view' with the guarantee that the event's checksum will be properly recalculated. Introduced 'rewriter::generic_materialize()' template function that accepts a generic field modification functor. Also introduced two simplified helper functions (which use the generic one underneath) for the most typical cases: * 'rewriter::materialize()' * 'rewriter::materialize_and_relocate()' Co-authored-by: Kamil Holubicki <kamil.holubicki@percona.com>
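The guard idea (modify any number of fields, then recalculate the checksum exactly once) can be sketched as a small RAII class. This is not the real 'event_updatable_view::write_proxy': 'checksum_write_guard', 'crc32_of()', 'stored_checksum()' and 'set_byte_and_refresh_checksum()' are hypothetical names, and the sketch assumes the event already ends with a 4-byte little-endian CRC-32 footer covering all preceding bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the variant zlib computes.
inline std::uint32_t crc32_of(const std::uint8_t *data, std::size_t size) {
  std::uint32_t crc = 0xFFFFFFFFU;
  for (std::size_t i = 0; i < size; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1U) ^ (0xEDB88320U & (0U - (crc & 1U)));
    }
  }
  return ~crc;
}

// Hypothetical guard: grants write access to the event bytes and rewrites the
// trailing 4-byte footer when it goes out of scope.
// Precondition (assumed): the event holds at least 4 bytes.
class checksum_write_guard {
public:
  explicit checksum_write_guard(std::vector<std::uint8_t> &event)
      : event_{event} {}
  checksum_write_guard(const checksum_write_guard &) = delete;
  checksum_write_guard &operator=(const checksum_write_guard &) = delete;
  ~checksum_write_guard() {
    const std::size_t payload_size = event_.size() - footer_size;
    const std::uint32_t crc = crc32_of(event_.data(), payload_size);
    for (std::size_t i = 0; i < footer_size; ++i) {
      event_[payload_size + i] = static_cast<std::uint8_t>(crc >> (8U * i));
    }
  }
  std::uint8_t &byte_at(std::size_t offset) { return event_[offset]; }

private:
  static constexpr std::size_t footer_size = 4;
  std::vector<std::uint8_t> &event_;
};

// Reads the little-endian footer back, for verification.
inline std::uint32_t stored_checksum(const std::vector<std::uint8_t> &event) {
  std::uint32_t value = 0;
  for (std::size_t i = 0; i < 4; ++i) {
    value |= static_cast<std::uint32_t>(event[event.size() - 4 + i]) << (8U * i);
  }
  return value;
}

// Example of a field update performed under the guard.
inline void set_byte_and_refresh_checksum(std::vector<std::uint8_t> &event,
                                          std::size_t offset,
                                          std::uint8_t value) {
  checksum_write_guard guard{event};
  guard.byte_at(offset) = value;
}  // the footer is recalculated here, after the update is complete
```

Tying the recalculation to the guard's destructor means a caller cannot forget it, regardless of how many fields were changed while the guard was alive.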
…part 4) (Percona-Lab#129) https://perconadev.atlassian.net/browse/PS-11080 Added MTR test cases for the 'sequence_number' / 'last_committed' field rewrite logic: * 'gtid_renumbering' - checks for renumbering when remote rotation occurs. * 'gtid_renumbering_local_rotation' - checks for renumbering when local rotation occurs. * 'gtid_renumbering_resume' - checks that 'last_sequence_number' is restored properly from the binlog metadata file upon PBS restart. * 'gtid_renumbering_resume_after_partial' - checks that 'last_sequence_number' in the binlog metadata file always corresponds to the actual data file (not the last seen value currently in the storage buffer). This helps with resuming PBS after the crash. Currently 'gtid_renumbering_resume_after_partial' MTR test case requires DEBUG_SYNC functionality to be available in the MySQL Server. Unfortunately, at the moment we use release tarballs of the MySQL Server in GitHub Actions workers and this test will always be skipped there. Co-authored-by: Kamil Holubicki <kamil.holubicki@percona.com>
…r directory is created in the current binary path rather than in /tmp (Percona-Lab#135) https://perconadev.atlassian.net/browse/PS-11205 * Default buffer directory for S3 storage backend is now a unique subdirectory under the OS temp directory (e.g. /tmp on Linux) if '<storage.fs_buffer_directory>' configuration parameter is not set. * Auto-created temp directory is removed on storage backend destruction; user-provided directories are never deleted. * Updated README to document new behavior.
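The ownership rule for the buffer directory (auto-created directories are removed, user-provided ones are left alone) can be sketched with 'std::filesystem'. The class name 'buffer_directory' and the "pbs-buffer-" prefix are hypothetical; the actual naming scheme used by PBS is not stated in this commit message.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <random>
#include <string>
#include <system_error>

// Hypothetical RAII holder for the S3 backend buffer directory.
class buffer_directory {
public:
  // An empty optional models an unset '<storage.fs_buffer_directory>'.
  explicit buffer_directory(
      const std::optional<std::filesystem::path> &configured)
      : owned_{!configured.has_value()},
        path_{owned_ ? make_unique_temp_subdirectory() : *configured} {}
  buffer_directory(const buffer_directory &) = delete;
  buffer_directory &operator=(const buffer_directory &) = delete;

  ~buffer_directory() {
    if (owned_) {
      std::error_code ec;  // ignore cleanup errors in a destructor
      std::filesystem::remove_all(path_, ec);
    }
  }

  const std::filesystem::path &path() const noexcept { return path_; }
  bool is_auto_created() const noexcept { return owned_; }

private:
  // Creates a fresh subdirectory under the OS temp directory; retries when
  // the randomly chosen name already exists.
  static std::filesystem::path make_unique_temp_subdirectory() {
    std::random_device rd;
    const auto base = std::filesystem::temp_directory_path();
    for (;;) {
      auto candidate = base / ("pbs-buffer-" + std::to_string(rd()));
      if (std::filesystem::create_directory(candidate)) {
        return candidate;
      }
    }
  }

  bool owned_;
  std::filesystem::path path_;
};
```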
…part 5) (Percona-Lab#137) https://perconadev.atlassian.net/browse/PS-11080 We now require that if we enable "rewrite" in gtid-based replication mode, all the events coming from the MySQL server should include checksums (should be generated on the server with '@@global.binlog_checksum' set to 'CRC32'). We detect a violation of this rule during the inspection of the FORMAT_DESCRIPTION event, report an error, and terminate the process with an error status code. This limitation is required for the scenario when MySQL server has one binlog file with checksums enabled and another one with checksums disabled and we need to combine them on the PBS side into a single one. To do this we need to add / remove checksums from one group of events to make sure that all the events in the rewritten binlog file have the same footer (either with or without checksum). However, this cannot be easily done, because adding / removing footers to events changes their sizes which in turn should be reflected in the 'transaction_length' field of the GTID event preceding them. In other words, we cannot recalculate this 'transaction_length' on the fly without receiving all events in its transaction. Added new 'binlog_streaming.gtid_rewrite_enforce_checksums' MTR test case with 2 combinations: * we start with checksums enabled and rotate binlog to a new file with checksums disabled * we start with checksums disabled and rotate binlog to a new file with checksums enabled In both cases, we check that Binlog Server utility identifies the problem, logs an error and terminates with an error status code.
…ransaction, bypassing size/interval checkpointing (Percona-Lab#139) https://perconadev.atlassian.net/browse/PS-11136 Fixed problem with the 'binlog_streaming.binlog_flush' MTR test case which used to pass absolute path instead of a relative one to AWS CLI 's3api head-object' invocation.
Co-authored-by: Vadim Yalovets <vadim.yalovets@percona.com>
No description provided.