Testing cleanup #401
Open
Darlokt wants to merge 5 commits into
Conversation
This commit refactors the test infrastructure to unify the handling of optional datasets across CI and local development.
Changes:
- Split existing tests into unit and integration tests based on their scope, with clear guidelines in CONTRIBUTING.md, for better organization and clarity.
- Moved the split tests to their new locations under `tests/unit/` and `tests/integration/`.
- Unified the handling of locally missing datasets.
- Tests that require external datasets now check for their presence and skip with a clear message if unavailable.
- Previously only some test suites did so (macsima, visium hd, etc.), each in its own way, with no unified behaviour.
- Added a new script `scripts/download_test_data.py` to download all optional datasets used by CI, with a clear CLI and documentation in CONTRIBUTING.md.
- This script centralizes the logic for downloading test datasets, making it easier for developers to set up their local environment with the same data used in CI.
- Added a new script `scripts/download_test_data_datasets.py` that defines the dataset keys and their metadata, which is used by both the downloader and the tests, to avoid duplication and ensure consistency.
- This script serves as a single source of truth for available test datasets, their keys, and metadata, improving maintainability.
- Allows for easier addition of new datasets in the future, as they only need to be added in one place.
- Improves the developer experience: developers can download the datasets using the same script used in CI, without needing to manually find and download them from their sources.
- Updated CI workflows to use the new downloader script, ensuring that the same datasets are used in both CI and local development.
- This change ensures consistency between CI and local testing environments, reducing the likelihood of discrepancies due to missing or different datasets.
- Bumped CI actions versions to their latest major versions
- Updated `actions/checkout` to v6 and `actions/setup-python` to v6 in all workflows.
- Updated `actions/cache` to v5 in the test workflow.
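A minimal sketch of the unified skip behaviour described above, assuming a hypothetical `tests/data` layout and `require_dataset` helper (names and paths are illustrative, not the PR's actual implementation):

```python
from pathlib import Path

import pytest

# Assumed local data directory; the actual layout is defined by the PR, not here.
TEST_DATA_DIR = Path("tests/data")


def require_dataset(key: str) -> Path:
    """Return the directory for a dataset key, skipping the test if it is missing."""
    path = TEST_DATA_DIR / key
    if not path.exists():
        pytest.skip(
            f"Dataset '{key}' not found locally; "
            f"fetch it with scripts/download_test_data.py"
        )
    return path


def test_xenium_reader_smoke():
    # Skips cleanly (instead of erroring) when the dataset is absent.
    data_dir = require_dataset("xenium")
    assert data_dir.is_dir()
```

Centralizing the check in one helper is what gives every suite the same skip message and behaviour.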
This commit refactors the test structure further into separate subdirectories for each reader under both `tests/integration/readers/` and `tests/unit/readers/`, in preparation for splitting the monolithic readers/tests into separate modules for more modularity, maintainability, testability, and clarity.
Changes:
- Moved reader-specific integration tests into subdirectories under `tests/integration/readers/` (e.g. `tests/integration/readers/xenium/`).
- Moved reader-specific unit tests into subdirectories under `tests/unit/readers/` (e.g. `tests/unit/readers/xenium/`).
- Updated test imports and references to reflect the new directory structure.
- Added per-reader pytest markers (e.g. `@pytest.mark.xenium`) to allow for more flexible test selection.
- Updated the contributing documentation to clarify the distinction between unit and integration tests, and to specify the use of dataset keys for integration tests that require external datasets.
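The per-reader markers could be registered in a `conftest.py` along these lines (the marker list here is illustrative; the PR defines the actual set):

```python
# Hypothetical conftest.py excerpt: register one marker per reader so that
# `pytest -m xenium` runs only the Xenium tests without unknown-marker warnings.
READER_MARKERS = ("xenium", "visium_hd", "macsima")


def pytest_configure(config):
    for marker in READER_MARKERS:
        config.addinivalue_line("markers", f"{marker}: tests for the {marker} reader")
```

Tests can then be selected with e.g. `pytest -m "xenium or visium_hd"` or excluded with `pytest -m "not macsima"`.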
This commit focuses on improvements to the CI workflow and test data management.
Changes:
- Updated `prepare_test_data.yaml`:
- Adjusted the cron schedule to run on the first day of every other month at midnight, refreshing test data regularly while preventing unnecessary runs.
- Increased the retention period for test data artifacts to 64 days, providing a longer window for access and reducing the likelihood of data expiration before it can be used.
- Added a condition to update the artifact if the data handler script/dataset list has changed, ensuring that the test data is always up to date with the latest changes in the codebase.
- Added comments and docstrings to the data handler scripts to improve code readability and maintainability.
- Added a small integrity check to DatasetList to ensure that all datasets are complete before collection.
- Improved testing.
This commit refactors the test data download scripts and splits the dataset list into a separate `.toml` file.
Changes:
- Moved the dataset list from `download_test_data.py` to `datasets.toml`.
- Updated the download script to read from the new `.toml` file.
- Updated the GitHub Actions workflow to reflect the changes in the download script.
- Restructured the test data downloader into a subdirectory/script package for better organization.
This commit updates the `datasets.toml` file to include comments for each dataset from the original CI.
Changes:
- Added comments for each dataset in the `datasets.toml` file to preserve the information from the original CI.
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main     #401      +/-   ##
==========================================
+ Coverage   63.38%   63.47%   +0.09%
==========================================
  Files          26       26
  Lines        3217     3217
==========================================
+ Hits         2039     2042       +3
+ Misses       1178     1175       -3
Member
Thank you very much! Great work. I would probably suggest what I suggested in #400 (comment) -> let's try to adhere to the template first and then maybe get this in?
Zethson reviewed May 16, 2026
Member
Zethson left a comment
Also, concerning downloads: We had great experiences with pooch and will likely slowly move all scverse packages to use it. I think this PR could also tackle this. Feel free to have a look at scirpy for example to learn how to use it well.
Hej everyone,
This is the second patch series; this time it's a bit bigger. It focuses on restructuring and improving the testing infrastructure. It doesn't touch the current test cases, but improves the infrastructure, provides a central place for dataset management (to also allow easier local dev setups), unifies skipping behaviour, and adds proper pytest markers for the different readers for easy differential testing.
Changes:
- Split existing tests into unit and integration tests, with clear guidelines in CONTRIBUTING.md, for better organization and clarity.
- Moved the split tests to their new locations under `tests/unit/` and `tests/integration/`.
- Moved reader-specific integration tests into subdirectories under `tests/integration/readers/` (e.g. `tests/integration/readers/xenium/`).
- Moved reader-specific unit tests into subdirectories under `tests/unit/readers/` (e.g. `tests/unit/readers/xenium/`).
- Added per-reader pytest markers (e.g. `@pytest.mark.xenium`) to allow for more flexible test selection.
- Added a central test data downloader under `scripts/test_data_downloader`, with the dataset list in a separate file (`datasets.toml`).
- Updated `actions/checkout` to v6 and `actions/setup-python` to v6 in all workflows.
- Updated `actions/cache` to v5 in the test workflow.