Delete legacy DPO implementation #3997
🤖 Hi @igorts-git, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
This Pull Request successfully deletes the legacy "native" Direct Preference Optimization (DPO) implementation from the pre-training code paths. This significant cleanup simplifies the codebase by removing redundant training logic, specialized state handling for reference models, and legacy input pipelines, directing users toward the newer Tunix-based post-training DPO pipeline.
🔍 General Feedback
- Comprehensive Cleanup: The removal is thorough across training (`train.py`), state management (`train_state_nnx.py`), input pipelines (`tfds_data_processing.py`, `grain_data_processing.py`), and logging (`metric_logger.py`).
- Excellent UX: The addition of a `ValueError` in `validate_train_config` to explicitly guide users to `train_dpo.py` is a high-quality detail that improves discoverability and prevents confusion during the transition.
- Maintainable Metrics: Transitioning `metric_logger.py` to use key-based checks for DPO reward accuracy instead of the `use_dpo` flag is a more robust approach that supports both current and future post-training workflows.
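To illustrate the two patterns praised above, here is a minimal sketch, not the code from this PR. It assumes the config behaves like a mapping and that eval metrics arrive as a flat dict. The function bodies, the error message, the helper name `format_eval_line`, and the `eval/avg_loss` key are all hypothetical; only `use_dpo`, `train_dpo.py`, `validate_train_config`, and `eval/avg_dpo_reward_accuracy` are names taken from this conversation.

```python
from typing import Mapping

# Metric name as renamed in this PR's description.
DPO_ACCURACY_KEY = "eval/avg_dpo_reward_accuracy"


def validate_train_config(config: Mapping[str, object]) -> None:
    """Reject the removed legacy DPO path and point at the replacement.

    Hypothetical signature and message; the real check may differ.
    """
    if config.get("use_dpo", False):
        raise ValueError(
            "use_dpo is no longer supported in the pre-training path; "
            "use train_dpo.py (Tunix-based post-training DPO) instead."
        )


def format_eval_line(metrics: Mapping[str, float]) -> str:
    """Build a log line, adding DPO accuracy only when its key is present.

    Checking for the key (rather than a use_dpo flag) lets any workflow
    that emits the metric be logged without extra configuration.
    The 'eval/avg_loss' key is an assumption made for this example.
    """
    parts = [f"eval loss={metrics['eval/avg_loss']:.3f}"]
    if DPO_ACCURACY_KEY in metrics:
        parts.append(f"dpo reward accuracy={metrics[DPO_ACCURACY_KEY]:.3f}")
    return ", ".join(parts)
```

With this shape, a pre-training run that never produces the DPO metric logs only the loss, while a post-training run that does produce it gets the extra field automatically.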
Generally lgtm.
I think both can be dropped.
Description
This pull request deletes the legacy "native" Direct Preference Optimization (DPO) implementation from the pre-training code paths, leaving only the recently added, Tunix-based post-training DPO and ORPO pipelines in place.
Along the way, I also renamed the metric `eval/dpo_reward_accuracy` to `eval/avg_dpo_reward_accuracy` to reflect the fact that Tunix aggregates the eval metrics and provides us with the average value. The legacy metric aggregation code was removed.
As a follow-up to this PR, I plan to send out PRs with the following:
BUGS: b/485626968
Tests
CI tests
Manual runs of DPO, ORPO, and SFT on a local VM.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.