Pivotal Token Search
Updated Dec 20, 2025 - Python
A Survey of Direct Preference Optimization (DPO)
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation
Official PyTorch implementation of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning", AAAI 2025
[TPAMI 2026] A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
[ICLR 2026] Official repository of "Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs".
[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
RankPO: Rank Preference Optimization
Notebooks to create an instruction-following version of Microsoft's Phi 2 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
[ICML 2025 Workshop FM4BS] AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization
End-to-end pipeline for video safety alignment: SFT + DPO on Qwen3-VL with structured outputs, benchmark design, and evaluation of over-refusal.
The Rap Music Generator project is an LLM-based tool for generating rap lyrics. It offers multiple fine-tuning approaches to support different rap generation techniques and stylistically varied output.
A small, research-focused Python library for post-training Large Language Models with autotuning
[CC 2025] [Official code] - Engaging preference optimization alignment in large language model for continual radiology report generation: A hybrid approach
Red-teaming harness for open-weight LLMs (LLaMA, Mistral, Pythia). LoRA-SFT on 580 examples raised refusal rate from ~6% to 89% and cut harmful replies to 8%. Includes adversarial prompt dataset, SFT + DPO training scripts, and 6 published adapters.
Homework assignments for CMU 11-611 Natural Language Processing (Spring 2026) — covering language identification, n-gram LMs, text classification, machine translation evaluation, and DPO fine-tuning.
Experiments and a how-to guide for the lecture "Large language models for Scientometrics"
Notebooks to create an instruction-following version of Microsoft's Phi 1.5 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
End-to-end DPO fine-tuning pipeline for paraphrase-type generation (M.Sc. thesis, arXiv:2506.02018). DPO on 1,040 human-ranked pairs raised type accuracy +3 pp and human preference +7 pp over SFT baseline. Llama-3.1-8B + BART-large. Models on HuggingFace.
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.