An easy-to-configure, extensible veRL extension that brings the Anthropic Skill Creator into agentic RL training. Full control over skill versioning, sampling, bundle testing, and skill-policy co-evolution.
Official code for the paper: ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL.
- [2026-06] Paper and codebase are now public. More is on the way; stay tuned!
(a) Inspired by Anthropic's human-in-the-loop Skill Creator, ReSkill recasts skill creation as an RL-in-the-loop process. (b) Compared with decoupled skill-update methods, ReSkill exposes a highly configurable loop for jointly evolving skills and policies.
ReSkill combines three pieces:
- RL training with per-turn skill customization: veRL handles distributed RL, while ReSkill follows the verl-agent design of decomposing multi-turn agent rollouts into individual turns and adds skill loading to each turn.
- RL-in-the-loop skill creation: ReSkill adapts the structure of Anthropic's skill creator into an RL feedback loop for analyzing rollout experience and proposing skill updates during training.
- Skill versioning and sampling: ReSkill tracks skill versions, loads active skills, samples and tests skill bundles, and supports skill-policy co-evolution over training.
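The versioning, sampling, and per-turn loading ideas above can be illustrated with a small Python sketch. This is a toy under stated assumptions, not ReSkill's implementation: `SkillVersion`, `SkillLibrary`, `propose`, `active`, `sample_bundle`, and `build_turn_prompt` are hypothetical names, and ranking versions by a single mean-reward `score` is an assumed selection rule.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SkillVersion:
    """One immutable version of a skill (hypothetical structure, not ReSkill's API)."""
    name: str
    version: int
    body: str
    score: float = 0.0  # assumed: e.g. mean rollout reward observed with this version


@dataclass
class SkillLibrary:
    """Toy versioned skill store; all names here are illustrative."""
    versions: dict = field(default_factory=dict)  # skill name -> list[SkillVersion]

    def propose(self, name: str, body: str) -> SkillVersion:
        # Append a new version instead of overwriting, so older ones stay testable.
        history = self.versions.setdefault(name, [])
        sv = SkillVersion(name, len(history) + 1, body)
        history.append(sv)
        return sv

    def active(self, budget: int) -> list:
        # Keep the best-scoring version of each skill (newer wins ties),
        # then cap the result at the active-skill budget.
        best = [max(h, key=lambda v: (v.score, v.version)) for h in self.versions.values()]
        best.sort(key=lambda v: v.score, reverse=True)
        return best[:budget]

    def sample_bundle(self, budget: int, rng: random.Random) -> list:
        # Pick up to `budget` skills, then draw one version of each uniformly,
        # so alternative versions get exercised during rollouts.
        names = rng.sample(sorted(self.versions), k=min(budget, len(self.versions)))
        return [rng.choice(self.versions[n]) for n in names]


def build_turn_prompt(observation: str, bundle: list) -> str:
    # Per-turn skill loading (assumed format): prepend skill text to this turn's observation.
    skills = "\n\n".join(f"## {s.name} (v{s.version})\n{s.body}" for s in bundle)
    return f"{skills}\n\n# Observation\n{observation}"
```

Appending versions rather than overwriting them is what keeps older variants available for later testing and sampling; a real system would also persist the library and track richer statistics than a single score.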
git clone https://github.com/amazon-science/reskill.git
cd reskill
git submodule update --init --recursive verl
pip install -e .
Install only the benchmark and backend extras you need:
pip install -e ".[<env>,vllm]"
Validated stack pins are recorded under requirements/.
The current benchmark extras are alfworld, search, and scienceworld.
Additional environment support will be added over time.
Prepare data for an environment:
python scripts/data_prep/prepare_<env>.py --output_dir data/<env>
Run training:
python scripts/train.py --config-name <env>
Concrete configs live under configs/, and cluster launch examples live under scripts/launch/.
ReSkill is designed so both sides of the co-evolution loop can be customized.
- Policy side: customize the environment, rollout format, action projection, rewards, group rollout settings, and backend profiles.
- Skill side: customize skill-generation prompts, trigger behavior, active skill budgets, version testing/sampling, and skill library persistence.
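To make the two customization surfaces concrete, here is a minimal sketch of a co-evolution loop with policy-side and skill-side hooks. Everything in it is hypothetical: `CoEvolutionHooks`, `co_evolve`, the `skill_budget` and `skill_update_every` parameters, and the greedy acceptance rule (keep a proposed skill set only if its test rollouts score no worse than the current set) are illustrative assumptions, not ReSkill's actual config keys or version-testing procedure.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class CoEvolutionHooks:
    """Hypothetical extension points; ReSkill's real config keys may differ."""
    # Policy side: run a group of episodes with the given skills, return rewards.
    rollout: Callable[[List[str]], List[float]]
    # Policy side: consume rewards for one optimization step (e.g. PPO/GRPO).
    update_policy: Callable[[List[float]], None]
    # Skill side: analyze rollout feedback and propose a new skill set.
    propose_skills: Callable[[List[str], List[float]], List[str]]
    skill_budget: int = 3          # max number of active skills (assumed knob)
    skill_update_every: int = 2    # how often the skill side runs (assumed knob)


def co_evolve(hooks: CoEvolutionHooks, skills: List[str], steps: int):
    """Alternate policy updates with periodic, tested skill updates (toy loop)."""
    history = []
    for step in range(1, steps + 1):
        rewards = hooks.rollout(skills)
        hooks.update_policy(rewards)
        history.append(mean(rewards))
        if step % hooks.skill_update_every == 0:
            # Enforce the active-skill budget on the proposal.
            candidate = hooks.propose_skills(skills, rewards)[: hooks.skill_budget]
            # Assumed greedy test: accept only if the candidate is no worse.
            if mean(hooks.rollout(candidate)) >= mean(rewards):
                skills = candidate
    return skills, history
```

Keeping the policy step and the skill step behind separate hooks mirrors the split above: swapping the environment or reward only touches `rollout`, while changing skill prompts or budgets only touches the skill-side hooks.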
This codebase is under active restructuring and testing as we work toward a stable release. Thank you for your patience and interest!
- Track newer veRL releases.
- Add SGLang rollout backend support.
- Add backend config profiles for vLLM and SGLang.
- Expand validated environment examples.
We thank the contributors to veRL, verl-agent, and the Anthropic Skill Creator for the open-source foundations and inspiration that ReSkill builds upon.
Apache 2.0
If you find this work helpful, please consider citing our paper and starring the repository.
@article{he2026reskill,
title={ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL},
author={He, Zelin and Lin, Haotian and Han, Boran and Zhu, Wei and Fang, Haoyang and Wang, Bernie and Zhu, Xuan and Li, Runze and Reimherr, Matthew},
journal={arXiv preprint arXiv:2606.01619},
year={2026}
}