Open-RL: self-hosted API for your RL Infrastructure

Open-RL implements a Tinker-compatible API for fine-tuning language models that you can run on your own infrastructure (a single machine or a Kubernetes cluster). You can use the Tinker SDK to orchestrate RL training loops by writing imperative Python code directly from your local machine.

Why Tinker

We love Tinker. Tinker simplifies LLM post-training for developers and researchers. The Tinker API provides an abstraction that decouples the underlying infrastructure from the RL training loop. This gives AI researchers complete control over their training algorithms, data loops, and loss functions, and gives platform engineers the ability to scale the infrastructure independently.

Bonus: you can use tinker-cookbook, which has awesome tutorials, recipes, and utilities!

Quick Start

Follow the Pig Latin notebook or Text-to-SQL notebook to see supervised fine-tuning in action. Follow the RLVR example to see reinforcement learning in action.

The snippet below shows a sample reinforcement learning loop in the style of GRPO, where the 4 API primitives are combined into a generate-reward-train loop:

import asyncio
import tinker
from tinker import types

# Placeholder environment, reward, and packaging helpers (implement these for your task)
def generate_math_problem() -> str: ...
def compute_advantages(rewards: list[float]) -> list[float]: ...
def parse_and_score_response(text: str) -> float: ...
def decode(tokens: list[int]) -> str: ...  # e.g. your tokenizer's decode
def build_datums(sequences, advantages) -> list: ...  # hypothetical helper

async def rlvr_loop():
    service_client = tinker.ServiceClient(base_url="http://localhost:8000")

    # 1. Create the model (LoRA training client)
    training_client = await service_client.create_lora_training_client_async(
        base_model="Qwen/Qwen3-4B-Instruct-2507", rank=16
    )

    for epoch in range(10):
        # 2A. Save the current weights and get a sampling client for them
        sampling_client = training_client.save_weights_and_get_sampling_client(
            name=f"rlvr_epoch_{epoch}"
        )

        prompt_text = generate_math_problem()

        # 2B. Sample multiple rollouts (e.g. N=8) from the prompt
        response = sampling_client.sample(
            prompt=types.ModelInput.from_ints(tokens=[...]),
            num_samples=8,
            sampling_params=types.SamplingParams(max_tokens=100, temperature=0.9),
        ).result()

        # 3. Score the rollouts using the environment
        rewards = []
        for seq in response.sequences:
            text = decode(seq.tokens)
            rewards.append(parse_and_score_response(text))

        advantages = compute_advantages(rewards)

        # Package sequences, text, and advantages into datums
        # (build_datums is a hypothetical helper, left unimplemented above)
        datums = build_datums(response.sequences, advantages)

        # 4. Forward-backward pass (importance sampling)
        # The advantages are passed to the RL objective function
        await training_client.forward_backward_async(
            datums,
            loss_fn="importance_sampling",
            loss_fn_config={"clip_range": 0.2},
        )

        # 5. Optimizer step
        await training_client.optim_step_async(types.AdamParams(learning_rate=1e-5))

asyncio.run(rlvr_loop())
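
The reward and advantage helpers in the loop above are placeholders. As a rough sketch of what they might look like, the block below assumes GRPO-style group-relative advantages (each reward centered by the group mean and scaled by the group standard deviation) and a simple exact-match reward on the last number in the response. The `eps` parameter, the `score_response` name, and its `expected_answer` argument are assumptions made for illustration; they are not part of Open-RL or the Tinker SDK.

```python
import re
import statistics


def compute_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the rollout group
    return [(r - mean) / (std + eps) for r in rewards]


def score_response(text: str, expected_answer: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the last number in the text
    equals the expected answer, otherwise 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(expected_answer) else 0.0
```

With this normalization, rollouts that score above the group mean get positive advantages and those below get negative ones. When every rollout in a group receives the same reward, all advantages are zero, so that group adds no learning signal.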

Documentation & Guides

Detailed guides are available in the docs/ directory.

Roadmap

Full fine-tuning support
Model Checkpoints API
Use advanced Kubernetes primitives such as gang scheduling and Kueue for capacity/quota management

Contributing

This project is licensed under the Apache 2.0 License.

We welcome contributions! Please see docs/contributing.md for more information.

We follow Google's Open Source Community Guidelines.

Disclaimer

This is not an officially supported Google product.

This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
