Add distillation launchers for qwen3-30b-a3b-base and gpt-oss-20b#4028

Draft
gagika wants to merge 1 commit into main from gagik-distill-perf

gagika (Collaborator) commented May 31, 2026

Description

One-command launchers for running distillation on TPU v7x. Each script sets the
right XLA flags, mounts a grain arrayrecord dataset via gcsfuse (ClimbMix by
default; configurable via XPK_DATASET_BUCKET / XPK_DATASET_SUBPATH),
configures distillation knobs, stages the HF tokenizer when needed, and submits
a workload via XPK.
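
The dataset override described above can be sketched with the same `${VAR:-default}` pattern the launchers use for `XPK_DATASET_SUBPATH`. This is a minimal illustration, not the launcher code: the helper `dataset_location` and the default bucket name `example-bucket` are hypothetical, and the real scripts mount the bucket through gcsfuse rather than printing a `gs://` path.

```shell
#!/usr/bin/env bash
# Sketch only: dataset_location and "example-bucket" are hypothetical names
# used for illustration; the launchers may assemble the path differently.
dataset_location() {
  local bucket="${XPK_DATASET_BUCKET:-example-bucket}"
  local subpath="${XPK_DATASET_SUBPATH:-array-record/climbmix/*.arrayrecord}"
  # The variables are quoted so the shell does not expand the glob here.
  printf 'gs://%s/%s\n' "$bucket" "$subpath"
}

# With nothing set, the defaults apply.
dataset_location

# A per-invocation override, in the same style as the Usage commands.
XPK_DATASET_BUCKET=my-bucket \
XPK_DATASET_SUBPATH='array-record/my-data/*.arrayrecord' \
  dataset_location
```

Setting the variables on the command line, as in the second call, keeps the override scoped to that one invocation.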

Usage

# qwen3-30b-a3b-base distillation (~20% MFU)
bash scripts/distillation/distill_qwen3_30b_base.sh submit

# gpt-oss-20b distillation (~17% MFU)
bash scripts/distillation/distill_gpt_oss_20b.sh submit

# qwen3-30b at pdbs=8 with activation offload (~22% MFU)
XPK_DISTILL_CONFIG=src/maxtext/configs/post_train/distillation_qwen3_30b_base_pdbs8.yml \
XPK_YAML_GCS=gs://agagik-us/distill-configs/distillation_qwen3_30b_base_pdbs8.yml \
  bash scripts/distillation/distill_qwen3_30b_base.sh submit

Each launcher takes a mode argument (default submit):

  • submit — stage the YAML to GCS and create the xpk workload
  • monitor — stream logs for the last submitted workload
  • resume_until_done — auto-resubmit on failure until the run completes
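
The three modes above could be dispatched roughly as follows. This is a sketch under assumptions, not the code in this PR: `submit_workload`, `stream_logs`, `run_mode`, and `MAX_ATTEMPTS` are hypothetical stand-ins, and the retry loop is bounded here for safety, whereas the description says the real mode resubmits until the run completes.

```shell
#!/usr/bin/env bash
# Sketch only: every function name and MAX_ATTEMPTS below is hypothetical.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"

submit_workload() { echo "staging YAML and creating workload"; }
stream_logs()     { echo "streaming logs for last workload"; }

resume_until_done() {
  local attempt=1
  # Resubmit until the submit step succeeds or the attempts run out.
  until submit_workload; do
    if (( attempt >= MAX_ATTEMPTS )); then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
  done
}

run_mode() {
  local mode="${1:-submit}"   # the mode argument defaults to submit
  case "$mode" in
    submit)            submit_workload ;;
    monitor)           stream_logs ;;
    resume_until_done) resume_until_done ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}

run_mode            # no argument: falls back to submit
run_mode monitor
```

Rejecting unknown modes with a non-zero exit status means a typo in the mode name fails loudly instead of silently submitting a workload.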

Tests

End-to-end test for both the gpt-oss and qwen3-30b models.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov Bot commented May 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@github-actions
🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions github-actions Bot left a comment

📋 Review Summary

This Pull Request introduces distillation launchers and configurations for qwen3-30b-a3b-base and gpt-oss-20b models on TPU v7x. The additions are useful for standardizing distillation runs, but there are a few issues regarding redundancy and hardcoded personal paths.

🔍 General Feedback

  • Redundant Patch File: The file distillation-wrappers.patch appears to be a redundant diff of the entire PR and should be removed.
  • Hardcoded Defaults: Several scripts and configuration files contain default GCS paths and images pointing to personal buckets (agagik-us, yujiedeng-maxtext-dev). These should ideally be replaced with generic placeholders or public resources to improve maintainability and portability for other users.
  • Environment Management: The use of /dev/shm for TMPDIR and Hugging Face caches is a good performance optimization to avoid ephemeral storage limits, but setting it globally as TMPDIR should be done with caution.

export XPK_PROJECT="${XPK_PROJECT:-cloud-tpu-multipod-dev}"
export XPK_ZONE="${XPK_ZONE:-us-central1}"
export XPK_DEVICE_TYPE="${XPK_DEVICE_TYPE:-tpu7x-4x4x4}"
export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"

🟢 These default paths point to personal GCS buckets and images (agagik-us). For a shared repository, it is better to use more neutral placeholders or public buckets if these resources are intended to be shared, or rely on environment variables without specific personal defaults.

Suggested change
- export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"
+ export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://your-bucket/distillation}"
+ export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/your-project/maxtext_base_image:latest}"
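
The other option raised in this comment, relying on environment variables with no personal default, might look like the sketch below. `require_env` is a hypothetical helper and not part of this PR; bash's built-in `${VAR:?message}` expansion would be a shorter alternative if aborting the whole script on a missing value is acceptable.

```shell
#!/usr/bin/env bash
# Sketch only: require_env is a hypothetical helper, not part of the PR.
require_env() {
  local name="$1"
  # ${!name} is bash indirect expansion: the value of the variable named by $name.
  if [[ -z "${!name:-}" ]]; then
    echo "error: ${name} must be set" >&2
    return 1
  fi
}

# Example: the caller supplies the bucket instead of inheriting a default.
XPK_BASE_OUTPUT_DIR="gs://your-bucket/distillation"
require_env XPK_BASE_OUTPUT_DIR && echo "output dir: ${XPK_BASE_OUTPUT_DIR}"
```

Failing fast on a missing variable avoids writing checkpoints to a bucket the user never chose.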

export XPK_PROJECT="${XPK_PROJECT:-cloud-tpu-multipod-dev}"
export XPK_ZONE="${XPK_ZONE:-us-central1}"
export XPK_DEVICE_TYPE="${XPK_DEVICE_TYPE:-tpu7x-4x4x4}"
export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"

🟢 Personal GCS paths used as defaults should be replaced with placeholders.

Suggested change
- export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"
+ export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://your-bucket/distillation}"
+ export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/your-project/maxtext_base_image:latest}"

"$image_flag=$XPK_BASE_IMAGE" \
--command "export PYTHONPATH=/deps/src:/app/src; \
export BASE_OUTPUT_DIRECTORY=${OUTPUT_DIR}; \
export LIBTPU_INIT_ARGS='${libtpu_init_args}'; \

🟡 Setting TMPDIR to /dev/shm globally for the workload execution can be risky if any process creates large temporary files, as it might exhaust the shared memory and lead to pod eviction or crashes. While beneficial for small files, consider if this is necessary for all processes or if it should be more scoped.

Suggested change
- export LIBTPU_INIT_ARGS='${libtpu_init_args}'; \
+ export LIBTPU_INIT_ARGS='${libtpu_init_args}'; \
+ export JAX_COMPILATION_CACHE_DIR=/dev/shm/jax_cache; \
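
One way to scope `TMPDIR` more narrowly, as this comment suggests, is to set it for a single command rather than exporting it for the whole workload. The sketch below is illustrative only: `with_shm_tmpdir` is a hypothetical wrapper, and the fallback to `/tmp` when `/dev/shm` is unavailable is an assumption, not behaviour taken from the launchers.

```shell
#!/usr/bin/env bash
# Sketch only: with_shm_tmpdir is a hypothetical wrapper; the /tmp fallback
# is an assumption made for this example.
with_shm_tmpdir() {
  local dir="/dev/shm"
  [[ -d "$dir" && -w "$dir" ]] || dir="/tmp"
  # The assignment prefix applies TMPDIR to this one command only.
  TMPDIR="$dir" "$@"
}

# The child process sees the scoped value...
with_shm_tmpdir sh -c 'echo "child sees TMPDIR=$TMPDIR"'
# ...while the calling shell keeps whatever it had before.
echo "parent TMPDIR=${TMPDIR:-<unset>}"
```

Only processes launched through the wrapper write temporary files to shared memory, which limits the risk of exhausting it.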

@@ -0,0 +1,450 @@
diff --git a/scripts/distillation/distill_gpt_oss_20b.sh b/scripts/distillation/distill_gpt_oss_20b.sh

🟡 This file seems to be a redundant patch containing the changes already present in other files of this PR. It should likely be removed before merging.

load_parameters_path: "gs://agagik-us/distillation/gpt_oss_20b/student_orbax_188_grad_lora16_kl_50ep/0/items"
teacher_overrides:
model_name: "gpt-oss-20b"
load_parameters_path: "gs://agagik-us/distillation/gpt_oss_20b_v2/teacher/0/items"

🟢 Similar to the student overrides, the teacher's load path points to a personal bucket. Consider using a neutral placeholder.

Suggested change
- load_parameters_path: "gs://agagik-us/distillation/gpt_oss_20b_v2/teacher/0/items"
+ load_parameters_path: "gs://your-bucket/distillation/gpt_oss_20b_v2/teacher/0/items"

base_num_query_heads: 32
head_dim: 128
base_num_kv_heads: 4
load_parameters_path: "gs://agagik-us/distillation/gpt_oss_20b/student_orbax_188_grad_lora16_kl_50ep/0/items"

🟢 Placeholder suggestion for student parameters path.

Suggested change
- load_parameters_path: "gs://agagik-us/distillation/gpt_oss_20b/student_orbax_188_grad_lora16_kl_50ep/0/items"
+ load_parameters_path: "gs://your-bucket/distillation/gpt_oss_20b/student_orbax_188_grad_lora16_kl_50ep/0/items"

export XPK_DATASET_SUBPATH="${XPK_DATASET_SUBPATH:-array-record/climbmix/*.arrayrecord}"

# Stage HF tokenizer files (not in the image for gpt-oss).
export XPK_TOKENIZER_GCS="${XPK_TOKENIZER_GCS:-gs://agagik-us/distill-configs/tokenizer-gpt-oss-20b/}"

🟢 Placeholder suggestion for the tokenizer path.

Suggested change
- export XPK_TOKENIZER_GCS="${XPK_TOKENIZER_GCS:-gs://agagik-us/distill-configs/tokenizer-gpt-oss-20b/}"
+ export XPK_TOKENIZER_GCS="${XPK_TOKENIZER_GCS:-gs://your-bucket/distill-configs/tokenizer-gpt-oss-20b/}"
