Add distillation launchers for qwen3-30b-a3b-base and gpt-oss-20b#4028
gagika wants to merge 1 commit into
Codecov Report
✅ All modified and coverable lines are covered by tests.
94bcf37 to e202120
🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
This Pull Request introduces distillation launchers and configurations for qwen3-30b-a3b-base and gpt-oss-20b models on TPU v7x. The additions are useful for standardizing distillation runs, but there are a few issues regarding redundancy and hardcoded personal paths.
🔍 General Feedback
- Redundant Patch File: The file `distillation-wrappers.patch` appears to be a redundant diff of the entire PR and should be removed.
- Hardcoded Defaults: Several scripts and configuration files contain default GCS paths and images pointing to personal buckets (`agagik-us`, `yujiedeng-maxtext-dev`). These should ideally be replaced with generic placeholders or public resources to improve maintainability and portability for other users.
- Environment Management: The use of `/dev/shm` for `TMPDIR` and Hugging Face caches is a good performance optimization to avoid ephemeral storage limits, but setting it globally as `TMPDIR` should be done with caution.
export XPK_PROJECT="${XPK_PROJECT:-cloud-tpu-multipod-dev}"
export XPK_ZONE="${XPK_ZONE:-us-central1}"
export XPK_DEVICE_TYPE="${XPK_DEVICE_TYPE:-tpu7x-4x4x4}"
export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"
🟢 These default paths point to personal GCS buckets and images (agagik-us). For a shared repository, it is better to use more neutral placeholders or public buckets if these resources are intended to be shared, or rely on environment variables without specific personal defaults.
-export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"
+export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://your-bucket/distillation}"
+export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/your-project/maxtext_base_image:latest}"
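The other option mentioned in this comment, relying on environment variables without personal defaults, could be sketched as a fail-fast check. This is an illustration only, not code from the PR: `require_env` is a hypothetical helper, and the example bucket in the error message is a placeholder.

```shell
# Hypothetical helper (not part of this PR): return non-zero and print a
# message to stderr when the named variable is unset or empty.
# Written for POSIX sh; the variable name must be a trusted literal
# because it is passed through eval.
require_env() {
  eval "_value=\${$1:-}"
  if [ -z "$_value" ]; then
    echo "error: $1 must be set, e.g. $1=gs://your-bucket/distillation" >&2
    return 1
  fi
}
```

A launcher could then call `require_env XPK_BASE_OUTPUT_DIR || exit 1` before submitting, so a missing value is reported up front instead of silently writing to someone's bucket.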
export XPK_PROJECT="${XPK_PROJECT:-cloud-tpu-multipod-dev}"
export XPK_ZONE="${XPK_ZONE:-us-central1}"
export XPK_DEVICE_TYPE="${XPK_DEVICE_TYPE:-tpu7x-4x4x4}"
export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"
🟢 Personal GCS paths used as defaults should be replaced with placeholders.
-export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://agagik-us/distillation}"
+export XPK_BASE_OUTPUT_DIR="${XPK_BASE_OUTPUT_DIR:-gs://your-bucket/distillation}"
+export XPK_BASE_IMAGE="${XPK_BASE_IMAGE:-gcr.io/your-project/maxtext_base_image:latest}"
"$image_flag=$XPK_BASE_IMAGE" \
--command "export PYTHONPATH=/deps/src:/app/src; \
export BASE_OUTPUT_DIRECTORY=${OUTPUT_DIR}; \
export LIBTPU_INIT_ARGS='${libtpu_init_args}'; \
🟡 Setting TMPDIR to /dev/shm globally for the workload execution can be risky if any process creates large temporary files, as it might exhaust the shared memory and lead to pod eviction or crashes. While beneficial for small files, consider if this is necessary for all processes or if it should be more scoped.
-export LIBTPU_INIT_ARGS='${libtpu_init_args}'; \
+export LIBTPU_INIT_ARGS='${libtpu_init_args}'; \
+export JAX_COMPILATION_CACHE_DIR=/dev/shm/jax_cache; \
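As a rough illustration of the "more scoped" alternative raised in this comment, `TMPDIR` can be set for a single command through `env` rather than exported for the whole workload. This is a sketch under assumptions, not the PR's code: `run_with_shm_tmp` and the `/dev/shm/tmp` path are hypothetical.

```shell
# Hypothetical wrapper (not part of this PR): run one command with TMPDIR
# pointing at shared memory. `env` sets the variable only in the child
# process, so the calling shell and later commands are unaffected.
run_with_shm_tmp() {
  env TMPDIR=/dev/shm/tmp "$@"
}
```

Only the steps that benefit from a RAM-backed temp directory would be wrapped this way; any process that writes large temporary files would keep using the default location.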
@@ -0,0 +1,450 @@
diff --git a/scripts/distillation/distill_gpt_oss_20b.sh b/scripts/distillation/distill_gpt_oss_20b.sh
🟡 This file seems to be a redundant patch containing the changes already present in other files of this PR. It should likely be removed before merging.
load_parameters_path: "gs://agagik-us/distillation/gpt_oss_20b/student_orbax_188_grad_lora16_kl_50ep/0/items"
teacher_overrides:
model_name: "gpt-oss-20b"
load_parameters_path: "gs://agagik-us/distillation/gpt_oss_20b_v2/teacher/0/items"
🟢 Similar to the student overrides, the teacher's load path points to a personal bucket. Consider using a neutral placeholder.
-load_parameters_path: "gs://agagik-us/distillation/gpt_oss_20b_v2/teacher/0/items"
+load_parameters_path: "gs://your-bucket/distillation/gpt_oss_20b_v2/teacher/0/items"
base_num_query_heads: 32
head_dim: 128
base_num_kv_heads: 4
load_parameters_path: "gs://agagik-us/distillation/gpt_oss_20b/student_orbax_188_grad_lora16_kl_50ep/0/items"
🟢 Placeholder suggestion for student parameters path.
-load_parameters_path: "gs://agagik-us/distillation/gpt_oss_20b/student_orbax_188_grad_lora16_kl_50ep/0/items"
+load_parameters_path: "gs://your-bucket/distillation/gpt_oss_20b/student_orbax_188_grad_lora16_kl_50ep/0/items"
export XPK_DATASET_SUBPATH="${XPK_DATASET_SUBPATH:-array-record/climbmix/*.arrayrecord}"

# Stage HF tokenizer files (not in the image for gpt-oss).
export XPK_TOKENIZER_GCS="${XPK_TOKENIZER_GCS:-gs://agagik-us/distill-configs/tokenizer-gpt-oss-20b/}"
🟢 Placeholder suggestion for the tokenizer path.
-export XPK_TOKENIZER_GCS="${XPK_TOKENIZER_GCS:-gs://agagik-us/distill-configs/tokenizer-gpt-oss-20b/}"
+export XPK_TOKENIZER_GCS="${XPK_TOKENIZER_GCS:-gs://your-bucket/distill-configs/tokenizer-gpt-oss-20b/}"
Description
One-command launchers for running distillation on TPU v7x. Each script sets the right XLA flags, mounts a grain arrayrecord dataset via gcsfuse (ClimbMix by default; configurable via `XPK_DATASET_BUCKET`/`XPK_DATASET_SUBPATH`), configures distillation knobs, stages the HF tokenizer when needed, and submits a workload via XPK.
Usage
Each launcher takes a mode argument (default `submit`):
- `submit`: stage the YAML to GCS and create the xpk workload
- `monitor`: stream logs for the last submitted workload
- `resume_until_done`: auto-resubmit on failure until the run completes
Tests
End-to-end test for both the gpt-oss and qwen3-30b models.
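The mode argument described under Usage might be handled along these lines. This is a minimal sketch, not the launchers' actual implementation: `validate_mode` and `retry_until_done` are hypothetical names, and the attempt cap is an assumption (the description only says the run is resubmitted until it completes).

```shell
# Hypothetical helpers (not taken from the PR's scripts).

# Print the selected mode, defaulting to "submit"; reject anything else.
validate_mode() {
  mode="${1:-submit}"
  case "$mode" in
    submit|monitor|resume_until_done) echo "$mode" ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}

# Re-run a command until it succeeds, giving up after max_attempts tries.
# The cap is an assumed safety limit, not something the PR describes.
retry_until_done() {
  max_attempts="$1"
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_until_done 5 some_submit_command` would rerun `some_submit_command` (a stand-in for whatever submits and waits on the workload) up to five times.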