This tutorial covers the prompt templates that mirror action-translation's production prompts, and how glossary injection works.
action-translation is a GitHub Action that translates QuantEcon lecture
files from English to Chinese. It uses Claude with carefully crafted prompts
that include MyST Markdown preservation rules, language-specific formatting
guidelines, and a glossary of 300+ economics/math terms.
The action-basic and action-new prompt templates let you benchmark these
production-style prompts against the simpler default and academic prompts,
measuring whether the extra instructions improve translation quality.
| Template | File | Key features |
|---|---|---|
default |
prompts/default.txt |
Minimal — just "translate this" |
academic |
prompts/academic.txt |
Academic register, preserve math/code |
action-basic |
prompts/action-basic.txt |
MyST-aware rules: preserve directives, no mixed fence markers |
action-new |
prompts/action-new.txt |
MyST rules + glossary injection |
The action-basic prompt adds MyST Markdown-specific instructions without
requiring a glossary:
uv run qebench run --prompt action-basic --type sentences -n 20This tells the LLM to:
- Preserve all MyST directives and roles
- Keep code blocks and math equations untranslated
- Maintain heading structure
- Never mix fence markers (
$$vs```{math})
The action-new prompt includes all the rules from action-basic plus
a {glossary} section populated from action-translation's glossary:
uv run qebench run --prompt action-new --type sentences -n 20When qebench run detects the {glossary} placeholder in a prompt template,
it automatically:
- Loads the glossary from the URL in
config.yaml(glossary_path) - Caches it locally in
.cache/glossary.jsonfor offline use - Formats terms as
English → Chineselines - Injects the formatted glossary into the prompt
The glossary is fetched from action-translation's GitHub repository:
# config.yaml
glossary_path: https://raw.githubusercontent.com/QuantEcon/action-translation/main/glossary/zh-cn.jsonThe load_glossary() function in utils/dataset.py:
- Detects whether
glossary_pathis a URL or a local file path - For URLs: fetches via
urllib.requestand caches to.cache/glossary.json - For local paths: reads directly
- Falls back to the cached version if the network is unavailable
- Returns a
list[dict], where each dict has at leastenandzh-cnkeys
The glossary is fetched once and cached. To force a refresh:
from qebench.utils.dataset import load_glossary
glossary = load_glossary(force_refresh=True)If you have action-translation cloned locally, you can point to its
glossary file directly:
# config.yaml
glossary_path: /path/to/action-translation/glossary/zh-cn.jsonThe primary use case is comparing prompt effectiveness. Run the same entries through multiple prompts, then judge the results:
# Run all 4 prompts on the same 20 sentences
uv run qebench run --prompt default --type sentences -n 20
uv run qebench run --prompt academic --type sentences -n 20
uv run qebench run --prompt action-basic --type sentences -n 20
uv run qebench run --prompt action-new --type sentences -n 20
# Judge the results
uv run qebench judge -n 20The judge reveal panel shows automated formatting scores alongside your human ratings, so you can see whether the MyST-aware prompts actually prevent formatting errors.
To create a new prompt template:
-
Create a
.txtfile in theprompts/directory -
Use these placeholders (all required unless noted):
{source_lang}— source language (e.g., "en"){target_lang}— target language (e.g., "zh-cn"){domain}— entry domain (e.g., "economics"){text}— the text to translate{glossary}— (optional) glossary terms, auto-populated from config
-
Use double braces
{{...}}to escape literal braces in your prompt (e.g.,{{math}}renders as{math}in the final prompt)
Example — a prompt with glossary:
Translate from {source_lang} to {target_lang}.
Domain: {domain}
Use these terms consistently:
{glossary}
Text:
{text}
Then use it:
uv run qebench run --prompt my-custom-prompt -n 10- Running LLM Benchmarks — full tutorial on
qebench run - Judging Translations — evaluate results with formatting scores
- Seeding from Lectures — populate sentences/paragraphs for testing (developer guide)