fix: ROUGE-1 eval returns 0 for non-English languages (ASCII-only tokenizer) by tcconnally · Pull Request #6136 · google/adk-python

tcconnally · 2026-06-15T21:21:00Z

Problem

When evaluating text in non-Latin scripts (Thai, Chinese, Japanese, Arabic, etc.), the v1 ROUGE-1 evaluator returns scores of 0.0 even when the response matches the expected output exactly.

Root cause: The rouge_score library's default tokenizer keeps only ASCII alphanumeric characters ([a-z0-9] after lowercasing). Text in a non-Latin script therefore produces zero tokens → ROUGE-1 score of 0.0 regardless of correctness.
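To make the failure mode concrete, here is a minimal standard-library sketch. The ASCII-only pattern and the helper names `ascii_tokenize` and `rouge1_f1` are assumptions chosen for illustration; they stand in for the library's tokenizer and scorer and are not its actual code.

```python
import re
from collections import Counter
from typing import Callable, List

# Assumption: an ASCII-only pattern standing in for the default tokenizer.
_ASCII_TOKEN_RE = re.compile(r"[a-z0-9]+")


def ascii_tokenize(text: str) -> List[str]:
    """Return lowercase ASCII alphanumeric runs; everything else is dropped."""
    return _ASCII_TOKEN_RE.findall(text.lower())


def rouge1_f1(
    reference: str,
    candidate: str,
    tokenize: Callable[[str], List[str]],
) -> float:
    """Unigram-overlap F1 (the ROUGE-1 F-measure) under a given tokenizer."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Clipped overlap: each unigram counts at most as often as in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0  # also covers the case where either side has no tokens
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

With this stand-in, an identical Thai reference and response both tokenize to an empty list, so the overlap is zero and the score is 0.0, while an identical English pair scores 1.0.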

Reproduction (from #3111)

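from google.adk.agents import Agent

agent = Agent(
    name="thai_greeter",  # hypothetical name, added only so the snippet is complete
    model="gemini-2.5-flash",
    instruction='Reply with only the word "สวัสดี"',
)
# Agent responds "สวัสดี" → ROUGE-1 score: 0.0 (should be 1.0)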

Fix

Added _unicode_tokenize function that:

Uses re.UNICODE flag for ASCII-majority text (preserves existing behavior)
Splits on Unicode whitespace/punctuation for non-ASCII text
Falls back to character-level tokens for scripts without word boundaries (Chinese, Japanese)
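The three behaviors listed above could look roughly like the sketch below. It is not the code from this PR: the function name `unicode_tokenize_sketch`, the "at least half the characters are ASCII" threshold, and the rule that a single unsegmented chunk triggers the character-level fallback are all assumptions made for illustration.

```python
import re
import unicodedata
from typing import List


def unicode_tokenize_sketch(text: str) -> List[str]:
    """Tokenize text in three tiers, mirroring the list in the description."""
    text = text.lower()
    if not text.strip():
        return []

    # Tier 1 (assumed threshold): ASCII-majority text keeps word-regex behavior.
    ascii_chars = sum(1 for ch in text if ch.isascii())
    if ascii_chars * 2 >= len(text):
        return re.findall(r"\w+", text, flags=re.UNICODE)

    # Tier 2: split on Unicode whitespace and punctuation (categories P*).
    chunks: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch.isspace() or unicodedata.category(ch).startswith("P"):
            if current:
                chunks.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        chunks.append("".join(current))

    # Tier 3 (assumed rule): a single unsegmented chunk suggests a script
    # without word boundaries, so fall back to one token per character.
    if len(chunks) <= 1:
        return [ch for chunk in chunks for ch in chunk]
    return chunks
```

Under these assumptions, space-separated Arabic yields word tokens, while an unspaced Chinese string yields one token per character, so an exact match always shares every token with its reference.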

Closes #3111

rohityan · 2026-06-17T18:31:37Z

Hi @tcconnally, thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix the formatting errors.

The default RougeScorer tokenizer keeps only ASCII alphanumeric characters ([a-z0-9] after lowercasing). For non-Latin scripts (Thai, Chinese, Japanese, etc.), this returns zero tokens, causing ROUGE scores of 0.0 even when the response matches the expected output exactly. Added a _unicode_tokenize function that uses the re.UNICODE flag and falls back to character-level tokenization for non-ASCII scripts. Closes google#3111

- Replace function _unicode_tokenize with _UnicodeTokenizer class implementing the tokenize() method expected by RougeScorer
- Move import re to module level
- Fix double-escaped regex patterns (\\w -> \w, remove unsupported \p{P})
- Add return type annotation for tokenize() to satisfy mypy strict mode
- Fix RougeScorer constructor indentation
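For readers unfamiliar with the tokenizer hook, the shape of a class exposing an annotated tokenize() method might be sketched as follows. This is not the PR's _UnicodeTokenizer: the class name `UnicodeTokenizerSketch` is hypothetical, and the character-scanning approach (keeping alphanumerics plus combining marks) is a substitute technique chosen so the example needs only the standard library.

```python
import unicodedata
from typing import List


class UnicodeTokenizerSketch:
    """Hypothetical tokenizer object exposing the tokenize() method."""

    def tokenize(self, text: str) -> List[str]:
        """Split lowercase text into runs of letters, digits and combining marks."""
        tokens: List[str] = []
        current: List[str] = []
        for ch in text.lower():
            # Combining marks (categories M*) stay attached to their base
            # letter, which matters for scripts such as Thai.
            if ch.isalnum() or unicodedata.category(ch).startswith("M"):
                current.append(ch)
            elif current:
                tokens.append("".join(current))
                current = []
        if current:
            tokens.append("".join(current))
        return tokens
```

An instance could then be handed to the scorer, for example `RougeScorer(["rouge1"], tokenizer=UnicodeTokenizerSketch())`, assuming the installed rouge_score version accepts a `tokenizer` argument. The explicit `-> List[str]` annotation is the kind of detail a strict mypy configuration checks for.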

tcconnally · 2026-06-17T18:40:46Z

Fixed the pre-commit formatting issue (pyink). Rebased on main.

tcconnally force-pushed the fix/non-english-eval-rouge branch from e275a87 to 6dff0a2 on June 15, 2026 21:22

rohityan self-assigned this Jun 15, 2026

wyf7107 self-assigned this Jun 16, 2026

rohityan added the eval [Component] This issue is related to evaluation label Jun 17, 2026

rohityan removed their assignment Jun 17, 2026

rohityan added the needs review [Status] The PR/issue is awaiting review from the maintainer label Jun 17, 2026

tcconnally added 3 commits June 17, 2026 18:39

chore: apply pyink formatting

98396a4

tcconnally force-pushed the fix/non-english-eval-rouge branch from 9beec74 to 98396a4 on June 17, 2026 18:40

tcconnally wants to merge 3 commits into google:main from Perseus-Computing-LLC:fix/non-english-eval-rouge
