Project name strips non-ASCII (CJK) characters from path, resulting in truncated/unrecognizable names #571

@XX888QM

Bug Description

When indexing a project with a path containing non-ASCII characters (e.g. Chinese/CJK), the generated project name strips all non-ASCII characters and only retains the ASCII portions of the path.

Steps to Reproduce

  1. Have a project at a path containing Chinese characters, e.g.:
    /Users/yunxin/Desktop/开发/后端/信租风控通后端
  2. Run index_repository on this path
  3. The generated project name becomes Users-yunxin-Desktop; all Chinese segments are silently dropped
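The symptom is consistent with a sanitizer that keeps only ASCII characters. The sketch below is not the actual codebase-memory-mcp implementation; `ascii_only_project_name` is a hypothetical function, written only to show how such a filter would turn the path above into the reported name.

```python
import re


def ascii_only_project_name(path: str) -> str:
    """Hypothetical sanitizer (an assumption, not the tool's real code).

    Every run of characters outside ASCII letters, digits, '.' and '_'
    is collapsed into a single '-'. Non-ASCII segments therefore vanish
    together with the path separators around them.
    """
    name = re.sub(r"[^A-Za-z0-9._]+", "-", path)
    return name.strip("-")


path = "/Users/yunxin/Desktop/开发/后端/信租风控通后端"
print(ascii_only_project_name(path))  # Users-yunxin-Desktop
```

Under this assumption, the three Chinese segments and their separators form one run of rejected characters, so they collapse into a single trailing hyphen that is then stripped.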

Expected Behavior

The project name should either:

  • Preserve Unicode characters as-is: Users-yunxin-Desktop-开发-后端-信租风控通后端
  • Or percent-encode them: Users-yunxin-Desktop-%E5%BC%80%E5%8F%91-...
  • Or allow users to provide a custom name parameter in index_repository to override the auto-generated name
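The three options above could be combined in one name-derivation routine. This is a minimal sketch under stated assumptions: `project_name`, `custom_name`, and `percent_encode` are hypothetical names chosen for illustration and are not part of the current index_repository interface.

```python
import re
from typing import Optional
from urllib.parse import quote


def project_name(
    path: str,
    custom_name: Optional[str] = None,  # hypothetical override parameter
    percent_encode: bool = False,       # hypothetical switch
) -> str:
    """Derive a project name from a path without losing non-ASCII segments."""
    # Option 3: an explicit name wins over anything auto-generated.
    if custom_name:
        return custom_name

    segments = [s for s in path.split("/") if s]
    if percent_encode:
        # Option 2: percent-encode the UTF-8 bytes of each segment.
        segments = [quote(s, safe="") for s in segments]
    else:
        # Option 1: keep Unicode word characters as-is. For str patterns,
        # \w is Unicode-aware in Python 3, so CJK characters are preserved.
        segments = [re.sub(r"[^\w.-]+", "-", s).strip("-") for s in segments]
    return "-".join(s for s in segments if s)


path = "/Users/yunxin/Desktop/开发/后端/信租风控通后端"
print(project_name(path))
print(project_name(path, percent_encode=True))
print(project_name(path, custom_name="risk-control-backend"))
```

Preserving Unicode keeps names readable, while percent-encoding keeps them ASCII-safe for storage layers that require it. Which default fits best depends on where the name is used downstream, which the issue does not specify.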

Actual Behavior

All non-ASCII path segments are stripped. For the path above, the result is just Users-yunxin-Desktop, losing all meaningful project identification.

Impact

  • Duplicate entries in the UI when the same project is indexed via both its real path and a symlink with an ASCII name
  • Project names are meaningless for users with non-ASCII (Chinese, Japanese, Korean, etc.) directory structures
  • No way to distinguish between multiple projects under the same ASCII-only parent path
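The collision described in the last bullet can be illustrated by grouping paths by their generated name. The sanitizer is the same hypothetical ASCII-only filter assumed for illustration, and the second path is an invented sibling directory, not one taken from the report.

```python
import re
from collections import defaultdict


def ascii_only_project_name(path: str) -> str:
    # Hypothetical ASCII-only sanitizer (assumption, not the tool's real code).
    return re.sub(r"[^A-Za-z0-9._]+", "-", path).strip("-")


paths = [
    "/Users/yunxin/Desktop/开发/后端/信租风控通后端",  # path from the report
    "/Users/yunxin/Desktop/开发/前端",                # hypothetical sibling project
]

# Group paths by generated name; any group with more than one path is a collision.
by_name = defaultdict(list)
for p in paths:
    by_name[ascii_only_project_name(p)].append(p)

collisions = {name: ps for name, ps in by_name.items() if len(ps) > 1}
print(collisions)
```

Both paths map to Users-yunxin-Desktop under this filter, so the two projects cannot be told apart by name.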

Environment

  • codebase-memory-mcp v0.8.1
  • macOS (Apple Silicon)
  • Project path with Chinese characters

Labels: bug (Something isn't working), ux/behavior (Display bugs, docs, adoption UX)