Releases: FunAudioLLM/Fun-ASR

FunASR llama.cpp runtime v0.1.3

28 Jun 17:26
4492da2

Portable rebuild of the FunASR llama.cpp / GGUF runtime binaries.

This release rebuilds the v0.1.2 runtime with conservative CPU flags for the generic x64 assets. In particular, it disables ggml native CPU tuning and the AVX/AVX2/AVX512/VNNI/FMA/F16C/BMI2 instruction-set flags, so the prebuilt binaries no longer depend on the instruction set of the GitHub Actions runner CPU that built them.

Use this version if v0.1.2 crashes with Illegal instruction, SIGILL, or Windows error 0xC000001D.
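On a Linux x64 host, one way to see whether a machine is likely to be affected is to compare the CPU's advertised features against the instruction-set families listed above. The sketch below is only illustrative: the flag spellings are the ones Linux prints in `/proc/cpuinfo`, and which of them the v0.1.2 binaries actually used is an assumption, since the notes only say which families v0.1.3 disables.

```python
from pathlib import Path

# Flag names as spelled in the "flags" line of Linux /proc/cpuinfo on x86.
# Assumption: the v0.1.2 build may have used any of these; the release
# notes do not say which ones the runner CPU actually enabled.
HIGH_ISA_FLAGS = ("avx", "avx2", "fma", "f16c", "bmi2", "avx512f", "avx512_vnni")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_high_isa(cpuinfo_text: str) -> list[str]:
    """List the high-ISA flags that this CPU does not advertise."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in HIGH_ISA_FLAGS if flag not in present]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        # Only meaningful on x86; other architectures have no "flags" line.
        print("not advertised by this CPU:", missing_high_isa(cpuinfo.read_text()))
```

If the list is non-empty, the portable v0.1.3 build is the safer choice. An empty list does not guarantee that v0.1.2 will run.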

FunASR llama.cpp runtime v0.1.2

21 Jun 17:55
f1a55b0

Prebuilt self-contained binaries for running Fun-ASR-Nano (and SenseVoice / Paraformer) locally with the FunASR llama.cpp / GGUF runtime — built-in FSMN-VAD, whisper.cpp-style on-device ASR, strong on Chinese.

New: q8 GGUF models are ~half the size of f16 with the same accuracy.
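The "~half" figure follows from the storage layout. As a rough sketch, assuming the q8 files use ggml's `q8_0` format (blocks of 32 int8 weights plus one fp16 scale) and that every tensor is quantized, the arithmetic is as follows. The 0.6e9 parameter count is only the nominal size suggested by the `qwen3-0.6b` file name, not a measured value.

```python
# Bits per weight for the two storage formats.
F16_BITS_PER_WEIGHT = 16.0

# q8_0 block: one fp16 scale (2 bytes) + 32 int8 weights (32 bytes).
Q8_0_BLOCK_WEIGHTS = 32
Q8_0_BLOCK_BYTES = 2 + 32
Q8_0_BITS_PER_WEIGHT = Q8_0_BLOCK_BYTES * 8 / Q8_0_BLOCK_WEIGHTS  # 8.5


def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate tensor payload in GB, ignoring GGUF metadata."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 0.6e9  # nominal, taken from the file name (assumption)
    print("q8_0 / f16 size ratio:", Q8_0_BITS_PER_WEIGHT / F16_BITS_PER_WEIGHT)  # 0.53125
    print("f16  GB:", estimate_size_gb(n_params, F16_BITS_PER_WEIGHT))
    print("q8_0 GB:", estimate_size_gb(n_params, Q8_0_BITS_PER_WEIGHT))
```

Real files differ somewhat from this estimate, because metadata and any tensors left unquantized are not counted here.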

bash download-funasr-model.sh nano ./gguf
llama-funasr-cli --enc ./gguf/funasr-encoder-f16.gguf -m ./gguf/qwen3-0.6b-q8_0.gguf -a audio.wav --vad ./gguf/fsmn-vad.gguf

No Python, no build. Linux (x64/arm64), macOS (arm64), Windows (x64). Docs: runtime/llama.cpp/README.md.
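The binaries need no Python, but the command above can still be driven from a script for batch jobs. The following is a minimal sketch that reuses only the flags and file names shown in the command. The helper names are hypothetical, and it assumes the transcript is written to standard output, which these notes do not state.

```python
import subprocess
from pathlib import Path


def build_cli_args(audio, gguf_dir="./gguf", binary="llama-funasr-cli"):
    """Assemble the argument list from the release-note command (hypothetical helper)."""
    gguf = Path(gguf_dir)
    return [
        binary,
        "--enc", str(gguf / "funasr-encoder-f16.gguf"),
        "-m", str(gguf / "qwen3-0.6b-q8_0.gguf"),
        "-a", str(audio),
        "--vad", str(gguf / "fsmn-vad.gguf"),
    ]


def transcribe(audio, **kwargs) -> str:
    """Run the CLI and return whatever it prints to stdout.

    Assumption: the recognized text appears on stdout.
    """
    done = subprocess.run(
        build_cli_args(audio, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return done.stdout


if __name__ == "__main__":
    print(build_cli_args("audio.wav"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with audio paths that contain spaces.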

FunASR llama.cpp runtime v0.1.1

21 Jun 13:55
7bc0bef

Prebuilt, self-contained binaries to run Fun-ASR-Nano (and SenseVoice / Paraformer) locally with the FunASR llama.cpp / GGUF runtime — built-in FSMN-VAD, whisper.cpp-style on-device ASR, strong on Chinese.

bash download-funasr-model.sh nano ./gguf
llama-funasr-cli --enc ./gguf/funasr-encoder-f16.gguf -m ./gguf/qwen3-0.6b-q8_0.gguf -a audio.wav --vad ./gguf/fsmn-vad.gguf

No Python, no build, no torch. Binaries for Linux (x64/arm64), macOS (arm64), Windows (x64). Docs: runtime/llama.cpp/README.md. CER (micro-avg): Fun-ASR-Nano 8.30% vs whisper.cpp small 22%.
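For readers unfamiliar with the metric, a micro-averaged character error rate (CER) is usually the total number of character edits divided by the total number of reference characters, pooled across all utterances. The sketch below shows that standard definition. The text normalization and test sets behind the figures above are not described in these notes, so this is not necessarily the exact procedure used.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) over characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h into the reference
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def micro_cer(pairs) -> float:
    """Pool errors and reference lengths over all (reference, hypothesis) pairs."""
    pairs = list(pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    ref_chars = sum(len(ref) for ref, _ in pairs)
    return errors / ref_chars


if __name__ == "__main__":
    samples = [("abcd", "abcd"), ("ab", "xy")]
    print(micro_cer(samples))  # 2 errors over 6 reference characters
```

Micro-averaging weights long utterances more heavily than a per-utterance mean would: for the two samples above, the per-utterance mean is 0.5, while the pooled rate is one third.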

v1.0.0: Fun-ASR-Nano — 31-Language End-to-End Speech Recognition

25 May 16:40
4d84613

Fun-ASR-Nano v1.0.0

The first official release of Fun-ASR-Nano, a large end-to-end speech recognition model trained on tens of millions of hours of real speech data.

Highlights

  • 31 languages — Chinese (+ 7 dialects, 26 regional accents), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, and 20+ European languages
  • Speaker diarization — identify who spoke when
  • Timestamps — word-level and sentence-level
  • Hotwords — boost recognition of domain-specific terms
  • Lyrics recognition — accurate transcription over background music
  • vLLM acceleration — 3-5x faster batch inference with WebSocket streaming

Quick Start

```python
from funasr import AutoModel

model = AutoModel(
    model="FunAudioLLM/Fun-ASR-Nano-2512",
    trust_remote_code=True,
    device="cuda:0",
    hub="hf",
)
result = model.generate(input="audio.wav")
```
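The quick start stops at `result`. As a hedged sketch, assuming `generate` returns a list with one dict per input and a `"text"` field holding the transcript (the shape used in FunASR's own examples, but not stated in these notes), the text can be pulled out like this. The `transcripts` helper and the sample data are hypothetical.

```python
def transcripts(result):
    """Collect the "text" field from each item, assuming a list-of-dicts result."""
    return [item.get("text", "") for item in result]


if __name__ == "__main__":
    # Stand-in for the value returned by model.generate(...); invented sample data.
    fake_result = [{"key": "audio", "text": "hello world"}]
    for line in transcripts(fake_result):
        print(line)
```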

Models

| Model | Languages | Parameters | Download |
| --- | --- | --- | --- |
| Fun-ASR-Nano-2512 | Chinese/English/Japanese + dialects | 800M | ModelScope · HuggingFace |
| Fun-ASR-MLT-Nano-2512 | 31 languages | 800M | ModelScope · HuggingFace |

Links