Releases: FunAudioLLM/Fun-ASR
FunASR llama.cpp runtime v0.1.3
Portable rebuild of the FunASR llama.cpp / GGUF runtime binaries.
This release rebuilds the v0.1.2 runtime with conservative CPU flags for the generic x64 assets. In particular, it disables ggml native CPU tuning and the higher-ISA flags (AVX, AVX2, AVX512, VNNI, FMA, F16C, BMI2), so the prebuilt binaries no longer depend on the instruction set of the GitHub Actions runner CPU.
Use this version if v0.1.2 crashes with Illegal instruction, SIGILL, or Windows error 0xC000001D.
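When the runtime is launched from a script, the three symptoms above can be told apart from ordinary failures by looking at the exit code. The following is a minimal sketch, not part of the release: the helper name `is_illegal_instruction` is hypothetical, and it relies only on how Python's `subprocess` reports exit codes (a negative signal number on POSIX, the raw process exit code on Windows).

```python
import signal
import sys

# NTSTATUS STATUS_ILLEGAL_INSTRUCTION, the Windows error quoted in these notes.
WINDOWS_ILLEGAL_INSTRUCTION = 0xC000001D


def is_illegal_instruction(returncode: int, platform: str = sys.platform) -> bool:
    """Return True if an exit code looks like an illegal-instruction crash.

    Hypothetical helper for illustration; `returncode` is the value of
    subprocess.CompletedProcess.returncode (or a shell's $?).
    """
    if platform.startswith("win"):
        # The code may surface as a signed or an unsigned 32-bit value.
        return (returncode & 0xFFFFFFFF) == WINDOWS_ILLEGAL_INSTRUCTION
    sigill = int(signal.SIGILL)
    # subprocess reports death by signal N as -N; a shell reports 128 + N.
    return returncode in (-sigill, 128 + sigill)
```

A caller could run the v0.1.2 binary with `subprocess.run(...)`, pass the resulting `returncode` to this helper, and switch to the v0.1.3 assets when it returns `True`.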
FunASR llama.cpp runtime v0.1.2
Prebuilt self-contained binaries for running Fun-ASR-Nano (and SenseVoice / Paraformer) locally with the FunASR llama.cpp / GGUF runtime — built-in FSMN-VAD, whisper.cpp-style on-device ASR, strong on Chinese.
New: q8 GGUF models are ~half the size of f16 with the same accuracy.
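The "about half" figure can be checked with a little arithmetic. The sketch below assumes that "q8" means ggml's `q8_0` layout (as the `q8_0` suffix in the model file name suggests), where each block of 32 weights is stored as 32 int8 values plus one 16-bit scale. Real files also carry metadata and may keep some tensors at higher precision, so the actual ratio will differ slightly.

```python
# Assumption: "q8" refers to ggml's q8_0 block layout.
BLOCK_WEIGHTS = 32                 # weights per q8_0 block
Q8_0_BLOCK_BYTES = 32 * 1 + 2      # 32 int8 values + one 16-bit scale
F16_BYTES_PER_WEIGHT = 2           # plain half-precision storage


def q8_0_to_f16_ratio() -> float:
    """Size of q8_0 weights relative to f16 weights, ignoring metadata."""
    q8_bytes_per_weight = Q8_0_BLOCK_BYTES / BLOCK_WEIGHTS  # 1.0625
    return q8_bytes_per_weight / F16_BYTES_PER_WEIGHT


print(f"q8_0 / f16 size ratio: {q8_0_to_f16_ratio():.3f}")  # 0.531
```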
bash download-funasr-model.sh nano ./gguf
llama-funasr-cli --enc ./gguf/funasr-encoder-f16.gguf -m ./gguf/qwen3-0.6b-q8_0.gguf -a audio.wav --vad ./gguf/fsmn-vad.gguf

No Python, no build. Linux (x64/arm64), macOS (arm64), Windows (x64). Docs: runtime/llama.cpp/README.md.
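The runtime itself needs no Python, but for batch jobs the CLI can be driven from any scripting language. The sketch below builds the same command line as above using only the flags shown in these notes (`--enc`, `-m`, `-a`, `--vad`). The function names are hypothetical, and it is an assumption that the transcript is written to standard output; see runtime/llama.cpp/README.md for the actual output format.

```python
import subprocess
from pathlib import Path


def build_funasr_command(audio, gguf_dir="./gguf", cli="llama-funasr-cli"):
    """Assemble the llama-funasr-cli invocation shown in the release notes."""
    gguf = Path(gguf_dir)
    return [
        cli,
        "--enc", str(gguf / "funasr-encoder-f16.gguf"),
        "-m", str(gguf / "qwen3-0.6b-q8_0.gguf"),
        "-a", str(audio),
        "--vad", str(gguf / "fsmn-vad.gguf"),
    ]


def transcribe(audio, **kwargs):
    """Run the CLI and return whatever it prints.

    Assumption: the transcript goes to stdout; this is not confirmed here.
    """
    cmd = build_funasr_command(audio, **kwargs)
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return proc.stdout
```

Passing the arguments as a list avoids shell quoting problems with audio paths that contain spaces.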
FunASR llama.cpp runtime v0.1.1
Prebuilt, self-contained binaries to run Fun-ASR-Nano (and SenseVoice / Paraformer) locally with the FunASR llama.cpp / GGUF runtime — built-in FSMN-VAD, whisper.cpp-style on-device ASR, strong on Chinese.
bash download-funasr-model.sh nano ./gguf
llama-funasr-cli --enc ./gguf/funasr-encoder-f16.gguf -m ./gguf/qwen3-0.6b-q8_0.gguf -a audio.wav --vad ./gguf/fsmn-vad.gguf

No Python, no build, no torch. Binaries for Linux (x64/arm64), macOS (arm64), Windows (x64). Docs: runtime/llama.cpp/README.md. CER (micro-avg): Fun-ASR-Nano 8.30% vs whisper.cpp small 22%.
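For readers unfamiliar with the metric, a micro-averaged character error rate pools errors across all utterances before dividing, so long utterances weigh more than short ones. The sketch below shows the conventional definition only; the release does not describe its test sets or text normalization, so this is not necessarily the exact scoring procedure behind the numbers above.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def micro_cer(pairs):
    """Micro-averaged CER: total edit errors / total reference characters."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return errors / chars


# Two utterances: one perfect (4 chars), one fully wrong (2 chars).
pairs = [("abcd", "abcd"), ("ab", "xy")]
print(micro_cer(pairs))  # 2 errors / 6 reference chars
```

A macro average over the same two utterances would instead give (0 + 1) / 2 = 0.5, which is why the averaging method matters when comparing reported figures.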
v1.0.0: Fun-ASR-Nano — 31-Language End-to-End Speech Recognition
Fun-ASR-Nano v1.0.0
The first official release of Fun-ASR-Nano, a large end-to-end speech recognition model trained on tens of millions of hours of real speech data.
Highlights
- 31 languages — Chinese (+ 7 dialects, 26 regional accents), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, and 20+ European languages
- Speaker diarization — identify who spoke when
- Timestamps — word-level and sentence-level
- Hotwords — boost recognition of domain-specific terms
- Lyrics recognition — accurate transcription over background music
- vLLM acceleration — 3-5x faster batch inference with WebSocket streaming
Quick Start
```python
from funasr import AutoModel

model = AutoModel(
    model="FunAudioLLM/Fun-ASR-Nano-2512",
    trust_remote_code=True,
    device="cuda:0",
    hub="hf",
)

result = model.generate(input="audio.wav")
```
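The Quick Start stops at `model.generate`. As a hedged sketch of the next step, the helper below assumes that `result` is a list with one dict per input, each carrying the transcript under a `"text"` key, as is usual for FunASR models. The helper name is hypothetical and the exact fields returned by Fun-ASR-Nano are not stated in these notes, so inspect `result` before relying on it.

```python
def extract_texts(result):
    """Collect transcripts from a FunASR-style result list.

    Assumption: each item is a dict with a "text" field; items without
    one yield an empty string instead of raising.
    """
    return [item.get("text", "") for item in result]


# Stand-in data shaped like the assumed output, not real model output.
example_result = [{"key": "audio", "text": "hello world"}]
print(extract_texts(example_result))
```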
Models
| Model | Languages | Parameters | Download |
|---|---|---|---|
| Fun-ASR-Nano-2512 | Chinese/English/Japanese + dialects | 800M | ModelScope · HuggingFace |
| Fun-ASR-MLT-Nano-2512 | 31 languages | 800M | ModelScope · HuggingFace |
Links
- Documentation: https://www.funasr.com
- vLLM Guide: docs/vllm_guide.md
- FunASR toolkit: https://github.com/modelscope/FunASR