LLM Inference, AI Infra, CUDA
Tsinghua University
- https://www.zhihu.com/people/mu-zi-zhi-6-28
- https://bruce-lee-ly.medium.com
Pinned
- cuda_auto_tune: NCU-driven iterative optimization workflow for CUDA/CUTLASS/Triton/CuTe DSL kernels.
- decoding_attention: Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference.
- cuda_hgemm: Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores with the WMMA API and MMA PTX instructions.
- cuda_hgemv: Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
- flash_attention_inference: Performance of the C++ interface of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios.
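Optimized GEMM kernels like those in cuda_hgemm are conventionally validated against a naive scalar reference. A minimal host-side sketch of such a reference (an illustration, not code from the repositories above; `float` stands in for `half`, and the function name is hypothetical) looks like:

```cpp
#include <vector>
#include <cstddef>

// Naive reference GEMM: C = A * B, with A (M x K), B (K x N), C (M x N),
// all row-major. Tensor-core HGEMM kernels are typically checked against
// a simple loop like this, accumulating in fp32 as WMMA/MMA usually does.
void gemm_ref(const std::vector<float>& A, const std::vector<float>& B,
              std::vector<float>& C,
              std::size_t M, std::size_t N, std::size_t K) {
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;  // fp32 accumulator
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            C[m * N + n] = acc;
        }
    }
}
```

The optimized kernels trade this O(M*N*K) scalar loop for tiled, tensor-core matrix fragments, but must produce (numerically close to) the same result.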