Fast Inference

Popular articles

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

6 months ago

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference

6 months ago