AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Thanks to AWQ, TinyChat can deliver more efficient responses with LLM/VLM chatbots through 4-bit inference. TinyChat with LLaMA-3-8b on RTX 4090 (2.7x faster than FP16): TinyChat with LLaMA-3-8b on ...
