- Fast Inference from Transformers via Speculative Decoding. Yaniv Leviathan, Matan Kalman, Yossi Matias. [pdf], [code], 2022.11.
- Accelerating Large Language Model Decoding with Speculative Sampling. Charlie Chen, Sebastian Borgeaud, Geoffrey Irving, Jean-Baptiste Lespiau, Laurent Sifre, John Jumper. [...
- Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
- MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
- Towards Optimal Multi-Draft Speculative Decoding
- STAFF: Speculative Coreset Selection for Task-Specific Fine-Tuning
- Mixture of Attentio...
Auto-regressive decoding is slow and costly for large language models (LLMs). Speculative sampling speeds up inference by splitting decoding into two stages: a cheap draft model guesses several tokens ahead, and the target model verifies them in parallel. EAGLE predicts feature-level representations instead of tokens, improving both speed and draft accuracy. Speculative sampl...
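To make the guess-and-verify idea concrete, below is a minimal sketch of the draft-then-verify loop described in the speculative sampling papers listed above. The toy `draft_probs` / `target_probs` functions, the `VOCAB` size, and `speculative_step` are illustrative stand-ins (assumptions for this sketch, not code from any of the listed works); in practice the draft would be a small LLM and the target the large LLM, with the verification probabilities computed in a single parallel forward pass.

```python
# Minimal sketch of speculative sampling's draft-then-verify loop.
# Toy stand-in models over a tiny vocabulary; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary size (assumption for illustration)


def draft_probs(prefix):
    """Cheap draft model: returns a probability distribution over VOCAB."""
    logits = rng.normal(size=VOCAB) + 0.1 * len(prefix)
    e = np.exp(logits - logits.max())
    return e / e.sum()


def target_probs(prefix):
    """Expensive target model: also a distribution over VOCAB."""
    logits = rng.normal(size=VOCAB) - 0.05 * len(prefix)
    e = np.exp(logits - logits.max())
    return e / e.sum()


def speculative_step(prefix, gamma=4):
    """Draft gamma tokens, then accept/reject them against the target.

    Accept a drafted token x with probability min(1, p(x) / q(x)); on the
    first rejection, resample from the residual max(0, p - q), renormalized.
    This keeps the output distribution identical to sampling from the target.
    """
    drafted, q_dists = [], []
    ctx = list(prefix)
    for _ in range(gamma):
        q = draft_probs(ctx)
        x = int(rng.choice(VOCAB, p=q))
        drafted.append(x)
        q_dists.append(q)
        ctx.append(x)

    accepted = list(prefix)
    for x, q in zip(drafted, q_dists):
        p = target_probs(accepted)  # in practice: one parallel target pass
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)
        else:
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            return accepted
    # All drafts accepted: sample one extra token from the target for free.
    p = target_probs(accepted)
    accepted.append(int(rng.choice(VOCAB, p=p)))
    return accepted


print(speculative_step(prefix=[1, 2, 3], gamma=4))
```

Each call advances the sequence by between one and gamma + 1 tokens for a single (batched) target-model evaluation, which is where the speedup comes from when the draft model's guesses are usually accepted.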