tripplyons on April 22, 2024 | on: Lossless Acceleration of LLM via Adaptive N-Gram P...

They use a separate n-gram model to generate the proposed (draft) sequence instead of extra prediction heads on top of the main model. The process of verifying the proposed sequence with the main model appears to be the same.
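A rough Python sketch of the draft-then-verify loop described above. It rests on toy assumptions: tokens are strings, `target_next` is a hypothetical stand-in for the main model's greedy (argmax) next-token prediction, and the drafter is a plain count-based n-gram table rebuilt from the text so far. It is not the paper's adaptive n-gram model. Verification here is greedy: keep the longest draft prefix the main model agrees with, plus one token taken from the main model itself.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Token = str
# Hypothetical stand-in for the main model: context -> greedy next token.
TargetFn = Callable[[List[Token]], Token]


def build_ngram_table(tokens: Sequence[Token], n: int = 3) -> Dict[Tuple[Token, ...], Counter]:
    """Count which token follows each (n-1)-token context seen so far (toy drafter)."""
    table: Dict[Tuple[Token, ...], Counter] = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        ctx = tuple(tokens[i:i + n - 1])
        table[ctx][tokens[i + n - 1]] += 1
    return table


def draft(tokens: Sequence[Token], table, n: int = 3, k: int = 4) -> List[Token]:
    """Propose up to k tokens by repeatedly taking the most frequent n-gram continuation."""
    proposal: List[Token] = []
    ctx = list(tokens[-(n - 1):])
    for _ in range(k):
        counts = table.get(tuple(ctx))  # .get avoids inserting empty entries
        if not counts:
            break  # unseen context: stop drafting early
        nxt = counts.most_common(1)[0][0]
        proposal.append(nxt)
        ctx = ctx[1:] + [nxt]
    return proposal


def verify(tokens: Sequence[Token], proposal: List[Token], target_next: TargetFn) -> List[Token]:
    """Greedy verification: accept the agreeing prefix, then add one main-model token.

    A real system gets all of these predictions from a single batched forward
    pass of the main model; here target_next is called per position for clarity.
    """
    accepted: List[Token] = []
    ctx = list(tokens)
    for tok in proposal:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # main model's correction replaces the first mismatch
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # bonus token when every draft token was accepted
    return accepted


def generate(prompt: Sequence[Token], target_next: TargetFn, steps: int, n: int = 3, k: int = 4) -> List[Token]:
    """Speculative decoding with an n-gram drafter; each round adds at least one token."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < steps:
        table = build_ngram_table(tokens, n)
        proposal = draft(tokens, table, n, k)
        tokens.extend(verify(tokens, proposal, target_next))
    return tokens[:len(prompt) + steps]


def greedy_generate(prompt: Sequence[Token], target_next: TargetFn, steps: int) -> List[Token]:
    """Baseline: one main-model call per generated token."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(target_next(tokens))
    return tokens
```

Every token that `verify` keeps is one the main model would have chosen at that position anyway, so `generate` returns the same sequence as `greedy_generate`. The saving comes from checking several drafted tokens per main-model forward pass whenever the cheap n-gram guesses are right, which is most likely on repetitive text.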