Popular on Hacker News, this technique speeds up LLM inference by guessing the next several tokens and verifying those guesses in parallel, instead of generating one token at a time. If you're building with open models or running local inference, it is directly applicable: faster responses without retraining.
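The predict-then-verify idea resembles what is usually called speculative decoding. Here is a toy sketch of the greedy variant, not the linked post's implementation: `target_next` and `draft_next` are hypothetical stand-ins for a slow, authoritative model and a fast, approximate one, and the "parallel" verification is simulated with a plain loop where a real model would use one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the slow, authoritative model."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for a fast model that is usually, not always, right."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def greedy_decode(next_fn: NextTokenFn, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_fn(out))
    return out


def speculative_decode(
    target: NextTokenFn, draft: NextTokenFn,
    prompt: List[Token], n_new: int, k: int = 4,
) -> Tuple[List[Token], int]:
    """Return (tokens, number of verification rounds)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # 1. The cheap draft model guesses k tokens ahead.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target model scores every guessed position. These k + 1
        #    checks are independent, so a real model does them in one pass.
        verified = [target(out + proposal[:i]) for i in range(k + 1)]
        # 3. Keep the longest prefix of guesses the target agrees with.
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1
        # 4. Add the target's own token at the first mismatch (or one
        #    bonus token if every guess was accepted).
        out.extend(proposal[:n_ok])
        out.append(verified[n_ok])
    return out[:goal], rounds


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 12)
tokens, rounds = speculative_decode(target_next, draft_next, prompt, 12)
print(tokens == baseline, rounds)
```

In this greedy form the output matches what the target model would have produced on its own. Any speedup comes from needing fewer rounds of the expensive model, and it depends on how often the draft guesses correctly.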