Ars Technica May 06, 2026 • 15:44

Google’s Gemma 4 open AI models use “speculative decoding” to get up to 3x faster - Ars Technica

By Ryan Whitwam

Up to 3x the speed with no loss of quality—is it too good to be true?
