Google has unveiled a preview of Gemini 2.5 Flash-Lite, a reasoning model optimized for cost and speed, and announced that two other Gemini models, Gemini 2.5 Pro and Gemini 2.5 Flash, are now generally available.
Google made the announcements June 17. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy, Google said.
Gemini 2.5 Flash-Lite has the lowest cost and lowest latency in the Gemini 2.5 model family, Google said. Flash-Lite is a reasoning model that allows dynamic control of the thinking budget via an API parameter, but because Flash-Lite is optimized for low latency and low cost, thinking is turned off by default. This model is “great” for high-throughput tasks such as classification or summarization at scale, Google said. Built as an upgrade to the Gemini 1.5 Flash and 2.0 Flash models, Gemini 2.5 Flash-Lite offers better performance across most evals and lower time to first token, while also achieving higher tokens-per-second decode, according to Google. Every Gemini 2.5 model has control over the thinking budget, giving developers the ability to choose when and how much the model thinks before generating a response.
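As a rough illustration of what setting the thinking budget through an API parameter might look like, the Python sketch below only builds a JSON request body and does not send anything. The helper name `build_request_body` is hypothetical, and the field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) are assumptions based on the Gemini API's REST conventions rather than anything stated in Google's announcement; check the current API reference before relying on them.

```python
import json


def build_request_body(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a JSON body for a generateContent-style request.

    Hypothetical helper for illustration only. The default budget of 0
    mirrors the article's note that thinking is off by default on
    Flash-Lite; a larger value would allow more thinking tokens.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        # Assumed field names; verify against the official API reference.
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }


if __name__ == "__main__":
    # A high-throughput classification call with thinking left off.
    fast = build_request_body("Classify this review as positive or negative.")
    # A harder prompt where some thinking budget is allowed.
    careful = build_request_body("Summarize the contract's risks.", 1024)
    print(json.dumps(fast, indent=2))
    print(json.dumps(careful, indent=2))
```

The design point is that the budget is a per-request setting, so the same model can serve cheap, low-latency calls and more deliberate ones without switching models.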