Google on Wednesday debuted its new Gemini generative AI model. The platform serves as Google’s answer to Microsoft-backed (MSFT) OpenAI’s GPT-4, and according to DeepMind CEO Demis Hassabis, it’s the company’s “most capable and general model” yet.

Gemini is what Google calls a natively multimodal model, meaning it can analyze text, audio, video, images, and code. While other multimodal offerings exist, Google says Gemini stands apart because the model was designed to take all of those media into account from the beginning.

Other platforms, the company said, train separate models to tackle things like text, video, and photos and then string them together into a single model.


This difference, according to Hassabis, means that Gemini can better understand multimodal data and produce better results for everything from handwritten content to images and videos.

As part of the announcement, Google released a series of videos demonstrating Gemini’s capabilities. In one video, a presenter showed a program running Gemini a drawing of a blue duck and then a blue rubber duck, both of which the AI was able to identify.

In another demonstration, the presenter showed the AI a hand-drawn picture of a roller coaster without a loop and another one with a loop. When the presenter asked which one is likely more fun, the AI said the one with the loop, which is the right answer unless you hate going around loops or on roller coasters in general.

Another example showed how parents can use Gemini to help their children with their homework. Not only is the AI able to read a student’s written answers to math questions, but it is also able to tell if they are correct or not and explain where the student went wrong and why.

On the coding front, Google said Gemini is one of the leading models available for coding, claiming that the AI can understand programming languages such as Python, Java, C++, and Go.

Google is rolling out three different versions of Gemini: Gemini Ultra, Gemini Pro, and Gemini Nano. Gemini Ultra is the top-of-the-line data center version of the AI model meant for what Google says are highly complex tasks. Gemini Pro is the mid-range version of the model, while Nano is the version designed to run on devices such as Google’s Pixel 8 Pro.