OpenAI has introduced its most comprehensive artificial intelligence effort yet: a multimodal model that can communicate with users through both text and voice.

The company said Monday that GPT-4o, which will roll out in ChatGPT and the API over the next few weeks, can also recognize objects and images in real time.

The model synthesizes a slew of AI capabilities that are already available separately across other OpenAI models. But by combining these modalities, OpenAI’s latest model is expected to process any combination of text, audio and visual inputs more efficiently.


Users can relay visuals — through their phone camera, by uploading documents, or by sharing their screen — all while conversing with the AI model as if they are in a video call. The company said the technology will be available to free users, while paid users will have usage limits up to five times higher.

OpenAI, which is backed by Microsoft, also announced a new macOS desktop app for ChatGPT, its popular chatbot first launched in 2022.

Mira Murati, chief technology officer at OpenAI, said during a livestream demonstration that making advanced AI tools available to users for free is a “very important” component of the company’s mission.

“This is the first time that we are really making a huge step forward regarding the ease of use. And this is incredibly important because we’re looking at the future of interaction between ourselves and the machines,” Murati said.

“And we think that GPT-4o is really shifting that paradigm into the future of collaboration where this interaction becomes much more natural and far, far easier.”

Team members demonstrated the new model’s audio capabilities during the livestream and shared clips on social media.

An AI assistant that can reason in real time using vision, text and voice would enable the technology to perform a creative range of tasks — such as walking users through a math problem, translating languages during a conversation and reading human facial expressions.

Murati said in the livestream that GPT-4o’s response time is much faster than that of previous models. The model also significantly improves the quality and speed of its performance in 50 different languages.
