Google Gemini now “better” than ChatGPT-4o

Google has done it: the new Gemini model is here and goes straight to the top of the LLM leaderboard!

What is the Chatbot Arena LLM Leaderboard?

The Chatbot Arena LLM Leaderboard gives an overview of how well the different large language models (LLMs) perform under real-world conditions. Specifically, it tests performance in areas such as language understanding, knowledge coverage, problem solving and generative capabilities.

These tests are carried out by one million users, each of whom rates anonymised results. Important: the assessments often rely on subjective impressions, and not all real-world use cases are covered. Given the large number of users, I still consider this benchmark highly meaningful.
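To make the principle tangible: in the Arena, a user sees two anonymised answers and votes for the better one. How such pairwise votes can be turned into a ranking may be sketched with a simple Elo-style update. This is only an illustration under my own assumptions, not the leaderboard's actual statistical method; the function name elo_ratings, the K-factor of 32, the starting value of 1000 and the model names are all made up for the example.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Turn pairwise votes into Elo-style ratings.

    votes: iterable of (model_a, model_b, winner) tuples,
           where winner is "a", "b" or "tie".
    """
    ratings = defaultdict(lambda: base)
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Probability that model_a wins, given the current ratings.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Zero-sum update: what one model gains, the other loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return dict(ratings)


# Hypothetical votes between three placeholder models.
votes = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "tie"),
    ("model-x", "model-z", "a"),
]
ranking = sorted(elo_ratings(votes).items(), key=lambda kv: kv[1], reverse=True)
for name, rating in ranking:
    print(f"{name}: {rating:.0f}")
```

With only three votes the numbers mean very little; the point is that a ranking emerges purely from many individual "A or B?" decisions, which is also why subjective impressions feed directly into the result.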

The current results

Gemini currently outperforms everything else, including ChatGPT. But depending on the discipline, other models have the edge.

A brief explanation: Gemini is Google's large language model, ChatGPT is of course OpenAI's, Grok is from “Twitter” (xAI), Claude from Anthropic and Llama from Meta. Llama is the only one that is “quasi” open source, which is why NVIDIA also uses the model and is listed as a publisher. Zhipu AI is a Chinese AI company.

But what does that mean in practice? The real question is: how well can a model be integrated into day-to-day work? Here, ChatGPT currently remains unbeatable for practical use thanks to its broad user interface: web search, Canva integration and other productivity tools.

When we develop new AI-based tools and processes at our company or at Powdience, we sometimes use different tools for different sub-processes. In the social media content analyses for LinkedIn or Instagram (www.powdience.com/instagram-ai), for example, ChatGPT is used for image analysis, Gemini for certain video analyses and our own models to generate the conclusions.
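The idea of "a different tool per sub-process" can be sketched as a small routing table. This is a purely hypothetical sketch, not our production code: the step names, the handler functions and their return values are placeholders I invented for illustration, and no real API is called.

```python
from typing import Callable, Dict


# Placeholder handlers: a real pipeline would call the respective model here.
def analyse_image(item: str) -> str:
    return f"[ChatGPT image analysis of {item}]"


def analyse_video(item: str) -> str:
    return f"[Gemini video analysis of {item}]"


def draw_conclusions(item: str) -> str:
    return f"[in-house model conclusions for {item}]"


# Hypothetical mapping from sub-process to the tool responsible for it.
ROUTES: Dict[str, Callable[[str], str]] = {
    "image": analyse_image,
    "video": analyse_video,
    "conclusions": draw_conclusions,
}


def run_step(step: str, payload: str) -> str:
    """Dispatch one sub-process to the tool registered for it."""
    if step not in ROUTES:
        raise ValueError(f"no tool registered for step {step!r}")
    return ROUTES[step](payload)


print(run_step("image", "post.jpg"))
print(run_step("video", "reel.mp4"))
```

The benefit of such a setup is that a single entry in the table can be swapped when another model takes the lead in one discipline, without touching the rest of the process.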