Google Gemini vs ChatGPT: Everything We Know So Far

Fahim ul Haq
Published in Dev Learning Daily
9 min read · Dec 18, 2023


Just last week, Google unveiled the first legitimate competitor to OpenAI’s ChatGPT: Gemini. Its Pro version is available to the public today.

The newest brainchild of Google DeepMind (Alphabet’s AI lab) may be the most impressive generative AI to date. Where Gemini differs from other LLMs is its focus on multimodality (essentially, its ability to parse different media formats). Gemini can reportedly hold full conversations while constantly switching between text, audio, image, video, and code.

In Google’s video demonstration, they showcase Gemini’s ability to parse and respond to visual inputs. We see that Gemini can recognize abstract human drawings and make deductions based on simple contextual elements. Google wants us to think this all happens in real time with almost no latency, but that may not actually be the case (more on this controversy later).

That being said, there’s a lot of hype around Gemini right now — not all of it as positive as Google may hope. Is it possible that Gemini isn’t the huge leap in Gen AI that Google claims it is? And what’s going on in the AI arms race such that Google is debuting this tool now?

Today, I’d like to discuss Google’s big-picture strategy with Gemini, and what developers can do to prepare for the next wave of Gen AI products, frameworks, and applications to come.

Developers can now build with Gemini Pro through Google AI Studio!
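
If you want to try it yourself, here is a minimal sketch of prompting Gemini Pro with Google’s google-generativeai Python SDK; the API key value is a placeholder for a key you’d generate in Google AI Studio:

```python
# Minimal sketch: prompting Gemini Pro through Google's Python SDK.
# Assumes: pip install google-generativeai, plus an API key from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key from Google AI Studio

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Summarize what makes Gemini multimodal.")
print(response.text)
```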

Let’s dive in.

Why Google? Why now?

Sundar Pichai, CEO of Alphabet, recently outlined how Gemini directly aligns with Google’s mission statement: “to organize the world’s information and make it universally accessible and useful.”

Here’s how I read this: as the breadth and depth of the world’s information have expanded in recent years, Google needs a technological breakthrough to keep up. But there is a little more to the Gemini release than just a new way to organize the world’s information.

The future of Google depends on its investment in Generative AI.

If users turn to chatbots to ask questions instead of typing keywords into a search bar, Google’s entire business model falls apart. This change wouldn’t just affect Google. Millions of sites that get promoted by Google when a user searches would be under threat, too.

Google knows this and is investing heavily in generative AI. If they can get out in front of companies like OpenAI and Anthropic, they can control the narrative of how people search for information.

I think it is safe to assume that Gemini will likely become part of Google Search. Traditional search engines are such a widely understood technology that I doubt Google will completely abandon Search, but I think Gen AI products will be built into the search experience.

When it comes to how they position against OpenAI (and Microsoft), I think Google is primed to catch up. They have a massive amount of money to invest in Gemini, and they aren’t squeezed by the immense compute costs that are draining OpenAI’s budget. Google has ample data centers that can keep up with the toll of running models as large as Gemini Ultra.

Additionally, Google has a leg up on its competitors when it comes to training data. Google runs an enormous web crawling operation and can scrape and ingest massive amounts of human-generated information. OpenAI was able to sneak in and scrape content-rich platforms like Reddit, Quora, and Twitter before those sites locked down access; new AI models don’t have the unregulated access to training data that existed a few years ago.

I predict that more of the big players in the AI space (mostly Google and Microsoft) will make bids to buy platforms like Reddit and Quora. One of the biggest challenges for generative AI moving forward is recency: Gen AI tools don’t have immediate access to current events. This poses a real challenge for the engineers designing these systems, and solving it starts with access to news and current-events content as training data.

X (formerly Twitter) is uniquely positioned to tackle current events with Gen AI. Grok, X’s in-house AI, has access to tweets and could theoretically be the first Gen AI to understand current-events prompts in real time.

Ultimately, it’s clear that Google Gemini is Alphabet’s play at future-proofing itself for the AI arms race. But let’s see how it stacks up against the main player in the space today: OpenAI’s GPT-4.

GPT-4 vs. Gemini: Key Similarities

GPT-4 and Gemini are both Large Language Models capable of understanding and generating responses across a variety of media formats.

At a high level, they look fairly similar.

Currently, both LLMs are offered in different sizes. GPT-4 is OpenAI’s largest offering and compares to Gemini Ultra in terms of total parameters and functionality. The free version of ChatGPT now runs on GPT-3.5. And, according to benchmarks, GPT-3.5 compares to Gemini Pro, which is currently available in Google’s chatbot, Bard.

The full lineup of Gemini sizes is outlined below.

  • Gemini Ultra: The full-size model for highly complex tasks (parameter count undisclosed)
  • Gemini Pro: A balance of performance and deployability at scale, integrated into Bard
  • Gemini Nano: The smallest model, designed for on-device applications, offered in two sizes:
    • Nano-1: 1.8B parameters
    • Nano-2: 3.25B parameters

GPT-4 vs. Gemini: Key Differences

So far, Gemini’s main claim to fame is that it can outperform human experts on accuracy benchmarks. In Google’s release notes for Gemini, they walk through extensive benchmark tests comparing GPT-4 and Gemini.

The test they call out most prominently is the Massive Multitask Language Understanding (MMLU) benchmark. MMLU covers a wide range of subject matter, from world knowledge to problem solving, and ranges from elementary to professional difficulty levels. According to Google’s data, human experts score around 89.8% accuracy, but Gemini Ultra is benchmarked at 90.0%.

That said, MMLU is just one benchmark in a long list of tests that LLMs are measured on. Out of the eight total text benchmarks, Gemini beats GPT-4 on seven. The one loss to GPT-4 came on HellaSwag, a commonsense reasoning benchmark.

You can check out how the two stack up on paper below.

Note: the values included in this figure were obtained from Google DeepMind’s own calculations.

Despite benchmarking higher, Gemini’s wins seem relatively minor. The models are separated by only a few percentage points on each test.

How do Gen AI Benchmarks work?

The long list of benchmarks for assessing Gen AI performance is a relatively recent step forward in measuring these vastly complex tools.

AI benchmarks are standardized tests for Gen AI systems: models are asked questions, and their responses are measured against correct or expected results. In the case of the MMLU benchmark, the test consists of multiple-choice questions covering a wide range of subjects and difficulties. These questions are often sourced directly from educational material and real-world exams.
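
To make that concrete, here’s a toy sketch of how a multiple-choice benchmark like MMLU is typically scored. This is not the official evaluation harness; ask_model and the sample questions are hypothetical stand-ins:

```python
# Toy illustration of multiple-choice benchmark scoring (NOT the official
# MMLU harness). ask_model() is a hypothetical stand-in for a real LLM call.
questions = [
    {
        "prompt": "Which planet is known as the Red Planet?\n"
                  "A) Venus  B) Mars  C) Jupiter  D) Mercury",
        "answer": "B",
    },
    {
        "prompt": "What is 7 * 8?\nA) 54  B) 64  C) 56  D) 48",
        "answer": "C",
    },
]

def ask_model(prompt: str) -> str:
    """Placeholder: a real harness would send the prompt to an LLM
    and parse the chosen answer letter out of its response."""
    return "B"

correct = sum(
    ask_model(q["prompt"]).strip().upper() == q["answer"] for q in questions
)
print(f"Accuracy: {correct / len(questions):.0%}")  # fraction answered correctly
```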

That said, researchers at Stanford claim that current benchmarks are insufficient and do little to shape the development of AI systems in meaningful ways. Currently, there are no benchmarks in place capable of measuring multiple AI features or traits at a time. Plus, very little benchmark testing can be done on a model’s proclivity to generate toxic or unsafe responses.

In order to better understand generative AI, we will need more varied and more comprehensive benchmarks.

Is Gemini more multimodal than ChatGPT?

When it comes to Generative AI, “multimodality” refers to a model’s ability to parse and generate data across multiple media formats. Historically, Gen AI models were limited to reading and writing text, but this has recently expanded to include code, images, video, and audio.

ChatGPT has also expanded to incorporate multimodal prompts (in addition to producing images with DALL·E), but it appears that Gemini may be a step ahead. According to the demo, Gemini looks to have much more sophisticated multimodal prompting and response capabilities, but is that actually the case?
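
For a sense of what a multimodal prompt looks like in code, here’s a minimal sketch using the gemini-pro-vision model from the same Python SDK; the image filename is a placeholder:

```python
# Minimal multimodal sketch: one prompt mixing an image and text.
# Assumes: pip install google-generativeai pillow, and that "diagram.png"
# is a placeholder for a real local image file.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-pro-vision")
image = Image.open("diagram.png")
response = model.generate_content([image, "What does this diagram show?"])
print(response.text)
```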

Is Google’s demo video really “fake”?

The demo video that Google released showcased Gemini’s ability to reason, understand context, and respond in a variety of media formats. The video is especially impressive given that it appears to be a live feed. Gemini’s low latency makes the far-off dreams of real-time over-the-shoulder AI assistance seem right around the corner.

That is, until you check the description of the YouTube video where Google writes a brief disclaimer:

“For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.”

So, the live camera feed wasn’t live after all. Instead, Google prompted Gemini with a series of still images combined with text. Gemini really did respond to all of these visual inputs, just not through a live camera feed. You can watch the video for yourself below:

Many have accused Google of misleading viewers about Gemini’s capabilities, saying the demo cherry-picks interactions that exaggerate the model’s speed and performance.

So, if ChatGPT and Gemini offer similar multimodal capabilities, and Gemini can’t really produce responses in real time yet, is the only difference between them a slight upgrade in on-paper performance?

It’s impossible to say yet. Currently, we can compare Gemini Pro (available in Bard) to GPT-3.5, but the real heavyweight matchup remains undecided. When Google releases Gemini Ultra to the public and users get a real opportunity to A/B test Gemini and ChatGPT, we will be able to discern finer differences.

What will the future of generative AI (and Cloud) look like?

Competition in the Gen AI space will only help develop better, more sophisticated AI products in the near future.

It seems that AI, Machine Learning, and computing have a long way to go before we reach personal AI assistants, but Gemini represents an intentional step in that direction.

For now, OpenAI is joined at the hip with Azure, so its latency is bound to Azure’s infrastructure. OpenAI has massive compute requirements for running GPT-4, especially considering its number of users. In fact, a lot of the money Microsoft invested in OpenAI came in the form of Azure credits.

Due to the GPU requirements for running LLMs, it’s possible that future applications leveraging Gen AI will have to be built on the cloud service associated with the model. For example, if you’re building an app that incorporates Gemini, you would likely expect the best integration (and potentially the best hosting costs) through Google Cloud. Similarly, Microsoft Azure would probably offer better support for developers integrating GPT-4.
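
That coupling is already visible in today’s tooling. Here’s a sketch of calling a GPT-4 deployment through Azure OpenAI with the openai Python SDK; the endpoint, key, and deployment name are placeholders:

```python
# Sketch of calling a GPT-4 deployment through Azure OpenAI using the
# openai (v1+) Python SDK. Endpoint, key, and deployment name are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="YOUR_AZURE_KEY",
    api_version="2023-12-01-preview",
    azure_endpoint="https://your-resource.openai.azure.com",
)

response = client.chat.completions.create(
    model="your-gpt4-deployment",  # Azure uses deployment names, not raw model IDs
    messages=[{"role": "user", "content": "Say hello from Azure."}],
)
print(response.choices[0].message.content)
```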

It is entirely unclear if this will be the case, but it could have significant effects when it comes to the future of cloud computing. Both Google Cloud and Microsoft Azure host a fraction of the traffic that AWS does, but as AI applications grow, that may change.

I’ll be curious to see how Gemini (and other Gen AI products) evolve in 2024 and beyond. If the future of Gen AI comes down to a compute power arms race, then Google can definitely catch up to OpenAI. Put simply, Google has more resources (money, data centers, infrastructure, etc.).

How will AI tools like Gemini affect software engineers?

I’ve said before that the expansion of AI across products will create new demand for developers, and I think Google’s Gemini announcement reinforces this claim. It’s the first real competitor to ChatGPT and GPT-4, but it seems unable to significantly push the envelope of what is possible with Gen AI, at least for now.

Despite Gen AI being able to read and write code, it does not pose a threat to the job security of developers. Good developers are paid to solve problems, not just to write code. Gen AI will help boost productivity across fields, software engineering included, but I question its ability to completely replace talented professionals.

As AI and machine learning become more commonplace product offerings, software developers will need to understand the basics of integrating AI tools into existing applications, or designing large-scale applications that incorporate AI building blocks.

Very few engineers will work on the actual models themselves, but hundreds of thousands of engineers will need to leverage AI starting very soon. In terms of technical skills, if you don’t already have experience with the following fields, you should consider upskilling:

  • Prompt Engineering
  • Data Science Basics (data preprocessing, data visualization, etc.)
  • Machine Learning Fundamentals (neural networks, Gen AI API integration, etc.)
  • Gen AI Frameworks (LangChain, for example; see the sketch below)
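
As a taste of that framework work, here’s a minimal sketch wiring Gemini Pro into LangChain. It assumes the langchain-google-genai integration package and a placeholder API key:

```python
# Minimal LangChain + Gemini Pro sketch.
# Assumes: pip install langchain-google-genai, and a placeholder API key.
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro", google_api_key="YOUR_API_KEY")
response = llm.invoke("Give me one tip for writing better prompts.")
print(response.content)  # the model's reply as plain text
```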

It’s also our year-end sale right now, which means that now is a really good time to lock in access to the entire Educative platform at a solid discount.

If you want to add AI/ML/Data skills to your toolkit in 2024, we have plenty of hands-on Data Science resources to get you started.

I hope you enjoyed this discussion of Google Gemini. Let me know if you have any suggestions for what I should write about next!

Happy learning!


Co-founder at Educative.io. Creating hands-on courses for Software Engineers & Engineering Enablement solutions for dev teams. Previously: Facebook, Microsoft.