Yet Google’s own technical report indicates that the Ultra version outperformed both GPT-4 and GPT-3.5. A closer look, however, reveals a crucial technical caveat. As AbacusAI CEO Bindu Reddy pointed out on X, Google reported Gemini Ultra’s headline MMLU score under CoT@32 prompting rather than the standard 5-shot setup, under which GPT-4 still comes out ahead.
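Roughly speaking, 5-shot evaluation prepends five worked examples to each question and takes the model’s single greedy answer, while CoT@32 samples 32 chain-of-thought completions per question and aggregates their final answers. The sketch below is only an illustration of that difference: the generate callable, the prompt wording, and the answer parser are placeholder assumptions, not the evaluation harness actually used for Gemini or GPT-4.

```python
from collections import Counter
from typing import Callable

# `generate(prompt, temperature)` stands in for any LLM API call; it is an
# assumption for illustration, not Google's or OpenAI's evaluation harness.
GenerateFn = Callable[[str, float], str]

def extract_final_answer(completion: str) -> str:
    # Naive parser: assumes the completion ends with "Answer: <choice letter>".
    return completion.rsplit("Answer:", 1)[-1].strip()[:1]

def five_shot_answer(generate: GenerateFn, question: str, examples: list[str]) -> str:
    """Standard 5-shot MMLU setup: five worked examples, one greedy completion."""
    prompt = "\n\n".join(examples[:5]) + "\n\n" + question + "\nAnswer:"
    return extract_final_answer(generate(prompt, 0.0))

def cot_at_32_answer(generate: GenerateFn, question: str, examples: list[str]) -> str:
    """CoT@32-style setup (simplified): sample 32 chain-of-thought completions
    and return the most common final answer (majority vote)."""
    prompt = (
        "\n\n".join(examples[:5])
        + "\n\n" + question
        + "\nThink step by step, then end with 'Answer: <letter>'."
    )
    finals = [extract_final_answer(generate(prompt, 0.7)) for _ in range(32)]
    return Counter(finals).most_common(1)[0][0]
```

The objection raised by critics is simply that these two setups are not directly comparable: a 32-sample consensus answer will generally score higher than a single 5-shot answer, so quoting Gemini’s CoT@32 number against GPT-4’s 5-shot number overstates the gap.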
Google Gemini AI Video Analysis
1. Misleading Video Content
The video titled “Hands-on with Gemini: Interacting with Multimodal AI” raises concerns about how accurately it represents Gemini’s capabilities. Key points include:
- Lack of Disclaimers: The video misleads the audience by not providing disclaimers about how the inputs were actually generated.
- Edited Demo: Google admitted that parts of the viral duck-drawing video showcasing Gemini’s capabilities were staged: the demo was built from still images and text prompts rather than real-time interaction.
2. Response from Google
Oriol Vinyals, VP of Research & Deep Learning Lead at Google DeepMind, addressed the concerns in a post on X:
- Gemini Usage: Vinyals explained how Gemini was actually used to create the demo: by prompting it with sequences of different modalities (still images and text), as sketched after this list.
- Real Prompts and Outputs: Vinyals clarified that the user prompts and Gemini outputs shown in the video are real but were shortened for brevity; the goal, he said, was to inspire developers.
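Based on that description, the demo footage was produced from interactions like the one sketched below: a still frame plus a text prompt, not live video. This is a minimal sketch assuming the google-generativeai Python SDK available around Gemini’s launch; the model name, file name, and exact call signatures are assumptions and may have changed since.

```python
# Minimal sketch of the "still image + text prompt" workflow, assuming the
# google-generativeai Python SDK available around Gemini's launch (package,
# model name, and method signatures may have changed since).
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-pro-vision")

# One "turn" of the demo: a captured frame from the drawing plus a text
# instruction, rather than a live audio/video exchange.
frame = Image.open("duck_drawing_frame.jpg")  # hypothetical still frame
response = model.generate_content(
    [frame, "What is being drawn here? Answer in one short sentence."]
)
print(response.text)
```

Chaining several such frame-plus-text turns and then tightening the cuts in editing can make the result look like a fluid real-time conversation, which is precisely the gap critics pointed to.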
Google Gemini vs. GPT-4 Controversy
1. Gemini’s Benchmark Performance
Google introduced Gemini in three sizes: Ultra, Pro, and Nano. Controversies emerged regarding Ultra’s performance:
- Overshadowing GPT-4: Ultra was claimed to be more powerful than GPT-4 in various metrics.
- Contradictory Observations: Bindu Reddy, CEO of AbacusAI, questioned Gemini’s performance, particularly in the Massive Multitask Language Understanding (MMLU) benchmark.
2. MMLU Benchmark Analysis
- Dubious MMLU Beat: Reddy’s post on X, reproduced below, argued that Gemini did not truly outperform GPT-4 on the MMLU benchmark.
- Contradictory Claims: Despite Google’s assertions at launch, her analysis raised concerns about the accuracy of the benchmark comparisons being presented.
Digging deeper into the MMLU Gemini Beat – Gemini doesn't really Beat GPT-4 On This Key Benchmark.
The Gemini MMLU beat is specifically at CoT@32. GPT-4 still beats Gemini for the standard 5-shot – 86.4% vs. 83.7%
5-shot is the standard way to evaluate this benchmark. You… pic.twitter.com/2OIzF8tL1a
— Bindu Reddy (@bindureddy) December 6, 2023
A similar reading of the results was shared by another X user:
Gemini MMLU results
TL;DR: Gemini does not really outperform GPT4 on MMLU, and it will likely show significantly in the product. However, this does not necessarily impact the other metrics, like HumanEval for code etc.
There's been a lot of discussions around the coherence… pic.twitter.com/yZj3yRT3y6
— Hadi Azzouni (@hadiazouni) December 8, 2023
3. Market Impact
- Share Price Decline: Reports that the demo had been staged contributed to a decline in Google’s share price.
- Growing Skepticism: Initial fascination with Gemini gave way to skepticism as the controversies unfolded.
This analysis highlights the controversy surrounding Google’s Gemini AI, from the misleading demo video to the questions raised about its benchmark claims.