Gemini vs. GPT4
Is Gemini better than GPT4? Have people done subjective evaluations?
From the self-reported numbers (only taking those from comparable setups), not really.
Eval (num. shots): Gemini Ultra vs GPT4
- MMLU (5-shot): 83.7 vs 86.4
- MATH (4-shot): 53.2 vs 52.9
- Big-Bench Hard (3-shot): 83.6 vs 83.1
- HellaSwag (10-shot): 87.8 vs 95.3
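
The deltas are tiny everywhere except HellaSwag; trivial to tally, but it makes the comparison concrete (scores copied from the list above):

```python
# Self-reported scores copied from the two reports: (Gemini Ultra, GPT4).
scores = {
    "MMLU (5-shot)":           (83.7, 86.4),
    "MATH (4-shot)":           (53.2, 52.9),
    "Big-Bench Hard (3-shot)": (83.6, 83.1),
    "HellaSwag (10-shot)":     (87.8, 95.3),
}

for bench, (gemini, gpt4) in scores.items():
    print(f"{bench}: Gemini {gemini} vs GPT4 {gpt4} (delta {gemini - gpt4:+.1f})")
```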
Not sure there is any access to the big Geminis atm for actual testing.
I guess that when your model doesn't perform well on 5-shot CoT, you just switch to CoT@32 or some other metric that happens to produce a higher number, even though they aren't the same thing.
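
For context, CoT@32 is (roughly) self-consistency: sample 32 chain-of-thought completions and majority-vote over the final answers, which isn't comparable to a single greedy 5-shot run. A minimal sketch, where `sample_cot_answer` is a hypothetical stand-in for one sampled model call:

```python
from collections import Counter

def sample_cot_answer(prompt: str) -> str:
    """Hypothetical stand-in: one temperature-sampled chain-of-thought
    completion, reduced to its final extracted answer."""
    raise NotImplementedError  # replace with a real model call

def cot_at_k(prompt: str, k: int = 32) -> str:
    """Majority vote over k sampled chain-of-thought answers."""
    answers = [sample_cot_answer(prompt) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```

(The Gemini report's variant additionally falls back to greedy decoding when the vote consensus is weak, but the sampling cost is the same: 32 forward passes per question instead of one.)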
IMHO the numbers in general don't add up, and I don't mean the scores themselves. A model this "smart" that can't generalise on basic tests raises questions: what's inside those 32 shots, and how are they chosen?
What seems to be more frequent is unhappiness with the existing tests and measures. I understand you're free to evaluate your model against whatever you want, but is it actually fair to compare models that way? It seems very unfair to me, and most likely unrealistic or contaminated.
These are huge numbers, but can it do Game of 24? :)
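
For anyone who hasn't seen it: Game of 24 asks you to combine four numbers with + - * / to reach 24. It's trivial to brute-force, which is exactly why it's a fun LLM reasoning probe. A minimal sketch (the function names are mine):

```python
def solve_24(items, target=24, eps=1e-6):
    """Brute-force Game of 24: repeatedly combine two numbers with
    + - * / until one value remains; return a solving expression or None."""
    if len(items) == 1:
        val, expr = items[0]
        return expr if abs(val - target) < eps else None
    for i in range(len(items)):
        for j in range(len(items)):
            if i == j:
                continue
            (a, ea), (b, eb) = items[i], items[j]
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            ops = [(a + b, f"({ea}+{eb})"),
                   (a - b, f"({ea}-{eb})"),
                   (a * b, f"({ea}*{eb})")]
            if abs(b) > eps:  # avoid division by zero
                ops.append((a / b, f"({ea}/{eb})"))
            for val, expr in ops:
                found = solve_24(rest + [(val, expr)], target, eps)
                if found:
                    return found
    return None

def game24(numbers):
    return solve_24([(float(n), str(n)) for n in numbers])

print(game24([4, 9, 10, 13]))  # finds a solution, e.g. (10-4)*(13-9)
```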
Because benchmarks can be overfitted, it's hard to know. It appears that both are in the same league.
Interesting strategy to use TPUs and not be at the mercy of a single supplier.
I think GPT4 was trained on some of the test data, or on contaminated variants of it, and that's what makes it look better in evals; I'd like to remind you that GPT4's evals are self-reported too. And I think Gemini is better by a large margin; we'll see whether that's the case or not.
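
On the contamination point: it's hard to verify from the outside, but the usual rough check is word n-gram overlap between test items and the training corpus (the GPT-3 paper used 13-grams; the helper names here are mine, and real pipelines hash n-grams over the whole corpus rather than scanning pairwise):

```python
def ngrams(text: str, n: int = 13) -> set:
    """Lower-cased word n-grams, the usual unit for contamination checks."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(test_example: str, training_doc: str, n: int = 13) -> bool:
    """Flag a test example if any of its word n-grams also appears
    in a training document."""
    return bool(ngrams(test_example, n) & ngrams(training_doc, n))
```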