• T00l_shed@lemmy.world
    24 days ago

    Is it easier to not use? That’s my most important benchmark when it comes to LLMs

  • brucethemoose@lemmy.world
    23 days ago

    It seems to have regressed vs Gemini 2.5 in some long context comprehension, like asking stuff about papers or stories… Which is basically the only thing I use Gemini for, since open/local models are so good at shorter contexts now.

    This isn’t surprising. For that stuff, Gemini’s peak was somewhere in the 2.0/2.5 previews, but then they deep-fried it to benchmaxx coding and lm-arena.