A team at DeepMind publishes a web agent that scores 78% on WebArena, up from last year's 52% leader. The paper breaks down the changes in exploration policy.

51 comments

418

398

arXiv · 106d ago · Research

Research: small models specialize better when paired with a big-model coach

Paper shows 3B student models can reach 90% of GPT-4 quality on narrow domains when coached by a larger model during training.

47 comments

287

Mistral · 95d ago · Research

Mistral publishes a 7B model competitive with GPT-4-class systems

The new open-weight model outperforms prior 7B leaders on MATH and GSM8K, and nearly matches GPT-4-mini on long-context reasoning.

33 comments

Trending today

Agents +42
LLMs +38
Funding +24
Research +18
Tools +14
Policy +7

Refreshed live · ranked by reader engagement in the last 24h.

Latest issue

Issue #12 — The 19x multiple problem

Why AI business multiples compressed 20% this month, plus three listings worth your time.
