Manifold-5

An underwhelming model release for OpenAI, a fiery redistricting battle for the House, and some WNBA disruptions

Aug 08, 2025

AI Progress May Be Starting to Slow

You can debate benchmark trend lines, public sentiment, and corporate strategy, but Manifold users found the release of GPT-5 to be a slight negative update on AI capabilities progress. Over the last 6 months, a market asking whether a poll of Manifold users, taken a month after GPT-5's release, would find that the model exceeded expectations hovered between 30% and 50%, closing at 45% on release day.

Now that the dust has settled and that market is closed to trading pending the poll, a meta-market on its outcome has bottomed out, with traders quite confident that it will resolve in the negative once the poll is conducted in a month's time.

Live trading during the release showed pretty consistently that GPT-5 did not meet traders' expectations and failed to live up to the hype. While people on Twitter litigated the botched charts from the livestream, even the properly plotted results mostly showed incremental improvements in coding ability and gains on already-saturated exam benchmarks. Expectations for GPT-5 to top the chatbot leaderboard fell sharply:

While some have framed GPT-5's 2 hr 15 min result on METR's time-horizon benchmark (the length of task, measured in human working time, that the model completes with a 50% success rate) as continued evidence of a fast doubling time, I think this claim invites skepticism.

This METR result was in the lower of the two modal brackets (2-2.5 hrs and 2.5-3 hrs) on the Manifold market forecasting the value, indicating that the result lags behind forecasters’ median expectations for the model!
Many expected OpenAI’s new release to be a leading indicator, with subsequent data points from other frontier labs lagging behind. This puts a drag on the trend line.
At an 80% success rate threshold, it's even clearer that GPT-5 underperformed the previous doubling-time trend.
If you squint at the graph below (and use your log-scale-removing mental powers), the curve is starting to look ever so slightly more like a sigmoid than an exponential, if I may be so bold as to suggest.

Models' 50% Time Horizon chart. Screenshot from METR's report on GPT-5: https://metr.github.io/autonomy-evals-guide/gpt-5-report/
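The doubling-time argument is easier to follow with the arithmetic spelled out. Below is a minimal Python sketch, assuming pure exponential growth between two measurements. Only the 135-minute (2 hr 15 min) GPT-5 figure comes from the text; the 60-minute baseline model, the 12-month gap, and the logistic curve's parameters are hypothetical placeholders, not METR data.

```python
import math

def implied_doubling_time(h_old, h_new, months_elapsed):
    """Months per doubling implied by two time-horizon measurements,
    assuming pure exponential growth between them."""
    return months_elapsed / math.log2(h_new / h_old)

def projected_horizon(h_old, months_elapsed, doubling_months=7.0):
    """Horizon an exponential trend with the given doubling time would predict."""
    return h_old * 2 ** (months_elapsed / doubling_months)

def logistic(t, cap, rate, midpoint):
    """A sigmoid: grows almost exponentially far below `cap`, then flattens."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))

# Hypothetical inputs: only the 135-minute figure is taken from the text.
baseline_minutes = 60.0   # made-up earlier model
gpt5_minutes = 135.0      # 2 hr 15 min
months_between = 12.0     # made-up gap between releases

dt = implied_doubling_time(baseline_minutes, gpt5_minutes, months_between)
print(f"implied doubling time: {dt:.1f} months")
print(f"7-month trend predicts: {projected_horizon(baseline_minutes, months_between):.0f} minutes")

# On a sigmoid, the implied doubling time keeps stretching as growth saturates,
# whereas a true exponential shows the same doubling time in every window.
for t in range(0, 48, 12):
    a = logistic(t, cap=1000.0, rate=0.1, midpoint=60.0)
    b = logistic(t + 12, cap=1000.0, rate=0.1, midpoint=60.0)
    print(t, round(implied_doubling_time(a, b, 12), 2))
```

With these made-up inputs, the implied doubling time comes out at roughly 10 months, while a strict 7-month trend would have predicted roughly 197 minutes. A measurement that lands short of the projection lengthens the implied doubling time, and the logistic loop shows the same stretching pattern (about 7.0, 7.0, 7.3, then 8.1 months per doubling) that would make the curve start to look like a sigmoid.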

Peter Wildeford points out that GPT-5 is best viewed as a strong improvement in usability for "everyday users," who make up the vast majority of OpenAI's customers. I think this framing is ultimately the right way to view the release; however, I'm not sure OpenAI would view it as a success. For their half-trillion-dollar valuation to pan out, they need to be on track to automate large sectors of the economy within the next decade. A plateau in capability gains would push many of these use cases farther out than they might like.

Wildeford also highlights how model scaling now happens along four dimensions rather than one: more pre-training (the original LLM "scaling law"), post-training reinforcement learning, inference-time compute, and scaffolding / tool-use integration. GPT-5 might already be starting to scrape the bottom of the barrel on all four. With compute, data center construction, energy availability, and public sentiment toward AI growing much more slowly than a "7-month doubling time" could possibly accommodate, frontier labs have dialed back the first axis for the time being, and GPT-5, if anything, uses features like model routing to minimize how many inference-time tokens it burns through. Of course, new scaling laws may be discovered, but Sam Altman is no Gordon Moore, and I think forecasting continued gains from speculative, two-year-long trend lines on a log-scale plot is highly uncertain.

Ah, well, it’s never too early to start betting on GPT-6, which Manifold traders expect to release in early 2027.

LLMs Play Chess

As a consolation prize, OpenAI scored a victory in the first-of-its-kind LLM Chess tournament on Kaggle. Their o3 model didn’t lose a single game en route to a crushing (upset!) victory over Grok in the finals.

Kaggle Game Arena Chess Exhibition Tournament 2025 bracket finals. Screenshot of the tournament bracket from chess.com, which I believe got it in turn from Kaggle's website.

Manifold traders, perhaps reacting to the praise from American grandmaster and commentator Hikaru Nakamura, had Grok as the favorite going into the finals, only for the model to lose 0-4 to OpenAI’s to-be-deprecated reasoning model.

For this tournament, the models were not permitted to use actual chess engines and weren't fine-tuned for chess, either of which would have made comparing the models essentially pointless. Moreover, to avoid anticlimactic losses from hallucinated moves, the models were given a couple of mulligans if they proposed illegal moves, although this wasn't a factor in the later rounds after the outmatched Claude Opus, Gemini 2.5 Flash, Kimi K2, and DeepSeek R1 were all swept in the first round.

House Midterms Hijinks

Meanwhile, some things have been happening back in the real world. The President, as he did in 2020, has directed the Commerce Department to conduct a new census that excludes illegal immigrants. The previous attempt was shot down by the courts, but in a very different judicial and executive environment, the initiative may have more teeth.

Traders remain skeptical this will happen. The implicit goal of such a census would be to gain a couple of seats in the midterms, though it's unclear whether it would even deliver those gains. Republicans are right to be fearful, since they remain heavy underdogs to retain control of the House in 2026:

Republicans are also attempting to kick off redistricting efforts to eke out a few more House seats in the midterms, which has raised controversy in Texas.

The odds of Texas’ legislature managing to pass a redistricting bill during its current legislative session have dropped dramatically (although the markets still expect them to get this done before the midterms). If they do manage to redistrict, traders expect it to pay dividends.

The odds in the first market declined because Texas Democratic legislators fled the state to deny the legislature a quorum. While Governor Greg Abbott has threatened to expel the absent legislators, and the FBI has been asked to track them down and arrest them to bring them back to the Texas Capitol, Manifold traders think both scenarios are unlikely.

You can also bet on whether California’s governor Gavin Newsom will be successful in his own redistricting effort:

WNBA Hijinks

While I think offering real-money markets that directly incentivize disruptive, antisocial behavior, like throwing dildos onto the court at WNBA games, is ethically fraught, I have zero qualms about play-money markets on events like that; indeed, these kinds of edgy markets are best left to platforms like Manifold. In light of a spate of phallus-flinging at WNBA games that appears to be tied to a crypto marketing scam, here you go:

There's even a chance the trend persists all the way to the NBA Finals, nearly a year from now:

Happy Forecasting!

-Above the Fold

Paulin

Aug 9

"If you squint at the graph below (and use your log-scale-removing mental powers), the curve is starting to look ever so slightly more like a sigmoid than an exponential, if I may be so brave to suggest."

Admittedly my powers are subpar in this area, but I don't get it

If progress is slowing down, wouldn't the GPT-5 datapoint be below the trend line?
