Forecasting Claude
Above the Fold is "Playing it Safe"
Unlike Gemini, which aims to be a generalist model that can slot into Google’s all-Internet, all-world ecosystem, Claude aims to be the best software engineer it can be.
Already laying claim to the frontier position that Google advanced just last week, Anthropic’s Claude Opus 4.5 is now probably the best model for coding and computer-use applications, likely for at least the next couple of months. Arena-style leaderboards are not always the most reliable benchmarks, but they are the ones most tracked on prediction market platforms. Traders expect Anthropic to retain its hold on both the computer agent leaderboard and the website development leaderboard. Anthropic had already been forecast to keep the computer agent top spot since October, when it became clear that Opus 4.5 would likely drop by the end of the year.
As we wait for Gemini 3’s METR score, we are now also waiting for Claude Opus 4.5’s. Just as Gemini’s was forecast to beat GPT-5’s, Claude is now forecast to outpace Gemini. Some forecasters are informing their bets by trying to reverse-engineer the software engineering portions of the METR time horizon benchmark to compare the two models.
Claude has also renewed its quest to become a Pokémon Master, and markets give it a 25% chance of being the first to claim that benchmark! Well… not entirely the first. Gemini 2.5 Pro performed the feat earlier in the year, but was criticized for relying on extensive “agent harnesses” designed specifically for the challenge. Claude’s approach is a little less Machiavellian, perhaps. Benchmarks like this, run in a contained environment and approached in good faith by the AI labs, can be a good proxy for broad agentic capabilities. You can once again follow along on Twitch and bet on specific milestones in Claude’s run.
Anthropic is generally known as a more safety-minded AI lab. There are now reports that Anthropic employees may be pushing the company to take a leading role in political advocacy with those goals in mind, by forming two Super PACs aimed at influencing both Democratic and Republican lawmakers.
The market on whether AI will be as big a political issue as abortion by 2028 has been ticking up slowly over the year, and may have been moved by yesterday’s news.
As one commenter notes, these kinds of large financial commitments around AI issues in elections will likely only grow.
Anthropic also appears likely to keep its top spot on the “AI Lab Watch” leaderboard. That may reflect some recognition for disrupting Chinese AI espionage attempts and releasing 100+ pages of safety evaluations for its latest model.
Happy Forecasting!
-Above the Fold