Kamchatka, Baba Vanga, Lobster Stew, and Basal Ganglia

Forecasting returns to its primal roots

Aug 01, 2025

Japan’s Baba Vanga

From Manifold creator, Nikki:

“Ryo Tatsuki, a former manga artist known for her purportedly accurate disaster predictions, has forecasted a mega-tsunami to occur in July 2025. She describes visions of the sea south of Japan "boiling," which she interprets as signs of an underwater volcanic explosion leading to a tsunami three times larger than the one in 2011.”

Ryo’s predictions were very nearly resolved as accurate, at least for the Manifold market’s operationalization of her forecast:

With a couple of days left in the market, an 8.8 magnitude earthquake, the largest in the world since the 2011 Tōhoku earthquake that Ryo referenced, struck the coast of Kamchatka, north of the Japanese island chain. As tsunami alerts blared in Japan, the Aleutian Islands, and Hawaii, Manifold traders perhaps thought that Ryo’s forecast would be proved correct. Fortunately for everyone, a large-scale tragedy was avoided, although one indirect casualty due to evacuations did unfortunately occur.

While Ryo’s believability as a forecaster may be drawing to an end, I was intrigued by some of the forecasts of another prognosticator to whom she is frequently compared, the Bulgarian Baba Vanga. Baba Vanga, who lived through both World Wars and the Cold War, gained followers over her long life. She was consulted by a Bulgarian tsar, Boris III, and by the leader of the Soviet Union, Leonid Brezhnev, and she gained employment at the Bulgarian Institute of Suggestology. Surprisingly, this institute’s work still has some cachet today in language acquisition circles, alongside its work on… clairvoyance, with Baba Vanga. As to her predictions, well, I think her track record speaks for itself:

Indeed, even for one of her predictions which was proven false (the last one on the list), the jury is still out. If our current president ends democracy as we know it, the historical record may look back at POTUS 44 as the last president in the Principate era of American history, and POTUS 45 as the first in the Dominate era.

I think we, as prediction market enthusiasts, often forget that modern-day forecasting can be placed into a long historical record of divination practices, stretching back to the I Ching, Etruscan haruspicy, the Oracle of Delphi, the Urim and the Thummim, and the Sibylline Books. In a thousand years, perhaps we’ll look back on our forecasting methods of today as just as whimsical and antiquated.

Or perhaps, in just a few years, if AI progress continues exponentially upward…

Lobster

As I’ve discussed in past weeks, perhaps the most anticipated model release ever is nearly upon us. The capabilities of OpenAI’s long-heralded frontier model will tell forecasters a lot about whether progress is speeding up or slowing down, and how OpenAI is anticipating the use cases of its technology.

Even the precise date of its release (which has attracted several million dollars of volume on real money markets) can be traded on Manifold. It looks likely to be released in the first two weeks of August, so… any day now.

METR (Model Evaluation & Threat Research) has been doing some outstanding work on benchmarking the increasing capability of AI models to perform longer and more complex tasks. For many in the AI space, this benchmark serves as a good proxy for capabilities as other benchmarks become rapidly saturated. When an LLM can complete, with a high success rate, tasks that routinely take humans days, weeks, or months, this may have serious ramifications for the future of labor.

Screenshot from the website of the fabulous team at METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

All of this is to say that it’s pretty important to know whether the task lengths that AI can achieve (as long as they continue to be a good proxy) are doubling every 7 months, as METR has shown has been the case over the past couple of years. Alternatively, if performance is speeding up or plateauing, that would tell us a lot as well. The spread of forecasts on GPT-5’s METR benchmark is surprisingly wide, with a score of about 1.5-2 hrs indicating that performance may be plateauing much more rapidly than we thought, and a score of 3 hrs or greater indicating that AI progress is continuing unabated. Traders estimate a modal outcome of between 2.5 and 3 hrs, more or less in line with METR’s trend line over the last year.
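For readers who want to play with the arithmetic, here is a minimal sketch of what a fixed doubling time implies. It assumes a pure exponential trend (time horizon doubles every 7 months, per the figure above); the function names and the baseline numbers in the example are hypothetical placeholders of mine, not METR data or METR's methodology.

```python
import math


def projected_horizon(baseline_hours: float,
                      months_elapsed: float,
                      doubling_months: float = 7.0) -> float:
    """Time horizon expected after `months_elapsed`, assuming it doubles
    every `doubling_months` (a pure exponential trend)."""
    return baseline_hours * 2 ** (months_elapsed / doubling_months)


def implied_doubling_months(baseline_hours: float,
                            observed_hours: float,
                            months_elapsed: float) -> float:
    """Doubling time implied by moving from `baseline_hours` to
    `observed_hours` over `months_elapsed` months."""
    if observed_hours <= baseline_hours:
        raise ValueError("no growth observed, so no finite doubling time")
    return months_elapsed * math.log(2) / math.log(observed_hours / baseline_hours)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not real benchmark scores):
    # a previous model at 1.5 hours, with a new model arriving 4 months later.
    baseline, gap = 1.5, 4.0
    print(f"On-trend score: {projected_horizon(baseline, gap):.2f} hrs")
    for score in (2.0, 2.5, 3.0):
        months = implied_doubling_months(baseline, score, gap)
        print(f"A score of {score} hrs implies doubling every {months:.1f} months")
```

Plugging a candidate score into `implied_doubling_months` shows whether it sits above or below the 7-month trend: an implied doubling time longer than 7 months points toward a slowdown, a shorter one toward acceleration.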

It’s possible you can get a sneak peek at the model before its broad release on lmarena, where traders think it’s likely OpenAI is piloting the model under the codename “Lobster”.

I hope this is a reference to one of my favorite films, which, much like GPT-5, might involve a future of questionable utopian nature and concerns about non-human consciousness.

Crowdsourcing Radiology (Not Official Medical Advice)

After a Manifold market creator, Aella, crowdsourced forecasts on her radiological results, traders were able to quickly converge on an expectation that her basal ganglia appear fortunately unlikely to exhibit calcification, once she gets official confirmation from her radiologist.

The idea of crowdsourcing medical insight isn’t new (and indeed, you should look up the etymology of the medical term “prognosis”), but this is the first market I’ve seen where traders were able to examine the CT results directly and bet accordingly. Radiologists worry about AI taking their jobs, but perhaps they should also be worried about subsidized prediction market mechanisms outcompeting their own prognostication.

Roundup

A consensus is emerging that Tesla’s robotaxis may not meet the moment…

…conflict between Trump and Jerome Powell continues to simmer (and Trump appears to have just fired the commissioner of the Bureau of Labor Statistics)…

… and American Eagle is unlikely to apologize for its Sydney Sweeney “genes/jeans” ad.

Happy Forecasting!

-Above the Fold
