Sunday, April 26, 2026

Good Sunday, NOLA. Today's vibe: the model wars cool off a bit and we get some legitimately interesting applications. An amateur solved a 60-year-old math problem with ChatGPT, Anthropic ran an agent-on-agent marketplace, and Cohere is acquiring Aleph Alpha. Plus a deeper look at what agents are actually good for.

Real-World Wins

Amateur mathematician solves Erdős problem with ChatGPT

A non-professional mathematician solved an open problem in combinatorics by using ChatGPT to explore ideas and sanity-check proofs. This isn't about AI doing the math — it's about a human using an AI as a thinking partner to tackle something that stumped professionals for decades. It's the kind of story that actually shows what these models are useful for.
Scientific American (via Hacker News)

Anthropic ran an agent marketplace: real agents, real deals, real money

Anthropic created a classifieds-style marketplace where AI agents acted as both buyers and sellers, striking actual deals for real goods with real money. The experiment worked — agents negotiated, made trades, and handled disputes. It's a proof-of-concept that agents can handle transactional complexity without human supervision, and it hints at what autonomous systems might look like in a few years.
TechCrunch

GPT-5.5 prompting guide: OpenAI shares the real tips

Now that GPT-5.5 is live in the API, OpenAI published a guide with practical prompting tips specific to the new model. If you're testing it or thinking about migrating, it's worth a quick read; it covers things like how to get better reasoning output and when to push the model toward specific formats.
Simon Willison

Big Moves & Consolidation

Cohere acquires Aleph Alpha, backed by Germany's Schwarz Group

Canadian startup Cohere is taking over Germany-based Aleph Alpha with backing from Lidl's parent company, Schwarz Group. The acquisition signals consolidation in the European AI space and gives Cohere stronger footing in regulated markets. Aleph Alpha was known for privacy-first AI; the deal suggests Cohere is doubling down on enterprise trust.
TechCrunch

DeepSeek V4: Two models, two sizes, benchmarks start shifting

DeepSeek shipped V4 Pro (1.6T parameters) and V4 Flash (284B parameters), both runnable on Huawei Ascend chips. The lab no longer holds the top benchmark spot after yesterday's shake-up, but these models aren't chasing leaderboard wins: they're optimized for inference efficiency and for independence from US chip export controls. Worth understanding if you care about the long game.
LMSYS / Latent Space

Tools & Building Blocks

Lambda Calculus Benchmark: Testing AI on pure formal logic

A new benchmark designed to test how well models handle formal logic and lambda calculus. It's a departure from natural-language and coding benchmarks — useful if you care about whether your model can handle symbolic reasoning or automated theorem proving.
Hacker News
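To get a feel for what a benchmark like this might ask of a model, here's a minimal sketch of the underlying machinery: capture-avoiding substitution and normal-order beta reduction for the untyped lambda calculus. The term encoding and function names are my own choices for illustration, not the benchmark's actual format.

```python
# Untyped lambda calculus terms encoded as tuples:
# ('var', name), ('lam', param, body), ('app', fn, arg).
import itertools

_fresh = itertools.count()

def free_vars(t):
    kind = t[0]
    if kind == 'var':
        return {t[1]}
    if kind == 'lam':
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, name, value):
    """Substitute value for free occurrences of name in t, avoiding capture."""
    kind = t[0]
    if kind == 'var':
        return value if t[1] == name else t
    if kind == 'app':
        return ('app', subst(t[1], name, value), subst(t[2], name, value))
    param, body = t[1], t[2]
    if param == name:              # name is shadowed inside this lambda
        return t
    if param in free_vars(value):  # rename param so it can't capture value
        new_param = f'{param}_{next(_fresh)}'
        body = subst(body, param, ('var', new_param))
        param = new_param
    return ('lam', param, subst(body, name, value))

def step(t):
    """Contract the leftmost-outermost redex; return (changed, new_term)."""
    kind = t[0]
    if kind == 'app':
        fn, arg = t[1], t[2]
        if fn[0] == 'lam':         # beta redex: (λx.body) arg
            return True, subst(fn[2], fn[1], arg)
        changed, fn2 = step(fn)
        if changed:
            return True, ('app', fn2, arg)
        changed, arg2 = step(arg)
        return changed, ('app', fn, arg2)
    if kind == 'lam':
        changed, body = step(t[2])
        return changed, ('lam', t[1], body)
    return False, t

def normalize(t, limit=1000):
    """Reduce to normal form in normal order, with a step budget."""
    for _ in range(limit):
        changed, t = step(t)
        if not changed:
            return t
    raise RuntimeError('no normal form within step limit')
```

A benchmark task might present a term like `(λx.λy.x) a b` and ask the model for its normal form (`a`); the evaluator above is what you'd check the model's answer against.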

Browser Harness: Give your LLM full browser control

Open-source tool that lets an LLM take full control of a browser — click, type, navigate, read the DOM, interact with any website. If you're building agents that need to automate web tasks (data scraping, form filling, testing), this is worth integrating.
Hacker News (from previous brief)
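I haven't dug into this tool's actual API, but the general shape of an LLM-drives-browser loop is worth seeing: the model reads an observation (the DOM), emits a structured action, and the harness executes it. Here's a sketch with a stubbed browser and a scripted "model" standing in for the real components; everything named here is an assumption for illustration.

```python
# Sketch of an agent loop: a model emits JSON browser actions, the harness
# dispatches them. FakeBrowser is a stand-in; a real harness would wrap a
# driver like Playwright or the Chrome DevTools Protocol.
import json

class FakeBrowser:
    """Records actions and fakes page content for demonstration."""
    def __init__(self):
        self.log = []
        self.url = 'about:blank'
    def navigate(self, url):
        self.url = url
        self.log.append(('navigate', url))
    def click(self, selector):
        self.log.append(('click', selector))
    def type(self, selector, text):
        self.log.append(('type', selector, text))
    def read_dom(self):
        return f'<title>page at {self.url}</title>'

def run_agent(model_step, browser, max_steps=10):
    """Feed the model the current DOM; execute the JSON action it returns."""
    observation = browser.read_dom()
    for _ in range(max_steps):
        action = json.loads(model_step(observation))
        if action['action'] == 'done':
            return action.get('result')
        # Dispatch to the browser method named by the action.
        getattr(browser, action['action'])(*action.get('args', []))
        observation = browser.read_dom()
    raise RuntimeError('step budget exhausted')

# A scripted "model" that ignores observations and replays fixed actions:
script = iter([
    '{"action": "navigate", "args": ["https://example.com"]}',
    '{"action": "click", "args": ["#login"]}',
    '{"action": "done", "result": "logged in"}',
])
browser = FakeBrowser()
result = run_agent(lambda obs: next(script), browser)
```

The real tool presumably replaces `FakeBrowser` with live browser control and `model_step` with an LLM call, but the observe-act loop and the action-dispatch pattern are the core of any harness like this.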

Atomic: Personal knowledge base with AI augmentation, fully local

A local-first personal wiki that lets you store notes and facts, then query them with AI. Think of it as a searchable memory layer you control — no cloud, no vendor lock-in. Good building block if you're prototyping personal knowledge systems.
Hacker News (from previous brief)
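The core of any local-first, AI-queryable note system is a retrieval layer that picks which notes to hand the model. Here's a minimal sketch of that layer (append-only JSONL storage, naive keyword scoring); this is my own illustration, not Atomic's actual design.

```python
# Local-first note store: notes persist as JSON lines on disk, and a naive
# keyword-overlap score ranks them against a question. The top hits are
# what you'd paste into a model's prompt as context.
import json
import re
from collections import Counter
from pathlib import Path

class NoteStore:
    def __init__(self, path='notes.jsonl'):
        self.path = Path(path)
        self.notes = []
        if self.path.exists():
            self.notes = [json.loads(line)
                          for line in self.path.read_text().splitlines()]

    def add(self, title, body):
        """Append a note to memory and to the on-disk JSONL file."""
        note = {'title': title, 'body': body}
        self.notes.append(note)
        with self.path.open('a') as f:
            f.write(json.dumps(note) + '\n')

    def query(self, question, k=3):
        """Return up to k notes ranked by term overlap with the question."""
        terms = set(re.findall(r'\w+', question.lower()))
        def score(note):
            words = Counter(re.findall(
                r'\w+', (note['title'] + ' ' + note['body']).lower()))
            return sum(words[t] for t in terms)
        ranked = sorted(self.notes, key=score, reverse=True)
        return [n for n in ranked[:k] if score(n) > 0]
```

A production system would swap the keyword score for embeddings, but the shape is the same: everything stays on disk you own, and the AI only ever sees the retrieved slice.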

Thinking Pieces

What's actually missing from the 'agentic' story

Mark Nottingham argues that the hype around agents skips over a critical question: who represents the user in agent-to-agent interactions? If your agent negotiates with another agent, who's looking out for you? It's a thoughtful take on governance and accountability that the industry is glossing over.
Hacker News

Why the public hates the AI industry (and what it means)

The AI industry is discovering that shipping hype and ignoring public sentiment doesn't build long-term trust. The piece covers the gap between what AI companies are telling investors and what regular people actually want. Worth reading if you're thinking about how AI products will actually be adopted in the real world.
Hacker News

Today’s Sources