Sunday, April 26, 2026

Good Sunday, NOLA. Today's vibe: the model wars cool off a bit and we get some legitimately interesting applications. An amateur solved a 60-year-old math problem with ChatGPT, Anthropic ran an agent-on-agent marketplace, and Cohere is acquiring Aleph Alpha. Plus a deeper look at what agents are actually good for.

Real-World Wins

Amateur mathematician solves Erdős problem with ChatGPT

A non-professional mathematician solved an open problem in combinatorics by using ChatGPT to explore ideas and sanity-check proofs. This isn't about AI doing the math — it's about a human using an AI as a thinking partner to tackle something that stumped professionals for decades. It's the kind of story that actually shows what these models are useful for.
Scientific American (via Hacker News)

Anthropic ran an agent marketplace: real agents, real deals, real money

Anthropic created a classifieds-style marketplace where AI agents acted as both buyers and sellers, striking actual deals for real goods with real money. The experiment worked — agents negotiated, made trades, and handled disputes. It's a proof-of-concept that agents can handle transactional complexity without human supervision, and it hints at what autonomous systems might look like in a few years.
TechCrunch

GPT-5.5 prompting guide: OpenAI shares the real tips

Now that GPT-5.5 is live in the API, OpenAI published a guide with practical prompting tips specific to the new model. If you're testing it or thinking about migrating, it's worth a quick read; it covers things like how to get better reasoning output and when to push the model toward specific formats.
Simon Willison

Big Moves & Consolidation

Cohere acquires Aleph Alpha, backed by Germany's Schwarz Group

Canadian startup Cohere is taking over Germany-based Aleph Alpha with backing from Lidl's parent company, Schwarz Group. The acquisition signals consolidation in the European AI space and gives Cohere stronger footing in regulated markets. Aleph Alpha was known for privacy-first AI; the deal suggests Cohere is doubling down on enterprise trust.
TechCrunch

DeepSeek V4: Two models, two sizes, benchmarks start shifting

DeepSeek shipped V4 Pro (1.6T parameters) and V4 Flash (284B parameters), both runnable on Huawei Ascend chips. The lab no longer holds the top benchmark spot after yesterday's shake-up, but these models aren't chasing leaderboard wins: they're optimized for inference efficiency and for independence from US chip export controls. Worth understanding if you care about the long game.
LMSYS / Latent Space

Tools & Building Blocks

Lambda Calculus Benchmark: Testing AI on pure formal logic

A new benchmark designed to test how well models handle formal logic and lambda calculus. It's a departure from natural-language and coding benchmarks — useful if you care about whether your model can handle symbolic reasoning or automated theorem proving.
Hacker News
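To get a feel for what a benchmark like this might ask of a model, here's a minimal sketch of the underlying machinery: capture-avoiding substitution and normal-order beta reduction for the untyped lambda calculus. The term encoding and function names are my own choices for illustration, not the benchmark's actual format.

```python
# Untyped lambda calculus terms encoded as tuples:
# ('var', name), ('lam', param, body), ('app', fn, arg).
import itertools

_fresh = itertools.count()

def free_vars(t):
    kind = t[0]
    if kind == 'var':
        return {t[1]}
    if kind == 'lam':
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, name, value):
    """Substitute value for free occurrences of name in t, avoiding capture."""
    kind = t[0]
    if kind == 'var':
        return value if t[1] == name else t
    if kind == 'app':
        return ('app', subst(t[1], name, value), subst(t[2], name, value))
    param, body = t[1], t[2]
    if param == name:              # name is shadowed inside this lambda
        return t
    if param in free_vars(value):  # rename param so it can't capture value
        new_param = f'{param}_{next(_fresh)}'
        body = subst(body, param, ('var', new_param))
        param = new_param
    return ('lam', param, subst(body, name, value))

def step(t):
    """Contract the leftmost-outermost redex; return (changed, new_term)."""
    kind = t[0]
    if kind == 'app':
        fn, arg = t[1], t[2]
        if fn[0] == 'lam':         # beta redex: (λx.body) arg
            return True, subst(fn[2], fn[1], arg)
        changed, fn2 = step(fn)
        if changed:
            return True, ('app', fn2, arg)
        changed, arg2 = step(arg)
        return changed, ('app', fn, arg2)
    if kind == 'lam':
        changed, body = step(t[2])
        return changed, ('lam', t[1], body)
    return False, t

def normalize(t, limit=1000):
    """Reduce to normal form in normal order, with a step budget."""
    for _ in range(limit):
        changed, t = step(t)
        if not changed:
            return t
    raise RuntimeError('no normal form within step limit')
```

A benchmark task might present a term like `(λx.λy.x) a b` and ask the model for its normal form (`a`); the evaluator above is what you'd check the model's answer against.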

Browser Harness: Give your LLM full browser control

Open-source tool that lets an LLM take full control of a browser — click, type, navigate, read the DOM, interact with any website. If you're building agents that need to automate web tasks (data scraping, form filling, testing), this is worth integrating.
Hacker News (from previous brief)
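I haven't dug into this tool's actual API, but the general shape of an LLM-drives-browser loop is worth seeing: the model reads an observation (the DOM), emits a structured action, and the harness executes it. Here's a sketch with a stubbed browser and a scripted "model" standing in for the real components; everything named here is an assumption for illustration.

```python
# Sketch of an agent loop: a model emits JSON browser actions, the harness
# dispatches them. FakeBrowser is a stand-in; a real harness would wrap a
# driver like Playwright or the Chrome DevTools Protocol.
import json

class FakeBrowser:
    """Records actions and fakes page content for demonstration."""
    def __init__(self):
        self.log = []
        self.url = 'about:blank'
    def navigate(self, url):
        self.url = url
        self.log.append(('navigate', url))
    def click(self, selector):
        self.log.append(('click', selector))
    def type(self, selector, text):
        self.log.append(('type', selector, text))
    def read_dom(self):
        return f'<title>page at {self.url}</title>'

def run_agent(model_step, browser, max_steps=10):
    """Feed the model the current DOM; execute the JSON action it returns."""
    observation = browser.read_dom()
    for _ in range(max_steps):
        action = json.loads(model_step(observation))
        if action['action'] == 'done':
            return action.get('result')
        # Dispatch to the browser method named by the action.
        getattr(browser, action['action'])(*action.get('args', []))
        observation = browser.read_dom()
    raise RuntimeError('step budget exhausted')

# A scripted "model" that ignores observations and replays fixed actions:
script = iter([
    '{"action": "navigate", "args": ["https://example.com"]}',
    '{"action": "click", "args": ["#login"]}',
    '{"action": "done", "result": "logged in"}',
])
browser = FakeBrowser()
result = run_agent(lambda obs: next(script), browser)
```

The real tool presumably replaces `FakeBrowser` with live browser control and `model_step` with an LLM call, but the observe-act loop and the action-dispatch pattern are the core of any harness like this.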

Atomic: Personal knowledge base with AI augmentation, fully local

A local-first personal wiki that lets you store notes and facts, then query them with AI. Think of it as a searchable memory layer you control — no cloud, no vendor lock-in. Good building block if you're prototyping personal knowledge systems.
Hacker News (from previous brief)
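The core of any local-first, AI-queryable note system is a retrieval layer that picks which notes to hand the model. Here's a minimal sketch of that layer (append-only JSONL storage, naive keyword scoring); this is my own illustration, not Atomic's actual design.

```python
# Local-first note store: notes persist as JSON lines on disk, and a naive
# keyword-overlap score ranks them against a question. The top hits are
# what you'd paste into a model's prompt as context.
import json
import re
from collections import Counter
from pathlib import Path

class NoteStore:
    def __init__(self, path='notes.jsonl'):
        self.path = Path(path)
        self.notes = []
        if self.path.exists():
            self.notes = [json.loads(line)
                          for line in self.path.read_text().splitlines()]

    def add(self, title, body):
        """Append a note to memory and to the on-disk JSONL file."""
        note = {'title': title, 'body': body}
        self.notes.append(note)
        with self.path.open('a') as f:
            f.write(json.dumps(note) + '\n')

    def query(self, question, k=3):
        """Return up to k notes ranked by term overlap with the question."""
        terms = set(re.findall(r'\w+', question.lower()))
        def score(note):
            words = Counter(re.findall(
                r'\w+', (note['title'] + ' ' + note['body']).lower()))
            return sum(words[t] for t in terms)
        ranked = sorted(self.notes, key=score, reverse=True)
        return [n for n in ranked[:k] if score(n) > 0]
```

A production system would swap the keyword score for embeddings, but the shape is the same: everything stays on disk you own, and the AI only ever sees the retrieved slice.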

Thinking Pieces

What's actually missing from the 'agentic' story

Mark Nottingham argues that the hype around agents skips over a critical question: who represents the user in agent-to-agent interactions? If your agent negotiates with another agent, who's looking out for you? It's a thoughtful take on governance and accountability that the industry is glossing over.
Hacker News

Why the public hates the AI industry (and what it means)

The AI industry is discovering that shipping hype and ignoring public sentiment doesn't build long-term trust. The piece covers the gap between what AI companies are telling investors and what regular people actually want. Worth reading if you're thinking about how AI products will actually be adopted in the real world.
Hacker News

Today’s Sources