Sunday, April 12, 2026

Good Sunday, NOLA. Quiet weekend, but some solid moves brewing: researchers just showed how top AI agent benchmarks can be gamed, Cirrus Labs is joining OpenAI, and Anthropic quietly cut Claude's prompt caching TTL — raising questions about how they're managing costs. Plus, some great long-form listens on enterprise AI's leadership crisis and why people are pushing back on AI adoption mandates.

The Real Talk: Benchmarks, Costs & Trust

How We Broke Top AI Agent Benchmarks: And What Comes Next

Berkeley researchers just demonstrated a serious problem: the benchmarks everyone uses to measure AI agent capabilities can be gamed. The team found ways to artificially boost scores on standard tests like SWE-Bench without actually improving real-world performance. This is important because it means the numbers companies are using to compare models may not tell you what's actually useful. If you're evaluating which AI tool to adopt, this should make you skeptical of raw benchmark claims.
Hacker News

Anthropic Quietly Cut Claude's Prompt Caching from 1 Hour to 5 Minutes

Developers discovered on GitHub that Anthropic reduced how long it caches your prompts — from one hour down to just five minutes — without announcing it. This matters because caching keeps costs down for people building on Claude. The change went live March 6th, quietly shrinking the savings window for anyone relying on cache hits. If you're using Claude Code or building with cached prompts, your economics just shifted.
Hacker News
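The cost impact is easy to sketch. Here's a rough back-of-envelope in Python — the prices, multipliers, and traffic pattern are illustrative assumptions, not Anthropic's actual rates — showing why a TTL shorter than the gap between your requests turns every call into a fresh cache write:

```python
# Illustrative back-of-envelope: how a shorter cache TTL changes costs.
# All prices below are made-up assumptions, not Anthropic's real rates.
PRICE_CACHE_WRITE = 3.75  # assumed $/1M tokens to write a prompt into cache
PRICE_CACHE_READ = 0.30   # assumed $/1M tokens to read a cached prompt

def session_cost(prompt_tokens, requests, gap_minutes, ttl_minutes):
    """Cost ($) of re-sending the same prompt `requests` times, spaced
    `gap_minutes` apart, with a cache that expires after `ttl_minutes`."""
    cost = 0.0
    for i in range(requests):
        if i == 0 or gap_minutes > ttl_minutes:
            # Cache miss: pay to (re)write the prompt into the cache.
            cost += prompt_tokens / 1e6 * PRICE_CACHE_WRITE
        else:
            # Cache hit: pay the much cheaper read rate.
            cost += prompt_tokens / 1e6 * PRICE_CACHE_READ
    return cost

# A hypothetical agent pinging every 10 minutes with a 50k-token prompt:
# a 60-minute TTL stays warm after the first write; a 5-minute TTL
# expires before every follow-up, so each request pays the write price.
warm = session_cost(50_000, 6, 10, 60)
cold = session_cost(50_000, 6, 10, 5)
print(f"60-min TTL: ${warm:.4f}, 5-min TTL: ${cold:.4f}")
```

Under these toy numbers the same workload costs roughly 4x more once the TTL drops below the request cadence — which is the "economics just shifted" point in a nutshell.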

Sam Altman Responds to 'Incendiary' New Yorker Profile After Home Attack

Altman published a blog post addressing both an apparent attack on his home and a detailed New Yorker profile raising questions about his trustworthiness. He's pushing back hard on the reporting, claiming it misrepresents his record. This is one of those moments where the industry's leadership is under real scrutiny — and the public narrative is splintering. Worth reading both the original reporting and his response to make up your own mind.
TechCrunch

Industry Moves & Big Hires

Cirrus Labs Joins OpenAI

Cirrus Labs, a team working on infrastructure and systems, is now part of OpenAI. The acquisition signals where OpenAI is placing bets — they're doubling down on engineering talent for the backend plumbing that makes models work at scale. Not a flashy acquisition, but these infrastructure teams are where the real work happens when you're running trillion-parameter models.
Hacker News

Meta's Top AI Executives in Line for Nearly $1B in Bonuses

Meta is dangling massive bonuses — nearly $1 billion in total across its AI leadership — if those executives hit their targets. This is a bet-the-company move: they're signaling that AI is their future and they're willing to pay like it. For context, this level of compensation is typically reserved for outcomes that could genuinely reshape the business. Meta's clearly serious about catching up in the AI race.
Hacker News

Worth a Listen

Why Enterprise AI Has a Leadership Problem

New studies from A16Z, KPMG, Writer, and WalkMe paint a picture of enterprise AI that's simultaneously accelerating and breaking down. Over 50% of companies are deploying agentic AI, but adoption is quietly stalling because leadership doesn't understand what to do with it. This is a must-listen if you're building AI products for enterprise customers — it explains why your sales cycle is getting weird.
AI Daily Brief Podcast

Everyone Hates AI. Now What?

AI for Humans digs into the backlash: Florida suing OpenAI, datacenter protests spreading, and the Claude Mythos preview creating chaos. If you're building AI products, this episode explains why the public mood is souring and what it means for your roadmap. Short and punchy.
AI for Humans Podcast

Deep Dives & Interesting Reads

Why Do We Tell Ourselves Scary Stories About AI?

Quanta Magazine explores the psychology behind AI doom narratives — why everyone's convinced the technology will destroy everything, even when the evidence doesn't match the hype. This is a thoughtful piece that helps you understand the cultural moment we're in. If you're trying to have rational conversations about AI with non-technical people, this is a great reference.
Quanta Magazine

Your Next Hire Costs $0/yr and Never Misses a Meeting

A thought-provoking piece on what it looks like when AI agents start handling actual work — scheduling, decisions, routine tasks. The headline is cheeky, but the real question underneath is worth thinking about: what happens to organizational culture when bots are doing the work humans used to do? Not doom-saying, just exploring the shape of the future.
There's An AI For That

Your Baby Deer Plushie Told Me Mitski's Dad Was a CIA Operative

The Verge's account of accidentally getting an AI companion to generate wild conspiracy theories is both hilarious and sobering. The piece walks through how an innocent chatbot feature can veer into total confabulation — making up facts with complete confidence. If you're building AI products, this is a cautionary tale about the difference between "sounds smart" and "actually correct."
The Verge

Today’s Sources