Sunday, May 10, 2026

Good Sunday, NOLA. May 10th is a quieter news day, which makes it a good time to catch up on what matters: OpenAI's new realtime voice APIs, Google's multimodal file search, and a solid look at how LLMs corrupt documents when you delegate work to them. Plus some candid reflections on what's working in AI teams and product design.

New APIs & Developer Tools

OpenAI releases GPT-Realtime-2, -Translate, and -Whisper voice APIs

Real-time voice just got faster and smarter. OpenAI shipped three new voice APIs: Realtime-2 (ultra-low-latency voice conversations), a new Translate mode (speak in any language, get back English), and an updated Whisper for transcription. All three are live for developers now. If you've been thinking about building voice interfaces, the latency bar just dropped dramatically.
Latent Space

Gemini API File Search is now multimodal

Google expanded Gemini's file search to handle PDFs, images, and videos, not just text. This matters if you're building systems that need to search across mixed-media documents. Multimodal RAG (retrieval-augmented generation, where the model pulls relevant context from your files before answering) is becoming table stakes.
Google AI Blog
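
To make the retrieval step in RAG concrete, here is a minimal sketch in plain Python. It does not use the Gemini API. The file names, the three-number vectors, and the `retrieve` helper are all made up for illustration, standing in for the embeddings a real multimodal model would produce for each file.

```python
import math

# Hypothetical index: every file, whatever its media type, is represented
# by an embedding vector. These hand-written vectors are placeholders.
INDEX = {
    "q3_report.pdf": [0.9, 0.1, 0.0],
    "whiteboard.png": [0.2, 0.8, 0.1],
    "demo_video.mp4": [0.1, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the names of the k files most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda name: cosine(query_vec, index[name]),
        reverse=True,
    )
    return ranked[:k]


# The query vector is also a placeholder for an embedded user question.
top = retrieve([0.85, 0.2, 0.05], INDEX, k=2)
print(top)
```

In a real system, the retrieved files (or passages from them) would then be passed to the model as context alongside the question.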

The Hard Lessons

LLMs corrupt your documents when you delegate

A new study shows that when you ask LLMs to edit or summarize documents, they subtly change facts, sometimes introducing outright errors and sometimes rewording in ways that lose precision. The kicker: humans don't catch it because the text sounds right. If you're using AI for document review or summarization in production, read this. Discussion on HN.
Hacker News
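
One cheap guardrail for this kind of drift, sketched here as an illustration and not as the study's method: compare the numbers in the original text against the numbers in the model's rewrite and flag any that are new. The `drifted_numbers` helper and the sample sentences are hypothetical, and the check only catches changed figures, not reworded claims.

```python
import re

# Matches integers, decimals, thousands-separated numbers, and percentages.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def numbers_in(text):
    """Collect every numeric token that appears in the text."""
    return set(NUMBER.findall(text))


def drifted_numbers(original, rewritten):
    """Return numbers present in the rewrite but absent from the original."""
    return sorted(numbers_in(rewritten) - numbers_in(original))


# Made-up example: the rewrite silently changes 12.5% to 15%.
original = "Revenue grew 12.5% to 4,200 units in 2024."
rewritten = "Revenue grew 15% to 4,200 units in 2024."
print(drifted_numbers(original, rewritten))
```

An empty result does not prove the rewrite is faithful. It only means no new figures appeared, so a human read-through is still needed for anything that matters.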

A recent experience with ChatGPT 5.5 Pro

Timothy Gowers (Fields medalist) spent time using GPT-5.5 Pro on a problem he knows deeply. His honest reflection: it's genuinely useful for some things, but it hits a wall fast when the problem gets subtle. Worth reading if you want a reality check on where these models excel and where they fail. 632 points on HN.
Hacker News

Building with AI: Real-World Patterns

All my clients wanted a carousel, now it's an AI chatbot

A web designer's honest take: every client who came asking for a carousel now wants an AI chatbot instead. She walks through what actually works, what's hype, and what she's learned about integrating AI into real client projects. Practical and grounded.
Hacker News

The internal AI tool that's transforming how Stripe designs products

Owen Williams built Protodash at Stripe, an internal AI tool that lets designers and PMs generate high-fidelity prototypes of Stripe's dashboard UI. The episode breaks down how the team uses AI in product design at scale, in practice and not just in theory.
How I AI podcast

Context Engineering in 40 Minutes

Ravi Mehta (former Tinder CPO) walks through his 3-layer system for structuring prompts and context: functional, visual, and data layers. Includes a live demo of building something real. If you want to level up how you talk to these models, this is practical.
Behind the Craft podcast

Worth Reading & Listening

Why Agents Make Every Job a Startup

An interesting angle: AI agents haven't saved time the way we thought. Instead, they've made the infinite backlog of work feel immediate and urgent. If everyone on your team now has a capable AI assistant, what happens to prioritization? What gets sacrificed? Worth thinking about if you're deploying agents.
AI Daily Brief podcast

How to Build an AI Native Team

Mike Cannon-Brookes (Atlassian co-founder) on what it takes to build teams that work well with AI, not just teams that use AI tools. Covers the leadership, structure, and culture changes that matter.
AI Daily Brief podcast

Voice AI in India is hard. Wispr Flow is betting on it anyway.

Wispr Flow saw growth accelerate in India after rolling out Hinglish (Hindi + English mix). A real example of how voice AI is breaking into non-English markets, and of what works when English-first models don't.
TechCrunch AI

Today’s Sources