Wednesday, May 13, 2026

Good Wednesday, NOLA. May 13th brings some solid practical wins: a tiny 26M model that can do tool calling, a rethink of how AI interacts with your screen, and a reality check on how companies are actually using (and abusing) AI tools. Plus, some intriguing open-source projects for making agents more reliable.

Things People Built

Needle: Gemini's Tool Calling, Distilled into a 26M Model

Popular on HN. Take a frontier model's tool-calling capability and fit it into something that runs on any device. This is the kind of lean, practical distillation that matters for real products. If you're building agents and need them to actually call APIs reliably without hitting the API cost wall, this is worth a look.
Hacker News
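Reliable tool calling mostly comes down to the model emitting structured output you can validate before executing. As a rough illustration of that idea (the JSON shape, tool registry, and helper names here are assumptions for the sketch, not Needle's actual format):

```python
import json

# Illustrative: validate a model's tool-call output before running it.
# The tool registry and call format below are assumed for this sketch.
TOOLS = {
    "get_weather": {"required": {"city"}},
    "search": {"required": {"query"}},
}

def parse_tool_call(raw: str):
    """Parse and check a call like
    {"tool": "get_weather", "args": {"city": "New Orleans"}}."""
    call = json.loads(raw)
    tool = call.get("tool")
    spec = TOOLS.get(tool)
    if spec is None:
        raise ValueError(f"unknown tool: {tool!r}")
    missing = spec["required"] - set(call.get("args", {}))
    if missing:
        raise ValueError(f"missing args for {tool}: {sorted(missing)}")
    return tool, call["args"]

tool, args = parse_tool_call(
    '{"tool": "get_weather", "args": {"city": "New Orleans"}}'
)
```

A small distilled model only has to get this narrow, structured step right, which is what makes on-device tool calling plausible.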

Reimagining the Mouse Pointer for the AI Era

Google DeepMind's take on a deceptively simple problem: how should an AI model interact with your screen? Instead of blindly clicking pixel coordinates, their pointer learns to understand context, follow visual flows, and make decisions based on what's actually on screen. It's a small detail that could make AI-powered automation feel less brittle.
Google DeepMind

Statewright: Visual State Machines for Reliable AI Agents

Shared on HN. One of the biggest complaints about AI agents is that they're unpredictable—they wander off and do weird things. Statewright lets you visually define the states and transitions your agent should follow, then enforces them. It's a guardrail tool that actually makes sense.
Hacker News
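The core idea, a finite state machine that rejects out-of-order actions, fits in a few lines. A minimal sketch of that pattern (the states, actions, and class here are invented for illustration, not Statewright's API):

```python
# Minimal state-machine guardrail for an agent: only transitions
# declared up front are allowed; anything else raises immediately.
class AgentStateMachine:
    def __init__(self, transitions, start):
        self.transitions = transitions  # {state: {action: next_state}}
        self.state = start

    def step(self, action):
        allowed = self.transitions.get(self.state, {})
        if action not in allowed:
            raise ValueError(
                f"action {action!r} not allowed in state {self.state!r}"
            )
        self.state = allowed[action]
        return self.state

# Example policy: a research agent must search before it may answer.
fsm = AgentStateMachine(
    transitions={
        "idle": {"search": "searching"},
        "searching": {"read": "reading", "search": "searching"},
        "reading": {"answer": "done", "search": "searching"},
    },
    start="idle",
)

fsm.step("search")  # idle -> searching
fsm.step("read")    # searching -> reading
# Trying to "answer" from "idle" would raise, keeping the agent on-script.
```

Enforcing transitions at runtime, rather than hoping the prompt holds, is what turns this from documentation into a guardrail.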

Hopper: An Agent Interface for Mainframes and COBOL

Show HN. Some of the world's oldest and most important systems still run on COBOL. Hopper gives AI agents a natural-language interface to mainframes. If this works, it could unlock decades of legacy business logic for modern AI workflows—without forcing companies to rewrite everything.
Hacker News

Voker: Analytics and Observability for AI Agents

Launch HN (YC S24). Running agents in production means you need visibility. Voker gives you dashboards, error tracking, and performance metrics for your deployed agents. It's the kind of boring-but-necessary infrastructure that separates hobby projects from real products.
Hacker News

AI in Practice: The Reality Check

Amazon Employees Are "Tokenmaxxing" Due to AI Pressure

Here's what happens when companies mandate AI adoption without clear incentives: employees game the system by stuffing prompts with tokens (unnecessary words) to hit usage targets, making the metrics meaningless. It's a cautionary tale about top-down AI rollouts. Discussion on HN.
Ars Technica

AI Isn't Paying Off the Way Companies Think—Gartner Study

Companies are investing big in AI but seeing mediocre returns. The gap between hype and actual ROI is growing. Worth reading not as doom, but as a reminder that implementation matters more than the model itself. Discussed on HN.
Fortune / Gartner

Google Says Criminal Hackers Used AI to Find a Major Software Flaw

Not a theoretical threat anymore. Real attackers are using AI to find exploitable bugs in open-source code. This is the flip side of "AI finds vulnerabilities"—adversaries have access to the same tools.
The New York Times

New Capabilities & Platforms

Claude Is Now Available on AWS

If you've been holding off on Claude because you wanted it to live in your AWS environment, the option is now here. This is enterprise infrastructure—important for compliance and latency-sensitive workloads. Discussed on HN.
Anthropic

Thinking Machines Builds an AI That Listens While It Talks

Real-time conversation means the model needs to respond to interruptions, subtle tone shifts, and context changes mid-sentence. Thinking Machines' new approach moves closer to how humans actually converse, rather than rigid turn-taking.
TechCrunch

Interesting Reads

If AI Writes Your Code, Why Use Python?

A thought experiment that's worth sitting with: if AI can generate the code, does the language choice matter anymore? Or do we pick based on different criteria now—clarity for AI prompting, ease of debugging generated output, framework ecosystem? Challenging the assumptions we've held for years.
Medium

I Let AI Build a Tool to Help Me Figure Out What Was Waking Me Up

A great "I tried it" story of someone using Claude Code to build a real tool that solved a real problem. No fancy architecture, just pragmatic problem-solving with AI. Exactly the kind of workflow we'll see becoming normal.
martin.sh

Today’s Sources