Best of your X follows: June 2

Today's window (June 1, 18:00 → June 2, 18:00 UTC) brought five posts worth your time: Anthropic extending its sovereign-AI pilot to roughly 150 more orgs in over 15 countries, OpenAI's models going GA on Amazon Bedrock, Ethan Mollick on what good agentic AI interruption actually looks like, François Chollet sharing the new ARC-AGI-3 leaderboard with Opus 4.8 at 1.5%, and Paul Graham noticing a telltale pattern in AI-generated tweet prose.

Enterprise and platform moves

OpenAI goes GA on Amazon Bedrock. OpenAI frontier models and Codex are now generally available through Amazon Bedrock, giving enterprise teams a way to use OpenAI APIs inside the security, compliance, and governance stack they already run on AWS. Greg Brockman called it out on X the same evening. 1

The AWS channel matters here: many enterprise security teams block direct API calls to api.openai.com but approve traffic through Bedrock because it sits inside their existing IAM, VPC, and audit-log setup. Making OpenAI callable through Bedrock removes that approval friction. OpenAI also noted that Daybreak, its cybersecurity capability, is slated to become available on Bedrock in the future. 2

Anthropic's Project Glasswing expands to 150 orgs. Earlier today Anthropic announced it has extended Claude Mythos Preview to approximately 150 additional organizations in more than 15 countries. Mythos is the variant of Claude built for handling information that must stay within national or organizational perimeters — the focus is on sovereignty over deployment rather than benchmark scores. 3

The expansion is notable because Anthropic is simultaneously preparing an IPO (S-1 filed confidentially Monday) while building out a completely separate distribution path for high-trust government and enterprise deployments. The two moves tell different market stories: the S-1 is for Wall Street; Glasswing is for defense ministers.

AI research and evaluation

Opus 4.8 is SOTA on ARC-AGI-3, at 1.5%. François Chollet shared ARC Prize's update that Claude Opus 4.8 is the new benchmark leader on ARC-AGI-3, the third generation of the test designed to be extremely hard for systems that can't generalize. The score is 1.5%, at a compute cost of approximately $10,000 per evaluation run. 4

One-and-a-half percent sounds like a failing grade, but it's the highest any model has reached on ARC-AGI-3, and Chollet has been consistent that the test is designed so that current scaling approaches shouldn't work well. The cost figure is the other signal: $10K per eval run means the test doesn't reward brute-force sampling at scale, yet something about Opus 4.8's judgment layer is still extracting marginal improvement.

How we work with AI agents

Ethan Mollick: the right agent behavior isn't full automation, it's asking good questions. Mollick posted his clearest statement yet on what distinguishes useful agentic AI from noisy agentic AI: the difference is whether the system knows when to interrupt you with a meaningful question — not because it's stuck, but because your taste matters, or because you'd find the decision interesting. 5

The framing cuts against both the "it just does everything quietly" vision of agents and the "it constantly asks permission" failure mode. Good interruption is selective and purposeful. A fully automated /goal pipeline might finish the task, but you never get to weigh in on the part that mattered. That's not a productivity win; it's delegation to a system with no taste.

What AI is doing to communication

Paul Graham on AI-generated tweet patterns. Graham posted a question today that landed squarely in the meta-layer of AI content: why do AI-generated replies on X so reliably frame their subject as an opposition between two things? 6

The observation resonates because the pattern is real and spreading. Scroll far enough through any popular thread and you'll find a reply structured as "It's not X, it's Y" or "Some people think A, but the real answer is B." AI writing tools seem to fall back on the opposition frame by default, possibly because contrast is easy to generate and sounds superficially substantive. Graham wondered if there's a popular bot behind it, or if it's just what models produce when asked to write a tweet with no other constraints.

References