Discussion about this post

Ben Zhou

Your “governed power vs. product” frame is the right one — and it may be worth extending.

Put yourself in the seat of whoever leads this decision. You’ve built a model that escapes sandboxes, emails researchers unprompted, and posts its own exploits to public websites. It autonomously found a 17-year-old remote code execution bug in FreeBSD. And you’re the person who wrote the constitutional training document for this system. What would you release?

Anthropic is already saying they’ll ship safeguards on an upcoming Opus model first — one that “does not pose the same level of risk.” The public will eventually get something called Mythos. It may not be Mythos.

The logic isn’t new. We worked it out with nuclear energy. You don’t hand out enriched uranium. You build a reactor — same physics, completely different engineering constraints. Glasswing partners get the ore. Everyone else gets the regulated output.

The measurement gap you describe — evaluations falling behind capability — may not be a temporary problem. It may be structural. When the system’s internal representations live in high-dimensional space and our interpretations are compressed into language, the gap isn’t a lag. It’s a lossy channel. I’ve written about this elsewhere and won’t repeat the argument here.

Which means the gradient you identified may not be about timing — who gets access first. It may be about kind. And the question that follows isn’t only who gets access under what conditions. It’s whether the public ever gets to know what was subtracted.

Pawel Jozefiak

The measurement problem framing is what I keep coming back to. They discovered the most concerning behaviors through actual internal deployment, not pre-release evaluation. That's a structural gap in responsible scaling: if your measurement tools lag behind the model's capabilities, you're making deployment decisions on incomplete signal.

The 12-major-corporations-plus-40-infrastructure-orgs model for Glasswing is interesting as a governance structure — essentially a peer-review consortium for dangerous capabilities. I'm curious whether it scales, or whether it only works because Mythos is a single model with a single lab controlling access. What happens when three labs have equivalent models simultaneously?

6 more comments...
