@bcherny updates

@nrehiew_ MRCR is a bad eval; we use Graphwalks now. More info here: x.com/bcherny/status… — 𝕏

@bcherny
@eliebakouch 👋 We kept MRCR in the system card for scientific honesty, but we've actually been phasing it out slowly.

Two reasons: (1) it's built around stacking distractors to trick the model, which isn't how people actually use long context, and (2) we care more about applied long-context capability than needle-retrieval. Graphwalks is a better signal for applied reasoning over long context, and internally we've seen this model do really well on long-context code.

MRCR wasn't included in the Mythos Preview system card for these reasons, but Graphwalks was - that will be the case for future models too.

1 view · 18:37

@bcherny updates

@mattpocockuk
Quick PSA about Opus 4.7:

Anthropic have raised the default effort of Claude Code from medium to xhigh.

This will likely burn more tokens by default.

Switch off the defaults and experiment to see what effort level suits your work best.

We've raised rate limits for all subscribers to make up for the increased token usage. Enjoy! — 𝕏

1 view · 18:37

@bcherny updates

🧵Thread · Next

Opus 4.7 uses more thinking tokens, so we've increased rate limits for all subscribers to make up for it. Enjoy! — 𝕏

1 view · edited 18:37

@bcherny updates

Root

@patnrv
@bcherny Temporary or permanent?

No plans to change it — 𝕏

1 view · 18:38

@bcherny updates

Root

@DurhamVSmith
@bcherny Thanks! Can you also fix that Opus 4.7 is not working with Claude Code via AWS Bedrock?

Do you have the model provisioned for your Bedrock account? — 𝕏

1 view · 18:38

@bcherny updates

Root

@arturogarrido
@bcherny Permanently? Or just for two weeks?

No time limit — 𝕏

1 view · 18:38

@bcherny updates

🧵Thread · Next

Dogfooding Opus 4.7 the last few weeks, I've been feeling incredibly productive. Sharing a few tips to get more out of 4.7 🧵 — 𝕏

1 view · edited 19:38

@bcherny updates

🧵Thread · Previous

1/ Auto mode = no more permission prompts

Opus 4.7 loves doing complex, long-running tasks like deep research, refactoring code, building complex features, and iterating until it hits a performance benchmark.

In the past, you either had to babysit the model while it did these sorts of long tasks or use --dangerously-skip-permissions.

We recently rolled out auto mode as a safer alternative. In this mode, permission prompts are routed to a model-based classifier to decide whether the command is safe to run. If it's safe, it's auto-approved.

This means no more babysitting while the model runs. More than that, it means you can run more Claudes in parallel. Once a Claude is cooking, you can switch focus to the next Claude.

Auto mode is now available for Opus 4.7 for Max, Teams, and Enterprise users. Shift-tab to enter auto mode in the CLI, or choose it in the dropdown in Desktop or VSCode.
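
The routing idea above can be pictured with a small Python sketch. This is only an illustration, not Claude Code's implementation: `route_permission_prompt`, `toy_classifier`, and `Decision` are hypothetical names, the read-only command set stands in for the model-based classifier, and falling back to asking the user when a command is not judged safe is an assumption (the post only says safe commands are auto-approved).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for the classifier's notion of "safe":
# a tiny set of read-only commands. The real classifier is a model.
READ_ONLY = {"ls", "cat", "pwd"}


@dataclass
class Decision:
    approved: bool
    reason: str


def toy_classifier(command: str) -> bool:
    """Return True when the first word of the command is in READ_ONLY."""
    parts = command.split()
    return bool(parts) and parts[0] in READ_ONLY


def route_permission_prompt(
    command: str,
    is_safe: Callable[[str], bool],
    ask_user: Callable[[str], bool],
) -> Decision:
    """Auto-approve when the classifier says safe; otherwise ask the user.

    The ask-the-user fallback is an assumption made for this sketch.
    """
    if is_safe(command):
        return Decision(True, "auto-approved by classifier")
    approved = ask_user(command)
    return Decision(approved, "user approved" if approved else "user denied")
```

With this shape, a call such as `route_permission_prompt("ls -la", toy_classifier, ask_user)` never reaches `ask_user`, which is what removes the babysitting for commands judged safe.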

1 view · 19:38

@bcherny updates

🧵Thread · Previous · Next

2/ The new /fewer-permission-prompts skill

We've also released a new /fewer-permission-prompts skill. It scans through your session history to find common bash and MCP commands that are safe but caused repeated permission prompts.

It then recommends a list of commands to add to your permissions allowlist.

Use this to tune up your permissions and avoid unnecessary permission prompts, especially if you don't use auto mode.
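
As a rough picture of what such a scan could look like, here is a minimal Python sketch. It is not the skill's actual logic: `suggest_allowlist`, the `min_count` threshold, and the `is_safe` callback are hypothetical, and the input is assumed to be a plain list of commands that each triggered a permission prompt.

```python
from collections import Counter
from typing import Callable, Iterable, List


def suggest_allowlist(
    prompted_commands: Iterable[str],
    is_safe: Callable[[str], bool],
    min_count: int = 3,
) -> List[str]:
    """Suggest commands that were prompted repeatedly and are judged safe.

    prompted_commands: one entry per permission prompt seen in past sessions.
    is_safe: hypothetical safety check supplied by the caller.
    min_count: assumed cutoff for "repeated"; adjust to taste.
    """
    counts = Counter(prompted_commands)
    # most_common() orders by frequency, so the noisiest commands come first.
    return [
        cmd
        for cmd, n in counts.most_common()
        if n >= min_count and is_safe(cmd)
    ]
```

The result is only a list of candidates; deciding which ones actually go into the permissions allowlist stays with you.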

code.claude.com/docs/en/permis… — 𝕏

1 view · edited 19:38

@bcherny updates

🧵Thread · Previous · Next

3/ Recaps

We shipped recaps earlier this week, to prep for Opus 4.7. Recaps are short summaries of what an agent did and what's next.

Very useful when returning to a long-running session after a few minutes or a few hours. — 𝕏

1 view · edited 19:38

@bcherny updates

🧵Thread · Previous · Next

4/ Focus mode

I've been loving the new focus mode in the CLI, which hides all the intermediate work to just focus on the final result. The model has reached a point where I generally trust it to run the right commands and make the right edits. I just look at the final result.

/focus to toggle on/off. — 𝕏

1 view · edited 19:38

@bcherny updates

🧵Thread · Previous · Next

5/ Configure your effort level

Opus 4.7 uses adaptive thinking instead of thinking budgets. To tune the model to think more/less, we recommend tuning effort.

Use lower effort for faster responses and lower token usage. Use higher effort for the most intelligence and capability.

Personally, I use xhigh effort for most tasks, and max effort for the hardest tasks. Max applies only to your current session; the other effort levels are sticky and persist into your next session as well.

/effort to set your effort level. — 𝕏

1 view · edited 19:38

@bcherny updates

🧵Thread · Previous · Next

6/ Give Claude a way to verify its work

Finally, make sure Claude has a way to verify its work. This has always been a way to 2-3x what you get out of Claude, and with 4.7 it's more important than ever.

Verification looks different depending on the task. For backend work, make sure Claude knows how to start up your server/service to test it end to end; for frontend work, use the Claude Chromium extension to give Claude a way to control your browser; for desktop apps, use computer use.

Personally, many of my prompts these days look like "Claude do blah blah /go". /go is a skill that has Claude:

1. Test itself end to end using bash, browser, or computer use
2. Run the /simplify skill
3. Put up a PR

For long-running work, verification is important because that way, when you come back to a task, you know the code works. — 𝕏

1 view · edited 19:38

@bcherny updates

🧵Thread · Previous

Happy coding! Opus 4.7 is a significant step up. To get the most out of it, take the time to adjust your workflow to take advantage of Claude running for longer & being more agentic. It feels like a nice improvement with old workflows, and a significant leap once you take the time to adjust. — 𝕏

1 view · 19:38

@bcherny updates

Root

@AutomatorLab
@bcherny can I give you a hug

🫶 — 𝕏

1 view · 19:38

@bcherny updates

🧵Thread · Previous

For those not seeing the increase, make sure you're using Opus 4.7 with the latest Claude Code — 𝕏

1 view · 19:38

@bcherny updates

RT @ClaudeDevs

We fixed a bug where rate limits on Claude subscriptions weren't properly adjusted for long context requests in Opus 4.7.

We've reset 5-hour and weekly rate limits. Enjoy Opus 4.7! — 𝕏

1 view · 20:38

@bcherny updates

Root

@DSJayatillake
@bcherny Have you left the terminal yourself and use Claude Code in Claude Desktop?

I use a mix of Desktop, iOS app, and CLI — 𝕏

1 view · 20:39

@bcherny updates

RT @felixrieseberg

Hi! I'm here with *another launch*, it just happens to be extremely niche, nerdy, and probably only for a handful of people.

In the desktop app, Claude Cowork and Code now have a little Bluetooth API for makers & developers, allowing you to build hardware devices that interact with Claude.

I, for instance, built a little desk pet that alerts me whenever Claude is waiting for permission. — 𝕏

1 view · 01:39
