@nrehiew_ MRCR is a bad eval, we use GraphWalks now. More info here: x.com/bcherny/status…
@bcherny
@eliebakouch We kept MRCR in the system card for scientific honesty, but we've actually been phasing it out slowly.
Two reasons: (1) it's built around stacking distractors to trick the model, which isn't how people actually use long context, and (2) we care more about applied long-context capability than needle-retrieval. Graphwalks is a better signal for applied reasoning over long context, and internally we've seen this model do really well on long-context code.
MRCR wasn't included in the Mythos Preview system card for these reasons, but Graphwalks was - that will be the case for future models too.
@mattpocockuk
Quick PSA about Opus 4.7:
Anthropic have raised the default effort of Claude Code from medium to xhigh.
This will likely burn more tokens by default.
Switch off the defaults and experiment to see what effort level suits your work best.
We've raised rate limits for all subscribers to make up for the increased token usage. Enjoy!
@DurhamVSmith
@bcherny Thanks! Can you also fix Opus 4.7 not working with Claude Code via AWS Bedrock?
Do you have the model provisioned for your Bedrock account?
1/ Auto mode = no more permission prompts
Opus 4.7 loves doing complex, long-running tasks like deep research, refactoring code, building complex features, iterating until it hits a performance benchmark.
In the past, you either had to babysit the model while it did these sorts of long tasks, or use --dangerously-skip-permissions.
We recently rolled out auto mode as a safer alternative. In this mode, permission prompts are routed to a model-based classifier to decide whether the command is safe to run. If it's safe, it's auto-approved.
This means no more babysitting while the model runs. More than that, it means you can run more Claudes in parallel. Once a Claude is cooking, you can switch focus to the next Claude.
Auto mode is now available for Opus 4.7 for Max, Teams, and Enterprise users. Shift-tab to enter auto mode in the CLI, or choose it in the dropdown in Desktop or VSCode.
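To make the routing idea concrete, here is a minimal Python sketch of a permission prompt going to a classifier first. Everything in it is hypothetical: `route_permission`, `toy_classifier`, and the keyword list are illustration only, and the fallback to asking the user when the classifier does not say "safe" is an assumption (the thread only says safe commands are auto-approved).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    approved: bool
    reason: str


def route_permission(
    command: str,
    classify_safe: Callable[[str], bool],
    ask_user: Callable[[str], bool],
) -> Decision:
    """Auto-approve when the classifier says safe; otherwise ask the user.

    The fallback to a human prompt is an assumption, not stated in the thread.
    """
    if classify_safe(command):
        return Decision(True, "auto-approved by classifier")
    approved = ask_user(command)
    return Decision(approved, "approved by user" if approved else "denied by user")


def toy_classifier(command: str) -> bool:
    """Hypothetical stand-in for the model-based classifier (keyword check only)."""
    risky_fragments = ("rm -rf", "sudo ", "--force")
    return not any(fragment in command for fragment in risky_fragments)


if __name__ == "__main__":
    deny_all = lambda command: False  # simulate nobody at the keyboard
    for cmd in ["pytest -q", "git push --force origin main"]:
        print(cmd, "->", route_permission(cmd, toy_classifier, deny_all))
```

The real classifier is a model, not a keyword list; the point of the sketch is only the control flow, where the human is consulted for the commands that are not classified as safe.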
2/ The new /fewer-permission-prompts skill
We've also released a new /fewer-permission-prompts skill. It scans through your session history to find common bash and MCP commands that are safe but caused repeated permission prompts.
It then recommends a list of commands to add to your permissions allowlist.
Use this to tune up your permissions and avoid unnecessary permission prompts, especially if you don't use auto mode.
code.claude.com/docs/en/permis…
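As a rough illustration of what "scan history, then recommend an allowlist" could look like, here is a small Python sketch. It is not the skill's implementation: `recommend_allowlist`, `build_settings`, the `min_count` threshold, and the sample history are all hypothetical, and the `permissions.allow` / `Bash(...)` rule shape is assumed from the Claude Code settings format, so check the linked docs before copying it.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def recommend_allowlist(
    prompted_commands: Iterable[str],
    is_safe: Callable[[str], bool],
    min_count: int = 3,
) -> List[str]:
    """Return safe commands that triggered a prompt at least min_count times."""
    counts = Counter(prompted_commands)
    return sorted(
        cmd for cmd, n in counts.items() if n >= min_count and is_safe(cmd)
    )


def build_settings(commands: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Wrap commands as Bash rules (rule syntax assumed, see lead-in)."""
    return {"permissions": {"allow": [f"Bash({cmd})" for cmd in commands]}}


if __name__ == "__main__":
    # Hypothetical history of commands that each caused a permission prompt.
    history = ["npm test", "git status", "rm -rf build"] * 3
    known_safe = {"npm test", "git status"}  # stand-in for a real safety check
    picks = recommend_allowlist(history, lambda cmd: cmd in known_safe)
    print(build_settings(picks))
```

Note that the frequently prompted but destructive `rm -rf build` is left out: frequency alone is not enough, the command also has to pass the safety check.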
3/ Recaps
We shipped recaps earlier this week, to prep for Opus 4.7. Recaps are short summaries of what an agent did & what's next.
Very useful when returning to a long-running session after a few minutes or a few hours.
4/ Focus mode
I've been loving the new focus mode in the CLI, which hides all the intermediate work to just focus on the final result. The model has reached a point where I generally trust it to run the right commands and make the right edits. I just look at the final result.
/focus to toggle on/off.
5/ Configure your effort level
Opus 4.7 uses adaptive thinking instead of thinking budgets. To tune the model to think more/less, we recommend tuning effort.
Use lower effort for faster responses and lower token usage. Use higher effort for the most intelligence and capability.
Personally, I use xhigh effort for most tasks, and max effort for the hardest tasks. Max applies to just your current session; other effort levels are sticky and persist for your next session also.
/effort to set your effort level.
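The sticky vs. session-only behavior can be modeled in a few lines of Python. This is a toy model of the rule described above, not how Claude Code stores settings: the `EffortState` class is hypothetical, it only knows the levels named in this feed (medium, xhigh, max), and the `xhigh` starting value is taken from the default mentioned earlier in the feed.

```python
from dataclasses import dataclass
from typing import Optional

STICKY_LEVELS = {"medium", "xhigh"}  # persist into the next session
SESSION_ONLY_LEVELS = {"max"}        # apply to the current session only


@dataclass
class EffortState:
    saved: str = "xhigh"                    # what the next session starts with
    session_override: Optional[str] = None  # set only for session-only levels

    def set_effort(self, level: str) -> None:
        if level in SESSION_ONLY_LEVELS:
            self.session_override = level
        elif level in STICKY_LEVELS:
            self.saved = level
            self.session_override = None
        else:
            raise ValueError(f"unknown effort level: {level}")

    @property
    def current(self) -> str:
        return self.session_override or self.saved

    def start_new_session(self) -> None:
        # Session-only choices are dropped; sticky ones survive.
        self.session_override = None


if __name__ == "__main__":
    state = EffortState()
    state.set_effort("max")
    print(state.current)      # effort for this session
    state.start_new_session()
    print(state.current)      # falls back to the saved sticky level
```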
6/ Give Claude a way to verify its work
Finally, make sure Claude has a way to verify its work. This has always been a way to 2-3x what you get out of Claude, and with 4.7 it's more important than ever.
Verification looks different depending on the task. For backend work, make sure Claude knows how to start up your server/service to test it end to end; for frontend work, use the Claude Chromium extension to give Claude a way to control your browser; for desktop apps, use computer use.
Personally, many of my prompts these days look like "Claude do blah blah /go". /go is a skill that has Claude:
1. Test itself end to end using bash, browser, or computer use
2. Run the /simplify skill
3. Put up a PR
For long-running work, verification is important because that way when you come back to a task, you know the code works.
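The three /go steps above amount to a gated pipeline: verify first, then clean up, then open the PR. Here is a hedged Python sketch of that shape. The `run_go` function and the step callables are hypothetical placeholders, and stopping at the first failing step is an assumption about how you would want such a skill to behave, not something the thread specifies.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_go(steps: List[Step]) -> List[str]:
    """Run steps in order; stop at the first one that reports failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, step in steps:
        if not step():
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder callables; real ones would run tests, /simplify, and a PR tool.
    steps: List[Step] = [
        ("test end to end", lambda: True),
        ("simplify", lambda: True),
        ("put up a PR", lambda: True),
    ]
    print(run_go(steps))
```

Putting verification first means a PR only goes up for work that has already been exercised end to end.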
Happy coding! Opus 4.7 is a significant step up. To get the most out of it, take the time to adjust your workflow to take advantage of Claude running for longer & being more agentic. It feels like a nice improvement with old workflows, and a significant leap once you take the time to adjust.
For those not seeing the increase, make sure you're using Opus 4.7 with the latest Claude Code
RT @ClaudeDevs
We fixed a bug where rate limits on Claude subscriptions weren't properly adjusted for long context requests in Opus 4.7.
We've reset 5-hour and weekly rate limits. Enjoy Opus 4.7!
@DSJayatillake
@bcherny Have you left the terminal yourself, and do you use Claude Code in Claude Desktop?
I use a mix of Desktop, iOS app, and CLI
RT @felixrieseberg
Hi! I'm here with *another launch*, it just happens to be extremely niche, nerdy, and probably only for a handful of people.
In the desktop app, Claude Cowork and Code now have a little Bluetooth API for makers & developers, allowing you to build hardware devices that interact with Claude.
I, for instance, built a little desk pet that alerts me whenever Claude is waiting for permission.