Scale Labs debuts new Refactoring Leaderboard
Scale Labs' SWE Atlas Refactoring Leaderboard tests whether AI coding agents can restructure real codebases without changing behavior. Opus 4.7 leads, but results show reliability, cleanup, and repeatability still limit production use.
#ai @testingcatalog
Scale Labs unveils the Refactoring Leaderboard, spotlighting AI coding agents' ability to restructure complex codebases.
Scale AI published the SWE Atlas Refactoring Leaderboard, a new benchmark that evaluates how well agents can restructure code.
> It requires agents to produce twice as many lines of code as SWE Bench Pro.
> Claude Code with Opus 4.7 tops the leaderboard, followed by Codex with GPT-5.5, GPT-5.4, and GPT-5.3.
> Refactoring is an important task for LLMs to handle, as it often boils down to fairly tedious engineering work.
AVM 2 is in development.
Historically, AVM updates have been reserved for the day before the I/O event.
Soon?
OPENAI: Codex is getting a dedicated Chrome extension soon!
> With the new extension for Chrome, Codex is even better at working with apps and websites in your browser. It works in parallel across tabs in the background without taking over your browser, and you stay in control of which websites Codex can use.
* Not available yet
The Codex Chrome extension is now officially rolling out on macOS and Windows. You need to install the Chrome plugin to start testing.
SpaceXAI prepares Grok Build desktop app to rival OpenAI Codex
SpaceXAI appears close to launching Grok Build, a desktop coding app for macOS, Linux, and Windows.
> It will support planning mode, Plugins, Skills, and MCPs.
> It will be able to work with the Git tree, spawn dev servers, and use a built-in browser.
#grok @testingcatalog
SpaceXAI is nearing the release of Grok Build, a desktop coding tool for macOS, Linux, and Windows, after a brief accidental leak online.
OpenAI launches new realtime voice and translation AI models
OpenAI added three real-time voice models to its API: GPT-Realtime-2 for complex voice agents, GPT-Realtime-Translate for multilingual speech, and GPT-Realtime-Whisper for live transcription, with pricing and Playground access.
#chatgpt @testingcatalog
OpenAI introduces three advanced, real-time audio models for developers, supporting live voice agents, instant translation, and streaming transcription via API.
Telegram ships major update for AI bots and automations
Telegram is slowly becoming the most AI-integrated messenger! Users can now interact with guest AI bots, build automated workflows involving bot-to-bot communication, and attach their personal AI assistants to their profiles so the bot can handle incoming inquiries.
#telegram @testingcatalog
Telegram's latest update introduces Guest Bots, multi-bot workflows, AI sticker search, chat automation features, and new controls for admins.
Google unveils Google Health app, Health Coach, and Fitbit Air
Google expanded its health lineup with a unified Health App, AI-powered Health Coach under Premium, and the screenless Fitbit Air. The platform combines health data, supports secure sharing, and offers personalized wellness guidance.
#googleapps @testingcatalog
What's new? Google Health App consolidates health data and links with Google Health Coach on premium; Fitbit Air debuts as a screenless fitness tracker with a three-month trial;
Google launches Gemini 3.1 Flash-Lite in General Availability
Google launched Gemini 3.1 Flash-Lite for Google Cloud, offering low-latency, high-volume AI with text and image support. Aimed at enterprise use, it delivers fast, cost-efficient performance for automation, tool calling, and real-time tasks.
#aistudio @testingcatalog
What's new? Gemini 3.1 Flash-Lite is a new AI model for low-latency, high-volume processing on Google Cloud; it supports text and image processing with tool-calling capabilities;
OpenAI adds Chrome plugin and tests Remote control for Codex
1. OpenAI expanded Codex with a Chrome extension that runs in an isolated browser session, supports background tab work, and uses DevTools.
2. OpenAI continues working on Remote Control for Codex, which will allow it to manage remote instances via SSH.
3. A new toggle has been added to the settings that keeps the remote connection to Codex open.
#chatgpt @testingcatalog
OpenAI rolls out a Chrome extension for Codex, allowing agents to run browser sessions independently; it is also testing Remote Control and Voice Mode!
Google is testing the option to mark Notebooks as "Donation Safe" as part of a Data Donation feature.
> Your logs from using this notebook will NOT be scrubbed (this allows for quality improvement).
> The notebook will immediately lose its Donation Safe status if shared.
> Marking a notebook as Donation Safe allows you to Donate Detailed Feedback.
> You must not donate NTK or Privileged data.
ICYMI: Connectors are now available on Grok mobile apps as well.