Maket opens Draw from Scratch tool to all users for free
Maket AI made its Draw from Scratch canvas free for all users, opening manual room sketching, furnishing, and 3D visualization to homeowners and builders. The shift widens access beyond paid tiers while keeping premium features for higher-volume use.
#sponsored @testingcatalog
TestingCatalog
Maket opens Draw from Scratch tool to all users for free
Maket's Draw from Scratch canvas is now free for all users, letting anyone create residential layouts and 3D visualizations without a paid plan.
Google made Gemma 4 models 3x faster with MTP Drafters
Google introduced speculative decoding for Gemma 4, cutting inference latency by pairing a main model with a lightweight drafter. It speeds local and on-device AI on GPUs and edge hardware without sacrificing output quality.
#gemini @testingcatalog
TestingCatalog
Google made Gemma 4 models 3x faster with MTP Drafters
What's new? Speculative decoding pairs a heavy main model with a light drafter that pre-generates tokens; Gemma 4 models now run on consumer GPUs and edge devices.
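The drafter-plus-main-model pairing described above can be illustrated with a minimal greedy sketch. This is not Google's MTP Drafter implementation: `toy_target`, `toy_drafter`, and the draft length `k` are hypothetical stand-ins chosen only to show why the output matches what the main model would have produced alone.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(
    target: NextToken, drafter: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verify rounds)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        # 1) The cheap drafter proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(seq + draft))
        # 2) The target checks the draft. A real system scores all drafted
        #    positions in one batched forward pass; this toy calls the target
        #    per position and counts one round per batch.
        rounds += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)       # draft token confirmed
            else:
                accepted.append(expected)  # first mismatch: take the target's token
                break
        else:
            # Every draft token matched, so the target adds one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], rounds


def toy_target(seq: List[int]) -> int:
    """Hypothetical 'heavy' model: a deterministic toy rule."""
    return (seq[-1] * 3 + len(seq)) % 11


def toy_drafter(seq: List[int]) -> int:
    """Hypothetical 'light' model: agrees with the target except on some steps."""
    guess = toy_target(seq)
    return (guess + 1) % 11 if len(seq) % 5 == 0 else guess


if __name__ == "__main__":
    tokens, rounds = speculative_decode(toy_target, toy_drafter, [1, 2, 3], 20)
    print(tokens == greedy_decode(toy_target, [1, 2, 3], 20), rounds)
```

Because every kept token is one the target itself would have chosen, quality is unchanged; the speedup comes from needing fewer target rounds than generated tokens whenever the drafter guesses well.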
Anthropic debuts Dreams for Claude Managed Agents
Anthropic previewed "dreaming" in Claude Managed Agents, letting agents review past sessions, refine memory, and reduce repeat errors. The update also adds outcomes, multiagent orchestration, and webhooks for complex enterprise workflows.
#claude @testingcatalog
TestingCatalog
Anthropic debuts Dreams for Claude Managed Agents
What's new? The dreaming feature reviews past sessions to restructure agents' memory; outcomes set task criteria, and multiagent orchestration assigns complex tasks.
Anthropic partners with SpaceXAI and doubles 5-hour rate limits
Anthropic's partnership with SpaceX expands compute capacity and raises Claude usage limits, giving SpaceX staff broader AI access for engineering, operations, coding, and documentation in high-demand internal workflows.
#claude @testingcatalog
TestingCatalog
Anthropic partners with SpaceXAI and doubles 5-hour rate limits
Anthropic extends higher usage limits for Claude AI, enabling teams to leverage advanced models for engineering and operations tasks.
Google prepares Agent Mode on Gemini to tackle complex tasks
Leaked Gemini builds point to a dedicated Agent Mode tab for multi-step workflows, combining skills and scheduled actions for inbox, meetings, research, writing, and files.
#gemini @testingcatalog
TestingCatalog AI News
Google prepares Agent Mode on Gemini to tackle complex tasks
Google is preparing to launch an Agent Mode tab in Gemini, enabling workflows, scheduled actions, and skills designed for Workspace tasks.
Google DeepMind x EVE Online
Google has partnered with Fernis Creations to conduct new research within an isolated EVE Online environment.
> As part of this next chapter, we are beginning a research partnership with Google DeepMind, focused on intelligence in complex, dynamic, player-driven systems.
Anthropic is testing an Insights feature for its Managed Agents in Claude Console.
> Up to 100 recent sessions are fetched. Each transcript is sent to the model (4 in parallel) with your agent's system prompt as context. The model writes a summary (task, actions, issues, assessment) and a 0-100 quality score. Token, cache, and tool-error counts are computed directly from the events alongside.
> A single model call reads every summary and its stats, then produces cross-session findings (recurring errors, usage patterns, efficiency outliers, wins), error-category buckets, and use-case clusters. Every cited session ID is checked against the input, so findings only ever point at real sessions.
> Summaries and findings are saved so the page loads instantly next time. Everything numeric you see (counts, percentages, token stats per cluster) is computed here from raw event data; only the prose and bucket membership come from the model.
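The quoted pipeline can be sketched roughly as follows. This is only an illustration under stated assumptions, not Anthropic's code: the `Session` shape, the event keys `"type"` and `"tokens"`, and the `summarize` callable standing in for the model call are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

MAX_SESSIONS = 100  # "Up to 100 recent sessions are fetched"
PARALLELISM = 4     # "(4 in parallel)"


@dataclass
class Session:
    session_id: str
    events: List[Dict]  # raw events; the keys used below are assumed


def event_stats(session: Session) -> Dict[str, int]:
    """Numeric stats come straight from raw events, never from the model."""
    return {
        "tokens": sum(e.get("tokens", 0) for e in session.events),
        "tool_errors": sum(1 for e in session.events if e.get("type") == "tool_error"),
    }


def summarize_sessions(
    sessions: List[Session], summarize: Callable[[Session], str]
) -> List[Dict]:
    """Summarize the most recent sessions, at most four model calls at a time."""
    recent = sessions[-MAX_SESSIONS:]  # assumes the list is ordered oldest to newest
    with ThreadPoolExecutor(max_workers=PARALLELISM) as pool:
        summaries = list(pool.map(summarize, recent))  # map preserves input order
    return [
        {"session_id": s.session_id, "summary": text, **event_stats(s)}
        for s, text in zip(recent, summaries)
    ]


def keep_real_citations(findings: List[Dict], rows: List[Dict]) -> List[Dict]:
    """Drop any cited session ID that was not part of the input."""
    known = {row["session_id"] for row in rows}
    return [
        {**f, "session_ids": [sid for sid in f["session_ids"] if sid in known]}
        for f in findings
    ]
```

The split mirrors the quoted description: counts are derived from events, while the model contributes only prose, and its citations are filtered against the known session IDs.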
Google prepares Agent Mode for Flow to automate video production
Google is preparing an Agent Mode for Flow, its Veo-based filmmaking tool. Code indicates a prompt-bar toggle for planning scenes, managing tools, running generation, and updating projects, with a likely reveal at I/O.
#flow @testingcatalog
TestingCatalog
Google prepares Agent Mode for Flow to automate video production
Google is preparing an Agent Mode for Flow, enabling creators to toggle an AI assistant that plans scenes and manages video projects through chat.
Meta prepares Hatch Agent under waitlist and social media skills
Meta is preparing Hatch, a waitlisted consumer AI agent for image and video creation, shopping, research, scheduled tasks, and file generation.
#meta @testingcatalog
TestingCatalog AI News
Meta prepares Hatch Agent under waitlist and social media skills
Meta is advancing its autonomous agent, Hatch, with early code signaling tasks like image creation and research; it is expected to launch soon under a waitlist.
Scale Labs debuts new Refactoring Leaderboard
Scale Labs' SWE Atlas Refactoring Leaderboard tests whether AI coding agents can restructure real codebases without changing behavior. Opus 4.7 leads, but results show reliability, cleanup, and repeatability still limit production use.
#ai @testingcatalog
TestingCatalog
Scale Labs debuts new Refactoring Leaderboard
Scale Labs unveils the Refactoring Leaderboard, spotlighting AI coding agents' ability to restructure complex codebases.
Scale AI published the SWE Atlas Refactoring Leaderboard, a new benchmark that evaluates how well agents can restructure code.
> It requires agents to produce twice as many lines of code as SWE Bench Pro.
> Claude Code with Opus 4.7 tops the leaderboard, followed by Codex with GPT-5.5, GPT-5.4, and GPT-5.3.
> Refactoring is an important task for LLMs to handle, as it often boils down to rather tedious engineering work.
AVM 2 is in development
Historically, AVM updates have been reserved for the day before the I/O event.
Soon?
OPENAI: Codex is getting a dedicated Chrome extension soon!
> With the new extension for Chrome, Codex is even better at working with apps and websites in your browser. It works in parallel across tabs in the background without taking over your browser, and you stay in control of which websites Codex can use.
* Not available yet
The Codex Chrome extension is now officially rolling out on macOS and Windows. You need to install the Chrome plugin to start testing.
SpaceXAI prepares Grok Build desktop app to rival OpenAI Codex
SpaceXAI appears close to launching Grok Build, a desktop coding app for macOS, Linux, and Windows.
> It will support planning mode, Plugins, Skills, and MCPs.
> Will be able to work with the Git tree, spawn dev servers, and work with a built-in browser.
#grok @testingcatalog
TestingCatalog AI News
SpaceXAI prepares Grok Build desktop app to rival OpenAI Codex
SpaceXAI is nearing the release of Grok Build, a desktop coding tool for macOS, Linux, and Windows, after a brief accidental leak online.
OpenAI launches new realtime voice and translation AI models
OpenAI added three real-time voice models to its API: GPT-Realtime-2 for complex voice agents, GPT-Realtime-Translate for multilingual speech, and GPT-Realtime-Whisper for live transcription, with pricing and Playground access.
#chatgpt @testingcatalog
TestingCatalog AI News
OpenAI launches new realtime voice and translation AI models
OpenAI introduces three advanced, real-time audio models for developers, supporting live voice agents, instant translation, and streaming transcription via API.