🚨 AI News | TestingCatalog

Google made Gemma 4 models 3x faster with MTP Drafters

Google introduced speculative decoding for Gemma 4, cutting inference latency by pairing a main model with a lightweight drafter. It speeds local and on-device AI on GPUs and edge hardware without sacrificing output quality.

🗞 #gemini @testingcatalog

What's new? Speculative decoding pairs a heavy main model with a light drafter that pre-generates tokens, and Gemma 4 models now run on consumer GPUs and edge devices.
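For anyone new to the technique, here is a minimal sketch of greedy speculative decoding in general. It is not Google's MTP drafter implementation. The `target` and `drafter` callables are hypothetical toy stand-ins for the main Gemma 4 model and its drafter, and real systems work with probability distributions rather than single greedy tokens.

```python
from typing import Callable, List

# Toy stand-in for a language model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one (expensive) target-model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, drafter: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the drafter guesses k tokens, the target
    keeps the longest prefix it agrees with, then adds one token of its own."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap drafter proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(out + draft))
        # 2. The target checks each drafted position. Real systems score all
        #    k positions in one batched forward pass; here we loop for clarity.
        accepted = 0
        while accepted < k and target(out + draft[:accepted]) == draft[accepted]:
            accepted += 1
        out.extend(draft[:accepted])
        # 3. The target always adds one token: the correction at the first
        #    mismatch, or a bonus token when every draft token was accepted.
        out.append(target(out))
    # The last round can overshoot, so trim to exactly max_new new tokens.
    return out[:len(prompt) + max_new]
```

Every kept token is one the target would have picked anyway, so the output matches plain greedy decoding; the speed-up comes from the target verifying several tokens per pass and depends on how often the drafter guesses right. With sampling instead of greedy decoding, an accept/reject rule plays the same role and preserves the target's output distribution.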

953 views · tc_zapier_bot, 20:35

🚨 AI News | TestingCatalog

Anthropic debuts Dreams for Claude Managed Agents

Anthropic previewed “dreaming” in Claude Managed Agents, letting agents review past sessions, refine memory, and reduce repeat errors. The update also adds outcomes, multiagent orchestration, and webhooks for complex enterprise workflows.

🗞 #claude @testingcatalog

What's new? The dreaming feature reviews past sessions to restructure agents' memory; outcomes set task criteria, and multiagent orchestration assigns complex tasks.

762 views · tc_zapier_bot, 20:50

🚨 AI News | TestingCatalog

Anthropic partners with SpaceXAI and doubles 5-hour rate limits

Anthropic’s partnership with SpaceX expands compute capacity and raises Claude usage limits, giving SpaceX staff broader AI access for engineering, operations, coding, and documentation in high-demand internal workflows.

🗞 #claude @testingcatalog

Anthropic extends higher usage limits for Claude AI, enabling teams to leverage advanced models for engineering and operations tasks.

829 views · tc_zapier_bot, 20:59

🚨 AI News | TestingCatalog

Google prepares Agent Mode on Gemini to tackle complex tasks

Leaked Gemini builds point to a dedicated Agent Mode tab for multi-step workflows, combining skills and scheduled actions for inbox, meetings, research, writing, and files.

🗞 #gemini @testingcatalog

Google is preparing to launch an Agent Mode tab in Gemini, enabling workflows, scheduled actions, and skills designed for Workspace tasks.

890 views · tc_zapier_bot, edited 21:42

🚨 AI News | TestingCatalog

Google DeepMind 🤝 EVE Online

Google DeepMind has partnered with Fernis Creations to conduct new research within an isolated EVE Online environment.

> As part of this next chapter, we are beginning a research partnership with Google DeepMind, focused on intelligence in complex, dynamic, player-driven systems.

949 views · Alexey, 22:13

🚨 AI News | TestingCatalog

Anthropic is testing an Insights feature for Managed Agents in the Claude Console.

> Up to 100 recent sessions are fetched. Each transcript is sent to the model (4 in parallel) with your agent's system prompt as context. The model writes a summary — task, actions, issues, assessment — and a 0–100 quality score. Token, cache, and tool-error counts are computed directly from the events alongside.

> A single model call reads every summary and its stats, then produces cross-session findings (recurring errors, usage patterns, efficiency outliers, wins), error-category buckets, and use-case clusters. Every cited session ID is checked against the input, so findings only ever point at real sessions.

> Summaries and findings are saved so the page loads instantly next time. Everything numeric you see — counts, percentages, token stats per cluster — is computed here from raw event data; only the prose and bucket membership come from the model.
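The quoted pipeline has a clear shape: bounded-concurrency summarization, stats computed from raw events outside the model, and a citation check on the cross-session findings. Below is a minimal Python sketch of that shape only. `Session`, `summarize`, the event fields, and the choice to drop invalid citations (rather than reject the whole finding set) are all assumptions, not Anthropic's code, and the model call is a stub.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Set

MAX_SESSIONS = 100  # "Up to 100 recent sessions are fetched"
PARALLELISM = 4     # "Each transcript is sent to the model (4 in parallel)"


@dataclass
class Session:
    id: str
    events: List[dict]  # hypothetical event shape, e.g. {"type": "tool_error"}


async def summarize(session: Session, system_prompt: str) -> Dict:
    """Hypothetical stub for the per-session model call, which would return a
    summary (task, actions, issues, assessment) and a 0-100 quality score."""
    await asyncio.sleep(0)  # stands in for network latency
    return {"id": session.id, "summary": f"summary of {session.id}", "score": 50}


def compute_stats(session: Session) -> Dict[str, int]:
    """Numbers are computed from raw events, never taken from the model."""
    return {
        "tokens": sum(e.get("tokens", 0) for e in session.events),
        "tool_errors": sum(1 for e in session.events if e.get("type") == "tool_error"),
    }


async def summarize_all(sessions: List[Session], system_prompt: str) -> List[Dict]:
    """Summarize at most MAX_SESSIONS sessions (assumed sorted newest first),
    with no more than PARALLELISM model calls in flight at once."""
    sem = asyncio.Semaphore(PARALLELISM)

    async def one(s: Session) -> Dict:
        async with sem:
            result = await summarize(s, system_prompt)
        result["stats"] = compute_stats(s)
        return result

    # gather() returns results in input order, not completion order.
    return await asyncio.gather(*(one(s) for s in sessions[:MAX_SESSIONS]))


def filter_citations(findings: List[Dict], input_ids: Set[str]) -> List[Dict]:
    """Drop cited session IDs that were not in the input, and drop findings
    left with no valid citations, so findings only point at real sessions."""
    kept = []
    for f in findings:
        cited = [sid for sid in f["session_ids"] if sid in input_ids]
        if cited:
            kept.append({**f, "session_ids": cited})
    return kept
```

Keeping the counts in `compute_stats` rather than asking the model for them mirrors the quoted design: the model supplies prose and bucket membership, while every number is derived from event data.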

916 views · Alexey, 11:50

🚨 AI News | TestingCatalog

Google prepares Agent Mode for Flow to automate video production

Google is preparing an Agent Mode for Flow, its Veo-based filmmaking tool. Code indicates a prompt-bar toggle for planning scenes, managing tools, running generation, and updating projects, with a likely reveal at I/O.

🗞 #flow @testingcatalog

Google is preparing an Agent Mode for Flow, enabling creators to toggle an AI assistant that plans scenes and manages video projects through chat.

806 views · tc_zapier_bot, edited 14:11

🚨 AI News | TestingCatalog

Meta prepares waitlisted Hatch agent with social media skills

Meta is preparing Hatch, a waitlisted consumer AI agent for image and video creation, shopping, research, scheduled tasks, and file generation.

🗞 #meta @testingcatalog

Meta is advancing its autonomous agent, Hatch, with early code signaling tasks like image creation and research; it is set to launch soon behind a waitlist.

812 views · tc_zapier_bot, edited 14:29

🚨 AI News | TestingCatalog

Scale Labs debuts new Refactoring Leaderboard

Scale Labs’ SWE Atlas Refactoring Leaderboard tests whether AI coding agents can restructure real codebases without changing behavior. Opus 4.7 leads, but results show reliability, cleanup, and repeatability still limit production use.

🗞 #ai @testingcatalog

Scale Labs unveils the Refactoring Leaderboard, spotlighting AI coding agents’ ability to restructure complex codebases.

795 views · tc_zapier_bot, 15:59

🚨 AI News | TestingCatalog

Scale AI published the SWE Atlas Refactoring Leaderboard, a new benchmark that evaluates how well agents can restructure code.

> It requires agents to produce twice as many lines of code as SWE-Bench Pro.

> Claude Code with Opus 4.7 tops the leaderboard, followed by Codex with GPT-5.5, GPT-5.4, and GPT-5.3.

> Refactoring is an important task for LLMs to handle, as it often boils down to tedious engineering work.
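The benchmark's core criterion, restructuring code without changing behavior, can be illustrated with a generic differential test: run the original and refactored versions on the same random inputs and compare results. This is a minimal sketch, not Scale Labs' evaluation harness (the post doesn't describe it); every function here is hypothetical.

```python
import random
from typing import Callable, List, Tuple

Items = List[Tuple[int, int]]  # (price in cents, discount percent)


def original_total(items: Items) -> int:
    # Hypothetical "before" code: index-based loop with explicit branches.
    total = 0
    for i in range(len(items)):
        price = items[i][0]
        pct = items[i][1]
        if pct > 0:
            total = total + (price - price * pct // 100)
        else:
            total = total + price
    return total


def refactored_total(items: Items) -> int:
    # Behavior-preserving refactor: same arithmetic, same branch, less noise.
    return sum(price - price * pct // 100 if pct > 0 else price for price, pct in items)


def broken_total(items: Items) -> int:
    # A tempting but wrong "cleanup": dropping the pct > 0 guard changes
    # results whenever a percentage is negative.
    return sum(price - price * pct // 100 for price, pct in items)


def random_items(rng: random.Random) -> Items:
    # Include negative percentages so the guarded branch actually gets exercised.
    return [(rng.randint(0, 10_000), rng.randint(-10, 100)) for _ in range(rng.randint(0, 20))]


def behaves_the_same(f: Callable[[Items], int], g: Callable[[Items], int],
                     trials: int = 1000, seed: int = 0) -> bool:
    """Differential test: compare both versions on the same random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = random_items(rng)
        if f(items) != g(items):
            return False
    return True
```

Random inputs can only show that two versions differ, never prove they are equivalent, which is one reason repeatability and reliability remain hard to measure for refactoring agents.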

871 views · Alexey, 16:17

🚨 AI News | TestingCatalog

OPENAI 🚨: 3 new models are now available in the OpenAI Playground and API.

- gpt-realtime-2
- gpt-realtime-whisper
- gpt-realtime-translate

ChatGPT Voice Mode upgrade soon? 👀

875 views · Alexey, 17:25

🚨 AI News | TestingCatalog

GOOGLE 🚨: Gemini 3.1 Flash-Lite is now generally available! Users can also test the model in AI Studio.

> Designed for ultra-low latency, high-volume tasks, and unmatched cost-efficiency, Flash-Lite is already transforming how applications are built at scale.

881 views · Alexey, 19:02

🚨 AI News | TestingCatalog

SPACEXAI 🚨: New signs of Grok Computer have been spotted in the Grok web app.

A new selector allows users to choose between Grok Computer and a "Folder on Google Drive."

The selector recently became visible to everyone, possibly by mistake.

Grok Computer soon? 👀

848 views · Alexey, edited 19:17

🚨 AI News | TestingCatalog

AVM 2 is in development 🚧

Historically, AVM updates have landed the day before Google I/O.

Soon? 👀👀👀

805 views · Alexey, 19:31

🚨 AI News | TestingCatalog

OPENAI 🔥: Codex is getting a dedicated Chrome extension soon!

> With the new extension for Chrome, Codex is even better at working with apps and websites in your browser. It works in parallel across tabs in the background without taking over your browser, and you stay in control of which websites Codex can use.

* Not available yet 👀

896 views · Alexey, 19:58

🚨 AI News | TestingCatalog

The Codex Chrome extension is now officially rolling out on macOS and Windows. Install the extension in Chrome to start testing.

929 views · Alexey, edited 20:18

🚨 AI News | TestingCatalog

SpaceXAI prepares Grok Build desktop app to rival OpenAI Codex

SpaceXAI appears close to launching Grok Build, a desktop coding app for macOS, Linux, and Windows.

> It will support planning mode, Plugins, Skills, and MCPs.
> It will be able to work with the Git tree, spawn dev servers, and use a built-in browser.

🗞 #grok @testingcatalog

SpaceXAI is nearing the release of Grok Build, a desktop coding tool for macOS, Linux, and Windows, after a brief accidental leak online.

1.04K views · tc_zapier_bot, edited 23:14

🚨 AI News | TestingCatalog

OpenAI launches new realtime voice and translation AI models

OpenAI added three real-time voice models to its API: GPT-Realtime-2 for complex voice agents, GPT-Realtime-Translate for multilingual speech, and GPT-Realtime-Whisper for live transcription, with pricing and Playground access.

🗞 #chatgpt @testingcatalog

OpenAI introduces three advanced, real-time audio models for developers, supporting live voice agents, instant translation, and streaming transcription via API.

913 views · tc_zapier_bot, 00:14

🚨 AI News | TestingCatalog

Telegram ships major update for AI bots and automations

Telegram is slowly becoming the most AI-integrated messenger! Users can now interact with guest AI bots, build automated workflows involving bot-to-bot communication, and attach their personal AI assistants to their profiles so the bot can handle incoming inquiries.

🗞 #telegram @testingcatalog

Telegram’s latest update introduces Guest Bots, multi-bot workflows, AI sticker search, chat automation features, and new controls for admins.

1.11K views · tc_zapier_bot, edited 13:52
