Berkeley researchers demonstrated that SWE-bench Verified and Terminal-Bench can be fully gamed without solving tasks.
Their agent achieved 100% by exploiting benchmark logic.
In SWE-bench, the agent added a short script that forced all tests to return โpassed,โ scoring 100% across hundreds of tasks with zero real fixes.
In Terminal-Bench, it replaced dependencies during setup and injected a binary that wrote correct outputs, reaching 89/89.
The team found similar issues across multiple benchmarks, showing how easily agents can optimize for scores instead of real problem solving.
Benchmark results are becoming easier to manipulate than to trust.
Please open Telegram to view this post
VIEW IN TELEGRAM
โค172๐127๐ฏ91๐ค19๐ฅ1๐1
Startups & Ventures
This media is not supported in your browser
VIEW IN TELEGRAM
Astronauts from Artemis II safely splashed down in the Pacific Ocean and have already been recovered and brought ashore.
The mission has now officially concluded.
Please open Telegram to view this post
VIEW IN TELEGRAM
๐100๐73๐ฅ72๐63๐ฏ62โค53๐1
This media is not supported in your browser
VIEW IN TELEGRAM
Unitree demonstrated a humanoid robot reaching speeds of 10 m/s. For comparison, Usain Boltโs peak speed during his 100m world record was 12.4 m/s.
Please open Telegram to view this post
VIEW IN TELEGRAM
๐108๐ค99๐ค78โค63๐45๐ข6๐1
Startups & Ventures
๐ป Rockstar hacked, GTA VI data not affected
ShinyHunters claim they accessed Rockstarโs cloud storage and demanded a ransom by April 14. The company confirmed the breach but said only non-critical data was exposed.
The leaked materials may include financial reports, player analytics, and internal documents. Rockstar stated that neither GTA VI nor player data were affected.
The incident adds pressure as the company prepares for its next major release.
๐ @tech
ShinyHunters claim they accessed Rockstarโs cloud storage and demanded a ransom by April 14. The company confirmed the breach but said only non-critical data was exposed.
The leaked materials may include financial reports, player analytics, and internal documents. Rockstar stated that neither GTA VI nor player data were affected.
The incident adds pressure as the company prepares for its next major release.
Please open Telegram to view this post
VIEW IN TELEGRAM
๐161โค140๐ฏ101๐14๐ฅ5
This media is not supported in your browser
VIEW IN TELEGRAM
A viral video shows a Unitree G1 robot chasing away wild boars in Warsaw. The animals have become more common in recent years and are increasingly entering urban areas.
Residents are starting to use unconventional methods to deal with the growing problem.
Please open Telegram to view this post
VIEW IN TELEGRAM
๐ฆ151๐137โค103๐35
Stella Lorenz, Senior Director of AI at AMD, published an analysis of Claude Code logs that points to a sharp drop in performance from February to March. She looked at 6,852 sessions, 234,760 tool calls, and 17,871 reasoning blocks.
The timing matters. The reported degradation lines up with the March 8 release of thinking redaction, which hides reasoning. Anthropic said that change was only about the UI.
In the GitHub thread, Claude Code creator Boris Cherny replied that default settings in the agent had changed, including adaptive thinking and Medium effort, so the analysis may be distorted.
Please open Telegram to view this post
VIEW IN TELEGRAM
โค108๐94๐72๐ค70๐47
Apple is working on smart glasses under the codename N50, with a reveal expected in late 2026 or early 2027 and release in 2027.
The device will support photos, video, calls, notifications, music, and voice control through an upgraded Siri from iOS 27. Apple is focusing on tight hardware and software integration, building the product fully in-house.
Four frame styles are in development, using acetate materials and colors like black, ocean blue, and light brown. The camera design may feature vertical oval lenses with a signature light element.
Please open Telegram to view this post
VIEW IN TELEGRAM
๐152โค141๐122
This media is not supported in your browser
VIEW IN TELEGRAM
Blue Origin introduced the Air Pioneer reactor, designed to extract oxygen directly from lunar regolith. NASA has backed the project with $35M.
The process heats Moon dust to about 1600ยฐC until it melts. An electric current then splits the material through electrolysis, separating oxygen from metals like iron, titanium, and aluminum. Oxygen is released as gas, while metals and silicon collect separately.
The system needs about 1 megawatt of power, expected to come from solar panels near a lunar base. Byproducts can be reused as construction materials, including metals and glass.
NASA also provided a real Apollo-era lunar sample to help build accurate test material.
Please open Telegram to view this post
VIEW IN TELEGRAM
โค138๐133๐ฑ133
Lithium-ion battery costs fell from about $9,200 per kWh in 1991 to $78 today.
That brings a modern EV battery to roughly $5,000 versus nearly $600,000 for the same capacity decades ago.
The decline followed a steady learning curve. Since 1998, every doubling of global battery production cut costs by about 19%.
The gains came from incremental improvements across chemistry, manufacturing, and supply chains.
Early scale came from consumer electronics, not cars. Smartphones and laptops drove volume first, making batteries viable for larger use cases later.
Energy density also increased more than 3x over the same period.
Please open Telegram to view this post
VIEW IN TELEGRAM
๐คฏ186โค163๐11๐1๐ข1
Porsche introduced the 911 GT3 S/C, a new two-seat cabriolet priced at โฌ269K.
It is the first GT3 with a retractable soft top and currently the only two-seat convertible in the 911 lineup.
The car runs a 4.0L engine with 510 hp.
It goes from 0 to 100 km/h in 3.9 seconds and reaches a top speed of 313 km/h.
Transmission is a 6-speed manual.
Orders are already open.
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM
โค106๐ฅ89๐ฆ80๐66๐ฑ65๐65๐62๐62๐61๐ค60๐56
The GitHub repo andrej-karpathy-skills reached 36K stars in two days. It contains a single
CLAUDE.md file with 65 lines describing one agent skill.The file encodes four rules from Andrej Karpathyโs post: think and ask before coding, simplify solutions, change only what is requested, and work toward a clear goal.
Users report cleaner PRs, fewer unnecessary diffs, and better instruction following after adding the file.
Please open Telegram to view this post
VIEW IN TELEGRAM
โค141๐ฑ106๐ค93๐ณ65
This media is not supported in your browser
VIEW IN TELEGRAM
NVIDIA introduced the Ising family of models, targeting calibration and error correction in quantum systems. The models are open and already available for local use.
Ising Decoding runs 2.5x faster and delivers 3x higher accuracy than pyMatching, the current standard. Ising Calibration reduces calibration time from days to hours.
The models integrate with CUDA-Q and NVQLink, and are available on Hugging Face.
Please open Telegram to view this post
VIEW IN TELEGRAM
๐คฉ188โค159๐ณ47๐คฏ5๐4๐2
Google has released a native Gemini application for macOS, allowing users to summon the assistant with the Option + Space shortcut over any task.
The app can analyze screen content, handle files, and quickly answer questions without switching tabs. It also supports image and video generation.
Gemini is now globally available for users running macOS 15 and later versions.
Please open Telegram to view this post
VIEW IN TELEGRAM
๐คฉ186โค165๐4๐3
Media is too big
VIEW IN TELEGRAM
OpenAI is pushing Codex far beyond code. The new pitch is simple: one tool for much more than engineering.
This looks like OpenAIโs answer to Claude Code Epitaxy.
Please open Telegram to view this post
VIEW IN TELEGRAM
1โค87๐ฅ72๐ฏ67๐64๐56๐15๐ข12
Google introduced Gemini 3.1 Flash TTS, a new speech model in the Gemini stack focused on controllable voice generation.
The model allows precise control over tone, pace, stress, and overall style using text tags. It also supports multi-speaker output while preserving each voiceโs characteristics, making it usable for longer-form content.
Latency improved by tens of percent, including faster first-token response, bringing it closer to real-time use cases.
Please open Telegram to view this post
VIEW IN TELEGRAM
1๐105โค94๐85๐ฅ27
Apple is reportedly working on a budget desktop called Mac Neo, priced around $300. The move follows Mac mini shortages and strong demand for lower-cost devices.
The device is expected to run on the A19 Pro chip from iPhone 17 Pro. Form factor is similar to Apple TV 4K, with 12GB RAM, USB-C ports, and about 35W power usage.
It is aimed at basic tasks like browsing, documents, and streaming, not heavy workloads where M-series chips dominate.
Please open Telegram to view this post
VIEW IN TELEGRAM
๐66โค47๐33