Sakana AI introduced Inference-Time Scaling and Collective Intelligence for Frontier AI
AB-MCTS, a new inference-time scaling algorithm that enables multiple frontier AI models to cooperate, achieving promising initial results on the ARC-AGI-2 benchmark.
The AB-MCTS combination of o4-mini + Gemini-2.5-Pro + DeepSeek-R1-0528, all current frontier AI models, achieves strong performance on the ARC-AGI-2 benchmark, outperforming each of the individual models by a large margin.
Many ARC-AGI-2 examples that were unsolvable by any single LLM were solved by combining multiple LLMs. In some cases, an initially incorrect attempt by o4-mini is used by R1-0528 and Gemini-2.5-Pro as a hint to get to the correct solution.
ARC-AGI-2 code.
Implementation of AB-MCTS on GitHub.
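The post gives no algorithmic detail, so the following is only a toy illustration of the general idea of spending an inference budget adaptively across several models. It is not Sakana AI's implementation and leaves out the tree search entirely: it uses plain Thompson sampling over three hypothetical models (`model_a`, `model_b`, `model_c`) with made-up success rates, and a random draw stands in for calling a model and scoring its answer.

```python
import random


def choose_model(stats, rng):
    """Draw a plausible success rate per model from Beta(wins+1, losses+1) and pick the highest."""
    draws = {
        name: rng.betavariate(wins + 1, losses + 1)
        for name, (wins, losses) in stats.items()
    }
    return max(draws, key=draws.get)


def run_search(true_rates, budget, seed=0):
    """Spend `budget` calls, favouring models whose attempts have scored well so far."""
    rng = random.Random(seed)
    stats = {name: (0, 0) for name in true_rates}   # (wins, losses) per model
    calls = {name: 0 for name in true_rates}
    for _ in range(budget):
        name = choose_model(stats, rng)
        calls[name] += 1
        # Stand-in for "call the model and score its answer" (assumption).
        solved = rng.random() < true_rates[name]
        wins, losses = stats[name]
        stats[name] = (wins + solved, losses + (not solved))
    return calls


# Hypothetical success rates, invented for illustration only.
calls = run_search({"model_a": 0.2, "model_b": 0.5, "model_c": 0.8}, budget=500)
print(calls)
```

Over enough calls the budget drifts toward whichever model has been succeeding, while the weaker ones still get an occasional try.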
Meanwhile, Krea.ai has released a Video Restyle tool.
It looks fancy, but under the hood it is just a new video-to-video model from LumaLabs.
This is the Minimax Hailuo 02. It looks like it's been fed every sports report since 1896.
Veo3 Quality and Veo3 Fast
Quality listens to the prompt. Fast is just in a hurry.
Baidu MuseSteamer
Another video generator.
It is buried deep in the jungle of a Chinese-language website and app.
It is called MuseSteamer. It generates clips of up to 10 seconds (the short ones on the website are 5 seconds) at 1080p in any aspect ratio, and it comes in Turbo (in beta now), Lite, and Pro variants.
The main feature is Chinese lip sync (although the videos on the website are silent).
This is a development from Baidu.
Judging by the examples, the quality of the model is the last generation. But on their website they also have a note that this is Model 1.0.
In general, we are waiting for them to go beyond China.
P.S. Is it just me, or are there already more base video models than base image-generation models?
Hunyuan has released a new 3D generator.
Its name is Hunyuan3D-PolyGen.
This is definitely not the original version 2.1, and most likely an upgrade of version 2.5 (which was released without code).
Judging by the video, it looks pretty killer, but I wouldn't trust the videos.
They write that they built their own autoregressive model for retopology, so the meshes can now go straight into games or film.
Now there are also 10,000+ polygons per model and increased generation accuracy.
https://3d.hunyuan.tencent.com/
Well, Google Flow has finally been rolled out to almost the entire world, including Europe. Now you don't have to use a VPN.
https://labs.google/fx/tools/flow
In the video, it's a lipsync on the initial photo.
Musk and the xAI team unveiled Grok 4, a new AI model built on the Colossus supercomputer with 200,000 GPUs, allowing it to perform 10x more RL computations than its competitors.
It leads the Artificial Analysis Intelligence Index with a score of 73, ahead of OpenAI o3 (70), Google Gemini 2.5 Pro (70), Anthropic Claude 4 Opus (64), and DeepSeek R1 (68).
Here's everything you need to know about it:
1. Grok 4 comes in 2 versions:
- Base - a single-agent model, fast and versatile, achieves 35% on Humanity's Last Exam (HLE) and 45% with additional computation.
- Grok 4 Heavy - a multi-agent version, where several AIs work as a "study group" to improve accuracy (50.7% on HLE). Needs some work, but is more powerful.
2. Grok 4 can generate full-fledged video games based on text queries, including code, graphics, and mechanics. The model also analyzes the "fun" of games, assessing their attractiveness. Gaming capabilities are still limited to prototypes, and complex projects require further development.
Black Forest Labs is increasingly following the path of Pika Labs or Higgsfield.
They launched "Kontext Komposer" and "Kontext-powered Presets".
Now you can upload your image and change it by simply selecting a preset rather than writing out a complex prompt. A one-button solution for those who don't know how to write prompts or don't want to.
It's clear that under the hood there is some kind of system prompt for each preset (which LLM, I don't know). So smart people first dug up all these prompts, then plugged them into ComfyUI and even tried them in other generators and chats.
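As a rough sketch of what "a prompt per preset" could look like: the preset names, prompt texts, and the `build_request` helper below are all invented for illustration and are not the actual prompts or API of any product.

```python
# Hypothetical preset table: each button label maps to a hidden, pre-written prompt.
PRESETS = {
    "watercolor": "Repaint the image as a soft watercolor. Keep the subject and composition unchanged.",
    "night": "Relight the scene as night-time with street lamps. Keep all objects in place.",
}


def build_request(preset, image_path):
    """Turn a button click (preset name) into the prompt-based request an editor would send."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset}")
    return {"image": image_path, "prompt": PRESETS[preset]}


request = build_request("watercolor", "cat.png")
print(request["prompt"])
```

Once the prompt texts are known, the same strings can be pasted into any other editor that accepts an image and a text instruction.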
Here we have another competitor to Flux Kontext
This time from Nvidia.
It inserts objects into images.
You can see sweet examples here:
https://research.nvidia.com/labs/par/addit/
Everyone has probably already read that Grok has brought in a couple of avatars, which they call companions.
These are the girl Ani and the red panda Bad Rudi; they say there will be a third and a fourth.
So far they are available only with a SuperGrok subscription, in the iOS app.
You can turn on NSFW mode in their Settings, and then they really let loose.
Look how brilliantly Deemos Tech, the developers of Rodin, one of the best 3D generators, responded.
They simply took this very Ani and converted her into 3D, then did some magic with the materials and put the model into Mixamo for rigging and animation. With Rudi it turned out even more elegant.
And now Ani and Rudi are dancing to the user's tune.
A great example of how one hyped product makes great advertising based on another hyped product.
High Input Fidelity for ChatGPT
Precise image editing in GPT Image-1.
Until now it was clear that, unlike Flux Kontext, ChatGPT adds quite a lot of its own detail to faces, poses, and environments when editing images.
They have just announced the High Input Fidelity parameter, which is designed to preserve details and keep the appearance of the original image more accurately.
It works only through the API and in the playground on the platform: https://platform.openai.com/playground/images/
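A minimal sketch of how the setting might be passed, assuming the parameter is named `input_fidelity` and takes the value `"high"` on the image edit endpoint of the official `openai` Python client. The helper `edit_params` is hypothetical; check the current API reference before relying on these names. The sketch only builds the arguments and does not call the API.

```python
def edit_params(prompt, high_fidelity=True):
    """Build keyword arguments for an image edit request (names are assumptions, see above)."""
    params = {"model": "gpt-image-1", "prompt": prompt}
    if high_fidelity:
        # Assumed parameter name and value for the High Input Fidelity option.
        params["input_fidelity"] = "high"
    return params


params = edit_params("Replace the background with a beach, keep the face unchanged")
print(params)

# With an API key configured, the call would look roughly like this (not executed here):
# from openai import OpenAI
# client = OpenAI()
# with open("photo.png", "rb") as image_file:
#     result = client.images.edit(image=image_file, **params)
```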
Suno v4.5+
More variety in genres
Thicker vocals
Better prompt understanding
Faster generation
More creative beats: Specific improvements for genres like metal and pop-punk aimed at reducing repetition and increasing variety.
More complex harmonic arrangements
Stronger Covers support: can imitate celebrity voices and simulate specific roles and styles
Add Vocals: vocals can be added to generations or to external files. This dramatically speeds up vocal selection.
Add Instrumentals does the opposite: you provide a vocal file (voice recording or vocal fragment), and the AI can immediately generate basic accompaniment for it to create a full song. The same vocals can be quickly used in different instrumental styles.
ChatGPT has introduced restyling at the button level, not at the prompt level. The "styles" are just pre-made prompts.
Looks like Musk, with his avatars Ani and Rudi, has opened the floodgates.
Hedra has just dropped real-time avatars.
They're definitely stepping into the territory of HeyGen and Character.AI here, but it seems we're in for real-time avatars that can be connected not just to Grok or custom models, but to any LLM at all.
At least, that's how it looks with Hedra.