Chinese lab MiniMax introduced this week:
1. Open-source LLM MiniMax-M1 - setting new standards in long-context reasoning.
- World's longest context window: 1M-token input, 80k-token output
- State-of-the-art agentic use among open-source models
- RL at unmatched efficiency: trained for just $534,700
2. Hailuo 02 - world-class quality, record-breaking cost efficiency
- Best-in-class instruction following
- Handles extreme physics
- Native 1080p
3. MiniMax Audio:
- Any prompt, any voice, any emotion
- Fully customizable and multilingual
4. Hailuo Video Agent in beta - zero-touch "vibe videoing".
MiniMax plans to reach an end-to-end Hailuo Video Agent in three stages:
Stage 1: Prebuilt video-agent templates for high-quality creative videos. Users simply follow the instructions and input text or images; with one click, a polished video is generated.
Stage 2: Semi-customizable video agent. Users gain the flexibility to edit any part of the video-creation process, from script to visuals to voiceover.
Stage 3: Fully autonomous, end-to-end video agent. A complete, intelligent workflow that turns creative input into a final-cut video with minimal manual effort.
The team plans to gradually roll out the Stage 2 creation tools this summer.
5. MiniMax Agent, a general intelligent agent designed to tackle long-horizon, complex tasks.
From expert-level multi-step planning to flexible task breakdown and end-to-end execution, it's designed to act like a reliable teammate, with strengths in:
- Programming & tool use
- Multimodal understanding & generation
- Seamless MCP integration
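If you want to poke at M1's long context programmatically, a minimal sketch of an OpenAI-style chat request might look like this. Note the model identifier and field layout here are assumptions for illustration, not taken from MiniMax's docs - check their official API reference before using them.

```python
import json

# Hypothetical OpenAI-style chat payload for MiniMax-M1. The model
# identifier and field names are assumptions, not confirmed by this post.
def build_m1_request(prompt: str, max_tokens: int = 1024) -> dict:
    return {
        "model": "MiniMax-M1",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        # The post claims up to 80k output tokens; cap requests well below.
        "max_tokens": max_tokens,
    }

payload = build_m1_request("Summarize the attached 1M-token codebase.")
print(json.dumps(payload, indent=2))
```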
MiniMax
Building AGI with our mission: Intelligence with Everyone. A global leader in multi-modal models and AI-native products, with over 200 million users.
The new Minimax Hailuo 02 has already arrived at Krea.ai.
Minimax
Create Lifelike Audio - the fifth release from MiniMax in the last 7 days.
Text2Speech, Voice Cloning, Voice Design, and even music generation (no miracles here; Suno is still the best).
Breaking! New Hailuo Video 02 beats Google Veo 3 and Kling?
What if the best AI video model right now wasn't from Google or OpenAI? There's a new beast climbing the leaderboard, and it just knocked out Veo 3 and Kling 2.0. This is Hailuo Video 02, and it's shaking up the whole AI video scene. Let me show you why creators are freaking out.
Get Hailuo 02 here and try their new video agents! 500 credits bonus to new signups!
Get 500 bonus credits https://hailuoai.video/?utm_source=YT&utm_medium=kol&utm_campaign=Vortex
https://www.youtube.com/watch?v=JOtM8UE0SKA
Hunyuan GameCraft - a neural game engine from Tencent
It looks a cut above Genie 2 and other competitors, while being much more interactive. It is based on Hunyuan Video, which was tuned on gameplay from more than a hundred AAA projects - Assassin's Creed, Red Dead Redemption, and Cyberpunk 2077. The result is predictable: some games from the dataset are easily recognizable in the model's output.
Suno - WavTool!
WavTool is browser-based DAW (Digital Audio Workstation) software that combines professional music-production features (VST plugin support, sample-accurate editing, real-time recording, and more) with native AI capabilities such as stem splitting, AI-generated MIDI files, and a built-in chatbot for real-time music editing.
New feature from Kling AI: Video2Audio.
It seems that even free plans let you try it.
Qwen VLo - generating images, videos, and editing everything that moves.
Qwen3 has been updated.
Its chat is fully multimodal, accepting documents, images, videos, and even audio as input.
And it generates just as broadly: video, image analysis, even brainstorming.
In general, Qwen VLo generates something like this.
ByteDance has unveiled XVerse, a new text2image model that lets you manipulate multiple people and attributes in a single frame.
You enter a prompt, then adjust each subject's identity, pose, style, or lighting without disturbing the rest of the scene.
Under the hood, it uses a history-aware DiT flow-modulation pipeline to keep every face and object consistent, even in complex multi-subject layouts.
Sakana AI introduced Inference-Time Scaling and Collective Intelligence for Frontier AI
AB-MCTS is a new inference-time scaling algorithm that enables multiple frontier AI models to cooperate, with promising initial results on the ARC-AGI-2 benchmark.
The Multi-LLM AB-MCTS combination of o4-mini + Gemini-2.5-Pro + DeepSeek-R1-0528, all current frontier AI models, achieves strong performance on ARC-AGI-2, outperforming each individual model by a large margin.
Many ARC-AGI-2 examples that were unsolvable by any single LLM were solved by combining multiple LLMs. In some cases, an initially incorrect attempt by o4-mini is used by R1-0528 and Gemini-2.5-Pro as a hint toward the correct solution.
ARC-AGI-2 code.
Implementation of AB-MCTS on GitHub.
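As a toy illustration (not Sakana's actual implementation), the core idea of AB-MCTS - deciding at each step whether to go wider (sample a fresh answer, possibly from a different model) or deeper (refine an existing candidate) - can be sketched like this, with stand-in "models":

```python
import random

def make_mock_model(bias):
    """A fake LLM: 'refining' an answer nudges its score up by at most `bias`."""
    def model(prev_score=0.0):
        return min(1.0, prev_score + random.uniform(0, bias))
    return model

def ab_search(models, steps=20, widen_threshold=0.5):
    """Toy wider-vs-deeper search over multiple mock models."""
    random.seed(0)          # deterministic for illustration
    candidates = []         # list of (score, model_index)
    for _ in range(steps):
        best = max(candidates, default=(0.0, 0))
        if not candidates or best[0] < widen_threshold:
            # "Go wider": sample a fresh answer from a random model.
            i = random.randrange(len(models))
            candidates.append((models[i](), i))
        else:
            # "Go deeper": refine the current best candidate with its model.
            score, i = best
            candidates.append((models[i](score), i))
    return max(candidates)[0]

models = [make_mock_model(0.3), make_mock_model(0.5), make_mock_model(0.2)]
print(f"best score after search: {ab_search(models):.2f}")
```

The real algorithm grows a search tree and uses Bayesian estimates to pick between widening and deepening; this sketch only captures the branching decision.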
Meanwhile, Krea.ai has released a Video Restyle tool.
It looks fancy, but under the hood it's just a new video-to-video model from LumaLabs.
This is the Minimax Hailuo 02. It looks like it's been fed every sports report since 1896.
Veo3 Quality and Veo3 Fast
Quality listens to the prompt. Fast is just in a hurry.
Baidu MuseSteamer
Another video generator.
It is buried deep in a Chinese-only website and app.
MuseSteamer can do 10-second clips (the small samples on the website are 5 seconds), 1080p, any aspect ratio, and comes in Turbo (in beta now), Lite, and Pro variants.
The main feature is Chinese lip sync (although the videos on the website are silent).
It's a development from Baidu.
Judging by the examples, the model's quality is last-generation. But the website also notes that this is Model 1.0.
In short, we are waiting for them to launch beyond China.
P.S. Is it just me, or are there already more base video models than base image-generation models?
Hunyuan has released a new 3D generator.
Its name is Hunyuan3D-PolyGen.
This is definitely not the original version 2.1, and most likely an upgrade of version 2.5 (which has no released code).
Judging by the video, it looks pretty killer, but I wouldn't trust the videos.
They write that they built their own autoregressive model for retopology, so the output is now usable in games or films.
Models now reach 10,000+ polygons, with increased generation accuracy.
https://3d.hunyuan.tencent.com/
Well, Google Flow has finally been rolled out to almost the entire world, including Europe. Now you don't need a VPN.
https://labs.google/fx/tools/flow
The video shows a lipsync applied to the initial photo.
Musk and the xAI team unveiled Grok 4, a new AI model built on the Colossus supercomputer with 200,000 GPUs, allowing it to perform 10x more RL computation than its competitors.
It leads the Artificial Analysis Intelligence Index with a score of 73, ahead of OpenAI o3 (70), Google Gemini 2.5 Pro (70), Anthropic Claude 4 Opus (64), and DeepSeek R1 (68).
Here's everything you need to know about it:
1. Grok 4 comes in 2 versions:
- Base: a single-agent model, fast and versatile; achieves 35% on Humanity's Last Exam (HLE), and 45% with additional computation.
- Grok 4 Heavy: a multi-agent version, where several AIs work as a "study group" to improve accuracy (50.7% on HLE). Needs some work, but it is more powerful.
2. Grok 4 can generate full-fledged video games from text queries, including code, graphics, and mechanics. The model also analyzes the "fun" of games, assessing their appeal. Gaming capabilities are still limited to prototypes, and complex projects require further development.
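The "Heavy" idea of several agents attacking the same problem and comparing notes can be illustrated with a simple majority-vote ensemble. The agents below are mock functions, not xAI's actual setup:

```python
from collections import Counter

# Toy multi-agent ensemble: each "agent" proposes an answer, and the
# group keeps the most common one. Mock lambdas stand in for real models.
def heavy_vote(agents, question):
    answers = [agent(question) for agent in agents]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)

agents = [
    lambda q: "42",  # agent A
    lambda q: "42",  # agent B agrees
    lambda q: "41",  # agent C dissents
]
answer, agreement = heavy_vote(agents, "What is 6 * 7?")
print(answer, agreement)  # "42" wins with 2/3 agreement
```

Real multi-agent systems also let agents exchange intermediate reasoning rather than just vote, but voting is the simplest form of the same principle.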
Black Forest Labs is increasingly following the path of Pika Labs or Higgsfield.
They launched "Kontext Komposer" and "Kontext-powered Presets".
Now you can upload your image and ask for changes by simply selecting a preset, rather than writing out a complex prompt - a one-button solution for those who can't or don't want to write prompts.
Clearly, under the hood there is some kind of system prompt for each preset (which LLM, I don't know). So enterprising people first dug up all these prompts, then wired them into ComfyUI and even tried them in other generators and chats.
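Mechanically, the preset trick amounts to mapping a button to a canned prompt. A minimal sketch - the preset names and prompt texts below are invented for illustration, not BFL's actual prompts:

```python
# Hypothetical preset-to-prompt table; the texts are illustrative only --
# the real Kontext presets ship their own (undisclosed) system prompts.
PRESETS = {
    "relight": "Relight the scene with warm golden-hour sunlight.",
    "cartoonify": "Redraw the image in a flat cartoon style with bold outlines.",
}

def preset_to_prompt(preset: str, extra: str = "") -> str:
    """Expand a one-click preset into the full editing prompt."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset}")
    return (PRESETS[preset] + " " + extra).strip()

print(preset_to_prompt("relight", "Keep the subject's face unchanged."))
```

This is exactly why the prompts could be "dug up" and reused elsewhere: once extracted, the table works in any image editor that accepts text prompts.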