Virtual fitting room on VideoX-Fun / Wan2.1-I2V-14B
Qwen2.5-VL-7B-Instruct is used for clothing description.
And under the hood, there’s also OpenPose, DensePose, and more.
If anyone has been wanting to fine-tune Wan2.1 for virtual try-on, here it is.
https://vivocameraresearch.github.io/magictryon/
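The clothing-captioning step can be sketched with the standard Hugging Face recipe for Qwen2.5-VL. This is a hypothetical sketch, not MagicTryOn's actual code: the prompt wording and generation settings are my own assumptions.

```python
# Hypothetical sketch: asking Qwen2.5-VL-7B-Instruct (via transformers) to
# describe a garment image, as MagicTryOn reportedly does for clothing
# captions. Prompt text and settings are assumptions, not the project's code.
def describe_garment(image_path: str) -> str:
    # Heavy imports are deferred so the sketch can be imported without the
    # ~7B model (or the qwen_vl_utils helper package) installed.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text",
             "text": "Describe this garment: type, color, fabric, cut, and notable details."},
        ],
    }]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text], images=image_inputs, videos=video_inputs,
        padding=True, return_tensors="pt",
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=128)
    # Strip the prompt tokens, keep only the newly generated caption.
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

The caption then feeds the video pipeline as the garment condition.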
Runway Game Worlds
The name is a bit misleading.
It’s more like Runway Comics Worlds or even Runway’s Board Games.
Because it goes back to the roots — text-based control. It’s basically text adventures: you write a prompt, the game reacts, but also generates an image of what’s happening.
Text games without the need for your imagination.
*“Game Worlds uses new AI technologies for nonlinear storytelling. This means that each game session you play is generated in real time with personalized stories, characters, and multimodal media. In the beta version, you can play both pre-made text adventures and create your own.”*
https://play.runwayml.com/
Feel the difference between Nanabanana and other AI generators.
One of the prompts for the picture was: “make only the plate and the soup itself in the style of 2D anime, and don’t touch anything else at all.”
VibeVoice: a new text-to-speech (TTS) model for long-form conversations with multiple voices from Microsoft.
• 1.5B parameters
• MIT licensed
• Up to 1.5 hours of generation
• Strong emotional expressiveness
More details: VibeVoice is a new framework designed for creating expressive and extended audio recordings of conversations with multiple speakers (such as podcasts) from text. It addresses key issues of traditional text-to-speech (TTS) systems, particularly those related to scalability, speaker consistency, and natural turn-taking.
The model can synthesize up to 90 minutes of speech with up to 4 distinct speakers, exceeding the typical limitations of many previous models restricted to 1–2 speakers.
Project page: https://microsoft.github.io/VibeVoice/ — lots of examples.
You’ll find the weights, code, and even a Gradio demo here: https://86636c494bbddc69c7.gradio.live/
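The multi-speaker input is plain text with speaker labels. Here is a minimal sketch of splitting such a script into (speaker, line) turns, assuming a “Speaker N:” label format like the one in the demo — the exact format is my assumption, not an official VibeVoice spec:

```python
import re

# Minimal sketch: split a labeled transcript into (speaker, text) turns.
# The "Speaker N:" format is assumed from the demo; the 4-speaker cap
# matches the limit mentioned in the post.
TURN = re.compile(r"^Speaker\s+(\d+):\s*(.+)$")

def parse_script(script: str, max_speakers: int = 4) -> list[tuple[int, str]]:
    turns = []
    for line in script.strip().splitlines():
        m = TURN.match(line.strip())
        if not m:
            continue  # skip blank or unlabeled lines
        turns.append((int(m.group(1)), m.group(2)))
    if len({speaker for speaker, _ in turns}) > max_speakers:
        raise ValueError(f"more than {max_speakers} distinct speakers")
    return turns

demo = """
Speaker 1: Welcome to the show.
Speaker 2: Thanks, glad to be here.
Speaker 1: Let's get started.
"""
turns = parse_script(demo)
print(turns)
```

Each turn would then be handed to the model with its speaker's reference voice.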
Examples of applications that can be built on top of Nanabanana—or, as it is now officially called: gemini-2.5-flash-image-preview.
This is done in Google AI Studio, and you can check out examples here: https://aistudio.google.com/apps
What really impressed me was “Gemini Co-Drawing”, which demonstrates the multimodal model’s ability to read hand-drawn diagrams, perform calculations, and follow complex editing instructions.
All of this is available at the link above.
And you can read more about development and pricing here: https://developers.googleblog.com/en/introducing-gemini-2-5-flash-image/
I love Higgsfield’s promos.
Their PR guy is absolutely wild, of course.
It’s clear they got Nanabanana. But while on Freepik it’s only for paid users, with Higgsfield it goes like this:
Unlimited FREE Nano Banana.
With Higgsfield presets coming SOON.
Real-time video generation from Krea.ai
Krea has opened a waitlist for real-time video generation.
12+ fps. You can input a prompt, an image, a screen capture, or even a webcam feed.
You might remember that Krea was the first startup to launch a real-time drawing tool — real-time image generation (there was also Vizcom).
Now they’ve taken a “world model” (it’s unclear whose) and built a sort of “pre-render” of that world.
It looks killer.
https://www.krea.ai/blog/announcing-realtime-video
MovieFloAI
I came across an all-in-one AI tool for video production.
From Twitter: MovieFloAI is an AI-based workflow created by veterans of Lucasfilm and ILM, designed to turn ideas into cinematic videos through an intuitive process.
The process looks like this:
• Write your idea
• Create a synopsis and script
• Choose actors
• Make a storyboard
• Generate images and videos
• Edit and export the final cut
So basically, they’re promising not just generation, but a full end-to-end process.
They’re currently accepting sign-ups for the beta, so you can join in.
The ILM mention is, of course, marketing — but it still works!
Oh, and they have “Nanabanana” free for a week, but you’ll need to register.
https://app.moviefloai.com/ https://x.com/MovieFloAI
There’s some buzz in the news about a presentation generator from Kimi AI. Although Kimi AI is actually a solid Chinese LLM, it feels a bit odd to see such a sub-product from them.
Nobody mentions that the entire product interface is in Chinese — but you can sign up with Google and you’ll get a familiar prompt input window with the letters “PPT” framed by Chinese characters.
I asked it to make a presentation about cats:
Pros:
• Writes fairly meaningful texts
• Lets you choose templates
• Templates don’t look as awful as those from many other slide generators
Cons:
• About a quarter of the backgrounds contain Chinese characters
• Images are generated stylishly and match the template, but have NOTHING to do with the presentation topic (the random cacti were especially funny)
Conclusion: Not usable.
https://www.kimi.com/kimiplus/cvvm7bkheutnihqi2100