Qwen Edit vs Nano Banana vs Flux Kontext Pro & Flux Kontext Dev
Prompt: Turn the motorcycle pink and put it against the backdrop of a big city at night, glowing with huge neon signs.
Nano Banana really delivers!
Virtual fitting room on VideoX-Fun / Wan2.1-I2V-14B
Qwen2.5-VL-7B-Instruct is used for clothing description.
And under the hood, there's also OpenPose, DensePose, and more.
If anyone wanted to fine-tune WAN 2.1 for virtual try-on, here it is.
https://vivocameraresearch.github.io/magictryon/
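As a rough illustration of the captioning step, here is a minimal sketch of asking Qwen2.5-VL-7B-Instruct to describe a garment via the standard `transformers` chat interface. The prompt wording is my own assumption, not taken from the MagicTryOn project, and the whole thing requires the ~7B checkpoint from the Hugging Face Hub.

```python
# Sketch: garment description with Qwen2.5-VL-7B-Instruct (transformers >= 4.49).
# The instruction text below is an assumption, not MagicTryOn's actual prompt.

def build_clothing_messages(image_path: str) -> list:
    """Chat-format messages asking the VLM to describe only the garment."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text",
             "text": "Describe only the clothing item: fabric, color, "
                     "pattern, sleeve length, and fit. One sentence."},
        ],
    }]

def describe_clothing(image_path: str) -> str:
    # Heavy dependencies imported lazily; needs `transformers`, `torch`, `pillow`.
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto")
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_clothing_messages(image_path)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[Image.open(image_path)],
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, not the prompt.
    return processor.batch_decode(
        out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
```

Call `describe_clothing("shirt.jpg")` to get a one-sentence garment caption you could feed into a try-on pipeline.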
Runway Game Worlds
The name is a bit misleading.
It's more like Runway Comics Worlds, or even Runway's Board Games.
Because it goes back to the roots: text-based control. It's basically text adventures: you write a prompt, the game reacts, but also generates an image of what's happening.
Text games without the need for your imagination.
https://play.runwayml.com/
*"Game Worlds uses new AI technologies for nonlinear storytelling. This means that each game session you play is generated in real time with personalized stories, characters, and multimodal media.
In the beta version, you can play both pre-made text adventures and create your own."*
https://play.runwayml.com/
Feel the difference between Nano Banana and other AI generators.
One of the prompts in the picture was: "Make only the plate and the soup itself in the style of 2D anime, and don't touch anything else at all."
VibeVoice: a new text-to-speech (TTS) model for long-form conversations with multiple voices from Microsoft.
• 1.5B parameters
• MIT licensed
• Up to 1.5 hours of generation
• Strong emotional expressiveness
More details: VibeVoice is a new framework designed for creating expressive and extended audio recordings of conversations with multiple speakers (such as podcasts) from text. It addresses key issues of traditional text-to-speech (TTS) systems, particularly those related to scalability, speaker consistency, and natural turn-taking.
The model can synthesize up to 90 minutes of speech with up to 4 distinct speakers, exceeding the typical limitations of many previous models restricted to one or two speakers.
Project page: https://microsoft.github.io/VibeVoice/ (lots of examples).
Youโll find the weights, code, and even a Gradio demo here: https://86636c494bbddc69c7.gradio.live/
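For the input side, long-form multi-speaker scripts are typically fed in as speaker-tagged transcripts. Here is a tiny helper that builds one; note that the exact "Speaker N:" tag format and the 1–4 speaker numbering are my assumptions about the demo's interface, so check the project repo before relying on them.

```python
# Sketch: building a speaker-tagged transcript for a multi-speaker TTS run.
# The "Speaker N:" tag format is an assumed convention, not confirmed from
# the VibeVoice repo; the 4-speaker cap matches the announced model limit.

def format_script(turns: list[tuple[int, str]]) -> str:
    """Turn (speaker_id, utterance) pairs into a speaker-tagged transcript."""
    lines = []
    for speaker, text in turns:
        if not 1 <= speaker <= 4:  # VibeVoice handles up to 4 distinct speakers
            raise ValueError("speaker id must be between 1 and 4")
        lines.append(f"Speaker {speaker}: {text.strip()}")
    return "\n".join(lines)

script = format_script([
    (1, "Welcome back to the podcast."),
    (2, "Thanks! Today we're talking about long-form TTS."),
])
print(script)
```

The resulting text could then be pasted into the Gradio demo or passed to the repo's inference scripts.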
Examples of applications that can be built on top of Nano Banana, or, as it is now officially called, gemini-2.5-flash-image-preview.
This is done in Google AI Studio, and you can check out examples here: https://aistudio.google.com/apps
What really impressed me was "Gemini Co-Drawing", which demonstrates the multimodal model's ability to read hand-drawn diagrams, perform calculations, and follow complex editing instructions.
All of this is available at the link above.
And you can read more about development and pricing here: https://developers.googleblog.com/en/introducing-gemini-2-5-flash-image/
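If you'd rather call the model from code than from AI Studio, here is a minimal sketch using the `google-genai` Python SDK's `generate_content` call, which accepts an image plus a text instruction and can return an edited image as an inline part. The file names and prompt are illustrative; it needs `pip install google-genai pillow` and a `GEMINI_API_KEY` in the environment.

```python
# Sketch: image editing with gemini-2.5-flash-image-preview via google-genai.
# File names and the prompt are illustrative, not from the original post.

def extract_image_bytes(response):
    """Return the first inline image payload from a generate_content response."""
    for part in response.candidates[0].content.parts:
        data = getattr(part, "inline_data", None)
        if data is not None:
            return data.data  # raw image bytes
    return None

def edit_image(image_path: str, prompt: str, out_path: str = "edited.png"):
    # Heavy dependencies imported lazily; requires a valid GEMINI_API_KEY.
    from google import genai
    from PIL import Image

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[Image.open(image_path), prompt],
    )
    data = extract_image_bytes(response)
    if data is None:
        raise RuntimeError("model returned no image part")
    with open(out_path, "wb") as f:
        f.write(data)
```

For the soup example above, something like `edit_image("soup.jpg", "Make only the plate and the soup itself 2D anime style; don't touch anything else.")` would be the equivalent API call.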