Higgsfield keeps dropping viral features.
Product-to-Video is basically Flux Kontext, but for video.
Pika and Runway have done something similar, but Higgsfield's cherry-picks look really polished.
Qwen Edit vs Nano Banana vs Flux Kontext Pro & Flux Kontext Dev
Prompt: Turn the motorcycle pink and put it against the backdrop of a big city at night, glowing with huge neon signs.
Nano Banana really delivers!
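If you want to reproduce the comparison locally, the open Kontext [dev] checkpoint runs through diffusers. A minimal sketch (the input filename and guidance value are my assumptions; the checkpoint is gated on Hugging Face):

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

# FLUX.1 Kontext [dev]: instruction-driven image editing.
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

src = load_image("motorcycle.jpg")  # assumed input image
edited = pipe(
    image=src,
    prompt=("Turn the motorcycle pink and put it against the backdrop of a "
            "big city at night, glowing with huge neon signs."),
    guidance_scale=2.5,  # starting point from the model card's example; tune to taste
).images[0]
edited.save("motorcycle_pink.png")
```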
Virtual fitting room on VideoX-Fun / Wan2.1-I2V-14B
Qwen2.5-VL-7B-Instruct is used for the clothing description.
And under the hood there's also OpenPose, DensePose, and more.
If anyone wanted to fine-tune WAN 2.1 for virtual try-on: here it is.
https://vivocameraresearch.github.io/magictryon/
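For reference, here is a minimal sketch of pulling a garment description out of Qwen2.5-VL-7B-Instruct with transformers and qwen-vl-utils; the prompt wording and filename are my assumptions, not MagicTryOn's actual pipeline code:

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

MODEL = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL)

# The instruction below is illustrative; MagicTryOn's real prompt may differ.
messages = [{"role": "user", "content": [
    {"type": "image", "image": "garment.jpg"},  # assumed local file
    {"type": "text", "text": "Describe this garment: type, color, fabric, "
                             "sleeves, neckline, prints and logos."},
]}]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(processor.batch_decode(
    out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0])
```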
Runway Game Worlds
The name is a bit misleading.
It's more like Runway Comics Worlds, or even Runway Board Games.
Because it goes back to the roots of text-based control. It's basically a text adventure: you write a prompt, the game reacts, and it also generates an image of what's happening.
Text games without the need for your imagination.
"Game Worlds uses new AI technologies for nonlinear storytelling. This means that each game session you play is generated in real time with personalized stories, characters, and multimodal media. In the beta version, you can play both pre-made text adventures and create your own."
https://play.runwayml.com/
Feel the difference between Nano Banana and other AI generators.
One of the prompts in the picture was: 'Make only the plate and the soup itself in the style of 2D anime, and don't touch anything else at all.'
VibeVoice: a new text-to-speech (TTS) model from Microsoft for long-form conversations with multiple voices.
• 1.5B parameters
• MIT licensed
• Up to 1.5 hours of generation
• Strong emotional expressiveness
More details: VibeVoice is a new framework designed for creating expressive and extended audio recordings of conversations with multiple speakers (such as podcasts) from text. It addresses key issues of traditional text-to-speech (TTS) systems, particularly those related to scalability, speaker consistency, and natural turn-taking.
The model can synthesize up to 90 minutes of speech with up to 4 distinct speakers, exceeding the typical limitations of many previous models restricted to 1β2 speakers.
Project page: https://microsoft.github.io/VibeVoice/ β lots of examples.
Youβll find the weights, code, and even a Gradio demo here: https://86636c494bbddc69c7.gradio.live/
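To give a feel for the input, multi-speaker scripts in the project's examples are plain text with speaker-labelled turns. A sketch below; the transcript format follows the released demos, but the CLI invocation in the comment is a hypothetical reconstruction, so check the repo for the actual entry point:

```python
# Transcript format: one "Speaker N:" turn per line (per the project's demo files).
script = """\
Speaker 1: Welcome back. Today we're talking about long-form TTS.
Speaker 2: Right, and the hard part is keeping several voices consistent
Speaker 2: across a ninety-minute session.
Speaker 1: Exactly. Let's get into it.
"""

with open("podcast_script.txt", "w", encoding="utf-8") as f:
    f.write(script)

# Hypothetical invocation (script name and flags are assumptions, see the repo README):
#   python demo/inference_from_file.py --model_path microsoft/VibeVoice-1.5B \
#       --txt_path podcast_script.txt --speaker_names Alice Bob
```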