NodeShift Announcements Official
22.8K subscribers
45 photos
7 videos
378 links
Decentralized, no-code AI cloud platform that enables one-click deployment of AI agents and LLMs
Download Telegram
ByteDance, the company behind TikTok, has launched its latest AI-powered human centric video generation model.

Traditional video generation models struggle to sync multiple input types such as Text, Image and Audio, so HuMo by ByteDance is rewriting a new innovation in AI-powered video generation.
Imagine creating realistic human videos with:
🎬 Preserved character identity across scenes
🎤 Synced motion & lip-movement flawlessly with audio
🖼 Blended text, images, and sound into fine-grained, controllable clips

In our latest guide, we dive into the detailed yet to-the-point steps to setup this model on NodeShift GPU environment and generate lifelike cinematic clips.
The generation took longer than what we assumed, do you think the results are worth it?
🔗 Dive in here to see: https://nodeshift.cloud/blog/create-lifelike-human-videos-with-ai-a-guide-to-run-humo-by-bytedance?utm_source=telegram&utm_medium=social&utm_campaign=humo_launch
1
Introducing Tongyi DeepResearch (30B-A3B) – Alibaba’s Breakthrough in Agentic AI

Tongyi DeepResearch (30B-A3B) is a 30-billion parameter Mixture-of-Experts (MoE) model developed by Alibaba Tongyi Lab, with only 3B active parameters per token for efficiency. Unlike general-purpose LLMs, it is purpose-built for deep, long-horizon information-seeking tasks, and it sets new state-of-the-art results across multiple benchmarks like:
Humanity’s Last Exam
BrowserComp & BrowserComp-ZH
WebWalkerQA
GAIA
xbench-DeepSearch
FRAMES

On these benchmarks, Tongyi DeepResearch consistently outperforms other leading models like GLM 4.5, DeepSeek V3.1, Kimi Researcher, Claude-4-Sonnet, and even OpenAI’s DeepResearch agents.

We’ve just published a step-by-step guide on how to install and run Tongyi DeepResearch (30B-A3B) locally or on cloud GPU.

What’s inside the guide?
Model introduction & benchmark results
Complete GPU configuration table (from entry-level to multi-GPU heavy setups)
Step-by-step process to install, set up, and run DeepResearch on NodeShift GPU VMs
Hugging Face authentication & checkpoint download instructions
Running inference in both ReAct-style and Heavy IterResearch mode

If you’re into agentic reasoning models, research agents, and long-horizon information-seeking AI, this guide is a must-read.

Check out the full tutorial here: https://nodeshift.cloud/blog/how-to-install-run-alibaba-tongyi-deepresearch-locally
1🔥1
mmBERT is a modern multilingual encoder (~307M params) trained on 3T+ tokens across 1,800+ languages. Built on the ModernBERT family, it delivers 8K context, fast inference, and state-of-the-art cross-lingual performance for classification, embeddings, retrieval, and reranking—with training tricks like inverse mask scheduling and progressive language addition that especially boost low-resource languages.

We’ve just published a step-by-step guide on how to install and run mmBERT-base locally.

What’s inside the guide
Sanity-check script to validate GPU, dtype, and tokenizer
FastAPI microservice exposing /embed and /mlm endpoints
Streamlit UI for interactive embeddings + masked-LM demos (CSV download included)
GPU sizing cheat sheet: practical VRAM + batch sizes for 512–8K tokens (inference & fine-tuning)
Clear, copy-paste setup for Ubuntu + CUDA, PyTorch, and all Python deps

Who’s it for
Teams adding multilingual search & retrieval (FAISS/pgvector/Milvus)
Builders prototyping classification/reranking on real data
Anyone needing a fast, reliable multilingual encoder with 8K context

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-mmbert-base-locally
2
Struggling with extracting accurate data from complex documents?
In a world where documents are packed with equations, tables, multilingual text, and complex layouts, simple extraction tools just don’t cut it anymore.

IBM's new Granite Docling is an all-rounder in document intelligence. This OCR is a sophisticated multi-modal AI with:
- Precision equation & inline math recognition
- Flexible full-page & region-based inference
- Document-structure QA
- Experimental multilingual support
- Improved stability & reduced loop errors

If you’re handling dense research papers, financial reports, or global documents for data annotation tasks, Granite Docling is built to deliver clarity from complexity. And with NodeShift, deploying and scaling this model is seamless, secure, and production-ready.

Dive into our step-by-step guide on installing & running Granite Docling:
🔗 https://nodeshift.cloud/blog/how-to-install-run-ibm-granite-docling-ocr-for-advanced-document-analysis?utm_source=telegram&utm_medium=social&utm_campaign=granite_docling_launch
1🔥1
Who said small models can’t think big?
Magistral Small 1.2 by Mistral AI has 24B params, multimodal reasoning (text + vision), multilingual support and a 128k context window into a setup you can run locally on a single H100 or even your own GPU-enabled environments.

What’s new in Magistral Small 1.2?
- Vision encoder → reason over images + text
- [THINK] tokens → transparent reasoning traces
- Multilingual support → dozens of languages out of the box
- Smarter formatting + fewer generation loops
- Faster, cleaner, more reliable responses

We’ve put together a step-by-step install guide with copy-paste ready snippets so you can get it running in minutes. If you want to try serious reasoning power without the heavyweight baggage, this is it.

🔗 Full Guide here: https://nodeshift.cloud/blog/how-to-install-and-run-magistral-small-1-2-by-mistral-ai?utm_source=telegram&utm_medium=social&utm_campaign=blog_share
2
Jina Code Embeddings 1.5B is a lightweight yet surprisingly powerful code embedding model—built on Qwen2.5-Coder-1.5B—purpose-tuned for developer workflows. Instead of generic text semantics, it captures the structure and intent of real code across 15+ languages, enabling accurate NL→Code, Code→Code, Code→NL, completion retrieval, and technical QA. It supports 32k tokens for long files, uses last-token pooling, and pairs seamlessly with FlashAttention-2 or SDPA for fast inference.

We’ve just published a new step-by-step guide showing how to run and evaluate the model end-to-end on a GPU VM — from zero to meaningful retrieval results.

What’s inside the guide
GPU sizing & configs (Entry → Enterprise), with practical batch/seq-length tips
Environment setup on a clean CUDA image (Python 3.10, venv, drivers)
Hugging Face auth and dependency installs (Torch, Sentence-Transformers, optional FlashAttention-2)
Two test scripts:
- for a quick sanity check (NL→Code)
- for stress testing across nl2code, code2code, code2nl, code2completion, and QA with distractors
Matryoshka embeddings: try 128–1536 dims and see ranking stability vs storage/speed
Attention backends: flip between FlashAttention-2 and SDPA for the best fit to your hardware
Troubleshooting notes (dtype, padding side, FA2 install, common pitfalls)

If you’re building code search, RAG for repos, or dev tooling, this model hits the sweet spot: cost-efficient, long-context (32k), and flexible via Matryoshka dims — scale from laptop to cluster with simple config tweaks.

Check the full guide here: https://nodeshift.cloud/blog/how-to-install-run-jina-code-embeddings-1-5b-locally
🔥21
Imagine cloning a voice in seconds - tone, accent, rhythm, emotions and all.

That’s what VoxCPM by OpenBMB delivers. It doesn’t rely on tokenization like traditional TTS. Instead, it generates speech in a continuous space, producing output that feels fluid, expressive, and true to life.

With just a short audio clip, VoxCPM can replicate a speaker’s voice with striking accuracy - while also adapting style to match the text’s context. Pair that with real-time synthesis and easy deployment on NodeShift Cloud, and you’ve got one of the most powerful TTS + voice cloning tools available today.

Learn how to install & run it here:
🔗 https://nodeshift.cloud/blog/how-to-install-and-run-voxcpm-realistic-tts-voice-cloning-in-minutes?utm_source=telegram&utm_medium=social&utm_campaign=blog_share
🔥21
Qwen is coming with another model then—meet Qwen3-Omni-30B-A3B-Instruct.

A multilingual, any-to-any omni-modal MoE that understands text, images, audio, and video—and can speak back in natural speech in real time via its native Thinker–Talker design. It pairs long-context reasoning with state-of-the-art ASR/AV, while maintaining strong text & vision performance, and runs smoothly on Transformers or vLLM. Perfect for voice/chat agents, AV understanding, and multimodal RAG.

We just published a step-by-step guide to run this multilingual, any-to-any omni-modal MoE locally/on a NodeShift GPU VM. Qwen3-Omni ingests text, image, audio, and video—and streams back text or natural speech in real time via its native Thinker–Talker design.

What’s inside the guide:
GPU VM setup on NodeShift + quick VRAM tips
Python 3.11 venv and pip setup
Install Torch, Transformers, Qwen Omni Utils, FFmpeg
Ready-to-run script (SDPA; image+audio+text → text/speech)
Troubleshooting + next steps (vLLM, Thinking variant)

Check the full guide here: https://nodeshift.cloud/blog/how-to-install-run-qwen3-omni-30b-a3b-instruct-locally
3
Bring Your Wildest Animation Ideas to Life with Wan2.2 Animate!

From complex motions to precise cinematic aesthetics, Wan2.2 Animate 14B enables creators and enterprises to generate realistic character animations and expressive videos effortlessly.

In our latest guide, we walk you step-by-step on installing and running Wan2.2 Animate 14B, locally or on GPU-accelerated environments like NodeShift Cloud, so you can start generating stunning AI-powered animated videos right there in your machine in no time.

🔗 Check out the full guide: https://nodeshift.cloud/blog/a-step-by-step-guide-to-generating-animated-ai-videos-with-wan2-2-animate?utm_source=telegram&utm_medium=social&utm_campaign=wan2_animate_launch
1
Qwen launches another powerful model — Qwen3Guard-Gen-8B!

Qwen3Guard-Gen-8B is not your typical moderation tool. Built on Qwen3 and trained on 1.19M prompt–response pairs, it goes beyond binary classification by:
Delivering a 3-tier verdict (Safe / Controversial / Unsafe)
Tagging across 10+ categories (Violent, PII, Jailbreak, Political Misinformation, etc.)
Supporting 119 languages
Handling both prompt & response checks
Scaling to 32K context length for real-time deployments

We’ve just published a step-by-step guide to help you install & run Qwen3Guard-Gen-8B on a GPU-powered VM.

What we cover in this guide:
How to spin up a GPU VM on NodeShift
Setting up with the Jupyter template for a ready-to-go environment
Installing Torch + Hugging Face stack & verifying CUDA/GPU
Authenticating with Hugging Face & loading Qwen3Guard-Gen-8B
Running prompt and response moderation checks with parsed outputs
Stress-testing with 25 tricky cases (violence, PII, jailbreak, obfuscation, etc.)

Full tutorial here: https://nodeshift.cloud/blog/how-to-install-run-qwen3guard-gen-8b-locally
1🔥1
Qwen launches another heavyweight multimodal model — Qwen3-VL-235B-A22B-Instruct

Meet Qwen3-VL-235B-A22B-Instruct: a MoE vision-language model with ~235B total params and ~22B active per token. It’s built for image/video + text reasoning, tool-use & visual agents, and long-context understanding (native 256K, extendable).

Highlights: strong OCR (32 langs), robust spatial/temporal grounding for long videos, visual coding (Draw io/HTML/CSS/JS from media), and architectural upgrades like Interleaved-MRoPE, DeepStack, and text–timestamp alignment. Optimized for FlashAttention-2 in multi-image/video workloads.

We’ve just published a step-by-step guide to get Qwen3-VL-235B-A22B-Instruct running on a GPU VM (NodeShift or your cloud of choice).

What the guide covers
Spinning up a GPU VM (H100/A100/H200 tiers) and verifying CUDA + GPU
Installing the vision-language stack (PyTorch, latest Transformers, decord/av)
Optional FlashAttention-2 install for speed + VRAM wins
HF auth + loading Qwen/Qwen3-VL-235B-A22B-Instruct with Qwen3VLMoeForConditionalGeneration
Ready-to-run image & short-video inference cells (with practical VRAM tips, paged-KV, quant notes)

Checkout the full tutorial here: https://nodeshift.cloud/blog/how-to-install-run-qwen3-vl-235b-a22b-instruct-locally
1🔥1
DeepSeek-V3.1-Terminus is here - and it’s a next-level AI powerhouse for reasoning, coding, and agentic tasks!

With this latest update from DeepSeek AI, you get:
⚡️ Smarter Reasoning & Tool Use → Optimized Code & Search Agents
🧠 Consistent Multilingual Output → Fewer mixed-language errors
🛠 Enhanced Agent Templates → Context-aware searches & actions
📊 Benchmark Improvements → Higher scores across reasoning & agentic tasks
💡GGUF Quantized Version → Faster, lighter, and easier to run locally

We’ve made it super easy to get started: our guide walks you through installing & running DeepSeek-V3.1 Terminus GGUF locally with LLaMA.cpp, setting up CUDA acceleration, and leveraging OpenAI-compatible APIs - all while leveraging NodeShift cloud for seamless deployment.

🔗 Read the full guide here: https://nodeshift.cloud/blog/how-to-install-and-run-deepseek-v3-1-terminus-gguf?utm_source=telegram&utm_medium=social&utm_campaign=deepseek-v3-1-launch
2
Introducing Isaac 0.1 — the first open-source perceptive-language model built for the physical world by Perceptron AI.

Isaac-0.1 is a ~2.6B VLM that does grounded spatial reasoning (pointing/boxes), reads fine detail (OCR), and adapts to new visual tasks with a few in-prompt examples—no detector re-training. It runs comfortably on a single 12–24 GB GPU (even smaller with 4/8-bit).

We’ve just published a hands-on guide to get Isaac-0.1 running on a GPU VM (NodeShift or any cloud), complete with a working demo and visualization.

What’s inside the guide
GPU sizing cheat-sheet (4-bit / 8-bit / FP16) with realistic VRAM targets & token budgets
Environment setup: CUDA-ready PyTorch, deps, and a clean Python venv
Minimal inference script using AutoProcessor + tensor_stream (image + prompt)
Grounded outputs → visuals: parse <point_box>/<point> and draw boxes/points; export JSON
Quantization options (bitsandbytes 4-bit/8-bit) and FlashAttention-2 notes
Troubleshooting: OOM fixes, attention-mask warnings, pinning revisions
Bonus workflow: connect your VM to VS Code/Cursor for a smooth dev loop

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-isaac-0-1locally
1🔥1
What if an AI model could see, hear, speak, and understand, all at once?

That’s exactly what Qwen3-Omni-Thinking delivers: a foundation model that combines text, images, audio, and video into one seamless, real-time experience. It’s multilingual, lightning-fast, and sets state-of-the-art benchmarks across speech, vision, and multimodal tasks.

With NodeShift, you can install, run, and experiment with Qwen3-Omni-Thinking instantly, unlocking its cookbooks for speech recognition, video analysis, OCR, audio captioning, and more.
🔗 Dive here: https://nodeshift.cloud/blog/a-step-by-step-guide-to-install-qwen3-omni-thinking?utm_source=telegram&utm_medium=social&utm_campaign=qwen3-omni-thinking
🔥21
Image editing isn’t about filters & photoshops anymore - it’s about control, coherence, & realism. Well, Qwen's latest Qwen-Image-Edit-2509 delivers all three!

What’s new in 2509 upgrade?
- Multi-image editing → Seamlessly combine up to 3 images (person + person, person + product, person + scene).

- Enhanced single-image consistency → Preserve faces, products, and even text styles with stunning accuracy.

- Native ControlNet support → Depth maps, edge maps, keypoints & more for unmatched editing control.

With NodeShift, you can run Qwen-Image-Edit-2509 effortlessly - no messy setup, no complex infra headaches, just private, scalable, and affordable GPU power at your fingertips.
Ready to see what next-level AI image editing looks like?
🔗 Read our step-by-step guide here: https://nodeshift.cloud/blog/a-guide-to-precise-ai-image-editing-with-qwen-image-edit-2509?utm_source=telegram&utm_medium=social&utm_campaign=qwen_image_edit_2509
🔥21
MiMo-Audio-7B-Instruct is Xiaomi’s instruction-tuned audio language model that handles any-to-any tasks across speech and text — from ASR, TTS, and audio understanding to voice conversion, continuation, and style transfer.

Trained on 100M+ hours of audio, it achieves open-source SOTA on speech intelligence benchmarks, while the Instruct variant adds robust “thinking” for both understanding and generation.

In our latest guide, we walk you through a step-by-step process to get MiMo-Audio-7B-Instruct running locally on a GPU VM with CUDA 12, FlashAttention, and Gradio UI:
Setting up a NodeShift GPU VM (or any cloud provider)
Installing Python 3.11+ and dependencies
Configuring PyTorch with CUDA 12.4 wheels
Enabling FlashAttention for speedups
Running the Gradio demo and accessing it via SSH port forwarding
Interacting with the WebRTC interface for real-time ASR/TTS

This setup gives you a fast, privacy-friendly playground for audio tasks—whether you’re building research pipelines, testing speech-to-speech loops, or experimenting with style transfer.

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-mimo-audio-7b-instruct-locally
🔥41
Last time we shared a step-by-step installation guide for setting up the K2-Think model locally.

This time, we’re taking it further → we just published a brand-new AI Agent Building Guide powered by K2-Think, a 32B reasoning model created by UAE’s MBZUAI (Mohamed bin Zayed University of Artificial Intelligence) and G42.

K2-Think is designed for tough reasoning tasks in math, code, and science. It ranks high on benchmarks like AIME, HMMT, and LiveCodeBench, making it a powerful open-weights alternative for advanced problem solving.

What’s inside this new guide:
Building a Math Dueler Agent with two proposers + one referee.
Setting up environment & dependencies.
Writing modular agent scripts.
Integrating Sympy for math verification.
Wrapping everything in a clean Gradio interface.
Launching the app locally on your GPU VM.

Already covered setup & installation? Perfect. Jump straight into this agent guide.

Link: https://nodeshift.cloud/blog/building-a-math-dueler-agent-with-k2-think-step-by-step-guide

Also worth noting → K2-Think is available on NodeShift Sovereign Cloud and NodeShift AI, making it easy to run on trusted infrastructure.
🔥21
Create complete, creative, intelligent visuals with just a simple text-prompt with Tencent's latest HunyuanImage 3.0.

With an 80B Mixture-of-Experts engine and a unified autoregressive framework, it delivers photorealistic, fine-grained images that don’t just follow the prompt, but also reason with them. Sparse prompt? No problem. This model fills in the gaps with world knowledge to produce visuals that feel intentional, accurate, and breathtakingly real.

With NodeShift Cloud’s one-stop GPU platform, you can set up and run HunyuanImage 3.0 effortlessly, skipping the hardware headaches while scaling creativity on demand.

🔗 Checkout our step-by-step guide: https://nodeshift.cloud/blog/how-to-install-and-run-hunyuanimage-3-0?utm_source=telegram&utm_medium=social&utm_campaign=hunyuanimage3
🔥21
Tencent just released something crazy — and we built a full guide around it!

Introducing Hunyuan3D-Omni — Tencent’s newest unified image-to-3D generation framework.

This isn't your average text-to-3D tool. Omni lets you control the generation process with:
Point Clouds
Voxels
3D Bounding Boxes
Skeletal Poses

All through a single control encoder, with options like EMA for smoother results and FlashVDM for faster inference. Runs perfectly with just 10–12 GB VRAM.

In this step-by-step guide, we’ve covered:
GPU requirements
How to set it up on a NodeShift GPU VM
Exact commands to run point, voxel, bbox, and pose-controlled generation
Output formats, inference tips, and more!

Whether you're in gaming, research, or 3D design — this model is worth a spin.

Check out the full tutorial here: https://nodeshift.cloud/blog/how-to-install-run-hunyuan3d-omni-locally
1🔥1
GLM 4.6, the latest release from Zai Org is an AI model that reasons, codes, and acts with unmatched power against some well known names like DeepSeek V3.1 Terminus and Claude Sonnet 4 .

Built on the next-gen GLM-4.6 foundation, it brings:
- 200K token context window – tackle complex tasks like never before
- Superior coding & agent performance – from Claude Code to Roo Code
- Advanced reasoning & tool use – stronger, smarter, more capable agents
- Refined human-aligned writing – natural style and role-playing scenarios

Our latest publish walks you through how to install & run GLM-4.6 locally or on GPU-accelerated environments with copy-paste ready steps.

🔗 Read here: https://nodeshift.cloud/blog/how-to-install-and-run-glm-4-6?utm_source=telegram&utm_medium=social&utm_campaign=glm46_launch
3
Kwaipilot just released KAT-Dev-32B — a powerful open-source coding assistant

KAT-Dev-32B (Kwaipilot/KAT-Dev) is a 32.8B-parameter coding assistant based on Qwen3-32B, purpose-tuned for software engineering tasks.

It’s trained in three stages — mid-training (core skills), SFT + RFT (teacher trajectories), and large-scale agentic RL (prefix caching + trajectory pruning + scalable infra).

On SWE-Bench Verified, KAT-Dev-32B achieves comparable performance with 62.4% resolved and ranks 5th among all open-source models with different scales.

We just published a step-by-step guide on how to set up and run KAT-Dev-32B on a GPU-powered NodeShift VM.

In this guide, we cover:
GPU configuration requirements (single-GPU, multi-GPU, quantized setups)
Step-by-step process to launch a NodeShift GPU VM
Setting up JupyterLab with CUDA & PyTorch ready-to-go
Installing libraries (Torch, Transformers, Accelerate, Einops)
Running KAT-Dev interactively inside a notebook
Generating your first response with the model

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-kat-dev-locally
1🔥1