NodeShift Announcements Official
22.8K subscribers
45 photos
7 videos
378 links
Decentralized, no-code AI cloud platform that enables one-click deployment of AI agents and LLMs
Download Telegram
DeepSeek-V3.1-Terminus is here - and it’s a next-level AI powerhouse for reasoning, coding, and agentic tasks!

With this latest update from DeepSeek AI, you get:
⚡️ Smarter Reasoning & Tool Use → Optimized Code & Search Agents
🧠 Consistent Multilingual Output → Fewer mixed-language errors
🛠 Enhanced Agent Templates → Context-aware searches & actions
📊 Benchmark Improvements → Higher scores across reasoning & agentic tasks
💡GGUF Quantized Version → Faster, lighter, and easier to run locally

We’ve made it super easy to get started: our guide walks you through installing & running DeepSeek-V3.1 Terminus GGUF locally with LLaMA.cpp, setting up CUDA acceleration, and leveraging OpenAI-compatible APIs - all while leveraging NodeShift cloud for seamless deployment.

🔗 Read the full guide here: https://nodeshift.cloud/blog/how-to-install-and-run-deepseek-v3-1-terminus-gguf?utm_source=telegram&utm_medium=social&utm_campaign=deepseek-v3-1-launch
2
Introducing Isaac 0.1 — the first open-source perceptive-language model built for the physical world by Perceptron AI.

Isaac-0.1 is a ~2.6B VLM that does grounded spatial reasoning (pointing/boxes), reads fine detail (OCR), and adapts to new visual tasks with a few in-prompt examples—no detector re-training. It runs comfortably on a single 12–24 GB GPU (even smaller with 4/8-bit).

We’ve just published a hands-on guide to get Isaac-0.1 running on a GPU VM (NodeShift or any cloud), complete with a working demo and visualization.

What’s inside the guide
GPU sizing cheat-sheet (4-bit / 8-bit / FP16) with realistic VRAM targets & token budgets
Environment setup: CUDA-ready PyTorch, deps, and a clean Python venv
Minimal inference script using AutoProcessor + tensor_stream (image + prompt)
Grounded outputs → visuals: parse <point_box>/<point> and draw boxes/points; export JSON
Quantization options (bitsandbytes 4-bit/8-bit) and FlashAttention-2 notes
Troubleshooting: OOM fixes, attention-mask warnings, pinning revisions
Bonus workflow: connect your VM to VS Code/Cursor for a smooth dev loop

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-isaac-0-1locally
1🔥1
What if an AI model could see, hear, speak, and understand, all at once?

That’s exactly what Qwen3-Omni-Thinking delivers: a foundation model that combines text, images, audio, and video into one seamless, real-time experience. It’s multilingual, lightning-fast, and sets state-of-the-art benchmarks across speech, vision, and multimodal tasks.

With NodeShift, you can install, run, and experiment with Qwen3-Omni-Thinking instantly, unlocking its cookbooks for speech recognition, video analysis, OCR, audio captioning, and more.
🔗 Dive here: https://nodeshift.cloud/blog/a-step-by-step-guide-to-install-qwen3-omni-thinking?utm_source=telegram&utm_medium=social&utm_campaign=qwen3-omni-thinking
🔥21
Image editing isn’t about filters & photoshops anymore - it’s about control, coherence, & realism. Well, Qwen's latest Qwen-Image-Edit-2509 delivers all three!

What’s new in 2509 upgrade?
- Multi-image editing → Seamlessly combine up to 3 images (person + person, person + product, person + scene).

- Enhanced single-image consistency → Preserve faces, products, and even text styles with stunning accuracy.

- Native ControlNet support → Depth maps, edge maps, keypoints & more for unmatched editing control.

With NodeShift, you can run Qwen-Image-Edit-2509 effortlessly - no messy setup, no complex infra headaches, just private, scalable, and affordable GPU power at your fingertips.
Ready to see what next-level AI image editing looks like?
🔗 Read our step-by-step guide here: https://nodeshift.cloud/blog/a-guide-to-precise-ai-image-editing-with-qwen-image-edit-2509?utm_source=telegram&utm_medium=social&utm_campaign=qwen_image_edit_2509
🔥21
MiMo-Audio-7B-Instruct is Xiaomi’s instruction-tuned audio language model that handles any-to-any tasks across speech and text — from ASR, TTS, and audio understanding to voice conversion, continuation, and style transfer.

Trained on 100M+ hours of audio, it achieves open-source SOTA on speech intelligence benchmarks, while the Instruct variant adds robust “thinking” for both understanding and generation.

In our latest guide, we walk you through a step-by-step process to get MiMo-Audio-7B-Instruct running locally on a GPU VM with CUDA 12, FlashAttention, and Gradio UI:
Setting up a NodeShift GPU VM (or any cloud provider)
Installing Python 3.11+ and dependencies
Configuring PyTorch with CUDA 12.4 wheels
Enabling FlashAttention for speedups
Running the Gradio demo and accessing it via SSH port forwarding
Interacting with the WebRTC interface for real-time ASR/TTS

This setup gives you a fast, privacy-friendly playground for audio tasks—whether you’re building research pipelines, testing speech-to-speech loops, or experimenting with style transfer.

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-mimo-audio-7b-instruct-locally
🔥41
Last time we shared a step-by-step installation guide for setting up the K2-Think model locally.

This time, we’re taking it further → we just published a brand-new AI Agent Building Guide powered by K2-Think, a 32B reasoning model created by UAE’s MBZUAI (Mohamed bin Zayed University of Artificial Intelligence) and G42.

K2-Think is designed for tough reasoning tasks in math, code, and science. It ranks high on benchmarks like AIME, HMMT, and LiveCodeBench, making it a powerful open-weights alternative for advanced problem solving.

What’s inside this new guide:
Building a Math Dueler Agent with two proposers + one referee.
Setting up environment & dependencies.
Writing modular agent scripts.
Integrating Sympy for math verification.
Wrapping everything in a clean Gradio interface.
Launching the app locally on your GPU VM.

Already covered setup & installation? Perfect. Jump straight into this agent guide.

Link: https://nodeshift.cloud/blog/building-a-math-dueler-agent-with-k2-think-step-by-step-guide

Also worth noting → K2-Think is available on NodeShift Sovereign Cloud and NodeShift AI, making it easy to run on trusted infrastructure.
🔥21
Create complete, creative, intelligent visuals with just a simple text-prompt with Tencent's latest HunyuanImage 3.0.

With an 80B Mixture-of-Experts engine and a unified autoregressive framework, it delivers photorealistic, fine-grained images that don’t just follow the prompt, but also reason with them. Sparse prompt? No problem. This model fills in the gaps with world knowledge to produce visuals that feel intentional, accurate, and breathtakingly real.

With NodeShift Cloud’s one-stop GPU platform, you can set up and run HunyuanImage 3.0 effortlessly, skipping the hardware headaches while scaling creativity on demand.

🔗 Checkout our step-by-step guide: https://nodeshift.cloud/blog/how-to-install-and-run-hunyuanimage-3-0?utm_source=telegram&utm_medium=social&utm_campaign=hunyuanimage3
🔥21
Tencent just released something crazy — and we built a full guide around it!

Introducing Hunyuan3D-Omni — Tencent’s newest unified image-to-3D generation framework.

This isn't your average text-to-3D tool. Omni lets you control the generation process with:
Point Clouds
Voxels
3D Bounding Boxes
Skeletal Poses

All through a single control encoder, with options like EMA for smoother results and FlashVDM for faster inference. Runs perfectly with just 10–12 GB VRAM.

In this step-by-step guide, we’ve covered:
GPU requirements
How to set it up on a NodeShift GPU VM
Exact commands to run point, voxel, bbox, and pose-controlled generation
Output formats, inference tips, and more!

Whether you're in gaming, research, or 3D design — this model is worth a spin.

Check out the full tutorial here: https://nodeshift.cloud/blog/how-to-install-run-hunyuan3d-omni-locally
1🔥1
GLM 4.6, the latest release from Zai Org is an AI model that reasons, codes, and acts with unmatched power against some well known names like DeepSeek V3.1 Terminus and Claude Sonnet 4 .

Built on the next-gen GLM-4.6 foundation, it brings:
- 200K token context window – tackle complex tasks like never before
- Superior coding & agent performance – from Claude Code to Roo Code
- Advanced reasoning & tool use – stronger, smarter, more capable agents
- Refined human-aligned writing – natural style and role-playing scenarios

Our latest publish walks you through how to install & run GLM-4.6 locally or on GPU-accelerated environments with copy-paste ready steps.

🔗 Read here: https://nodeshift.cloud/blog/how-to-install-and-run-glm-4-6?utm_source=telegram&utm_medium=social&utm_campaign=glm46_launch
3
Kwaipilot just released KAT-Dev-32B — a powerful open-source coding assistant

KAT-Dev-32B (Kwaipilot/KAT-Dev) is a 32.8B-parameter coding assistant based on Qwen3-32B, purpose-tuned for software engineering tasks.

It’s trained in three stages — mid-training (core skills), SFT + RFT (teacher trajectories), and large-scale agentic RL (prefix caching + trajectory pruning + scalable infra).

On SWE-Bench Verified, KAT-Dev-32B achieves comparable performance with 62.4% resolved and ranks 5th among all open-source models with different scales.

We just published a step-by-step guide on how to set up and run KAT-Dev-32B on a GPU-powered NodeShift VM.

In this guide, we cover:
GPU configuration requirements (single-GPU, multi-GPU, quantized setups)
Step-by-step process to launch a NodeShift GPU VM
Setting up JupyterLab with CUDA & PyTorch ready-to-go
Installing libraries (Torch, Transformers, Accelerate, Einops)
Running KAT-Dev interactively inside a notebook
Generating your first response with the model

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-kat-dev-locally
1🔥1
MinerU2.5-2509-1.2B — A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

MinerU2.5 is a compact 1.2B VLM with a smart two-stage, coarse-to-fine pipeline (global layout → native-res crops) that delivers state-of-the-art doc parsing with low compute. On OmniDocBench it tops the charts—overall 90.67, leading on Text (95.34), Formula (88.46), Table (88.22), and Reading Order (96.62)—outperforming many larger OCR/VLM systems.

What’s inside our new guide
Setup end-to-end on a GPU VM (we demo with NodeShift, works anywhere)
Two paths: Transformers (simple) & vLLM (fast + scalable, async engine ready)
Copy-paste scripts to run two_step_extract() on your pages
VRAM sizing & perf tips (quantization, token budgets, image sizing)
Outputs you can use: structured blocks → Markdown, tables, formulas

Read the guide here: https://nodeshift.cloud/blog/how-to-install-run-mineru2-5-2509-1-2b-locally
🔥21
Struggling to get AI assistants to follow complex instructions or handle multilingual tasks?
IBM's Granite-4.0-Micro is here to help enterprises with instruction-following LLMs. This 3B-parameter mini-package brings:
- Accurate summarization & text extraction
- Question-answering & Retrieval-Augmented Generation (RAG)
- Code completions & function-calling tasks
- Multilingual dialog support across 13+ languages

If you’re building AI agents, automating enterprise workflows, or experimenting with advanced LLMs, Granite-4.0-Micro delivers the flexibility and precision you need once you fine-tune or customize it with your own data.
And with NodeShift Cloud, setup, deployment, and scaling are effortless, secure, and GPU-accelerated for enterprises thinking about long term stability.

Here’s a latest demo guide from us for installing & running Granite-4.0-Micro locally:
🔗 Link: https://nodeshift.cloud/blog/get-started-with-ibm-granite-4-0-micro-for-enterprise-rag-summarization-qa-code-tasks?utm_source=telegram&utm_medium=social&utm_campaign=granite_4_micro_launch
2🔥1
What if AI doesn't generate just basic transcriptions, instead understands the audio and describe it with human-level depth?

Meet Qwen3-Omni-30B-A3B-Captioner, a powerful audio captioning model that generates fine-grained, low-hallucination captions across any soundscape.
From multilingual speech and layered emotions to environmental noise, music, and cinematic effects, it delivers detailed, context-aware audio descriptions without requiring extra prompts.

And the best part? With NodeShift Cloud, you can install, run, and start experimenting instantly in a CUDA-ready environment, no complex setup, just speed and scale in minutes.
🔗 Read the full guide here: https://nodeshift.cloud/blog/how-to-install-qwen3-omni-captioner-for-accurate-audio-captioning?utm_source=telegram&utm_medium=social&utm_campaign=qwen3-omni-captioner
2🔥2
Liquid AI just dropped LFM2‑2.6B — a next‑generation hybrid model built for edge AI & on‑device deployment.

With 2.6 B parameters, multiplicative gates + short convolutions, and support for 8 languages, it’s one of the few open models designed to run smoothly on CPU, GPU, and even NPU hardware.

What you can build with it:
✔️ Lightweight tool‑calling agents that work offline or on your laptop
✔️ Data extraction & RAG workflows on private documents
✔️ Conversational assistants with multilingual support
✔️ Creative writing, summarization, etc

What’s inside our new guide
✔️ How to install & run LFM2‑2.6B locally with Transformers
✔️ How to serve it via vLLM for fast, scalable inference
✔️ How to build a minimal agent that calls functions (time, math, RAG) step‑by‑step
✔️ VRAM & GPU tips (BF16 vs. 4‑bit, FlashAttention‑2, sweet spots)

Read the full guide here: https://nodeshift.cloud/blog/pocket-operator-a-local-tool-calling-agent-powered-by-lfm2-2-6b
1👏1
HUGE RELEASE ALERT!
Qwen team has just dropped a major upgrade of Qwen2.5-VL, the most popular vision model in AI industry, which is used by many big players to fine-tune their domain specific vision models.

The newest version is Qwen3-VL, Alibaba’s new multimodal vision-language model that’s breaking benchmarks and expectations.
We just dropped a full guide on how to install and run Qwen3-VL Locally - step-by-step, clean, and fast.
🧠 Expect next-level multimodal understanding
🎥 Vision + Text synergy
⚡️ Lightning-fast inference with NodeShift

🔗 Read now: https://nodeshift.cloud/blog/how-to-install-run-qwen3-vl-locally-a-step-by-step-guide?utm_source=telegram&utm_medium=social&utm_campaign=qwen3-vl_announcement
4
Media is too big
VIEW IN TELEGRAM
IBM launches Granite 4.0-H — a family of long-context, tool-calling LLMs built for real work.

Three sizes, same DNA:
Micro-H (3B, 1M ctx): lightweight & snappy for JSON/IE, routing, short multilingual chat, FIM code.
Tiny-H (7B, 1M ctx): the sweet spot—stronger reasoning, multi-turn assistants, compact RAG, solid tool-calling.
Small-H (32B, 1M ctx): muscle for complex workflows, long-doc comprehension, higher-fidelity coding & analysis.

We just published a hands-on guide to get you productive fast:

What’s inside
Two setup paths: Ollama + Open WebUI (fast chats) & Transformers/vLLM (prod services)
GPU sizing tables for Micro/Tiny/Small + why we standardize on 1×H200
A mini benchmark/prompt pack to compare the three models
Tool-calling scripts (emit/parse <tool_call> and feed <tool_response>)
Minimal Python examples (BF16 & 4-bit) + sanity checks & troubleshooting

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-ibm-granite-4-0-h-tiny-small-and-micro-locally
1🔥1
Big TTS models, running on heavy hardwares, still delivering robotic voices?

Forget them, as NeuTTS Air brings super-realistic, on-device voice AI with instant voice cloning, can run easily on CPUs, no heavy GPUs needed.
- Generate ultra-human voices in real-time
- Clone any speaker in just 3 seconds of audio
- Optimized for laptops, phones & even Raspberry Pis

NeuCodec-powered audio ensures crystal-clear quality with low power consumption.
TL;DR: It’s realistic speech + instant voice cloning + on-device performance, all in one compact model.

In our latest guide, we show you how to install and run NeuTTS Air locally, with NodeShift cloud making setup and GPU-accelerated deployment effortless, get lifelike voice AI running in minutes.
🔗 Read here: https://nodeshift.cloud/blog/how-to-install-and-run-neutts-air-locally-super-realistic-on-device-voice-ai-with-instant-voice-cloning?utm_source=telegram&utm_medium=social&utm_campaign=neutts_air_launch
3
This media is not supported in your browser
VIEW IN TELEGRAM
Meta AI just launched the Code World Model (CWM)!

The Code World Model (CWM) is a 32B parameter dense autoregressive LLM developed by the Meta FAIR CodeGen Team. Unlike traditional code models, CWM was mid-trained on Python execution traces, memory trajectories, and containerized agentic interactions—making it uniquely suited for reasoning about how code affects computational environments.

What’s special about CWM?
Mid-trained on real execution traces & agentic environments
Post-trained with multi-task RL for verifiable coding, math, and multi-turn software engineering
Research-only (non-commercial) release under FAIR license
Strong benchmark performance on Math-500, AIME, and SweBench

We just dropped a full step-by-step guide on:
🔹 Requesting gated access
🔹 Running on a NodeShift GPU VM
🔹 Serving with vLLM
🔹 Streamlit UI for interaction

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-facebook-cwm-locally
1🔥1
Still relying on slow, API based TTS that sounds robotic, can't run on small devices and breaks the bank?
Meet KaniTTS, the latest high-speed, high-fidelity voice AI that runs entirely on your device with just some basic GPU acceleration.

What makes it special:
- Powered by a 370M LLM + Neural Audio Codec for ultra-natural, real-time speech
- ~1 sec latency for 15 seconds of audio, perfect for chatbots, assistants & accessibility tools
- Multilingual: English, German, Chinese, Korean, Arabic & Spanish
- Runs locally with just 2-4GB GPU memory, no APIs, no data leaks, no lag

With NodeShift cloud, setting it up is effortless, GPU-optimized, ready-to-run, and privacy-first.
Get studio-quality speech generation right on your own hardware in minutes.
🔗 Read the full guide: https://nodeshift.cloud/blog/how-to-install-and-run-kanitts-locally-real-time-on-device-voice-generation?utm_source=telegram&utm_medium=social&utm_campaign=kanitts_launch
2
The ModernVBERT team has just unleashed a compact 250M-parameter vision-language model, which is matching the performance of models up to 10x larger, performing way above its weight!

With state-of-the-art multimodal reasoning, advanced document retrieval capabilities, and seamless image + text understanding, ModernVBERT is your go-to model for next-level AI & RAG workflows.

We’ve published a step-by-step guide to install and run ModernVBERT locally - fast, clean, and ready for experimentation.
- Unlock multimodal intelligence
- Advanced visual document comprehension
- Optimized for lightning-fast local inference with NodeShift Cloud

🔗 Read the full guide here: https://nodeshift.cloud/blog/how-to-install-modernvbert-compact-vlm-for-document-retrieval-in-rag-applications?utm_source=telegram&utm_medium=social&utm_campaign=modernvbert_launch
2🔥1
Media is too big
VIEW IN TELEGRAM
ServiceNow just released: Apriel-1.5-15B-Thinker

Apriel-1.5-15B-Thinker is an open-weights, multimodal reasoning model (image-text-to-text) focused on strong mid-training/continual pre-training plus high-quality text SFT—no RL required. It’s compact (15B) yet competitive with much larger models and designed to run on a single GPU.

We just published a step by step guide to install and run Apriel locally—plus a simple Streamlit UI so you can chat with the model and ask questions about images.

What the guide covers:
Picking a GPU + VRAM sizing tips
CUDA/PyTorch install (cu121) & env setup (Py 3.11)
One-file for text + vision with the correct dtype cast (BF16/FP16)
Optional Streamlit app (text & image tabs, sliders for temp/tokens)
Tuning for speed/VRAM (token limits, fp16, 8-bit options)

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-servicenow-apriel-1-5-15b-thinker-locally
🔥21