NodeShift Announcements Official
Decentralized, no-code AI cloud platform that enables one-click deployment of AI agents and LLMs
Microsoft just dropped Kosmos-2.5 — a multimodal “literate” model built to read text-heavy images.

It does two things out of the box:
<ocr> → OCR with spatially-aware text blocks (text + bounding boxes)
<md> → image → Markdown conversion for clean, structured docs
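To make "text + bounding boxes" concrete, here is a minimal sketch of turning OCR-style output into structured records for downstream parsing. The tag layout (`<bbox><x_..><y_..><x_..><y_..></bbox>` followed by the text) and the sample string are assumptions made for illustration, so check the guide or the model card for the exact output format.

```python
import re
from typing import NamedTuple


class TextBlock(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int
    text: str


# Assumed tag layout: <bbox><x_A><y_B><x_C><y_D></bbox>text
BBOX = re.compile(r"<bbox><x_(\d+)><y_(\d+)><x_(\d+)><y_(\d+)></bbox>([^<]*)")


def parse_ocr(raw: str) -> list[TextBlock]:
    """Split raw OCR output into (box, text) records."""
    blocks = []
    for m in BBOX.finditer(raw):
        x0, y0, x1, y1 = (int(g) for g in m.groups()[:4])
        blocks.append(TextBlock(x0, y0, x1, y1, m.group(5).strip()))
    return blocks


# Hypothetical sample output, not produced by the real model.
sample = (
    "<bbox><x_10><y_20><x_200><y_40></bbox>Invoice #1234\n"
    "<bbox><x_10><y_50><x_120><y_70></bbox>Total: 42.00\n"
)
for block in parse_ocr(sample):
    print(block)
```

Keeping the coordinates next to each text span is what lets later steps group lines into table rows or form fields.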

We’ve just published a step-by-step guide to run Kosmos-2.5 on a GPU VM and use it from a browser-based Streamlit WebUI.

What’s inside the guide
GPU setup on NodeShift (works on any cloud)
Precise GPU VRAM matrix (12–48 GB+) and memory levers (bf16, max_patches, FlashAttention-2)
Minimal Python scripts for <md> and <ocr>
One-click Streamlit WebUI to upload docs and get Markdown or OCR+boxes
Tips for large pages, long outputs, and batching

Why this matters
Turn messy receipts, invoices, forms, and scans into usable Markdown
Keep layout awareness with OCR bounding boxes for downstream parsing
Runs with Transformers ≥ 4.56 and standard PyTorch CUDA wheels

Try it
Spin up a GPU, follow the commands, and open the WebUI in your browser. You’ll be extracting Markdown or drawing OCR boxes in minutes.

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-microsoft-kosmos-2-5-locally

Ever imagined turning a single image into a fully immersive 3D experience?

Tencent has launched HunyuanWorld-Voyager, a one-of-a-kind video diffusion framework that generates world-consistent 3D images and videos from just one image!

Unlike previous models, Voyager ensures frame-to-frame consistency, long-range exploration, and automated scene reconstruction, delivering stunning visuals and precise 3D geometry without manual 3D pipelines.

If you’re into creative multimedia projects, simulations, or large-scale dataset creation, Voyager opens up endless possibilities.

Check out our complete setup guide here: https://nodeshift.cloud/blog/how-to-install-hunyuanworld-voyager-create-stunning-3d-images-videos-from-a-single-image?utm_source=telegram&utm_medium=social&utm_campaign=hunyuanworld_voyager_launch

R-4B is trending on Hugging Face: another auto-thinking MLLM to watch.

What it is: R-4B is a multimodal large language model that automatically decides when to think step-by-step and when to answer directly. Through Bi-mode Annealing (build both skills) and Bi-mode Policy Optimization (switch at inference), it delivers strong reasoning without wasting compute. It now runs smoothly with vLLM for fast, scalable serving and exposes a simple thinking_mode control (auto / long / short).

Why it matters (benchmarks): R-4B shows SOTA-level results among <20B open models on multiple multimodal reasoning suites, edging out popular peers:
✔️ MMMU: 68.1 (vs Keye-VL-8B 66.8, InternVL3.5-4B 66.6, Qwen2.5-VL-7B 58.0)
✔️ MMStar: 73.1 (vs 72.8, 65.0, 64.1)
✔️ CharXiv (RQ): 56.8 (vs 40.0, 39.6, 42.5)
✔️ MathVerse-Vision: 64.9 (vs 40.8, 61.7, 41.2)
✔️ DynaMath: 39.5 (vs 35.3, 35.7, 20.1)
✔️ LogicVista: 59.1 (vs 50.6, 56.4, 44.5)

We just published a step-by-step guide to install & run R-4B on a GPU VM.

What’s inside (all methods, end-to-end):
✔️ Infra & env: Choose GPU/region/storage, use CUDA base image nvidia/cuda:12.1.1-devel-ubuntu22.04; set up Python 3.10 venv, PyTorch (cu121), core deps.
✔️ Transformers (single-GPU): FP32 load to avoid LayerNorm dtype bug; image+text chat with thinking_mode; optional BF16 + projector upcast for tight VRAM.
✔️ vLLM serve (recommended): Install via uv + build tools; vllm serve … --trust-remote-code (optional --enforce-eager); metrics & scale via --tensor-parallel-size.
✔️ API & quality: OpenAI-compatible cURL/Python, image_url, streaming, control thinking_mode; guardrails with system prompt, temperature/top_p, stop for </think>, revision pinning.
✔️ Ops: GPU sizing table for light/medium/heavy, troubleshooting (Python.h, OOM, dtype, ports), and prod tips (tmux/systemd, HF transfer acceleration).

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-r-4b-auto-thinking-model-locally

AI models shouldn't be dominated by a handful of black boxes with hidden data and training methods.

Meet Apertus by Swiss AI, the groundbreaking 8B & 70B parameter LLM that's redefining transparency and multilingualism in AI.
This is a fully open-source model, supporting 1,800+ languages and providing ALL its training data, code, and evaluation suites. This means true auditability, community extension, and ethical AI development.

But deploying such a powerful, massive multilingual model can be daunting and costly... right? Not anymore. Our latest article shows you how to install and run Apertus efficiently and affordably, either locally or with NodeShift.

🔗 Read here: https://nodeshift.cloud/blog/how-to-install-run-apertus-the-massive-multilingual-ai-model-supporting-1800-languages?utm_source=telegram&utm_medium=social&utm_campaign=apertus_install_guide

Baidu, Inc. just dropped another open-weight beast: ERNIE-4.5-21B-A3B-Thinking, a 21B parameter Mixture-of-Experts (MoE) model with 3B active parameters per token, optimized for reasoning, coding, long-context, and function-calling. Think 131K context length, top-tier benchmarks on HumanEval+, BBH, MUSR, and full multilingual capabilities.

And yes… we just published a complete step-by-step guide to:
Install it from Hugging Face
Run it on a GPU VM (H100/H200)
Generate responses in your desired language
Deploy with vLLM, Transformers, or FastDeploy
Run OpenAI-style APIs in seconds
Trim out <think> traces and extract polished outputs
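Trimming `<think>` traces, the last item above, can be sketched with the standard library alone. This is a minimal illustration, assuming the reasoning is wrapped in literal `<think>...</think>` tags; the sample reply is made up, and the guide's own post-processing may differ.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Drop <think>...</think> spans and surrounding whitespace."""
    cleaned = THINK_BLOCK.sub("", text)
    # If the opening tag was consumed by the prompt template, only the
    # closing tag appears in the output; keep what follows it.
    if "</think>" in cleaned:
        cleaned = cleaned.split("</think>", 1)[1]
    return cleaned.strip()


# Hypothetical model reply used only for illustration.
raw = "<think>The user wants a greeting in French.</think>\n\nBonjour !"
print(strip_think(raw))
```

The second branch handles replies where the reasoning starts without an opening tag, so only the polished answer is returned.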

Whether you're experimenting with long-context reasoning, exploring ERNIE’s chain-of-thought, or deploying it in production, this tutorial is all you need to get started. No skipped steps. No guesswork. All clean.

Read the full setup guide here: https://nodeshift.cloud/blog/how-to-install-run-ernie-4-5-21b-a3b-thinking-locally

If you're done with image generation models that force you to choose between high resolution and high speed, then HunyuanImage 2.1, the latest image generation model from Tencent, is worth a look.

This #2 trending HF model:
- Generates ultra-HD 2K images (2048×2048) with cinematic quality
- Powered by a 17B parameter diffusion transformer + high-compression VAE
- Dual text encoders for multilingual & multimodal alignment
- Refinement stage for sharper, lifelike details
- Smart prompt rewriting & RLHF for stunning realism

And the best part? It’s open-source, bringing closed-source quality to everyone.
We’ve put together a step-by-step guide to make HunyuanImage 2.1 easily accessible for everyone with NodeShift.

🔗 Read here: https://nodeshift.cloud/blog/how-to-install-run-hunyuanimage-2-1?utm_source=telegram&utm_medium=social&utm_campaign=hunyuanimage2-1

MiniCPM4.1 is one of the most exciting open-source LLMs right now, bringing edge-side efficiency to an 8B parameter model that doesn’t need super-expensive hardware to shine. It’s developed with sparse attention, ternary quantization, and a custom CUDA inference engine (CPM.cu) to make long-context reasoning fast and lightweight, perfect for running locally or on consumer-grade GPUs.

We’ve just published a hands-on guide to get you up and running with MiniCPM4.1-8B.
Here’s what's inside:
- Setting up MiniCPM4.1-8B on your machine or GPU VM
- Running inference with CPM.cu for max efficiency

🔗 Read the full tutorial here: https://nodeshift.cloud/blog/how-to-install-and-run-minicpm4-1-locally?utm_source=telegram&utm_medium=social&utm_campaign=minicpm4-1

Chroma1-HD (8.9B) is a FLUX.1-schnell–based, Apache-2.0 model built for clean, customizable image generation. As a neutral text-to-image base model, it’s perfect for finetuning and plays nicely with Diffusers and ComfyUI, and it’s trending on Hugging Face.

We just published a step-by-step guide to run Chroma1-HD locally/on a GPU VM:
Quickstart with PyTorch + Diffusers + ChromaPipeline (bf16)
Full environment setup (CUDA, cuDNN, matching torch/torchvision/torchaudio wheels)
Reproducible image generation scripts
GemLite + Triton path for lower VRAM & faster matmuls (24–40 GB cards)
GPU configuration table (24 GB / 40–48 GB / 80 GB+) with practical settings

Why this matters:
Apache-2.0 license → easy to adopt, modify, and ship
Neutral base → ideal for downstream finetunes (styles, brands, characters)
Fast iterations → diffusers-native, modern kernels, optional 8-bit linears with GemLite
Repro-friendly → seeded runs, pinned deps, and copy-paste scripts

Perfect for:
Artists & designers experimenting with new styles
Developers building custom T2I apps or internal tooling
Researchers evaluating training choices and alignment strategies
Teams that need cloud-ready workflows (NodeShift GPU VMs work great)

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-chroma1-hd-locally

For years, the trend was simple: go bigger. But the new Qwen3-Next series flips the script.

Instead of chasing raw scale, it delivers ultra-long context (up to 1M tokens!), 10x faster inference, and the power of 80B parameters with only 3B active at a time. With innovations like Hybrid Attention and high-sparsity MoE, this model achieves near state-of-the-art performance, outperforming 200B+ parameter models without the crushing compute cost.

In our latest article, we break down how you can install, set up, and start using Qwen3-Next today with NodeShift in just a few clicks.

🔗 Read the full guide here: https://nodeshift.cloud/blog/a-step-by-step-guide-to-install-qwen3-next-80b?utm_source=telegram&utm_medium=social&utm_campaign=qwen3next80b_install

Elon Musk’s xAI just dropped Grok 2 as open source - and now you can run it locally.
For the first time, devs get free access to a 270B parameter enterprise-grade model, and thanks to Unsloth AI’s GGUF release + llama.cpp integration, you don’t need a supercomputer to try it.

- Full precision: 539GB
- Quantized GGUF (Q3_K_XL): ~118GB
- Runs on a 128GB RAM Mac or even a 24GB GPU setup at >5 tokens/sec
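The sizes above line up with simple arithmetic. The sketch below is a back-of-the-envelope check, assuming "full precision" means 16-bit weights and that GB means decimal gigabytes; both are assumptions, not statements from the release.

```python
PARAMS = 270e9  # parameter count quoted in the post


def size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def effective_bits(params: float, file_gb: float) -> float:
    """Average bits per weight implied by a given file size."""
    return file_gb * 1e9 * 8 / params


print(f"16-bit weights: ~{size_gb(PARAMS, 16):.0f} GB")
print(f"118 GB file: ~{effective_bits(PARAMS, 118):.1f} bits/weight")
```

Under these assumptions, 16-bit weights come to about 540 GB, close to the quoted 539GB, and a ~118GB file works out to roughly 3.5 bits per weight, which is in the range expected for a 3-bit quantization with some higher-precision tensors.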

We've put together a step-by-step guide so you can install and run Grok 2 GGUF locally.
🔗 Read here: https://nodeshift.cloud/blog/how-to-install-run-grok-2-gguf-locally?utm_source=telegram&utm_medium=social&utm_campaign=grok2_gguf

Forget robotic voices. Unlike traditional TTS models, IndexTTS2 lets you clone voices, control emotions, and even decide exactly how long the speech lasts.

- Clone voices with accuracy while guiding emotion using simple text prompts
- Perfect for dubbing, lip-syncing & storytelling
- Separate emotion from speaker identity (mix & match voices + feelings)
- Powered by GPT latents & a 3-stage training paradigm for crystal-clear, stable speech

TL;DR: it’s voice cloning + emotional control + precise duration all rolled into one groundbreaking TTS system.
In our latest article, we’ll show you step by step how to install and run IndexTTS2 locally, whether on your machine or a GPU-accelerated environment with NodeShift, so you can start generating lifelike, controllable speech in minutes.
🔗 Read here: https://nodeshift.cloud/blog/how-to-run-indextts2-locally-for-ai-voice-cloning-emotion-controlled-speech?utm_source=telegram&utm_medium=social&utm_campaign=indextts2_install

Google just released VaultGemma, a privacy-first open LLM trained end-to-end with Differential Privacy (DP-SGD).

It remembers patterns, not people — and it’s small enough (<1B params) to run on modest GPUs.

We’ve just published a step-by-step guide to get VaultGemma running locally and as an OpenAI-compatible API.

What’s inside:
Quick intro to DP-SGD and why VaultGemma matters for healthcare/finance & other sensitive apps
GPU sizing cheat sheet (from 4 GB tinkering to scalable deployments)
Exact install commands (PyTorch, deps, dev Transformers fix for model_type="vaultgemma")
Serve with vLLM at /v1/completions + optional chat template
Prompting tips for a pretrained (non-instruct) base model
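For a sense of what calling an OpenAI-compatible `/v1/completions` endpoint looks like, here is a minimal sketch that only builds the request using the standard library. The base URL, port, and model id are assumptions for illustration; use whatever your own server reports.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"   # assumption: address of your local server
MODEL_ID = "google/vaultgemma-1b"    # assumption: hypothetical model id


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/completions."""
    body = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A base (non-instruct) model continues text, so the prompt is phrased
# as the start of a sentence rather than as a question.
req = build_completion_request("The three primary colors are")
print(req.full_url)
print(req.data.decode("utf-8"))

# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

Phrasing the prompt as a continuation rather than an instruction tends to suit pretrained base models, which have not been tuned to follow chat-style requests.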

If you care about utility and privacy, this is a great starting point.

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-google-vaultgemma-1b-locally

Turn a single prompt into a stunning, production-ready website in minutes!

WEBGEN OSS 20B is Tesslate's latest open-source model that's transforming web design. Here's what WEBGEN OSS ships:
- Clean, semantic HTML & Tailwind CSS
- Responsive, mobile-first layouts
- Modern components (hero, pricing, FAQ)
- Quants small enough to run on your laptop!

We just published a quick, no-fluff guide that walks you through simple steps to get WEBGEN OSS up and running on your machine.
🔗 Read here: https://nodeshift.cloud/blog/build-modern-single-page-websites-instantly-with-webgen-oss-20b?utm_source=telegram&utm_medium=social&utm_campaign=webgen_oss_launch

AI at Meta just dropped: MobileLLM-R1-950M.

A new reasoning-focused model in the MobileLLM family—tuned for math, Python/C++ coding, and scientific problems. Despite being <1B params, it rivals or beats larger open models on MATH, GSM8K, MMLU, and LiveCodeBench, and it packs a 32K context window. Lightweight, fast, reproducible—perfect for research-grade reasoning.

We’ve just published a step-by-step guide to get MobileLLM-R1-950M running locally and as an OpenAI-compatible API.

What’s inside:
Gated access (FAIR Noncommercial license) + HF token setup
CUDA-ready VM setup (NodeShift GPU node or any cloud)
PyTorch (cu121) + Transformers install, HF auth
First inference script (math/code prompts that “just work”)
vLLM serving with an OpenAI-compatible /v1/chat/completions API
Prompt tricks to suppress <think> or post-process only the \boxed{…} answer
VRAM sizing: 12–16 GB for single inferences; 24–40 GB for longer context/concurrency; optional 4-bit for tighter GPUs
Quick troubleshooting notes (headers/toolchain for vLLM, offload tips)
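Post-processing only the `\boxed{…}` answer, as mentioned in the list above, can be sketched as a small brace-matching helper. This is an illustrative approach rather than the guide's exact code, and the sample reply is made up.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # braces never closed


# Hypothetical model reply used only for illustration.
reply = r"<think>2x = 8, so x = 4</think> The answer is \boxed{\frac{8}{2}} = \boxed{4}."
print(extract_boxed(reply))
```

Counting braces instead of using a simple regex keeps nested LaTeX such as `\frac{1}{2}` intact inside the box.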

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-facebook-mobilellm-r1-950m-locally

Fine-tuning diffusion models in under 10 minutes? It's no longer just imagination.

Tencent's new SRPO method, currently trending no. 1 on Hugging Face, is a paradigm shift in aligning generative AI with human preference, making advanced fine-tuning faster, more stable, and incredibly efficient. This is a game-changer for researchers, developers, and creative technologists.

What makes SRPO so revolutionary?
> Blazing-Fast Training: Achieve significant performance boosts on models like FLUX.1-dev in less than 10 minutes, a speed previously unimaginable.
> Hyper-Efficient: Ditch expensive online rollouts. SRPO can leverage a small offline dataset of fewer than 1,500 images, making it accessible to everyone.
> Superior Quality: It cleverly avoids "reward hacking," ensuring your generated images have authentic aesthetic quality without common issues like color oversaturation.
> Dynamic Control: For the first time, you can adjust style preferences on the fly, giving you an unprecedented level of creative control.

This advancement is a new toolkit for building faster, fairer, and more controllable AI. Our latest article provides a comprehensive, step-by-step guide to get SRPO installed and running.
🔗 Read here: https://nodeshift.cloud/blog/how-to-install-run-srpo-a-flux-1-dev-fine-tune-by-tencent?utm_source=telegram&utm_medium=social&utm_campaign=srpo_article

ByteDance, the company behind TikTok, has launched its latest AI-powered human-centric video generation model.

Traditional video generation models struggle to sync multiple input types such as text, image, and audio. HuMo by ByteDance sets out to change that.
Imagine creating realistic human videos with:
🎬 Preserved character identity across scenes
🎤 Motion & lip movement synced flawlessly with audio
🖼 Text, images, and sound blended into fine-grained, controllable clips

In our latest guide, we dive into the detailed yet to-the-point steps to set up this model on a NodeShift GPU environment and generate lifelike cinematic clips.
The generation took longer than we expected. Do you think the results are worth it?
🔗 Dive in here to see: https://nodeshift.cloud/blog/create-lifelike-human-videos-with-ai-a-guide-to-run-humo-by-bytedance?utm_source=telegram&utm_medium=social&utm_campaign=humo_launch

Introducing Tongyi DeepResearch (30B-A3B) – Alibaba’s Breakthrough in Agentic AI

Tongyi DeepResearch (30B-A3B) is a 30-billion parameter Mixture-of-Experts (MoE) model developed by Alibaba Tongyi Lab, with only 3B active parameters per token for efficiency. Unlike general-purpose LLMs, it is purpose-built for deep, long-horizon information-seeking tasks, and it sets new state-of-the-art results across multiple benchmarks like:
Humanity’s Last Exam
BrowseComp & BrowseComp-ZH
WebWalkerQA
GAIA
xbench-DeepSearch
FRAMES

On these benchmarks, Tongyi DeepResearch consistently outperforms other leading models like GLM 4.5, DeepSeek V3.1, Kimi Researcher, Claude-4-Sonnet, and even OpenAI’s DeepResearch agents.

We’ve just published a step-by-step guide on how to install and run Tongyi DeepResearch (30B-A3B) locally or on cloud GPU.

What’s inside the guide?
Model introduction & benchmark results
Complete GPU configuration table (from entry-level to multi-GPU heavy setups)
Step-by-step process to install, set up, and run DeepResearch on NodeShift GPU VMs
Hugging Face authentication & checkpoint download instructions
Running inference in both ReAct-style and Heavy IterResearch mode

If you’re into agentic reasoning models, research agents, and long-horizon information-seeking AI, this guide is a must-read.

Check out the full tutorial here: https://nodeshift.cloud/blog/how-to-install-run-alibaba-tongyi-deepresearch-locally

mmBERT is a modern multilingual encoder (~307M params) trained on 3T+ tokens across 1,800+ languages. Built on the ModernBERT family, it delivers 8K context, fast inference, and state-of-the-art cross-lingual performance for classification, embeddings, retrieval, and reranking. Training tricks like inverse mask scheduling and progressive language addition especially boost low-resource languages.

We’ve just published a step-by-step guide on how to install and run mmBERT-base locally.

What’s inside the guide
Sanity-check script to validate GPU, dtype, and tokenizer
FastAPI microservice exposing /embed and /mlm endpoints
Streamlit UI for interactive embeddings + masked-LM demos (CSV download included)
GPU sizing cheat sheet: practical VRAM + batch sizes for 512–8K tokens (inference & fine-tuning)
Clear, copy-paste setup for Ubuntu + CUDA, PyTorch, and all Python deps

Who’s it for
Teams adding multilingual search & retrieval (FAISS/pgvector/Milvus)
Builders prototyping classification/reranking on real data
Anyone needing a fast, reliable multilingual encoder with 8K context

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-mmbert-base-locally

Struggling with extracting accurate data from complex documents?
In a world where documents are packed with equations, tables, multilingual text, and complex layouts, simple extraction tools just don’t cut it anymore.

IBM's new Granite Docling is an all-rounder in document intelligence. This OCR model is a sophisticated multimodal AI with:
- Precision equation & inline math recognition
- Flexible full-page & region-based inference
- Document-structure QA
- Experimental multilingual support
- Improved stability & reduced loop errors

If you’re handling dense research papers, financial reports, or global documents for data annotation tasks, Granite Docling is built to deliver clarity from complexity. And with NodeShift, deploying and scaling this model is seamless, secure, and production-ready.

Dive into our step-by-step guide on installing & running Granite Docling:
🔗 https://nodeshift.cloud/blog/how-to-install-run-ibm-granite-docling-ocr-for-advanced-document-analysis?utm_source=telegram&utm_medium=social&utm_campaign=granite_docling_launch

Who said small models can’t think big?
Magistral Small 1.2 by Mistral AI packs 24B params, multimodal reasoning (text + vision), multilingual support, and a 128k context window into a setup you can run locally on a single H100 or even your own GPU-enabled environment.

What’s new in Magistral Small 1.2?
- Vision encoder → reason over images + text
- [THINK] tokens → transparent reasoning traces
- Multilingual support → dozens of languages out of the box
- Smarter formatting + fewer generation loops
- Faster, cleaner, more reliable responses

We’ve put together a step-by-step install guide with copy-paste ready snippets so you can get it running in minutes. If you want to try serious reasoning power without the heavyweight baggage, this is it.

🔗 Full Guide here: https://nodeshift.cloud/blog/how-to-install-and-run-magistral-small-1-2-by-mistral-ai?utm_source=telegram&utm_medium=social&utm_campaign=blog_share

Jina Code Embeddings 1.5B is a lightweight yet surprisingly powerful code embedding model, built on Qwen2.5-Coder-1.5B and purpose-tuned for developer workflows. Instead of generic text semantics, it captures the structure and intent of real code across 15+ languages, enabling accurate NL→Code, Code→Code, Code→NL, completion retrieval, and technical QA. It supports 32k tokens for long files, uses last-token pooling, and pairs seamlessly with FlashAttention-2 or SDPA for fast inference.
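Last-token pooling means the embedding for a sequence is the hidden state of its final non-padding token. A minimal sketch of the idea, with plain Python lists and toy values standing in for real model tensors:

```python
from typing import List

Vector = List[float]


def last_token_pool(hidden: List[List[Vector]], mask: List[List[int]]) -> List[Vector]:
    """Pick each sequence's hidden state at its last attended position."""
    pooled = []
    for states, m in zip(hidden, mask):
        # Index of the last position where the attention mask is 1.
        last = max(i for i, keep in enumerate(m) if keep)
        pooled.append(states[last])
    return pooled


# Toy batch: 2 sequences, 3 positions, 2-dimensional hidden states.
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]],  # right-padded: 2 real tokens
    [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],  # no padding
]
mask = [[1, 1, 0], [1, 1, 1]]
print(last_token_pool(hidden, mask))
```

With left padding, the last real token always sits at the final position, which is one reason padding side matters for this kind of model.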

We’ve just published a new step-by-step guide showing how to run and evaluate the model end-to-end on a GPU VM — from zero to meaningful retrieval results.

What’s inside the guide
GPU sizing & configs (Entry → Enterprise), with practical batch/seq-length tips
Environment setup on a clean CUDA image (Python 3.10, venv, drivers)
Hugging Face auth and dependency installs (Torch, Sentence-Transformers, optional FlashAttention-2)
Two test scripts:
- one for a quick sanity check (NL→Code)
- one for stress testing across nl2code, code2code, code2nl, code2completion, and QA with distractors
Matryoshka embeddings: try 128–1536 dims and see ranking stability vs storage/speed
Attention backends: flip between FlashAttention-2 and SDPA for the best fit to your hardware
Troubleshooting notes (dtype, padding side, FA2 install, common pitfalls)
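The Matryoshka idea from the list above, using only the first N dimensions of an embedding, can be sketched in a few lines. The vectors here are toy values rather than real model output, and truncate-then-renormalize is the usual recipe, not necessarily the guide's exact code.

```python
import math
from typing import List


def truncate_embedding(vec: List[float], dims: int) -> List[float]:
    """Keep the first `dims` components and L2-normalize the result."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


# Toy 4-dimensional embeddings, for illustration only.
query = [0.9, 0.1, 0.3, 0.2]
doc_a = [0.8, 0.2, 0.1, 0.4]
doc_b = [0.1, 0.9, 0.2, 0.3]

for dims in (4, 2):
    q = truncate_embedding(query, dims)
    sims = {
        name: cosine(q, truncate_embedding(vec, dims))
        for name, vec in (("doc_a", doc_a), ("doc_b", doc_b))
    }
    print(dims, max(sims, key=sims.get))
```

Comparing the top-ranked document at each size is a quick way to see how much ranking stability you trade for smaller storage and faster search.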

If you’re building code search, RAG for repos, or dev tooling, this model hits the sweet spot: cost-efficient, long-context (32k), and flexible via Matryoshka dims — scale from laptop to cluster with simple config tweaks.

Check the full guide here: https://nodeshift.cloud/blog/how-to-install-run-jina-code-embeddings-1-5b-locally