NodeShift Announcements Official
22.8K subscribers
45 photos
7 videos
378 links
Decentralized, no-code AI cloud platform that enables one-click deployment of AI agents and LLMs
Download Telegram
Smaller, Smarter, Faster. Meet MiniCPM-V 4.0.
OpenBMB’s latest multimodal AI offers 4.1B parameters yet outperforms larger models like GPT-4.1-mini, delivering state-of-the-art image, multi-image, and video understanding.

- Runs with <2s first-token delay and 17+ tokens/s on iPhone 16 Pro Max — no heating, no lag.
- Easy integration via llama.cpp, Ollama, vLLM, SGLang, LLaMA-Factory, and even a native iOS app.

We just published a step-by-step guide to install and run MiniCPM-V 4.0 locally or in GPU-accelerated environments.
🔗 Dive in and try it yourself: https://nodeshift.cloud/blog/get-started-with-minicpm-v4-the-next-gen-multimodal-ai-model-by-openbmb?utm_source=telegram&utm_medium=social&utm_campaign=minicpmv4_launch
🔥1
Dyad Tech, Inc is a free, local, and open-source app builder that lets you create AI-powered apps with zero coding. Think of it as a privacy-friendly alternative to Lovable, v0, Bolt, and Replit — but without vendor lock-in.

We just published a step-by-step guide on how to connect Dyad + Ollama using a GPU-powered VM on NodeShift. In this guide, you’ll learn how to:
Spin up a GPU Node (H100 to A100) on NodeShift
Install and run Ollama on your VM
Pull & configure powerful open-source models like GPT-OSS 120B
Connect Ollama as a custom provider inside Dyad
Build your first full-stack AI app in minutes — privately, securely, and without lock-in

Why this matters:
Full control — your code & data stay with you
AI freedom — integrate any model, from Gemini to GPT-OSS
Enterprise-ready — NodeShift GPU VMs are GDPR, SOC2 & ISO27001 compliant

Whether you’re a developer, tinkerer, or someone just exploring no-code AI tools, this tutorial will help you build apps that are private, fast, and future-proof.

Read the full guide here: https://nodeshift.cloud/blog/the-open-source-app-builder-that-ate-saas-dyad-ollama-setup
🔥1
NuMarkdown-8B-Thinking from NuMind is here — and it’s a beast.

A Vision-Language OCR model fine-tuned from Qwen2.5-VL, it doesn’t just extract text — it reasons about layout, structure, and formatting before generating clean, structured Markdown.

It literally outperformed GPT-4o and other giants in head-to-head arena rankings.

In our latest blog, we show you how to:
Deploy NuMarkdown-8B-Thinking on a GPU-powered VM
Run local inference on scanned docs or PDFs
Build a fully functional Streamlit web app that converts docs to Markdown
Handle reasoning tokens, batch documents, and layout-rich PDFs like a pro

From raw scans to clean Markdown in seconds — this is the OCR model RAG pipelines have been waiting for.

Read the full guide here: https://nodeshift.cloud/blog/the-ocr-model-that-outranks-gpt-4o
🔥1
Ovis2.5-9B: A Next-Gen Multimodal Reasoning Powerhouse

We Just dropped a complete step-by-step guide on how to run it locally in your browser. From raw images to deep reasoning — all within a sleek Streamlit UI.

Ovis2.5-9B, developed by AIDC-AI, combines the power of native-resolution vision encoding (via NaViT) with deep multimodal reasoning (Chain-of-Thought + Reflective Thinking). It’s designed to understand and reason over real images, complex charts, and documents—not just "see" them.

What makes it special?
✔️ Supports “thinking mode” and “thinking budget” for layered internal reasoning
✔️ SOTA performance in OCR, chart QA, and layout understanding
✔️ Fully runnable on your own GPU VM (we used NodeShift Cloud for this guide)
✔️ Built-in support for both terminal and browser-based interfaces (Streamlit)

In this new guide, we walk through:
VM setup on NodeShift
CUDA environment configuration
Running Ovis2.5-9B via terminal and Streamlit
Uploading charts, asking visual questions, and getting deep reasoning outputs

If you’re working on visual QA, document parsing, OCR, or any MLLM-powered app — this setup is a game-changer.

Read the full blog here → https://nodeshift.cloud/blog/how-to-install-run-ovis2-5-9b-locally
🔥1
Image editing is no longer just about filters and touch-ups, it’s about precision + creativity at scale. Meet Qwen-Image-Edit, the advanced model built on the 20B Qwen-Image foundation, designed to:
- Perform both semantic edits (rotate objects, style transfer, new creations) & appearance edits (add/remove elements without disturbing the rest of the image).
- Deliver precise bilingual text editing in English & Chinese while preserving fonts, size & style.
- Achieve SOTA benchmark performance in AI-powered image editing.

And the best part? You can run it effortlessly with affordable, private and secure GPU setup on NodeShift, no infra headaches, just pure creativity owned privately by you.
Ready to unlock next-level professional editing?
🔗 Check out our step-by-step guide here: https://nodeshift.cloud/blog/a-complete-setup-guide-to-powerful-ai-image-editing-with-qwen-image-edit?utm_source=telegram&utm_medium=social&utm_campaign=qwen_image_edit
🔥3
DeepSeek is back — and DeepSeek-V3.1 is anything but ordinary!

This latest release introduces:
- Hybrid Thinking Modes → Switch effortlessly between thinking and non-thinking for any use case
- Smarter Tool Calling → Optimized post-training for sharper agent + automation performance
- Extended Context Mastery → 32K tokens scaled 10x to 630B & 128K tokens extended 3.3x to 209B
- Faster Reasoning Efficiency → Comparable to R1, but quicker responses

Think running such a massive model locally is impossible? Think again.
With Unsloth’s dynamic quantization and NodeShift's scalable, private cloud/on-premise GPU infrastructure, installing and running a powerul model like DeepSeek-V3.1 has never been easier.
🔗 Dive into our step-by-step guide here: https://nodeshift.cloud/blog/a-step-by-step-guide-to-install-deepseek-v3-1?utm_source=telegram&utm_medium=social&utm_campaign=deepseek-v3-1
🔥1
Say bye to complex Kubernetes commands!
Ever thought you could manage your Kubernetes cluster just by typing in plain English?
That’s exactly what Google's kubectl-ai does - it turns natural language into real-time Kubernetes operations, making it feel like as if you're talking to just another AI.

Now DevOps teams don't need to memorize tricky syntax. Just ask, run, and scale.
In our latest guide, we walk you through installing, setting up and using kubectl-ai step by step in minutes.
🔗 Read here: https://nodeshift.cloud/blog/how-to-setup-kubectl-ai-simplify-kubernetes-management-with-natural-language?utm_source=telegram&utm_medium=social&utm_campaign=kubectl-ai_launch
🔥1
Grok 2 is now Open Source!

Elon Musk’s xAI has officially made Grok 2, its flagship AI model, open source.
This is a massive step for developers worldwide, as it unlocks enterprise-level AI for free.

We just published a step-by-step guide on how you can install, run, and even build a Streamlit-powered chatbot with Grok 2. The model is now live on Hugging Face, making it super easy to download and experiment with.

Keep in mind: Grok 2 is huge (nearly 500GB+) and requires a solid GPU setup (8× H100/H200 GPUs recommended). But don’t worry — you don’t need to burn a hole in your pocket. You can easily rent powerful GPUs from NodeShift, where pricing is developer-friendly and built for scalability.

Check out the full guide and start experimenting with Grok 2 today.

Link: https://nodeshift.cloud/blog/how-to-install-run-grok-2-locally
🔥1
Imagine generating 90 minutes of podcast-style audio with up to 4 distinct, natural-sounding speakers - all from just a text script. That’s exactly what VibeVoice, Microsoft’s open-source TTS model, makes possible.

Unlike traditional TTS systems, VibeVoice brings:
🔹 Expressive, long-form, multi-speaker conversations
🔹 Continuous speech tokenizers for high fidelity + efficiency
🔹 Diffusion-based decoding for lifelike detail & flow

We just published a step-by-step guide on how to install and run VibeVoice locally or accelerate your VibeVoice environment with NodeShift GPUs.
🔗 Dive in: https://nodeshift.cloud/blog/generate-expressive-long-form-multi-speaker-audios-podcasts-with-microsofts-vibevoice?utm_source=telegram&utm_medium=social&utm_campaign=vibevoice_article
🔥1
DeepSeek has just taken a massive leap forward with DeepSeek-V3.1 — a next-generation reasoning powerhouse designed for advanced problem-solving, coding, and tool-using capabilities.

Now, thanks to Unsloth AI, we have GGUF quantized versions that make this beast faster, lighter, and easier to run locally.

This model is built for:
Thinking Mode → Structured, step-by-step reasoning for complex tasks
🧠 128K Context → Handles large documents & long conversations
🛠 Tool-Calling Capabilities → Integrate APIs & functions seamlessly
💡 Optimized GGUFs → Lower VRAM usage, higher inference speed
📊 SOTA Performance → Competitive in math, coding, reasoning & agents

To help you get started, we’ve prepared a full step-by-step guide where we cover:
Installing & running DeepSeek-V3.1 GGUF with llama.cpp
Setting up CUDA acceleration for top performance
Using OpenAI-compatible APIs to connect your apps
Switching between Thinking & Non-Thinking Modes
Deploying a Streamlit-powered chat UI so you can prompt the model right from your browser

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-deepseek-v3-1-gguf-locally
🔥1
From Speech to Video – A New Era of Storytelling!

Imagine entering an audio of spoken words and instantly watching them transform into a captivating video. That’s the power of Speech-to-Video AI – revolutionizing creativity, content production, and accessibility.

Wan2.1, the popular one-of-its-kind model is eventually getting an upgrade and we have the newest Wan2.2 S2V in the town for seamless spech-to-video generation.

In our latest deep dive, we break down:
🔹 How it works
🔹 How to setup and run the model without facing errors
🔹 What are the system requirements to get the best possible results

🔗 Read the full article here: https://nodeshift.cloud/blog/transform-speech-into-cinematic-ai-videos-with-latest-wan2-2-s2v?utm_source=telegram&utm_medium=social&utm_campaign=speech_to_video_article
Hermes 4: The Open-Source Reasoning Powerhouse

Nous Research just dropped Hermes 4 70B, their flagship reasoning model built on top of Llama-3.1-70B — and it’s already turning heads.

What makes it special?
Hybrid reasoning with explicit <think> segments — choose between fast responses or deep, step-by-step deliberation
Massive gains in math, logic, coding, STEM, and creative writing
Schema-faithful outputs (valid JSON, structured responses)
Lower refusal rates + better steerability
Production-ready with function calling & tool use

On RefusalBench, Hermes 4 70B crushed frontier giants — even outperforming models many times its size in real-world reasoning and alignment.

We put Hermes 4 to the test on our GPU Nodes, and it runs seamlessly. Whether you’re deploying from the terminal or building a full Streamlit-powered chat UI, Hermes 4 adapts perfectly.

Checkout Full tutorial + benchmarks here: https://nodeshift.cloud/blog/refusalbench-showdown-how-hermes-4-crushed-frontier-giants
🔥1
Meet Parakeet-TDT-0.6B-v3 — NVIDIA’s multilingual ASR model (≈600M params) built on the FastConformer-TDT architecture. It auto-detects 25 European languages, returns punctuation + capitalization, and handles everything from short clips to multi-hour audio (with local attention) while staying lightweight enough for real-world deployments.

We just published a step-by-step guide on how you can install, run, and even build a Streamlit-powered app with NVIDIA Parakeet TDT 0.6B V3.

Here’s what you’ll learn:
Spin up a GPU VM on NodeShift
Clean Python env + PyTorch 2.4.1 (cu121) + NeMo 2.4.0 pins
Terminal sanity check with scripts (downloads model & transcribes)
Build a Streamlit web app with timestamp tables (word & segment)
GPU sizing table for short clips, long-form audio, and high-throughput setups
Practical tips: 16 kHz mono conversion, long-audio local attention, batching

You get production-grade multilingual transcription—fast to deploy, affordable to scale, and easy to demo in a browser.

Read the full guide: https://nodeshift.cloud/blog/how-to-install-run-nvidia-parakeet-tdt-0-6b-v3-locally
🔥1
Broken, hallucinating translation tools slowing your apps down & making a bad first-impression among your diverse users?

Well, a groundbreaking multilingual model is here: Hunyuan-MT-7B by Tencent, an open-source translation model that’s quickly catching eyes of AI developers worldwide. The reason is behind its powerful support for over 33 languages spoken worldwide, making this model one of its kind.

What it offers?
- Translates across 33 languages (including regional and minority ones like Marathi, Bengali, Polish, Cantonese & many, many more..)
- Got First place in 30/31 language categories at WMT25 – outperforming huge closed-source systems
- Comes with Hunyuan-MT-Chimera-7B, the world’s first open-source ensemble translation model for even higher accuracy

And the best part? Team has open sourced both of these models and you can now install & run it locally or scale it with NodeShift in just a few simple steps.

🔗 Checkout the full guide here: https://nodeshift.cloud/blog/how-to-install-hunyuan-mt-7b-locally-groundbreaking-machine-translation-model-for-33-languages?utm_source=telegram&utm_medium=social&utm_campaign=hunyuan_mt7b_blog
1
MiniCPM-V 4.5 is one of the most impressive open-source MLLMs out there—packing GPT-4o-level multimodal performance into just 8.7B parameters. Built on Qwen3-8B + SigLIP2-400M, it dominates OCR, document parsing, high-FPS video understanding, and multilingual vision reasoning—all while being lightweight.

We’ve just published a full-blown guide to help you install, run, and interact with MiniCPM-V 4.5.

Here’s what you’ll learn:
Spin up a NodeShift Cloud GPU VMs
Terminal-based Image & Video Inference
Streamlit Browser App with Full UI
Support for Image, Video, Multi-Turn Chat, and Deep Thinking Mode

This guide covers every step, no guesswork required.

Read the full tutorial here: https://nodeshift.cloud/blog/how-to-install-run-minicpm-v-4_5-locally

If you’re into multimodal models, vision-language applications, or just exploring what open-source LLMs can do—this one’s for you.
1
ByteDance just dropped USO — a unified model that finally brings style-driven and subject-driven image generation under one roof.

USO learns from triplets (content, style, stylized) with disentangled training (style-alignment + content–style separation) and a Style Reward Learning boost — plus a new joint benchmark, USO-Bench, to measure both style similarity and subject fidelity.

We just published a hands-on guide to run USO locally.

What’s inside the guide:
Full setup on a CUDA 12.x image (no guesswork)
Exact commands to clone, install, and pull weights
Env vars for LoRA + projector, and HF auth
One-liner inference for: subject-only, style-only, and style+subject (IP-style)
GPU configuration table (16 GB → 80 GB): what fits, what to tweak, and how to avoid OOM
Speed/quality tips: FP8/INT8, attention slicing, offload strategies

You don’t have to pick between “perfect style” or “faithful subject” anymore. With USO on top of FLUX.1, you can steer both — cleanly and predictably.

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-bytedance-uso-locally
🔥1
Time to level up your Voice AI Apps with end-to-end speech conversations.

Step-Audio 2 is an end-to-end multi-modal large language model that doesn't just transcribe, it comprehends and reasons through what it hears.

It goes beyond basic transcription, grasping para-linguistic cues like tone and emotion, and even non-vocal information like background noise. Imagine truly intelligent speech conversations, advanced audio understanding, and responses that are contextually perfect for any scenario.

With features like Tool Calling and Multimodal RAG, Step-Audio2 taps into real-world knowledge to reduce hallucinations. It's open-source, performs at a state-of-the-art level!
We've put together a comprehensive guide on how to install Step-Audio 2 locally.

🔗 Read the full article here: https://nodeshift.cloud/blog/build-advanced-speech-to-speech-systems-with-step-audio-2?utm_source=telegram&utm_medium=social&utm_campaign=speech_to_speech_stepaudio2_launch
2👍1🤬1
Google just released EmbeddingGemma-300M — a lightweight, multilingual (100+ languages) embedding model built on Gemma 3/T5Gemma foundations. And…We’ve just published a step-by-step guide showing how to run it locally and build a fast semantic search index with FAISS.

Why it’s exciting
✔️ 300M params optimized for retrieval, classification, clustering, similarity, QA & code retrieval
✔️ 768-dim vectors with Matryoshka down-projections to 512/256/128
✔️ Runs via SentenceTransformers; FP32 / bfloat16 (no float16 activations)
✔️ Trained across 100+ languages; strong results on MTEB (English/Multilingual/Code)

Here’s What You’ll Learn

Spin up a GPU VM (I used 1× RTX A6000 on NodeShift) or run on CPU
Minimal script demo: encode query + docs → rank by similarity
Script for: batch-encode your corpus, MRL truncation, FAISS cosine
search
Tips for smaller vectors (128–512), batching, and deployment options

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-embeddinggemma-300m-locally
👍1🔥1
Microsoft just dropped Kosmos-2.5 — a multimodal “literate” model built to read text-heavy images.

It does two things out of the box:
<ocr> → OCR with spatially-aware text blocks (text + bounding boxes)
<md> → image → Markdown conversion for clean, structured docs

We’ve just published a step-by-step guide to run Kosmos-2.5 on a GPU VM and use it from a browser-based Streamlit WebUI.

What’s inside the guide
GPU setup on NodeShift (works on any cloud)
Precise GPU VRAM matrix (12–48 GB+) and memory levers (bf16, max_patches, FlashAttention-2)
Minimal Python scripts for <md> and <ocr>
One-click Streamlit WebUI to upload docs and get Markdown or OCR+boxes
Tips for large pages, long outputs, and batching

Why this matters
Turn messy receipts, invoices, forms, and scans into usable Markdown
Keep layout awareness with OCR bounding boxes for downstream parsing
Runs with Transformers ≥ 4.56 and standard PyTorch CUDA wheels

Try it
Spin up a GPU, follow the commands, and open the WebUI in your browser. You’ll be extracting Markdown or drawing OCR boxes in minutes.

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-microsoft-kosmos-2-5-locally
1🔥1
Ever imagined turning a single image into a fully immersive 3D experience?

Tencent has launched HunyuanWorld-Voyager – a one of its kind, video diffusion framework that generates world-consistent 3D images and videos from just one image!

Unlike previous models, Voyager ensures frame-to-frame consistency, long-range exploration, and automated scene reconstruction, delivering stunning visuals and precise 3D geometry without manual 3D pipelines.

If you’re into creative multimedia projects, simulations, or large-scale dataset creation, Voyager opens up endless possibilities.

Check out our complete setup guide here: https://nodeshift.cloud/blog/how-to-install-hunyuanworld-voyager-create-stunning-3d-images-videos-from-a-single-image?utm_source=telegram&utm_medium=social&utm_campaign=hunyuanworld_voyager_launch
🔥2👍1🖕1
R-4B is trending on Hugging Face — another auto-thinking MLLM to watch.

What it is: R-4B is a multimodal large language model that automatically decides when to think step-by-step and when to answer directly. Through Bi-mode Annealing (build both skills) and Bi-mode Policy Optimization (switch at inference), it delivers strong reasoning without wasting compute. It now runs smoothly with vLLM for fast, scalable serving and exposes a simple thinking_mode control (auto / long / short).

Why it matters (benchmarks): R-4B shows SOTA-level results among <20B open models on multiple multimodal reasoning suites, edging out popular peers:
✔️ MMMU: 68.1 (vs Keye-VL-8B 66.8, InternVL3.5-4B 66.6, Qwen2.5-VL-7B 58.0)
✔️ MMStar: 73.1 (vs 72.8, 65.0, 64.1)
✔️ CharXiV (RQ): 56.8 (vs 40.0, 39.6, 42.5)
✔️ MathVerse-Vision: 64.9 (vs 40.8, 61.7, 41.2)
✔️ DynaMath: 39.5 (vs 35.3, 35.7, 20.1)
✔️ LogicVista: 59.1 (vs 50.6, 56.4, 44.5)

We just published a step-by-step guide to install & run R-4B on a GPU VM.

What’s inside (all methods, end-to-end):
✔️ Infra & env: Choose GPU/region/storage, use CUDA base image nvidia/cuda:12.1.1-devel-ubuntu22.04; set up Python 3.10 venv, PyTorch (cu121), core deps.
✔️ Transformers (single-GPU): FP32 load to avoid LayerNorm dtype bug; image+text chat with thinking_mode; optional BF16 + projector upcast for tight VRAM.
✔️ vLLM serve (recommended): Install via uv + build tools; vllm serve … --trust-remote-code (optional --enforce-eager); metrics & scale via --tensor-parallel-size.
✔️ API & quality: OpenAI-compatible cURL/Python, image_url, streaming, control thinking_mode; guide rails with system prompt, temperature/top_p, stop for </think>, revision pinning.
✔️ Ops: GPU sizing table for light/medium/heavy, troubleshooting (Python.h, OOM, dtype, ports), and prod tips (tmux/systemd, HF transfer acceleration).

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-r-4b-auto-thinking-model-locally
2🔥1