NodeShift Announcements Official
22.8K subscribers
45 photos
7 videos
378 links
Decentralized, no-code AI cloud platform that enables one-click deployment of AI agents and LLMs
Download Telegram
Liquid AI just dropped LFM2‑2.6B — a next‑generation hybrid model built for edge AI & on‑device deployment.

With 2.6 B parameters, multiplicative gates + short convolutions, and support for 8 languages, it’s one of the few open models designed to run smoothly on CPU, GPU, and even NPU hardware.

What you can build with it:
✔️ Lightweight tool‑calling agents that work offline or on your laptop
✔️ Data extraction & RAG workflows on private documents
✔️ Conversational assistants with multilingual support
✔️ Creative writing, summarization, etc

What’s inside our new guide
✔️ How to install & run LFM2‑2.6B locally with Transformers
✔️ How to serve it via vLLM for fast, scalable inference
✔️ How to build a minimal agent that calls functions (time, math, RAG) step‑by‑step
✔️ VRAM & GPU tips (BF16 vs. 4‑bit, FlashAttention‑2, sweet spots)

Read the full guide here: https://nodeshift.cloud/blog/pocket-operator-a-local-tool-calling-agent-powered-by-lfm2-2-6b
1👏1
HUGE RELEASE ALERT!
Qwen team has just dropped a major upgrade of Qwen2.5-VL, the most popular vision model in AI industry, which is used by many big players to fine-tune their domain specific vision models.

The newest version is Qwen3-VL, Alibaba’s new multimodal vision-language model that’s breaking benchmarks and expectations.
We just dropped a full guide on how to install and run Qwen3-VL Locally - step-by-step, clean, and fast.
🧠 Expect next-level multimodal understanding
🎥 Vision + Text synergy
⚡️ Lightning-fast inference with NodeShift

🔗 Read now: https://nodeshift.cloud/blog/how-to-install-run-qwen3-vl-locally-a-step-by-step-guide?utm_source=telegram&utm_medium=social&utm_campaign=qwen3-vl_announcement
4
Media is too big
VIEW IN TELEGRAM
IBM launches Granite 4.0-H — a family of long-context, tool-calling LLMs built for real work.

Three sizes, same DNA:
Micro-H (3B, 1M ctx): lightweight & snappy for JSON/IE, routing, short multilingual chat, FIM code.
Tiny-H (7B, 1M ctx): the sweet spot—stronger reasoning, multi-turn assistants, compact RAG, solid tool-calling.
Small-H (32B, 1M ctx): muscle for complex workflows, long-doc comprehension, higher-fidelity coding & analysis.

We just published a hands-on guide to get you productive fast:

What’s inside
Two setup paths: Ollama + Open WebUI (fast chats) & Transformers/vLLM (prod services)
GPU sizing tables for Micro/Tiny/Small + why we standardize on 1×H200
A mini benchmark/prompt pack to compare the three models
Tool-calling scripts (emit/parse <tool_call> and feed <tool_response>)
Minimal Python examples (BF16 & 4-bit) + sanity checks & troubleshooting

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-ibm-granite-4-0-h-tiny-small-and-micro-locally
1🔥1
Big TTS models, running on heavy hardwares, still delivering robotic voices?

Forget them, as NeuTTS Air brings super-realistic, on-device voice AI with instant voice cloning, can run easily on CPUs, no heavy GPUs needed.
- Generate ultra-human voices in real-time
- Clone any speaker in just 3 seconds of audio
- Optimized for laptops, phones & even Raspberry Pis

NeuCodec-powered audio ensures crystal-clear quality with low power consumption.
TL;DR: It’s realistic speech + instant voice cloning + on-device performance, all in one compact model.

In our latest guide, we show you how to install and run NeuTTS Air locally, with NodeShift cloud making setup and GPU-accelerated deployment effortless, get lifelike voice AI running in minutes.
🔗 Read here: https://nodeshift.cloud/blog/how-to-install-and-run-neutts-air-locally-super-realistic-on-device-voice-ai-with-instant-voice-cloning?utm_source=telegram&utm_medium=social&utm_campaign=neutts_air_launch
3
This media is not supported in your browser
VIEW IN TELEGRAM
Meta AI just launched the Code World Model (CWM)!

The Code World Model (CWM) is a 32B parameter dense autoregressive LLM developed by the Meta FAIR CodeGen Team. Unlike traditional code models, CWM was mid-trained on Python execution traces, memory trajectories, and containerized agentic interactions—making it uniquely suited for reasoning about how code affects computational environments.

What’s special about CWM?
Mid-trained on real execution traces & agentic environments
Post-trained with multi-task RL for verifiable coding, math, and multi-turn software engineering
Research-only (non-commercial) release under FAIR license
Strong benchmark performance on Math-500, AIME, and SweBench

We just dropped a full step-by-step guide on:
🔹 Requesting gated access
🔹 Running on a NodeShift GPU VM
🔹 Serving with vLLM
🔹 Streamlit UI for interaction

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-facebook-cwm-locally
1🔥1
Still relying on slow, API based TTS that sounds robotic, can't run on small devices and breaks the bank?
Meet KaniTTS, the latest high-speed, high-fidelity voice AI that runs entirely on your device with just some basic GPU acceleration.

What makes it special:
- Powered by a 370M LLM + Neural Audio Codec for ultra-natural, real-time speech
- ~1 sec latency for 15 seconds of audio, perfect for chatbots, assistants & accessibility tools
- Multilingual: English, German, Chinese, Korean, Arabic & Spanish
- Runs locally with just 2-4GB GPU memory, no APIs, no data leaks, no lag

With NodeShift cloud, setting it up is effortless, GPU-optimized, ready-to-run, and privacy-first.
Get studio-quality speech generation right on your own hardware in minutes.
🔗 Read the full guide: https://nodeshift.cloud/blog/how-to-install-and-run-kanitts-locally-real-time-on-device-voice-generation?utm_source=telegram&utm_medium=social&utm_campaign=kanitts_launch
2
The ModernVBERT team has just unleashed a compact 250M-parameter vision-language model, which is matching the performance of models up to 10x larger, performing way above its weight!

With state-of-the-art multimodal reasoning, advanced document retrieval capabilities, and seamless image + text understanding, ModernVBERT is your go-to model for next-level AI & RAG workflows.

We’ve published a step-by-step guide to install and run ModernVBERT locally - fast, clean, and ready for experimentation.
- Unlock multimodal intelligence
- Advanced visual document comprehension
- Optimized for lightning-fast local inference with NodeShift Cloud

🔗 Read the full guide here: https://nodeshift.cloud/blog/how-to-install-modernvbert-compact-vlm-for-document-retrieval-in-rag-applications?utm_source=telegram&utm_medium=social&utm_campaign=modernvbert_launch
2🔥1
Media is too big
VIEW IN TELEGRAM
ServiceNow just released: Apriel-1.5-15B-Thinker

Apriel-1.5-15B-Thinker is an open-weights, multimodal reasoning model (image-text-to-text) focused on strong mid-training/continual pre-training plus high-quality text SFT—no RL required. It’s compact (15B) yet competitive with much larger models and designed to run on a single GPU.

We just published a step by step guide to install and run Apriel locally—plus a simple Streamlit UI so you can chat with the model and ask questions about images.

What the guide covers:
Picking a GPU + VRAM sizing tips
CUDA/PyTorch install (cu121) & env setup (Py 3.11)
One-file for text + vision with the correct dtype cast (BF16/FP16)
Optional Streamlit app (text & image tabs, sliders for temp/tokens)
Tuning for speed/VRAM (token limits, fp16, 8-bit options)

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-servicenow-apriel-1-5-15b-thinker-locally
🔥21
Microsoft just released UserLM-8B — a unique open-weights model that flips the script: instead of acting like an assistant, it simulates the user in a conversation.

What is UserLM-8B?
A fine-tuned Llama-3.1-8B model trained on WildChat-1M to generate realistic user turns—from the first query to multi-turn follow-ups—and it can even gracefully wrap up with a special <|endconversation|> token.

Why it’s special
Purpose-built for assistant evaluation & robustness testing
Great for synthetic dialogue data generation
More natural, diverse “user” behavior vs. prompting an assistant model to pretend

We just published a new guide “How to Install & Run Microsoft UserLM-8B Locally”

What’s inside:
GPU sizing + a practical VRAM table
Full setup on a GPU VM (NodeShift example)
Ready-to-run scripts
Guardrails for realistic simulations (stop tokens, end-of-conversation handling)
Tips to plug UserLM into your own assistant for end-to-end testing

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-microsoft-userlm-8b-locally
🔥2
Wondering how can you test if your AI model is safe or behaves unpredictably under pressure?
Anthropic has just released a non-negotiable tool for AI safety - Petri (Parallel Exploration Tool for Risky Interactions). Petri is an open-source tool that automates AI behavior testing through multi-turn conversations, simulated environments, and detailed scoring across safety dimensions.

With Petri, you can:
- Run alignment tests on any model - deception, reward hacking, & more
- Automate hundreds of behavioral evaluations in minutes
- Get structured insights and transcripts for deeper analysis & benchmarking

And with NodeShift Cloud, you can install and run Petri locally, easily, securely, and with zero setup friction.
In our latest guide, we’ll cover:
🔹 How to install and set up Petri locally
🔹 How to setup local model for auditing with Ollama as the API
🔹 How to run your first automated safety audit
🔹 How to provide seed instructions and interpret transcripts

🔗 Read full guide here: https://nodeshift.cloud/blog/how-to-install-run-anthropics-petri-locally-the-easiest-way-to-audit-ai-models-for-safety?utm_source=telegram&utm_medium=social&utm_campaign=petri_ai_audit
🔥21
Qwen just dropped: Qwen3-VL-30B-A3B-Thinking

A powerhouse multimodal model built on a Mixture-of-Experts stack—designed for deep text + vision + video reasoning, long-context understanding (256K→1M), robust OCR (32 languages), GUI/tool use, and even converting diagrams/screens into working code.

We’ve published a fresh, hands-on guide:
“How to Install & Run Qwen3-VL-30B-A3B-Thinking Locally” — tuned for a GPU VM workflow (We used NodeShift, but it works anywhere).

What’s inside the guide
Clean environment setup (CUDA-aligned PyTorch, optional FlashAttention-2)
Image & video inference
“Thinking” variant notes + practical VRAM plans (single-/multi-GPU)
Troubleshooting (FA2 mismatches, SDPA fallback)
Ready-to-copy commands & code blocks for Jupyter/terminal

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-qwen3-vl-30b-a3b-thinking-locally
🔥21
Media is too big
VIEW IN TELEGRAM
AI21 Labs just launched Jamba Reasoning 3B — a compact, hybrid Transformer–Mamba model built for serious reasoning on modest hardware.

Why it’s special
~3B params (26 Mamba + 2 Attention) → fast, memory-light, edge-friendly
256K context without the usual KV-cache blow-up
Strong benchmarks: IFBench 52.0, Humanity’s Last Exam 6.0, MMLU-Pro 61.0
On-device speed that holds up as context grows (≈43–44 tok/s at 16–32K)

We just published a new step-by-step guide:
“How to Install & Run AI21-Jamba-Reasoning-3B Locally (GPU VM)”

What’s inside
Pick the right GPU & VRAM (rule-of-thumb table)
Clean setup on a CUDA 12.1.1 image (Python 3.11, Torch cu121)
vLLM serving (OpenAI-compatible) with the right flags for Mamba SSM
Transformers alternative path + FlashAttention 2 tips
A one-file Streamlit UI to chat with the model on your own server

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-ai21-jamba-reasoning-3b-locally
🔥21
VNGRS Releases Kumru-2B — A Turkish-Native Lightweight Language Model

VNGRS has officially released Kumru-2B, a compact yet powerful Turkish-native LLM built entirely from scratch. Trained on ~500 GB of curated text (≈300B tokens) and fine-tuned on over 1M supervised examples, Kumru-2B is designed specifically for the Turkish language — featuring a modern 50K-token Turkish-optimized tokenizer, 8K context window, and native support for math and code.

Why Kumru-2B is Special
Built from scratch for Turkish — not a multilingual adaptation.
Efficient tokenizer: uses ~40% fewer tokens than multilingual models like GPT-4o or Gemma.
Punches above its weight — outperforms much larger models like Llama-3.3-70B and Qwen2-72B on Turkish-centric tasks.
Runs smoothly on local or cloud GPUs, making it ideal for research, startups, and developers.

In our latest blog, we walk you through everything you need to:
Deploy a GPU-powered VM on NodeShift Cloud
Install Python 3.11 + CUDA 12.1.1 environment
Run the model with a simple Python script
Launch an interactive Streamlit WebUI to chat with Kumru-2B directly in your browser

Whether you’re building NLP tools, studying Turkish linguistics, or experimenting with LLMs, this guide helps you get started in minutes.

Read the full tutorial here: https://nodeshift.cloud/blog/how-to-install-run-vngrs-ai-kumru-2b-locally
🔥21
OCR needs has evolved beyond just extracting text, enterprises now need the OCR that can understand the documents and turns them into structured, AI-ready markdown.
That’s why Nanonets-OCR2 by Nanonets is a game-changer for anyone working with scanned docs, academic papers, business reports, invoices, or forms etc.

What can it do?
Converts mathematical equations to LaTeX
Describes images using structured <img> tags
Detects signatures & watermarks
Handles checkboxes, radio buttons, and complex tables
Extracts flowcharts & org charts as Mermaid code
Supports handwritten documents and multiple languages
Provides Visual Question Answering (VQA) directly from the document

We’ve just published a complete guide to install and run Nanonets-OCR2 locally or in GPU accelerated environment with NodeShift Cloud for continuous delivery, so you can start automating document workflows with full control and scalability.
🔗 Read the guide here: https://nodeshift.cloud/blog/convert-documents-to-structured-markdown-html-with-nanonets-ocr2?utm_source=telegram&utm_medium=social&utm_campaign=nanonets_ocr2_guide
2
The wait is over, now you could run Korea’s first fully open source 10B-parameter AI model - right on your machine!

Meet KORMo-10B-sft, a 10B-parameter bilingual Korean-English LLM built entirely from scratch and released 100% open-source - weights, code, and even training data.
Developed by KAIST's MLP Lab, KORMo sets a new benchmark for transparency, reproducibility, and real-world performance - bridging the gap between open research and applied AI specially in non-english domains.

In our latest article, we break down how to install and run KORMo-10B-sft locally, explore its most powerful features, and show how NodeShift Cloud makes deploying massive open models effortless, from Colab to production GPUs.
🔗 Read here: https://nodeshift.cloud/blog/how-to-install-and-run-kormo-the-first-fully-open-source-korean-english-llm?utm_source=telegram&utm_medium=social&utm_campaign=kormo_10b_launch
2
Liquid AI just dropped something special — the LFM2-8B-A1B model is here!

This new on-device-friendly Mixture-of-Experts (MoE) model packs 8.3B total parameters (only 1.5B active!) and blends 18 convolutional LIV layers + 6 GQA attention layers for hybrid speed and quality. It supports 32K context length, runs smoothly even on modest GPUs, and rivals much larger 3–4B dense models in performance — perfect for agentic tasks, RAG, data extraction, and multi-turn reasoning.

We’ve just published a step-by-step installation and setup guide for LFM2-8B-A1B, where we walk through everything — from spinning up a GPU VM on NodeShift Cloud to running the model locally using Transformers.

Here’s what we covered in the guide:
Model benchmarks, specs, and comparison tables
Full environment setup (CUDA, Python, PyTorch)
Hugging Face authentication and correct Transformers commit
Script to run the model locally
GPU configuration cheatsheet for every use case

Check out the complete guide here: https://nodeshift.cloud/blog/how-to-install-run-lfm2-8b-a1b-locally
1🔥1
Kwaipilot just dropped KAT-Dev-72B-Exp — their most ambitious open-source coder yet. It’s a 72B-parameter, RL-tuned LLM built for software engineering, debugging, and automated code reasoning—the experimental sibling of the proprietary KAT-Coder.

Benchmark highlight: On SWE-Bench Verified, KAT-Dev-72B-Exp hits 74.6% when evaluated strictly with the SWE-agent scaffold.

What’s inside the guide
Fast setup on a GPU VM (NodeShift-style, works anywhere)
Transformers BF16 quickstart + multi-GPU tips
4-bit (bitsandbytes) single-GPU recipe for tight VRAM
A polished Streamlit web UI to chat in the browser
vLLM/TGI notes for production-grade serving & throughput
VRAM & storage planning for 72B (quantized vs full-precision)
SWE-agent eval knobs (temp=0.6, max_turns=150, history=100)
“Hard-mode” prompts to stress test reasoning & code repair

If you care about long-context debugging, multi-turn repair, and RL-hardened coding agents, this one’s for you.

Check out the complete guide here: https://nodeshift.cloud/blog/how-to-install-run-kat-dev-72b-exp-locally
🔥21
Following the global launch of the Qwen3-VL series, which redefined multimodal AI with its vision-language fusion and massive context capabilities, the new Qwen3-VL-4B and 8B-Thinking editions take a sharper turn toward intelligence per parameter.
These smaller, more efficient models have the same deep multimodal understanding as their larger counterparts - but now enhanced with a “Thinking” mode that lets them reason, plan, and act with remarkable depth.

From generating code from screenshots to understanding complex STEM visuals and long videos, they deliver cognitive precision in a lightweight footprint you can actually run locally.

We’ve just published a step-by-step guide on how to install and run Qwen3-VL-Thinking locally, fully optimized with NodeShift Cloud.
- Small models, big reasoning power
- Thinking-enhanced multimodal intelligence
- Instant GPU environments, no setup needed

🔗 Read here: https://nodeshift.cloud/blog/how-to-install-and-run-qwen3-vl-4b-8b-thinking-locally?utm_source=telegram&utm_medium=social&utm_campaign=qwen3_vl_thinking_launch
2🔥2
This media is not supported in your browser
VIEW IN TELEGRAM
AI at Meta just released MobileLLM-Pro — their new 1.08B-parameter on-device language model!

MobileLLM-Pro is built for speed, efficiency, and privacy, bringing large-model intelligence directly to phones, edge accelerators, and low-VRAM GPUs. It features:
🔹 128k context window for long-form understanding
🔹 Local-global attention (3:1) for faster prefill & smaller KV cache
🔹 Near-lossless int4 quantization
🔹 Base & instruction-tuned variants
🔹 Competitive accuracy vs Gemma 3 1B and Llama 3.2 1B

We’ve just published a complete step-by-step guide on how to install, configure, and run MobileLLM-Pro locally.

In this guide, you’ll learn how to:
🔹 Set up a CUDA-based GPU VM on NodeShift
🔹 Install Python 3.11, PyTorch CUDA, and key dependencies
🔹 Authenticate with Hugging Face for the gated model
🔹 Run the base inference script directly in terminal
🔹 Build a browser chat interface

Check out the full tutorial here: https://nodeshift.cloud/blog/how-to-install-run-facebook-mobilellm-pro-locally
1🔥1
Media is too big
VIEW IN TELEGRAM
Meet LangCode, the next-gen multi-LLM coding agent that brings Gemini, Claude, OpenAI, and Ollama together, right inside your local terminal.

LangChain-code or LangCode in short, serves as an AI-powered development environment with:
- Deep and ReAct modes for fast or complex reasoning
- Safe, reviewable code diffs before every change
- Smart routing to pick the best LLM for each task
- MCP-based tool integrations and customizable project rules

And with NodeShift Cloud, you can install and run LangCode locally, effortlessly, securely, and with zero setup friction.
In our latest guide, you’ll learn:
🔹 How to install and configure LangCode locally
🔹 How to launch its interactive coding interface
🔹 How to enable Local LLM setup with Ollama
🔹 How to start building faster, safer, and smarter with AI

🔗 Read the full guide here: https://nodeshift.cloud/blog/build-faster-safer-with-langcode-your-ultimate-multi-llm-local-ai-copilot?utm_source=telegram&utm_medium=social&utm_campaign=langcode_guide
2
DeepSeek AI releases DeepSeek-OCR — a next-gen Vision-Language OCR model!

DeepSeek-OCR is a cutting-edge vision-language model built on DeepSeek-VL-v2, designed for intelligent optical character recognition and document understanding.

It excels at turning complex images, scanned documents, and charts into clean, structured Markdown or text with incredible accuracy.

Specialties:
Context-aware multilingual OCR
FlashAttention 2 acceleration for high-speed GPU inference
Visual-text compression & layout reasoning
Converts entire documents, PDFs, and images into readable Markdown

What we covered in our latest tutorial:
Full step-by-step setup on a GPU VM (NodeShift Cloud)
Installing CUDA, Python 3.12, PyTorch 2.6.0 (CUDA 11.8)
Configuring FlashAttention 2
Running DeepSeek-OCR for image-to-markdown conversion

Read the complete setup & usage guide here: https://nodeshift.cloud/blog/how-to-install-run-deepseek-ocr-locally
1🔥1