NodeShift Announcements Official
22.8K subscribers
45 photos
7 videos
378 links
Decentralized, no-code AI cloud platform that enables one-click deployment of AI agents and LLMs
Download Telegram
DeepSeek AI releases DeepSeek-OCR — a next-gen Vision-Language OCR model!

DeepSeek-OCR is a cutting-edge vision-language model built on DeepSeek-VL-v2, designed for intelligent optical character recognition and document understanding.

It excels at turning complex images, scanned documents, and charts into clean, structured Markdown or text with incredible accuracy.

Specialties:
Context-aware multilingual OCR
FlashAttention 2 acceleration for high-speed GPU inference
Visual-text compression & layout reasoning
Converts entire documents, PDFs, and images into readable Markdown

What we covered in our latest tutorial:
Full step-by-step setup on a GPU VM (NodeShift Cloud)
Installing CUDA, Python 3.12, PyTorch 2.6.0 (CUDA 11.8)
Configuring FlashAttention 2
Running DeepSeek-OCR for image-to-markdown conversion

Read the complete setup & usage guide here: https://nodeshift.cloud/blog/how-to-install-run-deepseek-ocr-locally
1🔥1
How far can AI go in understanding the language of biology?
Meet the model that has already helped uncover a novel cancer therapy pathway, validated in living cells, proving that large language models can drive real biological discovery.

C2S-Scale-Gemma-27B - an innovative Gemma model developed by the collaboration of Yale University, Google Research, and Google DeepMind that can translate complex single-cell gene expression data into “cell sentences” that AI can understand.

Our latest guide walks you through how to install and deploy C2S-Scale-Gemma-27B on NodeShift Cloud, letting you explore AI-powered cell analysis, drug response prediction, and biomarker discovery, all from your own GPU setup.
🔗 Read the full guide: https://nodeshift.cloud/blog/how-to-install-run-c2s-scale-gemma-2-27b-for-single-cell-biological-discovery?utm_source=telegram&utm_medium=social&utm_campaign=c2s_gemma2_blog
1🔥1
Arch-Router-1.5B is Katanemo’s compact, preference-aligned routing model that reads a conversation + your user-defined “routes” (domain/action pairs) and returns the single best route as clean JSON (e.g., {"route":"bug_fixing"}).

What’s special about it?
Transparent & controllable routing for multi-model stacks
Tiny footprint, low latency, production-oriented
Swap target models without retraining the router

We just published a step-by-step guide to get Arch-Router-1.5B running on a GPU VM and a browser-based Streamlit WebUI so you can play with routes live.

What this guide covers:
GPU configuration cheatsheet (FP16, int8/int4, vLLM)
End-to-end setup on a GPU VM (Ubuntu + CUDA + PyTorch)
Quickstart Python script (clean JSON outputs)
Streamlit WebUI to edit route sets & test conversations
Optional FastAPI microservice pattern for production
Tips on batching, quantization, and stability (attention masks, temp)
Troubleshooting + next steps for gateways/agents

If you’re building agents, gateways, or API proxies and want rock-solid preference routing, this will save you hours.

Read the full guide: https://nodeshift.cloud/blog/how-to-install-run-katanemo-arch-router-1-5b-locally
2🔥1
Tired of open models lagging behind proprietary ones?

Bee-8B-RL by Open-Bee changes the game. An 8B-parameter Multimodal LLM trained on the meticulously curated Honey-Data-15M corpus, built using their transparent HoneyPipe data curation framework.
Unlike noisy open datasets, Honey-Data-15M blends short and long Chain-of-Thought (CoT) reasoning over 15M clean, enriched samples that power Bee-8B-RL to deliver SOTA reasoning, visual understanding, and factual accuracy rivaling closed models like InternVL3.5-8B.

Now, you can run it locally, fast, efficient, and fully open.
In our latest guide, we show you how to install and run Bee-8B-RL on your own machine with NodeShift Cloud, unlocking a smooth, high-performance environment for experimentation, deployment, and innovation.

🔗 Read the full guide: https://nodeshift.cloud/blog/how-to-install-and-run-bee-8b-rl-locally?utm_source=telegram&utm_medium=social&utm_campaign=bee8b_rl_launch
🔥2
Ai2 releases olmOCR-2-7B-1025-FP8 — an OCR-specialized Vision-Language Model built for real-world document intelligence!

olmOCR-2-7B-1025-FP8 is AllenAI’s powerful OCR VLM distilled from Qwen2.5-VL-7B-Instruct, fine-tuned on the olmOCR-mix-1025 dataset, and further optimized with GRPO reinforcement learning to handle math formulas, tables, long/tiny text, and noisy scans. With FP8 quantization (via llmcompressor), it achieves outstanding accuracy while drastically cutting memory usage — reaching ~82.4 ± 1.1 overall on olmOCR-Bench when paired with the official olmOCR toolkit (v0.4.0).

We’ve just published a brand-new step-by-step guide that shows you exactly how to install and run olmOCR-2-7B-1025-FP8 locally on a GPU-powered Virtual Machine using NodeShift Cloud.

In this guide, we cover:
Complete environment setup using NodeShift GPU VMs
Installing dependencies
Setting up and running the olmOCR pipeline
Generating high-accuracy Markdown outputs from scanned PDFs
Optimized GPU configurations for FP8 quantized inference

Whether you’re building large-scale document pipelines or experimenting with multimodal OCR models — this guide helps you deploy olmOCR seamlessly, from setup to high-throughput inference.

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-olmocr-2-7b-1025-fp8-locally
2👍1
LLaDA2.0-Mini-Preview is a diffusion-style Mixture-of-Experts (MoE) model with 16B total parameters (~1.4B active) — built for strong reasoning and coding performance while keeping inference light. Only a small subset of experts fire per token, giving it near-7B quality with just ~1–2B-class compute. It supports tool use, 4K context, and runs seamlessly with transformers using trust_remote_code=True.

We just published a new step-by-step guide on how to deploy and run LLaDA2.0-Mini-Preview on NodeShift Cloud — from VM setup to browser-based interaction.

What this guide covers:
Creating a GPU Node on NodeShift Cloud
Installing CUDA, PyTorch, and essential dependencies
Running the model locally with a Python script
Launching an interactive Streamlit WebUI for chatting with the model
Detailed GPU configuration table for every VRAM tier

Whether you’re a developer, researcher, or enthusiast, this guide helps you get LLaDA2-Mini running smoothly — delivering powerful reasoning and coding performance at an affordable cost.

Read the full guide: https://nodeshift.cloud/blog/how-to-install-run-llada2-0-mini-preview-locally
2🔥2
Liquid AI has officially released its new LFM2-VL series, a next-generation family of multimodal (image + text) models that blend visual perception with deep language understanding. The lineup comes in three variants:

✔️ LFM2-VL-450M — lightweight and edge-optimized
✔️ LFM2-VL-1.6B — balanced for accuracy and efficiency
✔️ LFM2-VL-3B — advanced precision reasoning model

Each model combines Liquid AI’s SigLIP2 NaFlex vision encoder with powerful language backbones, supporting 512×512 image inputs, dynamic token scaling, and efficient bfloat16 inference. Whether you’re working on document OCR, visual QA, or detailed image captioning — this series delivers performance that scales with your hardware and needs.

We’ve just published a complete step-by-step guide to help you install and run all three models locally or on the NodeShift Cloud.

Here’s what we cover in this guide:
Model introductions, benchmark comparisons, and GPU configuration table
End-to-end setup on NodeShift GPU VM (with CUDA + Python 3.11)
Running LFM2-VL-450M via terminal and Gradio UI
Scaling up to LFM2-VL-1.6B and LFM2-VL-3B for advanced multimodal reasoning
Includes code snippets, installation commands, and sample outputs

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-liquidai-lfm2-vl-locally
1
Imagine creating minutes-long, high-quality 720p videos, all from text or a single image, right on your own machine.
That’s exactly what LongCat-Video (13.6B parameters) makes possible.

What it offers:
- Unified model for Text-to-Video, Image-to-Video, & Video-Continuation
- Generates smooth, coherent long videos with no color drift or frame drops
- Efficient inference powered by Block Sparse Attention
- Trained with multi-reward RLHF for cinematic realism

With NodeShift Cloud, you can now install, run, and scale LongCat-Video locally or on the cloud in just a few steps, unlocking studio-grade AI video generation for everyone.
🔗 Dive into the full guide here: https://nodeshift.cloud/blog/how-to-install-and-run-longcat-video-locally-generate-stunning-long-videos-with-ai?utm_source=telegram&utm_medium=social&utm_campaign=longcat_video_launch
2
Tired of slow, laggy OCR pipelines? LightOnOCR-1B emerges as a fast and lightweight open source OCR model that outpaces many well known OCRs on benchmarks.
With a Pixtral-based Vision Transformer and Qwen3 text decoder, it delivers end-to-end differentiable OCR, no external steps needed.
- 5× faster than dots.ocr
- Processes 493k pages/day for <$0.01 per 1,000 pages
- Handles math, tables, receipts, forms, and multi-column layouts effortlessly
- State-of-the-art accuracy (76.1 overall on Olmo-Bench)

You can now install and run it locally, right on your machine, with the help of the latest step-by-step guide powered by NodeShift Cloud.
🔗 Read the full guide here: https://nodeshift.cloud/blog/how-to-install-and-run-lightonocr-1b-locally-the-fastest-open-ocr-model-for-document-understanding?utm_source=telegram&utm_medium=social&utm_campaign=lightonocr1b_launch
1🔥1
Datalab just released their next-generation OCR model — Chandra!

Chandra is a powerful vision-language OCR model built for precise document understanding. It doesn’t just extract text — it reconstructs full document layouts into clean Markdown, HTML, or JSON formats, handling tables, forms, diagrams, handwriting, math equations, and multi-column pages with ease.

Supporting over 40 languages, Chandra achieves an impressive 83.1% overall accuracy on the olmOCR benchmark, outperforming many open and commercial OCR systems.

We’ve just published a comprehensive guide that walks you through everything — from setting up Chandra on a GPU-powered NodeShift Cloud VM, installing dependencies, and running the model with Transformers and vLLM, to launching a full Streamlit web app for interactive document analysis in the browser.

Whether you’re a researcher, developer, or just passionate about document AI, this guide will help you get Chandra running end-to-end — from terminal to web UI.

Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-chandra-ocr-locally
1🔥1
Baidu's PaddleOCR-VL is the new SOTA vision-language model redefining document understanding and trending as one of the top OCR models along with big models like DeepSeek OCR.
This is a compact yet insanely capable OCR-VLM that blends:
- NaViT-style dynamic visual encoding
- ERNIE-4.5-0.3B language model
- Support for 109 languages
- Lightning-fast, resource-efficient inference

It doesn’t just read documents, it understands and explains them. From complex tables and formulas to multi-lingual text and charts, PaddleOCR-VL achieves state-of-the-art accuracy while staying lightweight enough for real-world deployment.
At NodeShift, we made it even easier to install, run, and benchmark PaddleOCR-VL locally, so you can experience its power without the complex setup friction.
🔗 Read here: https://nodeshift.cloud/blog/how-to-install-and-run-paddleocr-vl-locally?utm_source=telegram&utm_medium=social&utm_campaign=paddleocr-vl-launch
Kimi Linear by Moonshot AI is the Future of Scalable Attention!
Imagine handling 1 million tokens with 6× faster decoding and 75% less memory, that’s what Kimi Linear delivers.

Built on the groundbreaking Kimi Delta Attention (KDA), it redefines how we process long-context data with unmatched speed, efficiency, and precision.

In our latest guide, we break down how to install and run Kimi Linear locally so you can experience next-gen attention models firsthand, right from your own setup. If you're into LLM research, RL-style reasoning, or long-context applications, this one’s a must-try.
🔗 Read full detailed article here: https://nodeshift.cloud/blog/a-step-by-step-guide-to-install-run-kimi-linear?utm_source=telegram&utm_medium=social&utm_campaign=kimi_linear_launch
3
Meet JanusCoderV-8B — the next leap in visual-programmatic intelligence!

Developed by InternLM, JanusCoderV-8B is an 8-billion-parameter multimodal model built on InternVL-3.5-8B, trained on the massive JANUSCODE-800K corpus. It’s designed to unify vision and code, enabling image-conditioned code generation, visually grounded edits, and UI-to-code translation — all in one model.

What makes it special?
It bridges the gap between visual context and programmatic logic.
Generates HTML/CSS, charts, and interactive elements directly from screenshots or design mockups.
Supports long-context outputs (up to 32K tokens) and runs smoothly on affordable GPUs using 8-bit or BF16 precision.

We’ve just published a new step-by-step guide:

How to Install & Run JanusCoderV-8B Locally — a complete walkthrough that covers:
Setting up a GPU-powered VM on NodeShift Cloud
Installing CUDA 12.1.1, Python 3.11, and PyTorch 2.5.1
Configuring the environment for multimodal inference
Running JanusCoderV-8B to generate image-based code and UI descriptions

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-januscoderv-8b-locally
🔥1
AMD Releases Nitro-E — A Lightweight Text-to-Image Diffusion Model

AMD has introduced Nitro-E, a highly efficient text-to-image diffusion model built on the E-MMDiT architecture (~304M parameters). It’s designed for fast, low-cost training and inference, making image generation accessible even on modest GPU setups.

Key Highlights:
🔹 Ultra-light architecture (~304M params) with EMMDiT backbone.
🔹 Base 512px model delivers quality in ~20 steps.
🔹 Distilled 512px variant generates great results in just 4 steps.
🔹 GRPO-tuned checkpoint for improved post-training image quality.
🔹 Fully compatible with both NVIDIA (CUDA) and AMD (ROCm) GPUs.

We’ve just published a complete step-by-step guide covering everything you need to install and run AMD Nitro-E locally.

Inside this guide, you’ll learn:
How to set up a GPU VM on NodeShift Cloud.
Environment setup with Python 3.11, CUDA 12.1, and required libraries.
Installation of PyTorch, Diffusers, and FlashAttention.
Running Nitro-E in multiple modes — base, distilled, and GRPO-tuned.
Generating your first AI image in just a few minutes.

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-amd-nitro-e-locally
2🔥1
Media is too big
VIEW IN TELEGRAM
OpenAI just released GPT-OSS-Safeguard — a new era of open-source safety reasoning!

OpenAI’s GPT-OSS-Safeguard 20B and 120B models are purpose-built for Trust & Safety, trained to interpret your own policy text, explain moderation decisions, and let you control reasoning effort (low / medium / high).

🔹 20B → optimized for 16 GB-class GPUs, perfect for low-latency filters & offline labeling
🔹 120B → high-fidelity safety reasoning, fits a single H100 80 GB via MoE + MXFP4 quantization
🔹 Fully open-weight under Apache 2.0, built on the GPT-OSS family
🔹 Requires the Harmony response format for interpretable

We’ve just published a step-by-step guide covering everything you need to:
Deploy GPT-OSS-Safeguard Models
Pull & run the models via Ollama CLI
Launch Open WebUI for a visual chat experience
Explore reasoning depth, labeling workflows, and real-world policy checks

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-gpt-oss-safeguard-20b-and-120b-locally
2
Meet VieNeu-TTS, the first-ever Vietnamese Text-to-Speech model that runs entirely on your personal device.
Built on the Qwen 0.5B LLM and fine-tuned from NeuTTS Air, it delivers hyper-realistic, natural voices with real-time inference, no heavy hardware dependency.
If you’re building voice assistants, educational tools, or offline AI applications, VieNeu-TTS is a game-changer for anyone who values speed, privacy, and quality.
In this step-by-step guide, we show you how to install and run VieNeu-TTS locally, and experience the future of Vietnamese voice AI right away.
🔗 Read the full article here: https://nodeshift.cloud/blog/how-to-install-run-vieneu-tts-locally-the-first-realistic-vietnamese-voice-ai?utm_source=telegram&utm_medium=social&utm_campaign=vieneu_tts_launch
👍2
SoulX-Podcast-1.7B — Long-Form, Multi-Speaker TTS Is Here!

SoulX-Podcast-1.7B is a podcast-style TTS model built for long, multi-turn, multi-speaker dialogs. It supports English, Mandarin, and several Chinese dialects (Sichuanese, Henanese, Cantonese), performs zero-shot voice cloning from short clips, and even captures laughter, sighs, and emotions to make speech sound real.

It’s optimized for single-GPU inference, letting you generate entire podcast episodes with expressive delivery, natural tone, and dynamic speaker changes.

Key Highlights:
Long-form conversational TTS with speaker variation
Zero-shot cloning from short reference clips
Multilingual + multi-dialect support
Runs smoothly on 8–24 GB GPUs for smaller use cases
Perfect for podcasts, storytelling, or research in expressive speech

We just published a new step-by-step guide covering:
Complete NodeShift GPU setup (CUDA 12.1.1 devel image)
Python 3.11 + Conda environment setup
Installing PyTorch (cu121) and all dependencies
Pulling base + dialect models from Hugging Face
Running dialogue inference scripts
Launching the Gradio WebUI for real-time podcast generation

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-soulx-podcast-1-7b-locally
2
The Future of Expressive AI Voice Generation is here - Meet Maya1 by Maya Research
Imagine a voice model that can laugh, cry, whisper, or sigh, all from a single text prompt describing the type of voice you want.
That’s exactly what Maya1 by Maya Research has to offer - a 3B-parameter speech model built for rich emotional realism, natural language voice design, and real-time streaming using the SNAC neural codec.
With NodeShift cloud you can run it locally on a single GPU, open-source under Apache 2.0.

In our new article, we walk you through how to install and run Maya1 locally, so you can start crafting lifelike, emotionally aware AI voices for your own projects from podcasts to storytelling to research.

🔗 Dive in now → https://nodeshift.cloud/blog/how-to-install-and-run-maya1-locally-create-emotion-rich-ai-voices-in-minutes?utm_source=telegram&utm_medium=social&utm_campaign=maya1_launch&utm_content=blog_post
🔥21
Released just days ago, Aquif 3.5 Plus and Aquif 3.5 Max bring GPT-5-level intelligence, with next-generation reasoning power, and massive 1M-token context windows - all that can be easily run locally with NodeShift.

With hybrid reasoning modes, 3.3B active parameters, and multilingual support, Aquif 3.5 lets you toggle between speed and depth, from lightning-fast inference to deep scientific analysis.
And with NodeShift Cloud, you can deploy and run it effortlessly on your own hardware or custom GPU instances in minutes.

🔗 Read the full guide here: https://nodeshift.cloud/blog/how-to-install-aquif-3-5-plus-max-locally-the-open-source-models-with-gpt-5-level-reasoning?utm_source=telegram&utm_medium=social&utm_campaign=aquif3-5plusmax_blog
3🔥1
You can now run Kimi K2 Thinking - the SOTA and most powerful open-source thinking agent model to date - fully locally!

This isn't just another chat model.
Kimi K2 Thinking by Moonshot AI can reason step-by-step, plan tasks, write code, and autonomously call tools - sustaining 200–300 sequential actions without losing direction.
And with the new GGUF quantized build by Unsloth AI, the massive 1T parameter model (1.09TB) is reduced to ~230GB - while retaining its deep reasoning performance. Meaning: you can actually run it with just a handful of H100s/H200s.

We just published a full guide showing how to install and run Kimi K2 Thinking locally:
• Setup requirements
• Setting up your local/NodeShift GPU environment
• Download & run GGUF with Llama.cpp
• Inference code for reasoning

If you're building autonomous agents, research copilots, or coding assistants, this is one model you’ll definitely want to try.

🔗 Read the full guide here: https://nodeshift.cloud/blog/how-to-install-and-run-kimi-k2-thinking-gguf-locally?utm_source=telegram&utm_medium=social&utm_campaign=kimi-k2-gguf-guide
🔥2
AI at Meta just dropped Omnilingual ASR – an open-source speech recognition model suite built to support 1,600+ languages, including many that have never had reliable ASR support before.

This family combines Wav2Vec2, CTC, and LLM-based architectures to deliver:
Scalable zero-shot transcription for new & low-resource languages
State-of-the-art accuracy with the flagship omniASR_LLM_7B (CER < 10% for ~80% of supported languages)
Easy integration with PyTorch, Fairseq2, and Hugging Face for real-world deployments

We’ve just published a new hands-on guide on how to run Omnilingual ASR on a GPU-powered VM using NodeShift Cloud.

What this guide covers:
GPU configuration recommendations for all Omnilingual ASR model variants
Step-by-step setup on a NodeShift GPU VM (Python 3.11, CUDA, libsndfile, dependencies)
Installing & running omniASR_LLM_7B with the official ASRInferencePipeline
A ready-to-use Gradio WebUI to upload audio (≤40s) and get instant multilingual transcripts
Practical workflow tips for teams, researchers & community language projects

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-omnilingual-asr-locally
1