GLM-4.6, the latest release from Zai Org, is an AI model that reasons, codes, and acts, holding its own against well-known names like DeepSeek V3.1 Terminus and Claude Sonnet 4.
Built on the next-gen GLM-4.6 foundation, it brings:
- 200K token context window – tackle complex tasks like never before
- Superior coding & agent performance – from Claude Code to Roo Code
- Advanced reasoning & tool use – stronger, smarter, more capable agents
- Refined human-aligned writing – natural style and role-playing scenarios
Our latest post walks you through how to install and run GLM-4.6 locally or on GPU-accelerated environments, with copy-paste-ready steps.
🔗 Read here: https://nodeshift.cloud/blog/how-to-install-and-run-glm-4-6?utm_source=telegram&utm_medium=social&utm_campaign=glm46_launch
NodeShift Cloud
How to Install and Run GLM 4.6
In the fast-paced industry of AI, where models are no longer just tools but collaborators in reasoning, coding, and agentic decision-making, GLM-4.6 emerges as a significant advancement. Building upon the strengths of GLM-4.5, this latest release expands…
❤3
Kwaipilot just released KAT-Dev-32B — a powerful open-source coding assistant
KAT-Dev-32B (Kwaipilot/KAT-Dev) is a 32.8B-parameter coding assistant based on Qwen3-32B, purpose-tuned for software engineering tasks.
It’s trained in three stages — mid-training (core skills), SFT + RFT (teacher trajectories), and large-scale agentic RL (prefix caching + trajectory pruning + scalable infra).
On SWE-Bench Verified, KAT-Dev-32B resolves 62.4% of tasks and ranks 5th among open-source models of all sizes.
We just published a step-by-step guide on how to set up and run KAT-Dev-32B on a GPU-powered NodeShift VM.
In this guide, we cover:
✅ GPU configuration requirements (single-GPU, multi-GPU, quantized setups)
✅ Step-by-step process to launch a NodeShift GPU VM
✅ Setting up JupyterLab with CUDA & PyTorch ready-to-go
✅ Installing libraries (Torch, Transformers, Accelerate, Einops)
✅ Running KAT-Dev interactively inside a notebook
✅ Generating your first response with the model
Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-kat-dev-locally
NodeShift Cloud
How to Install & Run KAT-Dev Locally?
KAT-Dev-32B (Kwaipilot/KAT-Dev) is a 32.8B-parameter coding assistant based on Qwen3-32B, purpose-tuned for software engineering. It’s trained in three phases—mid-training (core skills), SFT + RFT (curated tasks with teacher trajectories), and large-scale…
❤1🔥1
MinerU2.5-2509-1.2B — A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
MinerU2.5 is a compact 1.2B VLM with a smart two-stage, coarse-to-fine pipeline (global layout → native-res crops) that delivers state-of-the-art doc parsing with low compute. On OmniDocBench it tops the charts—overall 90.67, leading on Text (95.34), Formula (88.46), Table (88.22), and Reading Order (96.62)—outperforming many larger OCR/VLM systems.
What’s inside our new guide
✅ Setup end-to-end on a GPU VM (we demo with NodeShift, works anywhere)
✅ Two paths: Transformers (simple) & vLLM (fast + scalable, async engine ready)
✅ Copy-paste scripts to run two_step_extract() on your pages
✅ VRAM sizing & perf tips (quantization, token budgets, image sizing)
✅ Outputs you can use: structured blocks → Markdown, tables, formulas
Read the guide here: https://nodeshift.cloud/blog/how-to-install-run-mineru2-5-2509-1-2b-locally
NodeShift Cloud
How to Install & Run MinerU2.5-2509-1.2B Locally?
MinerU2.5 is a 1.2B-parameter vision-language model purpose-built for high-resolution document parsing. It uses a two-stage, coarse-to-fine pipeline—fast global layout on a downsampled page, then native-resolution crop recognition for text, tables, and formulas—to…
🔥2❤1
Struggling to get AI assistants to follow complex instructions or handle multilingual tasks?
IBM's Granite-4.0-Micro is here to help enterprises with instruction-following LLMs. This 3B-parameter mini-package brings:
- Accurate summarization & text extraction
- Question-answering & Retrieval-Augmented Generation (RAG)
- Code completions & function-calling tasks
- Multilingual dialog support across 13+ languages
If you’re building AI agents, automating enterprise workflows, or experimenting with advanced LLMs, Granite-4.0-Micro delivers the flexibility and precision you need once you fine-tune or customize it with your own data.
And with NodeShift Cloud, setup, deployment, and scaling are effortless, secure, and GPU-accelerated for enterprises thinking about long-term stability.
Here's our latest demo guide for installing and running Granite-4.0-Micro locally:
🔗 Link: https://nodeshift.cloud/blog/get-started-with-ibm-granite-4-0-micro-for-enterprise-rag-summarization-qa-code-tasks?utm_source=telegram&utm_medium=social&utm_campaign=granite_4_micro_launch
NodeShift Cloud
Get Started with IBM’s Granite-4.0-Micro for Enterprise RAG, Summarization, QA, & Code Tasks
In an era where AI-driven applications are rapidly transforming enterprises and research workflows, having a model that can intelligently understand and execute complex instructions is more critical than ever. IBM has launched its latest model series, Granite…
❤2🔥1
What if AI went beyond basic transcription and instead understood the audio and described it with human-level depth?
Meet Qwen3-Omni-30B-A3B-Captioner, a powerful audio captioning model that generates fine-grained, low-hallucination captions across any soundscape.
From multilingual speech and layered emotions to environmental noise, music, and cinematic effects, it delivers detailed, context-aware audio descriptions without requiring extra prompts.
And the best part? With NodeShift Cloud, you can install, run, and start experimenting right away in a CUDA-ready environment: no complex setup, just speed and scale in minutes.
🔗 Read the full guide here: https://nodeshift.cloud/blog/how-to-install-qwen3-omni-captioner-for-accurate-audio-captioning?utm_source=telegram&utm_medium=social&utm_campaign=qwen3-omni-captioner
NodeShift Cloud
How to Install Qwen3-Omni-Captioner For Accurate Audio Captioning
The AI world has been waiting for a breakthrough in general-purpose audio captioning, and Qwen3-Omni-30B-A3B-Captioner finally delivers it. Built on the powerful Qwen3-Omni-30B-A3B-Instruct backbone, this model is specifically fine-tuned to generate rich…
❤2🔥2
Liquid AI just dropped LFM2‑2.6B — a next‑generation hybrid model built for edge AI & on‑device deployment.
With 2.6B parameters, multiplicative gates + short convolutions, and support for 8 languages, it's one of the few open models designed to run smoothly on CPU, GPU, and even NPU hardware.
What you can build with it:
✔️ Lightweight tool‑calling agents that work offline or on your laptop
✔️ Data extraction & RAG workflows on private documents
✔️ Conversational assistants with multilingual support
✔️ Creative writing, summarization, and more
What’s inside our new guide
✔️ How to install & run LFM2‑2.6B locally with Transformers
✔️ How to serve it via vLLM for fast, scalable inference
✔️ How to build a minimal agent that calls functions (time, math, RAG) step‑by‑step
✔️ VRAM & GPU tips (BF16 vs. 4‑bit, FlashAttention‑2, sweet spots)
Read the full guide here: https://nodeshift.cloud/blog/pocket-operator-a-local-tool-calling-agent-powered-by-lfm2-2-6b
NodeShift Cloud
Pocket Operator: A Local, Tool-Calling Agent Powered by LFM2-2.6B
LFM2-2.6B by Liquid AI is a next-generation hybrid model designed for edge AI and on-device deployment. With 2.6B parameters, it combines multiplicative gates and short convolutions for high efficiency, speed, and quality. The model supports eight major languages…
❤1👏1
HUGE RELEASE ALERT!
The Qwen team has just released a major upgrade to Qwen2.5-VL, one of the most popular vision models in the AI industry, which many big players use as a base for fine-tuning their domain-specific vision models.
The newest version is Qwen3-VL, Alibaba’s new multimodal vision-language model that’s breaking benchmarks and expectations.
We just dropped a full guide on how to install and run Qwen3-VL locally: step-by-step, clean, and fast.
🧠 Expect next-level multimodal understanding
🎥 Vision + Text synergy
⚡️ Lightning-fast inference with NodeShift
🔗 Read now: https://nodeshift.cloud/blog/how-to-install-run-qwen3-vl-locally-a-step-by-step-guide?utm_source=telegram&utm_medium=social&utm_campaign=qwen3-vl_announcement
NodeShift Cloud
How to Install & Run Qwen3-VL Locally: A Step-By-Step Guide
The world of multimodal AI just received a major upgrade of Qwen2.5-VL, the most popular open-source vision model till now. Qwen3-VL, is the newest and most capable vision-language model in the Qwen family. Designed to understand, reason, and act across text…
❤4
IBM launches Granite 4.0-H — a family of long-context, tool-calling LLMs built for real work.
Three sizes, same DNA:
✅ Micro-H (3B, 1M ctx): lightweight & snappy for JSON/IE, routing, short multilingual chat, FIM code.
✅ Tiny-H (7B, 1M ctx): the sweet spot—stronger reasoning, multi-turn assistants, compact RAG, solid tool-calling.
✅ Small-H (32B, 1M ctx): muscle for complex workflows, long-doc comprehension, higher-fidelity coding & analysis.
We just published a hands-on guide to get you productive fast:
What’s inside
✅ Two setup paths: Ollama + Open WebUI (fast chats) & Transformers/vLLM (prod services)
✅ GPU sizing tables for Micro/Tiny/Small + why we standardize on 1×H200
✅ A mini benchmark/prompt pack to compare the three models
✅ Tool-calling scripts (emit/parse <tool_call> and feed <tool_response>)
✅ Minimal Python examples (BF16 & 4-bit) + sanity checks & troubleshooting
Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-ibm-granite-4-0-h-tiny-small-and-micro-locally
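To give a feel for the tool-calling step mentioned above, here is a minimal sketch of parsing `<tool_call>` blocks and wrapping a result in `<tool_response>`. It assumes each call is a JSON object with `name` and `arguments` keys; that payload shape, the `add` tool, and the sample model output are illustrative assumptions, not taken from the guide or from IBM's chat template.

```python
import json
import re

# Assumption: each tool call is a JSON object wrapped in <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def add(a: float, b: float) -> float:
    """Hypothetical tool, defined only for this illustration."""
    return a + b


# Registry mapping tool names to local Python callables.
TOOLS = {"add": add}


def parse_tool_calls(text: str) -> list[dict]:
    """Extract every well-formed JSON payload found inside <tool_call> tags."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of crashing
    return calls


def run_tool_call(call: dict) -> str:
    """Run one parsed call and wrap its JSON-encoded result for the model."""
    func = TOOLS[call["name"]]
    result = func(**call.get("arguments", {}))
    return f"<tool_response>{json.dumps(result)}</tool_response>"


# Made-up model output, used only to exercise the parser.
model_output = (
    "Let me compute that. "
    '<tool_call>{"name": "add", "arguments": {"a": 2, "b": 3}}</tool_call>'
)
calls = parse_tool_calls(model_output)
responses = [run_tool_call(c) for c in calls]
print(responses)
```

In a real loop, each `<tool_response>` string would be appended to the conversation before asking the model to continue; the exact message role and format depend on the model's chat template.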
❤1🔥1
Big TTS models running on heavy hardware, yet still delivering robotic voices?
Forget them: NeuTTS Air brings super-realistic, on-device voice AI with instant voice cloning, and it runs easily on CPUs with no heavy GPU needed.
- Generate ultra-human voices in real-time
- Clone any speaker from just 3 seconds of audio
- Optimized for laptops, phones & even Raspberry Pis
NeuCodec-powered audio ensures crystal-clear quality with low power consumption.
TL;DR: It’s realistic speech + instant voice cloning + on-device performance, all in one compact model.
In our latest guide, we show you how to install and run NeuTTS Air locally. NodeShift Cloud makes setup and GPU-accelerated deployment effortless, so you can get lifelike voice AI running in minutes.
🔗 Read here: https://nodeshift.cloud/blog/how-to-install-and-run-neutts-air-locally-super-realistic-on-device-voice-ai-with-instant-voice-cloning?utm_source=telegram&utm_medium=social&utm_campaign=neutts_air_launch
NodeShift Cloud
How to Install and Run NeuTTS Air Locally: Super-Realistic On-Device Voice AI with Instant Voice Cloning
Voice AI has finally broken free from heavy hardware. NeuTTS Air is redefining text-to-speech as we know it, delivering studio-grade realism, instant voice cloning, and real-time generation directly on your device. Built by Neuphonic, the pioneers of fast…
❤3
Meta AI just launched the Code World Model (CWM)!
The Code World Model (CWM) is a 32B parameter dense autoregressive LLM developed by the Meta FAIR CodeGen Team. Unlike traditional code models, CWM was mid-trained on Python execution traces, memory trajectories, and containerized agentic interactions—making it uniquely suited for reasoning about how code affects computational environments.
What’s special about CWM?
✅ Mid-trained on real execution traces & agentic environments
✅ Post-trained with multi-task RL for verifiable coding, math, and multi-turn software engineering
✅ Research-only (non-commercial) release under FAIR license
✅ Strong benchmark performance on Math-500, AIME, and SWE-bench
We just dropped a full step-by-step guide on:
🔹 Requesting gated access
🔹 Running on a NodeShift GPU VM
🔹 Serving with vLLM
🔹 Streamlit UI for interaction
Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-facebook-cwm-locally
❤1🔥1
Still relying on slow, API-based TTS that sounds robotic, can't run on small devices, and breaks the bank?
Meet KaniTTS, the latest high-speed, high-fidelity voice AI that runs entirely on your device with only modest GPU acceleration.
What makes it special:
- Powered by a 370M LLM + Neural Audio Codec for ultra-natural, real-time speech
- ~1 sec latency for 15 seconds of audio, perfect for chatbots, assistants & accessibility tools
- Multilingual: English, German, Chinese, Korean, Arabic & Spanish
- Runs locally with just 2-4GB GPU memory, no APIs, no data leaks, no lag
With NodeShift Cloud, setup is effortless: GPU-optimized, ready to run, and privacy-first.
Get studio-quality speech generation right on your own hardware in minutes.
🔗 Read the full guide: https://nodeshift.cloud/blog/how-to-install-and-run-kanitts-locally-real-time-on-device-voice-generation?utm_source=telegram&utm_medium=social&utm_campaign=kanitts_launch
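To put the "~1 sec latency for 15 seconds of audio" figure in context, here is a small sketch that converts it into a real-time factor (RTF), a common way to compare TTS speed. The function name is ours, and the inputs are just the rounded numbers quoted above, not new measurements.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of audio produced (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Rounded figures quoted in the post: about 1 s to generate 15 s of audio.
synthesis_seconds = 1.0
audio_seconds = 15.0

rtf = real_time_factor(synthesis_seconds, audio_seconds)
speedup = audio_seconds / synthesis_seconds

print(f"RTF = {rtf:.3f}, about {speedup:.0f}x faster than real time")
# prints: RTF = 0.067, about 15x faster than real time
```

An RTF below 1.0 means speech is generated faster than it plays back, which is what makes streaming replies in a chatbot feel responsive.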
NodeShift Cloud
How to Install and Run KaniTTS Locally: Real-Time On-Device Voice Generation
KaniTTS is a text-to-speech model, a high-speed, high-fidelity speech generation system built for real-time conversational AI. Designed with a two-stage architecture that fuses a 370M parameter language model and an ultra-efficient neural audio codec, KaniTTS…
❤2
The ModernVBERT team has just released a compact 250M-parameter vision-language model that matches the performance of models up to 10x larger, punching well above its weight!
With state-of-the-art multimodal reasoning, advanced document retrieval capabilities, and seamless image + text understanding, ModernVBERT is your go-to model for next-level AI & RAG workflows.
We've published a step-by-step guide to install and run ModernVBERT locally: fast, clean, and ready for experimentation.
- Unlock multimodal intelligence
- Advanced visual document comprehension
- Optimized for lightning-fast local inference with NodeShift Cloud
🔗 Read the full guide here: https://nodeshift.cloud/blog/how-to-install-modernvbert-compact-vlm-for-document-retrieval-in-rag-applications?utm_source=telegram&utm_medium=social&utm_campaign=modernvbert_launch
NodeShift Cloud
How to Install ModernVBERT: Compact VLM for Document Retrieval in RAG Applications
The world of vision-language models is evolving fast, and ModernVBERT is proof that efficiency no longer means compromise. Developed as part of the ModernVBERT suite, this compact 250M-parameter model packs the intelligence and alignment capabilities of models…
❤2🔥1
ServiceNow just released: Apriel-1.5-15B-Thinker
Apriel-1.5-15B-Thinker is an open-weights, multimodal reasoning model (image-text-to-text) focused on strong mid-training/continual pre-training plus high-quality text SFT—no RL required. It’s compact (15B) yet competitive with much larger models and designed to run on a single GPU.
We just published a step-by-step guide to install and run Apriel locally, plus a simple Streamlit UI so you can chat with the model and ask questions about images.
What the guide covers:
✅ Picking a GPU + VRAM sizing tips
✅ CUDA/PyTorch install (cu121) & env setup (Py 3.11)
✅ One-file for text + vision with the correct dtype cast (BF16/FP16)
✅ Optional Streamlit app (text & image tabs, sliders for temp/tokens)
✅ Tuning for speed/VRAM (token limits, fp16, 8-bit options)
Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-servicenow-apriel-1-5-15b-thinker-locally
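As a rough companion to the VRAM sizing tips, here is a back-of-the-envelope sketch of how much memory the weights alone need at different precisions. It assumes a nominal 15B parameters (taken from the model name) and counts weights only; activations, the KV cache, the vision encoder's working memory, and framework overhead come on top, so treat these numbers as a lower bound rather than the guide's figures.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "int8": 1.0}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: a nominal 15 billion parameters, as the model name suggests.
NUM_PARAMS = 15e9

estimates = {dtype: weight_memory_gb(NUM_PARAMS, dtype) for dtype in BYTES_PER_PARAM}
for dtype, gb in estimates.items():
    print(f"{dtype}: ~{gb:.0f} GB for weights alone")
```

By this estimate, BF16 or FP16 weights need roughly 30 GB and 8-bit weights roughly 15 GB, which is why the precision choice largely decides which single GPU is practical.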
🔥2❤1
Microsoft just released UserLM-8B — a unique open-weights model that flips the script: instead of acting like an assistant, it simulates the user in a conversation.
What is UserLM-8B?
A fine-tuned Llama-3.1-8B model trained on WildChat-1M to generate realistic user turns—from the first query to multi-turn follow-ups—and it can even gracefully wrap up with a special <|endconversation|> token.
Why it’s special
✅ Purpose-built for assistant evaluation & robustness testing
✅ Great for synthetic dialogue data generation
✅ More natural, diverse “user” behavior vs. prompting an assistant model to pretend
We just published a new guide: “How to Install & Run Microsoft UserLM-8B Locally”.
What’s inside:
✅ GPU sizing + a practical VRAM table
✅ Full setup on a GPU VM (NodeShift example)
✅ Ready-to-run scripts
✅ Guardrails for realistic simulations (stop tokens, end-of-conversation handling)
✅ Tips to plug UserLM into your own assistant for end-to-end testing
Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-microsoft-userlm-8b-locally
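To illustrate the end-of-conversation handling mentioned above, here is a minimal sketch that checks a generated user turn for the `<|endconversation|>` token and strips it from the visible text. It assumes you decode the output with special tokens kept in the string; the helper name and the sample turns are hypothetical, and no model is loaded here.

```python
END_TOKEN = "<|endconversation|>"


def split_user_turn(generated: str) -> tuple[str, bool]:
    """Return (visible_text, conversation_ended) for one generated user turn."""
    head, sep, _ = generated.partition(END_TOKEN)
    # `sep` is empty when the token is absent, so bool(sep) flags the ending.
    return head.strip(), bool(sep)


# Made-up decoded outputs standing in for real generations.
turns = [
    "Can you also add unit tests?",
    "Thanks, that covers it. <|endconversation|>",
]

results = [split_user_turn(t) for t in turns]
for text, ended in results:
    print(f"user: {text!r} (ended={ended})")
```

In a simulation loop, you would stop asking the assistant for another reply as soon as `conversation_ended` is true, so the dialogue does not run past the simulated user's sign-off.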
NodeShift Cloud
How to Install & Run Microsoft UserLM-8B Locally?
UserLM-8b is Microsoft’s open-weight large language model uniquely designed to simulate the “user” role in conversations. Unlike most LLMs that play the assistant role, UserLM-8b was fine-tuned on the WildChat-1M dataset to generate realistic user utterances.…
🔥2
Wondering how to test whether your AI model is safe or behaves unpredictably under pressure?
Anthropic has just released an essential tool for AI safety: Petri (Parallel Exploration Tool for Risky Interactions). Petri is an open-source tool that automates AI behavior testing through multi-turn conversations, simulated environments, and detailed scoring across safety dimensions.
With Petri, you can:
- Run alignment tests on any model: deception, reward hacking, and more
- Automate hundreds of behavioral evaluations in minutes
- Get structured insights and transcripts for deeper analysis & benchmarking
And with NodeShift Cloud, you can install and run Petri locally, easily, securely, and with zero setup friction.
In our latest guide, we’ll cover:
🔹 How to install and set up Petri locally
🔹 How to set up a local model for auditing, with Ollama as the API
🔹 How to run your first automated safety audit
🔹 How to provide seed instructions and interpret transcripts
🔗 Read full guide here: https://nodeshift.cloud/blog/how-to-install-run-anthropics-petri-locally-the-easiest-way-to-audit-ai-models-for-safety?utm_source=telegram&utm_medium=social&utm_campaign=petri_ai_audit
NodeShift Cloud
How to Install & Run Anthropic’s Petri Locally: The Easiest Way to Audit AI Models for Safety
As frontier AI systems grow increasingly capable and autonomous, understanding how they behave under pressure, deception, or ethical ambiguity has become one of the most critical challenges in AI safety. Anthropic’s Petri – short for Parallel Exploration…
🔥2❤1
Qwen just dropped: Qwen3-VL-30B-A3B-Thinking
A powerhouse multimodal model built on a Mixture-of-Experts stack—designed for deep text + vision + video reasoning, long-context understanding (256K→1M), robust OCR (32 languages), GUI/tool use, and even converting diagrams/screens into working code.
We’ve published a fresh, hands-on guide:
“How to Install & Run Qwen3-VL-30B-A3B-Thinking Locally”, tuned for a GPU VM workflow (we used NodeShift, but it works anywhere).
What’s inside the guide
✅ Clean environment setup (CUDA-aligned PyTorch, optional FlashAttention-2)
✅ Image & video inference
✅ “Thinking” variant notes + practical VRAM plans (single-/multi-GPU)
✅ Troubleshooting (FA2 mismatches, SDPA fallback)
✅ Ready-to-copy commands & code blocks for Jupyter/terminal
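The FA2-to-SDPA fallback mentioned above can be sketched in a few lines. This is an illustration, not the guide's exact code: the helper name `pick_attn_implementation` is made up here, and the result is meant to be passed as the `attn_implementation` argument when loading a model with Transformers.

```python
import importlib


def pick_attn_implementation(prefer_flash=True):
    """Return "flash_attention_2" when flash_attn imports cleanly, else "sdpa"."""
    if not prefer_flash:
        return "sdpa"
    try:
        # A missing or mismatched FlashAttention build usually fails right here.
        importlib.import_module("flash_attn")
    except Exception:
        return "sdpa"
    return "flash_attention_2"


attn_impl = pick_attn_implementation()
print(attn_impl)
```

A clean import does not guarantee the kernels suit every GPU, so calling the helper with `prefer_flash=False` keeps SDPA available as a manual override.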
Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-qwen3-vl-30b-a3b-thinking-locally
NodeShift Cloud
How to Install & Run Qwen3-VL-30B-A3B-Thinking Locally?
Qwen3-VL-30B-A3B-Thinking is one of the most advanced multimodal reasoning models in the Qwen3 series, designed to seamlessly fuse text, vision, and video understanding with large-scale reasoning. Built on a Mixture-of-Experts (MoE) architecture with 30B…
🔥2❤1
AI21 Labs just launched Jamba Reasoning 3B — a compact, hybrid Transformer–Mamba model built for serious reasoning on modest hardware.
Why it’s special
✅ ~3B params (26 Mamba layers + 2 attention layers) → fast, memory-light, edge-friendly
✅ 256K context without the usual KV-cache blow-up
✅ Strong benchmarks: IFBench 52.0, Humanity’s Last Exam 6.0, MMLU-Pro 61.0
✅ On-device speed that holds up as context grows (≈43–44 tok/s at 16–32K)
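Why does the hybrid layout keep long context cheap? Only attention layers keep a key/value cache that grows with sequence length, while Mamba layers carry a fixed-size state. A rough back-of-the-envelope sketch follows. The head count and head size are illustrative assumptions, not the model's published configuration, and the 28-layer comparison is a hypothetical all-attention model of the same depth.

```python
def kv_cache_bytes(attn_layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Key/value cache size for one sequence.

    The leading factor 2 covers keys and values; bytes_per_value=2
    assumes fp16/bf16 storage.
    """
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative shapes only (assumptions, not taken from the model card).
KV_HEADS = 8
HEAD_DIM = 128
CONTEXT = 256 * 1024  # 256K tokens

hybrid = kv_cache_bytes(attn_layers=2, kv_heads=KV_HEADS, head_dim=HEAD_DIM, seq_len=CONTEXT)
dense = kv_cache_bytes(attn_layers=28, kv_heads=KV_HEADS, head_dim=HEAD_DIM, seq_len=CONTEXT)

GIB = 1024 ** 3
print(f"2 attention layers:  {hybrid / GIB:.1f} GiB")   # 2.0 GiB
print(f"28 attention layers: {dense / GIB:.1f} GiB")    # 28.0 GiB
print(f"ratio: {dense / hybrid:.0f}x")                  # 14x
```

Whatever the real head shapes are, the ratio depends only on the number of attention layers, so it stays at 14x under these assumptions.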
We just published a new step-by-step guide:
“How to Install & Run AI21-Jamba-Reasoning-3B Locally (GPU VM)”
What’s inside
✅ Pick the right GPU & VRAM (rule-of-thumb table)
✅ Clean setup on a CUDA 12.1.1 image (Python 3.11, Torch cu121)
✅ vLLM serving (OpenAI-compatible) with the right flags for Mamba SSM
✅ Transformers alternative path + FlashAttention 2 tips
✅ A one-file Streamlit UI to chat with the model on your own server
Check out the full guide here: https://nodeshift.cloud/blog/how-to-install-run-ai21-jamba-reasoning-3b-locally
🔥2❤1
VNGRS Releases Kumru-2B — A Turkish-Native Lightweight Language Model
VNGRS has officially released Kumru-2B, a compact yet powerful Turkish-native LLM built entirely from scratch. Trained on ~500 GB of curated text (≈300B tokens) and fine-tuned on over 1M supervised examples, Kumru-2B is designed specifically for the Turkish language — featuring a modern 50K-token Turkish-optimized tokenizer, 8K context window, and native support for math and code.
Why Kumru-2B is Special
✅ Built from scratch for Turkish — not a multilingual adaptation.
✅ Efficient tokenizer: uses ~40% fewer tokens than multilingual models like GPT-4o or Gemma.
✅ Punches above its weight — outperforms much larger models like Llama-3.3-70B and Qwen2-72B on Turkish-centric tasks.
✅ Runs smoothly on local or cloud GPUs, making it ideal for research, startups, and developers.
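To put the tokenizer claim in numbers, here is a small arithmetic sketch. It takes the "~40% fewer tokens" figure from the post at face value and assumes "8K" means 8,192 tokens; the function names are made up for illustration.

```python
CONTEXT_TOKENS = 8192   # assumption: "8K context window" means 8,192 tokens
TOKEN_SAVINGS = 0.40    # "~40% fewer tokens", as quoted in the post


def tokens_needed(baseline_tokens, savings=TOKEN_SAVINGS):
    """Tokens the more efficient tokenizer needs for the same text."""
    return round(baseline_tokens * (1 - savings))


def baseline_equivalent(context_tokens=CONTEXT_TOKENS, savings=TOKEN_SAVINGS):
    """How many baseline-tokenizer tokens' worth of text fits in the window."""
    return round(context_tokens / (1 - savings))


print(tokens_needed(1000))      # 600
print(baseline_equivalent())    # 13653
```

In other words, under these assumptions the 8K window holds about as much Turkish text as a ~13.6K window would with a typical multilingual tokenizer.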
In our latest blog, we walk you through everything you need to:
✅ Deploy a GPU-powered VM on NodeShift Cloud
✅ Install Python 3.11 + CUDA 12.1.1 environment
✅ Run the model with a simple Python script
✅ Launch an interactive Streamlit WebUI to chat with Kumru-2B directly in your browser
Whether you’re building NLP tools, studying Turkish linguistics, or experimenting with LLMs, this guide helps you get started in minutes.
Read the full tutorial here: https://nodeshift.cloud/blog/how-to-install-run-vngrs-ai-kumru-2b-locally
NodeShift Cloud
How to Install & Run Vngrs-AI Kumru-2B Locally?
Kumru-2B is VNGRS’s lightweight, Turkish-native LLM trained from scratch. It’s pre-trained on ~500 GB of cleaned, deduplicated text (~300B tokens) and SFT’d on ~1M examples. Kumru uses a modern Turkish-optimized tokenizer (≈50,176 vocab) and ships with a…
🔥2❤1
OCR needs have evolved beyond just extracting text: enterprises now need OCR that understands documents and turns them into structured, AI-ready markdown.
That’s why Nanonets-OCR2 by Nanonets is a game-changer for anyone working with scanned docs, academic papers, business reports, invoices, or forms.
What can it do?
✅ Converts mathematical equations to LaTeX
✅ Describes images using structured <img> tags
✅ Detects signatures & watermarks
✅ Handles checkboxes, radio buttons, and complex tables
✅ Extracts flowcharts & org charts as Mermaid code
✅ Supports handwritten documents and multiple languages
✅ Provides Visual Question Answering (VQA) directly from the document
We’ve just published a complete guide to installing and running Nanonets-OCR2 locally or in a GPU-accelerated environment with NodeShift Cloud, so you can start automating document workflows with full control and scalability.
🔗 Read the guide here: https://nodeshift.cloud/blog/convert-documents-to-structured-markdown-html-with-nanonets-ocr2?utm_source=telegram&utm_medium=social&utm_campaign=nanonets_ocr2_guide
NodeShift Cloud
Convert Documents to Structured Markdown & HTML with Nanonets-OCR2
Optical Character Recognition (OCR) has evolved far beyond simple text extraction, and Nanonets-OCR2 is the next-generation proof of that transformation. This state-of-the-art image-to-markdown OCR model doesn’t just pull text from images or PDFs, it converts…
❤2
The wait is over: you can now run Korea’s first fully open-source 10B-parameter AI model right on your machine!
Meet KORMo-10B-sft, a 10B-parameter bilingual Korean-English LLM built entirely from scratch and released 100% open-source - weights, code, and even training data.
Developed by KAIST's MLP Lab, KORMo sets a new benchmark for transparency, reproducibility, and real-world performance, bridging the gap between open research and applied AI, especially in non-English domains.
In our latest article, we break down how to install and run KORMo-10B-sft locally, explore its most powerful features, and show how NodeShift Cloud makes deploying massive open models effortless, from Colab to production GPUs.
🔗 Read here: https://nodeshift.cloud/blog/how-to-install-and-run-kormo-the-first-fully-open-source-korean-english-llm?utm_source=telegram&utm_medium=social&utm_campaign=kormo_10b_launch
NodeShift Cloud
How to Install and Run KORMo: The first fully Open Source Korean-English LLM
The era of open, large-scale bilingual language models has arrived, and KORMo-10B-sft stands at the forefront of that revolution. Developed by KAIST’s MLP Lab, this 10.8-billion-parameter fully open-source model represents a milestone for the Korean AI ecosystem…
❤2
Liquid AI just dropped something special — the LFM2-8B-A1B model is here!
This new on-device-friendly Mixture-of-Experts (MoE) model packs 8.3B total parameters (only 1.5B active!) and blends 18 convolutional LIV layers + 6 GQA attention layers for hybrid speed and quality. It supports 32K context length, runs smoothly even on modest GPUs, and rivals 3–4B dense models in quality while activating only 1.5B parameters per token, which makes it a good fit for agentic tasks, RAG, data extraction, and multi-turn reasoning.
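What "8.3B total, 1.5B active" means for hardware: all experts have to sit in memory, so the weight footprint follows the total count, while per-token compute follows the active count. A rough sketch of that arithmetic is below. It covers weights only (no activations, KV cache, or framework overhead), and the bytes-per-parameter table is a general rule of thumb rather than anything measured on this model.

```python
TOTAL_PARAMS = 8.3e9    # from the post
ACTIVE_PARAMS = 1.5e9   # from the post

# Rule-of-thumb storage cost per parameter (assumption, weights only).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params, dtype):
    """Approximate weight size in GB (10^9 bytes) for a given precision."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_gb(TOTAL_PARAMS, dtype):.1f} GB of weights")
print(f"active per token: {active_fraction:.0%} of parameters")
```

So the model needs the memory of an 8B-class model but does roughly the per-token work of a 1.5B one, which is where the speed on modest GPUs comes from.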
We’ve just published a step-by-step installation and setup guide for LFM2-8B-A1B, where we walk through everything — from spinning up a GPU VM on NodeShift Cloud to running the model locally using Transformers.
Here’s what we covered in the guide:
✅ Model benchmarks, specs, and comparison tables
✅ Full environment setup (CUDA, Python, PyTorch)
✅ Hugging Face authentication and correct Transformers commit
✅ Script to run the model locally
✅ GPU configuration cheatsheet for every use case
Check out the complete guide here: https://nodeshift.cloud/blog/how-to-install-run-lfm2-8b-a1b-locally
NodeShift Cloud
How to Install & Run LFM2-8B-A1B Locally?
LFM2-8B-A1B is Liquid AI’s on-device-friendly MoE: 8.3B total / 1.5B active params with a hybrid conv-attention stack (18 LIV conv + 6 GQA). It uses a ChatML-style template, supports 32K context, and is tuned for agentic tasks, data extraction, RAG, and multi…
❤1🔥1