NodeShift Announcements Official
22.8K subscribers
45 photos
7 videos
378 links
Decentralized, no-code AI cloud platform that enables one-click deployment of AI agents and LLMs
Download Telegram
Meet Dia-1.6B-0626 — A Voice That Performs, Not Just Reads

Crafted by the small but brilliant team at Nari Labs, Dia-1.6B-0626 is a fully open dialogue-to-speech model that turns your scripts into vivid, expressive conversations — complete with speaker cues, emotions, and even gestures like (laughs), (sighs), or (coughs).

No locked APIs. No voice clones behind a paywall. Just pure open infrastructure that sounds as real as it feels.

And today, we’ve just dropped a complete step-by-step guide to run Dia-1.6B-0626 on a GPU-powered NodeShift Virtual Machine, fully optimized for CUDA 12.1!
Full setup on NodeShift
PyTorch + CUDA 12.1 compatibility
Gradio Web UI access
Voice cloning & dialogue generation
GPU configuration recommendations

If you’ve ever wanted to build an open, expressive voice experience — this is your launchpad.

Blog Link: https://nodeshift.com/blog/how-to-install-nari-dia-1-6b-0626-locally
🔥2
SmolLM3 from Hugging Face is here and, honestly, you might blown away by how much this little 3B model can do!

It handles long context (up to 128k tokens!), understands code, math, and reasoning tasks really well, and even supports 6 languages out of the box.
And the best part? It’s completely open - training data, weights, everything.

We just published a quick guide on how to get it up & running locally or on NodeShift GPUs.
If you're curious about small but capable LLMs, this one’s worth checking out.
🔗 Read here: https://nodeshift.com/blog/how-to-install-smollm3-by-hugging-face?utm_source=telegram&utm_medium=social&utm_campaign=smollm3_launch
🔥1
Mistral AI has just released Devstral Small & Medium 2507, pushing the boundaries of agentic coding capabilities!

Devstral Small 2507 already wowed us with its SWE-Bench Verified score of 53.6%, setting a new bar for open-source coding assistants.

What does this mean?
👉 Smarter code exploration
👉 Better multi-file editing
👉 Stronger software engineering agents
👉 Lightning-fast workflows with up to 128k context window

We’ve just published a full step-by-step guide on how to install, run, and deploy Devstral Small locally (including with NodeShift GPUs). It’s packed with all the commands, tips, and setup details you need to get started — whether you want a lightweight coding buddy or a powerhouse agent that transforms your engineering process.

Check out the guide here → https://nodeshift.com/blog/how-to-install-devstral-small-1-1-locally
🔥21
Want to use powerful models like ChatGPT, DeepSeek, Mistral, or Llama in your company? But worried about data privacy? You're not alone.

In sectors like finance, healthcare, defense, deep IP, and other critical industries - AI adoption stalls at one critical roadblock: "Won't our data get exposed or misused?"
But what if you could get the best of both worlds:
- Power of Gen AI
- along with zero data exposure in your own private AI environment completely isolated from outside world

In our latest article, we show you exactly how to use Generative AI without worrying about your data!
You’ll learn:
- Why most enterprises still hesitate to deploy AI
- How NodeShift flips the model: AI comes to your data, not the other way around
- What private, sovereign AI really looks like in action

🔗 Read the full article here: https://nodeshift.com/blog/how-to-use-generative-ai-without-worrying-about-your-data?utm_source=telegram&utm_medium=social&utm_campaign=nodeshift_sovereign_ai
🔥1
Mistral AI just dropped two crazy impressive audio models — Voxtral Mini (3B) and Voxtral Small (24B) — and we’re beyond excited to share that we’ve published a complete, step-by-step guide covering everything you need to know to get them running!

Transcription, translation, Q&A, summaries
Multi-audio + text inputs
Function calling from voice (!!)
Works across English, Spanish, French, Hindi, German & more
Fully ready for integration with tools like Gradio + Python scripts

In this guide, we walk you through:
👉 How to install both models
👉 How to deploy them on cloud GPU VMs (we used NodeShift)
👉 How to test them locally + in production
👉 How to measure real-world speed & performance
👉 How to build interactive web apps on top

The best part? You don’t need a massive GPU cluster to get started — Voxtral Mini runs beautifully even on a single high-memory GPU (like A100 40GB or RTX A6000). But if you’re ready to flex, Voxtral Small is here for those next-level workloads.

We’re seriously hyped about what this unlocks for developers, startups, researchers, and product teams. This isn’t just speech-to-text. This is speech-to-understanding-to-action.

Check out the full guide here: https://nodeshift.com/blog/how-to-install-mistral-voxtral-locally
2🔥1
Shadow AI is silently compromising your enterprise security behind your back!

Your teams are already using tools like ChatGPT, DeepSeek, Claude, and Gemini to write policies, summarize documents, or answer internal queries, which therefore, may involve feeding private documents or client details to these third-party AI models.

They’re not trying to break the rules, they just want to move faster.
But without a secure, internal AI solution, they’re unknowingly feeding sensitive data into public systems - HR records, financial info, source code, even confidential strategies.

This isn’t a future risk. It’s happening now.
And it’s happening inside your network. And this is what is called "Shadow AI".
In our latest article, we'll talk about:
- What Shadow AI is, in detail and why it’s dangerous
- Why traditional IT policies can’t stop it
- How Sovereign AI platforms like NodeShift give you full control, security, and productivity - without banning AI in your company

🔗Read here: https://nodeshift.com/blog/the-silent-risk-inside-your-enterprise-security-why-cisos-must-replace-shadow-ai-with-sovereign-ai?utm_source=telegram&utm_medium=social&utm_campaign=shadow_ai_blog
🔥2
LiquidAI LFM2-1.2B is a cutting-edge hybrid model designed by Liquid AI, built specifically for edge AI and on-device deployment. With ~1.2 billion parameters, it delivers outstanding speed, memory efficiency, and multilingual capabilities, making it ideal for tasks like agentic workflows, data extraction, RAG, creative writing, and multi-turn conversations — all while running smoothly even on limited hardware.

We successfully ran both versions of the LFM2-1.2B model:
The GGUF quantized version on Oobabooga Text Generation WebUI, providing an easy and interactive web interface.
The Transformers version on a Jupyter Notebook inside a CUDA-enabled virtual machine, powered by NodeShift Cloud, allowing full control through Python and code experimentation.

Setup Highlights:
Deployed on a GPU-powered virtual machine (RTX A6000, CUDA 12.1.1)
Installed required dependencies and libraries
Ran structured reasoning prompts and creative tasks
Achieved smooth performance across both web-based and code-based environments

We just published a full step-by-step guide if you want to set it up yourself — check it out here: https://nodeshift.com/blog/how-to-install-liquidai-lfm2-1-2b-locally
🔥1
Ever wondered you could combine the power of speech recognition and large language model in one model? right on your own machine?
Well, NVIDIA's latest Canary-Qwen-2.5B model has made this a reality.

With 2.5B parameters, it transcribes English with near state-of-the-art accuracy - punctuation, capitalization, fast decoding (418 RTFx), and then goes a step further to also summarize, answer questions, or refine transcripts with full LLM-level understanding, thanks to its two-models-in-one nature.

We wrote a quick hands-on guide for anyone curious to try this out, especially if you're building tools around audio, transcription, or voice+text interfaces.
🔗 Read it here: https://nodeshift.com/blog/combine-the-power-of-asr-llm-with-nvidias-canary-qwen-2-5b?utm_source=telegram&utm_medium=social&utm_campaign=canary_qwen_launch
🔥2
In the world of search engines and information retrieval, precision matters — and that’s exactly where ZeroEntropy (YC W25) new release, Zerank-1-Small, makes its mark.

Zerank-1-Small is a compact, 1.7B parameter reranker model, designed to boost the accuracy of search results across domains like finance, legal, STEM, code, and medical. Despite being over 2x smaller than its flagship sibling, Zerank-1, it consistently outperforms many closed-source rerankers and delivers massive improvements over traditional vector search methods.

In our latest technical guide, we walk you through step by step how we installed, configured, and ran Zerank-1-Small on a GPU virtual machine — using NodeShift Cloud.

Here’s a sneak peek of what we covered (and tested hands-on):
Simple script testing (run_zerank) → direct model inference on query-document pairs
Interactive CLI tool (cli_rerank) → type queries live in terminal, explore relevance scores
Batch reranking from CSV (batch_rerank) → process large sets of pairs, output results to CSV
Gradio web UI (gradio_rerank) → browser-based, no-code interface to test model live
FastAPI REST API (fastapi_rerank) → turn the model into a scalable, programmatic service

We didn’t just spin up the model — we built a complete, flexible stack for developers, researchers, and even non-technical users to interact with Zerank-1-Small however they need.

Check out the full guide here: https://nodeshift.com/blog/how-to-install-run-zeroentropy-zerank-1-small-locally

If you’re working in retrieval systems, search, or ranking tasks — or if you just love exploring the cutting edge of open-source models — this one’s for you.
🔥3
The Future of Clinical NLP Just Got More Powerful.
Microsoft's MediPhi-Instruct is not just another language model, it's a modular, clinically aligned AI built for real-world medical use cases.

MediPhi is built with the power of 7 expert models, fused using advanced techniques like SLERP and BreadCrumbs, and is designed to run efficiently even in low-resource settings, without sacrificing accuracy.

If you're working with Medical data, parsing medical guidelines, or building intelligent clinical assistants, MediPhi-Instruct is a 3.8B parameter model that performs way above its weight.
In our latest guide, we'll walk you through how to get it up and running in minutes locally or in GPU environments with NodeShift.
🔗 Read here: https://nodeshift.com/blog/transform-clinical-research-with-microsofts-mediphi-instruct?utm_source=telegram&utm_medium=social&utm_campaign=mediphi_launch
Qwen just dropped a beast — and it overperforms nearly every other model out there!

Their latest release, Qwen3-235B-A22B-Instruct-2507, is a mixture-of-experts language model that blends raw power with incredible instruction-following abilities.

With 256K token context, top-tier reasoning, multi-language skills, and standout performance on benchmarks like GPQA, ARC-AGI, and ZebraLogic — this model means business.

And we’ve just published a complete step-by-step guide to install and run it on a GPU VM!

Whether you're a researcher, developer, or someone just exploring what these massive models can really do — our guide walks you through everything from GPU setup to generating your first output.

We cover:
- Full Python + CUDA environment setup
- Multi-GPU VM configuration
- Optimized transformers installation
- A tested Python script with real outputs
- Tips for running MoE models smoothly without crashing your system

Dive in here: https://nodeshift.com/blog/how-to-install-run-qwen3-235b-a22b-instruct-2507-locally
🔥31
Ever heard a model that can speak in your cloned voice, narrate like a human, and translate the spoken words, all without a single fine-tuning step?
Meet Higgs Audio v2, an open-source audio foundation model currently trending on Hugging Face, and is trained on over 10M hours of data, packed with crazy capabilities like:
- Zero-shot emotional TTS
- Deep language + acoustic understanding
- Natural multi-speaker dialogues

We just published a hands-on guide to help you install it locally in minutes.
If you’re building with voice, this one’s worth your time.
🔗 Read here: Ever heard a model that can speak in your cloned voice, narrate like a human, and translate the spoken words, all without a single fine-tuning step?
Meet Higgs Audio v2, an open-source audio foundation model currently trending on Hugging Face, and is trained on over 10M hours of data, packed with crazy capabilities like:
- Zero-shot emotional TTS
- Deep language + acoustic understanding
- Natural multi-speaker dialogues

We just published a hands-on guide to help you install it locally in minutes.
If you’re building with voice, this one’s worth your time.
🔗 Read here:Ever heard a model that can speak in your cloned voice, narrate like a human, and translate the spoken words, all without a single fine-tuning step?
Meet Higgs Audio v2, an open-source audio foundation model currently trending on Hugging Face, and is trained on over 10M hours of data, packed with crazy capabilities like:
- Zero-shot emotional TTS
- Deep language + acoustic understanding
- Natural multi-speaker dialogues

We just published a hands-on guide to help you install it locally in minutes.
If you’re building with voice, this one’s worth your time.
🔗 Read here: https://nodeshift.com/blog/how-to-install-higgs-audio-v2-locally?utm_source=telegram&utm_medium=social&utm_campaign=higgs_audio_v2_launch
🔥2🥰1
The all-new Qwen3-Coder-480B-A35B-Instruct is here—a true powerhouse model designed for deep reasoning, agentic coding workflows, and massive long-context support (up to 256K tokens natively and 1 million with Yarn!). Whether you’re dealing with huge codebases, automating complex workflows, or pushing the limits of multilingual programming, Qwen3-Coder is built to deliver speed, precision, and seamless tool integration.

But that’s not all —
Meet the Qwen Code CLI: an AI-powered command-line workflow tool adapted from Gemini CLI, now fully optimized for Qwen3-Coder models. With enhanced parsing, robust code understanding, and the ability to automate coding tasks right from your terminal, Qwen Code CLI is perfect for both everyday scripting and pro-level workflow automation.

We’ve just published a complete, step-by-step guide that walks you through deploying Qwen3-Coder on GPU VMs and setting up Qwen Code CLI so you can harness the full power of both.

Read the full guide here: https://nodeshift.cloud/blog/how-to-install-run-qwen3-coder-480b-a35b-instruct-locally
2🔥1
Qwen is continuously launching their powerful models one after another. Meet the latest one, Qwen3-235B-A22B-Thinking-2507, the open-source reasoning beast with 235B parameters and 256K context length.
It crushes benchmarks in math, science, logic, and coding - rivaling proprietary giants like Claude and GPT.

And guess what? You can run it locally or in GPU accelerated environments.
We show you exactly how to install this model with NodeShift.
🔗Read here: https://nodeshift.cloud/blog/how-to-install-run-qwen-thinking?utm_source=telegram&utm_medium=social&utm_campaign=qwen-thinking-install-guide
3
HRM is a cutting-edge approach to tackling complex reasoning tasks in AI. With its innovative design that combines abstract planning and rapid, detailed computations, HRM is proving to be a game-changer. It excels in solving intricate puzzles like Sudoku and pathfinding, even outperforming larger models in key benchmarks such as the Abstraction and Reasoning Corpus (ARC).

We've just published a comprehensive step-by-step guide on running the Hierarchical Reasoning Model (HRM) locally!

This guide will walk you through:
🔹 Setting up the perfect GPU-powered environment
🔹 Installing and configuring Python, PyTorch, and FlashAttention
🔹 Running and evaluating your first HRM model on a real-world dataset
🔹 Tips and tricks to optimize your experiments

Whether you're a researcher or developer looking to dive deep into AI's reasoning capabilities, this guide is for you.

Get the full step-by-step instructions here: https://nodeshift.cloud/blog/how-to-install-run-hierarchical-reasoning-model-locally
🔥21
Sales teams are losing over 1,000 hours every year, and not on selling.
60%+ of a representative's time is spent on repetitive admin work:
- Outreach emails
- Proposal creation
- RFP responses
- CRM updates
- Meeting summaries

NodeShift’s Sovereign AI, your private, on-prem AI copilot built for sales.
- Works like ChatGPT, fully inside your infrastructure
- Integrates with HubSpot, Apollo, Salesforce
- Automates proposals, follow-ups, onboarding & more
- Powered by open-source LLMs like Mistral, DeepSeek, LLaMA
If your representatives are busy documenting instead of closing, it’s time to rethink AI.

Read how teams are reclaiming 1,000+ hours annually:
🔗 https://nodeshift.cloud/blog/how-ai-is-saving-sales-teams-1000-hours-annually-securely-and-at-scale?utm_source=telegram&utm_medium=social&utm_campaign=sales_ai_article
1🔥1💩1
Tencent Releases HunyuanWorld 1.0: Next-Level 3D World Generation from Text & Images!

We have Just published: A complete, step-by-step guide to installing and running Tencent HunyuanWorld 1.0—your toolkit for creating fully immersive, explorable 3D worlds from a simple prompt or picture!

Tencent’s HunyuanWorld 1.0 is a breakthrough framework that transforms text or images into richly detailed, interactive 3D environments. Unlike older tools that trade off quality for speed or realism for flexibility, HunyuanWorld 1.0 uses panoramic proxies, semantic layers, and mesh-based reconstruction to make world-building faster, sharper, and more creative—right from your GPU VM or cloud!

What's Inside the Guide?
Model intro and performance benchmarks (spoiler: it’s state-of-the-art!)
Full cloud setup on NodeShift (H100/A100 GPU VMs, CUDA, SSH)
System requirements, best GPU configs, HuggingFace login, and more
End-to-end install steps (Real-ESRGAN, ZIM, Draco, MoGe, etc.)
Batch demo scripts for both text-to-world and image-to-world generation
A ready-to-use Gradio web UI—generate panoramas and worlds in your browser!
Tips for artists, developers, and anyone experimenting with next-gen 3D

Check out the guide here: https://nodeshift.cloud/blog/how-to-install-run-tencent-hunyuan3d-world-1-0-locally
1🔥1
Now you can host GPT‑4‑level capabilities right on your own machine with Qwen3's latest and more accessible 30B version.

In the past few days Qwen3 is launching huge models which are powerful but not everyone could have access to them because of the huge size.
But now with Qwen3‑30B‑A3B‑Instruct‑2507 release, you can access the same power in a relatively lightweight version, that offers:
- top-tier instruction following, logic, coding, multilingual reasoning, and
- native 256K-token context support. All of this with just 3.3 B active parameters.

In our latest guide, we walk you through installing this model locally or in GPU-accelerated environment with NodeShift.
🔗 Read the full guide here: https://nodeshift.cloud/blog/a-step-by-step-guide-to-install-qwen3-30b-locally?utm_source=telegram&utm_medium=social&utm_campaign=qwen3_30b_install
2👍1
Unsloth AI released Qwen3-Coder-Flash!

Qwen3-Coder-Flash is the newest, code-focused language model from Unsloth—built for developers and technical creators who want speed, power, and big-context coding. This model delivers everything from advanced code completions and automation to interactive tool use, all with lightning-fast performance and huge context windows.

We’ve just published a complete, hands-on guide for Qwen3-Coder-Flash!
Here’s what you’ll find inside:
Model benchmarks and GPU recommendations for every use-case (from 4090s to H100s)
Step-by-step setup: how to deploy the model on NodeShift’s GPU cloud (or any provider), with the right CUDA image, Python environment, and SSH access
Ollama installation & usage: full commands to run Qwen3-Coder-Flash locally or in the cloud, plus how to pick and launch your favorite GGUF quantization
Open-WebUI integration: chat with the model, generate creative outputs, and live-preview interactive code in your browser
Real project demos: prompts for things like Matrix Code Rain, AI-powered cityscapes, and more
Pro tips for smooth operation and experimenting with new ideas

Ready to build, automate, and experiment with one of the top open coding models?

Check out our full tutorial to get started with Qwen3-Coder-Flash now!

Link: https://nodeshift.cloud/blog/how-to-install-run-qwen3-coder-flash-locally
🔥2👍1
Wan AI releases Wan2.2-TI2V-5B, a next-generation open-source video generation model designed for high-definition, cinematic results. Leveraging advanced Mixture-of-Experts (MoE) architecture and large-scale data training, Wan2.2 can transform text or images into smooth, detailed 720P videos at 24 FPS—all on a single powerful GPU. Whether you’re an artist, researcher, or creator, Wan2.2 brings real creative control to AI-powered video, combining top-tier quality with practical efficiency.

We have just a complete step-by-step guide showing you exactly how to run Wan2.2-TI2V-5B locally using NodeShift GPU Virtual machines.

Here’s what you’ll find inside:
🔹 Recommended GPU configs for best performance
🔹 One-click VM and GPU setup
🔹 Model download & environment preparation
🔹 Fast text-to-video and image-to-video generation
🔹 Easy Gradio web UI for rapid experimentation
🔹 Pro tips for creators and researchers

If you want to be at the frontier of open-source cinematic AI, don’t miss this resource.

Read the full tutorial here: https://nodeshift.cloud/blog/how-to-install-and-run-wan2-2-ti2v-5b-locally
👍1🔥1
Zhipu AI has launched GLM-4.5 and GLM-4.5-Air—two powerhouse language models designed for the next generation of digital assistants, coding agents, and smart automation. These models aren’t just about massive scale (up to 355B parameters!)—they bring advanced reasoning, flexible “thinking” modes, and top-tier efficiency, making them ideal for both experimentation and real-world deployment.

We have just published a full, step-by-step guide on how to install, run, and interact with GLM-4.5 locally.

What’s inside this guide?
Full walkthrough—from cloud VM provisioning to launching your GLM-4.5 model in FP8
Choosing hardware, setting up Python/CUDA, and installing all dependencies (step by step)
Downloading and running the model server (SGLang)
Testing with cURL and automating prompts with Python
Tips on switching between “thinking” and “immediate response” modes
Example benchmarks and links for model downloads

Check out the full guide and start creating with GLM-4.5: https://nodeshift.cloud/blog/how-to-install-run-glm-4-5-locally
2🔥1