Sber has released two open-source MoE models: GigaChat-3.1 Ultra and Lightning
Both code and weights are available under the MIT license on HuggingFace.
Key details:
• Trained from scratch (not a finetune) on proprietary data and infrastructure
• Mixture-of-Experts (MoE) architecture
Models:
🧠 GigaChat-3.1 Ultra
• 702B MoE model for high-performance environments
• Outperforms DeepSeek-V3-0324 and Qwen3-235B on math and reasoning benchmarks
• Supports FP8 training and MTP
⚡️ GigaChat-3.1 Lightning
• 10B model (1.8B active parameters)
• Outperforms Qwen3-4B and Gemma-3-4B on Sber benchmarks
• Efficient local inference
• Up to 256k context
Engineering highlights:
• Custom metric to detect and reduce generation loops
• DPO training moved to native FP8
• Improvements in post-training pipeline
• Identified and fixed a critical issue affecting evaluation quality
Trained on 14 languages (optimized for English and Russian)
Use cases:
• chatbots
• AI assistants
• copilots
• internal ML systems
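The gap between Lightning's 10B total and 1.8B active parameters comes from MoE routing: each token is sent to only a few experts, so most expert weights are unused for any single token. A toy sketch of top-k routing in plain Python follows. The expert count, parameter sizes, and function names are hypothetical, since the post does not give GigaChat's actual expert configuration.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)          # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def active_fraction(shared_params, params_per_expert, num_experts, k):
    """Share of all parameters that a single token actually passes through."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return active / total

# Hypothetical numbers, chosen only for illustration.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(routes)                              # experts 1 and 3, weights sum to 1
print(active_fraction(1.0, 0.5, 16, 2))    # (1 + 2*0.5) / (1 + 16*0.5) = 2/9
print(1.8 / 10)                            # Lightning's stated ratio, about 18%
```

Per-token compute scales with the active parameters, while memory still has to hold all of them, which is why a 10B MoE can run at roughly the speed of a much smaller dense model.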
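The post does not say how the loop-detection metric is defined. As an illustration only, one simple proxy is the share of n-grams in a generation that repeat an earlier n-gram. The function name `repeated_ngram_ratio` and the choice of `n=3` are assumptions, not Sber's method.

```python
from collections import Counter

def repeated_ngram_ratio(tokens, n=3):
    """Fraction of n-grams that repeat an earlier n-gram (0.0 means no repetition)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)

normal = "the model answers the question and then stops".split()
looping = "i am sorry i am sorry i am sorry i am sorry".split()
print(repeated_ngram_ratio(normal))    # 0.0
print(repeated_ngram_ratio(looping))   # 0.7
```

A score near 0 suggests varied output and a score near 1 suggests the model is stuck in a cycle. A threshold on such a score could flag looping samples during evaluation.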
Sber provides a solid open foundation for developers to build production-ready AI systems with lower infrastructure costs.
10 Books to Understand How Large Language Models Function (2026)
1. Deep Learning
https://deeplearningbook.org
The definitive reference for neural networks, covering backpropagation, architectures, and foundational concepts.
2. Artificial Intelligence: A Modern Approach
https://aima.cs.berkeley.edu
A fundamental perspective on artificial intelligence as a comprehensive system.
3. Speech and Language Processing
https://web.stanford.edu/~jurafsky/slp3/
An in-depth examination of natural language processing, transformers, and linguistics.
4. Machine Learning: A Probabilistic Perspective
https://probml.github.io/pml-book/
An exploration of probabilities, statistics, and the theoretical foundations of machine learning.
5. Understanding Deep Learning
https://udlbook.github.io/udlbook/
A contemporary explanation of deep learning principles with strong intuitive insights.
6. Designing Machine Learning Systems
https://oreilly.com/library/view/designing-machine-learning/9781098107956/
Strategies for deploying models into production environments.
7. Generative Deep Learning
https://github.com/3p5ilon/ML-books/blob/main/generative-deep-learning-teaching-machines-to-paint-write-compose-and-play.pdf
Practical applications of generative models and transformer architectures.
8. Natural Language Processing with Transformers
https://dokumen.pub/natural-language-processing-with-transformers-revised-edition-1098136799-9781098136796-9781098103248.html
Methodologies for constructing natural language processing systems based on transformers.
9. Machine Learning Engineering
https://mlebook.com
Principles of machine learning engineering and operational deployment.
10. The Hundred-Page Machine Learning Book
https://themlbook.com
A highly concentrated foundational overview without extraneous detail.