Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
The System Design Primer. Learn how to design large-scale systems.

Learning how to design scalable systems will help you become a better engineer.

System design is a broad topic. A vast number of resources on system design principles are scattered across the web.

This repo is an organized collection of resources to help you learn how to build systems at scale.

Link: https://github.com/donnemartin/system-design-primer#the-system-design-primer

Navigational hashtags: #armknowledgesharing #armrepo
General hashtags: #systemdesign #softwareengineering #softwaredevelopment #engineer #learning #design #help

@data_science_weekly
CS 329S: Machine Learning Systems Design

This course aims to provide an iterative framework for developing real-world machine learning systems that are deployable, reliable, and scalable.
It starts by considering all stakeholders of each machine learning project and their objectives. Different objectives require different design choices, and this course will discuss the tradeoffs of those choices.
Students will learn about data management, data engineering, feature engineering, approaches to model selection, training, scaling, how to continually monitor and deploy changes to ML systems, as well as the human side of ML projects such as team structure and business metrics.

Link: https://stanford-cs329s.github.io/index.html#overview

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #mlsystemdesign #systemdesign #machinelearningsystemdesign #machinelearning #algorithms #design #architecture #engineering #software

Machine Learning System Design by Valerii Babushkin and Arseny Kravchenko

Get the big picture and the important details with this end-to-end guide for designing highly effective, reliable machine learning systems.

In "Machine Learning System Design: With end-to-end examples" you will learn:
- The big picture of machine learning system design
- Analyzing a problem space to identify the optimal ML solution
- Acing ML system design interviews
- Selecting appropriate metrics and evaluation criteria
- Prioritizing tasks at different stages of ML system design
- Solving dataset-related problems through data gathering, error analysis, and feature engineering
- Recognizing common pitfalls in ML system development
- Designing ML systems to be lean, maintainable, and extensible over time

Link: https://www.manning.com/books/machine-learning-system-design

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #systemdesign #machinelearningsystemdesign

Machine Learning Engineering Online Book by Stas Bekman

An open collection of methodologies to help with successful training of large language models and multi-modal models.

This is technical material suitable for LLM/VLM training engineers and operators. That is, the content contains lots of scripts and copy-n-paste commands to enable you to quickly address your needs.

This repo is an ongoing brain dump of Stas's experiences training Large Language Models (LLMs) and VLMs; much of the know-how was acquired while training the open-source BLOOM-176B model in 2022 and the IDEFICS-80B multi-modal model in 2023. Currently, he is working on developing and training open-source Retrieval Augmented models at Contextual.AI.

Table of Contents
Part 1. Insights
- The AI Battlefield Engineering - What You Need To Know
Part 2. Key Hardware Components
- Accelerator - the workhorses of ML - GPUs, TPUs, IPUs, FPGAs, HPUs, QPUs, RDUs (WIP)
- Network - intra-node and inter-node connectivity, calculating bandwidth requirements
- IO - local and distributed disks and filesystems
- CPU - cpus, affinities (WIP)
- CPU Memory - how much CPU memory is enough - the shortest chapter ever.
Part 3. Performance
- Fault Tolerance
- Performance
- Multi-Node networking
- Model parallelism
Part 4. Operating
- SLURM
- Training hyper-parameters and model initializations
- Instabilities
Part 5. Development
- Debugging software and hardware failures
- And more debugging
- Reproducibility
- Tensor precision / Data types
- HF Transformers notes - making small models, tokenizers, datasets, and other tips
Part 6. Miscellaneous
- Resources - LLM/VLM chronicles

Link: https://github.com/stas00/ml-engineering

Navigational hashtags: #armknowledgesharing #armbooks #armrepo
General hashtags: #llm #gpt #gpt3 #gpt4 #ml #engineering #mlsystemdesign #systemdesign #reproducibility #performance

System Design
Learn how to design systems at scale and prepare for system design interviews

What is system design?
System design is the process of defining the architecture, interfaces, and data for a system that satisfies specific requirements. System design meets the needs of your business or organization through coherent and efficient systems. It requires a systematic approach to building and engineering systems. A good system design requires us to think about everything, from infrastructure all the way down to the data and how it's stored.

Table of contents

- Getting Started: What is system design?
- Chapter I: IP, OSI Model, TCP and UDP, Domain Name System (DNS), Load Balancing, Clustering, Caching, Content Delivery Network (CDN), Proxy, Availability, Scalability, Storage
- Chapter II: Databases and DBMS, SQL databases, NoSQL databases, SQL vs NoSQL databases, Database Replication, Indexes, Normalization and Denormalization, ACID and BASE consistency models, CAP theorem, PACELC Theorem, Transactions, Distributed Transactions, Sharding, Consistent Hashing, Database Federation
- Chapter III: N-tier architecture, Message Brokers, Message Queues, Publish-Subscribe, Enterprise Service Bus (ESB), Monoliths and Microservices, Event-Driven Architecture (EDA), Event Sourcing, Command and Query Responsibility Segregation (CQRS), API Gateway, REST, GraphQL, gRPC, Long polling, WebSockets, Server-Sent Events (SSE)
- Chapter IV: Geohashing and Quadtrees, Circuit breaker, Rate Limiting, Service Discovery, SLA, SLO, SLI, Disaster recovery, Virtual Machines (VMs) and Containers, OAuth 2.0 and OpenID Connect (OIDC), Single Sign-On (SSO), SSL, TLS, mTLS
- Chapter V: System Design Interviews, URL Shortener, WhatsApp, Twitter, Netflix, Uber
- Appendix: Next Steps, References

Links:
- Direct link to the site with the course
- Direct link to the repository for the course
- Content Guide link
- Topic Guide link

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #systemdesign

Designing Machine Learning Systems by Chip Huyen

Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they're data dependent, with data varying wildly from one use case to the next. In this book, you'll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

Author Chip Huyen, co-founder of Claypot AI, considers each design decision (such as how to process and create training data, which features to use, how often to retrain models, and what to monitor) in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.

This book will help you tackle scenarios such as:
- Engineering data and choosing the right metrics to solve a business problem
- Automating the process for continually developing, evaluating, deploying, and updating models
- Developing a monitoring system to quickly detect and address issues your models might encounter in production
- Architecting an ML platform that serves across use cases
- Developing responsible ML systems

Link: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearningsystemdesign #systemdesign #machinelearning #ml #designingmachinelearningsystems
