Deep Learning

99 viewsVadim ☯️, edited 00:45

https://tf-explain.readthedocs.io/en/latest/index.html
tf-explain offers interpretability methods for Tensorflow 2.0 to ease neural network’s understanding.
#Frameworks

84 viewsVadim ✨🪁, 21:50

Deep Learning

#Tips Efficient Training Large Models on Multiple GPUs, Main Concepts (from https://huggingface.co/docs/transformers/perf_train_gpu_many):

DataParallel (DP) - the same setup is replicated multiple times, and each being fed a slice of the data. The processing is done in parallel and all setups are synchronized at the end of each training step.
TensorParallel (TP) - each tensor is split up into multiple chunks, so instead of having the whole tensor reside on a single gpu, each shard of the tensor resides on its designated gpu. During processing each shard gets processed separately and in parallel on different GPUs and the results are synced at the end of the step. This is what one may call horizontal parallelism, as the splitting happens on horizontal level.
PipelineParallel (PP) - the model is split up vertically (layer-level) across multiple GPUs, so that only one or several layers of the model are places on a single gpu. Each gpu processes in parallel different stages of the pipeline and working on a small chunk of the batch.
Zero Redundancy Optimizer (ZeRO) - Also performs sharding of the tensors somewhat similar to TP, except the whole tensor gets reconstructed in time for a forward or backward computation, therefore the model doesn’t need to be modified. It also supports various offloading techniques to compensate for limited GPU memory.
Sharded DDP - is another name for the foundational ZeRO concept as used by various other implementations of ZeRO.

#Frameworks :
https://www.deepspeed.ai/
https://fairscale.readthedocs.io/en/latest/
https://github.com/tunib-ai/oslo
https://github.com/microsoft/varuna

huggingface.co

Parallelism methods

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

111 viewsVadim ✨🪁, 15:54

Deep Learning

[2302.14045] Language Is Not All You Need: Aligning Perception with Language Models
https://arxiv.org/abs/2302.14045
#paper New generation

107 viewsVadim ✨🪁, edited 17:34

Deep Learning

https://github.com/huggingface/peft #Frameworks

GitHub

GitHub - huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. - huggingface/peft

108 viewsVadim ✨, 15:11

Deep Learning

https://arxiv.org/abs/2207.06881 #Paper Recurrent Memory Transformer. Scaling transformer architecture to long sequences.

116 viewsVadim ✨, 19:40

Deep Learning

https://www.deepspeed.ai/training/ Large scale training #Frameworks

DeepSpeed

Training Overview and Features

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

❤1

112 viewsVadim ✨, 18:26

Deep Learning

https://arxiv.org/pdf/2307.02486.pdf Scaling Transformers to
1,000,000,000 Tokens #Paper

113 viewsVadim ✨, 15:42

Deep Learning

https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action
New model translates vision and language into action based on LLM

Google DeepMind

RT-2: New model translates vision and language into action

Introducing Robotic Transformer 2 (RT-2), a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalised instructions for...

119 viewsVadim ✨, 16:01

Deep Learning

https://huyenchip.com/2023/08/16/llm-research-open-challenges.html

Chip Huyen

Open challenges in LLM research

[LinkedIn discussion, Twitter thread]

116 viewsArt 🐲 Zaborskiy, 07:48

Deep Learning

https://www.anyscale.com/blog/continuous-batching-llm-inference

LLM inference acceleration #Frameworks

Anyscale

Achieve 23x LLM Inference Throughput & Reduce p50 Latency

In this blog, we discuss continuous batching, a critical systems-level optimization that improves both throughput and latency under load for LLMs.

❤1

81 viewsVadim ✨, 16:42

Deep Learning

Medium

Goodbye databases, it’s time to embrace Vector Databases!

The AI revolution is reshaping industries, promising remarkable innovations while introducing new challenges. In this transformative…

https://codemaker2016.medium.com/goodbye-databases-its-time-to-embrace-vector-databases-0ffa7879980e
#Tips

🔥1

86 viewsVadim ✨, 13:18

Deep Learning

https://llava-vl.github.io/blog/2024-01-30-llava-next/

#Frameworks #Models

LLaVA

LLaVA-NeXT: Improved reasoning, OCR, and world knowledge

LLaVA team presents LLaVA-NeXT, with improved reasoning, OCR, and world knowledge. LLaVA-NeXT even exceeds Gemini Pro on several benchmarks.

75 viewsVadim ✨, 18:50

Deep Learning

https://huggingface.co/collections/osanseviero/model-merging-65097893623330a3a51ead66

Model Merging: papers
#Paper

huggingface.co

Model Merging - a osanseviero Collection

Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it!

🔥1

82 viewsVadim ✨, 05:34

Deep Learning

https://github.com/arcee-ai/mergekit
Model Merging: toolkit
#Frameworks

GitHub

GitHub - arcee-ai/mergekit: Tools for merging pretrained large language models.

Tools for merging pretrained large language models. - arcee-ai/mergekit

82 viewsVadim ✨, 05:34

Deep Learning

https://arxiv.org/pdf/2404.19756 Kolmogorov-Arnold Networks #paper

76 viewsVadim ✨, 16:43

Deep Learning

https://kindxiaoming.github.io/pykan/ Kolmogorov Arnold Network #framework #library

81 viewsVadim ✨, 16:44

Deep Learning

https://github.com/SeldonIO/alibi-detect Algorithms for outlier, adversarial and drift detection
https://github.com/SeldonIO/alibi Algorithms for explaining machine learning models
#Frameworks #library #anomaly #drift

GitHub

GitHub - SeldonIO/alibi-detect: Algorithms for outlier, adversarial and drift detection

Algorithms for outlier, adversarial and drift detection - SeldonIO/alibi-detect

72 viewsVadim ✨, edited 00:40

Deep Learning

Grokking:
1. Fist paper: https://arxiv.org/abs/2201.02177
2. Transformers: https://arxiv.org/pdf/2405.15071
3. Simple framework: https://arxiv.org/pdf/2405.20233

arXiv.org

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and...

78 viewsVadim ✨, 04:43

Deep Learning

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and many other libraries.
https://mars-project.readthedocs.io/
#Frameworks

75 viewsVadim, edited 02:00

Deep Learning

Tutorial: Scalable and Distributed ML Workflows with DVC and Ray
Part 1: https://dvc.ai/blog/dvc-ray
Part 2: https://dvc.ai/blog/dvc-ray-part-2

DVC AI

Tutorial: Scalable and Distributed ML Workflows with DVC and Ray (Part 1)

This tutorial introduces you to integrating DVC (Data Version Control) with Ray, turning them into your go-to toolkit for creating automated, scalable, and distributed ML pipelines.

87 viewsVadim, 02:37

About

Blog

Apps

Platform