https://tf-explain.readthedocs.io/en/latest/index.html
tf-explain offers interpretability methods for TensorFlow 2.0 to ease understanding of neural networks.
#Frameworks
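A minimal Grad-CAM sketch with tf-explain (the model, input, class index, and layer name here are illustrative assumptions, not taken from the docs above):

# Grad-CAM sketch with tf-explain; assumes a Keras image classifier.
# MobileNetV2, the random input, class_index, and layer_name are placeholders.
import numpy as np
import tensorflow as tf
from tf_explain.core.grad_cam import GradCAM

model = tf.keras.applications.MobileNetV2(weights="imagenet")
images = np.random.rand(1, 224, 224, 3).astype("float32")  # placeholder batch

explainer = GradCAM()
# validation_data is an (images, labels) tuple; labels may be None.
grid = explainer.explain(
    validation_data=(images, None),
    model=model,
    class_index=281,       # e.g. "tabby cat" in ImageNet
    layer_name="Conv_1",   # last conv layer of MobileNetV2
)
explainer.save(grid, ".", "grad_cam.png")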
#Tips Efficient training of large models on multiple GPUs, main concepts (from https://huggingface.co/docs/transformers/perf_train_gpu_many):
DataParallel (DP) - the same setup is replicated multiple times, with each replica fed a slice of the data. Processing happens in parallel, and all replicas are synchronized at the end of each training step (a minimal PyTorch sketch follows this list).
TensorParallel (TP) - each tensor is split into multiple chunks, so instead of the whole tensor residing on a single GPU, each shard resides on its designated GPU. During processing each shard is handled separately and in parallel on a different GPU, and the results are synced at the end of the step. This is what one may call horizontal parallelism, since the splitting happens at the horizontal level.
PipelineParallel (PP) - the model is split vertically (at the layer level) across multiple GPUs, so that only one or a few layers of the model are placed on each GPU. Each GPU processes a different stage of the pipeline in parallel, working on a small chunk of the batch.
Zero Redundancy Optimizer (ZeRO) - also shards tensors, somewhat similarly to TP, except the whole tensor is reconstructed in time for a forward or backward computation, so the model does not need to be modified. It also supports various offloading techniques to compensate for limited GPU memory (a DeepSpeed sketch follows the framework links below).
Sharded DDP - another name for the foundational ZeRO concept as used by various other implementations of ZeRO.
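A minimal DP sketch in PyTorch; the model and data are placeholders, and torch.nn.DataParallel is the simple single-process variant of the idea:

# nn.DataParallel replicates the module on each visible GPU, scatters the
# input batch across replicas, runs the forward passes in parallel, and
# gathers the outputs; gradients are synchronized on the backward pass.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicate across all visible GPUs
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(64, 512, device=next(model.parameters()).device)
y = torch.randint(0, 10, (64,), device=x.device)

loss = nn.functional.cross_entropy(model(x), y)  # batch is split across GPUs
loss.backward()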
#Frameworks:
https://www.deepspeed.ai/
https://fairscale.readthedocs.io/en/latest/
https://github.com/tunib-ai/oslo
https://github.com/microsoft/varuna
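A minimal ZeRO sketch using DeepSpeed from the list above (the model and config values are illustrative, not a recommended setup):

# ZeRO stage 2 shards optimizer states and gradients across data-parallel
# workers; full parameters are still materialized for forward/backward,
# so the model code itself stays unchanged.
import torch.nn as nn
import deepspeed

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
# Training step: loss = engine(batch); engine.backward(loss); engine.step()

Such a script is typically launched with the DeepSpeed CLI, e.g. deepspeed train.py.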
https://www.anyscale.com/blog/continuous-batching-llm-inference
LLM inference acceleration #Frameworks
Achieve 23x LLM Inference Throughput & Reduce p50 Latency
In this blog, we discuss continuous batching, a critical systems-level optimization that improves both throughput and latency under load for LLMs.
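A toy sketch of the continuous-batching idea (simplified; generate_step, is_finished, and the request queue are hypothetical stand-ins for a real engine's internals, which also manage per-sequence KV-cache memory):

# Instead of waiting for every sequence in a static batch to finish,
# finished sequences are evicted after each decoding step and queued
# requests immediately take their slots, keeping the GPU batch full.
from collections import deque

def serve(requests, max_batch_size, generate_step, is_finished):
    queue = deque(requests)
    active = []
    while queue or active:
        # Refill free slots from the queue at every step, not per batch.
        while queue and len(active) < max_batch_size:
            active.append(queue.popleft())
        generate_step(active)  # one decoding step for all active sequences
        # Evict finished sequences so new requests can start right away.
        active = [seq for seq in active if not is_finished(seq)]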
https://github.com/SeldonIO/alibi-detect Algorithms for outlier, adversarial and drift detection
https://github.com/SeldonIO/alibi Algorithms for explaining machine learning models
#Frameworks #library #anomaly #drift
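A minimal drift-detection sketch with alibi-detect (the data here is synthetic; KSDrift runs per-feature Kolmogorov-Smirnov tests against a reference set):

# KSDrift compares each feature of an incoming batch to reference data
# with Kolmogorov-Smirnov tests (multiple-testing corrected).
import numpy as np
from alibi_detect.cd import KSDrift

x_ref = np.random.randn(1000, 10)          # reference (training) data
cd = KSDrift(x_ref, p_val=0.05)

x_ok = np.random.randn(200, 10)            # same distribution
x_shift = np.random.randn(200, 10) + 1.0   # mean-shifted batch

print(cd.predict(x_ok)["data"]["is_drift"])     # expected: 0
print(cd.predict(x_shift)["data"]["is_drift"])  # expected: 1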
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and many other libraries.
https://mars-project.readthedocs.io/
#Frameworks
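A minimal sketch: Mars mirrors the numpy/pandas APIs but builds a lazy task graph, and .execute() triggers the (chunked, potentially distributed) computation:

import mars.tensor as mt
import mars.dataframe as md

# Tensor too large for eager numpy patterns; computed chunk by chunk.
t = mt.random.rand(10000, 10000, chunk_size=1000)
print(t.sum().execute())

df = md.DataFrame(mt.random.rand(1000, 4), columns=list("abcd"))
print(df.describe().execute())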