https://elastiknn.com/ Elasticsearch Plugin for Nearest Neighbor Search on dense vectors
#Tools #Library
Elastiknn
Elasticsearch Plugin for Nearest Neighbor Search
https://arxiv.org/pdf/2103.14030.pdf #paper
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
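A minimal sketch of trying a pretrained Swin model via Hugging Face transformers (the checkpoint name and the classification setup are assumptions for illustration, not from the paper):
```python
# Run a pretrained Swin Transformer image classifier (needs a recent transformers release).
from PIL import Image
import requests
import torch
from transformers import AutoImageProcessor, SwinForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

name = "microsoft/swin-tiny-patch4-window7-224"  # ImageNet-1k fine-tuned checkpoint
processor = AutoImageProcessor.from_pretrained(name)
model = SwinForImageClassification.from_pretrained(name)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```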
https://shap.readthedocs.io/en/latest/index.html
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.
#Framework
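A minimal sketch of the SHAP workflow (the model and dataset are illustrative; assumes shap and xgboost are installed):
```python
# Explain a tree model globally and locally with the unified shap.Explainer API.
import shap
import xgboost

X, y = shap.datasets.adult()                 # small built-in tabular dataset
model = xgboost.XGBClassifier().fit(X, y)

explainer = shap.Explainer(model, X)         # dispatches to TreeExplainer for XGBoost
shap_values = explainer(X)

shap.plots.beeswarm(shap_values)             # global view of feature contributions
shap.plots.waterfall(shap_values[0])         # local explanation for one prediction
```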
https://docs.determined.ai/latest/index.html#
#Framework distributed training, hyperparameter tuning
https://www.determined.ai/blog/data-version-control-determined
Determined AI
Managing ML Training Data with DVC and Determined
Tracking machine learning data sets made easy with Data Version Control (DVC) and Determined.
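A minimal sketch of reading a DVC-versioned dataset from Python (the repo URL, file path, and tag are hypothetical placeholders):
```python
# Pull one file from a DVC-tracked repo at a pinned revision.
import dvc.api

data = dvc.api.read(
    "data/train.csv",                        # path inside the repo (placeholder)
    repo="https://github.com/org/project",   # Git repo that tracks the data with DVC (placeholder)
    rev="v1.0",                              # Git tag/commit pinning the dataset version
)
print(data[:200])
```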
https://keras.io/keras_tuner/
#Framework KerasTuner is an easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search.
keras.io
Keras documentation: KerasTuner
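A minimal KerasTuner sketch (search space, model, and data are illustrative):
```python
# Random search over layer width and learning rate with keras_tuner.
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(hp.Int("units", min_value=32, max_value=256, step=32),
                              activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Choice("lr", [1e-2, 1e-3, 1e-4])),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"])
    return model

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=5)
tuner.search(x_train, y_train, epochs=2, validation_split=0.2)
best_model = tuner.get_best_models(1)[0]
```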
https://www.microsoft.com/en-us/research/project/document-ai/
Microsoft Document AI (Intelligent Document Processing) #Framework
https://www.linkedin.com/posts/smasis_machinelearning-math-datascience-activity-6951137542079467520-4Bb8/
grid search vs Bayesian Optimization for hyperparameter tuning
LinkedIn
It irks me to see that grid search is still the most popular hyperparameter tuning method despite being usually the most inefficient... - Serg Masís on LinkedIn | 96 comments
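An illustrative contrast (not from the post): exhaustive grid search with scikit-learn vs a sequential, model-based search using Optuna's default TPE sampler as a stand-in for Bayesian optimization (dataset and model are arbitrary):
```python
# Grid search evaluates every combination; TPE picks each trial based on previous results.
import optuna
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Exhaustive grid: cost grows multiplicatively with every extra parameter/value.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-4]}, cv=3).fit(X, y)
print("grid best:", grid.best_params_, grid.best_score_)

# Sequential search: the sampler focuses on promising regions of the space.
def objective(trial):
    c = trial.suggest_float("C", 1e-2, 1e2, log=True)
    gamma = trial.suggest_float("gamma", 1e-5, 1e-1, log=True)
    return cross_val_score(SVC(C=c, gamma=gamma), X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("optuna best:", study.best_params, study.best_value)
```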
https://github.com/yzhao062/pyod outlier/anomaly detection - many methods in one library #framework #library
GitHub
GitHub - yzhao062/pyod: A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques - yzhao062/pyod
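A minimal PyOD sketch (Isolation Forest chosen arbitrarily; the library's other detectors share the same fit / labels_ / decision_scores_ / predict interface):
```python
# Fit one detector on toy data and read its outlier labels and scores.
import numpy as np
from pyod.models.iforest import IForest

rng = np.random.RandomState(42)
X_train = np.vstack([rng.randn(200, 2),                   # inliers around the origin
                     rng.uniform(-6, 6, size=(20, 2))])   # scattered outliers

clf = IForest(contamination=0.1)
clf.fit(X_train)

print(clf.labels_[:10])              # 0 = inlier, 1 = outlier (training data)
print(clf.decision_scores_[:10])     # raw outlier scores (training data)
print(clf.predict(rng.randn(5, 2)))  # labels for new samples
```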
https://faker.readthedocs.io/en/stable/index.html
https://sdv.dev/
https://gretel.ai/synthetics
Synthetic Data Generators! #Frameworks #Library
sdv.dev
The Synthetic Data Vault. Put synthetic data to work!
The Synthetic Data Vault (SDV) enables end users to easily generate synthetic data for different data modalities, including single table, relational and time series data.
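A minimal Faker sketch for rule-based fake records (the simplest of the three); SDV and Gretel instead fit a generative model to an existing dataset, so their APIs differ:
```python
# Generate fake tabular records with Faker; no real data needed.
from faker import Faker

fake = Faker()
Faker.seed(0)  # reproducible output

records = [
    {"name": fake.name(), "email": fake.email(),
     "address": fake.address(), "joined": fake.date_this_decade()}
    for _ in range(5)
]
for row in records:
    print(row)
```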
https://medmnist.com/
MedMNIST: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification
#Dataset
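A minimal sketch of loading one MedMNIST subset with the medmnist package, following the pattern from its README (subset choice and transform are illustrative; assumes torchvision is installed):
```python
# Download and open the PathMNIST training split.
import medmnist
from medmnist import INFO
from torchvision import transforms

flag = "pathmnist"                                    # one of the 2D subsets
info = INFO[flag]
DataClass = getattr(medmnist, info["python_class"])   # e.g. medmnist.PathMNIST

transform = transforms.Compose([transforms.ToTensor()])
train_ds = DataClass(split="train", transform=transform, download=True)
print(len(train_ds), info["task"], info["label"])
```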
https://arxiv.org/pdf/2208.07339.pdf
https://huggingface.co/blog/hf-bitsandbytes-integration
#Performance
huggingface.co
A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes
We're on a journey to advance and democratize artificial intelligence through open source and open science.
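A minimal sketch of the pattern from the blog post: loading a causal LM with 8-bit weights via transformers + accelerate + bitsandbytes (the model name is illustrative, a GPU is required; newer transformers versions express the same thing through BitsAndBytesConfig):
```python
# Load a model with LLM.int8() quantization and run a short generation.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-1b7"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",      # let accelerate place the layers
    load_in_8bit=True,      # 8-bit matrix multiplication from the paper above
)

inputs = tokenizer("8-bit inference lets a model like this", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```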
High-Performance Large-Scale Image Recognition Without Normalization
https://arxiv.org/pdf/2102.06171.pdf #Paper
https://tf-explain.readthedocs.io/en/latest/index.html
tf-explain offers interpretability methods for Tensorflow 2.0 to ease neural network's understanding.
#Frameworks
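A minimal tf-explain sketch using Grad-CAM (the model, image path, and class index are illustrative):
```python
# Produce a Grad-CAM heatmap for one image and save it to disk.
import tensorflow as tf
from tf_explain.core.grad_cam import GradCAM

model = tf.keras.applications.MobileNetV2(weights="imagenet")

img = tf.keras.preprocessing.image.load_img("cat.jpg", target_size=(224, 224))
img = tf.keras.preprocessing.image.img_to_array(img)

explainer = GradCAM()
grid = explainer.explain(([img], None), model, class_index=281)  # 281 ~ tabby cat in ImageNet
explainer.save(grid, ".", "grad_cam_cat.png")
```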
#Tips Efficient Training of Large Models on Multiple GPUs, Main Concepts (from https://huggingface.co/docs/transformers/perf_train_gpu_many; a minimal DeepSpeed ZeRO sketch follows at the end of this entry):
DataParallel (DP) - the same setup is replicated multiple times, and each replica is fed a slice of the data. The processing is done in parallel and all setups are synchronized at the end of each training step.
TensorParallel (TP) - each tensor is split into multiple chunks, so instead of the whole tensor residing on a single GPU, each shard resides on its designated GPU. During processing each shard is processed separately and in parallel on different GPUs, and the results are synced at the end of the step. This is what one may call horizontal parallelism, as the splitting happens at the horizontal level.
PipelineParallel (PP) - the model is split up vertically (layer-level) across multiple GPUs, so that only one or several layers of the model are placed on a single GPU. Each GPU processes a different stage of the pipeline in parallel, working on a small chunk of the batch.
Zero Redundancy Optimizer (ZeRO) - also performs sharding of the tensors, somewhat similar to TP, except the whole tensor is reconstructed in time for a forward or backward computation, so the model doesn't need to be modified. It also supports various offloading techniques to compensate for limited GPU memory.
Sharded DDP - another name for the foundational ZeRO concept as used by various other implementations of ZeRO.
#Frameworks :
https://www.deepspeed.ai/
https://fairscale.readthedocs.io/en/latest/
https://github.com/tunib-ai/oslo
https://github.com/microsoft/varuna
huggingface.co
Parallelism methods
We're on a journey to advance and democratize artificial intelligence through open source and open science.
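A minimal sketch of enabling ZeRO sharding through DeepSpeed (config values are illustrative; assumes the script is launched with the deepspeed launcher, e.g. `deepspeed train.py`):
```python
# Wrap a model in a DeepSpeed engine with ZeRO stage 2 and optional CPU offload.
import deepspeed
import torch

model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.ReLU(),
                            torch.nn.Linear(4096, 1024))

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                              # shard optimizer states + gradients
        "offload_optimizer": {"device": "cpu"},  # optional offload for limited GPU memory
    },
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config)

# Training step pattern: forward -> engine.backward -> engine.step
x = torch.randn(8, 1024).to(engine.device)
loss = engine(x).pow(2).mean()
engine.backward(loss)
engine.step()
```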
[2302.14045] Language Is Not All You Need: Aligning Perception with Language Models
https://arxiv.org/abs/2302.14045
#paper New generation of multimodal large language models (Kosmos-1)