Neighbourhood Components Analysis
A PyTorch implementation of Neighbourhood Components Analysis.
NCA learns a linear transformation of the dataset such that the expected leave-one-out performance of kNN in the transformed space is maximized.
The authors propose a novel method for learning a Mahalanobis distance measure to be used in the kNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out kNN score on the training set.
It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, this classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them.
The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction.
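To make the "stochastic leave-one-out kNN score" concrete, here is a minimal plain-Python sketch. It is not the linked repo's code. It only evaluates the NCA objective f(A) = Σ_i Σ_{j: c_j = c_i} p_ij. Here p_ij is a softmax over negative squared distances between A·x_i and A·x_j, and p_ii = 0. The function names and toy data are illustrative assumptions. Learning A means maximizing this value with a gradient-based optimizer, and that is where an autograd framework such as PyTorch helps.

```python
import math

def project(A, x):
    # y = A @ x, with A given as a list of rows (d_out x d_in)
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def nca_objective(A, X, labels):
    """Expected number of training points that stochastic leave-one-out
    nearest-neighbour classification gets right in the space y = A x."""
    Y = [project(A, x) for x in X]
    n = len(Y)
    total = 0.0
    for i in range(n):
        # squared Euclidean distances from point i to every point
        d = [sum((a - b) ** 2 for a, b in zip(Y[i], Y[k])) for k in range(n)]
        # shift by the smallest distance for numerical stability; the softmax is unchanged
        m = min(d[k] for k in range(n) if k != i)
        # p_ij is proportional to exp(-d_ij); p_ii is fixed to 0 (leave-one-out)
        w = [0.0 if k == i else math.exp(-(d[k] - m)) for k in range(n)]
        # p_i = probability that point i picks a neighbour of its own class
        total += sum(w[j] for j in range(n) if labels[j] == labels[i]) / sum(w)
    return total

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = [0, 0, 1, 1]
print(nca_objective([[1.0, 0.0], [0.0, 1.0]], X, labels))  # close to 4: well-separated classes
print(nca_objective([[0.0, 0.0], [0.0, 0.0]], X, labels))  # about 4/3: all points collapse together
```

Collapsing A to zero makes every neighbour equally likely, so each point has only a 1/3 chance of picking its same-class partner. This is the kind of transformation the optimization moves away from.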
paper (only pdf): https://www.cs.toronto.edu/~hinton/absps/nca.pdf
github: https://github.com/kevinzakka/nca
#kNN #pca #nca #PyTorch
OpenCV 'dnn' with NVIDIA GPUs: 1,549% faster YOLO, SSD, and Mask R-CNN
- Object detection and segmentation
- Working Python implementations of each
- Includes pre-trained models
tutorial: https://t.co/Wt0IrJObcE?amp=1
#OpenCV #dl #nvidia
Knowledge Graphs @ AAAI 2020
An overview of several topics:
- KG-Augmented Language Models: in different flavours
- Entity Matching in Heterogeneous KGs: finally no manual mappings
- KG Completion and Link Prediction: neuro-symbolic and temporal KGs
- KG-based Conversational AI and Question Answering: going big
Link: https://medium.com/@mgalkin/knowledge-graphs-aaai-2020-c457ad5aafc0
#AAAI2020 #KnowledgeGraph #graph #kg
Barack Obama's deepfake video used as the intro to the MIT 6.S191 class
A brilliant idea to win students' attention and to demonstrate, at the very beginning of the course, one of the applications of the material they are going to study.
YouTube: https://www.youtube.com/watch?v=l82PxsKHxYc
#DL #DeepFake #MIT #video
ODS breakfast in Paris! 🇫🇷 See you this Saturday at 10:30 (some people come around 11:00) at Malongo Café, 50 Rue Saint-André des Arts. We are expecting 6 to 17 people.
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
tl;dr
[ONE loss] + [ONE hyperparameter] + [NO external data] = GREAT PERFORMANCE
with Hugging Face-compatible weights
take the original BERT and, during distillation, randomly replace some of its layers with new (smaller) ones. the probability of replacing a module increases over time, resulting in a small model at the end.
their approach uses only one loss function and one hyperparameter, freeing humans from hyperparameter tuning.
also, they outperform existing knowledge distillation approaches on the GLUE benchmark, offering a new perspective on model compression.
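The replacement idea above fits in a few lines. The following toy sketch rests on stated assumptions. Modules are plain Python callables, and each predecessor module has exactly one successor. The replacement probability follows a linear schedule min(1, k*t + b), with k and b chosen only for illustration. All names are hypothetical, and the real training code lives in the linked repo.

```python
import random

def replacement_rate(step, k, b):
    # Assumed linear curriculum: replacement probability grows with the step, capped at 1
    return min(1.0, k * step + b)

def theseus_forward(x, predecessor_modules, successor_modules, p, rng):
    """One forward pass through the compound model: each predecessor module is
    independently swapped for its successor with probability p (a Bernoulli draw)."""
    for prd, scc in zip(predecessor_modules, successor_modules):
        x = scc(x) if rng.random() < p else prd(x)
    return x

# Hypothetical toy "modules" acting on a number, just to show the control flow
predecessors = [lambda x: x + 1, lambda x: x + 1]
successors = [lambda x: x * 2, lambda x: x * 2]
rng = random.Random(0)
for step in (0, 5, 10):
    p = replacement_rate(step, k=0.1, b=0.0)
    print(step, p, theseus_forward(1, predecessors, successors, p, rng))
```

At p = 0 only the original modules run; once the schedule reaches 1, every forward pass uses only the successors. This is the "small model at the end".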
paper: https://arxiv.org/abs/2002.02925
github: https://github.com/JetRunner/BERT-of-Theseus
#nlp #compressing #knowledge #distillation #bert
Catalyst: Accelerated DL & RL
tl;dr
- collect all the technical, dev-heavy Deep Learning stuff in a framework
- make it easy to re-use boring day-to-day components
- focus on research and hypothesis testing in our projects
Most of the time in Deep Learning, all you need to do is specify the model dataflow, or how batches of data should be fed to the model. Why, then, is so much of our time spent implementing those pipelines and debugging training loops rather than developing something new?
The Catalyst team thinks it is possible to separate the engineering from the research, so that we can invest our time once in a high-quality, reusable engineering backbone and use it across all projects.
That is how Catalyst was born: an open-source PyTorch framework that lets you write compact but full-featured pipelines and focus on the core part of your project.
Link: https://github.com/catalyst-team/catalyst
Official TG channel: https://t.me/catalyst_team
Can you play rock, paper, scissors with a robot?
Photofeeler-D3
tl;dr: predict first impressions from a photo or video
some interesting items of note:
- notice how Smart is the dominant trait until he takes off his glasses
- when the glasses are taken off, his Attractive score rises
- there's a quick dip in scores every time he blinks
- the overall top scores result from the genuine smile at the very end!
Blog post: https://blog.photofeeler.com/photofeeler-d3/
arXiv: https://arxiv.org/abs/1904.07435
Demo: available to researchers on request
#cv #dl #impression
ZeRO, DeepSpeed & Turing-NLG
ZeRO: Memory Optimization Towards Training A Trillion Parameter Models
Turing-NLG: A 17-billion-parameter language model by Microsoft
Microsoft is releasing an open-source library called DeepSpeed, which vastly advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train 100-billion-parameter models. It is compatible with PyTorch.
ZeRO is a new parallelized optimizer that greatly reduces the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained.
ZeRO has three main optimization stages, which correspond to the partitioning of optimizer states, gradients, and parameters. When enabled cumulatively:
1. Optimizer State Partitioning (P_os): 4x memory reduction, same communication volume as data parallelism
2. Add Gradient Partitioning (P_os+g): 8x memory reduction, same communication volume as data parallelism
3. Add Parameter Partitioning (P_os+g+p): memory reduction is linear with the data parallelism degree N_d
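The stage-by-stage savings above can be checked with back-of-the-envelope arithmetic. This sketch follows the ZeRO paper's accounting for mixed-precision Adam. Per parameter, that accounting counts 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and K = 12 bytes of optimizer state (fp32 weights, momentum, variance). Each piece gets partitioned across N_d data-parallel GPUs as the stages are enabled. The sketch counts model states only and ignores activations and buffers. The function name is hypothetical.

```python
def zero_model_state_bytes(num_params, n_d, stage, k=12):
    """Approximate per-GPU memory (bytes) for model states under mixed-precision Adam.
    stage 0 = plain data parallelism, 1 = P_os, 2 = P_os+g, 3 = P_os+g+p."""
    params, grads, optimizer = 2.0, 2.0, float(k)  # bytes per parameter
    if stage >= 1:
        optimizer /= n_d  # optimizer states split across the n_d data-parallel ranks
    if stage >= 2:
        grads /= n_d      # gradients split as well
    if stage >= 3:
        params /= n_d     # finally the fp16 parameters themselves
    return (params + grads + optimizer) * num_params

# 7.5B-parameter model on 64 data-parallel GPUs
for stage in range(4):
    print(stage, zero_model_state_bytes(7.5e9, 64, stage) / 1e9, "GB")
# prints roughly 120, 31.4, 16.6 and 1.9 GB
```

The P_os ratio is 16 / (4 + 12/N_d), which tends to the 4x quoted above as N_d grows. Likewise, P_os+g gives 16 / (2 + 14/N_d), which tends to 8x. Only P_os+g+p keeps shrinking linearly with N_d.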
They used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), the largest publicly known language model at 17 billion parameters, which you can learn more about in the accompanying blog post. Also, the abstract for Turing-NLG was written by the model itself.
ZeRO & DeepSpeed: https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/
paper: https://arxiv.org/abs/1910.02054
github: https://github.com/microsoft/DeepSpeed
Turing-NLG: https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/
#nlp #dl #ml #microsoft #deepspeed #optimization
Single biological neuron can compute XOR
The active electrical properties of dendrites shape neuronal input and output and are fundamental to brain function. However, our knowledge of active dendrites has been almost entirely acquired from studies of rodents.
In this work, the authors investigated the dendrites of layer 2 & 3 (L2/3) pyramidal neurons of the human cerebral cortex ex vivo. In these neurons, they discovered a class of calcium-mediated dendritic action potentials (dCaAPs) whose waveform and effects on neuronal output have not been previously described.
In contrast to typical all-or-none action potentials, dCaAPs were graded; their amplitudes were maximal for threshold-level stimuli but dampened for stronger stimuli. These dCaAPs enabled the dendrites of individual human neocortical pyramidal neurons to classify linearly nonseparable inputs, a computation conventionally thought to require multilayered networks.
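The XOR claim can be illustrated with a toy model. This is not a biophysical model of dCaAPs. It assumes an idealized graded activation that equals 1 when the summed input is near threshold and 0 for weaker or stronger input. The names `dcaap_like`, `threshold` and `width` are hypothetical. With unit weights, inputs (0,1) and (1,0) land at threshold, while (1,1) overshoots it, so a single unit outputs XOR. A brute-force search over a small grid finds no ordinary step unit that does the same. That is expected, because XOR is not linearly separable.

```python
def dcaap_like(s, threshold=1.0, width=0.5):
    # Toy graded response: maximal for input near threshold, suppressed for stronger input
    return 1.0 if abs(s - threshold) < width else 0.0

def step(s, threshold):
    # Ordinary all-or-none unit: fires for any input at or above threshold
    return 1.0 if s >= threshold else 0.0

XOR_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_TARGETS = [0.0, 1.0, 1.0, 0.0]

def single_unit(x1, x2, activation, w1=1.0, w2=1.0):
    return activation(w1 * x1 + w2 * x2)

print([single_unit(a, b, dcaap_like) for a, b in XOR_INPUTS])  # [0.0, 1.0, 1.0, 0.0]

# Search weights and thresholds in -3.0 .. 3.0 (step 0.5) for a monotonic step unit that computes XOR
grid = [i / 2 for i in range(-6, 7)]
solves = any(
    [step(w1 * a + w2 * b, t) for a, b in XOR_INPUTS] == XOR_TARGETS
    for w1 in grid for w2 in grid for t in grid
)
print(solves)  # False
```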
reddit: https://www.reddit.com/r/MachineLearning/comments/ejbwvb/r_single_biological_neuron_can_compute_xor
#neurons #human #brain
AutoFlip: An Open Source Framework for Intelligent Video Reframing
Google released a tool for smart video cropping. Video cropping doesn't seem like a problem until you realize that the object that should be in focus can be in different parts of the picture. Now there is a great attempt to provide a one-click solution to cropping.
Interesting part: #AutoFlip is an application of the #MediaPipe framework for building multimodal ML #pipelines.
Github: https://github.com/google/mediapipe/blob/master/mediapipe/docs/autoflip.md
MediaPipe: https://github.com/google/mediapipe/
#Google #GoogleAI #DL #CV
Forwarded from Находки в опенсорсе ("Open-source finds")
Disappearing-People: person removal from complex backgrounds over time.
Removing people from complex backgrounds in real time with TensorFlow.js, right in the web browser. #js
https://github.com/jasonmayes/Real-Time-Person-Removal
Looks awesome!
Lex Fridman's conversation with Andrew Ng
Andrew Ng is one of the most impactful educators, researchers, innovators, and leaders in the history of artificial intelligence. He has helped educate and inspire millions of people.
outline: 1st in AI, early days of online education & DL, teaching on a whiteboard, Pieter Abbeel, deeplearning.ai, landing.ai, AI fund, deeplearning.ai, unsupervised learning, career in DL, PhD, Artificial general intelligence
video-podcast: https://youtu.be/0jspaMLxBig
ODS breakfast in Paris! 🇫🇷 See you this Saturday at 10:30 (some people come around 11:00) at Malongo Café, 50 Rue Saint-André des Arts. We are expecting 6 to 18 people.
What is Trending on Wikipedia? Capturing Trends and Language Biases Across Wikipedia Editions
The authors propose an automatic evaluation and comparison of the browsing behavior of Wikipedia readers that can be applied to any language edition of Wikipedia. The study focuses on the English, French, and Russian editions during the last four months of 2018.
Their approach consists of the following steps:
- extraction of a sub-network of trending Wikipedia articles and identification of trends
- extraction of keywords from the summaries of every Wikipedia article in the sub-network and weighting according to their importance
- labeling of the trends with high-level topics using the extracted keywords
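The second and third steps can be illustrated with a small sketch. This is a toy, not the authors' pipeline (see the linked repo for that). TF-IDF is swapped in as the keyword weighting, with whitespace tokenization. The topic dictionary is hypothetical. The first step, finding the trending sub-network, is not covered.

```python
import math
from collections import Counter

def tfidf_keywords(summaries, top_k=3):
    """Return the top_k words of each summary ranked by TF-IDF (ties broken alphabetically)."""
    docs = [s.lower().split() for s in summaries]
    n = len(docs)
    # document frequency: in how many summaries each word appears
    df = Counter(w for doc in docs for w in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        keywords.append(sorted(scores, key=lambda w: (-scores[w], w))[:top_k])
    return keywords

def label_trend(trend_keywords, topics):
    """Assign the high-level topic whose keyword set overlaps most with the trend's keywords."""
    kws = set(trend_keywords)
    return max(topics, key=lambda t: len(kws & topics[t]))

summaries = ["football match goal", "football league goal", "election vote president"]
keywords = tfidf_keywords(summaries, top_k=2)
topics = {"sports": {"football", "match", "league"}, "politics": {"election", "vote"}}  # hypothetical
print(keywords)
print([label_trend(k, topics) for k in keywords])
```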
paper: https://arxiv.org/abs/2002.06885
github: https://github.com/epfl-lts2/sparkwiki
#nlp #trend #wikipedia
On Identifiability in Transformers
The authors try to better understand Transformers through the lens of identifiability.
They started by proving that attention weights are non-identifiable when the sequence length is longer than the attention head dimension. Thus, infinitely many attention distributions can lead to the same internal representation and model output. They propose effective attention, a method that improves the interpretability of attention weights by projecting out the null space.
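The non-identifiability argument can be checked numerically on a toy example. For illustration only, assume a 4-token sequence and a head dimension of 2. T stands for the matrix that a row of attention weights multiplies. Because there are more tokens than dimensions, T has a non-trivial left null space. Adding a vector from that space that also sums to zero changes the attention distribution but not the output. All numbers are made up, not taken from the paper.

```python
def attend(weights, values):
    # One row of the attention output: sum_j weights[j] * values[j]
    dim = len(values[0])
    return [sum(w * v[k] for w, v in zip(weights, values)) for k in range(dim)]

# Toy matrix T: n = 4 tokens, head dimension d = 2, so n > d
T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
# x satisfies x @ T = 0 (left null space of T) and sum(x) = 0
x = [-1.0, 1.0, -1.0, 1.0]

a = [0.25, 0.25, 0.25, 0.25]
a_alt = [ai + 0.1 * xi for ai, xi in zip(a, x)]  # a different, still valid distribution

print(a_alt)             # approximately [0.15, 0.35, 0.15, 0.35]
print(attend(a, T))      # [1.0, 0.5]
print(attend(a_alt, T))  # the same output, up to floating-point rounding
```

Removing the null-space component from each attention row, so that only the part that affects the output remains, is the intuition behind the effective attention described above.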
They also show that tokens remain largely identifiable through a learned linear transformation followed by a nearest-neighbor lookup based on cosine similarity. However, input tokens gradually become less identifiable in later layers.
Finally, they present Hidden Token Attribution, a gradient-based method to quantify information mixing. This method is general and can be used to investigate contextual embeddings in self-attention-based models.
paper: https://arxiv.org/abs/1908.04211
#nlp #transformer #interpretability #attention #ICLR2020
A Deep Learning Approach to Antibiotic Discovery
A new antibiotic was found using DL; it is claimed to be effective in lab mice against several bacteria that are resistant to existing antibiotics.
The problem with finding good antibiotics is that the space of potential molecules is prohibitively large, so not every molecule can be tested in the lab. They train a model that receives a molecule as a graph and tries to predict how effective it is against E. coli.
For every edge, they run a single-layer NN that receives the activations and features of the input node plus the edge features, and produces new activations for the edge. A node's activation is the sum of all incoming edge activations, and the molecule's overall activation vector is the sum over all nodes.
Finally, there's a 2-layer NN that receives the overall molecule vector plus some standard handcrafted features and has a binary classification output; it is trained end-to-end on 2.3K molecules. They then predict on a dataset of 6K molecules at different stages of investigation.
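The edge-based scheme described above can be sketched in plain Python. This is a simplified, hypothetical illustration, not the authors' code. It runs one message-passing step only, so the "activations" of the input node are just its features. It uses ReLU, tiny hand-picked weights, and no training. A chemical bond is represented as two directed edges.

```python
import math

def relu(v):
    return [max(0.0, t) for t in v]

def linear(W, b, v):
    # W @ v + b with W as a list of rows
    return [sum(w * t for w, t in zip(row, v)) + bi for row, bi in zip(W, b)]

def molecule_embedding(node_feats, edges, edge_feats, W, b):
    """edges: list of directed (src, dst) pairs; edge_feats[i] belongs to edges[i]."""
    # edge activations: single-layer NN on [source node features + edge features]
    edge_act = [relu(linear(W, b, node_feats[s] + edge_feats[i])) for i, (s, _) in enumerate(edges)]
    dim = len(b)
    # node activation = sum of incoming edge activations
    node_act = [[0.0] * dim for _ in node_feats]
    for (_, dst), h in zip(edges, edge_act):
        node_act[dst] = [x + y for x, y in zip(node_act[dst], h)]
    # molecule vector = sum over all nodes
    return [sum(col) for col in zip(*node_act)]

def predict(mol_vec, handcrafted, W1, b1, w2, b2):
    # 2-layer NN on [molecule vector + handcrafted features] -> probability of activity
    hidden = relu(linear(W1, b1, mol_vec + handcrafted))
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

# Toy 2-atom molecule: one bond, stored as the directed edges 0->1 and 1->0
vec = molecule_embedding([[1.0], [0.0]], [(0, 1), (1, 0)], [[1.0], [1.0]], W=[[1.0, 1.0]], b=[0.0])
print(vec)  # [3.0]
print(predict(vec, [0.5], W1=[[1.0, 0.0]], b1=[0.0], w2=[0.0], b2=0.0))  # 0.5
```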
They looked at the top 51 predictions and manually ranked them by dissimilarity to the training dataset, by how far along they were in investigation, and by a low score on an external toxicity model. The top pick, which they called Halicin, was tested in the lab.
article (only pdf): https://www.cell.com/cell/pdf/S0092-8674(20)30102-1.pdf
ps
thx @Sim0nsays for his cool summary on Twitter
#medicine #molecule #antibiotic #dl #graph
Forwarded from Graph Machine Learning
Fresh picks from arXiv
More ICML and KDD submissions and a large body of work on mathematical graph theory
ICML
Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization
Neural Networks on Random Graphs
Embedding Graph Auto-Encoder with Joint Clustering via Adjacency Sharing
Adaptive Graph Auto-Encoder for General Data Clustering
Computationally Tractable Riemannian Manifolds for Graph Embeddings
Set2Graph: Learning Graphs From Sets
Node Masking: Making Graph Neural Networks Generalize and Scale Better
Deep Graph Mapper: Seeing Graphs through the Neural Lens
Learning Dynamic Knowledge Graphs to Generalize on Text-Based Games by Microsoft and the group of William L. Hamilton
Learning to Simulate Complex Physics with Graph Networks by DeepMind and the group of Jure Leskovec
KDD
Self-Enhanced GNN: Improving Graph Neural Networks Using Model Outputs
Graph4Code: A Machine Interpretable Knowledge Graph for Code
Localized Flow-Based Clustering in Hypergraphs by the group of Jon Kleinberg
WWW
Beyond Clicks: Modeling Multi-Relational Item Graph for Session-Based Target Behavior Prediction
Graph Theory
Building large k-cores from sparse graphs
Distributed graph problems through an automata-theoretic lens
Computing the k Densest Subgraphs of a Graph
Seeing Far vs. Seeing Wide: Volume Complexity of Local Graph Problems
Planar graphs have bounded queue-number
Review
Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations
A popular example of applying AI to fashion
AI can be used for chair design, and some generative models can definitely be used in the fashion industry.
Link: https://qz.com/1770508/an-emerging-japanese-startup-is-mining-tradition-to-create-a-more-sustainable-fashion-future/
#aiapplication #generativedesign #meta