AI researchers allege that machine learning is alchemy | Science | AAAS
https://www.sciencemag.org/news/2018/05/ai-researchers-allege-machine-learning-alchemy
Science
AI researchers allege that machine learning is alchemy
Study cites ways to bolster scientific foundations of artificial intelligence
#ML
Silla CN, Freitas AA. A survey of hierarchical classification across different application domains. Data Min Knowl Discov. 2011;22: 31–72. doi:10.1007/s10618-010-0175-9
A survey paper on hierarchical classification problems. It is a bit old, as it does not cover classifier chains, but it summarizes most of the ideas in hierarchical classification.
The authors also proposed a framework for the categorization of such problems using two different dimensions (ranks).
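One family of local approaches covered by the survey, the local classifier per parent node applied top-down, can be sketched in a few lines. The two-level hierarchy and the nearest-centroid "classifiers" below are illustrative toys of mine, not from the paper:

```python
import numpy as np

# Toy hierarchy: root -> {animal, vehicle}; animal -> {cat, dog}; vehicle -> {car, bike}
hierarchy = {"root": ["animal", "vehicle"], "animal": ["cat", "dog"], "vehicle": ["car", "bike"]}

# Illustrative 2-D centroids for the leaf classes.
centroids = {
    "cat": np.array([0.0, 1.0]),
    "dog": np.array([1.0, 1.0]),
    "car": np.array([5.0, 0.0]),
    "bike": np.array([6.0, 0.0]),
}

def node_centroid(node):
    """Centroid of a node = mean of its leaf centroids."""
    children = hierarchy.get(node)
    if children is None:
        return centroids[node]
    return np.mean([node_centroid(c) for c in children], axis=0)

def predict_top_down(x):
    """Local classifier per parent node: at each level, pick the nearest child."""
    node, path = "root", []
    while node in hierarchy:
        node = min(hierarchy[node], key=lambda c: np.linalg.norm(x - node_centroid(c)))
        path.append(node)
    return path

print(predict_top_down(np.array([0.2, 0.9])))  # -> ['animal', 'cat']
```

In a real setting, each parent node would train its own classifier on the examples routed to it; the top-down loop stays the same.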
#ML
Voss, et al., "Branch Specialization", Distill, 2021. https://distill.pub/2020/circuits/branch-specialization/
TL;DR:
- Branch: neuron clusters that are roughly segregated locally, e.g., AlexNet branches by design.
- Branch specialization: branches specialize in specific tasks, e.g., the two AlexNet branches specialize in different detectors (color detector or black-white filter).
- Is it a coincidence? No. Branch specialization repeatedly occurs in different trainings and different models.
- Do we find the same branch specializations in different models and tasks? Yes.
- Why? The authors propose that a positive feedback loop is established between layers, and this loop reinforces what each branch does.
- Our brains have specialized regions too. Are there any connections?
Distill
Branch Specialization
When a neural network layer is divided into multiple branches, neurons self-organize into coherent groupings.
I would like to say thank you for following this channel.
I use this channel as a notebook. Sometimes, I wonder if we could have more interactions. Maybe we could start with this question:
Which of the following do you read the most? (Multiple choice)
Anonymous Poll
47%
Data science (career related)
63%
Data science (technical)
47%
Machine learning (theoretical)
37%
Machine learning (applications, libraries)
21%
Something else (I would appreciate it if you leave a comment)
#DS
Wing JM. Ten research challenge areas in data science. Harvard Data Science Review. 2020;114: 1574–1596. doi:10.1162/99608f92.c6577b1f
https://hdsr.mitpress.mit.edu/pub/d9j96ne4/release/2
Harvard Data Science Review
Ten Research Challenge Areas in Data Science · Issue 2.3, Summer 2020
#statistics
This is the original paper on Fraser information.
Fisher information measures the second moment of the model's sensitivity; Shannon information measures compressed information, or the variation of information; Kullback information (aka the KL divergence) measures how well one distribution can be distinguished from another.
Instead of defining a separate measure of information for each setting, Fraser tweaked Shannon information slightly and made it more generic: Fraser information reduces to Fisher information, Shannon information, and Kullback information under certain conditions.
It is such a simple yet powerful idea.
Fraser DAS. On Information in Statistics. aoms. 1965;36: 890–896. doi:10.1214/aoms/1177700061
https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-36/issue-3/On-Information-in-Statistics/10.1214/aoms/1177700061.full
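The measures mentioned above are easy to compute numerically. The toy distributions and the Gaussian example below are my own, just to make the three quantities concrete:

```python
import numpy as np

p = np.array([0.6, 0.3, 0.1])  # a toy distribution
q = np.array([0.2, 0.5, 0.3])  # a second toy distribution

# Shannon information (entropy) of p: the expected surprise under p.
shannon = -np.sum(p * np.log(p))

# Kullback information (KL divergence): zero iff q is identical to p.
kullback = np.sum(p * np.log(p / q))

# Fisher information of the mean of N(mu, sigma^2) is 1/sigma^2; estimate it
# as the variance of the score function over samples (Monte Carlo).
rng = np.random.default_rng(0)
sigma = 2.0
samples = rng.normal(loc=0.0, scale=sigma, size=200_000)
score = (samples - 0.0) / sigma**2   # d/dmu of log N(x | mu, sigma^2) at mu = 0
fisher_mc = np.var(score)            # should be close to 1 / sigma**2 = 0.25
```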
EUCLID
On Information in Statistics
The three familiar definitions of statistical information, Fisher (1925), Shannon (1948), and Kullback (1959), are closely tied to asymptotic properties, hypothesis testing, and a general principle that information should be additive. A definition of information…
#DS #ML
The “AI Expert Roadmap”. This can be used as a checklist of prerequisites for data people.
https://i.am.ai/roadmap/#note
AI Roadmap
Follow these roadmaps to become an Artificial Intelligence expert.
#DS
(This is an automated post by IFTTT.)
It is always good for a data scientist to understand more about data engineering. With some basic data engineering knowledge in mind, we can navigate the blueprint of a fully productionized data project at any time. In this blog post, I listed some of the key concepts and tools that I have learned.
This is my blog post on Datumorphism https://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/checklist/
datumorphism.leima.is
Data Engineering for Data Scientists: Checklist
A checklist to get a shallow understanding of the basics and the ecosystem
#DS #EDA #Visualization
If you are keen on data visualization, the new Observable Plot is something exciting for you.
Observable Plot is based on D3, but it is easier to use in Observable notebooks. It also follows the layered grammar of graphics (e.g., marks, scales, transforms, facets).
https://observablehq.com/@observablehq/plot
Observable
Observable Plot
The JavaScript library for exploratory data visualization
#career #DS
I believe this article is relevant.
Most data scientists have very good academic records. That experience of excellence competes with another quality required in industry: the ability to survive in a less ideal yet competitive environment.
We can either be stubborn and search for an environment we fit well in, or adapt based on the business playbook. Either way is good for us, as long as we find a path that we love.
(I have a joke about this article: to reason productively, we do not need references for our claims at all.)
https://hbr.org/1991/05/teaching-smart-people-how-to-learn#
Harvard Business Review
Teaching Smart People How to Learn
Every company faces a learning dilemma: the smartest people find it the hardest to learn.
#ML
An interesting talk:
-------------------
Dear all,
We are pleased to have Anna Golubeva speak on "Are wider nets better given the same number of parameters?" on Wednesday May 19th at 12:00 ET.
You can find further details here and listen to the talk here.
We hope you can join!
Best,
Sven
www.physicsmeetsml.org
Are wider nets better given the same number of parameters?
Anna Golubeva, Perimeter Institute, 12:00 ET
#DS
https://octo.github.com/projects/flat-data
Hmmm, so they gave it a name.
I've built so many projects using this approach. I started building such data repos using CI/CD services way before GitHub Actions was born. Of course, GitHub Actions made it much easier.
One of them is the EU covid data tracking project ( https://github.com/covid19-eu-zh/covid19-eu-data ). It's been running for more than a year with very little maintenance. Some covid projects even copied our EU covid data tracking setup.
I actually built a system (https://dataherb.github.io) to pull such GitHub-Actions-based data-scraping repos together.
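The core of the pattern is a scheduled job that fetches data and commits timestamped snapshots to the repo. A minimal, stdlib-only sketch of the snapshot step (the file name, columns, and `append_snapshot` helper are my own illustration; the fetching code and the cron-triggered workflow config are omitted):

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

def append_snapshot(rows, path="snapshots.csv"):
    """Append fetched records to a flat CSV, stamping each run with UTC time.

    `rows` is whatever the scraping step produced; in a flat-data repo this
    file would then be committed by the scheduled CI job.
    """
    stamp = datetime.now(timezone.utc).isoformat()
    file = Path(path)
    new_file = not file.exists()
    with file.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["datetime", "region", "cases"])  # header on first run
        for region, cases in rows:
            writer.writerow([stamp, region, cases])
    return file

append_snapshot([("DE", 123), ("FR", 98)])
```

Because each run only appends and commits, git itself becomes the version history of the dataset, which is exactly what makes the maintenance cost so low.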
GitHub Next
GitHub Next | Flat Data
GitHub Next Project: Flat explores how to make it easy to work with data in git and GitHub, offering a simple pattern for bringing working datasets into your repositories and versioning them.
#DS
This paper serves as a good introduction to the declarative data analytics tools.
Declarative analytics performs data analysis using a declarative syntax instead of functions for specific algorithms. Using declarative syntax, one can “describe what you want the program to achieve rather than how to achieve it”.
To be declarative, the language has to be specialized for specific tasks. As a result, we can only turn the knobs of some predefined model. To me, this is a deal-breaker.
Anyway, this paper is still a good read.
Makrynioti N, Vassalos V. Declarative Data Analytics: A Survey. IEEE Trans Knowl Data Eng. 2021;33: 2392–2411. doi:10.1109/TKDE.2019.2958084
http://dx.doi.org/10.1109/TKDE.2019.2958084
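The distinction is easy to see in code. In this stdlib-only toy of mine (the records and the grouping task are illustrative, not from the paper), the imperative version spells out every aggregation step, while the declarative version only states the desired result, like SQL's `SELECT key, AVG(value) ... GROUP BY key`:

```python
from collections import defaultdict
from statistics import mean

records = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 2.0)]

# Imperative: describe *how* to aggregate, step by step.
groups = defaultdict(list)
for key, value in records:
    groups[key].append(value)
imperative = {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Declarative (in spirit): state *what* you want, a mean per group,
# and leave the evaluation strategy to the engine.
declarative = {key: mean(v for k, v in records if k == key)
               for key in {k for k, _ in records}}

assert imperative == declarative  # both give {'a': 2.0, 'b': 3.0}
```

A true declarative analytics system would take only the second form and be free to optimize the execution plan behind it.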
ieeexplore.ieee.org
Declarative Data Analytics: A Survey
The area of declarative data analytics explores the application of the declarative paradigm on data science and machine learning. It proposes declarative languages for expressing data analysis tasks and develops systems which optimize programs written in…
#fun
Germany, birthplace of the automobile, just gave the green light to robotaxis
https://fortune-com.cdn.ampproject.org/c/s/fortune.com/2021/05/28/germany-automobile-legalize-robotaxi-autonomous-vehicle/amp/
#ML
The Bayesian hierarchical model provides a process for applying Bayesian inference hierarchically to update the posteriors.
What is a Bayesian model? In a Bayesian linear regression problem, we can take the posterior from the previous data points and use it as the new prior when inferring from new data. In other words, as more data comes in, our belief is updated.
However, this becomes a problem when some clusters in the dataset have small sample sizes, aka small support. If we fit the model on those few samples alone, we may get a huge credible interval.
One simple idea to mitigate this problem is to introduce constraints on how the priors can vary. For example, we can introduce a hyperprior that is parametrized by new parameters. The model then becomes hierarchical, since we also have to model those new parameters.
The referenced post, "Bayesian Hierarchical Modeling at Scale", provides some examples of coding such models using numpyro with performance in mind.
https://florianwilhelm.info/2020/10/bayesian_hierarchical_modelling_at_scale/
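The referenced post uses numpyro; to keep a sketch dependency-free, here is the shrinkage effect in the one case that has a closed form, the conjugate normal-normal hierarchy (all numbers below are illustrative):

```python
def partial_pooling_mean(ybar, n, sigma=1.0, mu=0.0, tau=1.0):
    """Posterior mean of a group effect in the conjugate normal-normal
    hierarchy: y_ij ~ N(theta_g, sigma^2), theta_g ~ N(mu, tau^2).

    It is the precision-weighted average of the group's own mean `ybar`
    and the hyperprior mean `mu`, so small groups get pulled toward `mu`.
    """
    data_precision = n / sigma**2
    prior_precision = 1.0 / tau**2
    return (data_precision * ybar + prior_precision * mu) / (data_precision + prior_precision)

# Same observed group mean, very different sample sizes:
small = partial_pooling_mean(ybar=3.0, n=2)    # heavily shrunk toward mu = 0
large = partial_pooling_mean(ybar=3.0, n=200)  # barely shrunk
print(small, large)  # 2.0 2.985...
```

This is exactly the behavior that tames the huge credible intervals of small clusters: their estimates borrow strength from the shared hyperprior.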
Florian Wilhelm's blog
Finally! Bayesian Hierarchical Modelling at Scale
For a long time, Bayesian Hierarchical Modelling has been a very powerful tool that sadly could not be applied often due to its high computations costs. With NumPyro and the latest advances in high-performance computations in Python, Bayesian Hierarchical…
#DS
A library for interactive visualization directly from pandas.
https://github.com/santosjorge/cufflinks
#ML
Geometric Deep Learning is an attempt to unify deep learning using geometry. Instead of building deep neural networks that ignore the symmetries in the data, leaving them to be discovered by the network, we build the symmetries of the problem into the network. For example, translating a 2D cat image still, without any doubt, yields a cat image; rather than flattening the pixel matrix into some predetermined order of pixels, we can enforce this translational symmetry directly in the network.
BTW, if you come from a physics background, you have most likely heard of symmetries in physical theories, e.g., via Noether's theorem. In the history of physics, there was an era of many competing theories, yet most of them were later connected, or even unified, under the umbrella of geometry. Geometric deep learning is another "benevolent propaganda" based on a similar idea.
References:
1. Bronstein, Michael. “ICLR 2021 Keynote - ‘Geometric Deep Learning: The Erlangen Programme of ML’ - M Bronstein.” Video. YouTube, June 8, 2021. https://www.youtube.com/watch?v=w6Pw4MOzMuo.
2. Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P. Geometric deep learning: going beyond Euclidean data. arXiv [cs.CV]. 2016. Available: http://arxiv.org/abs/1611.08097
3. Bronstein MM, Bruna J, Cohen T, Veličković P. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv [cs.LG]. 2021. Available: http://arxiv.org/abs/2104.13478
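The "a translated cat is still a cat" intuition is translation equivariance, and for convolutions it can be checked numerically: shifting the input and then convolving gives the same result as convolving and then shifting. A small NumPy sketch (the circular padding and the toy Laplacian kernel are my choices, not from the references):

```python
import numpy as np

def conv2d_circular(img, kernel):
    """'Same'-size 2-D cross-correlation with circular (wrap-around) padding."""
    kh, kw = kernel.shape
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # np.roll aligns the pixel img[x + (i - kh//2), y + (j - kw//2)] with out[x, y].
            out += kernel[i, j] * np.roll(img, shift=(-(i - kh // 2), -(j - kw // 2)), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
img = rng.random((8, 8))
kernel = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])  # toy Laplacian filter

shift = (2, 3)
# Equivariance: conv(shift(img)) == shift(conv(img))
lhs = conv2d_circular(np.roll(img, shift, axis=(0, 1)), kernel)
rhs = np.roll(conv2d_circular(img, kernel), shift, axis=(0, 1))
assert np.allclose(lhs, rhs)
```

Group-equivariant networks generalize this check from translations to rotations, reflections, and other symmetry groups.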
YouTube
ICLR 2021 Keynote - "Geometric Deep Learning: The Erlangen Programme of ML" - M Bronstein
Geometric Deep Learning: The Erlangen Programme of ML - ICLR 2021 Keynote by Michael Bronstein (Imperial College London / IDSIA / Twitter)
“Symmetry, as wide or as narrow as you may define its meaning, is one idea by which man through the ages has tried…
#ML
A Turing Lecture article by the three famous deep learning pioneers.
It's an overview of the history, development, and future of AI. There are two very interesting points in the outlook section:
- "From homogeneous layers to groups of neurons that represent entities." In biological brains, there are memory engrams and motifs that almost do this.
- "Multiple time scales of adaption." This is another key idea that has been discussed numerous times. One of the craziest things about our brain is the diversity of time scales of plasticity, i.e., different mechanisms change the brain on different time scales.
Reference:
Bengio Y, Lecun Y, Hinton G. Deep learning for AI. Commun ACM. 2021;64: 58–65. doi:10.1145/3448250
https://dl.acm.org/doi/10.1145/3448250
Communications of the ACM
Deep learning for AI | Communications of the ACM
How can neural networks learn the rich internal representations required for difficult
tasks such as recognizing objects or understanding language?
#fun
GitHub Copilot · Your AI pair programmer
https://copilot.github.com/
This is crazy.
What is GitHub Copilot? GitHub Copilot is an AI pair programmer that helps you write code faster and with less work. GitHub Copilot draws context from comments and code, and suggests individual lines and whole functions instantly. GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI. The GitHub Copilot technical preview is available as a Visual Studio Code extension.
How good is GitHub Copilot? We recently benchmarked against a set of Python functions that have good test coverage in open source repos. We blanked out the function bodies and asked GitHub Copilot to fill them in. The model got this right 43% of the time on the first try, and 57% of the time when allowed 10 attempts. And it’s getting smarter all the time.
GitHub
GitHub Copilot · Your AI pair programmer
GitHub Copilot works alongside you directly in your editor, suggesting whole lines or entire functions for you.