Am Neumarkt 😱
Machine learning and other gibberish
Archives: https://datumorphism.leima.is/amneumarkt/
#DS

https://octo.github.com/projects/flat-data

Hmmm, so they gave it a name.
I've built so many projects using this approach. I started building such data repos with CI/CD services way before GitHub Actions was born; of course, GitHub Actions made it much easier.
One of them is the EU COVID-19 data tracking project ( https://github.com/covid19-eu-zh/covid19-eu-data ). It has been running for more than a year with very little maintenance, and some other COVID projects even copied our setup.
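The pattern behind these repos is simple: a scheduled CI job runs a small scraper and commits whatever it fetched back into the repository. A minimal sketch of the scraper half (the URL and output path are hypothetical):

```python
# scrape.py: run on a schedule by a CI job, which then commits the output.
import datetime
import pathlib

import requests

URL = "https://example.org/data.csv"  # hypothetical data source

def main():
    resp = requests.get(URL, timeout=30)
    resp.raise_for_status()
    # One snapshot file per day, so the repo history tracks the data over time.
    out = pathlib.Path("dataset") / f"{datetime.date.today()}.csv"
    out.parent.mkdir(exist_ok=True)
    out.write_text(resp.text)

if __name__ == "__main__":
    main()
```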

I actually built a system (https://dataherb.github.io) to pull such GitHub-Actions-based data scraping repos together.
#DS

This paper serves as a good introduction to declarative data analytics tools.

Declarative analytics performs data analysis using a declarative syntax instead of calling functions for specific algorithms. With a declarative syntax, one can “describe what you want the program to achieve rather than how to achieve it”.
To be declarative, the language has to be specific about its tasks. As a result, we can only turn the knobs of some predefined models. To me, this is a deal-breaker.
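A toy contrast of the two styles in pandas (my own example, not from the paper):

```python
import pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})

# Imperative: spell out *how* to aggregate, step by step.
totals = {}
for _, row in df.iterrows():
    totals[row["group"]] = totals.get(row["group"], 0) + row["value"]

# Declarative: describe *what* we want and let the engine decide how.
totals_declarative = df.groupby("group")["value"].sum()
```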

Anyways, this paper is still a good read.

Makrynioti N, Vassalos V. Declarative Data Analytics: A Survey. IEEE Trans Knowl Data Eng. 2021;33: 2392–2411. doi:10.1109/TKDE.2019.2958084
http://dx.doi.org/10.1109/TKDE.2019.2958084
#ML

The Bayesian hierarchical model applies Bayesian inference at multiple levels to update the posteriors.
What is a Bayesian model? In a Bayesian linear regression problem, we can take the posterior from the previous data points and use it as our new prior when inferring from new data. In other words, as more data comes in, our belief gets updated.
However, this becomes a problem if some clusters in the dataset have small sample sizes, a.k.a. small support. If we fit the model on these few samples alone, we may get huge credible intervals.
One simple idea to mitigate this problem is to introduce some constraints on how the priors can change. For example, we can introduce a hyperprior that is parametrized by new parameters. The model then becomes hierarchical, since we also have to infer these new parameters.

The referenced post, "Bayesian Hierarchical Modeling at Scale", provides some examples of coding such models using numpyro with performance in mind.

https://florianwilhelm.info/2020/10/bayesian_hierarchical_modelling_at_scale/
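A minimal sketch of such a model in numpyro (my own toy example, not taken from the post): each cluster gets its own slope, but all slopes are drawn from a shared hyperprior, so clusters with little support are pulled toward the population mean.

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def hierarchical_model(cluster_idx, n_clusters, x, y=None):
    # Hyperprior: the cluster-level slopes share a common mean and spread.
    mu_a = numpyro.sample("mu_a", dist.Normal(0.0, 1.0))
    sigma_a = numpyro.sample("sigma_a", dist.HalfNormal(1.0))

    # One slope per cluster, drawn from the shared hyperprior.
    with numpyro.plate("clusters", n_clusters):
        a = numpyro.sample("a", dist.Normal(mu_a, sigma_a))

    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    # Likelihood: each observation uses its own cluster's slope.
    numpyro.sample("y", dist.Normal(a[cluster_idx] * x, sigma), obs=y)

# Toy data: two clusters, the second one with tiny support.
cluster_idx = jnp.array([0, 0, 0, 0, 1])
x = jnp.array([1.0, 2.0, 3.0, 4.0, 1.5])
y = 2.0 * x + 0.1

mcmc = MCMC(NUTS(hierarchical_model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), cluster_idx, 2, x, y)
mcmc.print_summary()
```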
#DS

A library for interactive visualization directly from pandas.

https://github.com/santosjorge/cufflinks
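Roughly how it looks in use (a small sketch; cufflinks attaches an iplot method to DataFrames):

```python
import numpy as np
import pandas as pd
import cufflinks as cf

cf.go_offline()  # render plotly charts locally, without an account

df = pd.DataFrame(np.random.randn(100, 3), columns=["a", "b", "c"]).cumsum()
df.iplot()  # interactive line chart straight from the DataFrame
```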
#ML

Geometric Deep Learning is an attempt to unify deep learning using geometry. Instead of building deep neural networks that ignore the symmetries in the data and leave them to be discovered by the network, we build the symmetries of the problem into the network. For example, instead of flattening the matrix of a cat image into some predetermined order of pixels, note that translating the 2D image leaves the cat a cat without any doubt. This translational symmetry can be enforced in the network.
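A quick way to see what "enforced in the network" means: a convolution layer is translation-equivariant by construction, unlike a dense layer on flattened pixels. A toy check in PyTorch (circular padding makes the symmetry exact):

```python
import torch
import torch.nn as nn

# Shifting the input and then convolving gives the same result as
# convolving and then shifting: the symmetry is built into the layer.
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)

def shift(t):
    return torch.roll(t, shifts=2, dims=-1)

x = torch.randn(1, 1, 8, 8)  # a toy "image"
print(torch.allclose(conv(shift(x)), shift(conv(x)), atol=1e-6))  # True
```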

BTW, if you come from a physics background, you have most likely heard about symmetries in physical theories, e.g., through Noether's theorem. In the history of physics, there was an era with many competing theories, yet most of them were connected or even unified under the umbrella of geometry. Geometric deep learning is another piece of "benevolent propaganda" based on a similar idea.


References:

1. Bronstein M. ICLR 2021 Keynote: “Geometric Deep Learning: The Erlangen Programme of ML”. YouTube; 2021. Available: https://www.youtube.com/watch?v=w6Pw4MOzMuo
2. Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P. Geometric deep learning: going beyond Euclidean data. arXiv [cs.CV]. 2016. Available: http://arxiv.org/abs/1611.08097
3. Bronstein MM, Bruna J, Cohen T, Veličković P. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv [cs.LG]. 2021. Available: http://arxiv.org/abs/2104.13478
#ML

A Turing lecture article by the three famous DL guys.
It's an overview of the history, development, and future of AI. There are two very interesting points in the outlook section:
- "From homogeneous layers to groups of neurons that represent entities." In biological brains, there are memory engrams and motifs that almost do this.
- "Multiple time scales of adaption." This is another key idea that has been discussed numerous times. One of the craziest things about our brain is the diversity of time scales of plasticity, i.e., different mechanisms change the brain on different time scales.

Reference:
Bengio Y, LeCun Y, Hinton G. Deep learning for AI. Commun ACM. 2021;64: 58–65. doi:10.1145/3448250
https://dl.acm.org/doi/10.1145/3448250
#fun

GitHub Copilot · Your AI pair programmer
https://copilot.github.com/

This is crazy.

What is GitHub Copilot? GitHub Copilot is an AI pair programmer that helps you write code faster and with less work. GitHub Copilot draws context from comments and code, and suggests individual lines and whole functions instantly. GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI. The GitHub Copilot technical preview is available as a Visual Studio Code extension.

How good is GitHub Copilot? We recently benchmarked against a set of Python functions that have good test coverage in open source repos. We blanked out the function bodies and asked GitHub Copilot to fill them in. The model got this right 43% of the time on the first try, and 57% of the time when allowed 10 attempts. And it’s getting smarter all the time.
#Academia

The distill team's thought on interactive publishing and self-publishing in academia.

https://distill.pub/2021/distill-hiatus/
#TIL

In PyTorch, converting a torch tensor to a numpy array is very fast on CPU, even though torch tensors and numpy arrays are very different objects. This works thanks to the Python buffer protocol, which makes it possible to use binary data directly from C without copying the object.

https://docs.python.org/3/c-api/buffer.html
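A quick check that the conversion shares memory instead of copying:

```python
import torch

t = torch.ones(3)
a = t.numpy()   # on CPU this is not a copy: the array shares the tensor's memory
a[0] = 42.0
print(t)        # tensor([42., 1., 1.]): the tensor sees the change
```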

Reference:
Stevens E, Antiga L, Viehmann T. Deep Learning with PyTorch: Build, Train, and Tune Neural Networks Using Python Tools. Simon and Schuster; 2020.
#ML

Implicit Regularization in Tensor Factorization: Can Tensor Rank Shed Light on Generalization in Deep Learning? – Off the convex path
http://www.offconvex.org/2021/07/08/imp-reg-tf/
#Coding

I found a nice place to practice algorithmic thinking. It is not as comprehensive as HackerRank or LeetCode, but the problems are quite fun.

https://codingcompetitions.withgoogle.com/
#ML

Julia Computing raised a lot of investment recently. I need to dive deeper into the Julia language.

https://juliacomputing.com/blog/2021/07/series-a/
#DS

This is an interesting report by Anaconda. From it, we can more or less confirm that Python is still the king of languages for data science, with SQL following right behind.

Quote from the report:
> Between March 2020 to February 2021, the pandemic economic period, we saw 4.6 billion package downloads, a 48% increase from the previous year.
We have no data for other languages, so no comparisons can be made, but it is interesting to see Python growing so fast.

The roadblocks different data professionals face are quite different. Cloud engineers and MLOps engineers do not mention the skills gap in their organization that often, but data scientists and analysts mention skills gaps (e.g., data engineering, Docker, k8s) a lot. This might be related to organizations that do not even have cloud engineers, ops, or MLOps roles.


See the next message for the PDF file.

https://www.anaconda.com/state-of-data-science-2021
Anaconda-2021-SODS-Report-Final.pdf
1.5 MB
I have downloaded the file so you don't need to.

https://github.com/soumith/ganhacks

Training GANs can be baffling.
For example, the generator and the discriminator sometimes just don't "learn" at the same pace. Would you try to balance the generator loss and the discriminator loss by hand?
Soumith Chintala ( @ FAIR ) put together this list of tips for training GANs. "Don't balance loss via statistics" is one of Chintala's 17 tips. The list is quite inspiring.
#science

Nielsen M. Reinventing Discovery: The New Era of Networked Science. Princeton, NJ: Princeton University Press; 2011.

I found this book this morning and skimmed through it. It looks concise yet unique.
The author discusses how the internet is changing the way human beings think as one collective intelligence. I like the chapters about how the data web is enabling more scientific discoveries.