Am Neumarkt 😱
286 subscribers
88 photos
3 videos
17 files
513 links
Machine learning and other gibberish
Archives: https://datumorphism.leima.is/amneumarkt/
#visualization

Beautiful, elegant, and informative. It reminds me of the chromatic storytelling visualizations made from Netflix movies.

Full image:
https://zenodo.org/record/5828349

Other discussions:
https://www.reddit.com/r/dataisbeautiful/comments/s6vh8k/dutch_astronomer_cees_bassa_took_a_photo_of_the/
#visualization

Seaborn is getting a new interface.

It would be great if the author defined the dunder method __add__() instead of using an .add() method. With __add__, we could simply use + to compose layers.

Either way, we can all move away from plotnine once the migration is done.
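A minimal sketch of what that could look like (the Layer and Plot classes below are hypothetical illustrations, not seaborn's actual API):

```python
class Layer:
    """A hypothetical plot layer, for illustration only."""
    def __init__(self, name):
        self.name = name

class Plot:
    """Sketch of a plot object supporting both .add() and +."""
    def __init__(self, layers=None):
        self.layers = list(layers or [])

    def add(self, layer):
        # Method-call style, like the .add() in seaborn's new interface.
        return Plot(self.layers + [layer])

    def __add__(self, layer):
        # Operator style, as in ggplot2/plotnine: plot + layer.
        return self.add(layer)

p = Plot() + Layer("dots") + Layer("line")
print([layer.name for layer in p.layers])  # -> ['dots', 'line']
```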

https://seaborn.pydata.org/nextgen/
Forwarded from DPS Main
There are indeed many of these. For example, I replaced grep with ack, and it is noticeably faster.

https://www.ruanyifeng.com/blog/2022/01/cli-alternative-tools.html
graph-basics.pdf
3.3 MB
#ML

I made some slides to bootstrap a community in my company to share papers on graph related methods (spectral, graph neural networks, etc).
These slides are mostly based on the first two chapters of the book by William Hamilton. I added some intuitive interpretations of a few key ideas. Some of these ideas are frequently used in graph neural networks and even transformers. Building intuition helps us unbox these neural networks. But the slides are only skeleton notes, so I will probably have to expand them at some point.

I am thinking about drawing more from the book and writing more on this topic, maybe even making some short videos with these slides. Let's see how far I can go. I am way too busy now. (<- no excuse)
#tool

I have been using Hugo for my public notes. I built a theme called connectome a while ago. This theme has been serving as my note-taking theme.

While building my data science notes website, I noticed many problems with the connectome theme. Today, I fixed most of them. The connectome theme deserves some visibility now.

If you are using Hugo and would like to build a website of connected notes, like the one I have at https://datumorphism.leima.is/ , the Hugo connectome theme can help a bit.

The Connectome Theme: https://github.com/kausalflow/connectome
A template one could use to bootstrap a new website: https://github.com/kausalflow/hugo-connectome-theme-demo
Tutorials: https://hugo-connectome.kausalflow.com/projects/tutorials/
Real-world example: https://datumorphism.leima.is/


If you would like to know more about how it was done, the idea is quite simple. But before we move on, one FAQ I get is: why Hugo? The answer is simple: speed.

The key components of the connectome theme are:

- automated backlinks, and
- a graph visualization of the whole notebook.

Behind the scenes, the heart of the theme is a metadata file that describes the connections between the notes.

For each note, we use this metadata to find all the notes that link to the current note, and build the backlinks from it.
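As an illustrative sketch (the metadata schema below is hypothetical, not the theme's actual format), building backlinks amounts to inverting a map of outgoing links:

```python
# Hypothetical metadata: each note mapped to the notes it links to.
from collections import defaultdict

links = {
    "note-a": ["note-b", "note-c"],
    "note-b": ["note-c"],
    "note-c": [],
}

# Invert the outgoing-link map to get backlinks per note.
backlinks = defaultdict(list)
for source, targets in links.items():
    for target in targets:
        backlinks[target].append(source)

print(dict(backlinks))  # note-c is linked from both note-a and note-b
```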
#python

I find poetry a great tool to manage Python requirements.

I used to manage Python requirements with requirements.txt (or environment.yaml) and install them using pip (or conda). The thing is, in this stack, we have to pin the version ranges manually. It is quite tedious, and we easily run into version conflicts in a large project.

Poetry is the savior here. When developing a package, we add some initial dependencies to pyproject.toml, a PEP-standardized file. Whenever a new package is needed, we run poetry add package-name. Poetry tries to figure out compatible versions, and a lock file with pinned versions is created or updated. To recreate an identical Python environment, we only need to run poetry install.
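For illustration, a minimal pyproject.toml might look like this (the package names and versions are only examples):

```toml
[tool.poetry]
name = "my-project"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.9"
pandas = "^1.4"   # added via: poetry add pandas

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```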

There is one drawback, and it may be quite painful at some point: regenerating the lock file becomes extremely slow as the dependency tree grows in complexity. But this is not a problem of Poetry itself; it is rather a constraint coming from PyPI. One way to mitigate it is to use a cache.

https://python-poetry.org/
#ml

I share similar thoughts with the top comment by theXYZT.

If I may add to her comment, I would say:
Embrace the new approach even if it shatters our philosophy.
But it's not only about what happened in the history of physics. It's about what we believe in science.
In some sense, the purpose of interpretability and parsimony is for humans to come up with better ideas, and to make us happy. If a universal model already works well enough and can be improved gradually, interpretability is not as important as predictability.
This is more or less the first principle of science, if I may say so.

https://www.reddit.com/r/MachineLearning/comments/t8fn7m/d_are_we_at_the_end_of_an_era_where_ml_could_be/
#visualization

Please click on the link and watch the animation. It's 3D.

------

"The clever people at @NASA have created this deceptively simple yet highly effective data visualisation showing monthly global temperatures between 1880-2021".: nextfuckinglevel
https://www.reddit.com/r/nextfuckinglevel/comments/tejc0l/the_clever_people_at_nasa_have_created_this/?utm_source=share&utm_medium=ios_app&utm_name=iossmf
#ml

It’s a lengthy article, but also a well-written one.

A few comments:

- The author wrote a paper on “The Next Decade in AI”: https://arxiv.org/abs/2002.06177
- Make things work in their own domain. If we try to come up with a “theory of everything” for computing or intelligence, we will hit a “mesoscopic” wall, where the bottom-up theories and the top-down approaches meet but cannot quite connect. In the case of intelligence, the wall is determined by complexity (maybe MDL?). Symbols can work at high complexities, but not always. A similar thing happens with neural networks.
- The neural-symbolic approach sounds good, but it is almost like patching bike wheels onto a train.


https://nautil.us/deep-learning-is-hitting-a-wall-14467/
#ml

(WARNING: Promoting of my notes. This is a test.)

I learned something very interesting today: CRPS, the continuous ranked probability score.

Suppose we would like to approximate the quantile function of some data points.
If we assume a parametric model of the quantile function, e.g., Q(x|theta), how do we find the parameters using the given dataset?
Naturally, we need a loss function to compare our quantile function to the data points. CRPS is a robust choice; I have seen it used in several papers on time series forecasting.
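As a hedged sketch of the idea (not the exact method from the note), CRPS can be approximated for a parametric quantile function via the pinball-loss identity CRPS = 2 ∫₀¹ ρ_τ dτ, discretized on a grid of quantile levels:

```python
# Illustrative sketch: approximate CRPS for a quantile function Q(tau; theta)
# against a single observation y, via the pinball-loss identity.
import numpy as np

def pinball_loss(y, q, tau):
    # Quantile (pinball) loss of the predicted quantile q at level tau.
    return np.where(y >= q, tau * (y - q), (1.0 - tau) * (q - y))

def crps_from_quantiles(y, quantile_fn, n_grid=99):
    # Average the pinball loss over a grid of quantile levels,
    # approximating 2 * integral_0^1 pinball_loss(tau) d tau.
    taus = np.linspace(0.01, 0.99, n_grid)
    return 2.0 * pinball_loss(y, quantile_fn(taus), taus).mean()

# Example: Q(tau) = tau is the quantile function of Uniform(0, 1);
# for y = 0.5 the exact CRPS is 1/12.
print(crps_from_quantiles(0.5, lambda tau: tau))
```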

You can find more details here:
https://datumorphism.leima.is/cards/time-series/crps/
#tool

I drafted a new release of the Hugo Connectome theme.

I like the command palette in VSCode. It is fast and accurate. So I added a command palette to the Hugo Connectome theme to help us navigate the notes and links.

Now we can use the command palette to navigate to backlinks, out links, references, and more.

See it in action:
https://datumorphism.leima.is/wiki/time-series/state-space-models/
Use Command+K or Windows+K to activate the command palette.

- Type in search to search for notes.
- Type in Note ID to copy the current note id to the clipboard.
- Type in graph to see the graph view of all the notes.
- Type in references to go to references.
- Type in backlinks to select from backlinks to navigate to.
- Type in links to select from all outgoing links to navigate to.

Release:
https://github.com/kausalflow/connectome/releases/tag/0.1.1
#ml

A beautiful and systematic derivation showing how and why negative sampling works.

Negative sampling is a great technique to estimate the softmax, especially when the calculation of the partition function is intractable. It is used in word2vec and in many other models, such as node2vec.
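The skip-gram negative-sampling objective derived in the paper, maximizing log σ(v_c · v_w) + Σ_k log σ(−v_nk · v_w) over k sampled negative contexts, can be sketched in a few lines (the vectors below are random placeholders, not trained embeddings):

```python
# Sketch of the skip-gram negative-sampling (SGNS) loss for one
# (word, context) pair with k negative samples.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(word_vec, context_vec, negative_vecs):
    # Negative log-likelihood: the true context should score high,
    # the sampled negatives should score low.
    positive = np.log(sigmoid(word_vec @ context_vec))
    negatives = np.sum(np.log(sigmoid(-negative_vecs @ word_vec)))
    return -(positive + negatives)

rng = np.random.default_rng(0)
w = rng.normal(size=8)        # word embedding
c = rng.normal(size=8)        # context embedding
negs = rng.normal(size=(5, 8))  # 5 negative-sample embeddings
print(sgns_loss(w, c, negs))  # a positive scalar to be minimized
```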



Goldberg Y, Levy O. word2vec Explained: deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv [cs.CL]. 2014. Available: http://arxiv.org/abs/1402.3722