Am Neumarkt 😱
Machine learning and other gibberish
Archives: https://datumorphism.leima.is/amneumarkt/
#smarthome #misc

I somehow have five different brands of smart home products in our little apartment.
I have no idea what is going on in the smart home industry. Every brand has its own app, hub, or even protocol, so I had to install five different apps just to initialize the devices. In principle, I could ditch these apps after setup and use Google/Alexa only, but that is still extremely inconvenient, as Google/Alexa doesn't support all the fancy functions of the devices.

Any solutions to this problem?
#fun

Every chemistry graduate will be in charge of a molecule. Someone has got to take care of “Titin” (189,819 characters), and she/he will have to recite the name first in every meeting: https://en.wiktionary.org/wiki/Appendix:Protologisms/Long_words/Titin#Noun

https://xkcd.com/2602/
#jobs

I vaguely feel there's a talent shortage in Germany. "Hiring is hard": I have heard this several times. Our team also needs more hires.

So the company came up with this: land a job at Zalando within 3 days of the final interviews!

https://jobs.zalando.com/en/jobs/4004181/?gh_src=%20f46af3281us
Anonymous poll: Which language(s) do you prefer for reading?
- English: 73%
- Chinese (中文): 46%
#ml #statistics

I read about conformal prediction a while ago and realized that I need to understand the theory behind hypothesis testing better. As someone from the natural sciences, I have mostly worked within the Neyman-Pearson framework.
So I explored the topic a bit and found two nice papers; see the list below. If you know of other papers on similar topics, I would appreciate some comments.

1. Perezgonzalez JD. Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing. Front Psychol. 2015;6: 223. doi:10.3389/fpsyg.2015.00223 https://www.frontiersin.org/articles/10.3389/fpsyg.2015.00223/full
2. Lehmann EL. The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two? J Am Stat Assoc. 1993;88: 1242–1249. doi:10.2307/2291263
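
For context, here is a minimal sketch of split conformal prediction as I understand it (the function and variable names are my own illustration, not taken from the papers above):

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_hat_test, alpha=0.1):
    """Prediction intervals via split conformal prediction."""
    # cal_residuals: |y - y_hat| on a held-out calibration set
    # alpha: miscoverage level (0.1 -> roughly 90% coverage)
    n = len(cal_residuals)
    # Finite-sample-corrected quantile of the calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(cal_residuals, level, method="higher")
    return y_hat_test - q, y_hat_test + q

# Toy usage: a "model" that always predicts zero for N(0, 1) data.
rng = np.random.default_rng(0)
y_cal = rng.normal(size=1000)
lo, hi = split_conformal_interval(np.abs(y_cal), np.zeros(5))
# Each (lo, hi) interval covers a fresh draw about 90% of the time.
```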
#work

I realized something interesting about time management.

If I open my calendar now, I see “tiles” of meetings filling up most of my working hours. It looks bad, but it used to be even worse. The problem is that if meetings occupy my working hours, I have to work extra hours to do the actual thinking and analysis. It is rather cruel.

So what changed? I think I realized the power of Google Docs. Instead of many people talking and nobody listening, someone writes up a draft first and sends it out to colleagues. Then, once people get the link to the doc, everyone can add comments.

This doesn’t sound very different from meetings? Oh, it is very different. The workflow can be async. We are not forced to spend our precious focus time in meetings. We can read and comment on the document whenever we like: when we are commuting, when we are taking a dump, when we are on a phone/tablet, just, any, time.

Apart from the async workflow, I also like the "think, comment, and forget" idea. People deliver better ideas when we think first, comment next, and forget about it unless there are replies to our comments. No pressure, no useless debates.
#ml

I have heard about the information bottleneck so many times, but I never really went back and read the original papers.

I finally spent some time on it and found it quite interesting. It is philosophically rooted in ideas from Vapnik's The Nature of Statistical Learning Theory, where he discussed how generalization works by enforcing parsimony.
The most interesting thing in this information bottleneck paper is the quantified generalization gap and complexity gap. With these, we know where to go on the information plane.
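
For reference, the standard information bottleneck objective the paper builds on (my transcription): minimize, over the encoder $p(t \mid x)$,

$$\mathcal{L} = I(X;T) - \beta\, I(T;Y),$$

where $T$ is the compressed representation of $X$. The first term enforces compression (parsimony), and $\beta$ sets how much predictive information about $Y$ must be retained.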

It's a good read.

Tishby N, Zaslavsky N. Deep Learning and the Information Bottleneck Principle. arXiv [cs.LG]. 2015. Available: http://arxiv.org/abs/1503.02406
#ml

I came across this post this morning and realized that the reason I am not writing much in Julia is simply that I don't know how to write quality code in Julia.

When we build a model in Python, we know all the details that make it quality code. For a new language, I'm just terrified by the number of details I need to be aware of.

Ah, I'm getting older.

JAX vs Julia (vs PyTorch) · Patrick Kidger
https://kidger.site/thoughts/jax-vs-julia/
#ml

https://ts.gluon.ai/

Highly recommended! If you are working on deep learning for forecasting, gluonts is a great package.
It takes care of all the tedious data preprocessing, slicing, and backtesting work, so we can spend our time implementing the models themselves (and there are a lot of ready-to-use models).
What's even better, we can use PyTorch Lightning!
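
A minimal sketch of what a training run looks like (from memory; exact module paths and arguments vary across gluonts versions, so treat the names here as assumptions):

```python
from gluonts.dataset.repository.datasets import get_dataset
from gluonts.evaluation import make_evaluation_predictions
from gluonts.torch.model.deepar import DeepAREstimator

dataset = get_dataset("m4_hourly")  # a built-in benchmark dataset

estimator = DeepAREstimator(
    freq=dataset.metadata.freq,
    prediction_length=dataset.metadata.prediction_length,
    trainer_kwargs={"max_epochs": 5},  # forwarded to PyTorch Lightning's Trainer
)
predictor = estimator.train(dataset.train)

# Rolling predictions over the test set, ready for evaluation.
forecast_it, ts_it = make_evaluation_predictions(
    dataset.test, predictor=predictor, num_samples=100
)
```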

See this repository for a list of transformer-based forecasting models.
https://github.com/kashif/pytorch-transformer-ts
#python

This post is a retro on how I learned Python.

Disclaimer: I cannot claim to be a master of Python. This post is a retrospective of how I learned Python in different stages.

I started using Python back in 2012. Before this, I was mostly a Matlab/C user.

Python is easy to get started with, yet hard to master. People coming from other languages can easily make things work but will write some "disgusting" Python code. This is why Python people talk about being "pythonic" all the time: rather than an actual style guide, it is a philosophy of style.

When we get started, we are most likely not interested in [PEP8](https://peps.python.org/pep-0008/) and [PEP257](https://peps.python.org/pep-0257/); instead, we focus on making things work. After some lectures at university (or from whatever other sources), we start to get some sense of style. Then we write code and use Python in real projects, and we begin to realize that Python is strange and sometimes doesn't even make sense. So we start learning about the philosophy behind it. At some point, we get peer reviews and probably fight with each other over the philosophies we have accumulated throughout the years.
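
A toy example of the style shift I mean (my own illustration, not from the PEPs):

```python
values = [1, 2, 3, 4, 5, 6]

# Matlab/C habit: manual index bookkeeping.
squares = []
for i in range(len(values)):
    if values[i] % 2 == 0:
        squares.append(values[i] ** 2)

# "Pythonic": iterate directly and use a comprehension.
squares = [v ** 2 for v in values if v % 2 == 0]  # [4, 16, 36]
```

Both versions produce the same list; the second reads as a description of the result rather than a recipe of steps, which is roughly what "pythonic" is about.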

The attached drawing (in the comments) somehow captures the path I went through. It is not a monotonic path of any sort; it is most likely permutation invariant and cyclic. But the bottom line is that mastering Python requires a lot of struggle, fights, and relearning. And one of the most effective methods is peer review, just as with any other learning task in our lives.

Peer review makes us think, and it is very important to find good reviewers. Don't just stay in a silo and admire your own code. To me, this whole journey helped me build one of the most important philosophies of my life: embrace open source and collaborate.