L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵ – Telegram

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

@iamluminousmen

504 subscribers

156 photos

32 videos

2 files

700 links

(ﾉ◕ヮ◕)ﾉ*:･ﾟ✧ ✧ﾟ･: *ヽ(◕ヮ◕ヽ)

helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering and Machine Learning

http://luminousmen.com

License: CC BY-NC-ND 4.0

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

There is a common misconception that the GIL was invented to protect developers from problems with concurrent access to data. This is not true.

The GIL does, of course, prevent you from running Python code in parallel across threads (but not across processes). Simply put, the GIL is a lock that must be held before any access to the Python interpreter, whether that is executing Python code or making calls through the Python C API. So the GIL protects the interpreter's internal structures from inconsistent states, but your own shared data still needs synchronization primitives, as in any other language.
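
A minimal sketch of that last point: a shared counter updated from several threads, guarded by an explicit lock. The names (count_with_lock, worker) are made up for illustration. `counter += 1` is a read-modify-write sequence, so the GIL alone is not a guarantee for it.

```python
import threading


def count_with_lock(num_threads: int, increments: int) -> int:
    """Increment a shared counter from several threads, guarded by a Lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # The lock makes the read-modify-write below atomic
            # with respect to the other worker threads.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


total = count_with_lock(num_threads=4, increments=10_000)
```

With the lock in place the result is always num_threads * increments. Without it, the result depends on how the interpreter happens to switch threads.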

#python

220 views16:19

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

A popular way to declare an abstract method in Python is to raise the NotImplementedError exception:

def func(self):
    raise NotImplementedError

Though it's pretty popular and even has IDE support (PyCharm considers such a method abstract), this approach has a downside: you get the error only when the method is called, not when the class is instantiated.

Use the abc module to avoid this problem:

from abc import ABCMeta, abstractmethod
class Service(metaclass=ABCMeta):
    @abstractmethod
    def func(self):
        pass
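
A small sketch of what that buys you. The subclasses Incomplete and Complete are hypothetical, added only to show when the error appears.

```python
from abc import ABC, abstractmethod


class Service(ABC):
    @abstractmethod
    def func(self):
        ...


class Incomplete(Service):
    """Forgets to implement func()."""


class Complete(Service):
    def func(self):
        return "done"


# The failure happens at instantiation time, before func() is ever called.
try:
    Incomplete()
except TypeError as exc:
    error = str(exc)
else:
    error = None

result = Complete().func()
```

Here ABC is just a shortcut for metaclass=ABCMeta.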

242 views13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

In CPython, an object is destroyed as soon as its reference count drops to zero, without waiting for the garbage collector. The GC is only needed for the harder case of cyclic references.
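
A rough way to observe both cases, assuming CPython (other implementations may free objects later). The Node class is a placeholder, and weak references are used only to check whether an object is still alive.

```python
import gc
import weakref


class Node:
    pass


gc.disable()  # keep the cyclic collector from running on its own

# Case 1: no cycle. The object is gone as soon as the last reference is dropped.
obj = Node()
obj_ref = weakref.ref(obj)
del obj
freed_immediately = obj_ref() is None

# Case 2: a reference cycle. Reference counts never reach zero by themselves.
a = Node()
b = Node()
a.other = b
b.other = a
cycle_ref = weakref.ref(a)
del a, b
alive_before_gc = cycle_ref() is not None

gc.collect()  # the cyclic garbage collector finds and frees the cycle
freed_after_gc = cycle_ref() is None

gc.enable()
```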

#python

234 views16:19

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

A list may contain itself. Python detects this and does not loop forever when printing it.

>>> a = []
>>> a.append(a)
>>> a
[[...]]

234 views13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Python doesn't have tail-call optimization, not because Guido couldn't handle it, but because he doesn't want to overcomplicate things.

If you really want to, you can implement a Y-combinator with optimization and use it.
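
One common workaround, sketched here, is a trampoline rather than a Y-combinator: the recursive function returns a zero-argument callable instead of calling itself, and a loop keeps invoking those callables. The names trampoline and sum_to are made up for this example.

```python
def trampoline(fn, *args):
    """Run fn, then keep calling whatever it returns until it is not callable."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result


def sum_to(n, acc=0):
    """Tail-recursive sum of 1..n, written in trampoline style."""
    if n == 0:
        return acc
    # Return a thunk instead of recursing, so the stack never grows.
    return lambda: sum_to(n - 1, acc + n)


# Far deeper than the default recursion limit would allow.
total = trampoline(sum_to, 100_000)
```

The catch is that the function's final result must not itself be callable, or the loop would try to call it.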

#python

223 views13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Just like tail-call optimization, lambda functions are something Guido wanted to keep out of the language. A lambda is just syntactic sugar: it creates an unnamed function, and that's literally it. There is no magic to it.

These two are nearly equivalent:

foo = lambda x: x * 2

def foo(x):
    return x * 2
foo.__name__ = foo.__qualname__ = '<lambda>'
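
A quick check of that claim, with illustrative names (double_lambda, double_def): both forms produce the same kind of object and the same results, and differ only in the recorded name.

```python
double_lambda = lambda x: x * 2


def double_def(x):
    return x * 2


# Both are plain function objects.
same_type = type(double_lambda) is type(double_def)

# Only the recorded name differs.
names = (double_lambda.__name__, double_def.__name__)

same_result = double_lambda(21) == double_def(21)
```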

Lambda functions did end up in the language, but there was a long fight behind it, which Guido did not win. Lambda still exists because it is extremely convenient, and for no other reason.

#python

254 views16:19

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

A single function call can execute two return statements. For example:

def foo():
    try:
        return 1
    finally:
        return 2

2 will be returned here: the return in the finally block runs last and overrides the one in try.
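
The same mechanism has a nastier side effect, sketched below with a hypothetical swallow() function: a return inside finally also discards an exception that is in flight. Many linters flag this pattern for that reason.

```python
def foo():
    try:
        return 1
    finally:
        return 2


def swallow():
    try:
        raise ValueError("this error is lost")
    finally:
        # Returning here silently drops the ValueError raised above.
        return "from finally"


foo_result = foo()
swallow_result = swallow()
```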

#python

253 views13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

There are compilers for Python code, and not only JIT ones like Numba but also ordinary ahead-of-time ones. For example, Cython and Nuitka translate your Python code to C, which is then compiled into real machine instructions rather than being interpreted.

Why?

Mostly for the sake of performance gains and a more portable runtime. So basically it's for the guys who shouldn't use Python in the first place.

#python

251 viewsedited 13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Remember I told you that numbers from -5 to 256 are interned? That is, these numbers are preallocated and cached in memory. Well, there is more to it.

You can access this memory if you want and change the value. Say, change the literal 4 to the value 5. I'll probably be damned by many people for this post, but here is some sample code:

>>> import ctypes
>>> # Assumes 64-bit CPython, where an int's digits start 24 bytes into the object
>>> ctypes.memmove(id(4) + 24, id(5) + 24, 8)
>>> print(2 * 2) # 5

Have fun debugging!

P.S. Try to replace 0 with 1

#python

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Perhaps not every Python developer knows an interesting property of CPython that drives newcomers crazy:

>>> a = 255
>>> b = 255
>>> a == b
True
>>> a is b
True

The double equal operator checks that the objects' values are equal and the is operator…
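
A small sketch of the behaviour quoted above, assuming CPython. The values are built with int() at runtime so that compile-time constant handling does not hide the effect, and 100 and 1000000 are arbitrary picks well inside and well outside the cached range.

```python
# Inside the small-int cache: both names point at the same cached object.
small_a = int("100")
small_b = int("100")
small_same_object = small_a is small_b

# Outside the cache: equal values, but two separate objects.
big_a = int("1000000")
big_b = int("1000000")
big_equal = big_a == big_b
big_same_object = big_a is big_b
```

This is why is should only be used for identity checks such as x is None, never for comparing numbers.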

302 viewsedited 13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Let's talk about a couple of core concepts in the data space: how they relate to each other and how they came about.

⚡️ACID vs BASE: Comparison of two Design Philosophies

⚡️CAP and PACELC theorems in plain English

ACID vs BASE: Comparison of two Design Philosophies

Discover the differences between ACID and BASE design philosophies - from strong consistency to eventual consistency. Find out which suits your project better!

650 viewsedited 13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Google has open-sourced its AutoML framework, Model Search 🤯

https://github.com/google/model_search

#ml

425 views13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Recent Advances in Language Model Fine-tuning

By Sebastian Ruder:

https://ruder.io/recent-advances-lm-fine-tuning/

This post provides an overview of recent methods to fine-tune large pre-trained language models.

285 views13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Apache Software Foundation (ASF) has announced the retirement to its "Attic" of at least 19 open source projects, 13 of which are big data-related and ten of which are part of the Hadoop ecosystem.

https://www.zdnet.com/article/apache-software-foundation-retires-slew-of-hadoop-related-projects/

I've really never used any of these projects, but could this be a signal that the Hadoop era is coming to an end?

It's further proof that one shouldn't build one's career around a particular tool, but rely on concepts and an understanding of the industry. Technology is a very volatile thing, especially in fast-paced industries like Big Data and Data Science, and especially while industry consolidation is still in progress.

#big_data

Apache Software Foundation retires slew of Hadoop-related projects

Retirements of 13 big data-related Apache projects -- including Sentry, Tajo and Falcon -- have been announced in 11 days. It looks like the idealistic days of Hadoop and big data are officially over.

302 views13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Kafka Visualization

https://softwaremill.com/kafka-visualisation/

#big_data

SoftwareMill Kafka Visualization

Using the Kafka Visualization tool you can simulate how data flows through a replicated Kafka topic, to gain a better understanding of the message processing model.

239 views13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

To build this feature store, DoorDash benchmarked several key-value stores—Redis, Cassandra, CockroachDB, ScyllaDB, and YugabyteDB—before sticking with Redis. More details on their benchmarking process and optimizing Redis in their excellent write-up.

#ml

240 views16:19

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Learn concepts not frameworks

In the world of engineering and data analytics, enthusiastic people often have strong opinions about using a particular platform. We've all come across such a person — someone who zealously promotes Apache Spark or pushes for all data work to be done using HQL. There's a strong emphasis on the tools and not so much on the problem-solution pair.

Sometimes this behavior is driven by a desire for standardization, aligning team skills, and simplification of the hiring process. Those are certainly valid considerations. But more often than not, I've seen people zealously defend a tool simply because they are passionate about it or, even worse, have built their professional careers around it.

I wish concepts were emphasized a bit more. In the real world, knowing Java doesn't make you a programmer.

Knowing C++ syntax doesn't make you a programmer.

Knowing concepts is what makes you a great programmer. Problem solving skills make you a great programmer.

Simply put, you should never build your professional career around a tool. Tools and applications come and go, and what's hot today may go bad tomorrow. That alone should give you a reason to pause before advocating for technology too quickly. Ask anyone who has worked with JavaScript.

The Silver Bullet Syndrome - Hadi Hariri

#whining

We love our silver bullets don’t we? Constantly chasing the dream that the next big thing will solve all our past problems. It doesn’t matter if it’s a language, framework, platform or library, we’re out there chasing it. Why? Well because it’s going to solve…

361 views16:19

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Caching... There is so much in that word - the pain of invalidation and the joy of reusing computation. In Spark, this is known as an optimization technique...

https://luminousmen.com/post/explaining-the-mechanics-of-spark-caching

Explaining the mechanics of Spark caching

580 views13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

A good conversation is like a miniskirt, short enough to retain interest but long enough to cover the subject — Celeste Headlee

When your job hinges on how well you talk to people, you learn a lot about how to have conversations -- and that most of us don't converse very well. I came across this TED talk on how to have a better conversation.

Some of this may seem too obvious, but you have to admit there are some things here that you knew about, but you forgot to apply.

1. Don't do several things at once. Be in the moment, put your smartphone away.
2. You don't have to be important or clever. If you are not very interested in the opinion of others — blog as I do 😇
3. Ask open-ended questions (that don't involve a "yes" or "no" answer).
4. Go with the flow. If a thought comes up during a conversation, let it go and listen to the person you're talking to.
5. If you don't know something, admit it. This will save you from embarrassment and preserve your authority.
6. There is no need to draw parallels with your own experience. You don't need to tell others what an asshole your boss is in response to your companion's complaints about their boss.
7. Don't repeat what you've said.
8. Avoid unnecessary details. Names, dates, and other details will quickly slip the listener's mind.
9. Listen. You can't learn anything with your mouth open.
10. Be brief. Brevity is the sister of you-know-what.

#soft_skills

Celeste Headlee: 10 ways to have a better conversation | TED

When your job hinges on how well you talk to people, you learn a lot about how to have conversations -- and that most of us don't converse very well. Celeste Headlee has worked as a radio host for decades, and she knows the ingredients of a great conversation:…

320 views16:19

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Great PySpark style guide from Palantir team:

https://github.com/palantir/pyspark-style-guide

#spark

This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered. ...

274 viewsedited 13:16

L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵

Engineers often come into interviews believing that HDFS and S3 are fundamentally the same system. S3 clients may make it look that way, but they are not.

https://luminousmen.com/post/hdfs-vs-cloud-based-object-storage-s3

HDFS vs Cloud-based Object storage(S3)

I am very annoyed that all sorts of big data engineers confuse S3 and HDFS systems, assuming that S3 is the same as HDFS. That’s not true.

313 views16:19