L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵
(ノ◕ヮ◕)ノ*:・゚✧ ✧゚・: *ヽ(◕ヮ◕ヽ)

helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering and Machine Learning

http://luminousmen.com

License: CC BY-NC-ND 4.0
Fuck it

FuckIt.py uses state-of-the-art technology to make sure your Python code runs whether it has any right to or not. Does some code have an error? Fuck it.

https://github.com/ajalt/fuckitpy

I want to draw your attention to the tests where you can see full proof that P ≠ NP.

#python
NumPy

NumPy is an open-source library that was once split off from the SciPy project. Its linear algebra routines build on the LAPACK library, which is written in Fortran, and that compiled backend is a big part of what makes NumPy fast. And because it supports vectorized operations on multidimensional arrays, it is extremely convenient.

The non-Python alternative for NumPy is Matlab.

Besides support for multidimensional arrays, NumPy includes a set of packages for solving specialized problems, for example:

▪️numpy.linalg - implements linear algebra operations;
▪️numpy.random - implements functions for dealing with random variables;
▪️numpy.fft - implements direct and inverse Fourier transform.
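
Here is a tiny tour of those three subpackages. It is only a sketch: it assumes NumPy is installed, and the matrix, the seed and the signal are made-up illustration values.

```python
import numpy as np

# numpy.linalg: solve the linear system a @ x = b
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(a, b)

# numpy.random: draw samples from a seeded generator (the seed is arbitrary)
rng = np.random.default_rng(seed=42)
samples = rng.normal(size=1000)

# numpy.fft: direct transform followed by the inverse one
signal = np.array([0.0, 1.0, 0.0, -1.0])
restored = np.fft.ifft(np.fft.fft(signal)).real

# Vectorized arithmetic works on whole arrays, no Python loop needed
doubled = signal * 2

print(x)
print(samples.mean())
print(restored)
print(doubled)
```

The round trip through fft and ifft should give back the original signal up to floating-point error.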

A guide to NumPy with many nice illustrations

#python
A little bit of Python. Python has no native empty-set literal, because {} is already reserved for dictionaries. However, you can unpack an empty iterable and get a sort of empty-set literal with Python >= 3.5 (see PEP 448):

>>> s = {*[]}  # or {*{}} or {*()}
>>> print(s)
set()
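
The same thing as a small script, just to make the difference between the two literals visible (the variable names are mine):

```python
# {} is always an empty dict, never a set
empty_dict = {}

# Unpacking an empty iterable inside braces yields an empty set (PEP 448)
empty_set = {*()}

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set
print(empty_set == set())         # True
```

In real code set() is still the clearer way to spell it.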

#python
The PySpark documentation will follow numpydoc style. I don't see why: the current Python docs for Spark have always been fine, and more readable than any of the Java docs.

So this:

"""Specifies some hint on the current :class:`DataFrame`.

:param name: A name of the hint.
:param parameters: Optional parameters.
:return: :class:`DataFrame`
"""


will be something like this:

"""Specifies some hint on the current :class:`DataFrame`.

Parameters
----------
name : str
    A name of the hint.
parameters : dict, optional
    Optional parameters

Returns
-------
DataFrame
"""


It will probably give more readable HTML and better linking between pages. We'll see.

#spark #python
What does the not operator do? It simply yields True if its argument is false, False otherwise. It turns out it's pretty hard to determine what true is.

When you look at the C implementation, the rule seems to be:
1. If True, then True;
2. If False, then False;
3. If None, then False;
4. Whatever __bool__ returns as long as it's a subclass of bool;
5. Calling len() on the object - True if greater than 0, otherwise False;
6. If none of the above applies, then True.
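
The rules above can be sketched in pure Python. This is only an approximation of what the C code does: truth() and not_() are hypothetical helper names, and the real implementation works on type slots rather than getattr.

```python
def truth(obj):
    """Approximate how CPython decides whether an object is true."""
    # Rules 1-3: the three singletons are handled directly
    if obj is True:
        return True
    if obj is False or obj is None:
        return False

    cls = type(obj)

    # Rule 4: use __bool__ if the type defines it; it must return a bool
    bool_method = getattr(cls, "__bool__", None)
    if bool_method is not None:
        result = bool_method(obj)
        if not isinstance(result, bool):
            raise TypeError("__bool__ should return bool")
        return result

    # Rule 5: fall back to the length
    len_method = getattr(cls, "__len__", None)
    if len_method is not None:
        return len_method(obj) > 0

    # Rule 6: everything else is true
    return True


def not_(obj):
    """What the not operator yields: the inverse of the truth value."""
    return False if truth(obj) else True


print(not_(0), not_([]), not_([0]), not_(object()))
```

Note that [0] is true even though its only element is false: only the length matters.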

An in-depth article on the 'not' operator in Python from the core developer

#python
PEP: 585

Started trying out the new Python 3.9 release. I don't follow the features that much, but there are things that piss me off, like the implementation of static typing in Python.

Static typing has been built on top of the existing Python runtime incrementally over time. As a consequence, collection hierarchies got duplicated, as an application could use the types from the typing module at the same time as the built-in ones.

This created a bit of confusion, as we had two parallel type systems, not really competing with each other, but we always had to keep an eye out for that parallelism.

Well, now this is over.

Types such as List, Dict, Set, and Tuple previously had to be imported from typing before you could use them in annotations. Now you can use the built-in list, dict, set, and tuple directly (Optional still lives in typing).

>>> import typing as T
>>> issubclass(list, T.List)
True


These built-in types can also be parameterized. A parameterized type such as list[str] is a generic type together with the expected types of its container elements.
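
A small sketch of what this looks like in practice, assuming Python 3.9 or newer. The shout functions are hypothetical examples, not something from the PEP.

```python
from typing import List


def shout_old(words: List[str]) -> List[str]:
    """Pre-3.9 style: the generic type comes from the typing module."""
    return [word.upper() for word in words]


def shout(words: list[str]) -> list[str]:
    """3.9+ style: the built-in list is parameterized directly."""
    return [word.upper() for word in words]


# A parameterized built-in remembers its origin and its type arguments
alias = dict[str, int]
print(alias.__origin__)  # <class 'dict'>
print(alias.__args__)    # (<class 'str'>, <class 'int'>)

print(shout(["gme", "amc"]))  # ['GME', 'AMC']
```

The annotations are not enforced at runtime; they are there for type checkers and for readers.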

PEP 585

#python
PEP 584

More Python 3.9 news, this time about merging dictionaries.

Python already had a few ways to merge two or more dictionaries. But there were always some issues:

▪️ dict1.update(dict2) – This merges in place, so it modifies dict1. To get a new merged dictionary you need a temporary variable to hold a copy first.
▪️ {**dict1, **dict2} – This unpacking method ignores the types of the mappings and always returns a plain dict. Rebuilding the original type with type(dict1)({**dict1, **dict2}) fails for dict subclasses such as defaultdict that have an incompatible __init__ method.
▪️ ChainMap(dict1, dict2) – Any changes to the ChainMap will modify the original dictionaries, because a ChainMap is just a wrapper around them.
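
The three issues above can be reproduced in a few lines. This is a sketch with made-up dictionaries, not code from the PEP.

```python
from collections import ChainMap, defaultdict

dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}

# update() works in place, so a copy is needed to keep dict1 intact
merged = dict1.copy()
merged.update(dict2)
print(merged)  # {'a': 1, 'b': 3, 'c': 4}

# Unpacking always produces a plain dict, even from a defaultdict
counts = defaultdict(int, dict1)
unpacked = {**counts, **dict2}
print(type(unpacked).__name__)  # dict

# Rebuilding the subclass from the result fails: defaultdict expects
# a factory (callable or None) as its first argument, not a dict
try:
    type(counts)(unpacked)
    rebuild_failed = False
except TypeError:
    rebuild_failed = True
print(rebuild_failed)  # True

# ChainMap is only a view: writes land in the first underlying dict
first = dict(dict1)
chain = ChainMap(first, dict2)
chain["z"] = 0
print(first)       # {'a': 1, 'b': 2, 'z': 0}
print(chain["b"])  # 2
```

Note the last line: on a duplicate key ChainMap returns the value from the first mapping, the opposite of what update() and unpacking do.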

Now in Python 3.9 we have Dictionary Union Operator ( | ). Yep, all caps.

>>> a = {'GME': 20, 'AMC': 20, 'TSLA': 1001}
>>> b = {'GME': 400}
>>> c = {'GME': 60}
>>> a | b | c
{'GME': 60, 'AMC': 20, 'TSLA': 1001}
>>> a |= b
>>> a
{'GME': 400, 'AMC': 20, 'TSLA': 1001}

This example shows how the dictionary union operator handles order and conflicts. Keys keep the position where they first appear, reading from left to right, and new keys from the right-hand dictionary are appended at the end. When a key exists in both operands, the value from the right-hand dictionary wins.

PEP 584

#python
When you’re choosing a base image for your Docker image, Alpine Linux is often recommended. Using Alpine, you’re told, will make your images smaller and speed up your builds. But for Python, Alpine builds can be vastly slower and the resulting image can end up bigger.

I've faced the same issues with a custom Airflow/Python container as the author describes here, and as a solution, I switched to the slim Docker images. Doing fine so far :)

#python
JetBrains published the results of its Python Developers Survey for 2020. After questioning over 28,000 Python developers and fans from nearly 200 countries/regions in October 2020, here are some of the notable insights:

⚡️ JavaScript is the most popular language for developers to use with Python, particularly for web developers. As the survey notes, with HTML/CSS, Bash/Shell, and SQL, "they create a stack of languages where 2 out of every 5 Python devs are using at least one of them."
⚡️ JavaScript and C/C++ are the most common main languages for those who use Python as a secondary language.
⚡️ 55% surveyed use Python for data analysis, making it the top use case, with 50% using it for web development.
⚡️ 94% use Python 3, with 6% using Python 2. 44% are using Python 3.8.
⚡️ Flask is the most popular web framework (46%), followed by Django at 43% and FastAPI at 12%.
⚡️ Most Pythonistas who use Flask prefer SQLAlchemy, while Django users use Django ORM.
⚡️ PostgreSQL is the most popular database amongst Python developers (45%).
⚡️ AWS is the top cloud platform (53%), followed by Google Cloud (33%).
⚡️ Linux is the most popular operating system (68%), followed by Windows (48%).

Check out the full report for more insights.

#python
Pattern matching

For a long time, the Python community has floated various proposals for a multi-branch conditional statement (similar to the switch statement in C/C++), but none of them made it into the language. Over the past year or so, the community discussed a proposal that solves the multi-branch problem (and more): structural pattern matching. It was accepted as PEP 634, 635, and 636 and will be a feature of the new Python 3.10 release.

Examples are as follows:

match command.split():
    case ["quit"]:
        print("Goodbye!")
        quit_game()
    case ["get", obj]:
        character.get(obj, current_room)
    case ["go", direction]:
        current_room = current_room.neighbor(direction)
    case _:
        print(f"Sorry, I couldn't understand {command!r}")

To me it looks like Scala-like syntax. Which is not bad.

#python
Hey everybody,

The channel is growing, so let me collect all the interesting posts from this channel that I think should get more attention:

#dev
⚡️Technological degradation
⚡️Define "production-ready"
⚡️Abstraction is not OOP

#python
⚡️Use pathlib
⚡️Pip constraints files
⚡️.pth files

#soft_skills
⚡️Your top skill
⚡️Ask stupid questions
⚡️Let robots yell at people
⚡️Soft skills thoughts

#big_data
⚡️S3 vs HDFS
⚡️Famous in-memory data format
⚡️Complexity in distributed systems
⚡️Snowflake

#ml
⚡️ML system basic framework
⚡️AutoML
⚡️MLOps
⚡️Testing and validation in ML

I share bigger posts on my small website ✌️
Perhaps not every Python developer knows about an interesting property of CPython that drives newcomers crazy:

>>> a = 255
>>> b = 255
>>> a == b
True
>>> a is b
True

The double equals operator checks that the objects' values are equal, while the is operator checks that the variables refer to the same object. You would expect a and b to be different objects, so a is b should return False.

There is actually an optimization in Python regarding small integers (-5 to 256 inclusive). These objects are loaded into the interpreter's memory when the interpreter starts up. This results in a small internal cache. Because of this, the variables with the same values point to the same object and the result is True.

A similar example with a number > 256 works as expected:

>>> a = 257
>>> b = 257
>>> a == b
True
>>> a is b
False
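
If you want to poke at the cache from a script rather than the REPL, here is a sketch. It assumes CPython, since the cache is an implementation detail and not a language guarantee. The numbers are built with int() at runtime because equal literals inside one script can be merged by the compiler, which would hide the effect.

```python
# Build the integers at runtime so the compiler cannot share constants
small_a = int("255")
small_b = int("255")
big_a = int("257")
big_b = int("257")

# Equality always holds, whatever the interpreter does with identity
print(small_a == small_b, big_a == big_b)  # True True

# Identity depends on the small-integer cache (-5 to 256 on CPython)
print(small_a is small_b)  # True on CPython: both names hit the cache
print(big_a is big_b)      # False on CPython: two separate objects
```

The practical takeaway is to compare numbers with == and keep is for singletons such as None.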

#python
Btw this also works for strings:

>>> a = 'the'
>>> b = 'the'
>>> a == b
True
>>> a is b
True

Since strings are immutable, it makes sense for the interpreter to store a string literal only once and point all the variables to the same object. This is called string interning. It is an internal optimization that CPython applies automatically to strings that look like valid identifiers.

The details depend on the version. In Python 3.6, strings produced by compile-time constant folding are kept only up to a length of 20. Python 3.7 uses the AST optimizer instead, and (most) such strings up to 4096 characters are folded and interned.

>>> a = 'the thing'
>>> b = 'the thing'
>>> a == b
True
>>> a is b
False

WTF, you may wonder. This string is just not a valid identifier, that's all. But we can make interning explicit:

>>> import sys
>>> b = sys.intern('the thing')
>>> a = sys.intern('the thing')
>>> a == b
True
>>> a is b
True

Why?

🔸 Saving memory
🔸 Fast comparisons
🔸 Fast dictionary lookups
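
A script version of the explicit interning trick, as a sketch. The strings are assembled at runtime with join() so that the compiler has no chance to share them as constants; the variable names are mine.

```python
import sys

# Two equal strings built at runtime are separate objects
a = "".join(["the", " ", "thing"])
b = "".join(["the", " ", "thing"])
print(a == b)  # True
print(a is b)  # False on CPython

# sys.intern returns the single canonical copy for a given value
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)  # True
```

Once both names point at the interned copy, comparing them can be a pointer check instead of a character-by-character comparison, which is where the fast comparisons come from.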

#python
There is a common misconception that the GIL was invented to protect developers from problems with concurrent access to data. This is not true.

The GIL will, of course, prevent you from parallelizing an application with threads (but not with processes). Simply put, the GIL is a lock that must be held before any access to the Python interpreter, no matter whether Python code is executing or a call is made through the Python C API. So the GIL protects the interpreter's internal structures from inconsistent states, but for your own data you still have to use synchronization primitives, just like in any other language.
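
A minimal sketch of what "use synchronization primitives" means here. The counter, add_many and the thread counts are made-up illustration values. The += on a shared variable is a read-modify-write sequence, so the GIL alone does not make it safe and the lock is what guarantees the final total.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(times):
    """Increment the shared counter, guarding each update with the lock."""
    global counter
    for _ in range(times):
        # Without the lock, two threads could read the same old value
        with lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000
```

Dropping the lock may still print 40000 on many runs, which is exactly what makes this kind of bug hard to spot.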

#python