Python etc
Regular tips about Python and programming in general

Owner — @pushtaev
The current season is run by @orsinium

Tips are appreciated: https://ko-fi.com/pythonetc / https://sobe.ru/na/pythonetc

© CC BY-SA 4.0 — mention if repost
In Python, an object is destroyed and its memory is reclaimed when nobody holds a reference to it anymore. That is also true for any group of objects that reference each other but are unreachable from the rest of the program (so-called reference cycles).

There might be a case when you want to keep a reference to an object but don't want to prevent its destruction if your reference is the last one. Such a reference is called weak. Weak references are extremely helpful for all kinds of caches and indexes.

The weakref module allows you to create weak references explicitly, or to use dictionaries that hold their keys or values weakly. Unfortunately, not all types support weak referencing; sometimes you have to create trivial subclasses:

class List(list):
    pass

weakref.ref creates an object that you must call to get the original value:

>>> x = List()
>>> r = weakref.ref(x)
>>> r()
[]
>>> del x
>>> r
<weakref at 0x7f302db036d8; dead>
>>> r()
>>>


weakref.proxy creates an object that acts almost like a standard reference:

>>> x = List()
>>> p = weakref.proxy(x)
>>> p
<weakproxy at 0x7f302db03688 to List at 0x7f302db87ea8>
>>> list(p)
[]
>>> p.append(42)
>>> p[0]
42
>>> del x
>>> p
<weakproxy at 0x7f302db03688 to NoneType at 0x8a1a80>
>>> p[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ReferenceError: weakly-referenced object no longer exists
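
For the dictionary case mentioned above, weakref provides WeakValueDictionary (and WeakKeyDictionary): an entry disappears as soon as the value (or key) is destroyed. A minimal sketch, assuming CPython, where reference counting reclaims the object immediately:

import weakref

class List(list):
    pass

cache = weakref.WeakValueDictionary()
x = List([1, 2, 3])
cache['data'] = x
print('data' in cache)  # True

del x                   # the last strong reference is gone
print('data' in cache)  # False: the entry vanished with the object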
itertools.cycle(x) gives you an iterator that repeats everything x yields, indefinitely:

In : c = cycle([1, 2, 3])
In : next(c)
Out: 1
In : next(c)
Out: 2
In : next(c)
Out: 3
In : next(c)
Out: 1
In : next(c)
Out: 2
In : next(c)
Out: 3
In : next(c)
Out: 1


Note that not all iterables can be iterated more than once, so cycle makes a copy of every element in order to yield them again. That can be unnecessarily inefficient for iterables that can be reiterated, e.g. lists. You probably shouldn't care about it unless your list is really big. If it is, you can reimplement cycle along these lines:

def safe_cycle(iterable):
    while True:
        empty = True
        for x in iterable:
            empty = False
            yield x

        if empty:
            return
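
A quick usage sketch; islice is used here only to take a finite prefix of the infinite cycle:

from itertools import islice

# a list can be iterated again and again, so no copy is needed
print(list(islice(safe_cycle([1, 2, 3]), 7)))  # [1, 2, 3, 1, 2, 3, 1]

# for a one-shot generator the `empty` check prevents an infinite
# busy loop: the cycle simply ends after the first pass
gen = (x * x for x in range(3))
print(list(safe_cycle(gen)))  # [0, 1, 4]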
Every call to next(x) returns the next value from the x iterator unless an exception is raised. If that exception is StopIteration, the iterator is exhausted and can supply no more values. When a generator is exhausted, it raises StopIteration automatically upon reaching the end of its body:

>>> def one_two():
...     yield 1
...     yield 2
...
>>> i = one_two()
>>> next(i)
1
>>> next(i)
2
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration


StopIteration is automatically handled by tools that call next for you:

>>> list(one_two())
[1, 2]


The problem is that an unexpected StopIteration raised within a generator causes it to stop silently instead of propagating as an error:


def one_two():
    yield 1
    yield 2

def one_two_repeat(n):
    for _ in range(n):
        i = one_two()
        yield next(i)
        yield next(i)
        yield next(i)

print(list(one_two_repeat(3)))

The third yield here is a mistake: one_two yields only two values, so next(i) raises StopIteration, which makes list(...) stop the iteration. The result is, surprisingly, [1, 2].


However, that was changed in Python 3.7. Such foreign StopIteration is now replaced with RuntimeError:

Traceback (most recent call last):
  File "test.py", line 10, in one_two_repeat
    yield next(i)
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    print(list(one_two_repeat(3)))
RuntimeError: generator raised StopIteration


You can enable the same behavior starting with Python 3.5 by using from __future__ import generator_stop.
The default list slicing in Python creates a copy. That may be undesirable if the sliced part is too big to be copied, if you want the slice to reflect changes in the list, or if you even want to modify the slice and affect the original object.

To avoid copying a lot of data, one can use itertools.islice. It lets you iterate over a part of the list, but doesn't support indexing or modification.

To achieve more than this, we have to write a custom class. Luckily, Python provides a suitable abstract base class: collections.abc.MutableSequence. You only need to override __getitem__, __setitem__, __delitem__, __len__ and insert.

Below is a sketch of how you could do it. It doesn't support deletion or insertion, but it supports modification and slicing of slices.
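
A minimal sketch; the class name and exact behavior are illustrative, not the post's original example:

from collections.abc import MutableSequence

class ListSlice(MutableSequence):
    """A view over lst[start:stop] that copies no data."""

    def __init__(self, lst, start, stop):
        self._lst = lst
        self._start = start
        self._stop = stop

    def _index(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self._start + i

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slicing a slice returns another view (step is ignored for brevity)
            start, stop, _ = index.indices(len(self))
            return ListSlice(self._lst, self._start + start, self._start + stop)
        return self._lst[self._index(index)]

    def __setitem__(self, index, value):
        self._lst[self._index(index)] = value

    def __delitem__(self, index):
        raise NotImplementedError

    def __len__(self):
        return self._stop - self._start

    def insert(self, index, value):
        raise NotImplementedError

lst = list(range(10))
view = ListSlice(lst, 2, 7)
view[0] = 100
print(lst[2])           # 100: the change is visible in the original list
print(list(view[1:3]))  # [3, 4]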
All objects in Python are created via a call to the __new__ method. Even if you provide a custom __new__ for your class, you have to call super().__new__(...).

You might think that object.__new__ is a root implementation that is responsible for the creation of all objects. That is not entirely true. There are several such implementations, and they are incompatible. For example, dict has its own low-level __new__ and objects of types derived from dict can't be created with object.__new__:

In : class D(dict):
...:     pass
...:

In : class A:
...:     pass
...:

In : object.__new__(A)
Out: <__main__.A at 0x7f200c8902e8>

In : object.__new__(D)
...
TypeError: object.__new__(D) is not safe, use D.__new__()
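
Such objects have to be created through the __new__ of the appropriate built-in ancestor (or simply by calling D()); continuing the session above:

In : type(dict.__new__(D))
Out: __main__.D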
To sort a dictionary by its values, you use sorted with a custom key function:

>>> d = dict(a=1, c=3, b=2)
>>> sorted(d.items(), key=lambda item: item[1])
[('a', 1), ('b', 2), ('c', 3)]


However, such a function already exists in the operator module:

>>> from operator import itemgetter
>>> sorted(d.items(), key=itemgetter(1))
[('a', 1), ('b', 2), ('c', 3)]


You can also sort keys instead of items:

>>> sorted(d, key=lambda k: d[k])
['a', 'b', 'c']


Again, this lambda can be replaced with an already existing method:

>>> sorted(d, key=d.get)
['a', 'b', 'c']
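
Since Python 3.7 plain dicts preserve insertion order, so you can also rebuild a dictionary sorted by its values (a small follow-up to the example above):

>>> dict(sorted(d.items(), key=itemgetter(1), reverse=True))
{'c': 3, 'b': 2, 'a': 1}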
A popular way to declare an abstract method in Python is to raise the NotImplementedError exception:

def human_name(self):
    raise NotImplementedError


Though it's pretty popular and even has IDE support (PyCharm considers such a method abstract), this approach has a downside: you get the error only upon the method call, not upon class instantiation.

Use abc to avoid this problem:

from abc import ABCMeta, abstractmethod

class Service(metaclass=ABCMeta):
    @abstractmethod
    def human_name(self):
        pass


Also be aware that NotImplemented is not the same as NotImplementedError. It's not even an exception; it's a special value (like True and False) with an entirely different meaning. Some special methods (e.g., __eq__(), __add__(), etc.) may return it to make Python try the reflected operation instead. If a.__add__(b) returns NotImplemented, Python tries to call b.__radd__(a).
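
A minimal sketch of that protocol; the classes are made up for illustration:

class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented  # let the right operand handle it

class Kilometers:
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        if isinstance(other, Meters):
            return Meters(other.value + self.value * 1000)
        return NotImplemented

print((Meters(500) + Kilometers(2)).value)  # 2500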
We often say that a coroutine may be interrupted at the point of any await. Strictly speaking, that is not true. A coroutine may indeed be interrupted at some awaits, but not at any of them. To be interruptible, an await should await a future, or a coroutine that awaits a future, and so on.

Usually, that await-chain of coroutines ends with awaiting a future, except for async functions that have no await in their body at all:

import asyncio

async def p(x):
    print(x)

async def task(x):
    while True:
        await p(x)

loop = asyncio.get_event_loop()
loop.create_task(task(1))
loop.create_task(task(2))
loop.run_forever()


This code prints 1 indefinitely and never prints 2. You may wonder why one would make a function async if it has no await in the body. It may happen when you override a parent method that is async but don't need to await anything in the child implementation. Also, an author of a public API may make functions async in case they have to use await in future releases.
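
One conventional way to let the event loop switch between such tasks is to explicitly await asyncio.sleep(0); a sketch of the example above with that change:

import asyncio

async def p(x):
    print(x)
    await asyncio.sleep(0)  # gives control back to the event loop

async def task(x):
    while True:
        await p(x)

loop = asyncio.get_event_loop()
loop.create_task(task(1))
loop.create_task(task(2))
loop.run_forever()  # now 1 and 2 alternate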
collections.defaultdict allows you to create a dictionary that supplies a default value when the requested key is missing (instead of raising KeyError). To create a defaultdict you provide not a default value but a factory of such values.

That allows you to create a dictionary that virtually contains infinite levels of nested dicts, allowing you to do something like d[a][b][c]...[z].

>>> from collections import defaultdict
>>> def infinite_dict():
...     return defaultdict(infinite_dict)
...
>>> d = infinite_dict()
>>> d[1][2][3][4] = 10
>>> dict(d[1][2][3][5])
{}


Such behavior is called “autovivification”, a term that came from the Perl language.
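
Note that with autovivification merely reading a missing key creates it; a quick check with the infinite_dict factory from above:

>>> d = infinite_dict()
>>> 'x' in d
False
>>> d['x']       # reading a missing key...
defaultdict(<function infinite_dict at 0x...>, {})
>>> 'x' in d     # ...inserts it
True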
An object instantiation includes two significant steps. First, the __new__ method of a class is called. It creates and returns a brand new object. Second, Python calls the __init__ method of that object. Its work is to set up the initial state of the object.

However, __init__ isn't called if __new__ returns an object that is not an instance of the original class. The reason is that the object was probably created by another class, so its __init__ was already called:

class Foo:
    def __new__(cls, x):
        return dict(x=x)

    def __init__(self, x):
        print(x)  # Never called

print(Foo(0))


That also means you should never create instances of the same class inside __new__ with the regular constructor (Foo(...)). It can lead to double __init__ execution or even infinite recursion.

Infinite recursion:

class Foo:
    def __new__(cls, x):
        return Foo(-x)  # Recursion


Double __init__:

class Foo:
    def __new__(cls, x):
        if x < 0:
            return Foo(-x)
        return super().__new__(cls)

    def __init__(self, x):
        print(x)
        self._x = x


The proper way:

class Foo:
    def __new__(cls, x):
        if x < 0:
            return cls.__new__(cls, -x)
        return super().__new__(cls)

    def __init__(self, x):
        print(x)
        self._x = x
Native Python float values are backed by hardware floating point, so any value is internally represented as a binary fraction.

That means that you usually work with approximations, not exact values:

In : format(0.1, '.17f')
Out: '0.10000000000000001'


The decimal module lets you use decimal floating point arithmetic with arbitrary precision:

In : from decimal import Decimal
In : Decimal(1) / Decimal(3)
Out: Decimal('0.3333333333333333333333333333')


That still may not be enough:

In : Decimal(1) / Decimal(3) * Decimal(3) == Decimal(1)
Out: False


For exact computations, you can use fractions, which stores any number as a rational:

In : from fractions import Fraction
In : Fraction(1) / Fraction(3) * Fraction(3) == Fraction(1)
Out: True


The obvious limitation is that you still have to use approximations for irrational numbers (such as π).
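
Also note that constructing a Fraction from a float preserves the binary approximation, not the decimal literal you typed; to get the exact rational, construct it from integers or a string:

In : Fraction(0.1)
Out: Fraction(3602879701896397, 36028797018963968)

In : Fraction(1, 10)
Out: Fraction(1, 10)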
To set the default values of attributes in a constructor, you usually use a simple if:

def __init__(self, cache=None):
    if cache is None:
        cache = {}
    self._cache = cache


It can be rewritten a little shorter:

def __init__(self, cache=None):
    self._cache = cache or {}


This method has a couple of drawbacks though. First, the intent of such an or may not be clear enough, since it is usually used in a boolean context. Second, or checks for falsiness, not for None, which can lead to obscure bugs.
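
A sketch of such a bug (the Service class is made up for illustration): an empty dict passed by the caller is falsy, so it gets silently replaced:

class Service:
    def __init__(self, cache=None):
        self._cache = cache or {}  # replaces any falsy value, not only None

shared_cache = {}
service = Service(shared_cache)
service._cache['key'] = 'value'
print(shared_cache)  # {} — the caller's dict was silently discarded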
The simplest way to use the logging module is to call functions directly from it, without creating a logger object.

import logging
logging.error('xxx')


This global logger can be configured via the logging.basicConfig() call:

import logging
logging.basicConfig(format='-- %(message)s --')
logging.error('xxx') # -- xxx --


Due to its global nature, basicConfig has some limitations. First, only the first call actually does something; any further calls to basicConfig are ignored. Second, any module-level logging function (logging.error and friends) calls basicConfig itself if logging is not configured yet, so you must configure logging before logging any messages:

import logging
logging.error('xxx') # ERROR:root:xxx
logging.basicConfig(format='-- %(message)s --')
logging.error('xxx') # ERROR:root:xxx
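
One way to avoid these surprises is to skip the module-level functions and configure a named logger explicitly; a minimal sketch:

import logging

logger = logging.getLogger(__name__)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('-- %(message)s --'))
logger.addHandler(handler)
logger.error('xxx')  # -- xxx --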
Python lets you know the path of any source file. Within a file, the __file__ variable contains the path to it (possibly a relative one):

$ cat test/foo.py
print(__file__)
$ python test/foo.py
test/foo.py


The typical usage for that is to find the path where the script is located. It can be helpful for finding other files such as configs, assets, etc.

To get the absolute path from the relative one you can use os.path.abspath. So the common idiom for getting the script's directory is:

dir_path = os.path.dirname(
    os.path.abspath(__file__)
)
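
With pathlib the same idiom is arguably shorter (config.yml is just a hypothetical asset next to the script):

from pathlib import Path

dir_path = Path(__file__).resolve().parent
config_path = dir_path / 'config.yml'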
There are two concepts with similar names that can be easily confused: overriding and overloading.

Overriding happens when a child class defines a method that is already provided by its parents, effectively replacing it. In some languages you have to explicitly mark the overriding method (C# requires the override modifier), in some languages it's optional (the @Override annotation in Java). Python doesn't require any special modifier, nor does it have a standard way to mark such methods (some people like to use a custom @override decorator that does virtually nothing, just for the sake of readability).

Overloading is another story. Overloading is having multiple functions with the same name but different signatures. It's supported by languages like Java and C++ and is often used as a way to provide default arguments:

class Foo {
    public static void main(String[] args) {
        System.out.println(Hello());
    }

    public static String Hello() {
        return Hello("world");
    }

    public static String Hello(String name) {
        return "Hello, " + name;
    }
}


Python doesn't support finding functions by their signatures, only by their names. You can write code that explicitly analyzes the types and number of arguments, but that usually looks clumsy and generally isn't a nice thing to do:

def quadrilateral_area(*args):
    if len(args) == 4:
        quadrilateral = Quadrilateral(*args)
    elif len(args) == 1:
        quadrilateral = args[0]
    else:
        raise TypeError()

    return quadrilateral.area()


If you need type hints for this, the typing module can help you with the @overload decorator:

from typing import overload

@overload
def quadrilateral_area(
    q: Quadrilateral
) -> float: ...

@overload
def quadrilateral_area(
    p1: Point, p2: Point,
    p3: Point, p4: Point
) -> float: ...
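
These @overload definitions exist only for type checkers; at runtime they must be followed by exactly one real, non-overloaded implementation, for example a condensed version of the argument-counting dispatch shown above:

def quadrilateral_area(*args):
    # the single runtime implementation behind both overloads
    quadrilateral = args[0] if len(args) == 1 else Quadrilateral(*args)
    return quadrilateral.area()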
Storing users' passwords is a big deal. You can't be sure no intruder will ever access your database, but you must be sure they can't learn the passwords your clients may also use in other places.

So you can't store passwords as is, but you have to save enough information to verify a password once a user accesses your service again. The simple solution is to store y = h(x), where h is a hash function and x is a password. To verify that some p is a valid password, you check whether h(p) == y.

An intruder may have h(x) precomputed for a huge number of simple x values. As a countermeasure, you should store y = h(x + s), where s is a salt. The salt has to be unique for every stored password; if it's not, h(x + s) is just some other hash function g(x). The salt also has to be long enough; if it's not, x + s might be in the intruder's dictionary just as easily as x.

However, it's not that simple. You can't use a general-purpose hash function as h for this task. Such functions are fast, which lets intruders brute-force your whole database. Functions like md5 and sha of any kind don't suit this purpose. You need a function that is deliberately designed for hashing passwords, e.g. bcrypt or scrypt. Both of them are available through modules of the same name.
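
A minimal sketch with the third-party bcrypt package (pip install bcrypt); the salt is generated per password and stored inside the resulting hash:

import bcrypt

password = b'correct horse battery staple'

hashed = bcrypt.hashpw(password, bcrypt.gensalt())  # store this value

# later, to verify a login attempt:
print(bcrypt.checkpw(password, hashed))    # True
print(bcrypt.checkpw(b'hunter2', hashed))  # False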
In Python, you can override the square brackets operator ([]) by defining the __getitem__ magic method. This is how you create an object that virtually contains an infinite number of repeated elements:

class Cycle:
    def __init__(self, lst):
        self._lst = lst

    def __getitem__(self, index):
        return self._lst[
            index % len(self._lst)
        ]

print(Cycle(['a', 'b', 'c'])[100])  # 'b'


The unusual thing here is that the [] operator supports a unique syntax. It can be used not only like [2], but also like [2:10], [2:10:2], [2::2], or even [:]. The semantics are [start:stop:step], but you can use it any way you want for your custom objects.

But what does __getitem__ get as the index parameter when it's called with that syntax? Slice objects exist precisely for that.

In : class Inspector:
...:     def __getitem__(self, index):
...:         print(index)
...:
In : Inspector()[1]
1
In : Inspector()[1:2]
slice(1, 2, None)
In : Inspector()[1:2:3]
slice(1, 2, 3)
In : Inspector()[:]
slice(None, None, None)


You can even combine tuple and slice syntaxes:

In : Inspector()[:, 0, :]
(slice(None, None, None), 0, slice(None, None, None))


A slice object does little for you beyond storing the start, stop and step attributes.

In : s = slice(1, 2, 3)
In : s.start
Out: 1
In : s.stop
Out: 2
In : s.step
Out: 3
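
Its one notable helper is the indices() method, which resolves negative and missing bounds against a given length. A sketch of using it in a custom __getitem__ (the Squares class is made up for illustration):

class Squares:
    """A read-only sequence of the first n squares."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if isinstance(index, slice):
            # resolves negative and missing bounds against self._n
            start, stop, step = index.indices(self._n)
            return [i * i for i in range(start, stop, step)]
        if index < 0:
            index += self._n
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index

print(Squares(10)[3])      # 9
print(Squares(10)[2:8:2])  # [4, 16, 36]
print(Squares(10)[-1])     # 81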
Sometimes you want to run a piece of code and ignore all exceptions that it may raise. It's reasonable for plugins, foreign modules, and other units you don't fully understand or trust.

The proper way to do this is to use try with except Exception, not bare except:

try:
    foreign()
except Exception:
    logging.warning('fail', exc_info=True)

except without an explicit exception type is equivalent to except BaseException. The difference between BaseException and Exception is that the former includes exceptions you usually don't want to catch, such as KeyboardInterrupt.
In Python, a variable name may consist of a single underscore: _. Though usually such names are not descriptive enough and should not be used, there are at least three cases when _ has a conventional meaning.

First, interactive Python interpreters use _ to store the result of the last executed expression:

>>> 2 + 2
4
>>> _
4


Second, the gettext module's manual recommends aliasing its gettext() function as _() to minimize clutter in your code.

Third, _ is used when you have to come up with names for values you don't care about:

>>> log_entry = '10:50:24 14234 GET /api/v1/test'
>>> time, _, method, location = log_entry.split()
If you import a module that was already imported, nothing happens, since Python keeps track of which modules are already loaded. All such modules are placed into the sys.modules dictionary:

In : import sys
In : 'sys' in sys.modules.keys()
Out: True


If you really need to reload a module, you should use the importlib.reload(m) function. m is an object of a module that was successfully imported before, not a string with its name:

In : import importlib
In : importlib.reload(importlib)
Out: <module 'importlib' from '/home/pushtaev/.ve/pythonetc/lib/python3.6/importlib/__init__.py'>
Any running asyncio coroutine can be cancelled via the cancel() method of its task. CancelledError will be thrown into the coroutine, which causes it and all wrapping coroutines to be terminated, unless the error is caught and suppressed.

CancelledError is a subclass of Exception (since Python 3.8 it derives from BaseException instead), which means it can be accidentally caught by a try ... except Exception that is meant to catch “any error”. To handle cancellation safely within a coroutine, you're stuck with something like this:

try:
    await action()
except asyncio.CancelledError:
    raise
except Exception:
    logging.exception('action failed')