Python etc
Regular tips about Python and programming in general

Owner — @pushtaev
The current season is run by @orsinium

Tips are appreciated: https://ko-fi.com/pythonetc / https://sobe.ru/na/pythonetc

© CC BY-SA 4.0 — mention if repost
At the moment a method is decorated, you can't gain access to the method's class: the method is still a mere function, and the class hasn't even been created yet.

Let's suppose we want to store the names of all methods that are decorated with @mega. There is no way we could figure out klass in this example:


def mega(f):
    klass.megamethods.append(f.__name__)
    return f

class Foo:
    megamethods = []

    @mega
    def x(self):
        pass


print(Foo.megamethods)


This is how you solve it: mark all @mega-decorated methods with some private attribute (e.g., _mega), then process them later using a class decorator:

def megamethods(klass):
    methods = []
    for attr_name in klass.__dict__:
        attr = getattr(klass, attr_name)
        if hasattr(attr, '_mega'):
            if attr._mega:
                methods.append(attr_name)

    klass.megamethods = methods

    return klass


def mega(f):
    f._mega = True
    return f


@megamethods
class Foo:
    megamethods = []

    @mega
    def x(self):
        pass


print(Foo.megamethods)
The default Python sorting is stable, meaning it preserves the relative order of equal objects:

In : a = [2, -1, 0, 1, -2]

In : sorted(a, key=lambda x: x**2)
Out: [0, -1, 1, 2, -2]


The max and min functions also try to be consistent with sorted: max works like sorted(a, reverse=True)[0], while min works like sorted(a)[0]. That means both max and min return the leftmost possible answer:

In : max([2, -2], key=lambda x: x**2)
Out: 2

In : max([-2, 2], key=lambda x: x**2)
Out: -2

In : min([2, -2], key=lambda x: x**2)
Out: 2

In : min([-2, 2], key=lambda x: x**2)
Out: -2
The collections module provides the ChainMap class. It allows you to use several mappings as though they were merged:

>>> from collections import ChainMap
>>> d = ChainMap(dict(a=1), dict(a=2, b=2))
>>> d['a']
1
>>> d['b']
2
>>> d['c']
...
KeyError: 'c'


ChainMap sequentially scans all underlying mappings and returns the first value found. Modifying actions, however, affect the first mapping only:

>>> d = ChainMap(dict(a=1), dict(a=2, b=2))
>>> d['c'] = 3
>>> d
ChainMap({'a': 1, 'c': 3}, {'a': 2, 'b': 2})
You can use any object as a dictionary key in Python as long as it implements the __hash__ method. This method can return any integer, as long as the single requirement is met: equal objects must have equal hashes (the reverse is not required).

You should also avoid using mutable objects as keys: once an object mutates and is no longer equal to its old self (and its hash changes), it can't be found in the dictionary anymore.
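
For example, here is a small illustrative Key class whose hash depends on mutable state:

class Key:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


k = Key(1)
d = {k: 'found'}
print(d[Key(1)])    # 'found'

k.value = 2         # the stored key mutates, so its hash changes
print(Key(2) in d)  # False: the entry still sits in the old hash bucket
print(Key(1) in d)  # False: same hash as before, but the keys are no longer equal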

There is also one bizarre thing that might surprise you during debugging or unit testing:

In : class A:
...:     def __init__(self, x):
...:         self.x = x
...:
...:     def __hash__(self):
...:         return self.x
...:
In : hash(A(2))
Out: 2
In : hash(A(1))
Out: 1
In : hash(A(0))
Out: 0
In : hash(A(-1))  # sic!
Out: -2
In : hash(A(-2))
Out: -2


In CPython -1 is internally reserved for error states, so it's implicitly converted to -2.
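The same can be seen with plain integers: in CPython, small integers hash to themselves, except for -1:

In : hash(-1)
Out: -2
In : hash(-2)
Out: -2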
Probably the most common newbie mistake with Python is providing a mutable object as a default function argument. That object is shared between all function calls, which can lead to bizarre results:

def append_length(lst=[]):
    lst.append(len(lst))
    return lst

print(append_length([1, 2]))  # [1, 2, 2]
print(append_length())  # [0]
print(append_length())  # [0, 1]


However, for various caches sharing may be a good thing:

def fact(x, cache={0: 1}):
    if x not in cache:
        cache[x] = x * fact(x - 1)

    return cache[x]

print(fact(5))


In this example, we store calculated factorial values inside the default argument value. The cache can even be extracted:

>>> fact.__defaults__
({0: 1, 1: 1, 2: 2, 3: 6, 4: 24, 5: 120},)
Even though the timedelta constructor accepts various arguments (such as weeks or hours), all of them are normalized, so only days, seconds and microseconds are actually stored. To keep the representation of any given time interval unique, the number of microseconds never reaches 10**6, and the number of seconds never reaches 3600*24.

That means you can't rely on getting back the same number of seconds from a timedelta that you used during object creation:

>>> from datetime import timedelta
>>> timedelta(seconds=1000000)
datetime.timedelta(11, 49600)
>>> timedelta(seconds=1000000).seconds
49600


To get the whole number of seconds in the given delta, one can use the total_seconds method:

>>> timedelta(seconds=1000000).total_seconds()
1000000.0
>>> timedelta(weeks=1, seconds=1).total_seconds()
604801.0
Some generators need to yield all of the elements of another one:

>>> def enclose(gen, before='{', after='}'):
...     yield before
...     for x in gen:
...         yield x
...     yield after
...
>>> list(enclose(range(5)))
['{', 0, 1, 2, 3, 4, '}']


The preferred method to do so, however, is to use yield from:

>>> def enclose(gen, before='{', after='}'):
...     yield before
...     yield from gen
...     yield after


yield from not only works faster, it also automatically handles sending values into the inner generator, returning values from it, and even raising exceptions inside it.
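
For example, the value returned by the inner generator becomes the value of the yield from expression; here is a minimal sketch:

def inner():
    yield 1
    yield 2
    return 'inner is done'  # becomes the value of the yield from expression

def outer():
    result = yield from inner()
    yield result

print(list(outer()))  # [1, 2, 'inner is done']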
You can modify the code behavior during unit tests not only by using mocks and other advanced techniques but also with straightforward object modification:

import random
import unittest
from unittest import TestCase

class Foo:
    def is_positive(self):
        return self.rand() > 0

    def rand(self):
        return random.randint(-2, 2)


class FooTestCase(TestCase):
    def test_is_positive(self):
        foo = Foo()
        foo.rand = lambda: 1
        self.assertTrue(foo.is_positive())


unittest.main()


That's not going to work if rand is a property or any other descriptor. In that case, you should modify the class, not the object. However, modifying Foo itself might affect other tests, so the best way to deal with it is to create a descendant:

class Foo:
    def is_positive(self):
        return self.rand > 0

    @property
    def rand(self):
        return random.randint(-2, 2)


class FooTestCase(TestCase):
    def test_is_positive(self):
        class TestFoo(Foo):
            @property
            def rand(self):
                return 1

        foo = TestFoo()
        self.assertTrue(foo.is_positive())
Python allows you to work with filesystem paths via the os.path module. The module contains a lot of functions that treat strings as paths and perform useful operations on them, such as concatenating paths:

>>> import os.path
>>> os.path.join('/usr', 'local')
'/usr/local'
>>> os.path.dirname('/var/log')
'/var'


However, since Python 3.4, the pathlib module is available, which offers an object-oriented approach:

>>> from pathlib import Path
>>> Path('/usr') / Path('local')
PosixPath('/usr/local')
>>> Path('/usr') / 'local'
PosixPath('/usr/local')
>>> Path('/var/log').parent
PosixPath('/var')
>>> Path('/var/log').parent.name
'var'
Creating a new variable is essentially creating a new name for an already existing object. That's why it's called name binding in Python.

There are numerous ways to bind names; these are examples of how x can be bound:

x = y
import x
class x: pass
def x(): pass
def y(x): pass
for x in y: pass
with y as x: pass
except y as x:
You can also bind an arbitrary name by manipulating the global namespace:

In : x
NameError: name 'x' is not defined
In : globals()['x'] = 42
In : x
Out: 42
Note, however, that you cannot do the same with locals() since updates to the locals dictionary are ignored.
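For example (in CPython):

In : def f():
...:     locals()['x'] = 42
...:     return x
...:
In : f()
NameError: name 'x' is not defined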
Objects in Python store their attributes in dictionaries that can be accessed via the __dict__ magic attribute:

In [1]: class A: pass
In [2]: a = A()
In [3]: a.x = 1
In [4]: a.__dict__
Out[4]: {'x': 1}



By accessing it directly, you can even create attributes that are not valid Python identifiers (which means you can't get to them with the standard obj.attr syntax):

In [6]: a.__dict__[' '] = ' '
In [7]: getattr(a, ' ')
Out[7]: ' '



You can also ask Python to store attributes directly in memory (like a simple C struct) using __slots__. It will save some memory and some CPU cycles that are used for dictionary lookups.

class Point:
    __slots__ = ['x', 'y']
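
A quick check with the class above:

p = Point()
p.x = 1  # fine, 'x' is listed in __slots__
p.z = 2  # AttributeError: 'Point' object has no attribute 'z'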



There are some things you should remember while using slots. First, you can't set any attributes that are not specified in __slots__. Second, if a class inherits from a class with slots, its __slots__ don't override the parent's __slots__ but are added to them:

class Parent: __slots__ = ['x']
class Child(Parent): __slots__ = ['y']
c = Child()
c.x = 1
c.y = 2



Third, you can't inherit from two different classes with nonempty __slots__, even if they are identical. You can get more information from this excellent Stack Overflow answer.

Remember that __slots__ is meant for optimization, not for constraining attributes.
The biggest drawback of objects with __slots__ is that they can't have arbitrary attributes assigned dynamically. However, you can mix the __slots__ approach with the regular __dict__ one.

To enable dynamic assignment for the object just put '__dict__' into __slots__:

class A:
    __slots__ = ('a', 'b', '__dict__')

A().x = 3


Also, mind that inherited classes automatically get __dict__ unless an empty __slots__ is explicitly specified:

class A:
    __slots__ = ('a', 'b')

class B(A):
    pass

B().x = 3
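
Conversely, declaring an empty __slots__ in the subclass keeps instances dict-free; a minimal sketch:

class C(A):
    __slots__ = ()

C().x = 3  # AttributeError: 'C' object has no attribute 'x'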
In Python, you can create a callable object not only by creating functions (with def or lambda). An object is also callable if it has the __call__ magic method:

class truncater:
    def __init__(self, length):
        self._length = length

    def __call__(self, s):
        return s[0:self._length]


print(truncater(4)('abcdabcd'))  # abcd


Since a decorator is basically a higher-order function, it can also be expressed with a callable object instead of a function:

class cached:
    def __init__(self, func):
        self._func = func
        self._cache = {}

    def __call__(self, arg):
        if arg not in self._cache:
            self._cache[arg] = self._func(arg)

        return self._cache[arg]


@cached
def sqr(x):
    return x * x
When you write a custom __repr__ for some object, you usually want to include the representations of its attributes. For that, you should make formatting call repr() on the attributes, since it calls str() by default.

It is done with the !r notation:

class Pair:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __repr__(self):
        class_name = type(self).__name__
        return f'{class_name}({self.left!r}, {self.right!r})'
The problem with calling repr of other objects inside your own __repr__ is that you can't guarantee that none of those objects refers back to self, which would make the call recursive:

In : p = Pair(1, 2)
In : p
Out: Pair(1, 2)
In : p.right = p
In : p
Out: [...]
RecursionError: maximum recursion depth exceeded while calling a Python object


To solve this problem easily, you can use the reprlib.recursive_repr decorator:

@reprlib.recursive_repr()
def __repr__(self):
    class_name = type(self).__name__
    return f'{class_name}({self.left!r}, {self.right!r})'



Now it works:

In : p = Pair(1, 2)
In : p.right = p
In : p
Out: Pair(1, ...)
In some rare cases, you need to copy a function object, not just a reference to it.

Since a function is a regular Python object, it can be created not only with the def ... or lambda ... syntax but also by calling its class's constructor directly. The class of functions can be obtained by calling type(any_function) or, more gracefully, imported from the types module: from types import FunctionType.

So, to create a new function, we call the class with the old function's attributes as arguments:

g = FunctionType(
    f.__code__, f.__globals__,
    name=f.__name__,
    argdefs=f.__defaults__,
    closure=f.__closure__,
)



Sadly, that doesn't preserve __kwdefaults__ (defaults for keyword-only arguments), so we have to copy them manually. You may also want to call functools.update_wrapper to preserve mostly cosmetic attributes such as __doc__ and __annotations__.

g = FunctionType(
    f.__code__, f.__globals__, name=f.__name__,
    argdefs=f.__defaults__,
    closure=f.__closure__,
)
g = functools.update_wrapper(g, f)
g.__kwdefaults__ = f.__kwdefaults__
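
A quick check that the copy is independent (f here is just an illustrative function with a keyword-only default):

import functools
from types import FunctionType

def f(x, *, y=10):
    return x + y

g = FunctionType(
    f.__code__, f.__globals__, name=f.__name__,
    argdefs=f.__defaults__,
    closure=f.__closure__,
)
g = functools.update_wrapper(g, f)
g.__kwdefaults__ = f.__kwdefaults__

g.__kwdefaults__ = {'y': 100}  # change the copy's keyword-only default
print(f(1))  # 11
print(g(1))  # 101 -- the original f is not affected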
Tail recursion is a special case of recursion where the recursive call is the last expression in the function:

def fact(x, result=1):
    if x == 0:
        return result
    else:
        return fact(x - 1, result * x)



The cool thing about it is that you don't have to return to the caller once the callee returns its result, since the caller has nothing more to do. That means you don't have to save the caller's stack frame.

That technique is called TRE, tail recursion elimination, and Python doesn't support it. It was considered and declined by Guido, mostly because removing stack frames makes stack traces look cryptic.
Python lacks the infamous goto statement. However, some clean uses of goto can still be emulated. For example:

try:
    if error:
        raise label()  # goto label
    ...
except label:  # label is here
    ...



To make this work you should declare label as an exception:

class label(Exception): pass
If you want to catch both IndexError and KeyError, you may (and should) use LookupError, their common ancestor. It proves useful when accessing complex nested data:

try:
    db_host = config['databases'][0]['hosts'][0]
except LookupError:
    db_host = 'localhost'
Function decorators don't have to return new functions; they might as well return any other value:

def call(*args, **kwargs):
    def decorator(func):
        return func(*args, **kwargs)

    return decorator

@call(15)
def sqr_15(x):
    return x * x

assert sqr_15 == 225



That can be useful for creating trivial classes with only one method to define:

from abc import ABCMeta, abstractmethod

class BinaryOperation(metaclass=ABCMeta):
    def __init__(self, left, right):
        self._left = left
        self._right = right

    def __repr__(self):
        klass = type(self).__name__
        left = self._left
        right = self._right
        return f'{klass}({left}, {right})'

    @abstractmethod
    def do(self):
        pass

    @classmethod
    def make(cls, do_function):
        return type(
            do_function.__name__,
            (BinaryOperation,),
            dict(do=do_function),
        )

class Addition(BinaryOperation):
    def do(self):
        return self._left + self._right

@BinaryOperation.make
def Subtraction(self):
    return self._left - self._right
If during iteration you need to access adjacent elements, you may create an iterator that automatically handles it for you.

from itertools import tee

def neighbours(iterable, n):
    neighbours = tee(iterable, n)
    for i, neighbour in enumerate(neighbours):
        for _ in range(i):
            next(neighbour)

    return zip(*neighbours)

fibb = [1, 1, 2, 3, 5, 8, 13, 21]

for a, b, c in neighbours(fibb, 3):
    assert c == a + b



In this example, we fork the original iterable with tee, then shift the resulting iterators with next so that the second one starts with the second element and the third one starts with the third, and then aggregate them back with zip.
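
For instance, with n=2 it yields consecutive pairs:

print(list(neighbours('abcde', 2)))
# [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]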