Python etc
Regular tips about Python and programming in general

Owner — @pushtaev

© CC BY-SA 4.0 — credit the channel if you repost
Python supports chained assignment with the following syntax:

a = b = c = 42

It does look like a chained C assignment, but it works in an entirely different manner. In C, the result of one assignment is used for another:

a = (b = (c = 42))


That's not the case in Python. Assignment doesn't even have a result in Python: it's a statement, not an expression. Instead, chained assignment simply performs multiple assignments, from left to right:

  2           0 LOAD_CONST               1 (42)
              2 DUP_TOP
              4 STORE_FAST               0 (a)
              6 DUP_TOP
              8 STORE_FAST               1 (b)
             10 STORE_FAST               2 (c)
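
You can reproduce this listing yourself with the dis module; a minimal sketch (the exact opcodes vary by version; DUP_TOP appears on pre-3.11 interpreters):

import dis

def chained():
    a = b = c = 42

# Disassemble the function to see the bytecode for the chained assignment
dis.dis(chained)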
Some Python modules are compiled into the interpreter itself. They are called built-in modules, not to be confused with the standard library. One can use sys.builtin_module_names to get the full list of such modules. Notable examples are sys, gc and time.

Usually you don't care whether a module is built-in or not; however, you should be aware that import always looks for a module among built-ins first. So the built-in sys module is loaded even if you have sys.py available. On the other hand, if you have, say, datetime.py in the current directory, it can indeed be loaded instead of the standard datetime module.
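
You can check this yourself; a minimal sketch:

import sys

# sys is compiled into the interpreter; datetime is a regular
# standard-library module, so it's not on the built-in list.
print('sys' in sys.builtin_module_names)       # True
print('datetime' in sys.builtin_module_names)  # False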
One of the techniques of metaprogramming is to use code generation. That means that you write some program that produces another program as an output. Even if your goal is to create Python code, the program that generates code can be written in any language.

Here is a Perl program that generates a line of Python source code, which is then executed by the Python interpreter:

$ perl -e 'print "print(100)"' | python
100

It's worth noting that not every syntactically correct program can be executed: an expression may simply be too long. That usually doesn't happen when you write source code manually, but it's entirely possible to run into such a problem while generating code:

$ python -c 'print("not " * 1000000 + "False")' | python
s_push: parser stack overflow
MemoryError
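
The generating program may, of course, be Python itself; a minimal sketch that builds source code as a string and executes it:

# Generate a small Python program as a string...
source = '\n'.join(f'print({i} ** 2)' for i in range(3))
# ...and execute it: prints 0, 1 and 4.
exec(source)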
In Python, you can easily shadow any standard name with a regular variable in the global namespace:

>>> print = 42
>>> print(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable


That may be helpful if your module defines functions that share names with built-in ones. It also happens if you practice metaprogramming and accept an arbitrary string as an identifier.

However, even if you shadow some built-in names, you may still want access to the things they originally referred to. The builtins module exists exactly for that:


>>> import builtins
>>> print = 42
>>> builtins.print(1)
1


The __builtins__ variable is also available in most modules. There is a catch, though. First, it is a CPython implementation detail and usually should not be used at all. Second, __builtins__ may refer to either builtins or builtins.__dict__, depending on how exactly the current module was loaded.
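
By the way, if you just want the built-in back, deleting the shadowing global is enough:

>>> print = 42
>>> del print          # removes the global name...
>>> print('restored')  # ...so the built-in is visible again
restored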
Dealing with exceptions in asynchronous programs may not be a simple task.

In asyncio, if a coroutine raises an exception, it's propagated to the code that awaits the corresponding future. If multiple places await it, every one of them gets the exception (since it's stored in the future). The following code prints error five times:

import asyncio

async def error():
    await asyncio.sleep(1)
    raise ValueError()

async def waiter(task):
    try:
        await task
    except ValueError:
        print('error')
    else:
        print('OK')

async def main():
    task = asyncio.get_event_loop().create_task(error())

    for _ in range(5):
        asyncio.get_event_loop().create_task(waiter(task))

    await asyncio.sleep(2)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())


If an exception is raised but the task is never awaited, the exception is lost. In that case, when the task is destroyed, Python warns you with a “Task exception was never retrieved” message.

When you use await asyncio.gather(*tasks) and one of the tasks raises an exception, it is propagated to you. However, if multiple tasks raise exceptions, you still only get the first one; the others are silently lost:

import asyncio

async def error(i):
    await asyncio.sleep(1)
    raise ValueError(i)

async def main():
    try:
        await asyncio.gather(
            error(1),
            error(2),
            error(3),
        )
    except ValueError as e:
        print(e)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())


You may call gather with return_exceptions=True, which makes it return exceptions as though they were regular result values. The following code prints [42, ValueError(2), ValueError(3)]:

import asyncio

async def error(i):
    await asyncio.sleep(1)
    if i > 1:
        raise ValueError(i)
    return 42

async def main():
    results = await asyncio.gather(
        error(1),
        error(2),
        error(3),
        return_exceptions=True,
    )

    print(results)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Python has a very short list of built-in constants. One of them is Ellipsis, which can also be written as .... This constant has no special meaning for the interpreter but is used in places where such syntax looks appropriate.

numpy supports Ellipsis as a __getitem__ argument: e.g. x[...] returns all elements of x.
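
A small sketch of what that looks like in practice (assuming numpy is installed):

import numpy as np

x = np.arange(8).reshape(2, 2, 2)
print(x[...].shape)  # (2, 2, 2) — all elements of x
print(x[..., 0])     # the same as x[:, :, 0]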

PEP 484 defines an additional meaning: Callable[..., type] is a way to define the type of a callable with unspecified argument types.

Finally, you can use ... to indicate that a function is not yet implemented. This is completely valid Python code:

def x():
    ...
When you do obj.x = y, you can't be sure that the attribute of obj named x is now equal to y. The Python descriptor protocol lets you define how attribute assignment is handled.

class Descriptor:
    def __set__(self, obj, value):
        obj.test = value

class A:
    x = Descriptor()


In this example, x is never assigned, but the test attribute is assigned instead:

In : a = A()
In : a.x = 42
In : a.test
Out: 42
In : a.x
Out: <__main__.Descriptor at 0x7ff7baef51d0>


In case you actually need to change the x attribute, as part of tests or advanced metaprogramming, you have to modify __dict__ directly:

In : a.__dict__['x'] = 42

In : a.x
Out: 42
itertools.tee() creates multiple iterators from a single one. That may be helpful if numerous consumers need to read the same stream.

In : from itertools import tee

In : a, b, c = tee(iter(input, ''), 3)

In : next(a), next(c)
FIRST
Out: ('FIRST', 'FIRST')

In : next(a), next(b)
SECOND
Out: ('SECOND', 'FIRST')

In : next(a), next(b), next(c)
THIRD
Out: ('THIRD', 'SECOND', 'SECOND')


The data that is not yet consumed by all iterators is stored in memory. If some of the created iterators haven't even started by the time another one is finished, that means all of the generated elements are kept in memory for future use. In that case, it's simpler and more efficient to use list(iter(input, '')) instead of tee.
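
A non-interactive sketch of the same behavior:

from itertools import tee

numbers = iter(range(3))
a, b = tee(numbers, 2)
print(next(a), next(a))  # 0 1 — a ran ahead
print(next(b))           # 0 — tee buffered the values b hasn't seen yet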
To store any information in memory or on a storage device, you should represent it in bytes. Python usually provides the level of abstraction where you can think about data itself, not its byte form.

Still, when you write, say, a string to a file, you deal with the physical structure of the data. To put characters into a file, you have to transform them into bytes; that is called encoding. When you get bytes from a file, you probably want to convert them into meaningful characters; that is called decoding.

There are hundreds of encodings out there. The most popular one is probably Unicode, but you can't transform anything into bytes with it: in the sense of byte representation, Unicode is not even an encoding. Unicode defines a mapping between characters and their integer codes: 🐍 is 128013 (U+1F40D), for example.

But to put integers into a file, you need a real encoding. Unicode is usually paired with UTF-8, which is (usually) the default in Python. When you read from a file, Python automatically decodes UTF-8. You can choose any other encoding with the encoding= parameter of the open function, or you can read plain bytes by appending b to its mode.
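
Both directions in a minimal sketch:

s = '🐍'
print(ord(s))                # 128013 — the Unicode code point
data = s.encode('utf-8')     # encoding: characters to bytes
print(data)                  # b'\xf0\x9f\x90\x8d'
print(data.decode('utf-8'))  # decoding: bytes back to characters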
PEP 492 introduces the async and await keywords not only for asyncio but as a universal Python mechanism for any asynchronous library. It significantly influenced the Python data model by introducing a lot of new magic methods, such as __await__, __aiter__, __anext__ and many others.

That said, there are different libraries in the wild that can be used instead of asyncio. One of them is uvloop, which not only uses the new API introduced by PEP 492 but also mimics the asyncio interface completely, since it's meant to be a fast drop-in replacement for asyncio.

uvloop is created by the author of the PEP mentioned above and is built on top of libuv, the asynchronous library that powers Node.js. uvloop is claimed to be at least twice as fast as asyncio.
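
Switching to it takes one line; a minimal sketch (assuming uvloop is installed):

import asyncio
import uvloop

# Install uvloop's event loop policy; newly created loops are uvloop's.
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

async def main():
    print(type(asyncio.get_event_loop()))  # <class 'uvloop.Loop'>

loop = asyncio.get_event_loop()
loop.run_until_complete(main())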
If you create new objects inside your __init__, it may be better to pass them in as arguments and provide a factory method instead. That separates business logic from the technical details of how objects are created.

In this example __init__ accepts host and port to construct a database connection:

class Query:
    def __init__(self, host, port):
        self._connection = Connection(host, port)


A possible refactoring is:

class Query:
    def __init__(self, connection):
        self._connection = connection

    @classmethod
    def create(cls, host, port):
        return cls(Connection(host, port))


This approach has at least these advantages:

• It makes dependency injection easy. You can do Query(FakeConnection()) in your tests.
• The class can have as many factory methods as needed; the connection may be constructed not only by host and port but also by cloning another connection, reading a config file or object, using the default, etc.
• Such factory methods can be turned into asynchronous functions, as sketched below; this is completely impossible for __init__.
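
A sketch of the asynchronous variant; connect() here is a hypothetical coroutine standing in for real connection setup:

import asyncio

async def connect(host, port):
    # Hypothetical stand-in for real asynchronous connection setup.
    await asyncio.sleep(0)
    return (host, port)

class Query:
    def __init__(self, connection):
        self._connection = connection

    @classmethod
    async def create(cls, host, port):
        # __init__ can't await anything, but a factory classmethod can.
        return cls(await connect(host, port))

query = asyncio.get_event_loop().run_until_complete(
    Query.create('localhost', 5432)
)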
The same string can be represented in different ways in Unicode and the standard is aware of it. It defines two types of equivalence: sequences can be canonically equivalent or compatible.

Canonically equivalent sequences look exactly the same but contain different code points. For example, ö can be just LATIN SMALL LETTER O WITH DIAERESIS (U+00F6) or a combination of o and a diaeresis modifier: LATIN SMALL LETTER O (U+006F) + COMBINING DIAERESIS (U+0308).

Compatible sequences look different but may be treated the same semantically, e. g. ﬀ and ff.

For each of these types of equivalence, you can normalize a Unicode string by composing or decomposing sequences. In Python, you can use unicodedata for this:

import unicodedata

modes = [
    # Compose canonically equivalent
    'NFC',
    # Decompose canonically equivalent
    'NFD',
    # Compose compatible
    'NFKC',
    # Decompose compatible
    'NFKD',
]
s = 'ﬀ + ö'

for mode in modes:
    norm = unicodedata.normalize(mode, s)
    print('\t'.join([
        mode,
        norm,
        str(len(norm.encode('utf8'))),
    ]))


NFC	ﬀ + ö	8
NFD	ﬀ + ö	9
NFKC	ff + ö	7
NFKD	ff + ö	8
Usually, you communicate with a generator by asking for data with next(gen). You can also send some values back with gen.send(x) in Python 3. But the technique you probably don't use every day, and maybe aren't even aware of, is throwing exceptions inside a generator.

With gen.throw(e), you may raise an exception at the point where the gen generator is paused, i. e. at some yield. If gen catches the exception, gen.throw(e) returns the next value yielded (or StopIteration is raised). If gen doesn't catch the exception, it propagates back to you.

In : def gen():
...:     try:
...:         yield 1
...:     except ValueError:
...:         yield 2
...:
...: g = gen()
...:

In : next(g)
Out: 1

In : g.throw(ValueError)
Out: 2

In : g.throw(RuntimeError('TEST'))
...
RuntimeError: TEST

You can use it to control generator behavior more precisely: not only by sending data in, but also by notifying it about problems with the values yielded, for example. But this is rarely required, and you have little chance of encountering gen.throw in the wild.

However, the @contextmanager decorator from contextlib does exactly this to let the code inside the context catch exceptions.

In : from contextlib import contextmanager
...:
...: @contextmanager
...: def atomic():
...:     print('BEGIN')
...:
...:     try:
...:         yield
...:     except Exception:
...:         print('ROLLBACK')
...:     else:
...:         print('COMMIT')
...:

In : with atomic():
...:     print('ERROR')
...:     raise RuntimeError()
...:
BEGIN
ERROR
ROLLBACK
The standard path expansion mechanism in a shell is called globbing. Patterns you use to match paths are called globs.

$ echo /li*
/lib /lib64


Python supports globbing via the glob module. However, there is an important caveat: a shell returns the pattern itself if no files are matched, while Python doesn't:

$ echo /zz**
/zz**


$ python -c 'from glob import glob; print(glob("/zz**"))'
[]
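
If you want the shell-like fallback, it's easy to emulate; a small sketch with a hypothetical helper name:

from glob import glob

def glob_like_shell(pattern):
    # Mimic a non-nullglob shell: return the pattern itself
    # when nothing matches.
    return glob(pattern) or [pattern]

print(glob_like_shell('/zz**'))  # ['/zz**']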
The super() function allows referring to the base class. This can be extremely helpful when a derived class wants to add something to a method implementation instead of overriding it completely:

class BaseTestCase(TestCase):
    def setUp(self):
        self._db = create_db()

class UserTestCase(BaseTestCase):
    def setUp(self):
        super().setUp()
        self._user = create_user()


The function's name doesn't mean excellent or very good: the word super implies above in this context (as in superintendent). Despite what I said earlier, super() doesn't always refer to the base class; it can easily return a sibling. A more proper name would be next(), since the next class according to the MRO is returned.

class Top:
    def foo(self):
        return 'top'

class Left(Top):
    def foo(self):
        return super().foo()

class Right(Top):
    def foo(self):
        return 'right'

class Bottom(Left, Right):
    pass

# prints 'right'
print(Bottom().foo())


Mind that super() may produce different results, since they depend on the MRO of the original call:

>>> Bottom().foo()
'right'
>>> Left().foo()
'top'
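
Printing the MRO shows why: in Bottom's MRO the class right after Left is Right, so Left's super() resolves to Right there.

print([c.__name__ for c in Bottom.__mro__])
# ['Bottom', 'Left', 'Right', 'Top', 'object']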
The creation of a class consists of two big steps. First, the class body is evaluated, just like any function body. Second, the resulting namespace (the one returned by locals()) is used by a metaclass (type by default) to construct an actual class object.

class Meta(type):
    def __new__(meta, name, bases, ns):
        print(ns)
        return super().__new__(
            meta, name,
            bases, ns
        )


class Foo(metaclass=Meta):
    B = 2
    B = 3


The above code prints {'__module__': '__main__', '__qualname__': 'Foo', 'B': 3}.

Obviously, if you do something like B = 2; B = 3, the metaclass only knows about B = 3, since only that value is in ns. This limitation stems from the fact that a metaclass works after the body evaluation.

However, you can interfere in the evaluation by providing a custom namespace. By default, a simple dictionary is used, but you can supply a custom dictionary-like object via the metaclass's __prepare__ method.

class CustomNamespace(dict):
    def __setitem__(self, key, value):
        print(f'{key} -> {value}')
        return super().__setitem__(key, value)


class Meta(type):
    def __new__(meta, name, bases, ns):
        return super().__new__(
            meta, name,
            bases, ns
        )

    @classmethod
    def __prepare__(metacls, cls, bases):
        return CustomNamespace()


class Foo(metaclass=Meta):
    B = 2
    B = 3


The output is the following:

__module__ -> __main__
__qualname__ -> Foo
B -> 2
B -> 3


And this is how enum.Enum is protected from duplicates.
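
Enum's namespace, returned from its metaclass's __prepare__, refuses to assign the same name twice; a minimal demonstration:

from enum import Enum

class Color(Enum):
    RED = 1
    RED = 2  # TypeError; CPython reports "Attempted to reuse key: 'RED'"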
The map function calls another function for every element of some iterable. That means the function should accept a single value as an argument:

In : list(map(lambda x: x ** 2, [1, 2, 3]))
Out: [1, 4, 9]


However, if each element of the iterable is a tuple, it would be nice to pass each element of that tuple as a separate argument. That was possible in Python 2, thanks to tuple parameter unpacking (note the parentheses):

>>> map(lambda (a, b): a + b, [(1, 2), (3, 4)])
[3, 7]


In Python 3, this feature is gone, but there is another solution: itertools.starmap unpacks the tuple for you, as though the function were called with a star, f(*arg) (hence the name):

In : from itertools import starmap

In : list(starmap(lambda a, b: a + b, [(1, 2), (3, 4)]))
Out: [3, 7]
Lambdas in Python can't do a lot of things that ordinary functions can: a lambda body is limited to a single expression, you can't use statements (a = b and the like), nor yield or await, and lambdas aren't allowed to have type hints or be declared async.

However, if you really need to turn a lambda into an asynchronous function, you can use the asyncio.coroutine decorator. It was useful up to Python 3.4, before the async keyword was introduced, but it's of little use in modern Python (the decorator was deprecated and eventually removed in Python 3.11).

In : f = asyncio.coroutine(lambda x: x ** 2)
In : asyncio.get_event_loop().run_until_complete(f(12))
Out: 144


Of course, that doesn't allow you to use await inside the lambda.
When you use the multiprocessing module and there is an exception in one of the processes, it's propagated to the main program via pickling: the exception is pickled, passed to another process and then unpickled back.

However, pickling exceptions may be tricky. An exception is created with any number of arguments, which are stored in the attribute named args. The same arguments are used to recreate the exception object during unpickling.

That might not work as you expect if inheritance is in use. Look at the example:

import pickle

class TooMuchWeightError(Exception):
    def __init__(self, weight):
        super().__init__()
        self._weight = weight

pickled = pickle.dumps(TooMuchWeightError(42))
pickle.loads(pickled)


TooMuchWeightError.__init__ calls Exception.__init__, which sets args to an empty tuple. This empty tuple is then used as the arguments during unpickling, which obviously leads to:

TypeError: __init__() missing 1 required positional argument: 'weight'

A workaround is either not to call super().__init__() at all (which is usually not a nice thing to do in a subclass) or to pass all arguments to the parent's constructor explicitly:

class TooMuchWeightError(Exception):
    def __init__(self, weight):
        super().__init__(weight)
        self._weight = weight
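
Another option (a sketch, not from the original post) is to customize pickling via __reduce__ and restore the constructor arguments explicitly:

import pickle

class TooMuchWeightError(Exception):
    def __init__(self, weight):
        super().__init__()
        self._weight = weight

    def __reduce__(self):
        # Tell pickle to recreate the object as
        # TooMuchWeightError(self._weight).
        return (type(self), (self._weight,))

err = pickle.loads(pickle.dumps(TooMuchWeightError(42)))
print(err._weight)  # 42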
matplotlib is a complex and flexible Python plotting library. It's supported by a wide range of products, including Jupyter and PyCharm.

This is how you draw a simple fractal figure with matplotlib: https://repl.it/@VadimPushtaev/myplotlib
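
For a taste of the API, a minimal sketch (assuming matplotlib is installed):

import matplotlib.pyplot as plt

# Plot y = x ** 2 for a handful of points and show the figure.
xs = range(10)
plt.plot(xs, [x ** 2 for x in xs])
plt.show()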