Python etc
Regular tips about Python and programming in general

Owner — @pushtaev

© CC BY-SA 4.0 — mention if repost
PEP 424 allows generators and other iterable objects that don't have the exact predefined size to expose a length hint. For example, the following generator will likely return ~50 elements:

from random import random

(x for x in range(100) if random() > 0.5)



If you write an iterable and want to add the hint, define the __length_hint__ method. If the length is known for sure, use __len__ instead.

If you use an iterable and want to know its expected length, use operator.length_hint.
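For illustration, here is a small sketch of both sides of the protocol: a hypothetical HalfOfRange iterable that exposes a hint, and operator.length_hint reading it (plain generators provide no hint, so the default is returned for them):

```python
import operator
from random import random

# Consumer side: length_hint tries __len__, then __length_hint__,
# then falls back to the given default.
gen = (x for x in range(100) if random() > 0.5)
print(operator.length_hint(gen, 0))  # generators give no hint -> 0

# Producer side: a custom iterable exposing an estimate.
class HalfOfRange:
    """Yields roughly half of range(n); the exact size is unknown."""
    def __init__(self, n):
        self._n = n

    def __iter__(self):
        return (x for x in range(self._n) if random() > 0.5)

    def __length_hint__(self):
        return self._n // 2  # an estimate, not a guarantee

print(operator.length_hint(HalfOfRange(100)))  # 50
```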
You can call a function with both concrete keyword arguments and a dictionary of arguments to unpack:

def test(*args, **kwargs):
    print(args, kwargs)

kwargs = dict(a=1)

test(b=2, **kwargs)



However, having the same key in the dictionary that is already specified explicitly as a keyword argument leads to an error:

>>> kwargs = {'a': 2}
>>> test(a=1, **kwargs)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: test() got multiple values for keyword argument 'a'
Every Python class has two magic attributes which can be used to learn about its base classes.

The first is __bases__; it returns immediate parents of the class:

class A:
    pass

class B(A):
    pass

class C(A):
    pass

class D(B, C):
    pass


print(D.__bases__)
# (<class '__main__.B'>, <class '__main__.C'>)


The second is __mro__; it returns a tuple with all the classes that are used during method resolution (hence the name), i.e. the parents, their parents, and so on.

print(D.__mro__)
# (<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <class 'object'>)
The class of an object is available through the __class__ attribute:

>>> [1, 2].__class__
<class 'list'>



The more conventional way to get the class, however, is to use the type function. Mind that for Python 2 old-style classes type() returns <type 'instance'>, while __class__ still points to the actual class.

>>> type([1, 2])
<class 'list'>



Also, if you want to check whether some object is an instance of the given class, you should use isinstance instead of comparison:

>>> class A:
...     pass
...
>>> class B(A):
...     pass
...
>>> type(B())
<class '__main__.B'>
>>> isinstance(B(), A)
True
The in operator can be used with generators: x in g. Python will iterate over g until x is found or g is exhausted.

>>> def g():
...     print(1)
...     yield 1
...     print(2)
...     yield 2
...     print(3)
...     yield 3
...
>>> 2 in g()
1
2
True


However, range() does more than this for you. It has the __contains__ magic method overridden, which allows in to work with O(1) complexity:

In [1]: %timeit 10**20 in range(10**30)
375 ns ± 10.7 ns per loop


Mind that it doesn't work for the Python 2 xrange() function.
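You can implement the same trick in your own classes. A sketch of a hypothetical infinite arithmetic progression that answers membership in O(1), the way range does for integers:

```python
class Arith:
    """The numbers start, start+step, start+2*step, ... (infinite)."""
    def __init__(self, start, step):
        self.start = start
        self.step = step

    def __iter__(self):
        value = self.start
        while True:
            yield value
            value += self.step

    def __contains__(self, item):
        # O(1): check divisibility instead of iterating.
        return item >= self.start and (item - self.start) % self.step == 0

print(10**20 in Arith(0, 2))  # True, instantly
```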
In Python, += and + are two separate operators; the __iadd__ and __add__ methods, respectively, are responsible for their behavior.

class A:
    def __init__(self, x):
        self.x = x

    def __iadd__(self, another):
        self.x += another.x
        return self

    def __add__(self, another):
        return type(self)(self.x + another.x)


If __iadd__ is not defined, a += b falls back to the simple a = a + b.

The usual difference between += and + is that the first one changes the object while the second produces a new one:

>>> a = [1, 2, 3]
>>> b = a
>>> a += [4]
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>> a = a + [5]
>>> a
[1, 2, 3, 4, 5]
>>> b
[1, 2, 3, 4]
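The class A defined above shows the same distinction: __iadd__ mutates in place and keeps the object's identity, while __add__ builds a fresh object.

```python
class A:
    def __init__(self, x):
        self.x = x

    def __iadd__(self, another):
        self.x += another.x
        return self

    def __add__(self, another):
        return type(self)(self.x + another.x)

a = b = A(1)
a += A(2)        # __iadd__ mutates a in place ...
print(a is b)    # True: b sees the change too
a = a + A(3)     # ... while __add__ returns a new object
print(a is b)    # False
print(a.x, b.x)  # 6 3
```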
In Python, an else block can appear not only after if, but after for and while as well. The code inside else is executed unless the loop was interrupted by break.

A common usage is to search for something in a loop and break when it's found:

In : first_odd = None
In : for x in [2,3,4,5]:
...:     if x % 2 == 1:
...:         first_odd = x
...:         break
...: else:
...:     raise ValueError('No odd elements in list')
...:
In : first_odd
Out: 3

In : for x in [2,4,6]:
...:     if x % 2 == 1:
...:         first_odd = x
...:         break
...: else:
...:     raise ValueError('No odd elements in list')
...:
...
ValueError: No odd elements in list
The creation of a class consists of two big steps. First, the class body is evaluated, just like any function body. Second, the resulting namespace (the one that is returned by locals()) is used by a metaclass (type by default) to construct an actual class object.

class Meta(type):
    def __new__(meta, name, bases, ns):
        print(ns)
        return super().__new__(
            meta, name,
            bases, ns
        )


class Foo(metaclass=Meta):
    B = 2


The above code prints {'__module__': '__main__', '__qualname__': 'Foo', 'B': 2}.

Obviously, if you do something like B = 2; B = 3, then the metaclass only knows about B = 3, since only that value ends up in ns. This limitation stems from the fact that a metaclass works after the body evaluation.

However, you can interfere in the evaluation by providing a custom namespace. By default, a simple dictionary is used, but you can provide a custom dictionary-like object via the metaclass __prepare__ method.

class CustomNamespace(dict):
    def __setitem__(self, key, value):
        print(f'{key} -> {value}')
        return super().__setitem__(key, value)


class Meta(type):
    def __new__(meta, name, bases, ns):
        return super().__new__(
            meta, name,
            bases, ns
        )

    @classmethod
    def __prepare__(metacls, cls, bases):
        return CustomNamespace()


class Foo(metaclass=Meta):
    B = 2
    B = 3


The output is the following:

__module__ -> __main__
__qualname__ -> Foo
B -> 2
B -> 3


And this is how enum.Enum is protected from duplicates.
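For illustration, a minimal sketch of the same protection. Enum's real implementation uses a more elaborate _EnumDict, so the names below are hypothetical:

```python
class NoDuplicatesNamespace(dict):
    def __setitem__(self, key, value):
        # Reject re-assignment of public names in the class body.
        if not key.startswith('_') and key in self:
            raise TypeError(f'Attempted to reuse key: {key!r}')
        super().__setitem__(key, value)


class NoDuplicatesMeta(type):
    @classmethod
    def __prepare__(metacls, cls, bases):
        return NoDuplicatesNamespace()


class Color(metaclass=NoDuplicatesMeta):
    RED = 1
    GREEN = 2
    # GREEN = 3  # would raise TypeError at class creation time
```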
The else block for both for and try statements is pretty rarely used. However, by combining them you can write code that iterates over a collection until the first success, without extra flags:

import logging
from typing import List, Optional

logging.basicConfig(level=logging.DEBUG)

def first_int(iterable: List[str]) -> Optional[int]:
    for x in iterable:
        try:
            result = int(x)
        except ValueError:
            logging.debug('Bad int: %s', x)
        else:
            break
    else:
        result = None
        logging.error('No int found')

    return result

print(first_int(('a', 'b', '42', 'c')))


Output:

DEBUG:root:Bad int: a
DEBUG:root:Bad int: b
42
There is always a cost to writing tests. It comes in different forms at all lifecycle stages, which means we need tools and approaches to keep it in check. Every Tuesday, @NikolayRys and I will present here an installment of a series of 16 software design patterns that you can apply to your production code to have an easier time during testing. They are based on the idea that different programming techniques affect our ability to have automatic tests in different ways, so there is particular value in letting them shape our implementation.
Pattern 1: Dependency Injection

Motivation

I'll start with the most well-known pattern: Dependency Injection, which is a case of the "D" in the SOLID acronym. Among those five principles, this one has the most direct and significant influence on the ease of testing. There is already plenty of literature about it, but for the sake of completeness, it is included here as an overview.

Usually, to set up a test you need to gain control over all the components that affect the outcome or cause side effects. But there are also many other cases when you might want to isolate a piece of functionality from the tested code: for example, if it complicates or prevents automatic testing in some way by being too slow, using threads, being non-deterministic, affecting global state, or working with I/O in all its diversity. The list goes on, but all these components are, in general, harder to manipulate than pure code, so splitting them from the business logic yields testing benefits.

Approach

This pattern proposes to turn a hard-wired dependency into an argument to make it configurable and explicit. This gives us control over the input, so such a component can be substituted with a convenient test double if necessary and tested separately. In production, meanwhile, it can be equipped with the dependency that was in place before, for example by using a default value. Another benefit is that you get rid of implicit dependencies, which reduces the risk of missing important test cases.

Example

The initial implementation of a function:

def perform_job(params_data):
    # It contains several implicit dependencies
    logging.getLogger(__name__).info(
        f"Execute API with {params_data}"
    )

    api_client = APIClient()
    prepared_data = prepare_data(params_data)

    api_client.execute(prepared_data, datetime.now())
    api_client.disconnect()


This function contains several components that we would like to move out. There are many possible ways to accomplish this: for example, in an object-oriented language we can move the new params to the constructor of the object, or just use some existing DI library. But we will use the most straightforward version of the pattern, which does not require any complex DI libraries or containers, because all such tools are based on the same idea and their availability depends on the language of your choice.

After the refactoring we get the following, which lets us control the input and observe the outcome in tests:

def perform_job(
    params_data,
    api_client=None,
    execute_at=None,
    logger=None,
):
    if api_client is None:
        api_client = APIClient()
    if execute_at is None:
        execute_at = datetime.now()
    if logger is None:
        logger = logging.getLogger(__name__)

    logger.info(
        f"Execute API with {params_data}"
    )
    prepared_data = prepare_data(params_data)
    api_client.execute(prepared_data, execute_at)
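To see the payoff, here is a sketch of a test that feeds the refactored function simple fakes. The FakeAPIClient, the prepare_data stand-in, and the condensed perform_job below are hypothetical stand-ins, not the real implementation:

```python
import logging
from datetime import datetime

# Hypothetical test double standing in for the real API client.
class FakeAPIClient:
    def __init__(self):
        self.executed = []

    def execute(self, data, at):
        self.executed.append((data, at))


def prepare_data(data):
    # Stand-in for the real prepare_data helper.
    return data


# A condensed version of the refactored function above.
def perform_job(params_data, api_client, execute_at, logger):
    logger.info("Execute API with %s", params_data)
    api_client.execute(prepare_data(params_data), execute_at)


# The test now controls every input and can assert on the outcome.
client = FakeAPIClient()
moment = datetime(2020, 1, 1)
perform_job({'id': 1}, client, moment, logging.getLogger('test'))
assert client.executed == [({'id': 1}, moment)]
```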
bytes and str objects can't be concatenated directly:

>>> b'abc' + 'def'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str


To do that, you should either decode bytes or encode the string:

>>> b'abc'.decode('latin1') + 'def'
'abcdef'
>>> b'abc' + 'def'.encode('latin1')
b'abcdef'


bytes and str also have two different incompatible join methods:

>>> b':'.join([b'a', b'b'])
b'a:b'
>>> ':'.join(['a', 'b'])
'a:b'
You can't store a function as a class attribute because it's automatically converted into a method if accessed through an instance:

>>> class A:
...     CALLBACK = lambda x: x ** x
...
>>> A.CALLBACK
<function A.<lambda> at 0x7f68b01ab6a8>
>>> A().CALLBACK
<bound method A.<lambda> of <__main__.A object at 0x7f68b01aea20>>
>>> A().CALLBACK(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: <lambda>() takes 1 positional argument but 2 were given


You can wrap the function into a trivial descriptor as a workaround:

>>> class FunctionHolder:
...     def __init__(self, f):
...         self._f = f
...     def __get__(self, obj, objtype):
...         return self._f
...
>>> class A:
...     CALLBACK = FunctionHolder(lambda x: x ** x)
...
>>> A().CALLBACK
<function A.<lambda> at 0x7f68b01ab950>


Using a class method instead of an attribute can also be a solution:

class A:
    @classmethod
    def _get_callback(cls):
        return lambda x: x ** x
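In Python 3, wrapping the function in staticmethod is arguably the most direct workaround, since it suppresses the binding machinery:

```python
class A:
    # staticmethod makes attribute access return the function
    # as-is, from both the class and its instances, so no
    # implicit self is passed.
    CALLBACK = staticmethod(lambda x: x ** x)

print(A.CALLBACK(4))    # 256
print(A().CALLBACK(4))  # 256
```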
Sometimes you want to run a code block with multiple context managers:

with open('f') as f:
    with open('g') as g:
        with open('h') as h:
            pass


Since Python 2.7 and 3.1, you can do it with a single with statement:

o = open
with o('f') as f, o('g') as g, o('h') as h:
pass


Before that, you could use the contextlib.nested function:

with nested(o('f'), o('g'), o('h')) as (f, g, h):
pass


If you are working with an unknown number of context managers, a more advanced tool suits you well. contextlib.ExitStack allows you to enter any number of contexts at arbitrary times but guarantees to exit them at the end:

with ExitStack() as stack:
    f = stack.enter_context(o('f'))
    g = stack.enter_context(o('g'))
    other = [
        stack.enter_context(o(filename))
        for filename in filenames
    ]
This post is the second installment of the series of 16 software design patterns dedicated to improving code testability. Published every Tuesday.
Boundary classes

Motivation

If we apply the previous pattern, we still need to decide what to do with the extracted dependencies if they turn out to be particularly hard to test. As an extreme example, consider a situation when your code launches rockets: you still have to make sure that everything works together, but you cannot afford for this side effect to happen.

Unfortunately, current unit testing technology does not usually allow tests to cross the border between computer memory and the outside world. All such facilities need to be designed with testing in mind. In the case of rockets, we would want them to support some form of a "fake start"/"dry run" feature, or to have some other facility that imitates them. Either option might not be feasible.

In this case, it may be reasonable to fall back to the approach of last resort: excluding them from automatic test coverage altogether. With a conscious attempt to reduce the amount of code that stays uncovered, you can cost-effectively minimize the risks.

Approach

The idea is to extract an automatically untestable feature into a simple Boundary Class and encapsulate it there with the absolute minimum of code, leaving it completely free of business logic. It is important to note that I do not suggest you ship it untested, but rather that you do not include it as part of the automatic test suite. You are still expected to verify it manually at a higher level of the system.

Example

# All additional code or calculations are
# moved out from this class.
# Only the unavoidable and untestable interactions
# with the outside world remain.
# We may exclude it from coverage
# without feeling too bad about it.
class RocketGateway:
    LAUNCH_PAD_PORT = 80

    def __init__(self, launch_pad_ip):
        self._launch_pad_ip = launch_pad_ip
        self._connection = None

    def connect(self):
        self._connection = Connection.create(
            self._launch_pad_ip, self.LAUNCH_PAD_PORT
        )

    def disconnect(self):
        if self._connection is not None:
            self._connection.close()

    # The "command_string" is constructed outside,
    # in the tested code
    def send_command(self, command_string):
        self._connection.send(command_string)

    # The processing of received data is
    # also handled outside
    def receive_data(self):
        return list(self._connection.recv_all())
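In the test suite, such a gateway can then be swapped for a trivial fake with the same interface, so the business logic around it stays fully covered. A sketch with hypothetical names:

```python
# Test double matching the boundary class's interface;
# it records interactions instead of touching hardware.
class FakeRocketGateway:
    def __init__(self):
        self.sent = []
        self.connected = False
        self.canned_response = []

    def connect(self):
        self.connected = True

    def disconnect(self):
        self.connected = False

    def send_command(self, command_string):
        self.sent.append(command_string)

    def receive_data(self):
        return self.canned_response


# Hypothetical business logic under test: it talks to whatever
# gateway it is given, real or fake.
def launch_sequence(gateway, codes):
    gateway.connect()
    for code in codes:
        gateway.send_command(f'LAUNCH {code}')
    gateway.disconnect()

fake = FakeRocketGateway()
launch_sequence(fake, ['A1', 'B2'])
assert fake.sent == ['LAUNCH A1', 'LAUNCH B2']
assert not fake.connected
```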
All objects that currently exist in the interpreter memory can be accessed via gc.get_objects():

In : import gc

In : class A:
...:     def __init__(self, x):
...:         self._x = x
...:
...:     def __repr__(self):
...:         class_name = type(self).__name__
...:         x = self._x
...:         return f'{class_name}({x!r})'
...:

In : A(1)
Out: A(1)

In : A(2)
Out: A(2)

In : A(3)
Out: A(3)

In : [x for x in gc.get_objects() if isinstance(x, A)]
Out: [A(1), A(2), A(3)]
Since Python 3.0, raising an exception inside an except block automatically stores the caught exception in the __context__ attribute of the new one. That causes both exceptions to be printed:

try:
    1 / 0
except ZeroDivisionError:
    raise ValueError('Zero!')


Traceback (most recent call last):
  File "test.py", line 2, in <module>
    1 / 0
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    raise ValueError('Zero!')
ValueError: Zero!


You can also set __cause__ on any exception with the raise ... from expression:

division_error = None

try:
    1 / 0
except ZeroDivisionError as e:
    division_error = e

raise ValueError('Zero!') from division_error


Traceback (most recent call last):
  File "test.py", line 4, in <module>
    1 / 0
ZeroDivisionError: division by zero

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    raise ValueError('Zero!') from division_error
ValueError: Zero!
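Conversely, the implicit context can be suppressed with raise ... from None. The original exception is still stored in __context__, but __suppress_context__ is set so the traceback shows only the new one. A small sketch:

```python
err = None
try:
    try:
        1 / 0
    except ZeroDivisionError:
        # "from None" hides the chained exception from the traceback
        raise ValueError('Zero!') from None
except ValueError as e:
    err = e

print(err.__context__)           # the original error is still stored
print(err.__suppress_context__)  # True: it just won't be displayed
```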
In : int('୧৬𝟙༣')
Out: 1613


0 1 2 3 4 5 6 7 8 9 are not the only characters that are considered digits. Python follows Unicode rules and treats several hundred symbols as digits; here is the full list.

That affects functions like int, str.isdecimal, and even re.match:

In : int('෯')
Out: 9

In : '٢'.isdecimal()
Out: True

In : bool(re.match(r'\d', '౫'))
Out: True
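If you only want ASCII digits, you have to ask for them explicitly; for example:

```python
import re

s = '٢'  # ARABIC-INDIC DIGIT TWO

print(s.isdecimal())                       # True: Unicode-aware
print(bool(re.match(r'[0-9]', s)))         # False: explicit ASCII range
print(bool(re.match(r'\d', s, re.ASCII)))  # False: \d limited to ASCII
```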
>>> bool(datetime(2018, 1, 1).time())
False
>>> bool(datetime(2018, 1, 1, 13, 12, 11).time())
True


Before Python 3.5, datetime.time objects were considered false if they represented UTC midnight. That can lead to obscure bugs. In the following example, if not may trigger not because created_time is None, but because it's midnight.

def create(created_time=None) -> None:
    if not created_time:
        created_time = datetime.now().time()


You can fix that by explicitly testing for None: if created_time is None.
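A sketch of the fixed function; the signature is adjusted here to return the value so the fix is observable:

```python
from datetime import datetime, time
from typing import Optional

def create(created_time: Optional[time] = None) -> time:
    # Compare to None explicitly: midnight is falsy before
    # Python 3.5, but it is still a perfectly valid time.
    if created_time is None:
        created_time = datetime.now().time()
    return created_time

print(create(time(0, 0)))  # midnight survives instead of being replaced
```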