Python etc
Regular tips about Python and programming in general

Owner — @pushtaev

© CC BY-SA 4.0 — mention if repost
Python provides a powerful library for working with date and time: datetime. The interesting part is that datetime objects have a special interface for timezone support (namely, the tzinfo attribute), but the module itself offers only limited support for that interface, leaving the rest of the job to third-party modules.

The most popular module for this job is pytz. The tricky part is that pytz doesn't fully satisfy the tzinfo interface. The pytz documentation states this in one of its first lines: “This library differs from the documented Python API for tzinfo implementations.”

You can't use pytz timezone objects as a tzinfo attribute. If you try, you may get absolutely insane results:

In : paris = pytz.timezone('Europe/Paris')
In : str(datetime(2017, 1, 1, tzinfo=paris))
Out: '2017-01-01 00:00:00+00:09'


Look at this +00:09 offset (that's the first entry in the zone's history, Paris Local Mean Time). The proper use of pytz is the following:

In : str(paris.localize(datetime(2017, 1, 1)))
Out: '2017-01-01 00:00:00+01:00'


Also, after any arithmetic operation you should normalize your datetime object in case the offset has changed (on the edge of a DST period, for instance):

In : time = paris.localize(datetime(2018, 3, 25))
In : new_time = time + timedelta(days=2)
In : str(new_time)
Out: '2018-03-27 00:00:00+01:00'
In : str(paris.normalize(new_time))
Out: '2018-03-27 01:00:00+02:00'


Since Python 3.6, it's recommended to use dateutil.tz instead of pytz. It's fully compatible with tzinfo, can be passed as an attribute, and doesn't require normalize, though it works a bit slower.
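A quick sketch of the dateutil.tz workflow described above (the example dates are ours, chosen around the 2018 Paris DST switch):

```python
from datetime import datetime, timedelta
from dateutil import tz

paris = tz.gettz('Europe/Paris')

# tz.gettz returns a full tzinfo implementation, so it can be attached
# directly, and arithmetic picks the correct offset by itself:
start = datetime(2018, 3, 24, tzinfo=paris)  # before the DST switch
after = start + timedelta(days=2)            # past the DST switch

print(start)  # 2018-03-24 00:00:00+01:00
print(after)  # 2018-03-26 00:00:00+02:00 -- no normalize needed
```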

If you are interested in why pytz doesn't support the datetime API, or wish to see more examples, consider reading the decent article on the topic.
Generators are one of the most influential Python mechanisms. They have many uses, and one of them is creating context managers easily. Usually, you have to define the __enter__ and __exit__ magic methods manually, but the @contextmanager decorator from contextlib makes it far more convenient:

from contextlib import contextmanager

@contextmanager
def atomic():
    print('BEGIN')

    try:
        yield
    except Exception:
        print('ROLLBACK')
    else:
        print('COMMIT')


Now atomic is a context manager that can be used like this:

In : with atomic():
...:     print('ERROR')
...:     raise RuntimeError()
...:
BEGIN
ERROR
ROLLBACK


Additionally, the @contextmanager magic allows the result to be used as a decorator as well as a context manager:

In : @atomic()
...: def ok():
...:     print('OK')
...:
In : ok()
...:
BEGIN
OK
COMMIT
In Python, you can override the square brackets operator ([]) by defining the __getitem__ magic method. An example is a Cycle object that virtually contains an infinite number of repeated elements:

class Cycle:
    def __init__(self, lst):
        self._lst = lst

    def __getitem__(self, index):
        return self._lst[index % len(self._lst)]

print(Cycle(['a', 'b', 'c'])[100])  # prints 'b'


The unusual thing here is that the [] operator supports a unique syntax. It can be used not only like this — [2], but also like this — [2:10], or [2:10:2], or [2::2], or even [:]. The semantics are [start:stop:step], but you can use it any way you want for your custom objects.

But what does __getitem__ get as its index parameter if you call it using that syntax? slice objects exist precisely for this case.

In : class Inspector:
...:     def __getitem__(self, index):
...:         print(index)
...:
In : Inspector()[1]
1
In : Inspector()[1:2]
slice(1, 2, None)
In : Inspector()[1:2:3]
slice(1, 2, 3)
In : Inspector()[:]
slice(None, None, None)


You can even combine tuple and slice syntaxes:

In : Inspector()[:, 0, :]
(slice(None, None, None), 0, slice(None, None, None))


slice does nothing for you except simply store the start, stop and step attributes.

In : s = slice(1, 2, 3)
In : s.start
Out: 1
In : s.stop
Out: 2
In : s.step
Out: 3
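Building on the Cycle example above, here is a sketch of a __getitem__ that handles both plain indices and slice objects, using the slice.indices helper to clamp the bounds (Squares is our made-up class):

```python
class Squares:
    """Virtually contains the squares 0, 1, 4, 9, ... up to a fixed size."""
    def __init__(self, size):
        self._size = size

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slice.indices(length) resolves None and negative values
            # into concrete (start, stop, step) for the given length
            return [i ** 2 for i in range(*index.indices(self._size))]
        if not 0 <= index < self._size:
            raise IndexError(index)
        return index ** 2

print(Squares(10)[2:6])  # [4, 9, 16, 25]
print(Squares(10)[::3])  # [0, 9, 36, 81]
```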
A regular language is a formal language that can be recognized by a finite-state machine (FSM). That means that, reading text character by character, you only need enough memory to remember the current state, and the number of such states is finite.

A beautiful and simple example is a machine that checks whether the input is a simple number like -3, 2.2 or 001. The following diagram is an FSM diagram. Double circles are accept states; they identify where the machine may stop.
The machine starts at (1), possibly matches a minus sign, then processes as many digits as required. After that, it may match a dot (3->4), which must be followed by at least one digit (4->5), possibly more.
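The machine just described can be sketched in a few lines of Python (the state numbering below is our assumption, since the original diagram is not reproduced here):

```python
def is_number(s):
    # States: 1 = start, 2 = after minus, 3 = integer digits (accept),
    #         4 = just after dot, 5 = fraction digits (accept)
    state = 1
    for ch in s:
        if state == 1 and ch == '-':
            state = 2
        elif state in (1, 2, 3) and ch.isdigit():
            state = 3
        elif state == 3 and ch == '.':
            state = 4
        elif state in (4, 5) and ch.isdigit():
            state = 5
        else:
            return False        # no transition: reject
    return state in (3, 5)      # accept states only

print(is_number('-3'), is_number('2.2'), is_number('001'))  # True True True
print(is_number('2.'), is_number('-'))                      # False False
```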

The classic example of a non-regular language is the family of strings like:

a-b
aaa-bbb
aaaaa-bbbbb


Formally, we need lines that consist of N occurrences of a, then -, then N occurrences of b, where N is any integer greater than zero. You can't do this with a finite machine, because you have to remember how many a characters you have encountered, which leads to an infinite number of states.

Regular expressions can match only regular languages. Remember to check whether the text you are trying to process can be handled by an FSM at all. JSON, XML, or even simple arithmetic expressions with nested brackets cannot be.

Mind, however, that a lot of modern regular expression engines are not, strictly speaking, regular. For example, the Python regex module supports recursion (which would help with that aaa-bbb problem).
Apart from regular languages, Chomsky distinguishes three more types (ordered by descending strictness): context-free, context-sensitive, and unrestricted.

Context-free languages are more powerful than regular ones, but can still be efficiently parsed by a program. XML, JSON and SQL are context-free, for example.

Many tools allow you to parse such languages easily. Usually, you define a grammar — the rules of how to parse — and they create a parser automatically. The most popular way to define such a grammar is the BNF language. Here is a grammar for simple arithmetical expressions (only + supported), defined in BNF:

<expr>    ::= <operand> "+" <expr> | <operand>
<operand> ::= "(" <expr> ")" | <const>
<const>   ::= integer


This is a set of rules. An expression is an operand plus another expression, or just an operand. An operand is either a constant or an expression enclosed in brackets. This shows the recursive nature of the language, which makes it non-regular.

An example of a context-free grammar parser for Python is lark. It is what you want when regexes are not enough or the code that does the parsing gets messy.
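For such a tiny grammar, you can also write a recursive-descent parser by hand; each function below mirrors one BNF rule from the grammar above (a sketch, not using lark):

```python
import re

def parse_expr(tokens, pos=0):
    # <expr> ::= <operand> "+" <expr> | <operand>
    value, pos = parse_operand(tokens, pos)
    if pos < len(tokens) and tokens[pos] == '+':
        right, pos = parse_expr(tokens, pos + 1)
        value += right
    return value, pos

def parse_operand(tokens, pos):
    # <operand> ::= "(" <expr> ")" | <const>
    if tokens[pos] == '(':
        value, pos = parse_expr(tokens, pos + 1)
        assert tokens[pos] == ')', 'unbalanced brackets'
        return value, pos + 1
    return int(tokens[pos]), pos + 1  # <const> ::= integer

def evaluate(text):
    tokens = re.findall(r'\d+|[()+]', text)
    value, pos = parse_expr(tokens)
    assert pos == len(tokens), 'trailing garbage'
    return value

print(evaluate('1 + (2 + 3) + 4'))  # 10
```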
Since BNF is a context-free language itself, you can represent its syntax with a BNF :).

<syntax>         ::= <rule> | <rule> <syntax>
<rule>           ::= <opt-whitespace> "<" <rule-name> ">"
                     <opt-whitespace> "::=" <opt-whitespace>
                     <expression> <line-end>
<opt-whitespace> ::= " " <opt-whitespace> | ""
<expression>     ::= <list> | <list> <opt-whitespace>
                     "|" <opt-whitespace> <expression>
<line-end>       ::= <opt-whitespace> <EOL> |
                     <line-end> <line-end>
<list>           ::= <term> |
                     <term> <opt-whitespace> <list>
<term>           ::= <literal> | "<" <rule-name> ">"
<literal>        ::= '"' <text1> '"' | "'" <text2> "'"
<text1>          ::= "" | <character1> <text1>
<text2>          ::= "" | <character2> <text2>
<character>      ::= <letter> | <digit> | <symbol>
<letter>         ::= "A" | "B" | "C" | "D" | "E" | "F" |
                     "G" | "H" | "I" | "J" | "K" | "L" |
                     "M" | "N" | "O" | "P" | "Q" | "R" |
                     "S" | "T" | "U" | "V" | "W" | "X" |
                     "Y" | "Z" | "a" | "b" | "c" | "d" |
                     "e" | "f" | "g" | "h" | "i" | "j" |
                     "k" | "l" | "m" | "n" | "o" | "p" |
                     "q" | "r" | "s" | "t" | "u" | "v" |
                     "w" | "x" | "y" | "z"
<digit>          ::= "0" | "1" | "2" | "3" | "4" | "5" |
                     "6" | "7" | "8" | "9"
<symbol>         ::= "|" | " " | "!" | "#" | "$" | "%" |
                     "&" | "(" | ")" | "*" | "+" | "," |
                     "-" | "." | "/" | ":" | ";" | ">" |
                     "=" | "<" | "?" | "@" | "[" | "\" |
                     "]" | "^" | "_" | "`" | "{" | "}" |
                     "~"
<character1>     ::= <character> | "'"
<character2>     ::= <character> | '"'
<rule-name>      ::= <letter> | <rule-name> <rule-char>
<rule-char>      ::= <letter> | <digit> | "-"
Usually, you communicate with a generator by asking it for data with next(gen). You can also send some values back with g.send(x) in Python 3. But a technique you probably don't use every day, and maybe aren't even aware of, is throwing exceptions inside a generator.

With gen.throw(e) you may raise an exception at the point where the gen generator is paused, i.e. at the point of some yield. If gen catches the exception, gen.throw(e) returns the next value yielded (or raises StopIteration). If gen doesn't catch the exception, it propagates back to you.

In : def gen():
...:     try:
...:         yield 1
...:     except ValueError:
...:         yield 2
...:
...: g = gen()
...:

In : next(g)
Out: 1

In : g.throw(ValueError)
Out: 2

In : g.throw(RuntimeError('TEST'))
...
RuntimeError: TEST


You can use it to control generator behavior more precisely, not only by sending data to it, but also by notifying it of problems with the values it yields, for example. But this is rarely required, and you have little chance of encountering g.throw in the wild.

However, the @contextmanager decorator from contextlib does exactly this to let the code inside the with block catch exceptions.

In : from contextlib import contextmanager
...:
...: @contextmanager
...: def atomic():
...:     print('BEGIN')
...:
...:     try:
...:         yield
...:     except Exception:
...:         print('ROLLBACK')
...:     else:
...:         print('COMMIT')
...:

In : with atomic():
...:     print('ERROR')
...:     raise RuntimeError()
...:
BEGIN
ERROR
ROLLBACK
A regular list slice in Python creates a copy. That may be undesirable if the slice is too big to be copied, if you want the slice to reflect changes in the list, or if you even want to modify the slice to affect the original object.

To avoid copying a lot of data, one can use itertools.islice. It lets you iterate over part of the list, but supports neither indexing nor modification.

The only way to have a class for modifiable slices is to create it yourself. Luckily, Python provides a suitable abstract base class: collections.abc.MutableSequence (just collections.MutableSequence in Python 2). You only need to override __getitem__, __setitem__, __delitem__, __len__ and insert.

The example below doesn't support deletion and insertion, but does support slicing slices and modification.
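A minimal sketch of such a class (SliceView is our made-up name; this version only handles contiguous sub-slices):

```python
from collections.abc import MutableSequence

class SliceView(MutableSequence):
    """A writable view on part of a list; changes go through to the original."""
    def __init__(self, lst, start=0, stop=None):
        self._lst = lst
        self._start = start
        self._stop = len(lst) if stop is None else stop

    def _index(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self._start + i

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            assert step == 1, 'only contiguous sub-slices in this sketch'
            # slicing a slice yields another view, not a copy
            return SliceView(self._lst, self._start + start, self._start + stop)
        return self._lst[self._index(index)]

    def __setitem__(self, index, value):
        self._lst[self._index(index)] = value

    def __delitem__(self, index):
        raise NotImplementedError('deletion not supported')

    def insert(self, index, value):
        raise NotImplementedError('insertion not supported')

    def __len__(self):
        return self._stop - self._start

lst = [1, 2, 3, 4, 5]
view = SliceView(lst, 1, 4)   # views [2, 3, 4]
view[0] = 20                  # modifies the original list
print(lst)                    # [1, 20, 3, 4, 5]
print(list(view[1:]))         # [3, 4]
```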
Reduce is a higher-order function that processes an iterable recursively, applying an operation to the next element of the iterable and the already accumulated value. You may also know it as fold, inject, accumulate, or by some other name.

Reduce with result = result + element gives you the sum of all elements, result = min(result, element) gives you the minimum, and result = element gives you the last element of the sequence.

Python provides the reduce function (which was moved to functools.reduce in Python 3):

In : reduce(lambda s, i: s + i, range(10))
Out: 45
In : reduce(lambda s, i: min(s, i), range(10))
Out: 0
In : reduce(lambda s, i: i, range(10))
Out: 9


Also, if you ever need simple lambdas like lambda a, b: a + b, Python has you covered with the operator module:

In : from operator import add
In : reduce(add, range(10))
Out: 45
When you write a decorator, you almost always should use @functools.wraps:

import functools

def atomic(func):
    @functools.wraps(func)
    def wrapper():
        print('BEGIN')
        func()
        print('COMMIT')

    return wrapper


It updates wrapper so that it looks like the original func: it copies __name__, __module__ and __doc__ from func to wrapper.

It may help if you generate documentation with pydoc, use doctest, or rely on some introspection tools. Mind, however, that you will still see the wrapper's own name in a stack trace (it's stored in wrapper.__code__.co_name, which wraps doesn't copy).
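A quick check of what @functools.wraps preserves (transfer is our made-up example function):

```python
import functools

def atomic(func):
    @functools.wraps(func)      # copies __name__, __module__, __doc__
    def wrapper():
        print('BEGIN')
        func()
        print('COMMIT')
    return wrapper

@atomic
def transfer():
    """Move money between accounts."""
    print('transferring')

print(transfer.__name__)  # transfer (would be 'wrapper' without @wraps)
print(transfer.__doc__)   # Move money between accounts.
```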
If you want to ignore some exception, you probably do something like this:

try:
    lst = [1, 2, 3, 4, 5]
    print(lst[10])
except IndexError:
    pass


That works (without printing anything), but contextlib lets you do the same thing more expressively and semantically correctly:

from contextlib import suppress

with suppress(IndexError):
    lst = [1, 2, 3, 4, 5]
    lst[10]
Python 2 can unpack function arguments if you define the parameters like a tuple:

In : def between(x, (start, stop)):
...:     return start < x < stop
...:
In : interval = (5, 10)
In : between(2, interval)
Out: False
In : between(7, interval)
Out: True


It can even do so recursively:

In : def determinant_2_x_2(((a, b), (c, d))):
...:     print a*d - c*b
...:

In : determinant_2_x_2([
...:     (1, 2),
...:     (3, 4),
...: ])
-2


However, this feature was removed in Python 3. You can still do the same by unpacking manually:

In : def determinant_2_x_2(matrix):
...:     row1, row2 = matrix
...:     a, b = row1
...:     c, d = row2
...:
...:     return a*d - c*b
...:

In : determinant_2_x_2([
...:     (1, 2),
...:     (3, 4),
...: ])
Out: -2
In Python, range() defines all the integers in a half-open interval. So range(2, 10) means, mathematically speaking, [2, 10) — or, speaking Python, [2, 3, 4, 5, 6, 7, 8, 9].

Despite the asymmetry, that is neither a mistake nor an accident. It makes perfect sense, since it allows you to glue together two adjacent intervals without the risk of off-by-one errors:

[a, c) = [a, b) + [b, c) 


Compare this to closed intervals, which may feel more “natural”:

[a, c] = [a, b] + [b+1, c]


This is also a reason for indexing to start from zero: range(0, N) has exactly N elements.

Dijkstra wrote an excellent article on the subject back in 1982.
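Both properties above can be checked directly in Python:

```python
# Two adjacent half-open intervals glue together with no gap and no overlap:
a, b, c = 0, 4, 10
assert list(range(a, b)) + list(range(b, c)) == list(range(a, c))

# And range(0, N) has exactly N elements, which is why zero-based
# indexing pairs so well with half-open intervals:
assert len(range(0, 10)) == 10
print('ok')
```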
coverage is a simple tool that can tell you which parts of your code were run during program execution and which were not. It's mostly useful in unit testing, to detect parts that are probably not tested thoroughly enough.

Say, we need coverage result for the following script:

if 2 > 1:
    print(':)')
else:
    print(':(')


After installing coverage with pip install coverage, we just run:

$ coverage run test.py
:)


As a result, a .coverage file is created in the current directory. Now we want to see the actual report:

$ coverage report
Name       Stmts   Miss  Cover
------------------------------
test.py        3      1    67%


It says that out of the three statements we have, one was never executed (hence total coverage is ~67%).

For a prettier and more detailed representation, we can use coverage html.
Sometimes you need to know the size of a generator without retrieving the actual values. Some iterables, such as range, support len(), but generators don't:

In : len(range(10000))
Out: 10000

In : gen = (x ** 2 for x in range(10000))
In : len(gen)
...
TypeError: object of type 'generator' has no len()


The straightforward solution is to use an intermediate list:

In : len(list(gen))
Out: 10000


Though fully functional, this solution requires enough memory to store all the yielded values. A simple idiom lets you avoid such waste:

In : sum(1 for _ in gen)
Out: 10000
When you want to empty a list in Python, you probably do lst = []. In fact, that just creates a new empty list and assigns it to lst, while all other owners of the original list still have its content:

In : lst = [1, 2, 3]
In : lst2 = lst
In : lst = []
In : lst2
Out: [1, 2, 3]


While this may seem pretty obvious, the correct solution wasn't straightforward until lst.clear() was introduced in Python 3.3.

Before that, you had to do del lst[:] or lst[:] = []. That works because slice syntax lets you modify part of a list in place, and that part is the whole list in the case of [:].
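The in-place variants described above can be compared directly:

```python
lst = [1, 2, 3]
lst2 = lst

del lst[:]      # the pre-3.3 way; lst[:] = [] and lst.clear() work the same
print(lst2)     # [] -- both names still refer to the same, now empty, list
print(lst2 is lst)  # True
```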
A lot of Python classes start with similar boilerplate: a straightforward constructor, a trivial repr and so on:

class Server:
    def __init__(self, ip, version=4):
        self.ip = ip
        self._version = version

    def __repr__(self):
        return '{klass}("{ip}", {version})'.format(
            klass=type(self).__name__,
            ip=self.ip,
            version=self._version,
        )


One way to deal with this is the popular attrs package, which generates a lot of the standard machinery automatically, driven by a few declarations:

@attrs
class Server:
    ip = attrib()
    _version = attrib(default=4)

server = Server(ip='192.168.0.1', version=4)


It creates not only an initializer and repr for you, but a complete set of comparison methods as well.

That said, there is an upcoming change in Python 3.7 that brings us data classes, a standard library addition that solves the same problem (and more). It uses variable annotations, another comparatively new Python feature. Here is an example:

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
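Under Python 3.7+, here's a quick demonstration of what the generated methods buy you (the 'widget' item is our made-up example):

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

item = InventoryItem('widget', unit_price=3.0, quantity_on_hand=10)
print(item)               # auto-generated __repr__
print(item.total_cost())  # 30.0
print(item == InventoryItem('widget', 3.0, 10))  # True: __eq__ comes for free
```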