Python etc
Regular tips about Python and programming in general, by @pushtaev. https://ko-fi.com/pythonetc © CC BY-SA 4.0
Guys, I'm looking for experienced or simply smart Python developers to work with my team in Mail.Ru Group in Moscow, Russia. A unique experience and decent benefits guaranteed. Contact @pushtaev if you are the one.
Python allows you to annotate the arguments of a function and its return value with arbitrary expressions. This doesn't affect the function's behavior: the annotations are simply stored in __annotations__ and do nothing more:

In [2]: def pow(x: 1, p: 2) -> 3:
   ...:     result = 1
   ...:     for _ in range(p):
   ...:         result *= x
   ...:
   ...:     return result
   ...:

In [3]: pow.__annotations__
Out[3]: {'p': 2, 'return': 3, 'x': 1}

Annotations usually become useful when a decorator reads them.
For example, you can annotate arguments with types and have a decorator cast the values to those types.

import inspect

def autocast(f):
    def decorated(*args):
        spec = inspect.getfullargspec(f)
        new_args = []
        for i, arg_name in enumerate(spec.args):
            cast = spec.annotations[arg_name]
            new_args.append(cast(args[i]))

        return_cast = spec.annotations['return']
        return return_cast(f(*new_args))

    return decorated

@autocast
def pow(x: float, p: int) -> str:
    result = 1
    for _ in range(p):
        result *= x

    return result

print(repr(pow('1.1', '2')))
Though you can use function annotations for whatever you want, type hints are the most popular application. PEP 484 standardizes this use but neither requires nor enforces it.

Any class can be used as a type hint, but a hint doesn't have to be a class. The typing module introduces additional types, such as Any or Union. PEP 484 also allows forward references, written as a string instead of the actual type: def x() -> 'B'.
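A minimal sketch of these hints in use. The classes A and B and the function describe are hypothetical names made up for illustration; typing.get_type_hints is the standard helper that resolves string forward references.

```python
from typing import Any, Union, get_type_hints


class A:
    # 'B' is a forward reference: class B is not defined yet at this point.
    def to_b(self) -> 'B':
        return B()


class B:
    pass


def describe(value: Union[int, str], extra: Any = None) -> str:
    # The hints are not checked at runtime; they only document intent.
    return '{!r} ({})'.format(value, type(value).__name__)


# The raw annotation is still the string 'B'...
raw = A.to_b.__annotations__['return']
# ...while get_type_hints resolves it to the actual class.
resolved = get_type_hints(A.to_b)['return']
```

Here raw stays a plain string, while resolved is the class B itself.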

Type hints are typically used for static type analysis rather than at runtime. One of the most popular analyzers out there is mypy:

$ cat test.py
def pow(x: float, p: int) -> str:
    result = 1.0
    for _ in range(p):
        result *= x

    return result
$ mypy test.py
test.py:6: error: Incompatible return value type (got "float", expected "str")
Python allows you to dynamically change the class of an already created object. It's as simple as that:

obj.__class__ = AnyClass

Though it's probably a bad idea to use such tricks as part of your regular architecture, it can be extremely useful during debugging. Here is how you can track all attribute accesses of an object without modifying its original code:

class User:
    def __init__(self, name):
        self._name = name

    def to_str(self):
        return '<{}>'.format(self._name)


class LoggedUser(User):
    def __getattribute__(self, attr):
        print('`{}` accessed'.format(attr))
        return super().__getattribute__(attr)


u = User('lol')
u.__class__ = LoggedUser

print(u.to_str())
There are two built-in functions that let you analyze iterables without writing trivial and redundant ifs. These are all and any.

any returns True if some of the values are true; all returns True if all of them are. all returns True for an empty iterable while any returns False in that case.

Both functions are especially useful in combination with generator expressions:

package_broken = any(
    part.is_broken() for part in package.get_parts()
)
package_ok = all(
    part.ok() for part in package.get_parts()
)


any and all are usually interchangeable thanks to De Morgan's laws. Choose one that is easier to understand.
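A short sketch of the empty-iterable behavior and of the De Morgan equivalence mentioned above. The list of numbers is arbitrary sample data.

```python
numbers = [1, 2, 3, 4]
empty = []

any_even = any(n % 2 == 0 for n in numbers)
all_even = all(n % 2 == 0 for n in numbers)

# Empty iterables: any finds nothing true, all finds nothing false.
any_empty = any(x for x in empty)
all_empty = all(x for x in empty)

# De Morgan: "all are even" is the same as "none is odd".
none_odd = not any(n % 2 == 1 for n in numbers)
```

Both all_even and none_odd describe the same condition; pick whichever reads more naturally at the call site.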
When Python executes a method call, say a.f(b, c, d), it first has to select the right f function. Due to polymorphism, what is selected depends on the type of a. The process of choosing the method is usually called dynamic dispatch.

Python supports only single-dispatch polymorphism, which means that a single object alone (a in the example) affects the method selection. Some other languages, however, may also consider the types of b, c and d. This mechanism is called multiple dispatch. C# is a notable example of a language that supports this technique.

However, multiple dispatch can be emulated via single dispatch. The visitor design pattern was created exactly for this: a visitor essentially just uses single dispatch twice to imitate double dispatch.

Note that the ability to overload methods (as in Java and C++) is not the same as multiple dispatch. Dynamic dispatch works at runtime, while overloading is resolved solely at compile time.

These are some code examples to understand the topic better: Python visitor example, Java overloading doesn't work as multiple dispatch, C# multiple dispatch.
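As a rough illustration of the visitor idea described above, here is a minimal sketch. All class and method names (Circle, Square, NameVisitor, CornerVisitor, accept, visit_circle, visit_square) are hypothetical and chosen only for this example.

```python
class Circle:
    def accept(self, visitor):
        # First dispatch: this method was selected by the type of the shape.
        # Second dispatch: visit_circle is selected by the type of the visitor.
        return visitor.visit_circle(self)


class Square:
    def accept(self, visitor):
        return visitor.visit_square(self)


class NameVisitor:
    def visit_circle(self, circle):
        return 'circle'

    def visit_square(self, square):
        return 'square'


class CornerVisitor:
    def visit_circle(self, circle):
        return 0

    def visit_square(self, square):
        return 4


shapes = [Circle(), Square()]
names = [shape.accept(NameVisitor()) for shape in shapes]
corners = [shape.accept(CornerVisitor()) for shape in shapes]
```

The result of shape.accept(visitor) depends on both the type of the shape and the type of the visitor, even though each individual call is an ordinary single dispatch.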
Two things in Python are easy to confuse: iterables and iterators.

Iterables are objects that can be iterated, i.e. there is a way to extract values from the object one by one, possibly infinitely. Iterables are usually collections such as arrays, sets, lists, etc.

There are two ways an object can become a proper iterable. The first one is to have a __getitem__ method:

In : class Iterable:
...:     def __getitem__(self, i):
...:         if i > 10:
...:             raise IndexError
...:         return i
...:

In : list(Iterable())
Out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

The second way is to define an __iter__ method that returns an iterator. An iterator is an object with a __next__ method that returns the next value from the original iterable each time it is called:

In : class Iterator:
...:     def __init__(self):
...:         self._i = 0
...:
...:     def __next__(self):
...:         i = self._i
...:         if i > 10:
...:             raise StopIteration
...:         self._i += 1
...:         return i
...:
...: class Iterable:
...:     def __iter__(self):
...:         return Iterator()
...:
...:

In : list(Iterable())
Out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Usually, an iterator also has an __iter__ method that just returns self. This allows the iterator itself to be iterated, which means that most iterators are also iterables.
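A small sketch of an iterator that is also an iterable. Countdown is a hypothetical class written for this example.

```python
class Countdown:
    def __init__(self, start):
        self._current = start

    def __iter__(self):
        # The iterator is its own iterable.
        return self

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


countdown = Countdown(3)
same_object = iter(countdown) is countdown
first_pass = list(countdown)
# The iterator is exhausted now, so a second pass yields nothing.
second_pass = list(countdown)
```

This also shows the practical difference from a collection: an iterator keeps its position, so it can be consumed only once.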
Blockchain is a group of technologies that addresses the following problems: the need to trust a service or another third party, the elimination of intermediaries, and the lowering of operational costs. These problems are solved by means of cryptography and so-called consensus algorithms.

Can you imagine a service that runs your code, makes it available to everyone and cannot be shut down by a decision of a single company? The closest analogy from the past is file-sharing services. A simple example of such blockchain code (called a smart contract) is a penny bank contract. You or any other person can transfer cryptocurrency to this contract, but there is literally no way to withdraw the funds (even for you!) until, say, a $1000 cap is reached. Also, this penny bank can't be blocked by any bank or stolen by any intruder.

Cryptographic signatures have been around for a long time, so the question "who signed a money transfer?" already had an answer. The important open question was "in which order were the transfers executed?". The key breakthrough was the invention of a consensus algorithm called Proof of Work, which allows network participants to reach consensus on the order of transfers (transactions) and, as a consequence, on the current network state (accounts, balances, and so on), in a network of many thousands of participants that can join and leave while the network operates.

The first battle-tested blockchain was Bitcoin, and since then many more blockchains and related technologies have come to life, e.g., smart contracts, zero-knowledge proofs, different consensus algorithms, etc.

The currently dominant blockchain with support for Turing-complete smart contracts is Ethereum. Here is documentation for Ethereum Python bindings, and here is a relevant part on interacting with smart contracts from Python.
Getting funds from your existing penny bank contract programmatically can be as simple as this:

from web3 import Web3

# contract ABI is generated
# by contract compiler and
# pretty self-describing
# (JSON false is written as False in Python)
contract_abi = [{
    "constant": False,
    "inputs": [
        {
            "name": "destination",
            "type": "address"
        }
    ],
    "name": "withdraw",
    "outputs": [],
    "payable": False,
    "stateMutability": "nonpayable",
    "type": "function"
}]

w3 = Web3(Web3.EthereumTesterProvider())
contract = w3.eth.contract(
    address='0x5A0b54 ....',
    abi=contract_abi
)

contract.functions.withdraw('0x882cf ...').transact()

Here is the introductory post to a blockchain for programmers.
Today's post is written by a Python and smart contract developer working for mixbytes.io. (They are hiring Django developers, by the way; contact @therealal for more details.) I'm always happy to host your post; contact me if you have something to share.
Usually, you don't care about iterator objects; they are created and used automatically by for, list or other things that do the iteration for you. However, in some rare cases, you need to get an iterator from an iterable explicitly. The proper way to do this is to use the iter built-in function (that uses __iter__ or __getitem__ methods of an object in order to get an iterator):

part_sizes = [3, 2, 5]
iterator = iter(range(100))

result = []
for size in part_sizes:
    part = []
    for _ in range(size):
        part.append(next(iterator))
    result.append(part)

assert result == [
    [0, 1, 2],
    [3, 4],
    [5, 6, 7, 8, 9],
]


The funny thing is, iter can be used in an entirely different way. Instead of creating an iterator from an object, it can make one from a function (or any callable). If you call iter with two arguments, the first one must be a callable object and the second is a sentinel. On every __next__ call, the created iterator calls the callable without arguments. If the returned value is equal to the sentinel, StopIteration is raised; otherwise, the value is returned.

This is usually helpful for reading lines until some marker:

In : list(iter(input, 'END'))
a
b

END
Out: ['a', 'b', '']
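Another common use of the two-argument form is reading a stream in fixed-size chunks. This is a sketch that assumes an in-memory io.BytesIO stream standing in for a real file; functools.partial turns stream.read(4) into a callable that takes no arguments.

```python
import io
from functools import partial

# An in-memory stream used here in place of a real file.
stream = io.BytesIO(b'abcdefghij')

# read(4) returns b'' at the end of the stream, which serves as the sentinel.
chunks = list(iter(partial(stream.read, 4), b''))
```

The last chunk is shorter than the others because the stream length is not a multiple of the chunk size.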
In Python, every value has a truth value. It is evaluated implicitly when you use if, bool, not, etc.

Falsy objects are None, False, zero of any numeric type, and empty collections: "", [], {}, etc., including custom collections with a __len__ method, as long as __len__ returns 0.
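A quick sketch of these rules. Basket is a hypothetical custom collection that defines only __len__.

```python
class Basket:
    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


# Every value in this list is falsy, including the empty Basket.
falsy_values = [None, False, 0, 0.0, 0j, '', [], {}, (), set(), Basket()]
all_falsy = not any(falsy_values)

# A non-empty Basket is truthy because __len__ returns a non-zero value.
non_empty_is_truthy = bool(Basket(['apple']))
```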

You can also define custom truth value testing for your objects; the __bool__ magic method is there for this:

class Rectangle:
    def __init__(self, width, height):
        self._w = width
        self._h = height

    def __bool__(self):
        return bool(self._w and self._h)


In : bool(Rectangle(2, 3))
Out: True
In : bool(Rectangle(2, 0))
Out: False
In : bool(Rectangle(0, 2))
Out: False


Note that __bool__ is called __nonzero__ in Python 2.
One of the most inconsistent parts of the Python syntax is tuple literals.

Basically, to create a tuple you just write values separated by commas: 1, 2, 3. OK, so far, so good. What about a tuple containing only one element? You just add a trailing comma to the only value: 1,. Well, that’s somewhat ugly and error-prone, but it makes sense.

What about an empty tuple? Is it a bare ,? No, it’s (). Do parentheses create a tuple the way commas do? No, they don’t: (4) is not a tuple, it’s just 4.

In : a = [
...:     (1, 2, 3),
...:     (1, 2),
...:     (1),
...:     (),
...: ]

In : [type(x) for x in a]
Out: [tuple, tuple, int, tuple]


To make things more obscure, tuple literals often require additional parentheses. If you want a tuple to be the only argument of a function, then f(1, 2, 3) doesn’t work for an obvious reason; you need f((1, 2, 3)) instead.
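The cases above can be checked with a short sketch. count_args is a hypothetical helper that simply reports how many positional arguments it received.

```python
def count_args(*args):
    return len(args)


single = 1,          # the trailing comma makes a one-element tuple
not_a_tuple = (1)    # parentheses alone do not: this is just the int 1
empty = ()           # the only literal where parentheses create the tuple

three_args = count_args(1, 2, 3)     # three separate int arguments
one_arg = count_args((1, 2, 3))      # one tuple argument
```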
IPython saves every output so that you can use it later. You can find the saved values in special variables: _ is the previous result, __ is the one before it, and ___ is the one before that.

Inputs that don't produce a result, or that produce None, are skipped:

In [1]: 1
Out[1]: 1

In [2]: print(':)')
:)

In [3]: 2
Out[3]: 2

In [4]: x = 42

In [5]: 3
Out[5]: 3

In [6]: __ # Out[3]
Out[6]: 2


To obtain other results, you can use special variables like _42, where 42 is the output number. All results are stored in a dictionary with two aliases: Out and _oh.

Mind that all these references to results are not weak. That means that, as long as they exist, destructors will not be called and memory will not be freed. You can use %xdel _42 to remove the variable and all other references IPython holds to the same object.
If you have a CPU-heavy task and want to utilize all the cores you have, then multiprocessing.Pool is for you. It spawns multiple processes and delegates tasks to them automatically. Simply create a pool with Pool(number_of_processes) and run p.map with the list of inputs.

In : import math
In : from multiprocessing import Pool
In : inputs = [i ** 2 for i in range(100, 130)]
In : def f(x):
...:     return len(str(math.factorial(x)))
...:

In : %timeit [f(x) for x in inputs]
1.44 s ± 19.2 ms per loop (...)

In : p = Pool(4)
In : %timeit p.map(f, inputs)
451 ms ± 34 ms per loop (...)

You can also omit the number_of_processes parameter; its default value is the number of CPU cores on the current system.
PEP 8 is the famous style guide for Python code. It's not enforced by the interpreter, but you are strongly discouraged from ignoring it.

There is a tool that automatically checks whether your code follows PEP 8 recommendations. It used to be called pep8 but was renamed to pycodestyle at Guido's request. You should now use only pycodestyle, installed with pip install pycodestyle.

You can check whether pycodestyle is happy with your project like this:

$ pycodestyle . -qq --statistics
1       E302 expected 2 blank lines, found 1
1       E305 expected 2 blank lines after class or function definition, found 1
20      E501 line too long (83 > 79 characters)
Sometimes software starts to behave weirdly in production. Instead of simply restarting it, you probably want to understand what exactly is happening so that you can fix it later.

The obvious way to do this is to analyze what the program does and try to guess which piece of code is executing. Proper logging certainly makes that task easier, but your application's logs may not be verbose enough, either by design or because a high logging level is set in the configuration.

In that case, strace may be quite helpful. It's a Unix utility that traces system calls for you. You can start a program under it (strace python script.py), but attaching to an already running application is usually more suitable: strace -p PID.

$ cat test.py
with open('/tmp/test', 'w') as f:
    f.write('test')
$ strace python test.py 2>&1 | grep open | tail -n 1
open("/tmp/test", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3

Each line in the trace contains the system call name, followed by its arguments in parentheses and its return value. Since some arguments are used to return a result from the system call rather than to pass data into it, strace may stop in the middle of a line until the system call finishes.

In this example, the output is interrupted until someone writes to STDIN:

$ strace python -c 'input()'
read(0,
To be used as a dictionary key, an object should be hashable. Hashable objects support the __hash__ method that returns an integer value. To get a hash of the value, the hash built-in function is used.

Built-in types that are not mutable are hashable by default. All custom objects are also hashable, but there is a catch. If you define an __eq__ method for your custom type, you should define __hash__ so that hash(a) == hash(b) for every a and b that are equal. Violating this rule may cause dictionaries to malfunction:

import random


class A:
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        # Deliberately broken: equal objects get unrelated hashes.
        return random.randrange(10000)

    def __eq__(self, other):
        return self.x == other.x


In : d = {}
In : d[A(2)] = 2
In : d.get(A(2), 0)
Out: 0


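Note that once you define __eq__ in a class, the default __hash__ method is removed (set to None), since the default implementation is no longer suitable: it is based on object identity, so objects that compare equal would get different hashes. Instances of such a class are unhashable until you define __hash__ explicitly.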
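A sketch of a consistent __eq__ and __hash__ pair, next to a class that defines only __eq__. Point and OnlyEq are hypothetical classes written for this example.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares,
        # so equal points always get equal hashes.
        return hash((self.x, self.y))


class OnlyEq:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, OnlyEq) and self.x == other.x


d = {Point(1, 2): 'found'}
lookup = d.get(Point(1, 2), 'missing')

# Defining __eq__ without __hash__ makes instances unhashable.
try:
    hash(OnlyEq(1))
    only_eq_is_hashable = True
except TypeError:
    only_eq_is_hashable = False
```

Hashing a tuple of the compared fields is a simple way to keep the two methods in sync, as long as those fields are themselves hashable and are not mutated while the object is used as a key.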
A regular language is a formal language that can be recognized by a finite-state machine (FSM). Simply put, that means that to process text character by character, you only need to remember the current state, and the number of such states is finite.

A beautiful and simple example is a machine that checks whether an input is a simple number like -3, 2.2 or 001. The following diagram is an FSM diagram. Double circles mark accept states: they identify where the machine can stop.
The machine starts at ①, possibly matches a minus sign, then processes as many digits as required at ③. After that, it may match a dot (③ → ④), which must be followed by at least one digit (④ → ⑤) and possibly more (⑤ → ⑤).
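The machine described above can be sketched in a few lines of Python. The state numbers follow the description; state 2 is assumed to be the state reached right after the minus sign, since the text does not spell it out. The names step and is_simple_number are made up for this example.

```python
DIGITS = set('0123456789')
ACCEPT_STATES = {3, 5}


def step(state, char):
    """Return the next state, or None if there is no transition."""
    is_digit = char in DIGITS
    if state == 1 and char == '-':
        return 2
    if state in (1, 2, 3) and is_digit:
        return 3
    if state == 3 and char == '.':
        return 4
    if state in (4, 5) and is_digit:
        return 5
    return None


def is_simple_number(text):
    state = 1
    for char in text:
        state = step(state, char)
        if state is None:
            return False
    # The input is accepted only if the machine stops in an accept state.
    return state in ACCEPT_STATES
```

The only thing the function remembers between characters is the current state, which is exactly what makes the language regular.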

The classic example of a non-regular language is a family of strings like:

a-b
aaa-bbb
aaaaa-bbbbb


Formally, we need a line that contains N occurrences of a, then -, then N occurrences of b, where N is an integer greater than zero. You can't do it with a finite machine, because you have to remember the number of a characters you have encountered, which requires an infinite number of states.
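One practical way around this, sketched below, is to let a regular expression check only the regular part (some a characters, a dash, some b characters) and do the counting in ordinary code. The function name is_balanced is hypothetical.

```python
import re


def is_balanced(line):
    # The pattern alone accepts strings like 'aa-bbb' too;
    # it cannot enforce that both runs have the same length.
    match = re.fullmatch(r'(a+)-(b+)', line)
    if match is None:
        return False
    # The counting happens outside the regular expression.
    return len(match.group(1)) == len(match.group(2))
```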

Regular expressions can match only regular languages. Remember to check whether the text you are trying to process can be handled by an FSM at all. JSON, XML or even mere arithmetic expressions with nested brackets cannot be.

The funny thing is, a lot of modern regular expression engines are not strictly regular. For example, the third-party regex module for Python supports recursion (which helps with that aaa-bbb problem).