For the new competition I found an excellent "manual" on Kaggle on how to work with bson (a MongoDB database dump).
Highly recommended reading
- https://www.kaggle.com/humananalog/keras-generator-for-reading-directly-from-bson/notebook
#data_science
#python
A couple of great threads on how to make your Python generator thread-safe, i.e. how to use the workers > 1 parameter of fit_generator in Keras with minimal effort. Useful if your model is heavily CPU-bound (see the sketch after the links).
- https://github.com/fchollet/keras/issues/1638
- https://stackoverflow.com/questions/41194726/python-generator-thread-safety-using-keras
- http://anandology.com/blog/using-iterators-and-generators/
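A minimal sketch of the pattern from these threads (the names are mine; Keras expects the generator to yield batches indefinitely):
import threading

class ThreadSafeIterator:
    # wraps an iterator / generator so that next() is guarded by a lock
    def __init__(self, iterator):
        self.iterator = iterator
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.iterator)

def thread_safe_generator(gen_func):
    # decorator that wraps a generator function on each call
    def wrapper(*args, **kwargs):
        return ThreadSafeIterator(gen_func(*args, **kwargs))
    return wrapper

@thread_safe_generator
def batch_generator(batch_size=32):
    while True:
        # in real code: yield (X, y) batches here
        yield list(range(batch_size))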
#data_science
#python
A great copy-paste snippet for checking file hashes.
# make sure you downloaded the files correctly
import hashlib
import os.path as path

def sha256(fname):
    hash_sha256 = hashlib.sha256()
    with open(fname, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            hash_sha256.update(chunk)
    return hash_sha256.hexdigest()

filenames = ['', '', '', '', '']
hashes = ['', '', '', '', '']
data_root = path.join('data/')  # make sure you set up this path correctly

# this may take a few minutes
for filename, hash_ in zip(filenames, hashes):
    computed_hash = sha256(path.join(data_root, filename))
    if computed_hash == hash_:
        print('{}: OK'.format(filename))
    else:
        print('{}: fail'.format(filename))
        print('expected: {}'.format(hash_))
        print('computed: {}'.format(computed_hash))
#python
#data_science
Turns out there is already a ready-made SqueezeNet for Keras, with weights =)
Not bad
- https://github.com/wohlert/keras-squeezenet
#python
#neural_nets
I faced the question of extending a PyTorch class that I liked. If everything were trivial, I would just write a function, call it and pass it the class instance, but there is one problem - some utilities in the class call local utilities, and it is not quite clear how to modify those on import.
Inspired by the bson iterator example (posted above - https://goo.gl/xvZErF), it turns out that extending classes is done quite simply (see the sketch after the links):
- One https://goo.gl/JZpfiV
- Two https://goo.gl/D3KkLm
- And some old-school mind-bending stuff for those interested in Python internals
-- https://www.artima.com/weblogs/viewpost.jsp?thread=237121
-- https://www.artima.com/weblogs/viewpost.jsp?thread=236278
-- http://www.artima.com/weblogs/viewpost.jsp?thread=236275
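A toy sketch of the core idea (the names are made up): subclass and override only the local utility, and the methods that call it pick up your version automatically:
class Base:
    def process(self, x):
        # public method that internally calls a 'local utility'
        return self._helper(x)

    def _helper(self, x):
        return x + 1

class Extended(Base):
    # override only the internal helper; process() uses it automatically
    def _helper(self, x):
        return super()._helper(x) * 10

print(Extended().process(1))  # 20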
#python
#data_science
From the perversions department - how do you load a k-means object from Python 2 into Python 3, with an sklearn version bump on top?
The obvious solution does not work because of the sklearn version change
- https://goo.gl/s8V5zf
But this works
# saving - python2
import numpy as np
# centroids = cluster_centers_ of the fitted python2 model
np.savetxt('centroids.txt', centroids, delimiter=',')

# loading - python3
from sklearn.cluster import KMeans
import numpy as np
centroids = np.loadtxt('centroids.txt', delimiter=',')
# pass the saved centroids as an explicit init;
# n_init=1 since a fixed init needs no random restarts
kmeans = KMeans(n_clusters=centroids.shape[0], init=centroids, n_init=1)
#python
A magnificent Python lib for working with video
- https://github.com/Zulko/moviepy
It is built on top of imageio and essentially lets you work with video in one line (instead of plain frame iteration or manual ffmpeg usage). How great that Python has tools like this!
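For example, a minimal sketch (the file names are placeholders):
from moviepy.editor import VideoFileClip

clip = VideoFileClip('input.mp4')              # load the video
clip.subclip(0, 5).write_videofile('cut.mp4')  # save the first 5 seconds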
#python
#video
At my new job I saw that people train their models on Python 2 (WHAT?), on tensorflow (WTF???) and load data in a single thread (it is 2017 out there!).
For this reason I made a slightly trollish presentation for my colleagues. Maybe you will like it too
- https://goo.gl/ne9RH4
Everything simple is very simple - you just need to know where to look)
#data_science
#deep_learning
#python
Just found a book on practical Python programming patterns
- http://python-3-patterns-idioms-test.readthedocs.io/en/latest/PythonForProgrammers.html
Looks good
#python
Amazing article about the most popular warning in Pandas
- https://www.dataquest.io/blog/settingwithcopywarning/
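The classic trigger and the usual fix, on a toy frame (column names are made up):
import pandas as pd

df = pd.DataFrame({'a': [1, -1, 2], 'b': [0, 0, 0]})

# chained indexing: pandas cannot tell if you are writing to a copy
df[df['a'] > 0]['b'] = 1   # raises SettingWithCopyWarning, df may stay unchanged

# the fix: a single .loc call on the original frame
df.loc[df['a'] > 0, 'b'] = 1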
#data_science
Useful Python abstractions / sugar / patterns
I already shared a book about patterns, which contains mostly high-level / more complicated patterns. But for writing ML code, sometimes a simple imperative / functional programming style is fine.
So - I will be posting about simple and really powerful python tips I am learning now.
This time I found out about map and filter, which are super useful for data preprocessing:
Map
items = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x**2, items))
Filter
number_list = range(-5, 5)
less_than_zero = list(filter(lambda x: x < 0, number_list))
print(less_than_zero)  # [-5, -4, -3, -2, -1]
Also found this book - http://book.pythontips.com/en/latest/map_filter.html
#python
#data_science
Readable list comprehensions in Python
My list and dictionary comprehensions usually look like s**t
https://gist.github.com/IaroslavR/7dcb54830242a22de1869f6fd05a8d7e
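The usual fix is to give each clause its own line, e.g. (a quick sketch of the formatting style, not necessarily the exact gist examples):
squares_of_evens = [
    x ** 2              # expression
    for x in range(10)  # source
    if x % 2 == 0       # filter
]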
#python
A decent explanation about decorators in Python
http://book.pythontips.com/en/latest/decorators.html
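The gist of it in a few lines (a standard toy example, not taken from the book verbatim):
import functools

def logged(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print('calling {}'.format(func.__name__))
        return func(*args, **kwargs)
    return wrapper

@logged
def add(a, b):
    return a + b

print(add(2, 3))  # prints 'calling add', then 5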
#python
Yet another python tricks book
https://dbader.org/
https://www.getdrip.com/deliveries/xugaymstfzmizbyposdk?__s=ejdgfo9tsdhpgcrcscs3
https://vk.com/doc7608079_466151365
#python
Useful Python / PyTorch bits
dot.notation access to dictionary attributes
class dotdict(dict):
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
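A quick usage sketch (the config keys are made up):
cfg = dotdict({'lr': 1e-3})
print(cfg.lr)        # 0.001 - attribute-style access
cfg.batch_size = 32  # attribute-style assignment
del cfg.batch_size   # attribute-style deletion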
PyTorch embedding layer - ignore padding
nn.Embedding has a padding_idx argument, which tells the layer not to update the padding token's embedding.
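E.g. (a minimal sketch, the sizes are made up):
import torch.nn as nn

# the row at padding_idx is zero-initialized and receives no gradient updates
emb = nn.Embedding(num_embeddings=1000, embedding_dim=300, padding_idx=0)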
#python
#pytorch
Monkey patching a PyTorch model
Well, ideally you should not do this.
But sometimes you just need to quickly test something and amend your model on the fly.
This helps:
import torch
import functools

def rsetattr(obj, attr, val):
    pre, _, post = attr.rpartition('.')
    return setattr(rgetattr(obj, pre) if pre else obj, post, val)

def rgetattr(obj, attr, *args):
    def _getattr(obj, attr):
        return getattr(obj, attr, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))

for old_module_path, old_module_object in model.named_modules():
    # replace an old object with the new one,
    # copying some settings and its state
    if isinstance(old_module_object, torch.nn.SomeClass):
        new_module = SomeOtherClass(old_module_object.some_settings,
                                    old_module_object.some_other_settings)
        new_module.load_state_dict(old_module_object.state_dict())
        rsetattr(model, old_module_path, new_module)
The above code essentially does the same as:
model.path.to.some.block = some_other_block
#python
#pytorch
#deep_learning
#oop
A Great Start For Your Custom Python Dockerfiles
I like to popularize really great open-source stuff. I have shared my ML Dockerfiles several times. Now I base my PyTorch workflows on ... surprise-surprise, PyTorch's official images with Apex. But (when I looked) for some reason it was difficult to find the original dockerfiles themselves; there were only images (maybe I did not look well enough).
But what if you need a simpler / different / lighter python workflow without PyTorch / GPUs? Miniconda is an obvious choice. Yeah, and now there is miniconda as a docker image (pre-built) and with a dockerfile! What is also remarkable - my dockerfile, which I inherited from Fchollet in 2017, starts very similarly to this miniconda dockerfile.
https://hub.docker.com/r/continuumio/miniconda3/dockerfile
Enjoy. A great way to start your python project and / or journey.
#python
Pandas ... Parallel Wrappers Strike Back
Someone made this into a library with 800+ stars - https://github.com/nalepae/pandarallel - which is cool AF if it keeps being maintained!
I wrote similar wrappers 2 years ago and I thought no one cared about this, because when I shared it, no one paid any attention. But in our day-to-day work they are still a workhorse. Simple, naïve, concise yet efficient.
I believe this library only has 2 major drawbacks:
- Spawning processes takes a second or so, but this is just python
- It does not support shared memory (?), the only thing that I arguably lack in such a tool
12 - 24, or even 64 core AMD processors are cheap nowadays, you know.
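A minimal usage sketch (based on the library's README; the dataframe and function are made up):
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize()  # this is where the worker processes get spawned

df = pd.DataFrame({'x': range(1000)})
# drop-in parallel counterpart of df.apply
df['y'] = df.parallel_apply(lambda row: row['x'] ** 2, axis=1)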
#python
The Uncharted Waters of Having an Encrypted gRPC API Endpoint in Python without Hassle
Typically, you do not have to think twice about SSL in your standard HTTP web apps. Usually it is handled by your reverse proxy out-of-the-box, i.e. caddy / traefik / nginx-proxy, you name it. In some cases you just use certbot and that is it.
The end user (typically a person) does not care about storing and obtaining the certificates (API users also do not really care about the encryption of their data, for some reason preferring to rely on API admins). Moreover, a plethora of tools packaged within reverse proxies makes this just a matter of plumbing and trial and error.
But what if you use a gRPC endpoint? The typical answer is that it will most likely run behind some corporate firewall and will not be exposed to the Internet. But what if it will?
The official docs are not very clear on this topic for this very reason, and there are a few comprehensive (yet kind of old) guides - python, go.
But wait, 95% of these guides just follow the happy path from the docs, i.e. they manually create and manage all of the certificates (imagine the nightmare), or describe in detail how to automate LE certificates in Go.
But they fail on several fronts:
- They fail to mention that you should add some form of token auth even before starting your gRPC session (and the official python example is way too complicated);
- They usually imply that you have to manually (or automatically) create client certificates and distribute them, but usually do not explain what happens if the client leaves this field blank. Turns out each release of gRPC ships with a list of most prominent CAs, which are loaded by default;
- They also often assume a fully automated scenario, when you have enough infrastructure to actually invest into integrating dedicated clients like acme-tiny in your app (a full list can be found here);
The above Go guide also says this (minica):
Is important to emphasize this example is not meant to be replicated for internal/private services. In talking to Jacob Hoffman-Andrews from Let’s Encrypt, he mentioned: In general, I recommend that people don’t use Let’s Encrypt certificates for gRPC or other internal RPC services. In my opinion, it’s both easier and safer to generate a single-purpose internal CA using something like minica and generate both server and client certificates with it. That way you don’t have to open up your RPC servers to the outside internet, plus you limit the scope of trust to just what’s needed for your internal RPCs, plus you can have a much longer certificate lifetime, plus you can get revocation that works.
But maybe there is a simpler approach, more akin to how people visit websites? Moreover, public corporate endpoints (i.e. Sber) seemingly do not really bother with certificates (but the client is secure judging by logs). Maybe you can combine ease of use and security?
Turns out yes, but this is not immediately evident from docs and guides:
- You should leave the client certificate blank. In this case the client will look out for keys from widely accepted CAs packaged with the current version of the gRPC package;
- You should obtain your certificates from a trusted CA like LetsEncrypt (use certbot or any other client in case you need automation);
- If you are not using the latest gRPC client, chances are you will encounter a similar gotcha when testing your production certificate with cryptic errors. Explanation - CAs also rotate their keys. Solution - just update your package;
Minimalistic example:
Client side:
ssl_creds = grpc.ssl_channel_credentials()
channel = grpc.secure_channel(bind_address, ssl_creds)
Server side:
with open('privkey.pem', 'rb') as f:
    private_key = f.read()
with open('fullchain.pem', 'rb') as f:
    certificate_chain = f.read()
server_creds = grpc.ssl_server_credentials(
    ((private_key, certificate_chain,),))
server.add_secure_port(bind_address, server_creds)
When this works, do not forget some sort of additional token-based auth.
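For example, the simplest client-side piece is passing a token via call metadata (a sketch; the stub, the method and the header name are placeholders):
# client side: attach a token to every call
response = stub.SomeMethod(
    request,
    metadata=(('authorization', 'Bearer MY_TOKEN'),),
)
# server side: check context.invocation_metadata() in the handler
# (or in a server interceptor) and abort on a bad token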