Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
Internet digest

(0) Ben Evans - https://goo.gl/72b4pm

ML / industry
(1) FB to design its own FPGAs / ML chips - https://goo.gl/nh2Wph ?
(2) Google willing to replicate iMessage, again - https://goo.gl/MtwCet
-- No mention of Telegram, but all of Google's attempts are aeons behind Telegram
-- Google is willing to go the hardest route: a standard enforced on the carriers plus replacing the messaging app
-- None of the previous attempts really worked
(3) Facebook media backlash - https://goo.gl/rKd9E5
(4) Who makes LIDARs - https://goo.gl/uD5qc5
(5) Tesla over automation - https://goo.gl/1WBMj3

Telecom
(1) British Telecom to switch to VOIP - https://goo.gl/MCbZgq
(2) Flickr purchased - https://goo.gl/AMcE6f

#internet
On the surface, this looks like an interesting competition

Well, I said that about Power Laws - but then it turned out otherwise.
So far I can see CV, NLP and tables in one mix.

https://www.kaggle.com/c/avito-demand-prediction/

#data_science
Forwarded from Админим с Буквой (bykva)
Ubuntu 18.04 LTS release

The Ubuntu 18.04 "Bionic Beaver" distribution has been released. It is a long-term support (LTS) release, so updates will be provided for 5 years. Installation images are available for Ubuntu Desktop, Ubuntu Server, Ubuntu Cloud, Kubuntu, Ubuntu Budgie, Lubuntu, Ubuntu Studio, Ubuntu Kylin, Ubuntu MATE and Xubuntu.
Widen the Jupyter editor to nearly the full screen width

Just apply this CSS

#texteditor-container {
width: 95%
}
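
One place to put this rule, assuming the classic Jupyter Notebook (which loads custom/custom.css from the Jupyter config directory, usually ~/.jupyter), is sketched below. The helper name add_editor_width_rule is made up for illustration; treat it as a sketch, not the only way to apply the CSS.

```python
from pathlib import Path

# The rule from the post, written out with a trailing semicolon.
CSS_RULE = "#texteditor-container {\n    width: 95%;\n}\n"


def add_editor_width_rule(jupyter_config_dir):
    """Append the rule to custom/custom.css unless it is already present."""
    css_path = Path(jupyter_config_dir).expanduser() / "custom" / "custom.css"
    css_path.parent.mkdir(parents=True, exist_ok=True)
    existing = css_path.read_text() if css_path.exists() else ""
    if CSS_RULE not in existing:
        # Start on a fresh line if the file does not end with a newline.
        separator = "\n" if existing and not existing.endswith("\n") else ""
        with css_path.open("a") as f:
            f.write(separator + CSS_RULE)
    return css_path


# Usage (assumed default config location):
#   add_editor_width_rule("~/.jupyter")
```

Reload the editor page afterwards so the browser picks up the new stylesheet.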

#data_science
Using Mendeley to read papers

Looks like when you migrate to a new PC it also can migrate your literature library.
Nice.

#data_science
Forwarded from Админим с Буквой (bykva)
Docker pull via proxy

# systemctl edit docker.service


add the following lines:

[Service]
Environment=ALL_PROXY=socks5://user:password@host:port


reload systemd and restart docker:

# systemctl daemon-reload
# systemctl restart docker.service
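
If you would rather generate the override text than type it, here is a minimal sketch. The function name proxy_dropin is hypothetical. It also doubles any % sign, on the assumption that your proxy password is URL-encoded, because systemd expands % specifiers inside Environment= values.

```python
def proxy_dropin(proxy_url):
    """Return the text of a systemd drop-in that sets ALL_PROXY for a service."""
    if "://" not in proxy_url:
        raise ValueError("expected a URL such as socks5://user:password@host:port")
    # systemd treats % as the start of a specifier, so a literal % is written %%.
    escaped = proxy_url.replace("%", "%%")
    return "[Service]\nEnvironment=ALL_PROXY={}\n".format(escaped)


# Usage: paste the result into the editor opened by `systemctl edit docker.service`
#   print(proxy_dropin("socks5://user:password@host:port"))
```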


#proxy #docker
Downgrading PyTorch from 0.4 to 0.3

The newest PyTorch (0.4) has some issues with multi-GPU operation.

If you want to install the previous version, the downgrade docs are a bit outdated, but you can simply:

conda install pytorch=0.3.0 cuda90 -c pytorch

#deep_learning
A small saga about OpenVPN

TLDR:
(0) Purchase a cheap VDS from a noname provider with decent bandwidth => install OpenVPN => forget about problems => share with friends and family;
(1) This guide just works https://goo.gl/K2xjby (do not be afraid of its length - it is just verbose);
(2) I tested it with DigitalOcean and hostus.us;

From a financial standpoint US$1-5 per month per 3-5 users without any 3rd party services seems to be a bargain.

Hosting options:
(0) With DO it just works (just follow the guide step by step). But the cheapest VDS (which is overkill for this) costs US$5 per month. If you use my link - https://m.do.co/c/6f8e77dddc23 - you will get US$10 for free;
(1) Tested it with hostus.us. Follow my link, if you would like to support us - https://my.hostus.us/aff.php?aff=2169. A decent VPS can be found in Amsterdam for as cheap as US$5-8 for 3 months. Be careful - their UX is a bit misleading at times - (!!!) the country choice does not seem to flow from one menu to another (!!!). This seems to be more than enough - https://goo.gl/GyPZ6u;
(2) If you want to search yourself - go here - http://lowendstock.com/ - the best 2 options seem to be VirMach and hostus, but the former is sold out;

hostus.us caveats:
(0) If you would like to follow the DO guide but use hostus, then for the cheapest options do not forget to enable this in the admin https://goo.gl/DRx3UX;
(1) VPS provisioning time there is 0-8 hours. In my case it was ~40 mins;
(2) I also faced this bug - https://goo.gl/BTqeTX;

What if I have a problem with ssh keys on Windows?
(0) This will give you some basic info about managing Linux servers https://goo.gl/TgL61G;
(1) Here we explain how to use Putty and ssh keys on Windows https://goo.gl/xxvGBb (also just google it);

Why OpenVPN:
(0) Seems to be the most well-known open-source VPN software with easily accessible clients for all major platforms;
(1) I know people who used it;

Alternatives:
(0) https://github.com/trailofbits/algo - seems to be newer and cooler, but I do not personally know anyone who has actually used it;

#linux
#digital_freedom
Pinned post

What is this channel about?
(0) This channel is a practitioner's channel on the following topics: Internet, Data Science, Deep Learning, Python
(1) Don't get in a twist if your opinion differs. You are welcome to contact me via Telegram @snakers41 or email - aveysov@gmail.com
(2) No BS or ads

Donations
(0) Become a patron 🤟 - https://www.patreon.com/bePatron?u=6159641
(1) Buy me a coffee 🤟 https://buymeacoff.ee/8oneCIN

Give us a rating:
(0) https://telegram.me/tchannelsbot?start=snakers4

Our chat
(0) https://t.me/joinchat/Bv9tjkH9JHaAEL-FVtw9Tw

More links
(0) Our website http://spark-in.me
(1) Our chat https://goo.gl/IS6Kzz
(2) DS courses review
http://goo.gl/5VGU5A
https://spark-in.me/post/learn-data-science
(3) GAN papers review
https://spark-in.me/post/gan-paper-review
(4) SpaceNet Challenge
https://spark-in.me/post/spacenet-three-challenge
(5) DS Bowl 2018
https://spark-in.me/post/playing-with-dwt-and-ds-bowl-2018
(6) Data Science tag on the website
https://spark-in.me/tag/data-science
Showing more images in Tensorboard

TB is super cool (also together with this script https://gist.github.com/gyglim/1f8dfb1b5c82627ae3efcfbbadb9f514), but it shows only ~10 images in its image preview.

This can be fixed.
(0) Find your TB folder
import tensorboard
tensorboard.__file__
In my case it shows '/opt/conda/lib/python3.6/site-packages/tensorboard/__init__.py'
(1)
cd to that folder
open backend/application.py
(2)
Change this line so that it reads
image_metadata.PLUGIN_NAME: 400,
(3)
Profit - now it shows ~400 images on each view tab
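
Steps (0)-(2) can be sketched in Python, assuming the file layout described above (backend/application.py next to the package's __init__.py). The helper names application_py_path and image_limit_lines are made up for illustration. The sketch only locates the line and leaves the edit to you.

```python
import os


def application_py_path(package_init_path):
    """Given tensorboard.__file__, return the path of backend/application.py."""
    package_dir = os.path.dirname(package_init_path)
    return os.path.join(package_dir, "backend", "application.py")


def image_limit_lines(source_text):
    """Return (line_number, line) pairs that mention the image plugin limit."""
    return [
        (number, line.strip())
        for number, line in enumerate(source_text.splitlines(), start=1)
        if "image_metadata.PLUGIN_NAME" in line
    ]


# Usage, with TensorBoard installed:
#   import tensorboard
#   path = application_py_path(tensorboard.__file__)
#   with open(path) as f:
#       print(path, image_limit_lines(f.read()))
```

Note that the edit lives inside site-packages, so it will be lost whenever the package is reinstalled or upgraded.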

#deep_learning
Exploring GANs and unsupervised learning

Here are my findings from my hobby project about using GANs and unsupervised methods to build some decent semantic search on a large dataset of images without annotation:
(0) https://spark-in.me/post/unsupervised-learning-limits

Lots of cool images.

TLDR
(0) Features from a pre-trained Imagenet encoder => PCA => UMAP => HDBSCAN work really well for image clustering;
(1) Any siamese network / hard negative mining inspired methods just did not work - the annotation data is too coarse;
(2) GANs kind of work, but I could not achieve the boasted photo-realistic levels;

#deep_learning
2018 DS/ML digest 9

Market / libraries
(0) Tensorflow + Swift - wtf - https://goo.gl/FDvLM4
(1) Geektimes / Habrahabr.ru going international - https://goo.gl/dbGNwD
(2) A service for renting GPUs ... from people
- Reddit https://goo.gl/HxQ54x
- Link https://vectordash.com/hosting/
- Looks LXC-based (afaik - the only user-friendly alternative to Docker)
- Cool in theory, no idea how secure this is - presumably as secure as giving a Docker container to a stranger
- They did not reply to me within a week
(3) A friend sent me a list of ... yet more new PyTorch NLP libraries
- https://goo.gl/kasRfZ, https://goo.gl/XXnbJy (AllenNLP is the biggest library like this)
- I believe that such libraries are more or less useless for real tasks, but cool to know they exist
(4) New SpaceNet 4? https://goo.gl/CsSS6P
(5) A new super cool competition on Kaggle about particle physics? https://www.kaggle.com/c/trackml-particle-identification

Tutorials / basics
(0) Bias vs. Variance (RU) https://goo.gl/4Y7tH7
(1) Yet another magic Jupyter guideline collection - https://goo.gl/AFWMuq

Real world ML applications
(0) Resnet + object detection (RU) - detecting people without helmets, 90% accuracy - https://goo.gl/7xpQnE
(1) Fast.ai about using embeddings with Tabular data - http://www.fast.ai/2018/04/29/categorical-embeddings/
Very similar to our approach on electricity
I personally do not recommend using their library at all
(2) Comparing Google TPU vs. V100 with ResNet50 - https://goo.gl/s6dhsy
- speed - https://goo.gl/Pww2sm
- pricing - https://goo.gl/Rtkp8Q
- but ... buying GPUs is much cheaper
(3) Other blog posts about embeddings + tabular data
- Sales prediction http://blog.kaggle.com/2016/01/22/rossmann-store-sales-winners-interview-3rd-place-cheng-gui/
- Taxi drive prediction http://blog.kaggle.com/2015/07/27/taxi-trajectory-winners-interview-1st-place-team-%F0%9F%9A%95/
MLP + classification + embeddings - https://goo.gl/AMNGNG / https://arxiv.org/pdf/1508.00021.pdf
(4) Albu's solution to SpaceNet - augmentations https://github.com/SpaceNetChallenge/RoadDetector/tree/master/albu-solution/src/augmentations
CNN overview
Neural network part:
- Split data into 4 folds randomly, but with the same number of tiles from each city in every fold
- Use resnet34 as the encoder and a unet-like decoder (conv-relu-upsample-conv-relu) with skip connections from every layer of the network. Loss function: 0.8*binary_cross_entropy + 0.2*(1 - dice_coeff). Optimizer: Adam with default params
- Train on 512*512 image crops with batch size 11 for 30 epochs (8 times more images in one epoch):
  - 20 epochs with lr 1e-4
  - 5 epochs with lr 2e-5
  - 5 epochs with lr 4e-6
- Predict on the full image with padding 22 on borders (1344*1344)
- Merge folds by mean
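
The loss and learning-rate schedule above can be written out in plain Python as a sanity check. This is a sketch on flat lists, not the solution's actual code. The smooth term in the Dice coefficient and the choice to compute Dice on probabilities rather than thresholded masks are assumptions, and all function names are illustrative.

```python
import math


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def dice_coeff(probs, targets, smooth=1.0):
    """Soft Dice coefficient; `smooth` is an assumed stabiliser, not from the post."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)


def combined_loss(probs, targets):
    """0.8 * BCE + 0.2 * (1 - dice), the weighting given in the description."""
    return 0.8 * bce(probs, targets) + 0.2 * (1.0 - dice_coeff(probs, targets))


def lr_for_epoch(epoch):
    """Learning rate for a 0-based epoch index: 20 + 5 + 5 epochs."""
    if epoch < 20:
        return 1e-4
    if epoch < 25:
        return 2e-5
    return 4e-6
```

In a real training loop the same formulas would be applied to tensors with the framework's own BCE implementation.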


Jobs / job market
(0) Developers by country by scraping GitHub - https://goo.gl/n8gnLi
- developers count vs. GDP http://prntscr.com/j9v80e R^2 = 84%
- developers count vs. population - R^2 = 50%

Visualization
(0) Interactive tool for visualizing convolutions - https://ezyang.github.io/convolution-visualizer/

Datasets
(0) Open Images v4 released
- https://research.googleblog.com/2018/04/announcing-open-images-v4-and-eccv-2018.html
- the dataset itself https://storage.googleapis.com/openimages/web/download.html
- categories https://storage.googleapis.com/openimages/2018_04/bbox_labels_600_hierarchy_visualizer/circle.html



#data_science
#deep_learning
#digest
Add comment button below major posts?
anonymous poll

Yes, definitely! – 26
👍👍👍👍👍👍👍 53%

Meh... – 14
👍👍👍👍 29%

No, why? – 7
👍👍 14%

Your option (PM / chat) – 2
👍 4%

👥 49 people voted so far.
The current state of ML

https://goo.gl/rzKUiQ
(1) Do not call it AI
(2) Distinguish ML from Intelligent Infrastructure and Intelligence Augmentation
(3) Human-imitative AI is not tractable now
(4) Developments which are now being called "AI" arose mostly in the engineering fields associated with low-level pattern recognition and movement control

#deep_learning
A decent explanation of decorators in Python

http://book.pythontips.com/en/latest/decorators.html
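
For a quick taste before reading the link, here is a minimal decorator sketch. The count_calls and add names are made up for illustration and are not taken from the linked page.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times `func` has been called."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    """Return a + b."""
    return a + b
```

Writing @count_calls above the definition is shorthand for add = count_calls(add). After a few calls, add.calls holds the count.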

#python