Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
Novel topic modelling techniques
- http://bigartm.readthedocs.io/en/stable/intro.html
- https://github.com/bigartm/bigartm

Looks interesting.
If anyone knows about this - please ping me in PM.

#nlp
Handy opencv snippet to transform grayscale mask with labels (i.e. 0 for background, 1,2,3 etc) into a colourful map

y_pred_coloured = cv2.applyColorMap((y_pred / y_pred.max() * 255).astype('uint8'), cv2.COLORMAP_JET)

#deep_learning
For some reason, when you use PyTorch multi-worker data loaders, they stall if you use OpenCV and do not set
cv2.setNumThreads(0)

Nice to know this.

#deep_learning
The trend for smaller / inadequate prizes and weird datasets continues:

- This year's DS Bowl on Kaggle features a small public train dataset (650 images) vs. a much larger delayed validation dataset (3000 images). Cheap annotation, anyone? =) Also remarkably, for a somewhat difficult task (instance segmentation), the prize is much lower than last year;

- The new autonomous driving contest on Kaggle, as well as other CVPR competitions, feature extremely large datasets, extremely low prizes (US$1-2k), and no travel costs to CVPR covered. Ofc you can win and fly there, but the prize will not even cover your GPU costs;

- The recent xView challenge that I really wanted to participate in requires a US tax number to be eligible for prizes. Of course they do not know about double taxation treaties and W-8BEN tax exemptions;

Alas =(

#deep_learning
Internet digest
- Ben Evans - https://mailchi.mp/ben-evans/benedicts-newsletter-no-450525?e=b7fff6bc1c
- About autonomous cars - https://www.ben-evans.com/benedictevans/2018/3/26/steps-to-autonomy - autonomy will vary based on the route / conditions / situation / use case
- FB delays its speaker - https://www.bloomberg.com/technology
- Foxconn buys Belkin - https://goo.gl/Xf6g9A
- Amazon music > 10m subs - https://goo.gl/C8Qhdm
- The Economist about ML in business - https://goo.gl/fTCHE9
- Apple to make its own chips - https://goo.gl/ZkkEVc

#internet
#digest
As you may know (for newer people on the channel), sometimes we publish small articles on the website.

This time it covers a recent Power Laws challenge on DrivenData, which at first seemed legit and cool, but in the end turned back into a pumpkin.

Here is an article:
- https://spark-in.me/post/playing-with-electricity

#data_science
#time_series
#deep_learning
For new (!) people on the channel:

- This channel is a practitioner's channel on the following topics: internet, data science, math, deep learning, philosophy
- Focus is on data science
- Don't get bent out of shape if your opinion differs. You are welcome to contact me via telegram @snakers41 and email - aveysov@gmail.com
- No bs or ads
- Once a week or once every few weeks we publish some ML related digests

Give us a rating:
- https://telegram.me/tchannelsbot?start=snakers4

Donations
- Buy me a coffee https://buymeacoff.ee/8oneCIN
- Direct donations - https://goo.gl/kvsovi - 5011673505 (paste this agreement number)
- Yandex - https://goo.gl/zveIOr

Other channel aliases (in case you are afraid Telegram gets blocked in Russia)
- Twitter feed - https://twitter.com/AlexanderVeysov
- Web feed http://snakers41.spark-in.me

Our website
- http://spark-in.me
Our chat
- https://goo.gl/IS6Kzz
DS courses review
- http://goo.gl/5VGU5A
- https://spark-in.me/post/learn-data-science
Our best article so far:
- https://spark-in.me/post/spacenet-three-challenge
A bit more on semantic segmentation, now 3D

{V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation}

--> Link / authors http://arxiv.org/abs/1606.04797, Fausto Milletari / Nassir Navab / Seyed-Ahmad Ahmadi
--> Essence:
(0) Essentially applies UNet to 3D with a custom DICE based loss
(1) Architecture - https://goo.gl/Yn2BGb - basically UNet with 3D convolutions. Upsampling / downsampling - https://goo.gl/VtXrXy
(2) PReLU (no ablation test)
(3) Receptive fields of layers - https://goo.gl/FGwDCF
(4) 3D DICE loss - https://goo.gl/SqrK93 (without BCE?)
--> The paper does not use all the juice possible - hacky transfer learning (obvious idea - just stacking Imagenet filters), CLR, LinkNet architectures, etc
--> Looks like a good baseline / reference
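
For reference, a minimal numpy sketch of the DICE loss from (4), assuming the squared-sum denominator that the V-Net paper uses. The function names and the eps smoothing term are my own assumptions, not taken from the paper or its code; for training you would write the same thing with your framework's tensors:

```python
import numpy as np


def soft_dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Soft Dice coefficient over a whole volume (any shape, e.g. D x H x W).

    pred   - predicted foreground probabilities in [0, 1]
    target - binary ground truth mask
    eps    - hypothetical smoothing term to avoid 0 / 0 on empty volumes
    """
    p = pred.astype(np.float64).ravel()
    g = target.astype(np.float64).ravel()
    intersection = np.sum(p * g)
    # V-Net style denominator: sum of squares of both volumes.
    denominator = np.sum(p ** 2) + np.sum(g ** 2)
    return float((2.0 * intersection + eps) / (denominator + eps))


def dice_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Loss to minimise: 0 for perfect overlap, close to 1 for no overlap."""
    return 1.0 - soft_dice(pred, target)
```

The loss is computed over the whole 3D volume at once, so no class re-weighting between foreground and background voxels is needed.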

{An application of cascaded 3D fully convolutional networks for medical image segmentation}

--> http://arxiv.org/abs/1803.05431, a group of Japanese researchers
--> Essence:
(0) 2 stage 3D UNet, ablation test against 2D FCNs
(1) Loss - 3D cross-entropy
(2) Transfer learning - it works for other datasets and gives a mild boost (1-3%)
(3) 80-90% DICE, varies by organ
(4) weights downloadable https://github.com/holgerroth/3Dunet_abdomen_cascade (Caffe...)
--> Essentially a 2 stage process is dictated by memory considerations:
(0) Pipeline https://goo.gl/wZwF3X

In the long run transfer learning may rule, but here legal limitations may slow down this process.

#deep_learning
#medical_imaging
YOLOv3 - best paper.
Not in terms of scientific contribution, but as a rebuttal of DS community BS.
Very funny read.
- https://pjreddie.com/media/files/papers/YOLOv3.pdf

If you want a proper comparison of object detection algorithms - use this paper https://arxiv.org/abs/1611.10012

Looks like SSD and YOLO are reasonably good and fast, and RCNN can be properly tuned to be 3-5x slower (not 100x) and more accurate.

#data_science
#computer_vision
DS Bowl 2018 stage 2 data was released.
It has a completely different distribution from stage 1 data.
How do you like them apples?

Looks like Kaggle admins really have no idea about dataset curation, or all of this is meant to misguide manual annotators.

Anyway - looks like random bs.

#data_science
#deep_learning
So I briefly dug into running a containerized GPU accelerated GUI app (I want to be able to run some apps I do not really want on my host).

Docker kind of works for this purpose, but I only found working guides for nvidia-docker, not nvidia-docker2.

It looks like if you want to run a Linux container on a Linux host, LXD is a good option. It is high level and seems to have an easy API. I will report back if it works for me.

- Guide https://blog.simos.info/how-to-run-graphics-accelerated-gui-apps-in-lxd-containers-on-your-ubuntu-desktop/
- LXD vs Docker https://unix.stackexchange.com/questions/254956/what-is-the-difference-between-docker-lxd-and-lxc/254982
- Extensive LXD tutorial https://stgraber.org/2016/03/11/lxd-2-0-introduction-to-lxd-112/

#linux
So, usually I try to stay away from such controversial topics, but I have to address an elephant in the room. You all know that I am originally from Russia and that I have quite liberal world views.

Seeing that many people have started to ride the hype and advertise some expensive "solutions", today I decided to do a post about creating your own SOCKS5 proxy server via a droplet on Digital Ocean:
- Post - https://spark-in.me/post/vds-socks5-proxy-server - note that unlike my other posts - this one is a step-by-step explanation;
- It explains how to create your own SOCKS5 proxy server using Ubuntu and Digital Ocean with dante;
- The cheapest Digital Ocean droplet is US$5 per month (you can find such droplets elsewhere for as low as US$2-3 with inferior service);
- If you use my referral link - you will get US$10 for free - https://m.do.co/c/6f8e77dddc23
- Also you can create credentials for your friends and family;
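
Once the server is up, a quick way to check that it answers and accepts your credentials is to speak the first two steps of the SOCKS5 protocol by hand. Below is a minimal stdlib-only sketch, assuming the proxy is configured for username / password authentication. All function names, the host, the port and the credentials in the usage note are hypothetical placeholders, not values from the article:

```python
import socket


def greeting() -> bytes:
    # VER=5, NMETHODS=1, METHOD=0x02 (username / password)
    return bytes([0x05, 0x01, 0x02])


def auth_request(user: str, password: str) -> bytes:
    # Username / password sub-negotiation: VER=1, ULEN, UNAME, PLEN, PASSWD
    u, p = user.encode(), password.encode()
    if not (0 < len(u) < 256 and 0 < len(p) < 256):
        raise ValueError("user and password must be 1..255 bytes long")
    return bytes([0x01, len(u)]) + u + bytes([len(p)]) + p


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # Read exactly n bytes, or fewer if the peer closes the connection.
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            break
        data += chunk
    return data


def proxy_accepts(host: str, port: int, user: str, password: str,
                  timeout: float = 5.0) -> bool:
    """Return True if the proxy picks user/pass auth and accepts the credentials."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(greeting())
        if _recv_exact(sock, 2) != b"\x05\x02":
            return False  # not SOCKS5, or user/pass auth is not offered
        sock.sendall(auth_request(user, password))
        return _recv_exact(sock, 2) == b"\x01\x00"  # STATUS 0x00 means success
```

Usage would be something like proxy_accepts("203.0.113.10", 1080, "friend", "secret"), with your droplet's address and whatever port and credentials you actually configured.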

Also note that, foreseeing this s**t, I created aliases for our telegram channel
- On Twitter https://twitter.com/AlexanderVeysov
- On the web http://snakers41.spark-in.me
- RSS http://snakers41.spark-in.me/rss/

UX is not so great, but it works more or less. Please tell me what you think. I know that the majority of readers are Russians and we have quite a negative mentality, but this is one of the cases when you have to share this message and my post as much as possible. We will be doing an adapted post on habrahabr.ru as well.

And I know that there are free proxy lists. But if you create a simple service today - tomorrow you can add layers to it (see some hints in the article) and not rely on other people.

If you like what I shared - please support our channel (see the pinned message)
- https://buymeacoff.ee/8oneCIN

#internet
#digital_freedom
What proxy will you use?

Luckily, it does not apply to me – 18
👍👍👍👍👍👍👍 26%

Other VPN-like solutions – 17
👍👍👍👍👍👍👍 25%

I will try your guide – 15
👍👍👍👍👍👍 22%

A free / public / provided by special channels – 15
👍👍👍👍👍👍 22%

I will stop using Telegram – 2
👍 3%

Let's wait, maybe they will reconsider – 1
▫️ 1%

👥 68 people voted so far.
So, I just found out that Firefox's rendering engine was rewritten. It now boasts the fastest speeds and support for ... socks5 proxies, both on mobile and desktop.
- https://github.com/FelisCatus/SwitchyOmega/
- https://hacks.mozilla.org/2017/08/inside-a-super-fast-css-engine-quantum-css-aka-stylo/

Also projects like orbot+orfox help in more extreme cases.

#internet