Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
So a couple of takeaways from this guide - https://goo.gl/CBk7Um
- A Movidius USB stick is enough to run real-time object detection, which is interesting to know
- It has shitty driver and library support (Caffe was mentioned)
- Installing everything is FAR from trivial
- The guide uses VirtualBox instead of Docker (no idea why, but whatever), which says much

Also PyImageSearch is a sellout - this post most likely contains advertiser-friendly featured content; looks like the Movidius stick Topcoder event did not gain enough traction...

So - use Nvidia Jetsons for embedded solutions and do not bother with this. But it's good that new products emerge.

#deep_learning
Internet Digest
- Ben Evans - https://goo.gl/XsBqHN
- Flipboard (orly) launches ads - https://goo.gl/2muoiT
- Google sold 3.9 million Pixel phones in 2017 - https://goo.gl/6eUiXw
- Looks like smartbuses may be cool. App => bus route information => route gap => launch cosy bus with music and social features - https://goo.gl/TjKndB (I doubt this is a business though)
- About the importance of decentralization - next Internet will be a set of cryptonetwork protocols - https://goo.gl/c2aB4n

- How London is responding to technological innovation - https://goo.gl/Dh6NgD
(1) Connected and autonomous vehicles (CAVs), or driverless cars, won't be on the road until the 2030s at least and could add to congestion
(2) Dockless cycle schemes need to be able to operate across London to be effective
(3) There is no control system in place for drones and droids
(4) TfL is monitoring technological developments, but this needs to be embedded across the whole organisation

- Nice infographics about city dwellers' daily routes on pages 7-10 - https://goo.gl/vV71DR

#internet
#digest
So ofc I tried the new JupyterLab.
And it is really cool that something so simple / cool / useful is completely free / no strings attached (yet). But I will not use it professionally.

Use my Dockerfile if you want to check it out with my DL environment:
(1) https://goo.gl/Y7VMTa

But in a nutshell, it worked inside the container with the same params I use for Jupyter notebook (jpn)
CMD jupyter lab --port=8888 --ip=0.0.0.0 --no-browser
And installation is as easy as
conda install -c conda-forge jupyterlab
Docs are a bit sparse for now
(1) https://goo.gl/1UQBnS

But here is a list of reasons why you might consider sticking to ssh pass-through for auto-complete / terminal and Jupyter notebook with extensions:
(0) It is still in beta, so unless your professional path is connected with Node.js / web, you had better pass for now
(1) There are amazing extensions for Jupyter notebook that do 95% of what you might need - https://goo.gl/K86gjp
(2) The built-in terminal is much better than before, but it pales in comparison with PuTTY or even a standard Linux shell (autocomplete?)
(3) Some of the built-in extensions like the image viewer are really useful, but overall the product is a bit beta (which they openly say it is)

And here is why turning Jupyter notebook into a real environment is really cool:
(1) Building everything on extensions IS REALLY COOL - in the long run it will encourage people to port Jupyter notebook extensions and build a really powerful tool. Also this implies diversity and freedom, unlike shitty tools like Zeppelin
(2) After some effort, it may really replace the terminal, IDE, desktop environment and notebooks for data-oriented people (I guess in 6-12 months)
(3) Structuring extensions as npm packages lures the fastest-growing developer community (web developers) into supporting the project, and provides transparency and clarity

#data_science
Was looking for a CLAHE abstraction for my image pre-processing pipeline and found one on the Internet

import cv2


class CLAHE:
    """Apply CLAHE to the luma (Y) channel of a BGR image."""

    def __init__(self, clipLimit=2.0, tileGridSize=(8, 8)):
        self.clipLimit = clipLimit
        self.tileGridSize = tileGridSize

    def __call__(self, im):
        # Equalize only the Y channel so the colors are not shifted
        img_yuv = cv2.cvtColor(im, cv2.COLOR_BGR2YUV)
        clahe = cv2.createCLAHE(clipLimit=self.clipLimit, tileGridSize=self.tileGridSize)
        img_yuv[:, :, 0] = clahe.apply(img_yuv[:, :, 0])
        img_output = cv2.cvtColor(img_yuv, cv2.COLOR_YUV2BGR)
        return img_output
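
To get a feeling for what clipLimit actually does, here is a minimal pure-Python sketch of contrast-limited histogram equalization on a single tile. It is an illustration under assumptions, not OpenCV's implementation: there is no tiling and no interpolation between tiles, the leftover of the clipped mass is simply dropped, and the function name clipped_equalize is made up for this example.

```python
def clipped_equalize(pixels, clip_limit=2.0, levels=256):
    """Contrast-limited histogram equalization of one flat list of ints in [0, levels)."""
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip every bin at clip_limit times the average bin height (at least 1).
    ceiling = max(1, int(clip_limit * n / levels))
    excess = 0
    for i in range(levels):
        if hist[i] > ceiling:
            excess += hist[i] - ceiling
            hist[i] = ceiling

    # Spread the clipped mass evenly over all bins (the remainder is dropped in this sketch).
    bonus = excess // levels
    hist = [h + bonus for h in hist]

    # Turn the cumulative histogram into a lookup table.
    total = sum(hist)
    lut, running = [], 0
    for h in hist:
        running += h
        lut.append(round((levels - 1) * running / total))
    return [lut[p] for p in pixels]


# A low-contrast 64x64 "tile" that only uses gray levels 100..107.
tile = [100 + (i % 8) for i in range(4096)]
clipped = clipped_equalize(tile, clip_limit=2.0)
unclipped = clipped_equalize(tile, clip_limit=1000.0)  # limit so high it never clips
print("input:    ", min(tile), max(tile))
print("clipped:  ", min(clipped), max(clipped))
print("unclipped:", min(unclipped), max(unclipped))
```

The clipped version stretches the contrast noticeably less than plain equalization, which is the whole point of the clip limit: flat regions do not get their noise blown up.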

#deep_learning
Forwarded from Just links
Adversarial Examples that Fool both Human and Computer Vision https://arxiv.org/abs/1802.08195
2017 DS/ML digest 5

Fun stuff
(1) Hardcore metal + CNNs + style transfer - https://goo.gl/VHYfHe

SpaceNet challenge
(1) Post by Nvidia https://goo.gl/6Mw4CB
(2) Some links to SOTA semantic segmentation articles
(3) Useful tools for CV - floodfill and grabcut, but the guys from Nvidia did not notice that the road width was in the geojson data...
(4) Looks like they replicated the results just for PR, and their masks do not look appealing

Research / papers / libraries
(1) Neural Voice Cloning with a Few Samples - https://goo.gl/LwmzRf (demos - audiodemos.github.io)
(2) A library for CRFs in Python - https://goo.gl/cQc8hA
(3) 1000x faster CNN architecture search - still on CIFAR - https://arxiv.org/pdf/1802.03268.pdf (PyTorch https://goo.gl/BZ9Vrh)
(4) URLs + CNN - malicious link detection - https://arxiv.org/abs/1802.03162

Datasets
(1) 3M anime image dataset - https://www.gwern.net/Danbooru2017
(2) Google HDR dataset - https://goo.gl/XEL1Fm

Market
(1) Idea - AMT + blockchain - https://goo.gl/JfzEPV
(2) ARM to make processors for CNNs? - https://goo.gl/MpdPSB
(3) Google TPU in beta - https://goo.gl/gRzq9t - very expensive. + Note the rumours that Google's own people do not use their TPU quota
(4) One guy managed to deploy a PyTorch model using ONNX - https://goo.gl/QD4DkZ

#digest
#machine_learning
#data_science
Just found a book on practical Python programming patterns
- http://python-3-patterns-idioms-test.readthedocs.io/en/latest/PythonForProgrammers.html

Looks good

#python
Most common libraries for Natural Language Processing:

CoreNLP from Stanford group:
http://stanfordnlp.github.io/CoreNLP/index.html

NLTK, the most widely-mentioned NLP library for Python:
http://www.nltk.org/

TextBlob, a user-friendly and intuitive NLTK interface:
https://textblob.readthedocs.io/en/dev/index.html

Gensim, a library for document similarity analysis:
https://radimrehurek.com/gensim/

SpaCy, an industrial-strength NLP library built for performance:
https://spacy.io/docs/

Source: https://itsvit.com/blog/5-heroic-tools-natural-language-processing/

#nlp #digest #libs
It is tricky to run XGBoost fully on GPU. People report that on the same data CatBoost has inferior quality w/o tweaking (but is faster). LightGBM is reported to be faster and to have the same accuracy.

So I tried adding LightGBM with GPU support to my Dockerfile -
https://github.com/Microsoft/LightGBM/blob/master/docs/GPU-Tutorial.rst - but I ran into some driver issues inside Docker.

One caveat I found - it supports only older Nvidia drivers, up to 384.

Luckily, there is a Dockerfile by MS that seems to be working (+ jupyter, but I could not install extensions)
https://github.com/Microsoft/LightGBM/blob/master/docker/gpu/README.md
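
Once a GPU build is working, switching between CPU and GPU is mostly a matter of training parameters. A minimal sketch, assuming the parameter names from the LightGBM docs (device, gpu_platform_id, gpu_device_id, max_bin); the helper lightgbm_params and the concrete values are made up for illustration and not tuned for anything:

```python
def lightgbm_params(use_gpu=True):
    """Build a LightGBM params dict; the values are placeholders, not recommendations."""
    params = {
        "objective": "binary",
        "learning_rate": 0.1,
        "num_leaves": 63,
        "device": "gpu" if use_gpu else "cpu",
    }
    if use_gpu:
        # Assumption: a single OpenCL platform and device, both with index 0.
        # A smaller max_bin is what the GPU tutorial suggests for speed.
        params.update({"gpu_platform_id": 0, "gpu_device_id": 0, "max_bin": 63})
    return params


print(lightgbm_params(use_gpu=True))

# With a GPU-enabled build and some data X, y, training would look roughly like:
#   import lightgbm as lgb
#   booster = lgb.train(lightgbm_params(), lgb.Dataset(X, label=y))
```

Keeping the CPU variant one flag away makes it easy to check that both give the same quality on your data.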

#data_science
Found some starter boilerplate on how to use hyperopt instead of grid search for faster hyperparameter search:
- here - https://goo.gl/ccXkuM
- and here - https://goo.gl/ktblo5
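
The basic idea is to sample the search space instead of enumerating every grid point. Below is a stdlib-only sketch that uses plain random search as a stand-in: it is NOT hyperopt's API and not its TPE algorithm, and the objective is a toy function (hypothetical, with its minimum at lr=0.1, depth=6) in place of a real cross-validation score.

```python
import itertools
import random


def objective(params):
    """Toy loss standing in for a CV score; minimum at lr=0.1, depth=6."""
    return (params["lr"] - 0.1) ** 2 + 0.01 * (params["depth"] - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values."""
    keys = list(grid)
    candidates = [dict(zip(keys, values)) for values in itertools.product(*grid.values())]
    return min(candidates, key=objective), len(candidates)


def random_search(space, n_trials, seed=0):
    """Evaluate n_trials random draws from the space."""
    rng = random.Random(seed)
    candidates = [
        {name: sample(rng) for name, sample in space.items()}
        for _ in range(n_trials)
    ]
    return min(candidates, key=objective), n_trials


GRID = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 6, 8, 10]}
SPACE = {
    "lr": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform learning rate
    "depth": lambda rng: rng.randint(2, 10),
}

best_grid, n_grid = grid_search(GRID)
best_random, n_random = random_search(SPACE, n_trials=10)
print("grid:  ", n_grid, "evaluations ->", best_grid)
print("random:", n_random, "evaluations ->", best_random)
```

The grid cost grows multiplicatively with every new parameter, while the sampling budget is whatever you set. Hyperopt goes one step further and uses the results of previous trials to decide where to sample next.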

#data_science