Our post is live on the Russian Reddit analogue - geektimes
- https://geektimes.ru/post/299971/
Please support if you have a valid account!
#internet
Found an applied channel (RU) about security and admin stuff
Looks professional
- https://t.me/bykvaadm
- also the channel's admin posted some useful remarks here
-- https://geektimes.ru/post/299971/
#linux
Telegram
Админим с Буквой
A channel about system administration, DevOps, and a bit of infosec.
For any questions contact @bykva. No ads hosted.
So, I used to use the Chromium-based Opera.
Now I have switched to the new Firefox, which is fast, boasts a lot of security extensions, and also looks clean and nice. Their mobile apps are a bit unpolished, but also good.
Looks like they rewrote the rendering engine from scratch, because a year ago it was slow.
2018 DS/ML digest 8
As usual, my short bi-weekly (or less frequent) digest of everything that passed my BS detector
Market / blog posts
(0) Fast.ai about the importance of accessibility in ML - http://www.fast.ai/2018/04/10/stanford-salon/
(1) Some interesting news about the market, mostly self-driving cars (the rest is crap) - https://goo.gl/VKLf48
(2) US$600m investment into Chinese face recognition - https://goo.gl/U4k2Mg
Libraries / frameworks / tools
(0) New 5 point face detector in Dlib for face alignment task - https://goo.gl/T73nHV
(1) Finally a more proper comparison of XGB / LightGBM / CatBoost - https://goo.gl/AcszWZ (also see my thoughts here https://t.me/snakers4/1840)
(2) CNNs on FPGAs by ZFTurbo
-- https://www.youtube.com/watch?v=Lhnf596o0cc
-- https://github.com/ZFTurbo/Verilog-Generator-of-Neural-Net-Digit-Detector-for-FPGA
(3) Data version control - looks cool
-- https://dataversioncontrol.com
-- https://goo.gl/kx6Qdf
-- but I will not use it - because proper logging and treating data as immutable solve the issue
-- looks like over-engineering for the sake of over-engineering (unless you create 100500 datasets per day)
Visualizations
(0) TF Playground to see how the simplest CNNs work - https://goo.gl/cu7zTm
Applications
(0) Looks like GAN + ResNet + UNet + content loss can easily solve simpler tasks like deblurring - https://goo.gl/aviuNm
(1) You can apply dilated convolutions to NLP tasks - https://habrahabr.ru/company/ods/blog/353060/
(2) High level overview of face detection in ok.ru - https://goo.gl/fDUXa2
(3) Alternatives to DWT and Mask-RCNN / RetinaNet? https://medium.com/@barvinograd1/instance-embedding-instance-segmentation-without-proposals-31946a7c53e1
- Has anybody tried anything here?
Papers
(0) A more disciplined approach to training CNNs - https://arxiv.org/abs/1803.09820 (LR regime, hyper-param fitting, etc.)
(1) GANs for image compression - https://arxiv.org/pdf/1804.02958.pdf
(2) Paper reviews from ODS - mostly moonshots, but some are interesting
-- https://habrahabr.ru/company/ods/blog/352508/
-- https://habrahabr.ru/company/ods/blog/352518/
(3) SqueezeNext - the new SqueezeNet - https://arxiv.org/abs/1803.10615
#digest
#data_science
#deep_learning
www.fast.ai
A Discussion about Accessibility in AI at Stanford
Making neural nets uncool again
A DISCIPLINED APPROACH TO NEURAL NETWORK HYPER-PARAMETERS: PART 1 – LEARNING RATE, BATCH SIZE, MOMENTUM, AND WEIGHT DECAY
(0) https://arxiv.org/abs/1803.09820, Leslie N. Smith US Naval Research Laboratory
(1) Will serve as a good intuition starter if you have little experience (!)
(2) Some nice ideas:
- The test/validation loss is a good indicator of the network’s convergence - especially in early epochs
- The amount of regularization must be balanced for each dataset and architecture
- The practitioner’s goal is obtaining the highest performance while minimizing the needed computational time
(smaller batch - less stability and faster convergence)
- Optimal momentum value(s) will improve network training
(3) The author does not study the difference between SGD and Adam in depth =( Adam kind of solves many of his pains
(4) In my practice the following approach works best:
- Aggressive training with Adam to find the optimal LR
- Apply various LR decay regimes to determine the optimal one
- Use a low LR or CLR in the end to converge to a lower value (possible overfitting)
- Test on a test / delayed test set end-to-end
- In my experience, a strong model with good params will start with a test/val set loss much lower / target metric much higher than on the train set
- In some applications, if your CNN is memory-intensive, you just opt for the largest batch possible (usually >6-8 works)
- Also there is no mention of augmentations - they usually help reduce overfitting much better than hyper-parameter tuning
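The CLR mentioned above is simple enough to sketch in a few lines. This is my own minimal implementation of a triangular cyclical learning rate, not code from the paper; the default values are illustrative only:

```python
def triangular_clr(step, step_size=100, base_lr=1e-4, max_lr=1e-2):
    """Triangular cyclical learning rate, Smith-style.

    LR ramps linearly from base_lr to max_lr over step_size steps,
    then back down to base_lr, and the cycle repeats.
    """
    cycle = step // (2 * step_size)
    x = abs(step / step_size - 2 * cycle - 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

print(triangular_clr(0))    # base_lr at the start of a cycle
print(triangular_clr(100))  # max_lr at the cycle peak
```

In practice you would feed this value into your optimizer's param groups every step.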
#deep_learning
Nice read about systemctl
https://www.digitalocean.com/community/tutorials/how-to-use-systemctl-to-manage-systemd-services-and-units
#linux
Digitalocean
How To Use Systemctl to Manage Systemd Services and Units | DigitalOcean
Systemd is an init system and system manager that has become the new standard for Linux distributions. In this guide, we will be discussing the systemctl com…
A draft of the article about DS Bowl 2018 on Kaggle.
This time it was a lottery.
Good that I did not really spend much time on it, and this time I learned a lot about watershed and some other instance segmentation methods!
The article is accompanied by a dockerized PyTorch code release on GitHub:
- https://spark-in.me/post/playing-with-dwt-and-ds-bowl-2018
- https://github.com/snakers4/ds_bowl_2018
This is a beta, you are welcome to comment and respond.
Kudos!
#data_science
#deep_learning
#instance_se
Spark in me
Applying Deep Watershed Transform to Kaggle Data Science Bowl 2018 (dockerized solution)
In this article I will describe my solution to the DS Bowl 2018 and why it was a lottery and post a link to my dockerized solution
Author's articles - http://spark-in.me/author/snakers41
Blog - http://spark-in.me
Also, what is interesting: despite the fact that geektimes blocked my SOCKS proxy post, and that marketing-driven websites stole it (in Russian), I received the following feedback:
- 3 people thanked me in the ODS channel
- 3 people thanked me via email
- 2 people thanked me in geektimes PM
This is also interesting - my referral link was hit 165 times and ~50 people registered =)
- http://prntscr.com/j69w85
So, if you missed the fun:
- Post https://spark-in.me/post/vds-socks5-proxy-server
- Referral link https://m.do.co/c/6f8e77dddc23
- Note that the final config is in the comments and here (thanks to https://t.me/bykvaadm and its admin)
sudo apt update && sudo apt upgrade
wget https://launchpad.net/ubuntu/+archive/primary/+files/dante-server_1.4.2+dfsg-2build1_amd64.deb
sudo dpkg -i dante-server_1.4.2+dfsg-2build1_amd64.deb
echo '
logoutput: syslog /var/log/danted.log
internal: eth0 port = 1080
external: eth0
socksmethod: username
user.privileged: root
user.unprivileged: nobody
client pass {
from: 0.0.0.0/0 to: 0.0.0.0/0
log: error
}
socks pass {
from: 0.0.0.0/0 to: 0.0.0.0/0
command: connect
log: error
socksmethod: username
}' | sudo tee /etc/danted.conf
# basic ufw installation
sudo apt-get install ufw
sudo ufw status
# https://wiki.dieg.info/socks
sudo ufw allow ssh
sudo ufw allow proto tcp from any to any port 1080
sudo ufw status numbered
sudo ufw enable
sudo systemctl enable danted
sudo useradd --shell /usr/sbin/nologin av_socks && sudo passwd av_socks
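After the firewall and service are up, a quick reachability check for the SOCKS port can be scripted (a sketch of my own; it only tests that something listens on 1080, not that SOCKS auth actually works):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# run on the proxy box itself; 1080 is the port from the danted config above
print(port_open("127.0.0.1", 1080))
```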
So, thanks to bykvaadm for his feedback and support and to everybody else.
#linux
Also someone just bought us a coffee
- https://www.buymeacoffee.com/8oneCIN
Please consider supporting us for more quality content
Usually a post takes several hours (up to a month if it is about a competition) to write and does not pay well
And when people steal your content to put their refcodes in it, it's painful (
Buy Me a Coffee
Alexander Veysov
A practitioner in the field of Data Science / Deep Learning. Welcome to my BMC page. As a community member I mostly do 3 things: Maintain and write articles for multi-author...
Nice realistic article about bias in embeddings by Google
https://developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html
#google
#nlp
Googleblog
Text Embedding Models Contain Bias. Here's Why That Matters. - Google for Developers
DS Bowl 2018 top solution
https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741
#data_science
This is really interesting...their approach to separation is cool
Andrew Ng released the first 4 chapters of his new book
So far it does not look really technical
- https://gallery.mailchimp.com/dc3a7ef4d750c0abfc19202a3/files/704291d2-365e-45bf-a9f5-719959dfe415/Ng_MLY01.pdf
#data_science
Spark in me via @vote
Given the current situation ... which post / guide would you like next?
DS / ML related (back log of hobby projects)! – 47
👍👍👍👍👍👍👍 69%
OpenVPN + Docker – 12
👍👍 18%
Dante proxy + Arubacloud + DigitalOcean + Vultr + Docker – 9
👍 13%
👥 68 people voted so far.
Useful Python abstractions / sugar / patterns
I already shared a book about patterns, which contains mostly high-level / more complicated patterns. But for writing ML code, a simple imperative / functional programming style is sometimes ok.
So - I will be posting about simple and really powerful Python tips I am learning now.
This time I found out about map and filter, which are super useful for data preprocessing:

Map
items = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x**2, items))

Filter
number_list = range(-5, 5)
less_than_zero = list(filter(lambda x: x < 0, number_list))
print(less_than_zero)

Also found this book - http://book.pythontips.com/en/latest/map_filter.html
#python
#data_science
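The same preprocessing can also be written with list comprehensions, which are often considered more idiomatic Python than map / filter with lambdas:

```python
items = [1, 2, 3, 4, 5]
squared = [x**2 for x in items]  # equivalent to list(map(lambda x: x**2, items))

number_list = range(-5, 5)
less_than_zero = [x for x in number_list if x < 0]  # equivalent to the filter version

print(squared)
print(less_than_zero)
```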
A note on CDNs and protecting your website against censorship
TLDR
- https://goo.gl/47UqyZ
- Using a free / cheap CDN service can enable you to protect your domain hosted resource from censorship
- Unless CDN servers will be blocked (but I guess the CDN has more servers, than you, right?)
So, I host spark-in.me on Digital Ocean, and I do not want to move it or run a CDN myself. I read news that Google abandoned some of its proxying tools because of such censorship events... interesting.
I knew that services like Cloudflare (**CDN**) forward your traffic somehow, but I was not sure which IP is actually seen by the user and whether all of the traffic is forwarded. Then I read their FAQ
- https://goo.gl/uHPLjW
It says:
After a visitor's browser has done the initial DNS lookup, it begins making requests to retrieve the actual content of a website. These requests are directed to the IP address that was returned from the DNS lookup. Before Cloudflare, that address would have been 198.51.100.1. With Cloudflare as the authoritative nameserver, the new address is 203.0.113.1. Cloudflare’s data center at 203.0.113.1 will serve as much of your website as it can from its local storage, and ask your web server at 198.51.100.1 for any part of your website it doesn’t already have stored locally. The Cloudflare data center at 203.0.113.1 will then provide your complete website to the visitor, so the visitor never talks directly to your web server at 198.51.100.1.
So I tried their free-tier service (the paid tiers start at US$20-200, which is too steep) and it just works, though SSL certificates were issued only ~90 mins after I changed my nameservers. It is as easy as:
- Backup your DNS settings somewhere
- Import to CloudFlare
- Change name servers in your domain registrar cabinet
- 90 mins and ... profit
Now I cannot see my direct DO server IP when I resolve my DNS:
$ dig +short spark-in.me
104.27.142.65
104.27.143.65
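The same check can be scripted with the Python standard library (a minimal sketch; the helper name is mine):

```python
import socket

def resolve(domain):
    """Return the sorted list of A-record IPs the local resolver gives for a domain."""
    _, _, ips = socket.gethostbyname_ex(domain)
    return sorted(ips)

print(resolve("localhost"))  # sanity check; typically ['127.0.0.1']
# resolve("spark-in.me") should now return Cloudflare edge IPs,
# not the origin Digital Ocean droplet IP
```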
#internet
#security
Forwarded from Админим с Буквой (bykva)
A bit about bash scripting
Sometimes you need to write data to a file while preserving line breaks and indentation. This can be done in several ways.
1) With echo
echo -e 'This is first string\nAnd this is second' > /path/to/file
This approach has exactly one advantage - it is a one-liner. In practice it is basically unreadable.
You can, of course, split this one-liner into several lines:
echo "
str1
$variable - why not?
str N
"
And this approach is basically fine until you have to escape quotes inside the text.
2) With cat
cat > ceph.conf << EOF
[global]
mon_host = xx.xx.xx.xx:6789
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
EOF
Unlike the first approach, variables are still interpolated, and quotes do not need to be escaped.
#bash_tips_and_tricks
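For comparison, the Python equivalent of the cat-heredoc trick is a triple-quoted string; a small sketch of mine, reusing the ceph.conf example from the forwarded post (the mon_host value is a placeholder, as in the original):

```python
from textwrap import dedent

mon_host = "xx.xx.xx.xx:6789"  # placeholder value, as in the original example

# a triple-quoted f-string preserves line breaks, interpolates variables,
# and needs no quote escaping - much like an unquoted heredoc
config = dedent(f"""\
    [global]
    mon_host = {mon_host}
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
    """)

with open("ceph.conf", "w") as f:
    f.write(config)
```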
For Windows users who use their legacy machine as a thin client to access Linux servers
Old habits die slowly. I use an old but powerful Windows machine, and so far doing everything on remote servers was ok, until I needed to commit to GitHub using ssh-agent forwarding.
But my key is stored locally, and I do not want to use git bash or any Windows-based software, because it sucks. Also, having a single source of truth on a remote Linux machine is better anyway. But I cannot store my key on the remote machine.
There is a solution - ssh-agent forwarding. In a nutshell:
- Install pageant, add your identity locally (.ppk private key file)
- Check "allow agent forwarding" in PuTTY
- Follow the guides below to check that everything works
- Profit
https://www.digitalocean.com/community/tutorials/how-to-use-pageant-to-streamline-ssh-key-authentication-with-putty
https://developer.github.com/v3/guides/using-ssh-agent-forwarding/
#linux
Digitalocean
How To Use Pageant to Streamline SSH Key Authentication with PuTTY | DigitalOcean
Pageant is a PuTTY authentication agent. It holds your private keys in memory so that you can use them whenever you are connecting to a server. It eliminates…