Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math, and applications of the former. To reach the editors, contact: @haarrp
New Coding Assistant Tool From OpenAI and Microsoft

GitHub announced a new tool for improving the coding experience: GitHub Copilot, developed with help from Microsoft and OpenAI. This looks really promising, at least from the announcement: imagine just typing convert_datetime_to_date and getting a function for that. Looking forward to the actual demo.
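For illustration, here is a sketch (written by us, not actual Copilot output) of the kind of function one might hope such a prompt would yield:

```python
from datetime import datetime, date

# Hypothetical example of the completion one might get after typing
# "convert_datetime_to_date"; illustrative only, not real Copilot output.
def convert_datetime_to_date(dt: datetime) -> date:
    """Return the date part of a datetime object."""
    return dt.date()

print(convert_datetime_to_date(datetime(2021, 6, 29, 12, 30)))  # 2021-06-29
```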

Project: https://copilot.github.com
Blog entry: https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/
CNBC news post: https://www.cnbc.com/2021/06/29/microsoft-github-copilot-ai-offers-coding-suggestions.html

#OpenAI #microsoft #coding #CS #computerlanguageunderstanding #CLU #Github
MMPX Style-Preserving Pixel Art Magnification

Work on #pixel graphics upscaling. Hopefully we will get all the classic games auto-remastered someday.
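For context, the trivial baseline that style-preserving magnifiers like MMPX improve upon is plain nearest-neighbour doubling; a minimal sketch (this is the baseline, not the MMPX algorithm itself, which adds rule-based edge reshaping on top of this grid):

```python
def upscale_2x_nearest(img):
    """Naive 2x nearest-neighbour upscale: each pixel becomes a 2x2 block."""
    out = []
    for row in img:
        doubled = [p for p in row for _ in (0, 1)]  # duplicate each pixel horizontally
        out.append(doubled)
        out.append(list(doubled))                   # duplicate the row vertically
    return out

art = [[0, 1],
       [1, 0]]
print(upscale_2x_nearest(art))  # [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
```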

Publication: http://www.jcgt.org/published/0010/02/04/
Article: http://www.jcgt.org/published/0010/02/04/paper.pdf

#CV #superresolution #upscale
Habitat 2.0: Training home assistant robots with faster simulation and new benchmarks

Facebook released a new simulation platform for training robots. Yes, virtual robots in a virtual environment, which can be a replica of a real space. This work brings us closer to the domestic use of assistive robots.

Project website: https://ai.facebook.com/blog/habitat-20-training-home-assistant-robots-with-faster-simulation-and-new-benchmarks
Paper: https://ai.facebook.com/research/publications/habitat-2.0-training-home-assistants-to-rearrange-their-habitat

#Facebook #DigitalTwin #VR #RL #assistiverobots
Cloud-Native MLOps Framework

In this video, Artem Koval, Big Data and Machine Learning Practice Lead at ClearScale, analyses the requirements for modern MLOps and the main trends: Human-Centered AI, Fairness, Explainability, Model Monitoring, and Human-Augmented AI.

Link: https://youtu.be/K8s6dD7TPH4
FEDOT - AutoML framework for composite pipelines

FEDOT is an open-source framework for automated modeling and machine learning (AutoML). It builds custom modeling pipelines for different real-world processes in an automated way using an evolutionary approach. FEDOT supports classification (binary and multiclass), regression, clustering, and time-series prediction tasks, as well as different data types and multimodal cases. It also implements sensitivity analysis of pipelines, custom pipeline designs as initial assumptions for the optimization, domain-specific objective functions, and other interesting features.
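The evolutionary idea behind pipeline composition can be illustrated with a self-contained toy (this is not FEDOT's API; the operations and the fitness function are mock): candidate pipelines are mutated, scored, and the fittest survive.

```python
import random

# Toy evolutionary search over modelling pipelines (mock ops and scores).
OPS = ["scaling", "pca", "logit", "rf", "knn"]

def fitness(pipeline):
    # Mock objective: reward pipelines that end with a model step,
    # and penalize length to prefer compact pipelines.
    score = 1.0 if pipeline and pipeline[-1] in {"logit", "rf", "knn"} else 0.0
    return score - 0.1 * len(pipeline)

def mutate(pipeline, rng):
    child = list(pipeline)
    if child and rng.random() < 0.5:
        child[rng.randrange(len(child))] = rng.choice(OPS)  # swap one step
    else:
        child.insert(rng.randrange(len(child) + 1), rng.choice(OPS))  # grow
    return child

def evolve(generations=30, pop_size=8, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(OPS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]                  # keep the fittest half
        pop = survivors + [mutate(rng.choice(survivors), rng) for _ in survivors]
    return max(pop, key=fitness)

best = evolve()
print(best)  # best pipeline found by the toy search
```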

Github: https://github.com/nccr-itmo/FEDOT

Preprint: https://arxiv.org/abs/2106.15397

Intro: https://www.youtube.com/watch?v=RjbuV6i6de4
Forwarded from Gradient Dude
Experimented with generating images from text prompts with VQGAN and CLIP. Some cool results:

1. "Minecraft Starcraft"
2. "Polygonal fast food"
3. "Holy war against capitalism"
4. "Modern cubist painting"

πŸ€™πŸΌ Colab notebook
Channel name was changed to «Data Science by ODS.ai 💉»
Thank you to all 44,666 of you for your support!
Under the Boot of Google and Facebook and How to Crack it for better Performance

In this video, Alex Farseev from SoMin.ai sheds light on the complex digital advertising ecosystem and shows techniques, such as long-tail targeting, that they use to crack ad performance.

Link: https://youtu.be/p7wT_4Lf3Ks
Forwarded from Silero News (Alexander)
New Language Classifier For 116 Languages

- 116 languages (83% accuracy), 77 language groups (87% accuracy)
- Mutually intelligible languages are united into language groups (e.g. Serbian + Croatian + Bosnian)
- Trained on approx. 20k hours of data (10k of which are for the 5 most popular languages)
- 1.7M params

Shortcomings

- Predictably, related and mutually intelligible languages are hard to tell apart
- The confusion matrix mostly makes sense, except for low resource languages and English
- English has the lowest accuracy
- The dataset needs some further curation (e.g. removing rarely spoken or artificial languages)
- The model should be made larger

Link

- https://github.com/snakers4/silero-vad
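A toy illustration (mock numbers, not Silero's) of why grouping mutually intelligible languages raises accuracy: confusions inside a group stop counting as errors once the group becomes the unit of evaluation.

```python
# confusion[true][predicted] = count; numbers are invented for illustration.
confusion = {
    "sr": {"sr": 60, "hr": 25, "bs": 10, "en": 5},
    "hr": {"sr": 20, "hr": 65, "bs": 10, "en": 5},
    "bs": {"sr": 15, "hr": 20, "bs": 60, "en": 5},
    "en": {"sr": 2,  "hr": 2,  "bs": 1,  "en": 95},
}
# Serbian, Croatian and Bosnian are merged into one group, as in the post.
group = {"sr": "bcs", "hr": "bcs", "bs": "bcs", "en": "en"}

def accuracy(conf, mapping=None):
    mapping = mapping or {lang: lang for lang in conf}
    correct = total = 0
    for true, row in conf.items():
        for pred, n in row.items():
            total += n
            if mapping[true] == mapping[pred]:
                correct += n
    return correct / total

print(accuracy(confusion))         # per-language accuracy: 0.7
print(accuracy(confusion, group))  # per-group accuracy: 0.95
```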
Automated Machine Learning Library

A simple but powerful Automated Machine Learning library for tabular data. It uses efficient in-memory SAP HANA algorithms to automate routine Data Science tasks, and beats the built-in solution in HANA, SAP's database. Written by two students as a diploma project.

Features:
• Easy to use Python interface
• Automates most Machine Learning steps
• Complete documentation
• Intuitive web client
• Supports Regression and Binary Classification tasks

Roadmap:
• Text classification
• Multiclass classification
• Forecasting
• Automate all ML steps
• Beat other libraries in accuracy
• More hyperparameter tuning methods


GitHub: https://github.com/dan0nchik/SAP-HANA-AutoML
Web app: https://share.streamlit.io/dan0nchik/sap-hana-automl/main/web.py
Docs: https://sap-hana-automl.readthedocs.io/en/latest/index.html#
Authors: @dan0nchik, @m_whiskas

#automl
Long-Short Transformer: Efficient Transformers for Language and Vision

This paper offers a new approach to solving the problem of quadratic time and memory complexities of self-attention in Transformers. The authors propose Long-Short Transformer (Transformer-LS), an efficient self-attention mechanism for modeling long sequences with linear complexity for both language and vision tasks. It aggregates a novel long-range attention with dynamic projection to model distant correlations and a short-term attention to capture fine-grained local correlations. A dual normalization is used to deal with the scale mismatch between the two attention mechanisms. Transformer-LS can be applied to both autoregressive and bidirectional models without additional complexity.
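To make the mechanism concrete, here is a minimal single-head NumPy sketch of the long/short split. It is simplified from the paper: the dynamic projection is replaced by a fixed random matrix, and dual normalization is omitted. The short branch costs O(n·w) and the long branch O(n·r), both linear in sequence length n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r, w = 16, 8, 4, 2  # seq length, model dim, projection rank, window radius

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
v = rng.normal(size=(n, d))

# Short branch: each position attends only to a +/- w local window.
idx = np.arange(n)
local_mask = np.abs(idx[:, None] - idx[None, :]) <= w
short_scores = (q @ k.T) / np.sqrt(d)
short_scores = np.where(local_mask, short_scores, -np.inf)
short_out = softmax(short_scores) @ v

# Long branch: project the n keys/values down to r "summary" positions
# (standing in for the paper's dynamic projection; fixed random here).
P = softmax(rng.normal(size=(r, n)))   # r x n projection weights
k_low, v_low = P @ k, P @ v            # r x d compressed keys/values
long_out = softmax((q @ k_low.T) / np.sqrt(d)) @ v_low

out = short_out + long_out             # n x d combined attention output
print(out.shape)
```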

This method outperforms state-of-the-art models on multiple tasks in the language and vision domains. For instance, Transformer-LS achieves 0.97 test BPC on enwik8 using half the number of parameters of previous methods, while being faster and able to handle sequences 3× as long. On ImageNet, it obtains 84.1% Top-1 accuracy, while being more scalable on high-resolution images.

Paper: https://arxiv.org/abs/2107.02192

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-transformerls

#deeplearning #cv #nlp #transformer #attention
We are happy to announce the start of our first ODS Summer of Code #1

This is our first official Summer School, designed after the famous Google GSoC. ODS Summer School activities are hosted on the same platform as Data Fest: if you are already registered for Data Fest 2021, feel free to join the ODS SoC projects.

ODS Summer School is 100% online and lasts until September 3rd, with a finale at the ODS Course Fest. We will have an equator (midpoint) stream on August 6th.

At the ODS Summer of Code start we have 14 summer projects grouped into 3 tracks:
• Open Source: if you ever wanted to develop an open-source AutoML, or maybe finally get used to Catalyst, scikit-uplift, DeepPavlov or DVC
• Open Science: for those of you who want to have more ML in science
• ML4SG: to make the world a better place

We also have hardware-centric partners for the Summer School: a combo of Intel and SberCloud. For those of you who would like to know more about project development, visit their joint CloudCity track and register on aicloud to try out awesome tech for free.

We managed to get a prize pool of 1,000,000+ ₽. All the prize information is available on the SoC prize page.

There will be an onboarding session in the ODS Spatial Chat this Saturday, July 17th, at 1 PM. Feel free to join!

ODS Summer of Code #1: https://ods.ai/tracks/summer-of-code-2021
Register: https://ods.ai/events/datafest2021
SoC prize page: https://ods.ai/tracks/summer-of-code-2021/blocks/ae0e78c2-ae3e-468d-bc80-9346bab55465
ODS Spatial Chat: https://live.ods.ai/
JupyterLite is a JupyterLab distribution that runs entirely in the web browser, backed by in-browser language kernels.

Scientific, data science, and visualisation packages are supported.

Basically it means you can use Jupyter just by opening a new browser tab. Starting to learn Data Science has never been easier.

Read the intro [1] for the full feature list, or try it online [2].

#jupyterlab #jupyterlite
[1] https://blog.jupyter.org/jupyterlite-jupyter-%EF%B8%8F-webassembly-%EF%B8%8F-python-f6e2e41ab3fa

[2] https://jupyterlite.github.io/demo
Blender Bot 2.0: An open source chatbot that builds long-term memory and searches the internet

The bot is capable of holding a dialog and remembering the context of sequential questions.
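A toy sketch (not ParlAI / Blender Bot code) of the long-term memory idea: store short summaries of past turns and retrieve the most relevant one by word overlap when a new question arrives.

```python
memory = []  # list of textual summaries of past conversation turns

def remember(summary):
    memory.append(summary)

def recall(question):
    """Return the stored summary sharing the most words with the question."""
    words = set(question.lower().split())
    return max(memory,
               key=lambda m: len(words & set(m.lower().split())),
               default=None)

remember("user likes hiking in the Alps")
remember("user has a dog named Rex")
print(recall("does the user have a dog"))  # -> "user has a dog named Rex"
```

Real Blender Bot 2.0 uses learned summarization and neural retrieval rather than word overlap, but the store-then-retrieve loop is the same shape.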

Blogpost: https://ai.facebook.com/blog/blender-bot-2-an-open-source-chatbot-that-builds-long-term-memory-and-searches-the-internet
Github: https://github.com/facebookresearch/ParlAI
Paper 1: https://parl.ai/projects/sea
Paper 2: https://parl.ai/projects/msc

#chatbot #NLU #facebookai
Forwarded from Gradient Dude
OpenAI disbands its robotics research team. This is the same team that, for example, taught a robotic arm to solve a Rubik's cube using Reinforcement Learning. The decision was made because the company sees more promise in research areas where physical equipment is not required (except for servers, of course) and where a lot of data is already available. There are also economic reasons, since Software as a Service is a business with a much higher margin. Yes, the irony is that the non-profit organization OpenAI is thinking more and more about profit. This is understandable, because it takes a lot of money to create artificial general intelligence (AGI) that can learn all the tasks a person can do, and more.

It's no secret that research in the field of robotics is also a very costly activity that requires a lot of investment, so there are not many companies involved in it. Among the large and successful ones, only Boston Dynamics comes to mind, and it has already changed owners several times. Did you know that Google acquired Boston Dynamics in 2013? Google then scaled down its robotics research program, and in 2017 sold Boston Dynamics to the Japanese firm SoftBank. The adventures of Boston Dynamics did not end there: in December 2020 SoftBank resold 80% of the shares (a controlling stake) to the automaker Hyundai. This looks somewhat fishy, as if every company realizes after a few years that it is still difficult to make a profit from Boston Dynamics and sells it to the next patsy.

In any case, it is very interesting to observe which focus areas are chosen by the titans of AI research. But I'm a bit sad that robots are still lagging behind.
🔪 DiSECt: A Differentiable Simulation Engine for Autonomous Robotic Cutting

A differentiable simulator for robotic cutting, presented at #RSS2021. It achieves highly accurate predictions of knife forces, optimizes cutting actions, and more!

Project site: https://diff-cutting-sim.github.io
ArXiv: https://arxiv.org/abs/2105.12244
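The core idea of a differentiable simulator can be shown with a toy (this is not DiSECt's model): if the predicted knife force is differentiable with respect to a material parameter k, we can fit k to measured forces by gradient descent.

```python
# Mock measurements from a "real" knife; generated with k_true = 5.
depths   = [0.1, 0.2, 0.3, 0.4]   # cutting depths
observed = [0.5, 1.0, 1.5, 2.0]   # observed knife forces

def loss_and_grad(k):
    """Squared-error loss of the toy force model F = k * depth, and its
    analytic gradient d(loss)/dk (what autodiff provides in a real sim)."""
    loss = grad = 0.0
    for d, f in zip(depths, observed):
        err = k * d - f
        loss += err ** 2
        grad += 2 * err * d
    return loss, grad

k = 0.0
for _ in range(200):
    _, grad = loss_and_grad(k)
    k -= 0.5 * grad                # gradient descent step
print(round(k, 3))                 # converges to ~5.0, recovering k_true
```

In DiSECt the "model" is a full spring-damper cutting simulation and the gradients flow through it, but the fit-by-gradient loop is the same.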