Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach the editors, contact: @haarrp
#NLP #News (by Sebastian Ruder):
* 2020 NLP wish lists
* #HuggingFace + #fastai
* #NeurIPS 2019
* #GPT2 things
* #ML Interviews

blog post: http://newsletter.ruder.io/archive/211277
Data Science by ODS.ai 🦜
YouTokenToMe, a new text tokenisation tool from the VK team. Meet a new, enhanced tokenisation tool on steroids. It works 7-10 times faster on alphabetic languages and 40-50 times faster on logographic languages than the alternatives. Under the hood (see the source)…
New rust tokenization library from #HuggingFace

Tokenization is the process of converting strings into model input tensors. The library provides BPE / Byte-Level BPE / WordPiece / SentencePiece tokenization and computes an exhaustive set of outputs (offset mappings, attention masks, special token masks).

The library has Python and Node.js bindings.

The quoted post above describes another fast #tokenization implementation. Looking forward to a speed comparison.

Install: pip install tokenizers
Github: https://github.com/huggingface/tokenizers/tree/master/tokenizers
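A minimal usage sketch of the Python bindings (assuming a recent version of the library; the checkpoint name below is just an example):

```python
# Sketch only: load a pretrained tokenizer from the Hub and inspect the outputs
# the library computes (tokens, ids, offsets, masks).
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("bert-base-uncased")  # example checkpoint
enc = tok.encode("Tokenization converts strings into model input tensors.")

print(enc.tokens)               # WordPiece tokens
print(enc.ids)                  # vocabulary ids
print(enc.offsets)              # character offsets back into the original string
print(enc.attention_mask)       # attention mask
print(enc.special_tokens_mask)  # marks [CLS]/[SEP] and friends
```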

#NLU #NLP #Transformers #Rust #NotOnlyPython
​​overview of current #trends & #problems in #NLP
by #huggingface

link to presentation: here
the latest news from Hugging Face 🤗

[0] Helsinki-NLP

With v2.9.1, 1,008 machine translation models were released, covering 140 different languages and trained with marian-nmt

link to models: https://huggingface.co/models?search=Helsinki-NLP%2Fopus-mt
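A minimal sketch of loading one of these pairs with transformers (the en-de checkpoint is just one example of the released models):

```python
# Sketch: translate a sentence with one of the Helsinki-NLP/opus-mt checkpoints.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"  # example language pair
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["Machine translation models are now easy to share."],
                  return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```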


[1] updated colab notebook with the new Trainer

colab: https://t.co/nGQxwqwwZu?amp=1


[2] nlp – a library to easily share & load data/metrics, already providing access to 99+ datasets!

features
– get them all: built-in interoperability with pytorch, tensorflow, pandas, numpy
– simple transparent pythonic API
– thrive on large datasets: nlp frees you from RAM limits
– smart cache: process once, reuse forever
– add your own dataset

colab: https://t.co/37pfogRWIZ?amp=1
github: https://github.com/huggingface/nlp
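A quick sketch of the API described above (the library was later renamed to datasets; with the newer package, replace the nlp import accordingly):

```python
# Sketch: load a dataset and its metric; data lives in an on-disk cache, not in RAM.
import nlp

squad = nlp.load_dataset("squad", split="train")   # one of the 99+ datasets
metric = nlp.load_metric("squad")                   # matching evaluation metric

print(squad[0]["question"])                         # plain pythonic access
squad.set_format(type="pandas")                     # or "torch" / "tensorflow" / "numpy"
print(squad[:3])                                    # now returns a pandas DataFrame
```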


#nlp #huggingface #helsinki #marian #trainer #data #metrics
The Reformer – Pushing the limits of language modeling
Patrick von Platen @ huggingface

The Reformer model was introduced by Kitaev, Kaiser et al. ('20); it is one of the most memory-efficient transformer models for long sequence modeling as of today.

The goal of this blog post is to give an in-depth understanding of each of the following four Reformer features:
[0] reformer self-attention layer – how to efficiently implement self-attention without being restricted to a local context?
[1] chunked feed forward layers – how to get a better time-memory trade-off for large feed forward layers?
[2] reversible residual layers – how to drastically reduce memory consumption in training by a smart residual architecture?
[3] axial positional encodings – how to make positional encodings usable for extremely large input sequences?

This long blog post gives you a deep enough understanding of how the model works to set its configuration correctly.


blog post: https://huggingface.co/blog/reformer
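As a rough illustration (not from the blog post), here is how the four features map onto ReformerConfig knobs in transformers; the values are illustrative, not tuned settings:

```python
# Sketch: the four Reformer features expressed as configuration options.
from transformers import ReformerConfig, ReformerModel

config = ReformerConfig(
    attn_layers=["lsh", "local", "lsh", "local"],  # [0] LSH and local self-attention layers
    chunk_size_feed_forward=64,      # [1] process feed forward layers in chunks of 64 positions
    # [2] reversible residual layers are built into the architecture itself
    axial_pos_embds=True,            # [3] axial positional encodings
    axial_pos_shape=[64, 64],        # factorization of max_position_embeddings (64 * 64 = 4096)
    axial_pos_embds_dim=[64, 192],   # the two axial dims must sum to hidden_size
    hidden_size=256,
    max_position_embeddings=4096,
)
model = ReformerModel(config)        # randomly initialized, just to inspect the architecture
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```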

#nlp #reformer #huggingface #transformers
​​Perceiver IO: a scalable, fully-attentional model that works on any modality

#HuggingFace added to the transformers library a neural network capable of working on all kinds of modalities: text, images, audio, video, coordinates, etc.

Blog: https://huggingface.co/blog/perceiver
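A minimal sketch for the text modality (byte-level masked language modeling, following the checkpoint used in the blog post); other checkpoints of the same encoder handle images, audio and multimodal inputs via different pre- and postprocessors:

```python
# Sketch: byte-level Perceiver IO for language.
import torch
from transformers import PerceiverTokenizer, PerceiverForMaskedLM

tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")
model = PerceiverForMaskedLM.from_pretrained("deepmind/language-perceiver")

text = "This is an incomplete sentence where some words are missing."
encoding = tokenizer(text, padding="max_length", return_tensors="pt")  # raw UTF-8 bytes

with torch.no_grad():
    outputs = model(inputs=encoding.input_ids, attention_mask=encoding.attention_mask)
print(outputs.logits.shape)  # one prediction per input byte
```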
🦜 Hi!

We are the first Telegram Data Science channel.


The channel started as a collection of notable papers, news and releases shared with members of the Open Data Science (ODS) community. Over the years of simply keeping it going, we grew into an independent online media outlet supporting the principles of free and open access to information related to Data Science.


Ultimate Posts

* Where to start learning more about Data Science. https://github.com/open-data-science/ultimate_posts/tree/master/where_to_start
* @opendatascience channel audience research. https://github.com/open-data-science/ods_channel_stats_eda


Open Data Science

ODS.ai is an international community of people related to Data Science in one way or another.

Website: https://ods.ai



Hashtags

Over the years we have accumulated a large collection of materials, most of them accompanied by hashtags.

#deeplearning #DL – posts about deep neural networks (> 1 layer)
#cv – posts related to Computer Vision. Pictures and videos
#nlp #nlu – Natural Language Processing and Natural Language Understanding. Texts and sequences
#audiolearning #speechrecognition – related to audio information processing
#ar – augmented reality related content
#rl – Reinforcement Learning (agents, bots and neural networks capable of playing games)
#gan #generation #generatinveart #neuralart – about neural art and image generation
#transformer #vqgan #vae #bert #clip #StyleGAN2 #Unet #resnet #keras #Pytorch #GPT3 #GPT2 – related to specific architectures or frameworks
#coding #CS – content related to the software engineering sphere
#OpenAI #microsoft #Github #DeepMind #Yandex #Google #Facebook #huggingface – hashtags related to certain companies
#productionml #sota #recommendation #embeddings #selfdriving #dataset #opensource #analytics #statistics #attention #machine #translation #visualization


Chats

- Data Science Chat https://t.me/datascience_chat
- ODS Slack, via the invite form on the website

ODS resources

* Main website: https://ods.ai
* ODS Community Telegram Channel (in Russian): @ods_ru
* ML trainings Telegram Channel: @mltrainings
* ODS Community Twitter: https://twitter.com/ods_ai

Feedback and Contacts

You are welcome to reach the administration through the Telegram bot: @opendatasciencebot
Data Science by ODS.ai 🦜
Some stats to get a perspective on the development of #dalle: «Used 1000 prompts in Dalle over the last 2 days, about 9 hours each day. Of those, saved ~300. 50 I like enough to share w/ socials. 12 enough to rework for future projects. 3 were perfect,…
Tips & Tricks on Image Generation

Generating images with AI tools is a skill that can be improved and honed. So here are a couple of articles covering tips & tricks on how to generate better images with #midjourney. The most interesting one is the #huggingface prompt generator, which uses an #NLP model to generate sample prompts.

As an example, we tried to reproduce and improve our group avatar following the ideas in these articles. The prompt for the illustration accompanying this post was generated from the query: ferrofluids in form of a brain, beautiful connections chaos, swirling black network --ar 3:4 --iw 9 --q 2 --s 1250

Midjourney Prompt Generator: https://huggingface.co/spaces/doevent/prompt-generator
List of Midjourney prompts: https://www.followchain.org/midjourney-prompts/
An advanced guide to writing prompts for Midjourney (text-to-image): https://medium.com/mlearning-ai/an-advanced-guide-to-writing-prompts-for-midjourney-text-to-image-aa12a1e33b6
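If you prefer to script it, a prompt generator of this kind is just a text-generation model. A hedged sketch follows; the checkpoint name is an assumption for illustration, and the linked Space may wrap a different model:

```python
# Sketch: expand a short idea into Midjourney-style prompts with a text-generation model.
# NOTE: the model id is an assumption, not necessarily the one behind the linked Space.
from transformers import pipeline

generator = pipeline("text-generation", model="succinctly/text2image-prompt-generator")
seed = "ferrofluids in form of a brain"
for candidate in generator(seed, max_length=60, num_return_sequences=3, do_sample=True):
    print(candidate["generated_text"])
```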

#visualization #gan #generation #generatinveart #aiart #artgentips
Forwarded from Machinelearning
✔️ Free and useful guides on model distillation:

1. Distillation guide from OpenAI 🖥

The guide contains a detailed description of the process of transferring knowledge from a larger model to a compact one while preserving high model performance.

Main aspects covered in the guide:
- Storing the large model's outputs: building a dataset of the large model's predictions, which will be used to train the smaller model.

- Evaluating model performance: a comparative analysis of the accuracy and efficiency of both the large and the compact model across different metrics.

- Creating training data for the compact model: using the large model's predictions to generate a training set that enables effective training of the smaller model.

- Evaluating the fine-tuned compact model: checking the performance and accuracy of the compact model after distillation to confirm it meets the requirements.

🔗 Link

2. Knowledge distillation tutorial from PyTorch 🔥

A PyTorch tutorial that gives a practical introduction to knowledge transfer for deploying models on devices with limited computational resources.

Main aspects of the tutorial:

- Extracting hidden representations: the guide shows how to obtain intermediate representations from a trained model for further use.

- Modifying training loops in PyTorch: it covers integrating additional functions into standard training loops for effective knowledge transfer (a minimal sketch of such a loop is given after this list).

- A worked example walks through training a compact model using the predictions of a more complex model as a reference.

The tutorial contains step-by-step instructions and code examples, which makes it a valuable resource if you want to learn how to optimize your models for resource-constrained environments.

▪️ Link

3. Jetson Introduction to Knowledge Distillation from Nvidia 🖥

This guide walks through transferring knowledge from an OpenCLIP model (a vision-language model) to a ResNet18 model for classification on the STL10 dataset.

Particular attention is paid to how the choice of data, the distillation method and the model architecture affect the final accuracy.

It also covers profiling and optimizing models for deployment on NVIDIA Jetson Orin Nano devices.

🔗 Link

4. Knowledge distillation tutorial from Keras ⭐️

It describes the concept of knowledge distillation in detail and its application to medical image processing.

🔗 Github
🔗 Keras tutorial

5. Distillation guide from huggingface 🤗

It shows how to perform knowledge distillation step by step on a concrete example.

🔗 Link

6. Knowledge distillation for computer vision tasks from huggingface 👍

It covers how to fine-tune a ViT model into MobileNet using the Trainer API from Transformers.

🔗 Link
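To make the common thread of these guides concrete, here is a minimal PyTorch sketch of soft-target knowledge distillation (the classic Hinton-style loss); it is an illustration, not code taken from any of the linked tutorials:

```python
# Sketch: distillation loss = KL on temperature-softened logits + CE on hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale gradients by T^2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

def train_step(student, teacher, batch, optimizer):
    inputs, labels = batch
    teacher.eval()
    with torch.no_grad():                         # the teacher is frozen
        teacher_logits = teacher(inputs)
    student_logits = student(inputs)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                              # only the student's weights are updated
    return loss.item()
```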

#KnowledgeDistillation #Distillation #openai #keras #tutorial #course #freecourses #huggingface #Nvidia #pytorch