Self Supervised Boy
Posting links to papers I read. Right now I'm mostly interested in things around LLMs, AI agents, and ML4Code. That is subject to change.

@martolod
Who are you? I'm a PhD student doing DL research, mostly on weak/self-supervision, and sometimes fully unsupervised things as well.
What happens here? I write short reviews of the papers I read.
Why the hell? Because it lets me practice writing and helps me understand the papers I read more deeply.
So what? I'll be happy if it turns out to be interesting to someone else. Anyway, here's my archive: https://www.notion.so/Self-Supervised-Boy-papers-reading-751aa85ffca948d28feacc45dc3cb0c0.
Self-training über alles. Another paper on self-training from Quoc Le's group.
They compare self-training with supervised and self-supervised pre-training on different tasks. Self-training seems to work better, while pre-training can even hurt final quality when enough labeled data is available or strong augmentation is applied.
The main practical takeaway: self-training adds quality even on top of pre-training. So it could be worthwhile to self-train your baseline models to get a better start.
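To make the recipe concrete, here is a minimal sketch of the pseudo-labeling loop as I understand it; the function names, the confidence threshold, and the equal loss weighting are my own choices, and the strong data augmentation the paper stresses is omitted for brevity:

```python
# Minimal self-training sketch (my simplification, not the authors' code):
# a teacher trained on labeled data pseudo-labels the unlabeled pool, then the
# (possibly pre-trained) student trains on labeled + pseudo-labeled batches.
import torch
import torch.nn.functional as F

def self_train(teacher, student, labeled_loader, unlabeled_loader, optimizer, epochs=10):
    # Pseudo-label the unlabeled images with the frozen teacher.
    teacher.eval()
    pseudo_batches = []
    with torch.no_grad():
        for x in unlabeled_loader:
            probs = F.softmax(teacher(x), dim=1)
            conf, y_hat = probs.max(dim=1)
            keep = conf > 0.5            # confidence threshold is a free hyperparameter
            if keep.any():
                pseudo_batches.append((x[keep], y_hat[keep]))

    # Train the student on real and pseudo labels together.
    student.train()
    for _ in range(epochs):
        for (x_l, y_l), (x_u, y_u) in zip(labeled_loader, pseudo_batches):
            loss = F.cross_entropy(student(x_l), y_l) + F.cross_entropy(student(x_u), y_u)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```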
More detailed with tables here: https://www.notion.so/Rethinking-Pre-training-and-Self-training-e00596e346fa4261af68db7409fbbde6
Source here: https://arxiv.org/pdf/2006.06882.pdf
Unsupervised segmentation with autoregressive models. The authors propose to scan the image in different scanning orders and require that nearby pixels produce similar embeddings regardless of the scanning order.
SoTA across unsupervised segmentation benchmarks.
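A crude sketch of the consistency idea only: the encoder, the fake "second order" via a horizontal flip, and the cosine loss below are my stand-ins; the paper's actual autoregressive orderings and losses are in the note below.

```python
# Illustration of "embeddings should agree across scanning orders" (my
# simplification; the paper builds the orders autoregressively, not by flipping).
import torch
import torch.nn.functional as F

def order_consistency_loss(encoder, images):
    # View A: the image as-is. View B: horizontally flipped, encoded, then
    # flipped back so that per-pixel embeddings are aligned with view A.
    emb_a = encoder(images)                                           # (B, C, H, W)
    emb_b = torch.flip(encoder(torch.flip(images, dims=[3])), dims=[3])

    # Per-pixel cosine similarity; maximizing it = minimizing (1 - cos).
    emb_a = F.normalize(emb_a, dim=1)
    emb_b = F.normalize(emb_b, dim=1)
    return (1.0 - (emb_a * emb_b).sum(dim=1)).mean()
```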
More detailed with images and losses here: https://www.notion.so/Autoregressive-Unsupervised-Image-Segmentation-211c6e8ec6174fe9929e53e5140e1024
Source here: https://arxiv.org/pdf/2007.08247.pdf
One more update on the teacher-student paradigm from Quoc Le's group.
Now the teacher is continuously updated to direct the student towards the optimum w.r.t. the labeled data. At each step, the update gradient for the teacher model is taken as the gradient towards the current pseudo-labels, and this gradient is scaled by the cosine similarity between two gradients of the student model: the one on the unlabeled data and the one on the labeled data.
This achieves a new SoTA on ImageNet (+1.6% top-1 accuracy).
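To illustrate the scaling factor from the summary above (my own simplification, not the paper's exact derivation): the scalar below measures how much the student's gradient on the pseudo-labeled batch agrees with its gradient on the labeled batch, and the teacher's gradient towards its current pseudo-labels is multiplied by it.

```python
# Illustration only: the agreement scalar that scales the teacher's gradient
# (my simplification of the summary above, not the paper's exact update).
import torch
import torch.nn.functional as F

def teacher_scale(student, x_unlabeled, y_pseudo, x_labeled, y_labeled):
    # Student gradient on the teacher's current pseudo-labels.
    g_u = torch.autograd.grad(F.cross_entropy(student(x_unlabeled), y_pseudo),
                              student.parameters())
    # Student gradient on the real labels.
    g_l = torch.autograd.grad(F.cross_entropy(student(x_labeled), y_labeled),
                              student.parameters())

    flat_u = torch.cat([g.flatten() for g in g_u])
    flat_l = torch.cat([g.flatten() for g in g_l])
    # Cosine similarity between the two flattened gradients.
    return F.cosine_similarity(flat_u, flat_l, dim=0)
```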

More detailed with formulas here: https://www.notion.so/Meta-Pseudo-Label-b83ac7b7086e47e1bef749bc3e8e2124
Source here: https://arxiv.org/pdf/2003.10580.pdf
An oral from ICLR 2021 on using the teacher-student setup for cross-domain transfer learning. The teacher is trained on the labeled data and produces pseudo-labels for the unlabeled data in the target domain. This allows the student to learn useful in-domain representations and gain 2.9% accuracy on one-shot learning with relatively low training effort.
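A hedged sketch of the distillation part of this setup (my naming and simplification; the paper's full training objective has more terms, see the note below):

```python
# Sketch of the cross-domain teacher-student loss (my simplification):
# the teacher, trained on labeled source data, soft-labels target-domain
# images, and the student is fit to those soft pseudo-labels.
import torch
import torch.nn.functional as F

def cross_domain_distill_loss(teacher, student, x_target):
    with torch.no_grad():
        soft_labels = F.softmax(teacher(x_target), dim=1)    # teacher pseudo-labels
    log_probs = F.log_softmax(student(x_target), dim=1)
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(log_probs, soft_labels, reduction="batchmean")
```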

With more fluff here: https://www.notion.so/Self-training-for-Few-shot-Transfer-Across-Extreme-Task-Differences-bfe820f60b4b474796fd0a5b6b6ad312
Source here: https://openreview.net/pdf?id=O3Y56aqpChA