Vol Building AGI

The definition of stabilize:


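import torch.nn.functional as F

def stabilize(f_, i_):
    "stabilize and activate forget and input gates"
    # running stabilizer state; tensors are assumed to be (batch, seq, dim)
    m = max_scan(f_, i_)
    # shift right by one step along the sequence axis; the first step sees m_prev = 0
    m_prev = F.pad(m[:, :-1, :], (0, 0, 1, 0))

    # subtract the stabilizer before activating so that exp cannot overflow
    i = (i_ - m).exp()
    f = (f_ + m_prev - m).sigmoid()
    return f, i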

max_scan is based on a Blelloch scan over the (max, +) semiring.
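
For reference, here is a minimal pure-Python sketch of what a (max, +) scan could compute. It assumes the stabilizer recurrence m_t = max(f_t + m_{t-1}, i_t) with an initial state m0 = 0; neither is stated above, so treat both as assumptions. The names `combine`, `max_scan_sequential` and `max_scan_associative` are hypothetical. This is not the actual max_scan: it runs left to right with `itertools.accumulate` instead of a Blelloch up-sweep/down-sweep, but the associative operator is the piece a parallel scan would reuse.

```python
from itertools import accumulate

def combine(left, right):
    """Compose two (max, +) affine maps m -> max(a + m, b); `left` is applied first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 + a2, max(a2 + b1, b2))

def max_scan_sequential(f, i, m0=0.0):
    """Reference loop for the assumed recurrence m_t = max(f_t + m_{t-1}, i_t)."""
    out, m = [], m0
    for f_t, i_t in zip(f, i):
        m = max(f_t + m, i_t)
        out.append(m)
    return out

def max_scan_associative(f, i, m0=0.0):
    """Same values, computed from prefix compositions of the associative operator."""
    prefixes = accumulate(zip(f, i), combine)
    return [max(a + m0, b) for a, b in prefixes]

f = [-0.5, -1.0, 0.2]
i = [1.0, -2.0, 3.0]
print(max_scan_sequential(f, i))
print(max_scan_associative(f, i))
```

Because `combine` is associative, the prefixes can be grouped in any order, which is what lets a tree-structured scan replace the sequential loop.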

Vol Building AGI

https://x.com/haoailab/status/1788269848788869299?

Diffusion consistency has been applied together with Jacobi decoding to get a 3x speedup over autoregressive loops. Consistency finetuning can be applied to existing autoregressive LMs to get efficient inference.

This might be more important than maintaining constant memory during autoregressive looping (working on RNNs).
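
A toy sketch of plain Jacobi decoding, to show the fixed point that the consistency finetuning targets. `next_token` is a hypothetical deterministic stand-in for the greedy argmax of a language model, and the function names are made up for illustration; nothing here is taken from the linked work.

```python
def next_token(prefix, vocab_size=7):
    # Hypothetical stand-in for greedy decoding: a deterministic function of the prefix.
    return (3 * sum(prefix) + len(prefix)) % vocab_size

def autoregressive_decode(prompt, n):
    out = list(prompt)
    for _ in range(n):
        out.append(next_token(out))
    return out[len(prompt):]

def jacobi_decode(prompt, n, init_token=0):
    guess = [init_token] * n
    steps = 0
    while True:
        # Every position is refreshed from the previous guess only, so with a
        # real model this whole update would be one batched forward pass.
        new_guess = [next_token(list(prompt) + guess[:k]) for k in range(n)]
        steps += 1
        if new_guess == guess:  # fixed point reached
            return guess, steps
        guess = new_guess

prompt, n = [1, 2], 5
tokens, steps = jacobi_decode(prompt, n)
print(tokens, steps)
```

The fixed point equals the greedy autoregressive output, and it is reached in at most n + 1 sweeps because position k is correct after sweep k + 1. Any speedup comes from converging in fewer sweeps than that, which is what training the model to jump to the fixed point is meant to achieve.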

X (formerly Twitter)

Hao AI Lab (@haoailab) on X

People often see LLMs as sequential decoders, but we show they can be easily adapted as fast parallel decoders!🔥🚀

Announcing consistency LLMs: teaching LLMs to predict the fixed point from any point on its Jacobi decoding trajectory
- LLM can fast forward…

Vol Building AGI

https://github.com/shashankvkt/DoRA_ICLR24

Pretraining on object tracking on 10 long-form videos beats ImageNet pretraining.

This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video" - shashankvkt/DoRA_ICLR24

Vol Building AGI

An insight from Shida Wang — nonlinearity of recurrence impacts universal approximation ability but not memory capacity. Exponential parametrization of the recurrence operator (like xLSTM) improves the optimization landscape.

https://github.com/radarFudan/Curse-of-memory

GitHub - radarFudan/Curse-of-memory: Curse-of-memory phenomenon of RNNs in sequence modelling

Vol Building AGI

New generative model in town: learning to do nothing

https://assafshocher.github.io/IGN/

Vol Building AGI

Kyunghyun Cho is building an AI drug design system. He is also known for the GRU and for contributions to machine translation.

He is currently working on protein sequence design in three parts: generative modeling over a database of sequences, property classification of the generated samples, and black-box optimization that finds sample sets on the Pareto frontier of multiple objectives induced by the property classifiers (translation people: think of Minimum Bayes Risk). Those sample sets are sent to the lab for validation. This loop describes the second and third pipeline steps on the photo. Eventually he wants to *backprop* through the whole loop.

Currently the forward pass takes more than 100 years, from discovering the role of the pancreas in diabetes to the approval of semaglutide.

Vol Building AGI

Research hint from Yann LeCun: figure out where transformer loss spikes come from. They don't usually happen in convnets. My thought: unlike Transformers, convnets do not have input-dependent weights.

Also: work on Q* with hierarchical time.

Vol Building AGI

My favorite AI paper from ICLR is OMNI, which is also my favorite NeurIPS workshop paper by Jenny Zhang et al.

Jenny builds on an idea that Juergen calls PowerPlay: you give your AI agent tasks that are
1. Learnable, by measuring learning progress as a fraction of successes. This can also be used to track forgetting.
2. Interesting, by a human notion of interestingness encoded into GPT-4.

(1) was already known how to do.
(2) has been done before with measures like novelty or artificial curiosity. But novel stuff is not always interesting! This is where you need GPT to get truly general agents: GPT has learned interestingness from Reddit experts and all scientific papers.
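
A rough sketch of criterion (1), assuming learning progress is estimated as the change in success rate between two adjacent windows of episodes on the same task. The class name, the window size and the windowing scheme are all hypothetical and are not taken from the OMNI paper.

```python
from collections import deque

class LearningProgress:
    """Success rate on one task, compared across two adjacent episode windows."""

    def __init__(self, window=2):
        self.window = window
        self.outcomes = deque(maxlen=2 * window)

    def record(self, success):
        self.outcomes.append(1.0 if success else 0.0)

    def progress(self):
        # Positive: the agent is improving. Negative: it is forgetting.
        if len(self.outcomes) < 2 * self.window:
            return 0.0  # not enough episodes yet
        history = list(self.outcomes)
        older = sum(history[:self.window]) / self.window
        recent = sum(history[self.window:]) / self.window
        return recent - older

tracker = LearningProgress(window=2)
for success in [False, False, True, True]:
    tracker.record(success)
print(tracker.progress())
```

Tasks whose progress stays near zero are either mastered or currently out of reach, so a curriculum would sample them less often.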

https://www.jennyzhangzt.com/omni/

Vol Building AGI

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9890294/pdf/CN-20-2267.pdf

Vol Building AGI

Reinforcement learning followed by mechanistic interpretability on mice modulated by ketamine

Vol Building AGI

Why and how to initialize neural networks? We would like them to undergo "nontrivial" updates when going through stochastic gradient descent — a regime that is also called "feature learning".

This regime asks for the norm of the hidden features *and of their updates* to scale like the square root of the feature width.

You can achieve these desiderata by scaling your initialization and learning rate to preserve certain properties of the spectral norm of your weights and weight updates.

By staying in the feature learning regime, you also get predictable hyperparameter transfer from small-scale development proxy models to large-scale expensive training runs — something you don't get from PyTorch defaults.
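
A small sketch of what such a scaling rule can look like for one dense layer. It assumes the rule as I recall it from the linked paper: Gaussian init std proportional to (1/sqrt(fan_in)) * min(1, sqrt(fan_out/fan_in)) and, for plain SGD, a per-layer learning rate proportional to fan_out/fan_in. Constants are dropped and the function names are hypothetical, so check the paper before relying on it.

```python
import math

def target_spectral_norm(fan_in, fan_out):
    # Scale the weight matrix (and its update) should have in spectral norm.
    return math.sqrt(fan_out / fan_in)

def spectral_init_std(fan_in, fan_out):
    # Std of i.i.d. Gaussian entries that puts the spectral norm of the
    # matrix on the order of sqrt(fan_out / fan_in) (assumed rule).
    return min(1.0, math.sqrt(fan_out / fan_in)) / math.sqrt(fan_in)

def spectral_lr_scale(fan_in, fan_out):
    # Multiplier on a base SGD learning rate tuned on a small proxy model
    # (assumed rule).
    return fan_out / fan_in

for fan_in, fan_out in [(256, 256), (4096, 4096), (4096, 1024)]:
    print(fan_in, fan_out,
          target_spectral_norm(fan_in, fan_out),
          spectral_init_std(fan_in, fan_out),
          spectral_lr_scale(fan_in, fan_out))
```

Because both quantities depend only on the layer shape, a base learning rate tuned at width 256 can be carried over to width 4096 without retuning, which is the transfer property described above.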

https://arxiv.org/abs/2310.17813

Vol Building AGI

Jan Leike authored his PhD thesis on Nonparametric General Reinforcement Learning showing that Hutter's incomputable but optimal AGI, AIXI, collapses under degenerate choices of priors and can't be optimally approximated using finite computation. He then shows that AIXI can be approximated epsilon-optimally and provides alternatives to asymptotically optimal learning in stochastic environments based on Thompson sampling.

https://jan.leike.name

Vol Building AGI

I'm finally citing Schlesinger!

Vol Building AGI

https://sites.google.com/view/ngsmworkshop

Call For Papers: ICML 2024 Workshop on Next Generation of Sequence Modeling Architectures

Submission Deadline: May 31, 2024 (Anywhere on Earth)
Acceptance Notification: June 17, 2024 (Anywhere on Earth)

I am happy to be serving as a reviewer for this workshop. Looking forward to learning new insights into sequence models from you.

Vol Building AGI

My best science joke so far

Vol Building AGI

tired: github streaks
wired: wandb streaks

Vol Building AGI

New challenge for signal recognition: the bandwidth has increased

https://content.neuralink.com/compression-challenge/README.html

Vol Building AGI

Automatic learning rate transfer across sizes is now easier to use: https://github.com/jxbz/modula/tree/main

GitHub - jxbz/modula: Scalable neural net training via automatic normalization in the modular norm.

Vol Building AGI

Manifest AI is working on linear transformers and context scaling. At the end of this article, the authors discuss what becomes possible when you push the current context size limits: at billions of tokens you won't need finetuning any more, because you'll be able to push your entire dataset into the context window.

Currently, open-source LMs are at thousands of tokens and industrial-grade LMs are at millions of tokens, so there's a lot of work left to push this frontier. In transformers we are simply concatenating token embeddings to the memory, and we will need some automatic compression to get past this.

https://manifestai.com/articles/compute-optimal-context-size/

Manifest AI - Compute-Optimal Context Size

Vol Building AGI

4o has become so good at math that it can analyze recurrences for me (and tolerate my typos)

Vol Building AGI

Doing symbolic differentiation with loops is a piece of cake, I don't have to explain what a "backwards pass" is: https://chatgpt.com/share/858b2882-9d29-442e-a0cb-7e3afb24abab
