Sparse Hash AI

Seeing the World through Your Eyes
https://world-from-eyes.github.io/

👍1

62 views16:07


Flow-matching implementation

Flow matching is very similar to diffusion but simpler. Noised images are linear interpolations between (data, noise) pairs, and the network predicts the *velocity* along this trajectory.
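A minimal sketch of the training target described above, in plain Python. The time convention (t=0 is data, t=1 is noise) and the helper names `flow_matching_pair` and `velocity_loss` are assumptions for illustration; the linked implementation may use a different convention or parameterization.

```python
import random

def flow_matching_pair(data, noise, t):
    """Return (x_t, target_velocity) for one (data, noise) pair at time t in [0, 1]."""
    # Noised sample: a point on the straight line from data (t=0) to noise (t=1).
    x_t = [(1.0 - t) * d + t * n for d, n in zip(data, noise)]
    # Velocity of that line, d x_t / dt = noise - data, which is constant in t.
    velocity = [n - d for d, n in zip(data, noise)]
    return x_t, velocity

def velocity_loss(predicted, target):
    """Mean squared error between the predicted and the target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)

random.seed(0)
data = [0.2, -1.0, 0.5]                          # stand-in for a flattened image
noise = [random.gauss(0.0, 1.0) for _ in data]   # Gaussian noise sample
t = random.random()                              # random time in [0, 1)
x_t, target = flow_matching_pair(data, noise, t)
# A network would take (x_t, t) and be trained to minimize
# velocity_loss(network(x_t, t), target).
```

Because the target does not depend on a noise schedule, the training loop reduces to sampling a pair, sampling t, and regressing onto `noise - data`.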

post

54 views13:02

Sparse Hash AI


NPC Character-1 by @hedra_labs

https://www.hedra.com/

64 viewsedited 12:08

Sparse Hash AI

Forwarded from То шо нейросети

Speeding up grokking by up to 50x
Grokfast is a technique for accelerating the onset of grokking. It amplifies the low-frequency components of the parameter gradients with an extra mechanism layered on top of whatever optimizer is in use.

Literally a couple of lines:

from grokfast import gradfilter_ma, gradfilter_ema


# Insert the following line before the training loop.
grads = None

# Between loss.backward() and optimizer.step(), insert one of the following lines. Make sure model is an nn.Module and grads is initialized before the training loop:
# ... in the optimization loop.
loss.backward() # Calculate the gradients.

### Option 1: Grokfast (has argument alpha, lamb)
grads = gradfilter_ema(model, grads=grads, alpha=alpha, lamb=lamb)
### Option 2: Grokfast-MA (has argument window_size, lamb)
# grads = gradfilter_ma(model, grads=grads, window_size=window_size, lamb=lamb)

optimizer.step() # Call the optimizer.
# ... logging & other codes.
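To make "amplifying the low-frequency components of the gradients" concrete, here is a rough sketch of an exponential-moving-average gradient filter on plain Python lists. The function name `ema_gradient_filter`, the default values, and the first-step initialization are assumptions for illustration; this is not the `grokfast` library code, which operates on the gradients of an `nn.Module`.

```python
def ema_gradient_filter(grad, ema, alpha=0.98, lamb=2.0):
    """Add an amplified slow-moving average of past gradients to the current one.

    grad: current gradient as a list of floats.
    ema:  running average from the previous step, or None on the first step.
    Returns (filtered_grad, new_ema).
    """
    if ema is None:
        # Assumption: start the running average at the first gradient.
        ema = list(grad)
    else:
        # Low-pass filter: exponential moving average of the gradient history.
        ema = [alpha * m + (1.0 - alpha) * g for m, g in zip(ema, grad)]
    # Amplify the slow component by adding it back, scaled by lamb.
    filtered = [g + lamb * m for g, m in zip(grad, ema)]
    return filtered, ema

# Toy gradient stream: a constant signal of 1.0 plus sign-flipping noise.
ema = None
for step in range(100):
    noise = 5.0 if step % 2 == 0 else -5.0
    filtered, ema = ema_gradient_filter([1.0 + noise], ema)
# The fast oscillation averages out of `ema`, while the constant part
# accumulates and is boosted in `filtered`.
```

The optimizer then steps on the filtered gradient instead of the raw one, which is why the filter call sits between `loss.backward()` and `optimizer.step()`.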

Github
Paper

@toshoseti

56 views12:51

Sparse Hash AI


👍2

57 views18:40

Sparse Hash AI

golf pattern 1 ( 214 chars )
https://www.shadertoy.com/view/XXtXWM

full code

void mainImage(out vec4 O, vec2 u)
{
    u = 12.*(u+u - (O.xy=iResolution.xy)) /O.y;
    float a,r;                   
    for(int i; i++<132;O[i%4] = .5-.5*cos(a+a) )
         r = dot(u,u),
         r>1. ? u *= r = 1./r : u,         
         u *= mat2(cos(10.107+vec4(0,33,11,0))) * 5.662,
         u.y += 1.62,            
         a = a*.99+r;       
}


61 viewsedited 20:12

Sparse Hash AI

❤1

64 views20:13

Sparse Hash AI


Luma AI

66 views18:56

Sparse Hash AI

The Remarkable Robustness of LLMs: Stages of Inference?
https://arxiv.org/abs/2406.19384

The paper studies how deleting and swapping layers affects the output quality of an LLM.

The authors find that the model separates by depth into functional blocks: "detokenization, feature engineering, prediction ensembling, and residual sharpening".

👍1

62 views16:13

Sparse Hash AI

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
https://arxiv.org/abs/2404.05405

LLMs can store 2 bits of knowledge per parameter, and no more.

To reach the capacity of 2 bits/parameter, each piece of knowledge must be visited 1000 times during training. This is called "1000-exposure" to distinguish it from the traditional "1000-pass" terminology, since a single pass over the data can expose a piece of knowledge 1000 times.
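As a back-of-the-envelope illustration of the 2 bits/parameter figure, the arithmetic below converts a parameter count into a knowledge budget. The 7-billion-parameter model size and the helper names are chosen only as an example.

```python
BITS_PER_PARAM = 2  # capacity figure quoted in the post

def knowledge_capacity_bits(num_params, bits_per_param=BITS_PER_PARAM):
    """Upper bound on stored knowledge, in bits, under the quoted scaling law."""
    return num_params * bits_per_param

def bits_to_gigabytes(bits):
    """Convert bits to gigabytes (1 GB = 1e9 bytes)."""
    return bits / 8 / 1e9

params = 7_000_000_000  # hypothetical model size, for illustration only
capacity = knowledge_capacity_bits(params)
print(capacity, "bits, about", bits_to_gigabytes(capacity), "GB")
```

Under this assumption a 7B-parameter model would hold at most 14 billion bits, or 1.75 GB, of knowledge, and only if each fact was exposed often enough during training.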

source

🤔1

86 viewsedited 20:41

Sparse Hash AI


It will grow up and remember what everyone did to it.

👍1

63 views15:59

Sparse Hash AI

During fine-tuning, pretrained LLMs can memorize data from a single exposure. This memorization shows up in the loss curve as characteristic steps.

Can LLMs learn from a single example?
https://www.fast.ai/posts/2023-09-04-learning-jumps/

X

68 views17:38

Sparse Hash AI


49 views19:33

Sparse Hash AI


Stephen Wolfram says the concept of a soul is a description of the computational essence of a mind, an abstraction that is independent of the details of the physical substrate on which it runs.

49 views19:45

Sparse Hash AI


Gen-3 Alpha

49 views13:39

Sparse Hash AI

Learning to (Learn at Test Time): RNNs with Expressive Hidden States
https://arxiv.org/abs/2407.04620

ttt-lm-jax: Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
https://github.com/test-time-training/ttt-lm-jax

ttt-lm-pytorch: Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
https://github.com/test-time-training/ttt-lm-pytorch

We have developed a new architecture that replaces the hidden state of an RNN with a machine learning model. This model compresses the context through actual gradient descent on the input tokens. We call our method "Test-Time-Training layers".

TTT layers directly replace attention and unlock a linear-complexity architecture with expressive memory, which lets us train LLMs with millions (and sometimes even billions) of tokens in context.
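The idea of a hidden state that is itself a model, updated by a gradient step per token, can be sketched in plain Python. This is a toy under loud assumptions: the hidden state is a single linear map `W`, the self-supervised loss is the reconstruction error ||W x - x||^2, and there are no learned projections. The names `ttt_step` and `ttt_layer` are hypothetical and do not come from the official repositories.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ttt_step(W, x, lr=0.1):
    """One gradient step on the toy loss ||W x - x||^2, then emit W_new x."""
    err = [p - xi for p, xi in zip(matvec(W, x), x)]
    # Gradient of the loss with respect to W is 2 * err * x^T.
    W_new = [[w - lr * 2.0 * e * xj for w, xj in zip(row, x)]
             for row, e in zip(W, err)]
    return W_new, matvec(W_new, x)

def ttt_layer(tokens, dim, lr=0.1):
    """Scan over tokens; the hidden state W has fixed size dim x dim."""
    W = [[0.0] * dim for _ in range(dim)]
    outputs = []
    for x in tokens:          # one update per token: linear in sequence length
        W, z = ttt_step(W, x, lr)
        outputs.append(z)
    return outputs, W

outputs, W = ttt_layer([[1.0, 0.0]] * 20, dim=2, lr=0.25)
# Repeated exposure to the same token drives the reconstruction toward it.
```

The cost per token depends only on the size of `W`, not on how many tokens came before, which is where the linear complexity in context length comes from.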

match or beat the strongest Transformers and Mamba

X X

69 views16:19

Sparse Hash AI


👍1

51 views20:09

Sparse Hash AI

Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
https://arxiv.org/abs/2212.10559

https://github.com/microsoft/LMOps/tree/main/understand_icl

In the paper, the researchers proved mathematically that in-context information has an effect analogous to gradient descent that updates the attention weights of the zero-shot prompt.

GPT first produces meta-gradients from the demonstration examples, and these meta-gradients are then applied to the original GPT to build the ICL model.

Experimental results show that in-context learning behaves similarly to explicit fine-tuning. The authors also designed a momentum-based attention, which improved performance.
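The correspondence between attention over demonstrations and a weight update is easiest to see for linear attention (softmax removed), which is a simplifying assumption here. In the sketch below, `W0` stands in for the zero-shot weights and each demonstration contributes an outer-product update; all names are illustrative and not taken from the paper's code.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def linear_attention(W0, demos, q):
    """Zero-shot output W0 q plus unnormalized linear attention over demos."""
    out = matvec(W0, q)
    for k, v in demos:
        score = sum(ki * qi for ki, qi in zip(k, q))   # k . q, no softmax
        out = [o + score * vi for o, vi in zip(out, v)]
    return out

def apply_meta_gradients(W0, demos):
    """Fold the demonstrations into the weights: W0 + sum_i v_i k_i^T."""
    W = [row[:] for row in W0]
    for k, v in demos:
        W = [[w + vi * kj for w, kj in zip(row, k)] for row, vi in zip(W, v)]
    return W

W0 = [[1.0, 0.0], [0.0, 1.0]]
demos = [([1.0, 2.0], [3.0, 4.0]), ([0.0, 1.0], [1.0, -1.0])]  # (key, value) pairs
q = [2.0, 1.0]

attended = linear_attention(W0, demos, q)
updated = matvec(apply_meta_gradients(W0, demos), q)
# Both routes give the same output for this query.
```

Attending to the demonstrations and applying the accumulated update to the weights are two views of the same computation in this linear setting; the softmax case is only analogous.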

57 viewsedited 17:14
