💡 Remember Box
10 subscribers
2.17K photos
29 videos
1.82K files
11.8K links
πŸ“ Interesting articles
πŸ—ž Ideas & TodoπŸ’‘
πŸ‘“ Random stuff
🎢 Music
πŸ€” Thoughts
πŸ“• Books
πŸ“š Courses
πŸ“Ί Videos
πŸ“ Papers
πŸ•Έ Websites/Blogs
πŸŽ™ Podcasts
πŸ„ Spirituality

In the pursuit of excellence!

The aim is to discover interesting ideas and perspectives.
πŸ„πŸ™πŸ¦’πŸœπŸ¦ πŸπŸŸπŸΊ
Learn finite field theory.
Learn quantum error correction.
Learn enough algebraic geometry to understand triality.
Exploring the mechanisms of action of probiotics in depression - https://doi.org/10.1016/j.jad.2025.01.153
An opinionated perspective on the solution is the last moat
Being able to build something and understanding the best way to solve a problem aren't remotely the same thing.
Find weak spots that exist because of coordination failures or attention scarcity, not technical difficulty.
The CUDA kernel space is crowded
The Metal/MLX kernel space is EMPTY
Everyone with a MacBook wants faster local inference
Almost nobody is doing serious Metal kernel optimization
#todo Implement a paged attention kernel for MLX (decode + prefill), then build a block manager (allocator + block tables + copy-on-write), wrap both into a paged model that swaps out the mlx-lm attention, and finally integrate it as an SGLang MLX runner.
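
A rough sketch of the block manager piece from the todo (allocator + block tables + copy-on-write), to pin down the bookkeeping before any kernel work. Everything here is an assumption for illustration: `BlockManager`, `append_token`, `fork`, and the rest are hypothetical names, not mlx-lm or sglang APIs, and it only tracks block ids. The actual KV tensors and the block copy would live elsewhere.

```python
from typing import Dict, List, Optional, Tuple


class OutOfBlocks(RuntimeError):
    """Raised when no free KV-cache block is left."""


class BlockManager:
    """Hypothetical bookkeeping for a paged KV cache (ids only, no tensors)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        # Free list; pop() hands out block 0 first, then 1, 2, ...
        self.free: List[int] = list(range(num_blocks - 1, -1, -1))
        self.refcount: List[int] = [0] * num_blocks
        self.tables: Dict[int, List[int]] = {}  # seq_id -> physical block ids
        self.lengths: Dict[int, int] = {}       # seq_id -> tokens stored

    def _alloc(self) -> int:
        if not self.free:
            raise OutOfBlocks("no free KV blocks")
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def _release(self, block: int) -> None:
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            self.free.append(block)

    def add_sequence(self, seq_id: int) -> None:
        self.tables[seq_id] = []
        self.lengths[seq_id] = 0

    def append_token(self, seq_id: int) -> Tuple[int, int, Optional[int]]:
        """Reserve a slot for one token.

        Returns (block, offset, copy_from). If copy_from is not None, the
        caller must copy that block's contents into `block` before writing.
        """
        table = self.tables[seq_id]
        offset = self.lengths[seq_id] % self.block_size
        copy_from = None
        if offset == 0:
            # Last block is full (or there is none yet): take a fresh one.
            table.append(self._alloc())
        elif self.refcount[table[-1]] > 1:
            # Copy-on-write: the partially filled last block is shared.
            copy_from = table[-1]
            new_block = self._alloc()  # allocate first so a failure changes nothing
            self._release(copy_from)
            table[-1] = new_block
        self.lengths[seq_id] += 1
        return table[-1], offset, copy_from

    def fork(self, parent: int, child: int) -> None:
        """Share all of the parent's blocks with a new child sequence."""
        for block in self.tables[parent]:
            self.refcount[block] += 1
        self.tables[child] = list(self.tables[parent])
        self.lengths[child] = self.lengths[parent]

    def free_sequence(self, seq_id: int) -> None:
        for block in self.tables.pop(seq_id):
            self._release(block)
        del self.lengths[seq_id]
```

Forking only bumps refcounts, so shared prefixes cost nothing until one of the sequences writes into a shared, partially filled block.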

Paged attention is the core primitive that makes SGLang useful; without it, SGLang on MLX is just regular inference.
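
For reference, a minimal pure-Python sketch of what the decode side of paged attention computes: one query attends over keys and values that are scattered across physical blocks and located through a block table. This is only a correctness reference under simplifying assumptions (single head, plain lists, a made-up function name and cache layout), not an MLX or Metal kernel and not how sglang lays out its cache.

```python
import math
from typing import List

Vector = List[float]


def paged_attention_decode(
    q: Vector,
    k_cache: List[List[Vector]],  # [physical_block][slot] -> key vector
    v_cache: List[List[Vector]],  # [physical_block][slot] -> value vector
    block_table: List[int],       # logical block index -> physical block id
    seq_len: int,
    block_size: int,
) -> Vector:
    """Single-head attention for one query over a paged KV cache."""
    scale = 1.0 / math.sqrt(len(q))
    scores: List[float] = []
    values: List[Vector] = []
    for pos in range(seq_len):
        # Translate the logical token position into (physical block, slot).
        block = block_table[pos // block_size]
        slot = pos % block_size
        k = k_cache[block][slot]
        scores.append(scale * sum(a * b for a, b in zip(q, k)))
        values.append(v_cache[block][slot])

    # Numerically stable softmax over the scores.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)

    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += (w / total) * x
    return out
```

The only difference from ordinary attention is the address translation through `block_table`, so a real kernel can be checked against this function on small inputs.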