Henok | Neural Nets
100k is a lotttt.
😁8🤔1
Hasab AI 🔥

Here is a great practical use case of ML in Ethiopia. The inference is really optimized 🔥.

Take a look at the demo.

https://www.hasab.ai/
🔥28
Why has the field of robotics moved so slowly over the past 15 years?

Boston Dynamics is the only one I can think of.
💯6
Let's have fun😁
🤣27😁7
take it easy bro, where is the fun :)
😁3
My favorite open source model series. The Llama 4 herd is out

https://ai.meta.com/blog/llama-4-multimodal-intelligence/
🔥11
What are some *practical* ways to reduce input tokens (e.g. chunking, summarizing, selective filtering)? Just looking for something that has worked well.
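For instance, here is a minimal sketch of the selective-filtering idea: chunk the document, score each chunk against the query, and only put the top few chunks into the prompt. Everything here is illustrative; the word-overlap scoring is just a stand-in for whatever embedding model or retriever you actually use.

```python
# Selective filtering sketch: keep only the chunks most relevant to the query
# so the prompt contains a fraction of the original document's tokens.
def top_chunks(document: str, query: str, chunk_size: int = 400, keep: int = 3) -> list[str]:
    # Naive fixed-size chunking by characters (a real pipeline would split on
    # sentences or paragraphs instead).
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    query_words = set(query.lower().split())
    # Score each chunk by how many query words it contains (stand-in for
    # embedding similarity), then keep the highest-scoring ones.
    scored = [(len(query_words & set(chunk.lower().split())), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:keep]]


# Toy usage: only the filtered chunks go into the prompt.
document = "..." * 1000  # imagine a long report here
context = "\n---\n".join(top_chunks(document, "quarterly revenue growth"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: how did revenue grow?"
```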
Forwarded from Debugging Epohul (epohul)
I'm craving a PhD
❤‍🔥3
I'm not signing a doc to access a so-called "open source dataset". It's also very clear that I won't be able to develop a model with such a small dataset, let alone use it for commercial purposes.
😁9😭2🥱1
Where is EthioNLP? It's probably one of the leading initiatives in Ethiopian NLP, with open-source datasets and models, and the best NLP community in Ethiopia. They have listed Ghana NLP but not EthioNLP 😂 this is hilarious.

https://www.gsma.com/solutions-and-impact/connectivity-for-good/mobile-for-development/wp-content/uploads/2025/04/AI-in-Ethiopia.pdf
🤣5😢2
Oh wow, DeepSeek is getting some pushback. It's probably political.
🤔4
Gemma-3-27b-it Parameter Breakdown

| Component    | Parameters     | Percent |
|:-------------|---------------:|--------:|
| Feed-Forward | 21,770,514,288 | 79.36%  |
| Attention    | 4,239,205,376  | 15.45%  |
| Embedding    | 1,415,027,328  | 5.16%   |
| Other        | 6,324,096      | 0.02%   |
| LayerNorm    | 1,335,552      | 0.00%   |

Total Trainable Parameters: 27,432,406,640 (27.4B🤯)

So, the model architecture:

- The vision tower has 27 SigLIP vision transformer layers, each with self-attention and MLP blocks. The vision part heavily influences the multimodal capabilities, combining visual context with linguistic understanding.

- The language model has 62 Gemma3DecoderLayers, each featuring self-attention with rotary embeddings, RMS normalizations, and large MLP blocks for textual modeling.

I'll write about each of these in depth, compare the model with others, and explain why it can run on a single GPU.
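In the meantime, here is a rough sketch of how a breakdown like the table above could be reproduced. It assumes the Hugging Face transformers (a version with Gemma 3 support) and accelerate libraries, plus access to the gated google/gemma-3-27b-it repo for the config; the keyword bucketing is approximate and only for illustration.

```python
from collections import defaultdict

from accelerate import init_empty_weights
from transformers import AutoConfig, Gemma3ForConditionalGeneration

# Instantiate the model on the "meta" device so no weights are downloaded;
# parameter shapes (and hence counts) are still available.
config = AutoConfig.from_pretrained("google/gemma-3-27b-it")
with init_empty_weights():
    model = Gemma3ForConditionalGeneration(config)

# Bucket parameters by component using crude name matching (approximate:
# e.g. the tiny q/k norms inside attention land in the LayerNorm bucket here).
buckets = defaultdict(int)
for name, param in model.named_parameters():
    if "norm" in name.lower():
        buckets["LayerNorm"] += param.numel()
    elif "mlp" in name:
        buckets["Feed-Forward"] += param.numel()
    elif "attn" in name or "attention" in name:
        buckets["Attention"] += param.numel()
    elif "embed" in name:
        buckets["Embedding"] += param.numel()
    else:
        buckets["Other"] += param.numel()

total = sum(buckets.values())
print(f"Total trainable parameters: {total:,}")
for component, count in sorted(buckets.items(), key=lambda kv: -kv[1]):
    print(f"{component:<13} {count:>17,} {count / total:7.2%}")
```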
🔥4👍1🤯1
Every time my model training is almost done
😁21