Henok | Neural Nets
I guess I can finally call myself a researcher now lol! It's always been a dream to present my work.
I've gotten many requests and questions about research and ML over the past few days, and today I want to put together a group to work on something. This could well be your first research project. To make the best of it, I'll take 5-6 people as core members, and in case we need more people we'll add some later.

If you have any interesting ideas, or if you're just curious about AI research, come join us.

The goal is to do some cool work and hopefully publish a paper.

I'll try to reply to every DM, and we'll see if you're a great match for this. ✌️
โค9๐Ÿ”ฅ8
To Code, or Not To Code? Exploring Impact of Code in Pre-training

So apparently adding some code data to your pretraining mix improves reasoning and non-code tasks 🤔. I'd seen this in a NeurIPS 2023 work led by Niklas Muennighoff, and now this paper goes into it in depth. My only concern is that they train 64 models ranging from 470M to 2.8B parameters, and it's not clear whether the findings carry over to larger models.

If you're having issues with Amharic LLMs, try adding some Python code data to the pretraining mix and see if it helps. I'll update you once I get the results.
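A minimal sketch of what that experiment could look like, assuming the pretraining corpus is built with the Hugging Face datasets library; the dataset names and the 90/10 split are placeholders I made up, not values from the paper.

# Hypothetical sketch: blend a small fraction of Python code into an Amharic
# text corpus before pretraining. Dataset names and the 90/10 ratio are
# placeholders, not numbers from the paper.
from datasets import load_dataset, interleave_datasets

amharic_text = load_dataset("your_amharic_corpus", split="train", streaming=True)     # placeholder
python_code = load_dataset("your_python_code_corpus", split="train", streaming=True)  # placeholder

# Sample roughly 90% Amharic text and 10% code, example by example.
mixed = interleave_datasets(
    [amharic_text, python_code],
    probabilities=[0.9, 0.1],
    seed=42,
)

The mixed stream then feeds the usual tokenization and training loop; the code ratio is the knob to sweep.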
โคโ€๐Ÿ”ฅ7๐Ÿ˜1
buildspace is closing 😞
Programming is changing so fast... I'm trying VS Code Cursor + Sonnet 3.5 instead of GitHub Copilot again and I think it's now a net win. Just empirically, over the last few days most of my "programming" is now writing English (prompting and then reviewing and editing the generated diffs), and doing a bit of "half-coding" where you write the first chunk of the code you'd like, maybe comment it a bit so the LLM knows what the plan is, and then tab tab tab through completions. Sometimes you get a 100-line diff to your code that nails it, which could have taken 10+ minutes before.

I still don't think I got sufficiently used to all the features. It's a bit like learning to code all over again but I basically can't imagine going back to "unassisted" coding at this point, which was the only possibility just ~3 years ago.



Source: Karpathy
๐Ÿ‘15๐Ÿคฎ3โค2
This is why @baydis and @beka_cru and many of you couldn't make it to YC lol
Forwarded from Frectonz
My nixpkgs PR got merged after 2 weeks. I packaged my Ethiopian calendar TUI app, mekuteriya, for Nix.

nix shell nixpkgs#mekuteriya


I'm officially a NixOS package maintainer now.

https://github.com/NixOS/nixpkgs/pull/333690

🎉🎉🎉
Alright alright, what's good people
Guess what, he loves Ethiopian food too. Will grab lunch with him some day.
Forwarded from Chapi Dev Talks (Chapi M.)
I asked
how many r's in strawberry

in multiple AI models, and here are the results.

Only Gemini got it right.
Counting is still an open problem for most LLMs, if not all.
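For reference, the ground truth is trivial to verify at the character level; the snippet below is just that sanity check, not anything the models themselves do.

# Character-level ground truth for the question above.
word = "strawberry"
print(word.count("r"))  # -> 3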
๐Ÿ‘3
๐Ÿ“Why AI canโ€™t spell โ€˜strawberryโ€™.


This is a blog post from TechCrunch released just yesterday; several researchers give their take on why this happens.

From the blog: "As these memes about spelling 'strawberry' spill across the internet, OpenAI is working on a new AI product code-named Strawberry, which is supposed to be even more adept at reasoning..."

https://techcrunch.com/2024/08/27/why-ai-cant-spell-strawberry/
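To see what a GPT-style model actually receives instead of letters, here is a small sketch using the tiktoken library (assuming it's installed); tokenization is the commonly cited reason for these spelling and counting slips, and the exact split depends on the tokenizer.

# Inspect the subword tokens a GPT-style tokenizer produces for "strawberry".
# The model operates on these token IDs, not on individual characters.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4
tokens = enc.encode("strawberry")
print(tokens)                             # token IDs
print([enc.decode([t]) for t in tokens])  # the subword pieces the model sees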
Can Neural Networks Learn to Reason? by Samy Bengio

The successes of deep learning critically rely on the ability of neural networks to output meaningful predictions on unseen data – generalization. Yet despite its criticality, there remain fundamental open questions on how neural networks generalize. How much do neural networks rely on memorization – seeing highly similar training examples – and how much are they capable of human-intelligence styled reasoning – identifying abstract rules underlying the data?
https://youtu.be/lCSdC8b0MrY
🤯 100M Token Context Windows

I don't like hyping up AI, but this is some next-level stuff, and all of you devs are done.

While the commercial applications of these ultra-long context models are plenty, at Magic we are focused on the domain of software development.

Itโ€™s easy to imagine how much better code synthesis would be if models had all of your code, documentation, and libraries in context, including those not on the public internet.



https://magic.dev/blog/100m-token-context-windows
I didn't expect Jane Street to invest in things like this; I guess there's a shortage of OCaml devs.
๐Ÿ˜2
Forwarded from Samson Endale 🇪🇹
Just to be clear, I want us (human civilization) to have AGI no matter the cost, BUT my issue is the overhyping.

I hope we will have strong AI one day, but until that day comes, I'm gonna stay skeptical about this trend.
๐Ÿ‘2