Andrej Karpathy posted this:
LLM OS.
Specs:
• LLM: OpenAI GPT-4 Turbo 256 core (batch size) processor @ 20Hz (tok/s)
• RAM: 128Ktok
• Filesystem: Ada002
INTERESTING!!!🤔
#AI #AndrejKarpathy
@Dagmawi_Babi
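The spec reads as an analogy: the model acts as the processor, the context window as RAM, and an embedding index (Ada-002) as the filesystem. Here's a rough sketch of that loop in Python (my own illustration, not Karpathy's code; embed_fn and llm_fn are hypothetical stand-ins for the embedding model and the LLM):

```python
# Rough sketch of the "LLM OS" analogy (illustration only, not Karpathy's design).
# embed_fn and llm_fn are hypothetical callables for the embedding model and the LLM.
from dataclasses import dataclass, field

CONTEXT_LIMIT_TOKENS = 128_000  # "RAM": the model's context window


@dataclass
class VectorFS:
    """'Filesystem': long-term storage addressed by embedding similarity."""
    docs: list = field(default_factory=list)  # list of (embedding, text) pairs

    def write(self, text, embed_fn):
        self.docs.append((embed_fn(text), text))

    def read(self, query, embed_fn, k=3):
        q = embed_fn(query)
        scored = sorted(self.docs, key=lambda d: -dot(q, d[0]))
        return [text for _, text in scored[:k]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def step(query, fs, embed_fn, llm_fn, count_tokens=len):
    # One "CPU cycle": page the most relevant files into RAM, then run the LLM.
    pages = fs.read(query, embed_fn)            # most relevant first
    budget = CONTEXT_LIMIT_TOKENS - count_tokens(query)
    ram = []
    for page in pages:
        if count_tokens(page) <= budget:        # stop paging in once RAM is full
            ram.append(page)
            budget -= count_tokens(page)
    return llm_fn("\n".join(ram + [query]))     # emit output at ~20 tok/s
```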
Wake up babe, Andrej Karpathy just dropped a new video about LLMs
• https://youtu.be/zjkBMFhNj_g?si=Ep2yJ81MenrKG0RQ
#AndrejKarpathy #AI #Tutorials #YouTube
@Dagmawi_Babi
Intro to LLM by Andrej Karpathy.pdf
41.5 MB
Just watched his lecture and damn, it's so good. So much to think about. Here are his slides.
#ML #AI #AndrejKarpathy #LLM
@Dagmawi_Babi
Good thought from Andrej Karpathy
"On shortification of learning
There are a lot of videos on YouTube/TikTok etc. that give the appearance of education, but if you look closely they are really just entertainment. This is very convenient for everyone involved : the people watching enjoy thinking they are learning (but actually they are just having fun). The people creating this content also enjoy it because fun has a much larger audience, fame and revenue. But as far as learning goes, this is a trap. This content is an epsilon away from watching the Bachelorette. It's like snacking on those "Garden Veggie Straws", which feel like you're eating healthy vegetables until you look at the ingredients.
Learning is not supposed to be fun. It doesn't have to be actively not fun either, but the primary feeling should be that of effort. It should look a lot less like that "10 minute full body" workout from your local digital media creator and a lot more like a serious session at the gym. You want the mental equivalent of sweating. It's not that the quickie doesn't do anything, it's just that it is wildly suboptimal if you actually care to learn.
I find it helpful to explicitly declare your intent up front as a sharp, binary variable in your mind. If you are consuming content: are you trying to be entertained or are you trying to learn? And if you are creating content: are you trying to entertain or are you trying to teach? You'll go down a different path in each case. Attempts to seek the stuff in between actually clamp to zero.
So for those who actually want to learn. Unless you are trying to learn something narrow and specific, close those tabs with quick blog posts. Close those tabs of "Learn XYZ in 10 minutes". Consider the opportunity cost of snacking and seek the meal - the textbooks, docs, papers, manuals, longform. Allocate a 4 hour window. Don't just read, take notes, re-read, re-phrase, process, manipulate, learn.
And for those actually trying to educate, please consider writing/recording longform, designed for someone to get "sweaty", especially in today's era of quantity over quality. Give someone a real workout. This is what I aspire to in my own educational work too. My audience will decrease. The ones that remain might not even like it. But at least we'll learn something."
#AndrejKarpathy #Tweets
@Dagmawi_Babi
"On shortification of learning
There are a lot of videos on YouTube/TikTok etc. that give the appearance of education, but if you look closely they are really just entertainment. This is very convenient for everyone involved : the people watching enjoy thinking they are learning (but actually they are just having fun). The people creating this content also enjoy it because fun has a much larger audience, fame and revenue. But as far as learning goes, this is a trap. This content is an epsilon away from watching the Bachelorette. It's like snacking on those "Garden Veggie Straws", which feel like you're eating healthy vegetables until you look at the ingredients.
Learning is not supposed to be fun. It doesn't have to be actively not fun either, but the primary feeling should be that of effort. It should look a lot less like that "10 minute full body" workout from your local digital media creator and a lot more like a serious session at the gym. You want the mental equivalent of sweating. It's not that the quickie doesn't do anything, it's just that it is wildly suboptimal if you actually care to learn.
I find it helpful to explicitly declare your intent up front as a sharp, binary variable in your mind. If you are consuming content: are you trying to be entertained or are you trying to learn? And if you are creating content: are you trying to entertain or are you trying to teach? You'll go down a different path in each case. Attempts to seek the stuff in between actually clamp to zero.
So for those who actually want to learn. Unless you are trying to learn something narrow and specific, close those tabs with quick blog posts. Close those tabs of "Learn XYZ in 10 minutes". Consider the opportunity cost of snacking and seek the meal - the textbooks, docs, papers, manuals, longform. Allocate a 4 hour window. Don't just read, take notes, re-read, re-phrase, process, manipulate, learn.
And for those actually trying to educate, please consider writing/recording longform, designed for someone to get "sweaty", especially in today's era of quantity over quality. Give someone a real workout. This is what I aspire to in my own educational work too. My audience will decrease. The ones that remain might not even like it. But at least we'll learn something."
#AndrejKarpathy #Tweets
@Dagmawi_Babi
Andrej Karpathy has left OpenAI
He left to focus on his own projects (mainly building his own Jarvis) which is soooo interesting. He's finally free to do whatever he wants. He's got the cash, the brains, the tools and resources, the connections. This should be epic!
And of course he's gonna keep making videos for us; he's already started on the next one.
#AndrejKarpathy #OpenAI
@Dagmawi_Babi
Babe wake up, Andrej just dropped a new video
Let's build the GPT Tokenizer
• https://www.youtube.com/watch?v=zduSFxRajkE
Or in my files channel
• https://t.me/c/1156511084/805
#YouTube #AndrejKarpathy #GPT
@Dagmawi_Babi
One of the greatest living AI scientists thinks C and Python are the absolute best languages. Hmm, something to think about.
#AndrejKarpathy
@Dagmawi_Babi
Babe wake up, Andrej dropped a video
• youtu.be/l8pRSuU81PU
Or in our files channel
• https://t.me/c/1156511084/935
4 hours long. 🤯
#YouTube #AndrejKarpathy #GPT2
@Dagmawi_Babi
From Andrej:
"The video ended up so long because it is... comprehensive: we start with empty file and end up with a GPT-2 (124M) model:
- first we build the GPT-2 network
- then we optimize it to train very fast
- then we set up the training run optimization and hyperparameters by referencing GPT-2 and GPT-3 papers
- then we bring up model evaluation, and
- then cross our fingers and go to sleep.
In the morning we look through the results and enjoy amusing model generations. Our "overnight" run even gets very close to the GPT-3 (124M) model.
This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.
The associated GitHub repo contains the full commit history so you can step through all of the code changes in the video, step by step.
• github.com/karpathy/build-nanogpt"
🔥🔥🔥🔥
#YouTube #AndrejKarpathy #GPT2
@Dagmawi_Babi
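For reference, the 124M model he mentions is the smallest GPT-2 configuration. A minimal sketch of those hyperparameters in the nanoGPT style (the field names follow that repo's conventions; this is my paraphrase, not code from the video):

```python
from dataclasses import dataclass

@dataclass
class GPT2SmallConfig:
    # GPT-2 (124M): the well-known "small" configuration
    block_size: int = 1024    # maximum sequence length (context window)
    vocab_size: int = 50257   # GPT-2 BPE vocabulary size
    n_layer: int = 12         # number of transformer blocks
    n_head: int = 12          # attention heads per block
    n_embd: int = 768         # embedding / hidden dimension

print(GPT2SmallConfig())
```

With those settings the parameter count lands at roughly 124M, which is where the model's name comes from.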
"The video ended up so long because it is... comprehensive: we start with empty file and end up with a GPT-2 (124M) model:
- first we build the GPT-2 network
- then we optimize it to train very fast
- then we set up the training run optimization and hyperparameters by referencing GPT-2 and GPT-3 papers
- then we bring up model evaluation, and
- then cross our fingers and go to sleep.
In the morning we look through the results and enjoy amusing model generations. Our "overnight" run even gets very close to the GPT-3 (124M) model.
This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.
The associated GitHub repo contains the full commit history so you can step through all of the code changes in the video, step by step.
• github.com/karpathy/build-nanogpt"
🔥🔥🔥🔥
#YouTube #AndrejKarpathy #GPT2
@Dagmawi_Babi