Jetson Nano
Jensen's presentations have been a little weird recently, but the new iteration of the Nano should be good.
Models to test are here; use Docker.
YouTube
Introducing NVIDIA Jetson Orin™ Nano Super: The World’s Most Affordable Generative AI Computer
The NVIDIA Jetson Orin™ Nano Super Developer Kit’s performance, compact size, and low cost are redefining generative AI for small edge devices.
At just $249, it provides developers, students, and builders with the most affordable and accessible platform…
RTX 5000*
The new year will start with a CES release of the next RTX line.
With rumors of 600W to power it up, home nuclear reactors are speeding up.
Laptop chargers will be used as foot warmers even more.
The price might also be hot, but 32GB of VRAM in a consumer card may be worth it.
TechRadar
Nvidia’s RTX 5090 now rumored to have superfast clock speeds – as well as being super-slim – could this GPU be too good to be true?
Does something not add up here? We’re hopeful something special is in the works, but wary of these rumors
Open Source Training
One of the good articles read over the NY holidays is by Colin Raffel.
He writes about training models the same way we work on software, with forks, merges, and pull requests.
Not sure that will work well. For example, adding a new capability to a model can significantly change the core logic of the project.
Plus, as we wait for new architectures, changes might become discontinuous quite fast.
That is all assuming people won't forget how to use git properly, with all the co-pilots for traditional software dev.
messestadt
o1 with DALL·E: not there yet for visual explanations, but at least it looks nice
Explainable Visualizations
Multimodal models will become really useful when they can explain things not with tons of text but with pictures and videos.
Anyway, everyone's brains are already adapted to shorts and reels. That applies to engineers and scientists too, in a way.
So a new approach seen recently is generating code for videos using the Manim library written by 3blue1brown.
He has generated videos like this or this with it for years.
There's a quick version to try.
The official repo says:
Note, there are two versions of manim. This repository began as a personal project by the author of 3Blue1Brown for the purpose of animating those videos, with video-specific code available here. In 2020 a group of developers forked it into what is now the community edition, with a goal of being more stable, better tested, quicker to respond to community contributions, and all around friendlier to get started with. See this page for more details.
I would use the community version (it also works with GPUs).
After all, the latest models like DeepSeek or Claude can generate quite efficient code if the problem to visualize is described well enough.
DeepSeek
A reminder that DeepSeek is an LLM without multimodal capabilities, while the frontier is somewhere around World Foundation Models.
Anyway, it is a very good model with fun Chinese cultural restrictions.
In reality, the rule for LLMs is the same: if you haven't been working in NLP for a long time, just focus on your specific things and don't worry much.
Those guys will have a bloodbath competition for some more time, maybe on NVIDIA GPUs for some years more (not guaranteed).
Meanwhile, the next hype release from other deep Chinese people might be cool.
In March comes the release from Deep Robotics' European partner, Inmotion Robotic.
Comments from Chinese colleagues about meeting more communists in Europe than in China are deeply relatable. Interesting how the local comintern will react to Chinese capitalists' robots starting in Frankfurt.
DeepSeek #2
An explanation of why the reaction from Jensen is so nice.
The corporate joke is that the company's strategy is mostly his thoughts and reasons.
Those thoughts and reasons are supposed to be the following:
the company should feel okay even if all NVIDIA GPU architectures become obsolete, like, tomorrow.
Projects like Omniverse, NIMs, DriveOS, and GR00T should sustain it and become the foundation of, and association with, the company.
GPU production is in a way a driver, and is expected to be gone at some point.
Losing 15% of the stock in a day doesn't feel nice in the moment anyway 😶🌫️
MSN
Nvidia calls China’s DeepSeek R1 model ‘an excellent AI advancement’
Nvidia called DeepSeek's R1 model "an excellent AI advancement," despite the Chinese startup's emergence causing the chip maker's stock price to plunge 17%.
AI pwns
LLMs for cybersecurity, summarised in the last sentence.
In reality, nobody knows what happens there except specialists. A lot of hype, though, surrounds Zero Trust, firms like Palo Alto, and products like Morpheus.
Clone Updates
The Clone guys are probably making these update videos incredibly creepy on purpose, to remind everyone about the uncanny valley.
It's very nice that they finally showed the water compressor inside the body, so it looks less like a fridge.
This year's release will be very cool; they'll probably show it in Poland first.
YouTube
Protoclone: Bipedal Musculoskeletal Android V1
The Protoclone is a faceless, anatomically accurate, synthetic human with over 200 degrees of freedom, over 1,000 Myofibers, and over 200 sensors.
www.clonerobotics.com
Prompt hints from Greg Brockman
Somewhere in a LinkedIn feed this seemed pretty useful. For specific tasks, this structure may indeed help produce a good response on a topic. The main thing, as usual, is to get the goal right.
1. Goal
Primary objective of the question. It tells the LLM what you want it to achieve.
2. Return Format
Defines how the information should be presented back. It’s a blueprint for the LLM response.
3. Warnings
Warnings (or constraints) tell the LLM what to watch out for, like guardrails.
4. Context Dump
Extra background that helps the LLM to adapt the response to specific references.
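The four parts above can be sketched as a trivial template; the function name and the example values are hypothetical, not anything from Brockman's post:

```python
# Assemble a prompt from the four sections: goal, return format,
# warnings, and context dump. Purely illustrative helper.
def build_prompt(goal: str, return_format: str, warnings: str, context: str) -> str:
    return "\n\n".join([
        f"Goal: {goal}",
        f"Return format: {return_format}",
        f"Warnings: {warnings}",
        f"Context: {context}",
    ])

prompt = build_prompt(
    goal="List the top 3 trade-offs of 4-bit LLM quantization.",
    return_format="A numbered list, one sentence per item.",
    warnings="Do not name specific vendors; flag anything uncertain.",
    context="The reader deploys models on a single consumer GPU.",
)
print(prompt)
```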
Another hint from hours of flow coding: when the model starts its responses with 'Ah, now I see..' or 'I apologize, let me..', it's a good sign to start a new thread all over..
GPU Rich
A piece of research Deloitte did 2 years ago resurfaced in a similar article today.
In short: China had an idea to redirect capital flows from real estate to building data centers. Now it has built so many that it hit oversupply even in AI boom times.
At the same time, DeepSeek is sometimes still lagging in throughput.
Deng Xiaoping would have questions about all these moves.
SemiAnalysis
Huawei AI CloudMatrix 384 – China’s Answer to Nvidia GB200 NVL72
Huawei is making waves with its new AI accelerator and rack scale architecture. Meet China’s newest and most powerful Chinese domestic solution, the CloudMatrix 384 built using the Ascend 910C. Thi…
Chinese chips
This update by Huawei is worth noting.
There's still no domestic fabric, no NCCL, and literally no CUDA, but there's also no crucial hindrance to them appearing, if they aren't created or under development already. Plus it will all be boosted by the Chinese ecosystem becoming bigger and more closed with recent political updates..
“The drawback here is that it takes 3.9x the power of a GB200 NVL72, with 2.3x worse power per FLOP, 1.8x worse power per TB/s memory bandwidth, and 1.1x worse power per TB HBM memory capacity.”
– Considering that there's almost no problem with domestic energy supply (or with having Russia nearby), the power issue is not an issue, and at scale they will easily come up with an equivalent of the H100-based clusters in the US.
All this will probably be reshaped by updates on nuclear reactors for data centers. However, that moment is still at least 3-4 years away.
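The quoted ratios also imply how much raw capability the rack buys with all that extra power; a quick division makes it explicit:

```python
# Back-of-the-envelope math on the SemiAnalysis figures quoted above:
# CloudMatrix draws 3.9x the power of a GB200 NVL72, at worse efficiency.
power_ratio = 3.9    # total power vs GB200 NVL72
per_flop = 2.3       # worse power per FLOP
per_bandwidth = 1.8  # worse power per TB/s memory bandwidth
per_hbm = 1.1        # worse power per TB of HBM capacity

flops = power_ratio / per_flop          # raw FLOPs of the system
bandwidth = power_ratio / per_bandwidth # aggregate memory bandwidth
hbm = power_ratio / per_hbm             # total HBM capacity

# -> roughly 1.7x the FLOPs, 2.2x the bandwidth, 3.5x the HBM of an NVL72
print(round(flops, 1), round(bandwidth, 1), round(hbm, 1))
```

So the design trades power efficiency for brute scale, which is exactly the point being made about energy supply.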
Claude_Sonnet_3.7_New.txt
61.9 KB
System prompts
Someone managed to extract a complete system prompt for Claude Sonnet. Useless, but good for understanding what drives those LLMs at the final control level 🥲
GPU Competition Updates
For better understanding: in recent years, HPC and personal GPUs have become two disconnected branches.
Both share common ground in technology and architecture, but I would consider them two parallel products.
Overall GPU world splits into this:
◾️Personal
◾️◾️Discrete (RTX x0xx, A6000 etc..)
◾️◾️Integral (Intel something)
◾️Data Center
◾️◾️Training (H100, B200 etc..)
◾️◾️Inference (A30, L40 etc..)
◾️◾️Others (TPUs, NPUs, BPUs)
There's technically also a split of personal into discrete and embedded ones, but let's be serious here.
So far, not much: a lot of texts about AMD pushing into both personal, with the RX 9060, and data centers, with the MI350.
Cerebras is trying even harder, but into DC only. Groq is also somewhere there, catching up.
This is for the US.
The EU, miserable with all its possibilities, has nothing, while the UK has Graphcore, which also tries.. Maybe ARM will do something, but with research and production processes in Europe, it's not earlier than 2030.
The topic is discussed back and forth for America and Europe but there's always..
China
So far the most discussed is the Ascend series by Huawei.
Ascends are linked into CloudMatrix, as discussed above. Being the logical equivalent of a DGX, CloudMatrix just has 4x the GPUs in a rack (8 in a DGX vs 32 in a CloudMatrix rack). This in theory increases power consumption, but who cares in China, when they can buy some gas from the neighbor and be happy.
One overlooked player is another Chinese firm, Biren.
Those guys have already created a chip comparable to the H100 in FLOPS and bandwidth, and they seem to have some support to compete with Huawei.
*Not being a guru in the internal tech policies of the CCP, but historically they seem to follow the logic of endorsing competition. Kai-Fu Lee in ‘AI Superpowers’ specifically describes a period of “mass innovation and mass entrepreneurship” in China.
Seems legit under the assumption that the central government is less communist than the Europeans at the moment. Though a reminder: even Stalin artificially created competition in industries (examples are the aero design bureaus, with designers like Mikoyan, Sukhoi, Ilyushin and others personally competing with each other for valuable perks instead of revenue).
This creates another assumption: that Biren (and maybe others) will bubble up to stay alongside Huawei in China's clouds.
Seeing Espressif enter the IoT market with its ESP32, crushing the competition in tiny boards, China could squeeze NVIDIA and AMD soon. It won't happen immediately, but it's more than realistic in the mid-to-long term..
AMD
AMD Radeon™ RX Graphics Cards
AMD Radeon™ RX 9000 Series graphics are built on AMD RDNA™ 4 architecture for ultra-fast performance and stunning visuals, perfect for gamers and streamers.
Compiler Explorer
It turns out NV worked with the Compiler Explorer maintainers for 3 years to implement real CUDA compilation directly on the web. Apparently there's even an article now praising how ‘indispensable’ CE is for people, which is true.
Although this tool is one of those hidden diamonds from the 2000s that rests on one old man in the woods, it seems to be well maintained and will hopefully be upgraded with new features and visualisations.
Kaolin
A nice workshop by some nice people.
From the latest CVPR.
Good to see Kaolin isn't dead, unlike some projects in the Omniverse orbit recently.
Long live 3D computer vision with Warp, 3DGUT, and other Gaussian magic.
RIP pure CUDA-accelerated voxels.
Successors are XCubes and this.
NVIDIA Developer
NVIDIA Kaolin Suite of Tools
Accelerate 3D deep learning research for neural fields, rendering, and more, with Kaolin Library, Kaolin Wisp, and Omniverse Kaolin app.
Red Hat Inference
A good talk by Red Hat's inference specialist on how to evaluate and benchmark hosted models.
It explains how to treat different types of LLM usage (RAG, agents, orchestration).
In between, it presents GuideLLM for evaluation and testing.
YouTube
Strategies for LLM Evals (GuideLLM, lm-eval-harness, OpenAI Evals Workshop) — Taylor Jordan Smith
Accuracy scores and leaderboard metrics look impressive—but production-grade AI requires evals that reflect real-world performance, reliability, and user happiness. Traditional benchmarks rarely help you understand how your LLM will perform when embedded…
Docker Agents
Docker Hub is preparing for the new MCP agent kung fu of the future. Here's a description of how it will look.
It promises to work with LangGraph, CrewAI (how tf they ended up on crewai.com instead of crew.ai is nonsense), Vercel, Spring AI, and many others.
"Now, with just a compose.yaml, you can define your open models, agents, and MCP-compatible tools, then spin up your full agentic stack with a simple docker compose up. From dev to production (more on this later), your agents are wired, connected, and ready to run." (c)
There's even an MCP Catalog already.
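Under the assumption that the final syntax matches Docker's announcement, such a compose.yaml might look roughly like this; the service names, model tag, and MCP server image are hypothetical examples:

```yaml
services:
  agent:
    image: my-agent-app        # your LangGraph/CrewAI app (placeholder)
    depends_on:
      - mcp-tools
    models:
      - llm                    # bind the open model defined below
  mcp-tools:
    image: mcp/duckduckgo      # an MCP-compatible tool server (placeholder)

models:
  llm:
    model: ai/qwen3            # an open model pulled from Docker Hub (placeholder)
```

Then a single `docker compose up` brings up the model, tools, and agent together.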
NVIDIA Lepton
One of the most controversial endeavors of NVIDIA management, discussed more than a year ago also with Deloitte teams, was the Lepton project. The idea is a meta-service covering all GPU providers, managed directly by NVIDIA. Of course, it caused a lot of indignation.
This move even brought consensus between major hyperscalers (AWS, Microsoft) and smaller resource providers (Nebius, CoreWeave): both were upset with the decision.
On the day of release, even the shares went down a bit. It seems the market works such that Docker can make something like this, but from NVIDIA it already looks too arrogant.
The latest known stage of the project is a silent rollback.
NVIDIA
NVIDIA DGX Cloud Lepton
Developers can now discover available GPUs in regions of choice and easily connect that compute to workloads for experimentation, fine-tuning, and scalable deployment.