AI & Robotics Lab
Analyzing Qwen2.5's Performance Issues Upon investigating poor performance of Qwen2.5 model, I discovered a persistent error pattern. Out of approximately 1,230 attempts to write the function, 1,036 (84%) failed due to the same mathematical mistake: using…
Failed Attempt to Use Feedback for Improving Performance
My attempts to use feedback proved unsuccessful. I tried two different approaches:
- a chat mode that accumulated detailed error descriptions whenever checks failed;
- a generation mode that appended the non-working code from the previous attempt to the initial prompt.
In both scenarios the models eventually got "stuck" on certain wrong answers (as the Qwen2.5 model did in the previous test), and the attempt count increased dramatically. I tested both the general-purpose Llama3.2 and the code-specific Codestral - the results were the same. While I could have tried something more sophisticated with the feedback, I decided not to pursue this path further for now.
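The second approach can be sketched as a simple retry loop. This is an illustrative outline only: generate() stands in for a real model call (e.g. to a local Ollama endpoint) and run_checks() for the actual test harness, both of which are placeholders here.

```python
def generate(prompt):
    """Placeholder for an LLM call; a real version would query the model."""
    return "def object_height(h_px, dist, f): return h_px * dist / f"

def run_checks(code):
    """Placeholder for executing the generated function against the tests.

    Returns (passed, error_description)."""
    return True, ""

def generate_with_feedback(base_prompt, max_attempts=10):
    prompt = base_prompt
    for _ in range(max_attempts):
        code = generate(prompt)
        ok, error = run_checks(code)
        if ok:
            return code
        # Feed the failing code and its error back into the next prompt --
        # in my tests, this is exactly where models tended to get "stuck".
        prompt = (f"{base_prompt}\n\nPrevious attempt failed:\n{code}\n"
                  f"Error: {error}\nFix the code.")
    return None  # gave up after max_attempts

result = generate_with_feedback("Write object_height(h_px, dist, f).")
```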
AI & Robotics Lab
Comparing Llama Models for Python Function Generation Testing two models, Llama 3.1 (8B) and Llama 3.2 (3.2B), on their ability to generate a Python function that computes the height of an object using its image size in pixels and the object’s distance. …
Keep a Cool Head - Tuning the Models' Temperature
The temperature parameter controls how diverse a model's outputs can be. Lower temperature values make the model more deterministic, causing it to focus on the most likely responses for better accuracy. I conducted experiments with this parameter on three selected models (Phi4, Codestral, and Codellama), which revealed some interesting patterns.
Each model was tasked with generating code for the object_height function 30 times to pass a specific test. A clear trend emerged across all models: a lower temperature of 0.2 consistently delivered the best performance in generating the test function's code. This finding was statistically validated using the Kruskal-Wallis test (alpha = 0.05), both with and without the results from this temperature group. The conclusion is clear - for code generation, it's best to keep the model's head cool.
The Phi4 model's performance at temperature 0.2 is particularly impressive - generating working code on the first attempt in 17 out of 30 trials (over 55%) - an outstanding result!
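For reference, the target function itself is essentially a similar-triangles one-liner. This is my own sketch of it; the exact signature and units used in the actual test may differ.

```python
def object_height(image_height_px, pixel_size_mm, distance_mm, focal_length_mm):
    """Pinhole-camera estimate of a real object's height.

    Similar triangles: real_height / distance = sensor_height / focal_length,
    where sensor_height = image_height_px * pixel_size_mm.
    """
    return image_height_px * pixel_size_mm * distance_mm / focal_length_mm

# An object imaged 1000 px tall at 0.005 mm/px, 2 m away, through a 50 mm lens:
print(object_height(1000, 0.005, 2000, 50))  # 200.0 mm
```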
The second insight comes from examining the default temperature values across different models, which we can infer from the graph. The general-purpose Phi4 operates with a default temperature around 0.5-0.7, allowing for more creative responses across various scenarios. Interestingly, the code-specific Codestral model has a default temperature of about 0.2 - a setting that aligns well with its specialized purpose. Perhaps surprisingly, the Codellama model runs with a higher default temperature of around 0.5-0.7, despite its code-focused nature.
These findings highlight that tuning the temperature parameter is a crucial step in optimizing code generation performance. The time invested in such experiments is clearly worthwhile, as it can significantly impact the model's effectiveness in generating correct code.
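The statistical check can be reproduced with a hand-rolled Kruskal-Wallis H statistic (assuming no tied values). The sample data below is made up for illustration - it is not my actual experiment results.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic for k independent groups (no ties assumed)."""
    pooled = sorted(x for g in groups for x in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # ranks 1..N
    n_total = len(pooled)
    # Sum over groups of (rank-sum squared / group size):
    s = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * s - 3.0 * (n_total + 1)

# Hypothetical attempts-to-success counts per temperature setting:
temp_02 = [1, 2, 3, 4, 6]       # few attempts needed at T = 0.2
temp_05 = [5, 8, 10, 12, 15]
temp_09 = [7, 9, 11, 13, 14]

h_stat = kruskal_wallis_h(temp_02, temp_05, temp_09)
# Chi-square critical value for df = k - 1 = 2 at alpha = 0.05 is 5.991:
print(h_stat > 5.991)  # do the temperature groups differ significantly?
```

In practice scipy.stats.kruskal does the same computation (with tie correction) and also returns the p-value directly.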
It looks like we're entering the sunset years of traditional software engineering 🤔
Forwarded from AI Post — Artificial Intelligence
GitHub Copilot: Using Your Workspace as Context
Recently I received an email saying that GitHub's AI code assistant Copilot is now free. I've tried a few different AI assistant extensions in VS Code - the last one was Tabnine AI, which is actually pretty good at creating docstrings and live code completion. Since it's always interesting to try something new, I decided to give Copilot a shot.
This extension has two main features - a chat in the side panel and live code completion suggestions, which are standard for these kinds of assistants. It uses two models: GPT-4 by default and Claude 3.5 Sonnet, which is pretty impressive for a free assistant.
My favorite Copilot feature is using your workspace as context for the model. Here's a PyQt utility project I'm working on that handles autofocusing for custom cameras. It runs multiple threads to manage UI interactions, camera motor movements, and image capture. The autofocus process involves several modules with complex data flow between them. If you need to refresh your understanding of the whole system, you can open the Copilot chat and ask questions about your entire workspace. What you get is a high-level description with clickable links to navigate through your project - it's really cool and super convenient. I haven't seen such a helpful feature in other assistants.
Let's try something else. When you ask Copilot to improve some code, it provides the updated code for a specific function in a particular module. Since you're in the project context, you can use the "Apply in Editor" option - Copilot automatically inserts the changes right into your code. You can then review the changes and decide whether to keep them.
So that's my quick look at Copilot. While it has the usual AI features, some extras make it stand out. Since it's free now, it's worth playing around with it and seeing how it fits your workflow. Thanks for listening! Let's go design!
What is the brave new world we are stepping into now? Everyone has heard about countless people losing their jobs because of AI: programmers, graphic designers, copywriters and many others. Another technical revolution is definitely unfolding before our eyes. Like previous ones, it brings possibilities we couldn't imagine before - almost everyone can now have access to an AI assistant that knows literally "everything" and eagerly answers any question. It's an incredible time for creative, curious and open-minded people. And as always, there is a dark side: the internet gave us instructions for printing weapons; AI can now suggest how to make them more dangerous and invisible to scanners... Technology is only a tool, but today it's more powerful than ever before.
Forwarded from Science in telegram
DIY Fusion: How to Build a Nuclear Reactor in Your Kitchen (with AI)
A guy managed to assemble a neutron fusion reactor in his kitchen, using AI as his consultant. 🔬
Technical Specs:
• 30kV/10mA Electrostatic Precipitator
• Vacuum at 3 mTorr (253,333 times lower than atmospheric pressure!)
• Bubble Detector for neutron counting
• Homemade Deuterium extracted from heavy water via electrolysis
The most impressive part? The entire deuterium production process cost just $112:
• $32 for a hydrocar PEM
• $80 for 50g of D₂O (heavy water)
From this, he managed to produce 56 liters of D₂ gas! 🧪
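The 56-liter figure checks out with basic stoichiometry: electrolysis yields one D₂ molecule per D₂O molecule, so at the ideal-gas molar volume:

```python
# Sanity check on the 56 L figure: one mole of D2 per mole of D2O electrolyzed.
molar_mass_d2o = 20.0            # g/mol (2*2 for deuterium + 16 for oxygen)
moles = 50.0 / molar_mass_d2o    # 2.5 mol of D2O -> 2.5 mol of D2
liters_at_stp = moles * 22.4     # ideal-gas molar volume at STP, L/mol
print(round(liters_at_stp, 1))   # 56.0
```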
How AI Helped:
The author heavily relied on Claude for:
• Process debugging
• Safety checks
• Following complex instructions
While this isn’t a commercial reactor, as a demonstration of AI-assisted DIY, it’s absolutely mind-blowing. 🔥
The Journey:
The build was live-streamed over 36 hours straight. Remarkably, just months earlier, the same individual assembled a plasma reactor. What’s even more fascinating? He didn’t have deep expertise in nuclear physics—he simply asked Claude the right questions. Independent study would have required thousands (if not tens of thousands) of hours.
The Bigger Picture:
As exciting as this is, it’s also a bit terrifying. If a hobbyist can pull this off in fusion, imagine the possibilities with biology. We might someday look back at bats with nostalgia. 🦇
AI-powered DIY is here, and it’s opening doors to both innovation and ethical challenges.
UnrollNow
Thread By @hud_zah - I built a nuclear fusor in my kitchen...
I built a nuclear fusor in my kitchen over 36 hours with AI help, achieving nuclear fusion for the first time. I used parts from eBay, electrolyzed heavy water...
ROS again...
My projects currently have a monolithic structure where different parts are tightly coupled. Development in this situation requires careful work within the app's context, considering all dependencies. As we've seen, AI struggles with context handling. So we need a different approach - app components need more isolation, something like a microservice architecture.
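The isolation idea can be sketched with a minimal in-process message bus - plain Python, no framework, illustrative names only. Components never hold references to each other; they only share topic names:

```python
from collections import defaultdict

class EventBus:
    """Minimal publish/subscribe hub: components stay decoupled,
    communicating only through named topics."""

    def __init__(self):
        self._handlers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._handlers[topic]:
            handler(message)

# Two "services" that know nothing about each other:
bus = EventBus()
captured = []
bus.subscribe("focus/position", captured.append)  # e.g. a UI component listens
bus.publish("focus/position", {"step": 12})       # e.g. the motor component emits
```

Scaling this pattern up - separate processes, typed messages, discovery, tooling - is precisely what ROS provides out of the box.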
While discussing my Focusing utility with Claude 3.5 Sonnet, it started suggesting a new structure: core and support services, dataflow with various message types (commands, answers, data packets), event-based communication... As the conversation progressed, I suddenly realized something familiar in all this - of course, The Robot Operating System (ROS)!
I actually worked with this powerful system a couple years ago on my pet robot project. And you know what? It seems to have everything we need:
1. Built-in Features:
├── Message passing system
├── Topic pub/sub
├── Service calls
├── Action servers
├── Parameter system
└── Launch system
2. Tools Available:
├── rqt (GUI development)
├── rviz (visualization)
├── rosbag (data recording/playback)
├── ros2 doctor (diagnostics)
└── built-in logging
Perfect! So, ROS again. Let's dive into design!
Blogspot
Controlling the Moveo robot in ROS, part 2 - inverse kinematics with IKPy, controlling the robot with a joystick in Rviz
Upgraded to ROS Noetic with python3 support 😁 and finally got back to work on the moveo_ps3 package for controlling the Moveo manipulator...
The cause is DeepSeek, a Chinese AI model that reportedly cost drastically less to train than its competitors:
• OpenAI: $6 billion
• Meta’s Llama: $60 million
• DeepSeek: $5.6 million
Forwarded from AI Post — Artificial Intelligence
ROS2 - Understanding services
ROS is built around nodes - independent programs that each handle a specific task in your system (data network). For example, one node might control motors, while another processes camera data.
These nodes need to communicate with each other, and ROS provides several communication methods. One key method is services, which enable request-response communication between nodes. Think of services like function calls between nodes - one node (the client) requests something, and another node (the server) processes that request and sends back a response.
For example, a camera node might provide a "capture_image" service that other nodes can call when they need a new image. This is different from continuous data streaming (which uses topics) because services are specifically for on-demand, request-response interactions.
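The request-response pattern can be sketched without ROS at all. Below is a plain-Python analogy of a service server and client - illustrative names only, not the real rclpy API:

```python
# A plain-Python analogy of a ROS2 service: the "server" registers a callback,
# and a "client" call blocks until it receives the response (or an error).

class ServiceRegistry:
    def __init__(self):
        self._servers = {}  # service name -> callback

    def create_service(self, name, callback):
        """Server side: register a request-handling callback."""
        self._servers[name] = callback

    def call(self, name, request):
        """Client side: synchronous call -- the caller waits for the response,
        and gets explicit failure feedback if the service is unavailable."""
        if name not in self._servers:
            raise RuntimeError(f"service '{name}' unavailable")
        return self._servers[name](request)

registry = ServiceRegistry()

# Camera node side: serve "capture_image" on demand.
registry.create_service(
    "capture_image",
    lambda req: {"ok": True, "resolution": req.get("resolution", "640x480")},
)

# Client node side: send a request and wait for the response.
response = registry.call("capture_image", {"resolution": "1920x1080"})
```

Real ROS2 services add typed .srv request/response definitions, network transport, and discovery on top of this same shape.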
Key Benefits of ROS2 Services:
├── Request-Response Pattern
│   ├── Services provide a synchronous request-response communication model
│   ├── The client sends a request and waits for a response from the server
│   └── This is ideal for on-demand, one-to-one interactions between nodes
├── Guaranteed Delivery
│   ├── Unlike topics, which use publish/subscribe, services ensure the request reaches the server and gets a response
│   ├── The client knows whether the service call succeeded or failed
│   └── Good for critical operations that need confirmation
└── Resource Efficiency
    ├── Services don't continuously stream data like topics
    ├── They only transmit when explicitly called
    └── More efficient for occasional/periodic interactions
Perfect for Specific Use Cases:
├── One-time configurations
├── Computing results on demand
├── Robot control commands that need acknowledgment
└── System queries that expect a response
Better Error Handling:
├── Services provide explicit success/failure feedback
├── The client can implement proper error handling
└── Helps build more robust systems
Synchronous Nature:
├── Blocking calls ensure operations complete in sequence
├── Critical for tasks that must happen in order
└── Provides clear control flow in the application
The key is choosing between services and topics based on specific needs:
- Use services for request-response patterns
- Use topics for continuous data streams
Services are optimal when you need:
❕Guaranteed delivery
❕Operation confirmation
❕Explicit error handling
❕Sequential execution
❕One-time or periodic interactions
#ROS #knowledge
Forwarded from AI Post — Artificial Intelligence
ROS2: Modular & Resilient Node Parameters
In ROS2, each node controls its own settings, creating a system that’s both modular and remarkably resilient. What’s more, ROS2 comes with a rich array of built-in methods for managing parameters, making your nodes even more flexible and robust:
• Decentralized Control:
Each node declares and manages its own parameters, creating a plug-and-play architecture that minimizes risk and enhances robustness.
• Dynamic Adjustments:
Built-in APIs let you update parameters on the fly - no need for restarts. Change callbacks ensure that live updates are smoothly integrated into your node's behavior.
• Extensive Built-In Methods:
ROS2 provides a variety of tools, from dynamic parameter handling to advanced introspection methods. This extensive toolkit offers everything needed—from automatic type checks to seamless integration of new configurations—making error handling and system adaptation straightforward.
• Intuitive Organization:
Use namespaced parameters, coupled with thorough inline documentation, to keep configurations neat and scalable. This clarity not only aids team collaboration but also simplifies long-term maintenance.
• Streamlined Configuration Management:
Centralized YAML or launch files allow you to manage settings consistently across development, testing, and production environments without fuss.
• Real-Time Monitoring:
Built-in introspection tools and logging capabilities keep you informed of parameter changes as they happen, ensuring that your system remains agile and well-monitored.
This node-centric approach, enhanced by an impressive suite of built-in methods, emphasizes modularity and resilience—empowering you to build dynamic, high-performance ROS2 applications with ease and confidence.
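A centralized configuration of the kind described above might look like this (node and parameter names here are hypothetical, but the ros__parameters layout is the standard ROS2 YAML convention):

```yaml
# params.yaml -- pass via: ros2 run <pkg> <node> --ros-args --params-file params.yaml
camera_node:
  ros__parameters:
    focal_length: 35.0        # mm
    exposure_ms: 12
    debug_logging: false
motor_node:
  ros__parameters:
    max_speed: 0.5            # rad/s
    home_on_startup: true
```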
#ROS #knowledge
Working with Parameters in ROS2: A Quick Example
1️⃣ Defining Parameters:
Using ParameterDescriptor, we define parameters like focal_length with details such as:
- Type: PARAMETER_DOUBLE
- Description: "Focal length in mm"
- Range: 1.0 to 100.0, step 0.1
2️⃣ Access & Modify Parameters:
- Use the CLI with commands like ros2 param get/set.
- Alternatively, use a GUI to adjust values in real time for better flexibility and usability.
3️⃣ Error Handling:
ROS2 comes with built-in methods to validate parameters based on their descriptors. For example, if you try setting focal_length to 40.44 (which doesn't match the step size of 0.1), you get a precise error:
"Setting parameter failed: The value is not close enough to a valid step."❗️This means ROS2 ensures parameters strictly follow the defined constraints—like ranges, step increments, and types—without requiring custom validation logic in your code.
#ROS #knowledge #example
AI Post — Artificial Intelligence
You don't necessarily need tanks and missiles to start a war. Sometimes it's just an AI model that cost $5M to train.
🤔 Why I'm not surprised:
https://semianalysis.com/2025/01/31/deepseek-debates/
Our analysis shows that the total server CapEx for DeepSeek is ~$1.6B, with a considerable cost of $944M associated with operating such clusters.
SemiAnalysis
DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts
The DeepSeek Narrative Takes the World by Storm DeepSeek took the world by storm. For the last week, DeepSeek has been the only topic that anyone in the world wants to talk about. As it currently s…