https://www.youtube.com/watch?v=tNmgmwEtoWE
"An extremely solid and convincing rebuttal. Sad. I wonder what the Devin team will say in response, if anything. Summarizing the video:
• Devin is sold as being able to solve arbitrary Upwork tasks. In the video demo the problem it was asked to solve doesn't match the stated requirements of the customer (who asked for setup instructions, not code).
• Devin is shown fixing errors in the source of a GitHub repo, but the files it's shown editing don't actually exist in that repo and some of the errors it's fixing are nonsensical, of the type that'd never be made by a human. Inference: Devin must be fixing bugs in files it has itself created, but that's not clearly indicated.
• There is no need to do any coding in the first place, because the README in the repository has all the instructions needed to achieve the task ready to go and they still work fine with only a one-line tweak, even though the repository is old. This is why the customer asked for instructions for how to run it on EC2 rather than for some coding. Devin didn't seem to read the README or understand that it only had to execute a couple of pre-existing Python scripts. The output in the video makes it look like the task was complex and sophisticated, with a long plan and many check boxes showing work completed, but the work was in fact pointless and redundant.
• Devin's code changes are bad, e.g. writing its own low-level file read loop instead of using the standard library properly.
• Although the video makes it look like Devin did the task quickly, and the video creator was able to do the requested task in ~30 minutes, the timestamps in the chat show the task stretching over many hours and even into the next day.
• Devin does nonsensical shell commands like
head -n 5 foo | tail -n 5
The strange mistakes lead to questions about what underlying model it's using. I don't think GPT-4 would make mistakes like that.
The Internet of Bugs guy is an AI fan and uses coding AI himself, but points out that the company behind it says you can "watch Devin get paid for doing work" which isn't actually supported by their video evidence when watched carefully."
- some HN user
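For anyone wondering why the head/tail command in that summary counts as nonsensical: head -n 5 already emits at most five lines, so asking tail for the last five of those returns the same lines unchanged. Here is a rough Python sketch of that reasoning. The head and tail helpers and the sample data are hypothetical stand-ins I made up for illustration; they are not the real coreutils and not anything taken from the video.

```python
from collections import deque
from itertools import islice


def head(lines, n):
    """Return the first n lines, roughly what `head -n N` does."""
    return list(islice(lines, n))


def tail(lines, n):
    """Return the last n lines, roughly what `tail -n N` does."""
    return list(deque(lines, maxlen=n))


# Hypothetical 20-line "file" standing in for foo.
sample = [f"line {i}" for i in range(1, 21)]

# head -n 5 foo | tail -n 5  gives the same lines as plain  head -n 5 foo
redundant = tail(head(sample, 5), 5)
plain = head(sample, 5)
print(redundant == plain)

# The pipe only earns its keep when tail asks for fewer lines than head,
# e.g. head -n 10 foo | tail -n 5 selects lines 6 through 10.
middle = tail(head(sample, 10), 5)
print(middle)
```

The same holds for a file shorter than five lines: head passes everything through and tail then keeps all of it, so the extra pipe stage never changes the output.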
YouTube
Debunking Devin: "First AI Software Engineer" Upwork lie exposed!
Recently, Devin the supposed "First AI Software Engineer" was announced. The company lied and said that their video showed Devin completing and getting paid for freelance jobs on Upwork, but it didn't show that at all.
UPDATE!!: The original job poster…