• theluddite@lemmy.ml · 1 year ago

    “I gave an LLM a wildly oversimplified version of a complex human task and it did pretty well”

    For how long will we be forced to endure different versions of the same article?

    The study said 86.66% of the generated software systems were “executed flawlessly.”

    Like I said yesterday in a post celebrating how ChatGPT can do medical questions: with less than 80% accuracy, that is trash. A company with absolute shit code still has virtually all of it “execute flawlessly.” Whether or not code executes is not the bar by which we judge it.

    Even if it were to hit 100%, which it does not, there’s so much more to making things than this obviously oversimplified simulation of a tech company. Real engineering involves getting people in a room, managing stakeholders, navigating their conflicting desires, getting to know the human beings who need a problem solved, and so on.

    LLMs are not capable of this kind of meaningful collaboration, despite all this hype.

    • merc@sh.itjust.works · 1 year ago

      80% accuracy, that is trash

      More than 80% of most codebases is boilerplate stuff: including the right files for dependencies, declaring functions with the right number of parameters using the right syntax, handling basic easily anticipated errors, etc. Sometimes there’s even more boilerplate, like when you’re iterating over a list, or waiting for input and handling it.
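
      To illustrate with a made-up example (mine, not the article’s): in even a trivial utility, almost everything is boilerplate and only a line or two is actual logic:

      ```python
      # Hypothetical example: almost all of this is scaffolding an LLM can
      # reproduce from thousands of near-identical training examples.
      import json
      import sys


      def load_records(path):
          """Read a JSON file, handling the easily anticipated errors."""
          try:
              with open(path) as f:
                  return json.load(f)
          except FileNotFoundError:
              print(f"error: no such file: {path}", file=sys.stderr)
              sys.exit(1)
          except json.JSONDecodeError as e:
              print(f"error: invalid JSON: {e}", file=sys.stderr)
              sys.exit(1)


      def main():
          if len(sys.argv) != 2:
              print("usage: totals.py FILE", file=sys.stderr)
              sys.exit(2)
          records = load_records(sys.argv[1])
          # The one line of actual business logic in the whole script:
          print(sum(r.get("amount", 0) for r in records))


      if __name__ == "__main__":
          main()
      ```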

      The rest of the stuff is why programming is a highly paid job. Even a junior developer is going to be much better than an LLM at that stuff, because they at least understand that it’s hard, and they often know when to ask for help because they’re in over their heads. An LLM will “confidently” just spew out plausible bullshit and declare the job done.

      Because an LLM won’t ask for help, won’t ask for clarifications, and can’t understand that it might have made a mistake, you’re going to need your highly paid programmers to go in and figure out what the LLM did and why it’s wrong.

      Even perfecting self-driving is going to be easier than a truly complex software engineering project. At least with self-driving, the problem is tightly constrained because you’re dealing with the real world, and the job is always the same – navigate from A to B. In the software world you’re only limited by the limits of math, and math isn’t very limiting.

      I have no doubt that LLMs and generative AI will change the job of being a software engineer / programmer. But, fundamentally programming comes down to actually understanding the problem, and while LLMs can pretend they understand things, they’re really just like well-trained parrots who know what sounds to make in specific situations, but with no actual understanding behind it.

    • thanks_shakey_snake@lemmy.ca · 1 year ago

      They did do management: they modeled the whole company as individual “staff” communicating with each other. CEO-bot communicates a product direction to the CTO-bot, who communicates technical requirements to the developer-bot, who asks for a “beautiful user interface” (lol) from the “art designer” (lol).

      It’s all super rudimentary and goofy, but management was definitely part of the experiment.
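
      For a sense of how that pattern works, here’s a minimal sketch (hypothetical code, not the paper’s implementation; `chat` stands in for whatever LLM API you’d actually call):

      ```python
      # Hypothetical sketch of the ChatDev-style role chain; `chat` is a
      # stand-in for a real LLM call, not an actual library function.
      def chat(role: str, message: str) -> str:
          # Replace with a real LLM client; this stub just echoes.
          return f"[{role}] reply to: {message[:60]}"


      def run_company(task: str) -> str:
          direction = chat("CEO: set product direction", task)
          spec = chat("CTO: turn direction into technical requirements", direction)
          code = chat("Programmer: implement the requirements", spec)
          ui = chat("Art designer: propose a beautiful user interface", code)
          return code + "\n" + ui


      print(run_company("design a basic Gomoku game"))
      ```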

    • KoboldCoterie@pawb.social · 1 year ago

      The study said 86.66% of the generated software systems were “executed flawlessly.”

      But…

      Nevertheless, the study isn’t perfect: Researchers identified limitations, such as errors and biases in the language models, that could cause issues in the creation of software. Still, the researchers said the findings “may potentially help junior programmers or engineers in the real world” down the line.

  • Knusper@feddit.de · 1 year ago

    the CTO responded with Python. In turn, the CEO said, “Great!” and explained that the programming language’s “simplicity and readability make it a popular choice for beginners and experienced developers alike.”

    Yep, that does sound like my CEO.

  • kitonthenet@kbin.social · 1 year ago

    At the designing stage, the CEO asked the CTO to “propose a concrete programming language” that would “satisfy the new user’s demand,” to which the CTO responded with Python. In turn, the CEO said, “Great!” and explained that the programming language’s “simplicity and readability make it a popular choice for beginners and experienced developers alike.”

    I find it extremely funny that project managers are the ones the chatbots have learned to imitate perfectly; they were already doing the robot’s work: saying impressive-sounding things that are actually borderline gibberish.

    • thanks_shakey_snake@lemmy.ca · 1 year ago

      What does it even mean for a programming language to “satisfy the new user’s demand?” Like when has the user ever cared whether your app is built in Python or Ruby or Common Lisp?

      It’s like “what notebook do I need to buy to pass my exams,” or “what kind of car do I need to make sure I get to work on time?”

      Yet I’m 100% certain that real human executives have had equivalent conversations.

    • realharo@lemm.ee · 1 year ago

      And ironically, Python (with Pygame, which they also used) is a terrible choice for this kind of game - they ended up making a desktop game that the user would have to download. Not playable on the web, not usable as a mobile app.

      More interestingly, if decisions like these are going to be made even more based on memes and random blogposts, that creates some worrying incentives for even more spambots. Influence the training data, and you’re influencing the decision making. It kind of works like that for people too, but with AI, it’s supercharged to the next level.

  • AutoTL;DR@lemmings.world [bot] · 1 year ago

    This is the best summary I could come up with:


    AI chatbots like OpenAI’s ChatGPT can operate a software company in a quick, cost-effective manner with minimal human intervention, a new study has found.

    Based on the waterfall model — a sequential approach to creating software — the company was broken down into four different stages, in chronological order: designing, coding, testing, and documenting.

    After assigning ChatDev 70 different tasks, the study found that the AI-powered company was able to complete the full software development process “in under seven minutes at a cost of less than one dollar,” on average — all while identifying and troubleshooting “potential vulnerabilities” through its “memory” and “self-reflection” capabilities.

    “Our experimental results demonstrate the efficiency and cost-effectiveness of the automated software development process driven by CHATDEV,” the researchers wrote in the paper.

    The study’s findings highlight one of the many ways powerful generative AI technologies like ChatGPT can perform specific job functions.

    Nevertheless, the study isn’t perfect: Researchers identified limitations, such as errors and biases in the language models, that could cause issues in the creation of software.


    The original article contains 639 words, the summary contains 172 words. Saved 73%. I’m a bot and I’m open source!

  • atzanteol@sh.itjust.works · 1 year ago

    This research seems to be more focused on whether the bots can interoperate in different roles to coordinate on a task than on creating the actual software. The idea is to reduce “hallucinations” by giving each bot a more specific task.

    The paper goes into more about this:

    Similar to hallucinations encountered when using LLMs for natural language querying, directly generating entire software systems using LLMs can result in severe code hallucinations, such as incomplete implementation, missing dependencies, and undiscovered bugs. These hallucinations may stem from the lack of specificity in the task and the absence of cross-examination in decision-making. To address these limitations, as Figure 1 shows, we establish a virtual chat-powered software technology company – CHATDEV, which comprises of recruited agents from diverse social identities, such as chief officers, professional programmers, test engineers, and art designers. When presented with a task, the diverse agents at CHATDEV collaborate to develop a required software, including an executable system, environmental guidelines, and user manuals. This paradigm revolves around leveraging large language models as the core thinking component, enabling the agents to simulate the entire software development process, circumventing the need for additional model training and mitigating undesirable code hallucinations to some extent.
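
    The “cross-examination” part boils down to one agent critiquing another’s output before it’s accepted. A rough sketch of the idea (hypothetical, not the paper’s code; `chat` again stands in for a real LLM call):

    ```python
    # Rough sketch of cross-examination between agents: a reviewer critiques
    # the coder's draft until it approves or we give up. Hypothetical code;
    # `chat` stands in for a real LLM API call.
    def chat(role: str, message: str) -> str:
        return f"[{role}] {message[:60]}"  # replace with a real LLM client


    def write_with_review(task: str, max_rounds: int = 3) -> str:
        draft = chat("coder", task)
        for _ in range(max_rounds):
            critique = chat("reviewer", "Find bugs or omissions in: " + draft)
            if "LGTM" in critique:  # hypothetical approval convention
                break
            draft = chat("coder", "Revise the draft to address: " + critique)
        return draft
    ```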

  • gencha@feddit.de · 1 year ago

    What a load of bullshit. If you have a group of researchers provide “minimal human input” to a bunch of LLMs to produce a laughable program like tic-tac-toe, then please just STFU or at least don’t tell us it cost $1. This doesn’t even have the efficiency of a Google search. This AI hype needs to die quick

  • blazera@kbin.social · 1 year ago

    Researchers, for example, tasked ChatDev to “design a basic Gomoku game,” an abstract strategy board game also known as “Five in a Row.”

    What tech company is making Connect Four as their business model?

    • realharo@lemm.ee · 1 year ago

      This is also the kind of task you would expect it to be great at - a tutorial-friendly project for which there are tons of examples and articles written online that guide the reader from start to finish.

      The kind of thing you would get a YouTube tutorial for in 2016, with a title like “make [thing] in 10 minutes!” (see https://www.google.com/search?q=flappy+bird+in+10+minutes)

      Other things like that include TODO lists (which are even used as the task for framework comparisons), tile-based platformer games, wordle clones, flappy bird clones, chess (including online play and basic bots), URL shorteners, Twitter clones, blogging CMSs, recipe books, and other basic CRUD apps.

      I wasn’t able to find a list of tasks in the linked paper, but based on the gomoku one, I suspect a lot of them will be things like these.
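
      For scale, the entire “hard part” of Gomoku is the win check, and it fits in about a dozen lines. A rough sketch (mine, not the paper’s):

      ```python
      # Rough sketch of the core Gomoku rule: after a move, scan the four
      # line directions and count the length of the run through that stone.
      def is_win(board, row, col):
          player = board[row][col]
          for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
              count = 1
              for sign in (1, -1):  # walk both ways along the direction
                  r, c = row + sign * dr, col + sign * dc
                  while (0 <= r < len(board) and 0 <= c < len(board[0])
                         and board[r][c] == player):
                      count += 1
                      r, c = r + sign * dr, c + sign * dc
              if count >= 5:
                  return True
          return False
      ```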

  • taanegl@lemmy.ml · 1 year ago

    Future software is going to be written by AI, no matter how much you would like to avoid that.

    My speculation is that we will see AI operating systems at some point, due to the extreme effectiveness of future AI at hacking and otherwise subverting frameworks, services, libraries, and even protocols.

    So mutating protocols will become a thing, whereby AI will change and negotiate protocols on the fly as a war rages between defensive AI and offensive AI. There will be a shared codebase, but a clear distinction in the objectives at hand.

    That’s why we need more open source AI solutions and less proprietary solutions, because whoever controls the AI will be controlling the digital world - be it you or some fat cat sitting on a Smaug hill of money.

      • 1984@lemmy.today · 1 year ago

        This is very much like the people saying airplanes will never fly after watching the prototypes fail in the 1900s.

        It’s 100% guaranteed that computers will be able to write software much better and faster than humans. The only variable is how long it will take.

        I think within a decade. Could be wrong and it could be two decades but I doubt it.

        Think about it - these bots are already being used by humans to solve tasks every day. The only difference between now and the future is that, for now, there is a slow human typing something on a keyboard.

        In the future, you will have bots talking to bots, millions of times per second, and models will learn in real time rather than being pre-trained.

        • nychtelios@rlyeh.icu · 1 year ago

          Computers are already able to write software faster than humans; it’s called compilation. Languages are only a way to describe a problem, and the computer automatically builds extremely efficient software to solve it. No language models involved, so no randomness, no biases, and no errors caused by an inability to follow elementary syllogisms. Language models are not intelligent and they never will be. Yes, true AI is imho possible, but it won’t be a statistical model trying to predict the words in a phrase; that is ridiculous, just marketing from companies whose bait you keep taking.
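
          A trivial Python illustration (a C compiler makes the point even more strongly, but the idea is the same): the function below is only a description, and the machine mechanically generates executable instructions from it, no language model involved:

          ```python
          # The "program" is only a description of a problem; the computer
          # mechanically generates the executable instructions from it.
          import dis


          def total(prices, tax):
              return sum(prices) * (1 + tax)


          dis.dis(total)  # prints the machine-generated bytecode instructions
          ```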

          • 1984@lemmy.today · 1 year ago

            Yep, true AI is not language models - this is just the beginning.

            Compilation turns source code into binaries, and that’s because humans want to write code rather than machine code - again, because we are not smart enough to quickly write machine code.

            I expect computers to skip all these steps completely in the future and just generate programs immediately.

            • nychtelios@rlyeh.icu · 1 year ago

              Programming languages are only a way to describe a problem. Even with AI, if you want it to build software for you, you have to describe your problem, and human languages are not that efficient for describing problems, soooo… AI would require kind of a programming language, just a high-level one. Maybe you can avoid writing the logic, but as a software engineer, writing the logic is the easiest and least time-demanding task in serious software development.

              • 1984@lemmy.today · 1 year ago

                You will be able to talk to the computer but, more importantly, the computer will already know a lot of patterns for the best way to do something under the conditions you describe.

                It will be like having only top programmers write code when you explain to them what you want, except much faster. There could also be brain implants to interact directly, but that I think is at least 30 years away.