Open source projects drown in bad bug reports penned by AI

misk@sopuli.xyz · 10 months ago

Open source projects drown in bad bug reports penned by AI

TheFunkyMonk@lemmy.world · 9 months ago

I use LLM-type AI every day as a software developer. It’s incredibly helpful in many contexts, but you have to understand what it’s designed to do and what its limitations are.

I went back and forth with Claude and ChatGPT today about its logic being incorrect and it telling me “You’re right,” then outputting the same/similar erroneous code it output before, until I needed to just slow down and fix some fundamental issues with its output myself. It’s certainly a force multiplier, but not at any kind of scale without guidance.

I’m not convinced AI, in its current incarnation, can be used to write code at a reasonable scale without human intervention. Though I hope we get there so I can retire.

Thorry84@feddit.nl · 9 months ago

so I can retire.

So you can become homeless you mean :p

WhatAmLemmy@lemmy.world · 9 months ago

Bro’s legit out here thinking there’s some sort of meaningful wealth redistribution instead of winner takes all for the few, abject poverty for the rest.

nomy@lemmy.zip · 9 months ago

He’s a programmer, they’re not really known for their awareness outside of pretty specific problem solving.

WhatAmLemmy@lemmy.world · 9 months ago

I’m a programmer. Programmers are the way the are because of biases inherent to pre-existing wealth and historically in-demand skills / high pay. It’s only a matter of time till they learn the boot of capitalism will crush them the same as any other worker.

nomy@lemmy.zip · 9 months ago

It was mostly a joke but I don’t disagree with your assessment.

Elvith Ma'for@feddit.org · 9 months ago

No, everyone knows we’re gonna do gardening or woodworking or something like that when we stop our programming career. Main thing is: something that’s as far as possible from a computer.

1985MustangCobra@lemmy.ca · 9 months ago

i like using computers though.

anomnom@sh.itjust.works · 9 months ago

I’m fixing classic cars now. If they have a computer it’s so old that there’s no danger of ROHS soldering and there aren’t even any programming ports. Just stick a sensor up the tailpipe and adjust some screws.

Is even been better for my back than sitting at a desk was.

lemmeBe@sh.itjust.works · 9 months ago

Was wondering what garden leave is. 😁

TheFunkyMonk@lemmy.world · 9 months ago

I’ll take it.

TechLich@lemmy.world · edit-2 9 months ago

One thing you gotta remember when dealing with that kind of situation is that Claude and Chat etc. are often misaligned with what your goals are.

They aren’t really chat bots, they’re just pretending to be. LLMs are fundamentally completion engines. So it’s not really a chat with an ai that can help solve your problem, instead, the LLM is given the equivalent of “here is a chat log between a helpful ai assistant and a user. What do you think the assistant would say next?”

That means that context is everything and if you tell the ai that it’s wrong, it might correct itself the first couple of times but, after a few mistakes, the most likely response will be another wrong answer that needs another correction. Not because the ai doesn’t know the correct answer or how to write good code, but because it’s completing a chat log between a user and a foolish ai that makes mistakes.

It’s easy to get into a degenerate state where the code gets progressively dumber as the conversation goes on. The best solution is to rewrite the assistant’s answers directly but chat doesn’t let you do that for safety reasons. It’s too easy to jailbreak if you can control the full context.

The next best thing is to kill the context and ask about the same thing again in a fresh one. When the ai gets it right, praise it and tell it that it’s an excellent professional programmer that is doing a great job. It’ll then be more likely to give correct answers because now it’s completing a conversation with a pro.

There’s a kind of weird art to prompt engineering because open ai and the like have sunk billions of dollars into trying to make them act as much like a “helpful ai assistant” as they can. So sometimes you have to sorta lean into that to get the best results.

It’s really easy to get tricked into treating like a normal conversation with a person when it’s actually really… not normal.

TimeSquirrel@kbin.melroy.org · 9 months ago

It’s really easy to get tricked into treating like a normal conversation with a person when it’s actually really… not normal.

I caught myself thanking GitHub Copilot after getting a response to a question. Felt…weird. For a whole two seconds my brain was operating like I’m talking to another human. You are absolutely correct.

Max@lemmy.world · 9 months ago

This is a really fantastic explanation of the issue!

It’s more like improv comedy with an extremely adaptable comic than a conversation with a real person.

One of the things that I’ve noticed is that the training/finetuning that’s done in order to make it give good completions to the “helpful ai conversation scenario” is that it flattens a lot of the capabilities of the underlying language model for really interesting and specific completions. I remember playing around with gpt2 in it’s native text completion mode, and even with that much weaker model, it was able to complete a much larger variety of text styles without sliding into the sameness and slickness of the current chat model fine-tuning.

A lot of the research that I read on LLMs is using them in the original token completion context, but pretty much the only way people interact with them is through a thick layer of ai chatbot improv. As an example for code, I imagine that one would have more success using an LLM to edit your code if the context that you give it starts out written like it is a review of a pull request for the code, or some other commentary of a form that matches the way that code is reviewed in the training data. But instead of having access to create that context directly, we have to ask for code review through the fogged window of a chat between an AI assistant and a person discussing code. And that form of chat likely isn’t well represented in the training data.