Model Evaluation and Threat Research is an AI research charity that looks into the threat of AI agents! That sounds a bit AI doomsday cult, and they take funding from the AI doomsday cult organisat…
Tests can never prove correctness of code. All they can prove is “the thing hasn’t failed yet”. Proper reasoning is always needed if you want a guarantee.
If you had the LLM write the regex for you, I can practically guarantee that you won’t think of, and write tests for, all the edge cases.
You formally verify your regexes? Doubtful.
No, which is why I avoid regexes for most production code, and also why I would never use one written by a pathological liar and perpetual guesser like an LLM.
An LLM is great when you’re coding in a pure functional programming language like Elm and are using lots of custom types to make impossible states unrepresentable, and the function you’re writing could have been derived by the Haskell compiler, so that mathematically the only possible way you could write it wrong is to use the wrong constructor. Then it’s usually right, and when it’s wrong either it doesn’t compile or you can see it’s chosen the wrong path.
The rest of the time it will make shit up, and when you challenge it, it will happily rewrite it for you, but there’s no particular reason why it wouldn’t make up more nonsense.
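As a rough illustration of the custom-types point, here is a sketch in TypeScript rather than Elm (a discriminated union is only an approximation of an Elm custom type, and the names `RemoteData` and `render` are made up for the example). The type leaves no room for a state that is both loading and failed, so about the only mistake left is handling the wrong variant, and a missing variant fails to compile.

```typescript
// Hypothetical state type: exactly one variant can exist at a time,
// so "loading with an error" or "success without items" cannot be built.
type RemoteData =
  | { kind: "loading" }
  | { kind: "failure"; error: string }
  | { kind: "success"; items: string[] };

function render(state: RemoteData): string {
  switch (state.kind) {
    case "loading":
      return "Loading...";
    case "failure":
      return `Failed: ${state.error}`;
    case "success":
      return `${state.items.length} item(s)`;
    default: {
      // Exhaustiveness check: if a new variant is added to RemoteData and
      // not handled above, this assignment stops compiling.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

With a type like this, a generated function body has very few ways to be wrong that the compiler or a quick read would not catch.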
Regexes are far easier to write than to debug, which is exactly why they’re poison for a maintainable code base and a really bad use case for an LLM.
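A small sketch of how that plays out, using a made-up date pattern of the sort that gets written in seconds (the names `isoDate`, `obviousCases` and `missedCases` are hypothetical, not from any real code base). Every test that comes to mind passes, and the pattern is still wrong.

```typescript
// Hypothetical pattern: checks the shape YYYY-MM-DD and nothing else.
const isoDate = /^\d{4}-\d{2}-\d{2}$/;

// The tests most people would think to write. All of them pass.
const obviousCases: Array<[string, boolean]> = [
  ["2024-01-15", true],
  ["15/01/2024", false],
  ["2024-1-5", false],
  ["", false],
];
const obviousAllPass = obviousCases.every(
  ([input, expected]) => isoDate.test(input) === expected
);

// Inputs nobody wrote a test for: the right shape, but not real dates.
const missedCases = ["2024-13-45", "0000-00-00", "2023-02-30"];
const wronglyAccepted = missedCases.filter((input) => isoDate.test(input));

console.log(obviousAllPass);  // true
console.log(wronglyAccepted); // all three missed cases are accepted
```

The green test run says only that the pattern hasn’t failed yet. Working out what it accepts beyond the tested inputs still means reading the regex and reasoning about it.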
I also wouldn’t use an LLM for languages in which there are lots and lots of ways to go wrong. That’s exactly when you need an experienced developer, not someone who guesses based on what they read online, with no understanding and never learning anything, because, my young padawan, that’s exactly what an LLM is, every day.
Watch your LLM like a hawk.