I have weird thoughts that people find peculiar, so I write them down and people seem to enjoy reading them.
Fun speculation: we are CPUs for the information systems we inhabit, like the scientific method, political ideologies, etc.
It’s trivial to get LLMs to act against their instructions.
There is a lot more to it than just being correct. 18,000 waters may well have been the actual order, because somebody decided to screw with the machine. Any human with a bit of social awareness would reliably interpret that as a joke and simply refuse to punch it in. The LLM will likely do whatever the user tells it to do, because it has no contextual awareness: all it has is the system prompt and whatever interaction with the user has happened so far.
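The practical fix is to put the sanity check outside the model entirely, in plain code that no amount of prompt trickery can talk its way past. Here is a minimal sketch; all the names (`validate_order`, `MAX_QUANTITY`) are hypothetical, and the cap is an arbitrary illustrative value:

```python
# Out-of-band sanity check for an LLM-driven ordering system.
# The guard lives outside the model, so it holds no matter what
# the model was persuaded to output.

MAX_QUANTITY = 50  # hypothetical per-item cap, chosen for illustration

def validate_order(items: dict) -> dict:
    """Reject implausible quantities before the order reaches the kitchen."""
    flagged = {name: qty for name, qty in items.items() if qty > MAX_QUANTITY}
    if flagged:
        raise ValueError(f"Implausible quantities, needs human review: {flagged}")
    return items

# A jokester's "18000 waters" is stopped here, regardless of what the model said.
try:
    validate_order({"water": 18000})
except ValueError as e:
    print(e)

validate_order({"burger": 2, "water": 1})  # a normal order passes through
```

The design point is that the check is deterministic and dumb on purpose: it doesn't try to guess intent, it just bounces anything absurd back to a human.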
This kind of stuff never happens overnight. It happens slowly, incrementally, so people are never angry enough at any one sudden change to be motivated to do anything. People are even made to feel good about the imposition of boundaries, and it helps that for the average user, the boundaries often result in a better user experience.