• rozodru@lemmy.world
    arrow-up
    15
    ·
    1 day ago

    Claude AI does this ALL the time too. It NEEDS to give a solution and can rarely say “I don’t know”, so it will just make up a solution that it thinks is right without actually checking that the solution exists. It will dream up programs or libraries that don’t exist and never have, OR it will tell you something can do a thing it has never been able to do.

    And that’s just how all these LLMs have been built. They MUST provide a solution, so they all lie. They’ve been programmed this way to ensure maximum profits. GitHub Copilot is a bit better because it’s with me in my code, so its suggestions actually work most of the time because it can see the context and what’s around it. Claude is absolute garbage, MS Copilot is about the same caliber as Claude if not worse, and ChatGPT is only good for content writing or bouncing ideas off of.

    • Croquette@sh.itjust.works
      arrow-up
      24
      arrow-down
      1
      ·
      1 day ago

      LLMs are just sophisticated text prediction engines. They don’t know anything, so they can’t produce an “I don’t know”: they can always generate a text prediction, and they can’t think.

      • zeca@lemmy.eco.br
        arrow-up
        2
        arrow-down
        3
        ·
        1 day ago

        They could be programmed to do some double or triple checking and return “I don’t know” when the checks come back negative. I guess that would compromise the appearance of an oracle that their parent companies seem to quietly push onto them.

        • sip@programming.dev
          arrow-up
          6
          ·
          edit-2
          14 hours ago

          They don’t check. You gotta think in statistical terms.

          Based on the previously input words (tokens actually, but I’ll use words for the sake of simplicity), which is the system prompt + user prompt, the LLM generates a list of the possible next words that make the most sense, then picks one from the top few. How far down the list it goes toward less likely words depends on the temperature setting. Then it does the same for the next word, and the next, etc., each time looking back at everything so far.
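The word-picking step described above can be sketched roughly as follows. This is a toy illustration, not how any particular model is implemented: the function name `sample_next_word`, the `top_k` cutoff, and the made-up scores are all assumptions for the example. A real LLM works on tokens and computes the scores with a neural network.

```python
import math
import random


def sample_next_word(scores, temperature=1.0, top_k=3, rng=random):
    """Pick one word from the top_k highest-scoring candidates.

    scores: dict mapping candidate word -> raw score (higher = more likely).
    temperature: low values almost always pick the top word, high values
    spread the choice further down the list.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the few best candidates.
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    words = [word for word, _ in top]
    # Softmax with temperature; subtracting the max keeps exp() stable.
    scaled = [score / temperature for _, score in top]
    biggest = max(scaled)
    weights = [math.exp(s - biggest) for s in scaled]
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical scores for the word after "the cat sat on the".
scores = {"mat": 3.0, "sofa": 2.0, "roof": 1.0, "moon": -2.0}
print(sample_next_word(scores, temperature=0.2))  # nearly always the top word
print(sample_next_word(scores, temperature=2.0))  # more varied
```

Generating a whole reply means calling something like this in a loop, appending each picked word to the input before picking the next one.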

          I haven’t checked what that step actually does in the reasoning models, but I assume it just expands the user prompt to fill in stuff that the LLM thinks the user was too lazy to input, then works on the final answer.

          So basically it’s like tapping the next-word prediction on your phone keyboard.

          • zeca@lemmy.eco.br
            arrow-up
            2
            ·
            11 hours ago

            The chatbots are not just LLMs though. They run scripts in which some steps are queries to an LLM.

              • zeca@lemmy.eco.br
                arrow-up
                1
                ·
                edit-2
                9 hours ago

                That the script could incorporate some checking mechanisms and implement an “I don’t know” for when the LLM’s answer fails some tests.

                They already do some of that but for other purposes, like censoring; or, according to recent news, Grok looks up Musk’s opinions before answering questions; or, to make math calculations more accurate, they actually call a normal calculator; and so on…

                They could make the LLM produce an answer A, then look up the question on Google and ask that LLM to “compare” answer A with the top Google results, looking for inconsistencies, and then return “I don’t know” if it’s too inconsistent. It’s not a rigorous test, but it’s something, and I’m sure the actual devs of those chatbots could make something much better than my half-baked idea.
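A minimal sketch of that answer-then-compare idea, under loud assumptions: `ask_llm` and `search` are hypothetical callables standing in for a real LLM API and a real web search, the YES/NO prompt wording is invented for the example, and the `threshold` value is arbitrary. It shows the control flow only and is not a rigorous fact checker.

```python
from typing import Callable, List

ABSTAIN = "I don't know"


def answer_or_abstain(
    question: str,
    ask_llm: Callable[[str], str],       # hypothetical: prompt -> reply text
    search: Callable[[str], List[str]],  # hypothetical: query -> result snippets
    threshold: float = 0.5,
) -> str:
    """Return the LLM's answer only if enough search results support it."""
    answer = ask_llm(question)  # answer A
    snippets = search(question)
    if not snippets:
        return ABSTAIN  # nothing to compare against
    supported = 0
    for snippet in snippets:
        verdict = ask_llm(
            "Does this source support the answer? Reply YES or NO.\n"
            f"Answer: {answer}\n"
            f"Source: {snippet}"
        )
        if verdict.strip().upper().startswith("YES"):
            supported += 1
    # Too inconsistent with the search results: abstain instead of answering.
    if supported / len(snippets) < threshold:
        return ABSTAIN
    return answer
```

Note that the comparison step is itself an LLM query, so it can be wrong too. The script only makes an “I don’t know” possible; it doesn’t guarantee correctness.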

      • Cyberflunk@lemmy.world
        arrow-up
        5
        arrow-down
        9
        ·
        1 day ago

        Tool use, reasoning, chain of thought: those are the things that set LLM systems apart. While you are correct in the most basic sense, it’s like saying a car is only a platform with wheels; it’s reductive of the capabilities.

        • Croquette@sh.itjust.works
          arrow-up
          4
          ·
          1 day ago

          LLMs are prediction engines. They don’t have knowledge; they only chain together words related to your topic.

          They don’t know they are wrong because they just don’t know anything, period.

          • zeca@lemmy.eco.br
            arrow-up
            3
            arrow-down
            2
            ·
            1 day ago

            They have a point: chatbots are built on top of LLMs, they aren’t just LLMs.

    • fuzzzerd@programming.dev
      English
      arrow-up
      3
      arrow-down
      1
      ·
      24 hours ago

      Are you using Claude web chat or Claude Code? Because my experience with it is vastly different even when using the same underlying model. Claude Code isn’t perfect and gets stuff wrong, but it can run the project, check the output, realize its mistake, and fix it in many cases. It doesn’t fix logic flaws, but it can fix hallucinations of library methods that don’t exist.
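To illustrate why a hallucinated library method is the kind of mistake that running code can catch, here is a small sketch. It is not how Claude Code works internally. The helper name `attribute_exists` is made up for this example, and it only checks whether a name resolves in an installed Python module.

```python
import importlib


def attribute_exists(module_name, dotted_attr):
    """Return True if module_name can be imported and dotted_attr resolves on it."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False  # the library itself does not exist (or is not installed)
    for part in dotted_attr.split("."):
        if not hasattr(obj, part):
            return False  # the suggested method or attribute does not exist
        obj = getattr(obj, part)
    return True


print(attribute_exists("math", "sqrt"))       # a real function
print(attribute_exists("math", "fast_sqrt"))  # a made-up function
```

A check like this gives a clear yes or no, which is why a tool can recover from an invented method. A logic flaw produces no such signal: the code runs fine and simply does the wrong thing.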