Absolutely needed: to get high efficiency for this beast … as it gets better, we’ll become too dependent.

“all of this growth is for a new technology that’s still finding its footing, and in many applications—education, medical advice, legal analysis—might be the wrong tool for the job,”

      • Terrasque@infosec.pub · 16 hours ago

        Yes, and it has measurably improved some tasks: roughly a 20% improvement on programming tasks, as a practical example. It has also improved tool use and agentic tasks, letting the LLM plan ahead and adjust its initial approach based on what comes up in later steps.

        Having the LLM talk through the task lets it improve or fix bad decisions made early on, based on realizations that only come at later stages. Sort of like when a human thinks through how to do something.
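
        Roughly, the idea in code (a minimal, hypothetical sketch: ask(prompt) stands in for whatever chat-completion call is actually used, and the prompts are purely illustrative):

        def solve_with_reasoning(task: str, ask) -> str:
            # 1. Have the model lay out a plan and reason step by step first.
            plan = ask(
                "Think step by step and write out a plan before answering.\n"
                f"Task: {task}"
            )
            # 2. Have it re-read its own plan, so decisions made early on can
            #    be corrected based on realizations from later in the reasoning.
            critique = ask(
                "Review the plan below and list any early steps that later "
                "parts of the reasoning show to be wrong.\n"
                f"Task: {task}\nPlan:\n{plan}"
            )
            # 3. Produce the final answer from the (possibly revised) reasoning.
            return ask(
                f"Task: {task}\nPlan:\n{plan}\nCorrections:\n{critique}\n"
                "Give only the final answer."
            )

        Reasoning models effectively fold this plan-critique-answer loop into the model's own output instead of relying on hand-written prompt chains like this one.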

          • technocrit@lemmy.dbzer0.com · edited · 22 hours ago

          For example? Citations?

          Pretty sure these “tasks” are meaningless metrics made up by pseudo-scientific grifters.

            • IsaamoonKHGDT_6143@lemmy.zip · 21 hours ago

            AlphaFold 3, which can help predict the structures of some proteins. It does have limitations, though: it can't be used in every case, only for the kinds of predictions it handles reliably.

            • Jakeroxs@sh.itjust.works · 22 hours ago

            Small bits of code, language-related tasks, basic context understanding. Not metrics I've literally measured, just things I've noticed improving compared to non-reasoning models in my homelab testing. 🤷‍♂️