• 0 Posts
  • 1.3K Comments
Joined 3 years ago
Cake day: June 16th, 2023



  • It is a different substrate for reasoning, emergent, statistical, and language-based, and it can still yield coherent, goal-directed outcomes.

    That’s some buzzword bingo there… A very long-winded way of saying it isn’t human-like reasoning, but you want to call it that anyway.

    Even if you accept that the reasoning often fails to show continuity, well, then there’s also the lying.

    I was examining a reasoning chain around generating code for an embedded control scenario. At one point it said the code may affect how a motor is controlled, and so it would test whether the motor operates.

    Now the truth of the matter is that the model has no access to perform such a test. The reasoning chain is just a fiction, so it described a result, asserting that it had run the test and it passed, or failed, not based on any actual test, but on text prediction. So sometimes it says the test failed, then carries on as if it passed; sometimes it decides to redo some code to address the error, but leaves it broken in real life. And of course it can claim the code works when it doesn’t at all.

    It can show how “reasoning” helps, though. In one case the code was generated as if for a different application, but people had issues applying that pattern to a motor control scenario, and generating the extra text caused the model to zero in on some Stack Overflow thread where someone had made a similar mistake.








  • The “reasoning” models aren’t really reasoning; they are generating text that resembles a “train of thought”. If you examine some of the reasoning chains with errors, you can see the errors are often completely isolated, with no lead-up, and then the chain carries on as if the mistake never happened. When errors happen in an actual human reasoning chain, they propagate.

    LLM reasoning chains are essentially generating fanfics of what reasoning would look like. It turns out that expending tokens to generate more text and then discarding it does make the retained text more likely to be consistent with the desired output, but “reasoning” is more a marketing term than a description of what is really happening.



  • I just don’t get how so many people swear by it. Every time I lower my expectations for what it can be useful at, it proceeds to fail at exactly that when I actually have a use case I think one of the LLMs could tackle. Every step of the way. I’m told by people that the LLMs are amazing, and that I only had a bad experience because I hadn’t used the very specific model and version they love, yet every time I try to verify their feedback (my workplace is so die-hard they pay for access to every popular model and tool), it does roughly the same stuff, ever so slightly shuffling what it gets right and wrong.

    I feel gaslit as it keeps on being uselessly unreliable for any task that I could conceivably find it useful for.




  • Real documents, but describing tips submitted during the 2020 election, with no apparent link to any further investigative content.

    Any random person can generate a tip, so a tip is a starting point for an investigation, but in and of itself should not be considered newsworthy.

    I’m sure there were scary-sounding Pizzagate-themed tips implicating Hillary Clinton during her run. I’m sure there were Hunter Biden laptop tips in 2020. I’m sure there were terrorist-themed tips about Obama during his runs.




  • There were also the Prolific serial-to-USB components. The market was flooded with perfectly functional clones, and Prolific deliberately broke driver support for the clones, penalizing a ton of people who had no idea.

    When people did too good a job cloning some of their chips, the driver update broke even Prolific’s own genuine chips.

    Of course, in this case the vendor got their stuff into the standard Windows driver without even needing users to download anything…

    The ultimate effect is that our datacenter just uses Linux laptops, because in practice serial adapters on Windows are just too unreliable unless we play supply-chain detective for the cheap little serial adapters we buy.
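Tangentially, for anyone stuck doing that supply-chain detective work on Linux: a minimal sketch of sorting adapters by chipset from `lsusb`-style output. The helper name and the VID:PID table are my own illustration, covering only a few well-known chips, not anything from the comment above.

```python
# Sketch: flag the chipset of USB serial adapters from `lsusb`-style output,
# so Prolific parts (heavily cloned, driver-sensitive on Windows) can be told
# apart from FTDI / CH340 / CP210x ones before deployment.
import re

# A few well-known serial chip VID:PID pairs (illustrative, not exhaustive).
KNOWN_SERIAL_CHIPS = {
    "067b:2303": "Prolific PL2303",     # frequently cloned
    "0403:6001": "FTDI FT232R",
    "1a86:7523": "WCH CH340",
    "10c4:ea60": "Silicon Labs CP210x",
}

def identify_adapters(lsusb_output: str) -> list[str]:
    """Return the names of known serial chips found in `lsusb` output."""
    found = []
    for match in re.finditer(r"ID ([0-9a-f]{4}:[0-9a-f]{4})", lsusb_output):
        chip = KNOWN_SERIAL_CHIPS.get(match.group(1))
        if chip:
            found.append(chip)
    return found

# Example run against captured output (feed it `lsusb` output in practice):
sample = (
    "Bus 001 Device 004: ID 067b:2303 Prolific Technology, Inc. PL2303\n"
    "Bus 001 Device 005: ID 046d:c077 Logitech, Inc. Mouse\n"
)
print(identify_adapters(sample))  # → ['Prolific PL2303']
```

This only identifies the claimed chipset; a clone reports the same VID:PID as the genuine part, which is exactly why the driver-level sabotage hit innocent users.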