• applebusch@lemmy.blahaj.zone
    7 days ago

    Well, no, I wouldn’t. Not because I think piracy is wrong, yo ho ho. Not because I think machine learning is completely worthless. Because garbage in, garbage out. The vast majority of humanity’s written works are contradictory, use flawed logic, are based on flawed information and data, are intentionally misleading, or are outright fabrications. Even with trustworthy work, there’s so much assumed context that it could easily be misinterpreted or be indecipherable. Failing to curate training data is just asking for bullshit.

    • dependencyinjection@discuss.tchncs.de
      7 days ago

      Although I agree with the stuff about written things being contradictory, I think your comment is a little reductive about machine learning.

      Machine learning has rapidly transformed many areas; here are a few:

      • Object recognition
      • Facial recognition
      • Medical imaging
      • Language translation
      • Speech recognition
      • Text generation
      • Drug discovery
      • Genomics

      The list is pretty much endless, really. Take text recognition and computer vision: people who are blind can now wear Meta (shit company, I know) glasses and actually go shopping, pick up items, and have the labels read to them. That’s fucking awesome.

      • applebusch@lemmy.blahaj.zone
        7 days ago

        Yeah, I was only talking in the context of LLMs, which honestly don’t feel like the best use of machine learning to me. Real scientists and engineers using machine learning to create efficient heuristics for real, bounded problems, and actually verifying the output through conventional means, is incredibly powerful. There’s still a lot of overhyped bullshit out there outside the LLM chatbot space, but the point stands that any machine learning algorithm should have its training data carefully curated. The techbro strategy of throwing more nodes and random data at an LLM in the hope that it will magically hit some exponential threshold of performance is stupid.