• Lemminary@lemmy.world
    English
    arrow-up
    21
    arrow-down
    45
    ·
    edit-2
    6 months ago

    They’re not serving you the exact content they scraped, and that makes all the difference.

    • localhost443@discuss.tchncs.de
      English
      arrow-up
      25
      arrow-down
      4
      ·
      6 months ago

      Well, if you believe that, you should look at the Times lawsuit.

      Word for word on hundreds/thousands of pages of stolen content. It’s damning.

      • Lemminary@lemmy.world
        English
        arrow-up
        4
        arrow-down
        9
        ·
        6 months ago

        Why do you assume that I haven’t? The case hasn’t been resolved, and it’s not clear how The NY Times got the output they claim, which may as well be manipulation. It’s a fair rebuttal by OpenAI. The Times hasn’t provided the steps they used to achieve that.

        So unless that’s cleared up, it’s not damning in the slightest. Not yet, anyway. And that still doesn’t invalidate my statement above, because it still only happens under very specific circumstances.

        • Emy@lemmy.world
          English
          arrow-up
          4
          arrow-down
          2
          ·
          6 months ago

          Also, intention is pretty important when determining guilt for many crimes. OpenAI doesn’t intentionally spit back an author’s exact words; their intention is to summarize and create unique content.

            • Lemminary@lemmy.world
              English
              arrow-up
              3
              ·
              6 months ago

              No, the real defense is “that’s not how LLMs work,” but you are all hinging on the wrong idea. If you think that an LLM is capable of doing what you claim, I’d love to hear the mechanism in detail and the steps to replicate it.

            • whofearsthenight@lemm.ee
              English
              arrow-up
              1
              arrow-down
              1
              ·
              6 months ago

              I mean, I’m not sure why this conversation even needs to get this far. If I write an article about the history of Disney movies, and make it very clear the way I got all of those movies was to pirate them, this conversation is over pretty quick. OpenAI and most of the LLMs aren’t doing anything different. The Times isn’t Wikipedia; most of their stuff is behind a paywall with pretty clear terms of service, and nothing entitles OpenAI to that content. OpenAI’s argument is “well, we’re pirating everything so it’s okay.” The output honestly seems irrelevant to me; they never should have had the content to begin with.

              • Lemminary@lemmy.world
                English
                arrow-up
                2
                ·
                6 months ago

                That’s not the claim that they’re making. They’re arguing that OpenAI retains the work that they made publicly available, which OpenAI claims is fair use because it’s wholly transformative in the form of nodes, weights and biases, and that they don’t store those articles in a database for reuse. But their other argument is that OpenAI created a system that threatens their business, which is just ludicrous.

    • Cethin@lemmy.zip
      English
      arrow-up
      9
      arrow-down
      6
      ·
      6 months ago

      It’s great how most of us are taught that just changing the order of words is still plagiarism. Yet these models frequently end up using the exact same words as other things, and people still argue it is somehow intelligent and somehow not plagiarism.

      • Lemminary@lemmy.world
        English
        arrow-up
        4
        arrow-down
        2
        ·
        6 months ago

        “Changing the order of words” is what it does? That’s news to me. And do you have examples of it “using the exact same words as other things” without prompt manipulation?

        • asret@lemmy.zip
          English
          arrow-up
          1
          arrow-down
          1
          ·
          6 months ago

          Why does the prompting matter? If I “prompt” a band to play copyrighted music does that mean they get a free pass?

          • Lemminary@lemmy.world
            English
            arrow-up
            3
            ·
            edit-2
            6 months ago

            That’s not a very good analogy, because the band would be reproducing an entire work of art, which an LLM does not and cannot. And by prompt manipulation I mean purposely making it seem like the LLM is doing something it wouldn’t do on its own. The operative word is “seem,” which is what I meant by manipulation. That prompting happens is irrelevant here, but how it’s done is not. So unless The Times releases the steps they used to get ChatGPT to output what it did, you can’t really claim that that’s what it does.

            In a blog post, OpenAI said the Times “is not telling the full story.” It took particular issue with claims that its ChatGPT AI tool reproduced Times stories verbatim, arguing that the Times had manipulated prompts to include regurgitated excerpts of articles. “Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts,” OpenAI said.

          • stewsters@lemmy.world
            English
            arrow-up
            2
            ·
            edit-2
            6 months ago

            If you passed them a sheet of music, I’d say that’s on you; it would be your responsibility to not sell recordings of them playing it.

            Just like if I typed the first chapter of Harry Potter into Word, it is not Microsoft’s intent to breach copyright; it would have been my intent to make it do it. It would be my responsibility not to sell that first chapter, and they should come after me if I did, even though MS is the corporation that supplied the tools.