Two authors sued OpenAI, accusing the company of violating copyright law. They say OpenAI used their work to train ChatGPT without their consent.

  • dhork@lemmy.world
    1 year ago

    There’s an additional question: who holds the copyright on the output of an algorithm? I don’t think that is copyrightable at all. The bot doesn’t really add anything to the output, it’s just a fancy search engine. In the US, in particular, the agency in charge of copyright has been quite insistent that a copyright can only be given to the output of a human.

    So when an AI incorporates parts of copyrighted works into its output, how can that not be infringement?

    • cerevant@lemmy.world
      1 year ago

      How can you write a blog post reviewing a book you read without copyright infringement? How can you post a plot summary to Wikipedia without copyright infringement?

      I think these blanket conclusions about AI consuming content being automatically infringing are wrong. What is important is whether or not the output is infringing.

      • dhork@lemmy.world
        1 year ago

        You can write that blog post because you are a human, and your summary qualifies for copyright protection, because it is the unique output of a human based on reading the copyrighted material.

        But the US authorities are quite clear that a work that is purely AI generated can never qualify for copyright protection. Yet since it is based on the synthesis of works under copyright, it can’t really be considered public domain either. Otherwise you could ask the AI “Write me a summary of this book that has exactly the same number of words”, and likely get a direct copy of the book which is clear of copyright.

        I think that these AI companies are going to face a reckoning when it is ruled that they misappropriated all this content that they didn’t explicitly license for use, and that all their output is infringing by definition.

        • Whimsical@lemmy.world
          1 year ago

          I’m expecting a much messier “resolution” that’ll look a lot like YouTube’s copyright situation - their product can be used for copyright infringement, and they’ll be required by law to try and take appropriate measures to prevent it, but will otherwise not be held liable as long as they can claim such measures are being taken.

          Having an AI recite a long text to bypass copyright seems equivalent in my mind to uploading a full movie to YouTube. In both cases, some amount of moderation (itself increasingly algorithmic) is required to not only be applied, but actively developed and advanced to foil efforts to bypass it. For instance, YouTube pirates will upload things with some superficial changes, like a filter applied or the movie shown at a weird angle or mirrored, to bypass copyright bots, which means the bots need to be stricter and better trained, or else YouTube once again becomes liable for knowing about these pirates and not stopping them.

          The end result, just like with YouTube, will probably be that AI models have to have big, clunky algorithms applied against their outputs to recalculate or otherwise make copyright-safe anything that might remotely be an infringement. It’ll suck for normal users, pirates will still dig for ways to bypass it, and everyone will be unhappy. If YouTube is any indicator, this situation can somehow remain stable for over a decade - long enough for AI devs to release a new-generation bot to restart the whole issue.

          Yaaaaaaaaay