‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says
Pressure grows on artificial intelligence firms over the content used to train their products

  • flop_leash_973@lemmy.world

    If it ends up being OK for a company like OpenAI to commit copyright infringement to train their AI models, it should be OK for John/Jane Doe to pirate software for private use.

    But that would never happen. Almost like the whole of copyright has been perverted into a scam.

    • tinwhiskers@lemmy.world

      Using copyrighted material is not the same thing as copyright infringement. You need to (re)publish it for it to become an infringement, and OpenAI is not publishing the material made with their tool; the users of it are. There may be some grey areas for the law to clarify, but as yet, they have not clearly infringed anything, any more than a human reading copyrighted material and making a derivative work.

      • hperrin@lemmy.world

        It comes from OpenAI and is given to OpenAI’s users, so they are publishing it.

        • linearchaos@lemmy.world

          It’s being mishmashed with a billion other documents to make a derivative work. It’s not like OpenAI is giving you a copy of Hitchhiker’s Guide to the Galaxy.

          • hperrin@lemmy.world

            The New York Times was able to have it return a complete NYT article, verbatim. That’s not derivative.

            • Fraubush@lemm.ee

              I thought the same thing until I read another perspective on it from Mike Masnick, and from what he writes, it seems pretty clear they manipulated ChatGPT with some very specific prompts that someone who doesn’t already pay NYT for access would not be able to use - for example, feeding it three verbatim paragraphs from an article and asking it to generate the rest. If you understand how these LLMs work, it’s really not surprising that you can indeed force it to do things like that, but it’s an extreme case, and I’m with Masnick and the user you’re responding to on this one.
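
              To make that concrete, here’s a minimal sketch of the kind of prompt being described, using the openai Python client. The model name, variable names, and wording are my guesses for illustration, not what the NYT actually used:

              ```python
              # Hypothetical sketch: paste the verbatim opening of an article
              # and ask the model to continue it. Everything here is an
              # illustrative assumption, not the NYT's actual method.
              from openai import OpenAI

              client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

              opening = "..."  # the three verbatim paragraphs you already have

              response = client.chat.completions.create(
                  model="gpt-4",  # assumption: any capable chat model
                  messages=[
                      {"role": "user", "content": opening + "\n\nContinue this article."},
                  ],
              )
              print(response.choices[0].message.content)
              ```

              The point is that you have to hand the model a chunk of the copyrighted text up front; an ordinary user who doesn’t already have the article can’t do that.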

              I also watched most of today’s subcommittee hearing on AI and journalism. A lot of the arguments are that this will destroy local journalism. Look, strong local journalism is some of the most important work that is dying right now. But the grave was dug by the large media companies and hedge funds that bought up and gutted those local news orgs, and not many people outside the industry batted an eye while that was happening. This is a bit of a tangent, but I don’t exactly trust the giant hedge funds that gutted local news journalism over the past decade to suddenly care about how important it is.

              Sorry for the tangent, but here’s the article I mentioned that’s more on topic: http://mediagazer.com/231228/p11#a231228p11

              • hperrin@lemmy.world

                So they gave it the 3 paragraphs that are available publicly, said continue, and it spat out the rest of the article that’s behind a paywall. That sure sounds like copyright infringement.

      • A_Very_Big_Fan@lemmy.world

        any more than a human reading copyrighted material and making a derivative work.

        It seems obvious to me that it’s not doing anything different than a human does when we absorb information and make our own works. I don’t understand why practically nobody understands this.

        I’m surprised to have even found one person who agrees with me.

        • BURN@lemmy.world

          Because it’s objectively not true. Humans and ML models fundamentally process information differently and cannot be compared. A model doesn’t “read a book” or “absorb information”.

          • A_Very_Big_Fan@lemmy.world

            I didn’t say they processed information the same, I said generative AI isn’t doing anything that humans don’t already do. If I make a drawing of Gordon Freeman or Courage the Cowardly Dog, or even a drawing of Gordon Freeman in the style of Courage the Cowardly Dog, I’m not infringing on the copyright of Valve or John Dilworth. (Unless I monetize it, but even then there’s fair-use…)

            Or if I read a statistic or some kind of piece of information in an article and spoke about it online, I’m not infringing the copyright of the author. Or if I listen to hundreds of hours of a podcast and then do a really good impression of one of the hosts online, I’m not infringing on that person’s copyright or stealing their voice.

            Neither me making that drawing, nor relaying that information, nor doing that impression are copyright infringement. Me uploading a copy of Courage or Half-Life to the internet would be, or copying that article, or uploading the hypothetical podcast on my own account somewhere. Generative AI doesn’t publish anything, and even if it did I think there would be a strong case for fair-use for the same reasons humans would have a strong case for fair-use for publishing their derivative works.

  • kingthrillgore@lemmy.ml

    It’s almost like we had a place where copyrighted works used to end up (the public domain), but they extended the dates because money.

    • rivermonster@lemmy.world

      I was literally about to come in here and say it would be an interesting tangential conversation to talk about how FUCKED copyright laws are, and how relevant to the discussion it would be.

      More upvotes for you!

    • Ultraviolet@lemmy.world

      This is where they have the leverage to push for actual copyright reform, but they won’t. Far more profitable to keep the system broken for everyone but have an exemption for AI megacorps.

  • 800XL@lemmy.world

    I guess the lesson here is: pirate everything under the sun, and as long as you establish a company and train a bot, everything is a-ok. I wish we knew this back when everyone was getting dinged for torrenting The Hurt Locker.

    Remember when the RIAA got caught with pirated mp3s and nothing happened?

    What a stupid timeline.

      • kiagam@lemmy.world

        We should use those who break it as a beacon to rally around and change the stupid rule.

      • Grabbels@lemmy.world

        Except they pocket millions of dollars by breaking that rule, and the original creators of their “essential data” don’t get a single cent while their creations indirectly show up in content generated by AI. If it really were about changing the rules, they wouldn’t be so obviously making it profitable; they’d use that money to make it available for the greater good AND pay the people who made their training data. Right now they’re hell-bent on commercialising their products as fast as possible.

        If their statement is that stealing literally all the content on the internet is the only way to make AI work (instead of, for example, using their profits to pay for a selection of that data and only using that), then the business model is wrong and illegal. It’s as simple as that.

        I don’t get why people are so hell-bent on defending OpenAI in this case; if I were to launch a food-delivery service that’s affordable for everyone, but I shoplifted all my ingredients “because it’s the only way”, most would agree that’s wrong and my business is illegal. Why is this OpenAI case any different? Because AI is an essential development? Oh, and affordable food isn’t?

        • afraid_of_zombies@lemmy.world

          I am not defending OpenAI; I am attacking copyright. Do you have freedom of speech if you have nothing to say? Do you have it if you are a total asshole? Do you have it if you are the nicest human who ever lived? Do you have it if you have no desire to use it?

  • Milk_Sheikh@lemm.ee

    Wow! You’re telling me that onerous and crony copyright laws stifle innovation and creativity? Thanks for solving the mystery, guys, we never knew that!

  • Alien Nathan Edward@lemm.ee

    If it’s impossible for you to have something without breaking the law, you have to do without it.

    If it’s impossible for the aristocrat class to have something without breaking the law, we change or ignore the law.

      • Krauerking@lemy.lol

        Oh, sure. But why is it only the massive AI push, with large companies owning models full of stolen material that produce basic forgeries of the stolen items, that gets to ignore the bullshit copyright laws?

        It wouldn’t be because it’s super profitable for multiple large industries, right?

        • afraid_of_zombies@lemmy.world

          Just because people are saying the law is bad doesn’t mean they are saying the lawbreakers are good. Those two are independent of each other.

          I have never been against cannabis legalization. That doesn’t mean I think people who sold it on the streets are good people.

  • unreasonabro@lemmy.world

    Finally capitalism will notice how many times it has shot itself in the foot with its ridiculous, greedy infinite-copyright scheme.

    As a musician, people not involved in the making of my music make all my money nowadays instead of me anyway. Burn it all down.

  • Blackmist@feddit.uk

    Maybe you shouldn’t have done it then.

    I can’t make a Jellyfin server full of content without copyrighted material either, but the key difference here is I’m not then trying to sell that to investors.

      • Shazbot@lemmy.world

        Reading these comments has shown me that most users don’t realize that not all working artists are using 1099s and filing as an individual. Once you have stable income and assets (e.g. equipment) there are tax and legal benefits to incorporating your business. Removing copyright protections for large corporations will impact successful small artists who just wanted a few tax breaks.

      • BURN@lemmy.world

        They protect artists AND protect corporations, and you can’t have one without the other. It’s much better the way it is compared to no copyright at all.

          • BURN@lemmy.world

            They’re screwed less than they would be if copyright were abolished. It’s far from a perfect system, but overly restrictive is 100x better than open season on stealing from others.

            • agitatedpotato@lemmy.world

              So without copyright, if an artist makes a cool picture and Coca-Cola uses it to sell soda and decides not to give the artist any money, the artist has no legal recourse, and that’s better? I don’t think the issue is copyright inherently so much as who holds and enforces those rights. If all copyrights were necessarily held by the people who actually made the copyrighted work, much of the problem would be gone.

  • whoisearth@lemmy.ca

    If OpenAI is right (I think they are), one of two things needs to happen.

    1. All AI should be open source and non-profit
    2. Copyright law needs to be abolished

    For number 1: good luck, for all the reasons we all know. Capitalism must continue to operate.

    For number 2: good luck, because those in power are mostly there off the backs of those before them (see Disney, Apple, Microsoft, etc.).

    Anyways, fun to watch play out.

    • SCB@lemmy.world

      There’s a third solution you’re overlooking.

      3: OpenAI (or other) wins a judgment that AI content is not inherently a violation of copyright regardless of materials it is trained upon.

      • Hedgehawk@lemmy.world

        It’s not really about whether the AI content is a violation, though, is it? It’s more about a corporation using copyrighted content without permission to make their product better.

        • SCB@lemmy.world

          If it’s not a violation of copyright then this is a non-issue. You don’t need permission to read books.

          • BURN@lemmy.world

            AI does not “read books” and it’s completely disingenuous to compare them to humans that way.

              • BURN@lemmy.world

                Backed by technical facts.

                AIs fundamentally process information differently than humans. That’s not up for debate.

                • SCB@lemmy.world

                  Yes this is an argument in my favor, you just don’t understand AI/LLMs enough to know why.

            • whoisearth@lemmy.ca

              Similarly, I don’t read “War and Peace” and then use that to go and write “Peace and War”.

    • rivermonster@lemmy.world

      It’s why AI will ultimately be the death of capitalism, or the dawn of an endless war against the capitalists (literally and physically).

      AI will ultimately replace most jobs, and capitalism can’t work without wage slaves (or antique capitalism, a.k.a. feudalism)… so yeah. We’re going to need to move towards UBI and something more utopian, or it’s just a miserable, endless, bloody, awful war against the capitalists.

  • Ook the Librarian@lemmy.world

    It’s not “impossible”. It’s expensive and will take years to produce material under an encompassing license in the quantity needed to make the model “large”. Their argument is basically “but we can have it quickly if you allow legal shortcuts.”

    • NeatNit@discuss.tchncs.de

      hijacking this comment

      OpenAI was IMHO well within its rights to use copyrighted materials when it was just doing research. They were* doing research on how far large language models can be pushed and where the ceiling is. It’s genuinely good research, and if copyrighted works are used just for research and what gets published is the findings of the experiments, that’s perfectly okay in my book - and, I think, in the law as well. In this case, the LLM is an intermediate step, and the published research papers are the “product”.

      The unacceptable turning point is when they took all the intermediate results of that research and flipped them into a product. That’s not the same, and most or all of us here can agree - this isn’t okay, and it’s probably illegal.

      * disclaimer: I’m half-remembering things I’ve heard a long time ago, so even if I phrase things definitively I might be wrong

      • dasgoat@lemmy.world

        True, with the acknowledgement that this was their plan all along and the research part was always intended to be used as a basis for a product. They just used the term ‘research’ as a workaround that allowed them to do basically whatever to copyrighted materials, fully knowing that they were building a marketable product at every step of their research

        That is how these people essentially function: they’re the tax-loophole guys who make sure Amazon pays less in taxes than you and I do. They are scammers who have no regard for ethics, and they can and will use whatever they can to reach their goal. If that involves lying about doing research when in actuality you’re doing product development, they will do that without hesitation. The fact that this product now exists means lawmakers are faced with a reality where the crimes are in the past, and all they can do is try to legislate around this thing that now exists. And they will do that poorly, because they don’t understand AI.

        And that’s just the fraud with regard to research and copyright. Recently it came out that LAION-5B, the dataset used to train Stable Diffusion image generators, contained at least 1,000 images of child sexual abuse material. We don’t know what OpenAI did to mitigate the risk of their seemingly indiscriminate web scrapers picking up harmful content.

        AI is not the future; it’s a product that essentially functions to repeat garbled junk out of things we have already created, all while placing a massive burden on society with its many, many drawbacks. There are few to no arguments FOR AI, and many, many, MANY reasons to stop and think about what these fascist billionaire ghouls are burdening society with now. Looking at you, Peter Thiel. You absolute ghoul.

        • NeatNit@discuss.tchncs.de

          True, with the acknowledgement that this was their plan all along and the research part was always intended to be used as a basis for a product. They just used the term ‘research’ as a workaround that allowed them to do basically whatever to copyrighted materials, fully knowing that they were building a marketable product at every step of their research

          I really don’t think so. I do believe OpenAI was founded with genuine good intentions. But around the time it transitioned from a non-profit to a for-profit, those good intentions were getting corrupted, culminating in the OpenAI of today.

          The company’s unique structure, with a non-profit’s board of directors controlling the company, was supposed to subdue or prevent short-term gain interests from taking precedence over long-term AI safety and other such things. I don’t know any of the details beyond that. We all know it failed, but I still believe the whole thing was set up in good faith, way back when. Their corruption was a gradual process.

          There are little to no arguments FOR AI

          Outright not true. There’s so freaking many! Here’s some examples off the top of my head:

          • Just today, my sister told me how ChatGPT (her first time using it) identified a song for her based on her vague description of it. She has been looking for this song for months with no success, even though she had pretty good key details: it was a duet, released around 2008-2012, and she even remembered a certain line from it. Other tools simply failed, and ChatGPT found it instantly. AI is just a great tool for these kinds of tasks.
          • If you have a huge amount of data to sift through, looking for something specific that isn’t presented in a consistent format - e.g. find all arguments for and against assisted dying in this database of 200,000 articles with no useful tags - then AI is the perfect springboard. It can filter a huge dataset down to just a tiny fragment, which is small enough to then be processed by humans (see the sketch after this list).
          • Using AI to identify potential problems and pitfalls in your work, which can’t realistically be caught by directly programmed QA tools. I have no particular example in mind right now, unfortunately, but this is a legitimate use case for AI.
          • Also today, I stumbled upon Rapid, a map editing tool for OpenStreetMap which uses AI to predict and suggest things to add - with the expectation that the user would make sure the suggestions are good before accepting them. I haven’t formed a full opinion about it in particular (and especially wary because it was made by Facebook), but these kinds of productivity boosters are another legitimate use case for AI. Also in this category is GitHub’s Copilot, which is its own can of worms, but if Copilot’s training data wasn’t stolen the way it was, I don’t think I’d have many problems with it. It looks like a fantastic tool (I’ve never used it myself) with very few downsides for society as a whole. Again, other than the way it was trained.
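
          Here’s the kind of filtering loop I mean - a minimal sketch using the openai Python client, where the dataset loader, model name, and prompt are all hypothetical placeholders, not a real pipeline:

          ```python
          # Minimal sketch of "filter 200,000 articles down to a human-sized
          # pile" with an LLM. load_articles, the model name, and the prompt
          # are illustrative assumptions.
          from openai import OpenAI

          client = OpenAI()

          def discusses_assisted_dying(article_text: str) -> bool:
              """Ask the model whether the article argues for or against assisted dying."""
              response = client.chat.completions.create(
                  model="gpt-4o-mini",  # assumption: any cheap chat model would do
                  messages=[{
                      "role": "user",
                      "content": (
                          "Does the following article contain arguments for or against "
                          "assisted dying? Answer only YES or NO.\n\n" + article_text[:8000]
                      ),
                  }],
              )
              return response.choices[0].message.content.strip().upper().startswith("YES")

          # candidates = [a for a in load_articles() if discusses_assisted_dying(a)]
          # Humans then review only the tiny fragment that survives the filter.
          ```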

          As for generative AI and pictures especially, I can’t as easily offer non-creepy uses for it, but I recommend you see this video which takes a very frank take on the matter: https://nebula.tv/videos/austinmcconnell-i-used-ai-in-a-video-there-was-backlash if you have access to Nebula, https://www.youtube.com/watch?v=iRSg6gjOOWA otherwise.
          Personally I’m still undecided on this sub-topic.

          Deepfakes etc. are just plain horrifying, you won’t hear me give them any wiggle room.

          Don’t get me wrong - I am not saying OpenAI isn’t today rotten at the core - it is! But that doesn’t mean ALL instances of AI that could ever be are evil.

          • dasgoat@lemmy.world

            ‘It’s just this one that is rotten to the core’

            ‘Oh and this one’

            ‘Oh this one too huh’

            ‘Oh shit the other one as well’

            Yeah, you’re not convincing me of shit. I haven’t even mentioned the goddamn digital slavery these operations are running, or how this shit is polluting our planet so someone somewhere can get some AI childporn. Fuck that shit.

            You’re afraid to look behind the curtains because you want to ride the hypetrain. Have fun while it lasts, I hope it burns every motherfucker who thought this shit was a good idea to the motherfucking ground.

  • wosat@lemmy.world

    This situation seems analogous to when air travel started to take off (pun intended) and existing legal notions of property rights had to be adjusted. IIRC, a farmer sued an airline for trespassing because they were flying over his land. The court ruled against the farmer because to do otherwise would have killed the airline industry.

  • kibiz0r@lemmy.world

    I’m dumbfounded that any Lemmy user supports OpenAI in this.

    We’re mostly refugees from Reddit, right?

    Reddit invited us to make stuff and share it with our peers, and that was great. Some posts were just links to the content’s real home: Youtube, a random Wordpress blog, a Github project, or whatever. The post text, the comments, and the replies only lived on Reddit. That wasn’t a huge problem, because that’s the part that was specific to Reddit. And besides, there were plenty of third-party apps to interact with those bits of content however you wanted to.

    But as Reddit started to dominate Google search results, it displaced results that might have linked to the “real home” of that content. And Reddit realized a tremendous opportunity: They now had a chokehold on not just user comments and text posts, but anything that people dare to promote online.

    At the same time, Reddit slowly moved from a place where something may get posted by the author of the original thing to a place where you’ll only see the post if it came from a high-karma user or bot. Mutated or distorted copies of the original instance, reformatted to cut through the noise and gain the favor of the algorithm. Re-posts of re-posts, with no reference back to the original, divorced of whatever context or commentary the original creator may have provided. No way for the audience to respond to the author in any meaningful way and start a dialogue.

    This is a miniature preview of the future brought to you by LLM vendors. A monetized portal to a dead internet. A one-way street. An incestuous ouroboros of re-posts of re-posts. Automated remixes of automated remixes.

    There are genuine problems with copyright law. Don’t get me wrong. Perhaps the most glaring problem is the fact that many prominent creators don’t even own the copyright to the stuff they make. It was invented to protect creators, but in practice this “protection” gets assigned to a publisher immediately after the protected work comes into being.

    And then that copyright – the very same thing that was intended to protect creators – is used as a weapon against the creator and against their audience. Publishers insert a copyright chokepoint in-between the two, and they squeeze as hard as they desire, wringing it of every drop of profit, keeping creators and audiences far away from each other. Creators can’t speak out of turn. Fans can’t remix their favorite content and share it back to the community.

    This is a dysfunctional system. Audiences are denied the ability to access information or participate in culture if they can’t pay for admission. Creators are underpaid, and their creative ambitions are redirected to what’s popular. We end up with an auto-tuned culture – insular, uncritical, and predictable. Creativity reduced to a product.

    But.

    If the problem is that copyright law has severed the connection between creator and audience in order to set up a toll booth along the way, then we won’t solve it by giving OpenAI a free pass to do the exact same thing at massive scale.

    • flamingarms@feddit.uk

      And yet, I believe LLMs are a natural evolutionary product of NLP and a powerful tool that is a necessary step forward for humanity. They are already capable of scaffolding out basic tasks exceptionally quickly. In them, I see the assumptions that all human knowledge is for all humans, that rudimentary tasks are worth automating, and that a truly creative idea is often seeded by information that already exists - and thus creativity can be sparked by something that has access to all information.

      I am not sure what we are defending by not developing them. Is it a capitalism issue of defending people’s money so they can survive? Then that’s a capitalism problem. Is it that we don’t want to be plagiarized outright by AI? That’s certainly something companies need to keep taking into account. But researchers repeat research and come to the same conclusions all the time, so we’re clearly comfortable with sharing ideas. Even in the Writers Guild strike in the States, both sides agreed that AI is helpful in script-writing; they just didn’t want production companies to use it as leverage to pay writers less or not give them credit for their part in the production.

      • EldritchFeminity@lemmy.blahaj.zone

        The big issue is, as you said, a capitalism problem, as people need money from their work in order to eat. But, it goes deeper than that and that doesn’t change the fact that something needs to be done to protect the people creating the stuff that goes into the learning models. Ultimately, it comes down to the fact that datasets aren’t ethically sourced and that people want to use AI to replace the same people whose work they used to create said AI, but it also has a root in how society devalues the work of creativity. People feel entitled to the work of artists. For decades, people have believed that artists shouldn’t be fairly compensated for their work, and the recent AI issue is just another stone in the pile. If you want to see how disgusting it is, look up stuff like “paid in exposure” and the other kinds of things people tell artists they should accept as payment instead of money.

        In my mind, there are two major groups when it comes to AI: those whose work would benefit from the increased efficiency AI would bring, and those who want the reward for work without actually doing the work or paying somebody with the skills and knowledge to do it. MidJourney is in the middle of a lawsuit right now, and the developers were caught talking about how you “just need to launder it through a fine-tuned Codex”, with the “it” here being artists’ work (link). The vast majority of the time, these are the kinds of people I see defending AI; they aren’t people sharing and collaborating to make things better - they’re people who feel entitled to benefit from others’ work without doing anything themselves. Making art is about the process and developing yourself as a person as much as it is about the end result, but these people don’t want all that. They just want to push a button and get a pretty picture or a story or whatever, and then feel smug and superior about how great an artist they are.

        All that needs to be done is to require the company that creates the AI to pay a licensing fee for copyrighted material, while allowing copyright-free material, and content whose creators have given express permission (opt-in), to be used freely. The businesses with huge libraries of copyright-free music that you pay a subscription fee to use work like this. They pay musicians to create songs for them; they don’t go around downloading songs and then cut them up to create synthesizers that they sell.

    • Milk_Sheikh@lemm.ee

      Mutated or distorted copies of the original instance, reformatted to cut through the noise and gain the favor of the algorithm. Re-posts of re-posts, with no reference back to the original, divorced of whatever context or commentary the original creator may have provided… This is a miniature preview of the future brought to you by LLM vendors. A monetized portal to a dead internet. A one-way street. An incestuous ouroboros of re-posts of re-posts. Automated remixes of automated remixes.

      The internet is genuinely already trending this way just from LLMs writing things like articles and bot reviews, listicle and ‘review’ websites that laser-focus on SEO hits, and social media comments and posts that propagandize or astroturf…

      We are going to live and die by how the Captcha-AI arms race is run against the malicious actors, but that won’t help when governments or capital give themselves root access.

  • S410@lemmy.ml

    They’re not wrong, though?

    Almost all information that currently exists has been created in the last century or so. Only a fraction of all that information is available to be legally acquired for use and only a fraction of that already small fraction has been explicitly licensed using permissive licenses.

    Things that we don’t even think about as “protected works” are in fact just that. It doesn’t matter what it is: napkin doodles, writings on bathroom stall walls, letters written to friends and family. All of those things are protected, unless stated otherwise. And, I don’t know about you, but I’ve never seen a license notice attached to a napkin doodle.

    Now, imagine trying to raise a child while avoiding every piece of information like that; information that you aren’t licensed to use. You wouldn’t end up with a person well suited to exist in the world. They’d lack education in science and technology, they’d lack understanding of pop culture, they’d know no brand names, etc.

    Machine learning models are similar. You can train them that way, sure, but they’d be basically useless for real-world applications.

    • AntY@lemmy.world

      The main difference between the two in your analogy, that has great bearing on this particular problem, is that the machine learning model is a product that is to be monetized.

      • BURN@lemmy.world

        Also, an “AI” is not human and should not be regulated as such.

            • afraid_of_zombies@lemmy.world

              I don’t think it is. We keep awarding all this non-human stuff more rights than we have. You can’t put a corporation in jail, but you can put me in jail. I don’t have freedom from religion, but a corporation does.

              • BURN@lemmy.world

                Corporations are not people, and should not be treated as such.

                If a company does something illegal, the penalty should be spread to the board. It’d make them think twice about breaking the law.

                We should not be awarding human rights to non-human, non-sentient creations. LLMs and any kind of Generative AI are not human and should not in any case be treated as such.

                • afraid_of_zombies@lemmy.world

                  Corporations are not people, and should not be treated as such.

                  Understood. Please tell Disney that they no longer own Mickey Mouse.

      • S410@lemmy.ml

        Not necessarily. There’s plenty that are open source and available for free to anyone willing to provide their own computational power.
        In cases where you pay for a service, it could be argued that you aren’t paying for access to the model or its results, but for the convenience and computational power necessary to run the model.

    • Exatron@lemmy.world

      The difference here is that a child can’t absorb and suddenly use massive amounts of data.

      • S410@lemmy.ml

        The act of learning is absorbing and using massive amounts of data. Almost any child can, for example, re-create copyrighted cartoon characters in their drawings or whistle copyrighted tunes.

        If you look at, pretty much, any and all human created works, you will be able to trace elements of those works to many different sources. We, usually, call that “sources of inspiration”. Of course, in case of human created works, it’s not a big deal. Generally, it’s considered transformative and a fair use.

        • Exatron@lemmy.world

          The problem is that a human doesn’t absorb exact copies of what it learns from, and fair use doesn’t include taking entire works, shoving them in a box, and shaking it until something you want comes out.

          • S410@lemmy.ml

            Except for all the cases when humans do exactly that.

            A lot of learning is, really, little more than memorization: spelling of words, mathematical formulas, physical constants, etc. But, of course, those are pretty small, so they don’t count?

            Then there’s things like sayings, which are entire phrases that only really work if they’re repeated verbatim. You sure can deliver the same idea using different words, but it’s not the same saying at that point.

            To make a cover of a song, for example, you have to memorize the lyrics and melody of the original, exactly, to be able to re-create it. If you want to make that cover in the style of some other artist, you, obviously, have to learn their style: that is, analyze and memorize what makes that style unique. (e.g. C418 - Haggstrom, but it’s composed by John Williams)

            Sometimes the artists don’t even realize they’re doing exactly that, so we end up with “subconscious plagiarism” cases, e.g. Bright Tunes Music v. Harrisongs Music.

            Some people, like Stephen Wiltshire, are very good at memorizing and replicating certain things; way better than you, I, or even current machine learning systems. And for that they’re praised.

            • Exatron@lemmy.world

              Except they literally don’t. Human memory doesn’t retain an exact copy of things. Very good isn’t the same as exactly. And human beings can’t grab everything they see and instantly use it.

              • S410@lemmy.ml

                Machine learning doesn’t retain an exact copy either. How on earth do you think a model trained on terabytes of data can be only a few gigabytes in size, yet contain “exact copies” of everything? If “AI” could function as a compression algorithm, it’d definitely be used as one. But it can’t, so it isn’t.
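
                Back-of-the-envelope, with made-up but plausible numbers (these sizes are illustrative assumptions, not any specific model’s real figures):

                ```python
                # Compare an assumed training-set size to the size of the
                # weights that come out of it. All numbers are illustrative.
                training_set_bytes = 10 * 1024**4   # assume ~10 TB of training text
                params = 7_000_000_000              # assume a 7B-parameter model
                model_bytes = params * 2            # 16-bit weights, about 14 GB

                print(training_set_bytes / model_bytes)  # roughly 785
                # The weights are hundreds of times smaller than the training
                # data, so storing verbatim copies of everything is impossible.
                ```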

                Machine learning can definitely re-create certain things really closely, but to do it well, it generally requires a lot of repeats in the training set. Which, granted, is a big problem that exists right now, and which people are trying to solve. But even right now, if you want an “exact” re-creation of something, cherry picking is almost always necessary, since (unsurprisingly) ML systems have a tendency to create things that have not been seen before.

                Here’s an image from an article claiming that machine learning image generators plagiarize things.

                However, if you take a second to look at the image, you’ll see that the prompters literally ask for screencaps of specific movies with specific actors, etc. and even then the resulting images aren’t one-to-one copies. It doesn’t take long to spot differences, like different lighting, slightly different poses, different backgrounds, etc.

                If you got ahold of a human artist specializing in photoreal drawings and asked them to re-create a specific part of a movie they’ve seen a couple dozen or hundred times, they’d most likely produce something remarkably similar in accuracy. Very similar to what machine learning images generators are capable of at the moment.