"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?"

Star@sopuli.xyz · 10 months ago

"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?"

Lemminary@lemmy.world · edit-2 10 months ago

They’re not serving you the exact content they scraped, and that makes all the difference.

localhost443@discuss.tchncs.de · 10 months ago

Well, if you believe that, you should look at the Times lawsuit.

Word for word on hundreds/thousands of pages of stolen content, it’s damning.

Lemminary@lemmy.world · 10 months ago

Why do you assume that I haven’t? The case hasn’t been resolved and it’s not clear how The NY Times did what they claim, which may as well be manipulation. It’s a fair rebuttal by OpenAI. The Times hasn’t provided the steps they used to achieve that.

So unless that’s cleared up, it’s not damning in the slightest. Not yet, anyway. And that still doesn’t invalidate my statement above, because it’s still under very specific circumstances when that happens.

Emy@lemmy.world · 10 months ago

Also, intention is pretty important when determining the guilt of many crimes. OpenAI doesn’t intentionally spit back an author’s exact words; their intention is to summarize and create unique content.

pm_me_your_titties@lemmy.world · 10 months ago

Ah, yes. The defense of “I didn’t mean to do it.” Always a classic.

Lemminary@lemmy.world · 10 months ago

No, the real defense is “that’s not how LLMs work,” but you are all hinging on the wrong idea. If you think that an LLM is capable of doing what you claim, I’d love to hear the mechanism in detail and the steps to replicate it.

whofearsthenight@lemm.ee · 10 months ago

I mean, I’m not sure why this conversation even needs to get this far. If I write an article about the history of Disney movies, and make it very clear the way I got all of those movies was to pirate them, this conversation is over pretty quick. OpenAI and most of the LLMs aren’t doing anything different. The Times isn’t Wikipedia, most of their stuff is behind a paywall with pretty clear terms of service and nothing entitles OpenAI to that content. OpenAI’s argument is “well, we’re pirating everything so it’s okay.” The output honestly seems irrelevant to me, they never should have had the content to begin with.

Lemminary@lemmy.world · 10 months ago

That’s not the claim that they’re making. The Times is arguing that OpenAI retains the work the Times made publicly available, which OpenAI claims is fair use because it’s wholly transformative in the form of nodes, weights and biases, and that they don’t store those articles in a database for reuse. But the Times’ other argument is that OpenAI created a system that threatens their business, which is just ludicrous.

Cethin@lemmy.zip · 10 months ago

It’s great how most of us are taught that just changing the order of words is still plagiarism. LLMs frequently end up using the exact same words as other things, and people still argue it’s somehow intelligent and somehow not plagiarism.

Lemminary@lemmy.world · 10 months ago

“Changing the order of words” is what it does? That’s news to me. And do you have examples of it “using the exact same words as other things” without prompt manipulation?

asret@lemmy.zip · 10 months ago

Why does the prompting matter? If I “prompt” a band to play copyrighted music does that mean they get a free pass?

Lemminary@lemmy.world · edit-2 10 months ago

That’s not a very good analogy because the band would be reproducing an entire work of art, which an LLM does not and cannot. And by prompt manipulation I mean purposely making it seem like the LLM is doing something it wouldn’t do on its own. The operative word is seem, which is what I meant by manipulation. The prompting itself is irrelevant; how it’s done is what matters. So unless The Times releases the steps they used to get ChatGPT to output what it did, you can’t really claim that that’s what it does.

In a blog post, OpenAI said the Times “is not telling the full story.” It took particular issue with claims that its ChatGPT AI tool reproduced Times stories verbatim, arguing that the Times had manipulated prompts to include regurgitated excerpts of articles. “Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts,” OpenAI said.

stewsters@lemmy.world · edit-2 10 months ago

If you passed them a sheet of music I’d say that’s on you, it would be your responsibility to not sell recordings of them playing it.

Just like if I typed the first chapter of Harry Potter into Word, it is not Microsoft’s intent to breach copyright; it would have been my intent to make it do it. It would be my responsibility not to sell that first chapter, and they should come after me if I did, even though MS is the corporation that supplied the tools.

"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?"

Yani Bellini Saibene (@yabellini@fosstodon.org)