Hacker Times

It's hard to guess who you're talking about but that is not what Google does. They scrape public web servers, respecting robots.txt directives and then they provide links back to the original site.

And if someone hosts user-submitted copyrighted information without a license, the copyright holder can submit a complaint via the DMCA and the hoster has safe harbor from liability.

OpenAI is literally scraping copyrighted works, packing them up and selling them without a license. Art, books, magazines, everything. No safe harbor from DMCA for doing that.



I’m willing to admit that training a model on a book is not exactly the same as a person reading a book and remembering what they learned. Are you willing to admit that training a model on the book is not the same thing as copying that book?

A big problem here is that copyright law was a massively problematic thing even before transformer model tech was developed. It has been distorted beyond recognition by a few companies, who have staked everything on a broad and permanent copyright rather than the limited one we started out with.

We probably need new definitions in the law, because pretending that training a model on something equals copying it isn’t based in reality. It’s an emotional appeal meant to gin up outrage.


> Are you willing to admit that training a model on the book is not the same thing as copying that book?

This type of interaction is not helpful. It's a strawman argument. Who said training a model is the same thing as copying a book? Of course it isn't. But who said it had to be?

Here's an idea... read the first five books in the series "A Song of Ice and Fire". Then sit down and write your own version of Book 6 to continue the story and sell it without a license. Guess what's going to happen? You will be sued into bankruptcy.

What OpenAI is doing is a lot more similar to that than to literally copying things. And it's still wrong and illegal.

> It has been distorted beyond recognition by a few companies, who have staked everything on a broad and permanent copyright

I agree with you, what Disney and others have done with copyright extensions is immoral and should be illegal. But it's not illegal.

> pretending that training a model on something equals copying it isn’t based in reality

No, it isn't based in reality. Which is why nobody made the claim. They're packaging up a derivative work and selling it. Don't have to look hard to see examples that this is just as infringing as outright copying.


> Which is why nobody made the claim. They're packaging up a derivative work and selling it.

Pardon me. I did conflate the overall claim that it inherently violates copyright law with a more specific claim (not made) that it "is copying." Since copyright also enumerates the "making derivatives" rights as well as the "copy rights", I acknowledge that your argument has more than the zero legs to stand on that I implied.

> They're packaging up a derivative work and selling it. Don't have to look hard to see examples that this is just as infringing as outright copying.

This is an interesting claim. It rests on the question of whether the model itself is a derivative work, or if it's a tool (or something between a tool and a trained person).

A photocopier can be used to reproduce ASOIAF and a word processor can be used to create a blatantly derivative work, but I assume we agree that that isn't the problem of Xerox or Microsoft. The derivative works produced with those technologies are the 'illegal' items, not the programs that were used to build them.

If I wrote my own GoT fanfiction, ripping off whole characters, names, and settings, and read my own stories in the privacy of my own home, would I be breaking any copyright law? I don't think I would be. I would rightly get in hot water if I tried to sell them, and would probably rightly get in hot water even if I just posted them to GitHub for free, given that I'd be distributing the derivative works.

I think using AI tools to generate derivative works could place the user (Not OpenAI, etc) in rightful legal jeopardy if they distribute or sell those works -- on the other hand, if they are simply keeping them for their own personal enjoyment I think it's not that different than if they wrote them themselves. (I also think that rightsholders are acting a little paranoid with those concerns, as though anyone would seriously choose not to buy the latest book or movie or painting only because some poor AI knock-offs exist, but I acknowledge that has little bearing on whether some action is or is not legal.)


> I think using AI tools to generate derivative works could place the user (Not OpenAI, etc) in rightful legal jeopardy if they distribute or sell those works

I'm not sure I understand this. It is OpenAI that is scraping the copyrighted works, packaging them up into a derivative work and selling access to it.

If a user never enters a prompt asking ChatGPT to create a new ASOIAF book, pieces of those previous copyrighted books are still in OpenAI's model and available for sale by OpenAI.

ChatGPT, the LLM itself, is the derivative work that OpenAI is selling access to.


We're treading deep into RMS's "The Right to Read" territory here.

I mean, let's say I am a storywriter and I have an exceptional memory when reading books, and you buy access to talk to me to get story ideas (as a human being, no API). Let's say I have also read ASOIAF. Are you telling me that anything I write that mentions winter is now the intellectual property of GRRM?

In my eyes your idea of derivative work can fuck right off. Pieces of those copyrighted works are also in my mind, but I grant no ownership, nor any privileges, to said book writers. IP holders do not get to reap all the benefit of free data in society and then hold the rest of us hostage.


Ok. I guess we just disagree then. In my view, that model doesn’t “contain” the works. It contains lists of numbers (and not in the ASCII sense that a silly rebuttal would make, I mean only the tokens and weights) and not “pieces of” the books. If I published a statistical analysis of word frequency in your books, I don’t think you’d have a slam dunk CI case against me. Even if someone could use those to generate some passages of your book. It certainly can’t generate the whole book, we can plainly see that (otherwise OpenAI has actually invented magic compression). Just as if you sold consulting services, and employed people who had read those books many times and sold your service to budding fantasy authors to help them write better, those consultants are not themselves derivative works just because they learned the material. The derivative work would be those people’s output (if it rips off that material).



