I see, so your argument is that because the training data is not stored in the model in its original form, it doesn’t count as a copy, and therefore it doesn’t constitute intellectual property theft. I had never really understood what the justification for this point of view was, so thanks for that, it’s a bit clearer now. It’s still wrong, but at least it makes some kind of sense.
If the model “has no memory of training data images”, then what effect is it that the images have on the model? Why is the training data necessary, what is its function?
the training data is not stored in the model in its original form,
It is not stored in the model, period. Same as you do not store the shape of the letters you’re reading right now, not even the words, but their overall meaning. Remembering the meaning of what I write here, you can then produce words and letters again and you might be close but even with this short paragraph you’ll find it very hard to make an exact replica. That’s because you did not store it in its original form, not even compressed, you re-encoded it using your own understanding of language, of the world, of everything.
I see, so your argument is that because the training data is not stored in the model in its original form, it doesn’t count as a copy, and therefore it doesn’t constitute intellectual property theft. I had never really understood what the justification for this point of view was, so thanks for that, it’s a bit clearer now. It’s still wrong, but at least it makes some kind of sense.
If the model “has no memory of training data images”, then what effect is it that the images have on the model? Why is the training data necessary, what is its function?
Here’s a video explaining how diffusion models work, and this article by Kit Walsh, a senior staff attorney at the EFF.
It is not stored in the model, period. Same as you do not store the shape of the letters you’re reading right now, not even the words, but their overall meaning. Remembering the meaning of what I write here, you can then produce words and letters again and you might be close but even with this short paragraph you’ll find it very hard to make an exact replica. That’s because you did not store it in its original form, not even compressed, you re-encoded it using your own understanding of language, of the world, of everything.