Hacker Timesnew | past | comments | ask | show | jobs | submitlogin

I have many GPL projects (e.g. https://github.com/rochus-keller/Oberon, https://github.com/rochus-keller/Luon, https://github.com/rochus-keller/Micron) and spend a significant amount of time in them. GPL has always explicitly permitted commercial use; that's a feature, not a bug, dating back to Stallman's original vision. Any person or company can use my code (or Kefir code) under the terms of the GPL, as I use code given away by companies under GPL or even more liberal licences for free. That's the deal. GPL is a license explicitly designed to maximize use, so it doesn't make sense to object to a specific form of use. The claim that AI companies are somehow violating GPL by training on GPL code is legally baseless (I studied law here in Switzerland and had lectures about international IP law); also the FSF itself has not claimed otherwise; even if it were prohibited, it would be a copyright enforcement problem, and not a reason to stop publishing. I don't know Kefir, but it looks like a great (even optimizing) compiler. So it's really a pitty that its development is no longer open source.
 help



The GPL, unlike the BSD and such, intends to prevent the closing of distributed derivative works. LLMs trained on GPL code can produce derivative works without any enforcement mechanism.

You may be fine with that, but the GPL is not a public domain license, and LLM training treats all things as if they were public domain.


> LLMs trained on GPL code can produce derivative works

This confuses two completely separate things. GPL governs distribution of derivative works. An LLM trained on GPL code does not distribute that code. The model weights are not a copy, a derivative, or a distribution of the training data in any legally recognizable sense; "influenced by" is not "derived from". The enforcement argument is a non sequitur; the GPL has never had a technical enforcement mechanism; it's always been legally enforced after the fact by copyright holders who discover violations. So if the LLM would indeed produce output sufficiently similar to my code and someone would publish it in violation of GPL, I have the same legal means to enforce my rights as if the code was copied by a human.


> An LLM trained on GPL code does not distribute that code.

You can't simply make that assertion. You'll have to prove that LLMs do not actually contain encoded copies of copyrighted code and that they are incapable of reproducing such code verbatim.

There is no evidence for such a claim, and so your entire argument is completely baseless.


> You'll have to prove that LLMs do not actually contain encoded copies

In law, the presumption is that an act is lawful unless proven otherwise. The burden lies on whoever claims a violation occurred. I already went into the case of sufficiently similar reproduction in my previous response.


I mean… it's been common knowledge for a while that they do in fact contain the original data.

https://www.reddit.com/r/programming/comments/oc9qj1/copilot...

You can disagree all you want, but there's ample evidence of this.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: