In the EU, opt-out is not a legally valid way to obtain the necessary consent. H...

booi · 2026-03-27T22:36:20 1774650980

probably by paying the fine and doing it anyway

justinclift · 2026-03-27T23:22:22 1774653742

s/fine/lawyers/

x0x0 · 2026-03-27T22:37:17 1774651037

For personal data. I don't believe you can reasonably claim code is personal data any more than a hammer is your personal data.

layer8 · 2026-03-27T22:51:06 1774651866

Every Git commit is likely to contain personal data, in the form of the author’s name and email address usually present in a commit’s metadata. Furthermore, unless GitHub is prohibiting users from submitting personal data via their ToS (which, given the above, would be impractical), the only thing that matters is whether the data in fact contains personal data or not. GitHub cannot just assume that it doesn’t. And processing that data for new purposes requires user consent.
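
To make the point about commit metadata concrete, here is a minimal sketch of pulling the identity fields out of a raw commit object. It assumes the plain-text layout printed by `git cat-file -p <commit>`; the sample commit, the name, the email address, and the helper `extract_identities` are all made up for illustration.

```python
import re

# Hypothetical raw commit object, in the layout printed by
# `git cat-file -p <commit>`. Name, email and timestamp are invented.
SAMPLE_COMMIT = """\
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author Jane Doe <jane@example.com> 1774650980 +0000
committer Jane Doe <jane@example.com> 1774650980 +0000

Fix typo in README
"""

# Header lines look like: "<role> <name> <email> <unix time> <tz offset>"
IDENT = re.compile(r"^(author|committer) (.*) <([^>]*)> (\d+) ([+-]\d{4})$")


def extract_identities(raw_commit: str) -> list[dict]:
    """Return the name and email of the author and committer header lines."""
    # Headers end at the first blank line; the commit message follows it.
    headers = raw_commit.split("\n\n", 1)[0]
    found = []
    for line in headers.splitlines():
        match = IDENT.match(line)
        if match:
            role, name, email, _timestamp, _tz = match.groups()
            found.append({"role": role, "name": name, "email": email})
    return found


if __name__ == "__main__":
    for identity in extract_identities(SAMPLE_COMMIT):
        print(identity)
```

The name and email sit in the commit headers rather than in the tracked files, so they travel with the repository history regardless of what the code itself contains.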

fph · 2026-03-27T23:10:49 1774653049

By that logic, you can't use any user input to train an LLM, because what if they decide to write their own name.

layer8 · 2026-03-27T23:15:46 1774653346

Indeed, you can’t unless you have appropriate consent. Which isn’t difficult to obtain if you have clearly defined purposes, but you have to do it.

x0x0 · 2026-03-28T00:02:19 1774656139

Since commits aren't code, that's no problem.

The idea that the entirety is PD because any piece of code could possibly contain some personal data (while 99.99% of it doesn't) is not supported by the GDPR. You could just as well say that any text field anywhere could hypothetically have someone type their name into it and is thus personal data as well.

layer8 · 2026-03-28T00:59:28 1774659568

The current change applies to all input and output from and to Copilot. This can be used to create profiles about personal preferences, for example.

Personal data is about identifying a person and relating information to that person. A name in an unrelated text field isn’t personal data if you can’t tell the relation between the name and the person who input it, or any surrounding data. The contents of a repository, however, and the interaction with Copilot, can very well help identify the account holder and their personal data. For example, I might be processing personal health data identifiable as such in a private repository with the help of Copilot.

x0x0 · 2026-03-28T20:02:59 1774728179

> This can be used to create profiles about personal preferences

And since it's not, so what?

> I might be processing personal health data identifiable as such in a private repository with the help of Copilot.

That remains nonsense. The fact that you could put PD in a place not intended to hold PD does not magically transform an entire dataset into PD because one record may contain it. This is covered in Art. 24 GDPR (risk-based) and in multiple EDPB discussions of proportionate measures. There is zero requirement to guarantee that anything collected for a different purpose is not misused by the user, assuming you're not encouraging that misuse.

johndough · 2026-03-27T22:57:59 1774652279

Code often contains personal data. Here are over 400 files on GitHub with email addresses:

https://grep.app/search?regexp=true&q=%5Ba-z%5D%7B8%2C%7D%5C...

For example, license files often contain names and many package managers require a contact person.

When this goes to court, GitHub will probably make the excuse that they somehow did not know that people upload personal data, but the fact that this happens so often that they had to build a secret scanner to stop people from uploading their private keys will prove them liars.
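
As a rough sketch of the kind of scan behind that search, the snippet below looks for email-shaped strings in a piece of text. The pattern is a deliberately simple assumption of mine, not the truncated regex from the link above, and the sample license text and the `find_emails` helper are hypothetical.

```python
import re

# Simplified, assumed email pattern; real-world addresses are messier.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(text: str) -> list[str]:
    """Return the distinct email-like strings found in text, sorted."""
    return sorted(set(EMAIL.findall(text)))


# Hypothetical file contents resembling a license header.
SAMPLE_LICENSE = """\
Copyright (c) 2026 Jane Doe <jane@example.com>
Maintainer: John Roe (john.roe@example.org)
"""

if __name__ == "__main__":
    for address in find_emails(SAMPLE_LICENSE):
        print(address)
```

A match only shows that something address-shaped is present; whether it relates to an identifiable person is a separate question.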