Large Language Lies and the Law: some thoughts on #noyb vs OpenAI on false LLM output.
Some key legal observations:
1) GDPR vs privacy law
Concerning fake news about individuals, one might claim that personality rights, guaranteed by EU Member State tort law, are the right legal venue, not the GDPR. The German Constitutional Court made remarks in this direction in its "Recht auf Vergessen I" case (on the right to be forgotten) in 2019. However, this would be wrong IMO. The GDPR fully applies to fake news concerning personality aspects of data subjects and, if anything, takes primacy over Member State tort law. For a deep dive on this, see my habilitation, open access here, pages 525 ff.: https://lnkd.in/eVGWz_Qm
2) Inspiration from personality rights
Personality rights can be an inspiration, though. In 2013, the German Federal Court (BGH) decided the famous Autocomplete case. It held that Google is responsible for libelous autocompletes, for example those suggesting that the wife of the former German president was once a prostitute. However, there is no duty to proactively filter autocompletes. Rather, companies need to set up a notice and action mechanism, correcting output once put on notice (cf. DSA).
This can be adapted to LLMs: autocomplete models are small language models. User expectations concerning the autocomplete's accuracy were decisive for the BGH. Despite disclaimers, such expectations may be said to exist with LLMs, too. Hence, at the very least, personality tort law suggests that OpenAI etc. need to implement an effective notice and action mechanism.
More discussion in our paper: https://lnkd.in/edDP2hpp
See also the excellent, long analysis by Prof Sandra Wachter et al.: https://lnkd.in/eTQswCMK
3) GDPR accuracy principle
Art. 5(1)(d) GDPR enshrines the accuracy principle. Personal data must be correct and up-to-date. However, this principle is not without limits (cf. Rec. 39). IMO, it must be weighed against countervailing fundamental rights, such as those enjoyed by the LLM providers. Hence, I would argue for a de minimis threshold. The result would be basically the same as under personality tort law. Important false information violates the principle; trivial information perhaps does not (e.g., if ChatGPT gets my birthday wrong, unless that is important in the setting).
4) Remedies
a) Fines and damages, also for immaterial harm
b) Users can potentially demand correction; difficult questions arise concerning the right to erasure with respect to the information encoded in the model
c) In serious cases, users may enjoin companies from offering the model until the problem has been fixed
This is why this case is so powerful, in my view: data protection and copyright are the areas that can "break" an LLM, as they can lead to injunctions and, potentially, model deletion.
Comments welcome, legal history in the making!