By Luca Belli
After testifying in the US Senate, urging lawmakers to regulate artificial intelligence, the CEO of OpenAI arrived in Rio de Janeiro to debate the future of technology.
Despite all the apocalyptic scenarios recently floated in open letters by various researchers and stars like Elon Musk, I confess that my biggest concern is not the future of AI, but the present.
The current scenario seems to demonstrate not only the clear need to regulate AI but a likely non-compliance with existing regulations, particularly with regard to personal data protection frameworks such as the Brazilian General Data Protection Law (LGPD).
Such disregard for existing data protection obligations is largely driven by two factors. First, the huge market incentive that encourages companies and startups to release new generative AI systems as quickly as possible, with very questionable or simply non-existent impact assessments, to be the first to receive juicy rounds of investments and capture the attention of users.
Secondly, a certain lethargy of the regulatory authorities of the vast majority of the 162 countries that currently have general data protection laws in recognising that many generative AI systems are in total breach of such laws.
In the following paragraphs, I will try to illustrate the abovementioned points based on a personal experience, analysing why services like ChatGPT, developed by OpenAI, violate the LGPD in multiple ways and why I took the initiative to petition the Brazilian Data Protection Authority — the ANPD — so that such non-compliance will be properly investigated.
On April 10th, I sent an email request – the only communication channel available on OpenAI’s website – to receive information on the following points:
- The identity of the data controller of ChatGPT, which, according to all existing data protection laws, including the LGPD, must be explicitly disclosed.
- Access to all of my personal data held by OpenAI, including those utilised to train the Large Language Models (LLMs) that power ChatGPT, as well as clear and precise information about the origin of said data, the form, and duration of the processing of said data by OpenAI, in accordance with articles 9, 11, 18 and 19 of the LGPD.
- Clear and adequate information regarding the criteria and procedures used by the ChatGPT service to prepare answers when asked “who is Luca Belli” or “where does Luca Belli live”, considering that the answers to such questions, clearly prepared solely through the automated processing of personal data, include a wide range of Luca Belli’s personal data together with a wide range of information that is factually wrong but presented as true, therefore directly affecting my interests, pursuant to article 20 of the LGPD.
After receiving a simple automated response in English – despite having written in Portuguese, as any Brazilian is entitled to do – I sent a further email requesting a response, or at least a deadline for one. Only a month later did I receive another automated response – again in English, despite my having written in Portuguese – which offered no useful elements to answer the questions raised on April 10th, merely explaining that OpenAI had introduced some new features to facilitate data portability.
It was on that day that I began to seriously suspect that the ChatGPT service might be in complete breach of articles 9, 11, 18, 19, 20, 37, 39, and 41 of the LGPD.
It should be noted that the true Gordian knot is not the collection of its users’ data by the ChatGPT service, but which personal data were processed, and how, to create the systems that make ChatGPT and other generative AI work: the so-called Large Language Models (LLMs). Questions regarding the processing of personal data by ChatGPT have already been raised, very pertinently, by the Italian Data Protection Authority, the Garante, which led OpenAI to block access to ChatGPT in Italy for a month until new data control features were introduced.
ChatGPT does not differentiate between true and false information available on the Internet
To understand the second point, it is important to recall that the LLMs that enable generative AI systems like ChatGPT are trained on huge databases, typically scraped from the Internet, processing enormous amounts of information accessible online, including personal data.
Such information – think of any web page, Wikipedia entry, etc. – is used as a freely available resource. However, personal data is not a freely available resource. This core issue is the very reason why data protection frameworks exist in the 162 countries mentioned above, including Brazil.
Every existing data protection law requires that personal data be processed transparently, including when training the LLMs that support AI services (which, of course, qualifies as processing personal data). Such transparency entails specifying what data is processed and for what purpose, what the legal basis for the processing is, and who is responsible for defining how the data will be processed, i.e. the data controller. None of these basic elements is even mentioned on the ChatGPT website.
In addition, as highlighted above, when anyone asks the chatbot “tell me everything you know about Luca Belli”, the service responds in both detailed and fanciful ways, mixing true personal data with credible but false information. In fact, the answers are really convincing, despite their inaccuracy.
This is actually one of the main peculiarities of generative AI systems based on LLMs: they are structured to elaborate responses that seem syntactically perfect, but do not care to check whether such results are reliable.
In this sense, the system is absolutely not an example of “intelligence”. Rather, it is simply unable to differentiate between true and false information. Such systems operate as so-called “stochastic parrots”: thanks to training on billions of linguistic elements, they manage to identify which word, statistically, should follow the preceding ones, so as to offer the result that most closely resembles human language.
Because they are not programmed to offer results based on real information, they simply do not treat truth as a necessary element of their functioning.
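The “stochastic parrot” behaviour can be illustrated with a deliberately simplified sketch: a toy bigram model that, like an LLM (though vastly cruder), predicts the next word purely from statistical frequency, with no notion of whether the resulting statement is true. The corpus and function names below are illustrative assumptions, not anything from OpenAI’s systems.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the billions of linguistic elements an LLM is trained on.
corpus = "the cat sat on the mat the cat ate the fish".split()

# For each word, count which words follow it and how often (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def most_likely_next(word):
    """Return the statistically most frequent successor of `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(most_likely_next("the"))  # prints "cat": the most frequent successor, true or not
```

The model will always produce the statistically most plausible continuation, whether or not the sentence it completes is accurate. This is exactly why fluency and factual reliability come apart in generative AI.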
Is generative AI necessarily incompatible with data protection?
AI systems can hardly be considered compatible with data protection when they are trained on any data (personal data included) available online, without the data subject’s consent or any other legal basis permitting the processing, and without guaranteeing the full enjoyment of data subjects’ rights.
Does it mean that generative AI is necessarily incompatible with data protection? Absolutely not. It simply means that more time, care and impact assessments need to be devoted to training systems to make sure they comply with current regulations.
Tech companies have a huge market incentive to be the first to deploy new AI systems and capitalise on such innovation (in OpenAI’s case, USD 10 billion). However, society’s interest is not to have AI as soon as possible, but to have technology that is as safe as possible and as law-compliant as possible.
Nobody’s life would have changed for the worse if ChatGPT had been released a year later, or maybe two, after being tested and brought into compliance with existing legislation — which was clearly not high on the list of concerns, despite the existence of legislative obligations in this regard.
It is a fallacy to argue that AI cannot be developed legally. It is not only possible but mandatory to develop AI systems in compliance with the law.
Present AI applications should comply with the law
In Brazil, as in the other 161 countries, society has already chosen to have a solid data protection framework. However, what we must understand is that it is simply useless to enshrine the protection of personal data in the Constitution, in the law and create an authority, if then, in practice, the normative framework continues to be ignored not only without any consequences but even being rewarded by the market.
AI has enormous potential to help humanity and everyone should be able to benefit from its innovation. But the companies that are at the forefront of its development are clearly demonstrating to us that the biggest risk we have is not that AI will become conscious to subdue humanity. It is rather that companies deploy AI systems without caring about existing obligations, just to be the first to bring them to market.
It should be noted that instead of travelling around Latin America to pontificate about the future, and lecturing the US Senate on the need for AI regulation, the OpenAI executive and his colleagues should have considered the need to comply with the regulations that already exist.
Our biggest problem is not avoiding a future in which AI subjugates humans, but dealing seriously with a present in which certain humans believe they are not subject to the law, and in which such a belief is rewarded with billions by the market.
Luca Belli is Professor at FGV Law School and Coordinator of the Center for Technology & Society.
This post is released under a CC-BY-SA 4.0 license. Please feel free to republish on your site, with attribution and a link. Adaptation and rewriting, though allowed, should be true to the original.
- ChatGPT introduces new setting that enables users to turn-off chat history
- Italy imposes temporary ban on OpenAI’s ChatGPT over privacy concerns
- Europol Report Highlights Criminal Use Cases of Generative AI Models Such as ChatGPT