
Views: Can generative AI collect our data from the Internet?

Is it safe to consider all “publicly available data” as public?

Published April 28, 2023

By Sreenidhi Srinivasan and Pallavi Sondhi

ChatGPT can write sonnets, code websites, and even pass the bar exam. It learned how to do this by training on huge amounts of data. A lot of this data is personal information about individuals scraped from the Internet, often without their knowledge.

Catching on to this, Italy’s data protection regulator last month halted ChatGPT’s operations in the country over a breach of its data protection norms.

India is still finalising its data protection law. Against the backdrop of Italy’s action, we discuss how ChatGPT would fare under India’s proposed law, and whether there are lessons for us to draw from this episode.


ChatGPT under the scanner across the EU

Italy’s regulator cited several reasons for its ban on ChatGPT:

There was no legal basis to justify the massive collection of data to train ChatGPT’s algorithms.
OpenAI did not have appropriate age-gating mechanisms to ensure that children’s data was not collected to train algorithms.
The company didn’t give people adequate notice before collecting their data.
ChatGPT gave out factually incorrect information.

Italy had also earlier restricted “Replika”, an AI-powered chatbot, on similar grounds. Taking a cue from Italy, regulators in Germany, Spain, France, and Ireland are exploring action of their own.

Italy has now asked OpenAI to abide by certain norms for the ban to be lifted. OpenAI must publish information about its data processing and must clarify the legal basis for processing personal data to train its AI. It must allow users to seek correction or deletion of inaccurate data, and allow them to object to OpenAI’s use of their personal data to train its algorithms.

While Italy’s approach raises several interesting questions, we focus on one key issue – training AI models by using data that’s available freely and publicly. Think public social media profiles, news pieces, Reddit posts, and so on.

Is data from public sources ‘private’?

ChatGPT’s technical paper says its training data includes “publicly available personal information”. Under EU law, any data that can identify an individual is ‘personal information’. To collect and use such data, a business must meet privacy norms, regardless of whether the data is collected from the individual directly or is available publicly and freely.

Interestingly, under India’s current data protection law (rules under the Information Technology Act), data that is “freely available” or “accessible in public domain” is not considered sensitive data. So, to collect and use such publicly available information, you need not abide by data protection rules.

But the draft Digital Personal Data Protection Bill 2022 (India’s current draft data protection law) takes a different position, one that’s similar to the EU approach. Even if you collect data from public sources, if it relates to an identifiable individual, it is ‘personal’. And all the do’s and don’ts that attach to the collection and use of personal data apply to it (with one exception, relating to deemed consent).

How can data be collected and used to train AI models?

In the EU, even if a business is collecting or scraping personal information off the Internet, it must still justify its collection and use under one of six legal ‘bases’ set out in the GDPR. User consent is one basis. Another is fulfilling a contract. But the one often used for training AI algorithms or for improving a product is the “legitimate interests” of a business.

India’s draft law doesn’t require the data collector to identify a legal basis as such. However, to collect and use personal data, a platform must have users’ consent or deemed consent. That is, either it gets actual consent from individuals, or its collection and use of data falls within one of the ‘deemed consent’ grounds recognised in law. These include processing data to comply with a court order, to respond to a medical emergency or a public health response, or for ‘reasonable purposes’ recognised by the Indian government.

‘Deemed consent’ may help in training AI

Taking repeated consent to collect data for training AI models is cumbersome. So developers are likely to consider two “deemed consent” grounds that could be relevant here.

One, under the draft law, consent can be assumed when you are processing “publicly available personal data” in “public interest”. Say a platform scoops up a public Reddit thread, where users discuss their worst dating encounters, to train its algorithm. Does the AI developer then not need to take users’ consent separately to process this data, since it is publicly available?

Two, consent can be inferred when an individual voluntarily provides her information and it can reasonably be expected that she would do so. For example, a user signs up on Reddit. Reddit’s privacy policy says “Much of the information on the Services is public and accessible to everyone, even without an account. By using the Services, you are directing us to share this information publicly and freely.” Can the user’s catch-all consent to the privacy policy be considered consent to the sharing of their data with AI models like ChatGPT, and to OpenAI’s use of that data for training algorithms?

Interestingly, platforms like Reddit are going to start charging AI developers for accessing their content. But the question of consent or deemed consent would remain.

Using data to train AI models: a reasonable purpose?

As India seeks to establish itself as an AI powerhouse, it would be worth exploring if the use of data to train AI models should be a ‘reasonable purpose’ under India’s data protection law. This should be subject, of course, to appropriate checks and balances. For instance, similar to Italy’s guidance, individuals could be allowed the right to object to the use of their personal data for training AI models – an opt-out rather than an opt-in.

Sreenidhi Srinivasan is a Partner and Pallavi Sondhi is a Senior Associate at Ikigai Law.

This post is released under a CC-BY-SA 4.0 license. Please feel free to republish on your site, with attribution and a link. Adaptation and rewriting, though allowed, should be true to the original.
