Under India’s Digital Personal Data Protection (DPDP) Bill, 2023, AI services like OpenAI’s ChatGPT and Google Bard might be able to scrape publicly available personal data from the internet to train their models without obtaining consent or adhering to any other provisions of the Bill.
Clause 3(c)(ii) of the Bill states the Act shall not apply to personal data that is made or caused to be made publicly available by the user to whom such personal data relates.
As an illustration, the Bill states that if an individual, while blogging her views, has made her personal data publicly available on social media, then the processing of that data will not come under the purview of the data protection law.
Currently, AI companies do not need user consent to scrape the personal data of Indian citizens because no data protection law is in place to require it. This status quo might continue because of the exemption for publicly available personal data.
AI companies largely rely on publicly available data to train their models. Google, for example, recently pitched to web publishers an equivalent of robots.txt that would govern the use of their content to train AI models. In another example, Twitter limited the number of tweets users can see in a day to prevent AI bots from scraping data on its platform.
Countries like Italy, Canada, Japan and the US are investigating OpenAI’s ChatGPT for scraping data from the internet because questions have been raised about how this might infringe on privacy and copyright. The US FTC, for example, asked OpenAI to explain all the types of personal data the company collects, how the data is used, where it comes from, who has access to it, and how the company prevents personal information from becoming a part of its training data.
Even if companies were required to obtain consent, doing so might not be an uphill task for companies like Google, whereas others like OpenAI would face a challenge. For instance, Google updated its privacy policy earlier this month to state that the company “may collect information that’s publicly available online or from other public sources to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.” Any user who wants to use Gmail, Google Search, Google Maps, etc. has to consent to this privacy policy. Since a vast majority of people use these Google services, they effectively give Google consent to train AI models with the data they provide. The same logic would apply to other Big Tech companies like Microsoft and Meta. OpenAI, on the other hand, might have a tougher time getting consent because, unlike Google, it does not offer services such as email or search through which it can slip a clause into a privacy policy. It would instead have to devise another way to get consent from users.
A previous version of the data protection bill, the Digital Personal Data Protection Bill, 2022, also allowed the “processing of publicly available personal data” without consent from users if the processing was done in the “public interest.” One of the grounds that qualified as “public interest” was “preventing the dissemination of false statements of fact,” which AI companies could have used to justify the scraping of data by arguing that training their models on this data helps them provide more accurate information to users.
Update (August 3, 6:55 pm): Rewrote the post after re-examining one of the clauses in the Bill that allows publicly available personal data to stay out of the ambit of the Bill.
