- Government has carte blanche: Both the Personal Data Protection Bill, 2019, and the expert committee report on Non-Personal Data give the government wide-ranging powers to collect and analyse data without adequate safeguards
- GDPR not ‘revolutionary’: The GDPR only evolved existing privacy and data protection laws; it did not radically alter how companies collect and use data
- Inferred data needs to be dealt with specifically: Specific provisions need to be strengthened to safeguard individual liberty and community rights when it comes to inferred data
- Privacy within context of hierarchies: Need to think of privacy within contexts and in terms of hierarchies, when swathes of data are analysed by artificial intelligence and automated tools
- Balance utility and privacy: Need to balance the trade-offs between the utility of AI and protecting privacy and data
- Power structures need to be dealt with: Inherent political and economic power hierarchies between the state and citizens, and within the private sector, need to be addressed
“Privacy is such a deeply held human need. In many ways the need for privacy is deeply tied to the ability to exercise one’s liberty. That’s the fundamental reason why one desires privacy. It’s freedom,” said Rahul Panicker, chief research and innovation officer, Wadhwani Institute for Artificial Intelligence. “As data accumulates, we have more centralised data repositories and large centralised AI models that work off centralised or decentralised data. How does the concentration of power affect this balance that impinges on individual liberty?” Panicker said.
He was speaking at MediaNama’s discussion on the Impact of Data Policies on Artificial Intelligence, held on January 28, in the context of upcoming ecosystem changes and regulations such as the Personal Data Protection Bill and the Non-Personal Data framework. MediaNama hosted the discussion with support from Facebook, Microsoft and Flipkart. The Centre for Internet and Society was the community partner for these sessions. All quotes have been edited for clarity and brevity.
According to Panicker, there are trade-offs with AI, depending on the use-case, between the utility of machine-based decisions on large swathes of data, and the privacy implications of collecting data primarily for use in automated decision making. “There are areas where AI can also benefit privacy. For example, today most threat protection mechanisms rely on AI techniques to rapidly identify data leaks and other issues. There are algorithmic ways to address needs for privacy,” he said.
“There are techniques like differential privacy, federated learning, or more recently split learning, that preserve privacy while allowing AI systems to be developed on top of data. Differential privacy, in fact, offers guarantees on the trade-off between the reduction in performance and the degree to which privacy is preserved. Many of the challenges are also in institutional capacity. In the case of personal health data, there is a need to build up institutional capacity in health ministries to handle sensitive data” —Rahul Panicker, chief research and innovation officer, Wadhwani AI
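As a rough illustration of the trade-off Panicker describes, here is a minimal sketch of the Laplace mechanism, the textbook construction behind differential privacy. All data and parameter values are hypothetical; a smaller epsilon means stronger privacy but noisier answers.

```python
import math
import random

def laplace_sample(scale: float) -> float:
    # Inverse-CDF sampling from a Laplace(0, scale) distribution
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    """Differentially private count: a counting query changes by at most 1
    when one record is added or removed (sensitivity 1), so adding
    Laplace(1/epsilon) noise satisfies epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_sample(1.0 / epsilon)

# Hypothetical dataset: ages of individuals. The analyst learns an
# approximate count without any single record being decisive.
ages = [23, 37, 41, 29, 52, 61, 33, 45]
noisy_over_40 = dp_count(ages, lambda a: a > 40, epsilon=0.5)
```

The "guarantee" Panicker refers to is exactly this knob: epsilon quantifies how much the answer can reveal about any one person, and the added noise (and hence the loss of accuracy) scales as 1/epsilon.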
Issues with data and privacy laws
Power, rights and accountability: Arindrajit Basu, research manager at the Centre for Internet & Society, said that to regulate AI and safeguard the privacy of individuals and communities, the use-case needs to be regulated across the entire data chain. “Whether it’s the data, the algorithms or the application, you have to apply your mind. In the case of predictive policing, the data is important. Because if my data is being used against me and the community I belong to, then obviously it needs to be regulated at the data level and also at the algorithm level,” he said.
Basu pointed out that Section 25 of the PDP Bill is a “tragedy” for individual rights and foreign policy. “As a nation that’s trying to get adequacy status and trying to advocate for itself as a soft power in terms of our technology policy, we are allowing state intelligence agencies to basically have carte blanche access to data. That’s problematic,” he said.
“The operator or the beneficiary of an AI process or algorithm is not the same as the impacted party. In fact, there is a huge power differential between the beneficiary and the impacted party. There is very little even in terms of rights against the algorithm. I’m concerned that there aren’t actually rights for the data subjects that are applicable against the power structures, which includes delegation of control to the machine and also the political economy” — Arindrajit Basu, research manager, Centre for Internet & Society
Privacy as contextual integrity: According to Divij Joshi, researcher and Mozilla Tech Policy Fellow, the Supreme Court’s Puttaswamy judgement of 2017 recognised privacy as contextual integrity. “The idea is very simple: you give information or data within a particular context, and you expect that when that information is passed between individuals or between institutions, that context will remain secure. I think this is at the core of many of the debates we are having about how power plays out in the digital world and within digital data.”
“This idea of new protections, due process protections against automated decisions that are made on the basis of data and information, has a component of data protection. But I think the idea is fairly broad: how do we align technologies that are making consequential decisions about us? I don’t think we need a separate law for AI. Data protection is a fairly small subset of that much larger problem of how we regulate or recognise these systems” —Divij Joshi, researcher and Mozilla Tech Policy Fellow
Impact of GDPR: While India awaits Parliamentary approval for the government’s personal data law, in addition to regulations on non-personal data, the European experience with the General Data Protection Regulation (GDPR), introduced in 2016, has fallen short of expectations, said Frederike Kaltheuner, researcher and Mozilla Tech Policy Fellow. “The GDPR is an evolution, not a revolution, of the previous laws that existed. There was great expectation that it would revolutionise or even challenge the surveillance-capitalism business model of some of the world’s largest tech companies. But now, two years down the road, it is fair to say that this has not happened,” she said.
The new law increased compliance and regulatory obligations on companies and organisations that process personal data, Kaltheuner said. “In theory, at least on paper, it strengthened the rights of data subjects. It became really clear that data rights are a really powerful tool. However, while the GDPR says that consent for sharing and using data should be informed, explicit and unambiguous, that is actually quite a high bar that is hardly ever met,” she said.
“We have lots of public authorities in Europe who are too afraid to use digital tools because of GDPR. And at the same time, there is a massive enforcement gap. I believe that data protection is a fundamental right; it is absolutely crucial in a datafied society. But it’s not the only legal instrument we need. It might not be the most effective one, because in a world that’s made of data, data protection authorities are tasked with regulating everything, and that exceeds their capacity and technical expertise”—Frederike Kaltheuner, researcher and Mozilla Tech Policy Fellow
Inferred Data and the power of AI
Inferred data as personal data: Governing inferred data is “incredibly difficult”, felt Kaltheuner. “I think there’s no rational argument why inferred data should not be treated as personal data. Not all inferred data is personal data, but there’s a subset of inferred data that clearly falls under the definition. The risks associated with inferred data are as high as, if not higher than, those of the data that is collected about me, because part of those rights is the right to know what inferred data about me exists in the first place and the right to get a copy of this data,” she said.
“There are many workarounds a company can use. ‘It’s not an inference about you. We’re just putting you in a group with other people.’ That is one sort of legal workaround. From a principled perspective, inferred data can absolutely be personal data. Whether you have inferred my health status or whether you have automatically collected data about my health, the risk of inaccuracies is a lot higher. I think inaccurate data is as dangerous as accurate data,” said Kaltheuner.
“AI is incredible at recognising patterns. We’re dealing with a spectrum that ranges from non-controversial uses of AI to dangerous, inherently racist pseudoscience. I think what is most useful is pushing companies and government applications to be on the more responsible end of this, while also talking about the broader power dynamics that are at stake”—Frederike Kaltheuner, researcher and Mozilla Tech Policy Fellow
Impact of AI and inferred data: Joshi said there are two important questions to ask: what is the implication of data-driven decisions on the scope of personal autonomy, and how does that affect or limit a person’s privacy? And what impact do inferences have on group privacy? “When it comes to groups, a lot of these models are essentially deriving statistics from group behaviour; they are finding patterns among large sets of people, deriving attributes of those groups, and then applying them to different groups,” he said.
“Our constitutional literature, as well as a lot of theoretical literature around privacy, states that privacy is the ability to protect personal autonomy and protect your choices. A lot of automated technologies, both in the material world and online, keep modifying your behavior according to the information that they’re collecting about you. This is information that is necessarily computable. The implication of that is that you need to start modifying your behaviour in order to be able to be recognized by these smart or automated technologies”— Divij Joshi, researcher and Mozilla Tech Policy Fellow
Mosaic theory and public data: The draft PDP Bill says that the state can access vast amounts of non-personal data, without considering the nuances of what it really means to conduct evidence-based policymaking, said Basu. “If my personal data, or even data inferred from my personal data, is being utilised against me and my community in the name of evidence-based policymaking, and that data has been used to discriminate against me, that is obviously a huge problem. When multiple facts are put together — and that’s why it’s called the mosaic theory — it is considered a privacy violation. If there are enough inferences that are made about you from data in the public domain and there are enough data points, it is no longer non-personal data,” he said.
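The mosaic effect Basu describes can be sketched concretely: an "anonymised" dataset with names stripped can be joined to a public dataset on a handful of quasi-identifiers, turning it back into personal data. The records, names and field choices below are entirely hypothetical.

```python
# Hypothetical "anonymised" records: names removed, but quasi-identifiers remain.
anonymised_health = [
    {"pincode": "560001", "birth_year": 1985, "gender": "F", "diagnosis": "diabetes"},
    {"pincode": "110003", "birth_year": 1972, "gender": "M", "diagnosis": "asthma"},
]

# Hypothetical public dataset (e.g. a published roll) containing names.
public_roll = [
    {"name": "A. Sharma", "pincode": "560001", "birth_year": 1985, "gender": "F"},
    {"name": "R. Gupta", "pincode": "110003", "birth_year": 1972, "gender": "M"},
]

def reidentify(anon_rows, public_rows, keys=("pincode", "birth_year", "gender")):
    """Join two datasets on quasi-identifiers. When a combination of values
    is unique in the public dataset, the 'anonymous' record is re-identified."""
    matches = []
    for a in anon_rows:
        candidates = [p for p in public_rows if all(p[k] == a[k] for k in keys)]
        if len(candidates) == 1:  # unique match: the mosaic completes
            matches.append({**a, "name": candidates[0]["name"]})
    return matches
```

Each individual field is arguably non-personal on its own; it is the combination across datasets that re-identifies, which is the core of the mosaic theory argument.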
“The Indian Constitution calls for the protection of civil liberties and also calls for the state to use the constitution as a vehicle of structural adjustment and breaking down of power asymmetries. The constitution already exists as the model that’s supposed to regulate issues like predictive policing. The sad part is that we haven’t actually used it or used it well enough” — Arindrajit Basu, research manager, Centre for Internet & Society
Personally Identifiable Information: Both Joshi and Basu said that since the draft PDP Bill and the NPD expert committee report recognise personally identifiable information (PII) as a specific category of data, there is some safeguard. “One thing that the PDP Bill does better than the GDPR is that it recognises inferred data as PII. We don’t have a right to explanation like the GDPR does, but I guess this is one better thing than the GDPR,” Basu said.
AI-lending and privacy of borrowers
AI can address bias in lending: According to Abhishek Agarwal, co-founder and chief executive officer, CreditVidya, many companies that claim to use AI don’t actually use it, and instead run linear regressions to produce credit scores. “From a credit perspective, we are far behind in terms of what AI can possibly achieve. If and when we get there, some of the concerns surrounding biases in the algorithm will have to be looked into. The definition of consistency and bias should be addressed in the input that goes into the machine, because what you feed into the machine defines the bias,” he said.
“There are more human biases involved in the Indian lending ecosystem today than machine biases. If you go to a bank, the branch manager will look at you and make a more biased judgement than a machine would. In fact, the industry is moving to machine-based systems to remove the biases of a branch manager. Machines can make more consistent and more unbiased decisions, especially in the Indian context. If we were forced to only put transactional data into the machine and nothing else, that’s really straightforward, because the machine can give you a more unbiased and consistent decision on, let’s say, a bank statement than a human can”—Abhishek Agarwal, co-founder and chief executive officer, CreditVidya
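For context on the "linear regressions" Agarwal contrasts with AI: a regression-based score is just a line fitted to historical data. The sketch below shows ordinary least squares on a single transactional feature; the feature, target and all numbers are hypothetical, purely to illustrate the technique.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b (closed form, one feature)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical feature: average monthly inflow (thousands of rupees)
# Hypothetical target: repayment score observed historically
inflow = [10, 25, 40, 55, 70]
score = [420, 510, 580, 660, 730]
a, b = fit_line(inflow, score)

# Score a new applicant with a monthly inflow of 35k
predicted = a * 35 + b
```

The bias point follows directly: the fitted line reproduces whatever patterns, fair or unfair, are baked into the historical `score` values it was trained on, which is why Agarwal argues the inputs are where bias must be addressed.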
Lenders analyse only consented data: Meghna Suryakumar, founder and chief executive officer of Crediwatch, said that a fundamental tenet of the PDP Bill is that data can only be collected for a legitimate purpose. “So as fintechs, we are collecting data for a specific purpose, like giving loans. When it comes to an individual, you absolutely need to respect their right to privacy. But if you are collecting data which is not in the public domain, I would not call it personal data but data not in the public domain,” she said.
“If the individual is willing to share data with their consent, because they want a loan, then they should be allowed to. All of the data that passes between a borrower and a lender, whether it exists today or not, has always been based on consent. We’re trying to establish data as an asset class, so that marginalised people don’t have to mortgage their property for a loan. So the data is taken for a valid purpose, and as long as this is based on consent, we can enable them to get a better loan at a lower interest rate to improve their lives or their business, which is a justified cause”— Meghna Suryakumar, founder and chief executive officer, Crediwatch
Facial and behavioural analysis: Several fintech lenders and alternative credit scoring companies are graduating to AI tools to assess their borrowers’ creditworthiness and probability of default. One such lender is EarlySalary, which is working on deploying a facial recognition tool to assess its borrowers’ behavioural patterns, the company’s head of analytics Balakrishnan Narayanan said. “We have images of existing customers who have borrowed from us, and we try to learn patterns from that data point,” he said.
“To understand the intention of the customer we look at their keyboard strokes: how quickly the customer is typing, whether it is a robot filling the form or a human sitting behind it and typing, and facial recognition. We are trying to move into understanding the behavioural aspects of the customers, rather than capturing alternative data like social media activity. There is a thin line between personal and non-personal data here, which lenders need to safeguard”—Balakrishnan Narayanan, head of analytics, EarlySalary
Questioning the science: While fintechs experiment with new AI-based tools for voice or facial recognition that can identify behavioural patterns, Vidushi Marda, digital programme officer at ARTICLE 19, questioned the assumptions these systems are built on. “The problem is that with building technologies, we are making consequential decisions about people’s lives. We decide whether people can pay a college loan or not based on extremely problematic science. And instead of thinking about the foundations on which these systems are built, we end up talking about optimisation problems. Those are important questions, but I think spending a little bit of time also thinking about what the underlying issue or assumption we are working with is really important,” she said.
“This distinction between personal and non-personal doesn’t actually make sense to me from a regulatory point of view. Everything that is not personal data ends up being non-personal data. The lines between personal and non-personal are extremely thin, but I think what is more important is that this assumption that anonymisation is suddenly a magic wand that lets us do things is actually not true at all”— Vidushi Marda, digital programme officer, ARTICLE 19