“From a customer perspective, the entire purpose of data protection will be defeated if inferred data is not regulated. Because that’s literally the biggest tool in the hands of technology companies, and that’s where most privacy violations are likely to happen,” Nikhil Narendran, partner at Trilegal, said. Narendran was speaking at our discussion on June 26 on the impact of the Personal Data Protection Bill, 2019, on cloud and telecom services, supported by Microsoft and Google.

The bill’s definition of personal data includes “inferred data”. “But I completely understand the point with respect to the fact that it brings up practical difficulties while implementing or while creating new algorithms or during the data processing for that matter,” he said.

Definition of personal data:

“personal data” means data about or relating to a natural person who is directly or indirectly identifiable, having regard to any characteristic, trait, attribute or any other feature of the identity of such natural person, whether online or offline, or any combination of such features with any other information, and shall include any inference drawn from such data for the purpose of profiling;
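For a concrete sense of what an “inference drawn … for the purpose of profiling” can look like in practice, consider this minimal, entirely hypothetical Python sketch, in which a profiling rule turns raw behavioural data into an inferred trait (the data and rule are invented for illustration):

```python
# A hypothetical illustration of an "inference drawn ... for the purpose
# of profiling": raw behavioural data in, an inferred trait out. Under the
# bill's definition, the output would itself count as personal data.
purchase_history = ["pregnancy vitamins", "unscented lotion", "cotton balls"]

# None of the raw items names a trait, but a trivial profiling rule infers one:
PROFILING_RULES = {"pregnancy vitamins": "likely expecting a child"}

inferred_traits = [PROFILING_RULES[p] for p in purchase_history if p in PROFILING_RULES]
print(inferred_traits)  # ['likely expecting a child'] -- inferred, never collected
```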

Should inferred data be part of the bill? How can it be regulated?

As Microsoft’s director of government affairs and public policy S. Chandrasekhar explained, inferred data can be generated simply “while monitoring whether a computer has been taken control of by bots. IoT devices, AI and cognitive services can also generate inferred data”. According to him, inferred data should be dropped entirely from the purview of the bill, “because while the intent might be very noble, the way it will play out might become nightmarish”.

Tarun Dua, chairman and managing director of E2E Networks, agreed, explaining that with newer artificial intelligence techniques, a customer is essentially feeding their own data into a model. “The inferred data becomes part of this retrained model forever; it cannot be deleted, forgotten, or unlearned, and becomes part of the model repository forever. Inferred data would be very, very hard to separate out,” he said. “Specific exceptions can probably be made. These exceptions could be for when you are trying to infer personalised data, and then actually verbalising it, in, say, ‘what books and movies do you like?’” However, he concluded that inferred data should not be included within the definition of personal data.
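Dua’s point about unlearning can be illustrated with a toy model. The sketch below (scikit-learn on synthetic data; it does not depict E2E Networks’ systems) shows that one user’s data diffuses into every learned weight, so “deleting” that user means retraining from scratch rather than erasing a record:

```python
# A minimal sketch of why inferences baked into a trained model are hard
# to "delete": removing one user's rows and retraining shifts every
# weight, because their influence is spread across all parameters rather
# than stored in a separable record.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))           # features from 1000 hypothetical users
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels the model learns to infer

full_model = LogisticRegression().fit(X, y)

# "Unlearning" user 0 naively means retraining from scratch without them.
retrained = LogisticRegression().fit(X[1:], y[1:])

# Every coefficient shifts: there is no single entry to erase.
print(full_model.coef_ - retrained.coef_)
```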

There was general agreement that inferred data should not be included, as it gives rise to complications in implementation. Udbhav Tiwari, public policy advisor at Mozilla, pointed out that “many of the protections that the bill gives to data can’t apply to inferred data”, “because inferred data is more often than not insights, it’s not necessarily data purely as we understand it”.

Approaches for the regulation of inferred data

“We’ve already had some conversations on anonymous data and whether anonymous data can even exist in the first place, the possible risks that can occur from reidentification, and how the bill specifically criminalizes that,” Mozilla’s Udbhav Tiwari said. However, in Tiwari’s opinion, this does not mean that inferred data should go entirely unregulated; rather, a delicate balancing act needs to be struck, he added.

Even the European Union’s report on the success of the GDPR lays out concerns about anonymised data and inferred data, and how their complete exclusion from the GDPR’s purview has led to difficulties in enforcement, he pointed out.

“We can have a conversation about how to best regulate inferred data, a conversation that AI regulation also tends to deal with, with regard to transparency and accountability,” he said. Even then, keeping inferred data within the scope of the PDP Bill “is something we should be wary of”. Instead, it’s something that the Data Protection Authority should be prepared for, Tiwari said, adding, “But I don’t think we need to see changes in the bill per se to be able to deal with that possibility”.

A nuanced approach can be taken towards inferred data, Narendran suggested. 

“The reason why this problem is really significant under the PDP bill is because of the fact that we don’t have legitimate interest processing, unlike under the GDPR. Because the GDPR currently has a contractual necessity clause and a legitimate business interest processing clause, you don’t need to necessarily go and take the consent of each and every party, because it [inferred data] is a very natural outcome of the processing.”

A significant problem can be solved by having such a clause, he said.

Even then, questions about data minimisation and other aspects with respect to inferred data will remain, he said. “The legitimate business processing is not necessarily an answer to that. But I think of course we need to look further to solve those issues as well.”

Issues with cloud service providers being mandated to share ‘non-personal data’

A contentious provision of the Personal Data Protection Bill, 2019, is that it allows the government to requisition anonymised personal data or non-personal data from companies for targeted delivery of services or for policy-making.

Section 91:

(2) The Central Government may, in consultation with the Authority, direct any data fiduciary or data processor to provide any personal data anonymised or other non-personal data to enable better targeting of delivery of services or formulation of evidence-based policies by the Central Government, in such manner as may be prescribed.

Explanation.—For the purposes of this sub-section, the expression “non-personal data” means the data other than personal data.

Cloud services have limited access to data

It’s crucial to note that even if cloud service providers get access to data, it’s in a form that does not reveal what the customer has stored. “The data controllers who take care of the data handling, processing, storage or analytics should ensure that when they pass on data to processors, they have specific contractual obligations to ensure that the processor cannot see the data, in order to guarantee privacy,” Venkatesh Krishnamoorthy, country manager for India at BSA The Software Alliance, said. BSA’s global members include Salesforce, Autodesk, and Microsoft. “So if I’m using a cloud storage facility, I have very little idea of the data that I am handling,” he added.

He went on to explain:

“As a rudimentary example, say a cloud service provider is storing four columns of data. One column is sensitive personal information like health data, another is inferred data, and say the two other columns are non-sensitive, non-critical, simple data. As a cloud service provider, I have no idea how you, the data controller, are categorizing all these classes of data. All I get is a dump. At times I provide an analytics service, which lets me access certain aspects, but this is based on the data controller as well.”

Therefore, it is out of the question for a cloud service provider to even be in a position to infer what part of the data is non-personal data, Krishnamoorthy said. “On top of it, if the government does ask for it, it will be impossible, or more importantly illegal, to give it.” There have been some recommendations for filtering techniques through which this kind of data could easily be identified. “That’s also definitely illegal if cloud service providers are doing it without the knowledge of the controller or the data subject.”
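To make the visibility point concrete, here is a minimal, hypothetical sketch (in Python, using the `cryptography` library; it does not depict any particular provider’s architecture) of client-side encryption, one common arrangement under which a processor stores only opaque blobs:

```python
# A hypothetical sketch of client-side encryption: the data controller
# encrypts records before upload, so the cloud processor stores only
# opaque ciphertext and cannot tell which fields are health data,
# inferred data, or non-personal data.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # held by the controller, never shared with the processor
controller = Fernet(key)

record = b"patient_id=42,diagnosis=...,inferred_risk=high"
blob = controller.encrypt(record)  # this ciphertext is all the cloud provider receives

# From the processor's side, categorising the blob as personal,
# sensitive, or non-personal data is impossible without the key.
print(blob[:32], b"...")

# Only the controller can recover (or classify) the plaintext.
assert controller.decrypt(blob) == record
```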

Shweta Rajpal Kohli, Salesforce’s country director for government affairs and public policy (India & South Asia), said data processors such as Salesforce have no control, and do not intend to have any control, over the data provided by their customers, the data controllers. “Putting an obligation on data processors to first understand the categorization of the data, and then to provide the data to the government, is almost impossible to comply with. And I am sure that it will be challenged.”

I hope that the [Joint Parliamentary] Committee looking into the bill is going to understand that this is something which is impossible for us to go through. The government has formed an entire mechanism, with the Gopalakrishnan Committee [for Non-Personal Data], to specifically look into the issue of non-personal data. Now to mix issues and include non-personal data as part of the bill is to complicate issues further.

We would urge the government to keep non-personal data aside, and try and understand that it’s a different animal altogether. The government’s intent may sound as if all they are trying to get is anonymized data for community benefit purposes, and that is understandable. But the purpose and manner in which they [the government] are going about it is very different, and I don’t think it’s achieving the purpose; rather, it’s complicating issues a lot more.

Shweta Rajpal Kohli, Salesforce

How the government can deploy anonymisation, and the need to hold it accountable

More than a year after the Road Transport Ministry enacted the Bulk Data Sharing policy, it recently decided to scrap it altogether, citing privacy concerns. The policy, released in March 2019, allowed the Ministry to sell the Vahan (vehicle registration) and Sarthi (driving licence) databases to companies and educational institutions. What implications would a bulk data sharing policy, such as the Road Transport Ministry’s, have under the data protection framework? Can the government use anonymisation provisions to operate databases such as the Vahan database, and then create analytics around it?

  • “The bill gives the government vast amounts of power, and pretty much nothing applies to the government. The government can literally create any number of databases, and the bill gives them some sort of legislative sanction to do so. To answer your questions, I think we’ll see more of these Vahan database incidents happening,” Narendran said. “With respect to anonymisation, I’m sure they’re going to rely on anonymisation provisions to strengthen the legitimacy around the kind of databases that they’re going to create, but I’m not sure that’s going to be the final solution in terms of transparency,” because the issue ultimately is that there is no way to hold the government accountable.
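Narendran’s doubts about leaning on anonymisation echo a well-known technical point: removing names does not stop re-identification when quasi-identifiers remain. A minimal Python sketch of a k-anonymity check, on entirely made-up registry-style rows, illustrates the risk:

```python
# A hedged sketch of a k-anonymity check on hypothetical registry-style
# rows: even with names dropped, quasi-identifiers (pincode, vehicle
# type, registration year) can single people out, which is why releasing
# "anonymised" databases still carries re-identification risk.
from collections import Counter

records = [  # invented rows; direct identifiers already removed
    ("560001", "hatchback", 2018),
    ("560001", "hatchback", 2018),
    ("560034", "sedan",     2019),  # a unique combination: k = 1
    ("110001", "suv",       2020),
    ("110001", "suv",       2020),
]

counts = Counter(records)
k = min(counts.values())
print(f"k-anonymity of this release: k = {k}")  # k = 1: someone is re-identifiable

unique = [r for r, c in counts.items() if c == 1]
print("rows that single out an individual:", unique)
```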

A classic example is the Aarogya Setu project, wherein the protocol looks beautiful in theory; it’s probably one of the best per global standards, Narendran said. A report he worked on found that Aarogya Setu pretty much comes up to global standards, “but the point is there is no way that you can hold the government accountable. You cannot technically challenge the government’s decisions, there is no transparency,” he added. “Unless the bill has some sort of provision which can be used to hold the government accountable, it’s not going to really help us as a country or a democracy,” he concluded.

Recommendations to the government

  1. Remove inferred data from the ambit of personal data: Inclusion of inferred data significantly broadens the definition of personal data and does not add to privacy protections for consumers. Cloud service providers cannot provide complete privacy for the data they are trying to protect. Inferences should be excluded from personal data; the GDPR’s definition does not include them. Including them has obvious ramifications and is not in the larger interest of the overall architecture being built.
  2. The DPA should be cognizant of the issues that will arise out of inferred data: Rather than going unregulated, inferred data calls for a delicate balancing act. Instead of being included in the bill, it is something the Data Protection Authority should be prepared for.
  3. Remove Section 91 from the bill: Cloud service providers don’t have visibility into the data they handle for their customers, so it would be impossible for them to comply with government demands for anonymised personal data or other non-personal data. Putting an obligation on data processors to first understand the categorisation of the data, and then provide it to the government, is almost impossible to comply with. Instead, the framework for dealing with non-personal data should be left to the Non-Personal Data committee, which was formed specifically for this purpose.

Read our complete coverage of the discussion here.