A number of speakers at MediaNama’s discussion on the Governance of Non-Personal Data, held on August 6, pointed out that the Non-Personal Data Report does not really establish why there is an attempt to regulate non-personal data, and what benefits would arise from it. The report only “assumes” that “data has this intrinsic value” that can be of use to Indians, to the government, to the private sector, and to individuals. However, that intrinsic value is never explained in the report, a number of speakers remarked.

The discussion was held with support from Centre for Communication Governance (NLU Delhi), Facebook and FTI Consulting. The discussion on data trusts was held under the Chatham House Rule. All quotes have been edited for clarity, brevity and anonymity.

Report does not justify the need for a separate non-personal data regulation

A speaker explained that the leap from “data as an economic good” to the proposed NPD framework misses a few steps, since alternative methods exist for most of the “problems” the report attempts to solve. For instance, to incentivise innovation and to create economic value from data, frameworks already exist within competition law and the intellectual property rights regime, they said.

The Personal Data Protection Bill is already looking at some of the harms that can potentially arise from the use of community data. “Concerns around collective community harms could be included within the PDP Bill itself,” they proposed.

The technical architecture proposed in the report may also “collapse on itself” as most data sets today are mixed data sets, that is, an inextricable mix of personal and non-personal data, the speaker continued. “Even if the non-personal data in question is connected to traffic rights — an example that is often brought up — it will still have personal data underlying it. So, even if a person did not create the data or was not the data source, it would have personally identifiable information or related information. So, even if it doesn’t identify you as a person, it could relate to you and have consequences for you,” they said.

This over-regulation, they concluded, could lead to regulatory arbitrage wherein companies could choose to be governed as, say, a data fiduciary under the PDP Bill or a data custodian under the NPD framework, depending on which is cheaper.

Problems with the report’s methodology

A participant pointed out that to conflate regulation of non-personal data and community data with an attempt to curtail the dominance of Big Tech is “misleading” and “diverts attention” away from the issues. It also makes people who critique this idea seem like “bad people” who want Big Tech dominance to continue, they said.

Another speaker said that the report has not taken into account existing policies and frameworks that were formulated to deal with non-personal data and data sharing, such as the National Data Sharing and Access Policy, and the data.gov.in portal. The report, in fact, appears to be indicative of a larger trend at play in terms of policy, another speaker said. “Several policy documents in the last three years have talked about sharing data with the Indian government, with Indian start-ups, such as in the earlier versions of the e-commerce policy draft, Economic Survey, etc.,” they said.

  1. Assumptions have not been tested: A number of speakers voiced their concern that the assumptions in the report have not been tested at all. Instead of “unpacking the underlying assumptions”, the report has made arguments on the basis of a few use cases, a speaker argued. The report does not propose a framework to identify the actual use cases where non-personal data and its regulation would be useful, another speaker pointed out. “The report does not go into the specifics of how that’s going to happen without going into specifics of implementation or reasons for collection of data,” the latter speaker said.
  2. Lack of checks and balances: A discussant pointed out that the report does not talk about any checks and balances in the system and just assumes that the new authority would come up with the necessary regulation.
  3. No explanation of how data is a good: The report does not delve deeper into how data is an economic good. “A good could be a purely public good, something which is non-rival and non-excludable,” a speaker said. The report admits that data, by nature, is non-rivalrous, but it is excludable. The problem occurs because rival and non-rival goods are not a binary, but a spectrum, the speaker said. “If everybody uses what I am producing and freely uses it with no benefit to me, then that changes my incentives to supply that good. So, the supply of that will get depleted eventually,” they said.

Understanding how data is used in the digital economy

To address one of the fundamental problems of the report, at least two participants explained how data is actually used in a digital economy. “You can’t simply look at non-personal data as a category for regulation and ignore all the other parts of the digital economy which have gone into creating that data. You need to look at the platforms, the infrastructure and the business models that go into the very creation of this category of non-personal data,” a speaker said. If this is not done, there is the danger of co-opting the model of surveillance/informational capitalism, “which we have recognised as the source of inequities and of problems in the digital economy,” they said.

Data is not neutral: They cited this paper by Angela Xiao Wu and Harsh Taneja that talks about how “data is ultimately just an abstraction that is shaped by the context in which it is created and for what it’s created”. They explained that Uber, for instance, does not give you an accurate normative perspective of what city streets look like; it gives you a perspective of what city streets look like under the influence of Uber’s surge pricing algorithm, so that it can extract the most value out of the commuters on those streets. “Now, if you want to take all of that information and then input it into city planning for traffic management — or, as the report mentions, to make sure that senior citizens are safe on the streets — it’s not going to reflect the values of democratic governments or what people actually want. It’s going to reflect why Uber has collected this data and what it’s been using it for. That’s one problem of simply assuming that data is this neutral resource that can be mined and extracted for public use, or even shifted out of context and reused elsewhere,” they said.

They also cited this paper by Linnet Taylor and Christine Richter to show how, when IBM tried to deploy a big data system to regulate water supply and management in Bangalore, similar problems cropped up even though IBM’s Smart City system was purposefully built for smart cities. “But problems around what kind of technology is recognised, what kinds of water pipes and what kinds of houses actually have access to, say, technologies of water measurement, etc. cropped up,” they said. It is thus necessary to consider how the category of non-personal data is constituted and if it is even useful to share it.

Raw data is an oxymoron: One of the speakers explained that “data is not something that naturally exists out in the world and is available for us to take”. “At every moment it’s an abstraction that’s created by someone for a specific purpose,” they said. At no point is there a category such as raw data, they stressed. It is shaped by standard-making processes of either public or private players, they said. Another speaker pointed out that the committee’s assumption that raw, factual data just exists and is ripe for anyone’s plucking is an oversight. The report proposes that companies should not be compensated for collecting factual data.

Inferred data is based on probabilities: The speaker explained that inferred data is based on the probability of what may be true, not on what is measured or captured by standardised technologies. This then allows companies to categorise people on the basis of what else may be true about them, what other communities they may be part of, and what other interests, attributes and habits they may have, and this is what is of value to companies, the speaker explained.

Defining ownership without considering power relations is a red flag: The report has proposed different ownership patterns for the three kinds of NPD that it identifies. However, one speaker said that this approach, where an ex nihilo existence of data is assumed for the purposes of data ownership, is a problem. “We need to critically assess where this data is coming from, who is making it and for what purpose, else it will further entrench the power of these platforms, and the kind of inequitable system that we have already in place,” they said.

Report’s definition of community is too superficial

The report defines community as “any group of people that are bound by common interests and purposes, and involved in social and/or economic interactions. It could be a geographic community, a community by life, livelihood, economic interactions or other social interests and objectives, and/or an entirely virtual community”. This, according to a number of people, is not a good characterisation of a community, both in the analog world and within the digital economy.

Communities in the analog world require a modicum of self-awareness: Outside of this report, a speaker pointed out, a community is thought of as “a group of people who are self-aware that they have membership of the group or they articulate such membership”. There is an intersectionality associated with being a community, as in, people can inhabit multiple communities at once, something the report does not consider. Another speaker cited Elinor Ostrom’s work on the commons to argue that while defining a community, certain attributes are relevant, including a history of prior interaction, some sort of homogeneity, shared key attributes, shared knowledge, shared social capital, etc. Adivasi communities identified under the Forest Rights Act are one such community, they said.

Overlaps between public, private and community non-personal data: Although the report delineates three categories of NPD — public, private and community — it does not consider overlaps between them. “It’s likely that everything will end up being community data by this current definition,” a speaker said.

Digital economy creates ‘ad hoc’ communities: One of the speakers explained that unlike analog communities, data systems create “ad hoc” communities where any semblance of self-awareness is not possible. These ad hoc communities “are often based on inferences and happening inside databases that the person itself is not aware of”, they said. These groupings are not dependent on extant identities such as nationality or domicile, but would be like: “you are somebody between the ages of probably twenty-five and thirty-five and you have liked one hundred such consumer products and therefore you are a part of the community that likes one hundred similar products and is aged such and such.”

Lack of protections for particular communities: Given this lack of awareness of belonging to such a community, it becomes very difficult to enforce community rights for such members. A speaker pointed out that the report doesn’t delve into the question of what kind of communities require specific kinds of protections. “And the definition is broad enough to capture both these ad hoc communities which are created at every moment of data analysis, as well as existing communities which may require specialised protections as we have under Articles 15 and 16 of the Constitution,” they said.

Report acknowledges group privacy but doesn’t address harms associated with it: Surveillance capitalism, one speaker said, derives value from targeting an individual with certain habits, not the individual per se. An Eve is not as important as a woman who likes to buy red socks, history books, and camping gear. “It doesn’t really matter anymore that they can’t identify me as long as they have specific relevant categories of data for me, which can otherwise be anonymous, according to which they can categorise me. So, that’s precisely where the harm arises,” they said.

Consent has not been unpacked

Multiple speakers pointed out that the report does not unpack the question of consent adequately. “At the moment, most users believe that they are consenting for data to be used to actually deliver a service or a product, while the value of data does not come from that. The value of data comes from making predictions about our behaviour or actually influencing our behaviour in real time,” one of the speakers said.

Problem with definition of data trustees

The concept of data trustees is a deeply problematic one within the report. The data principal group/community will exercise its data rights through an appropriate community data trustee. Unlike personal data, where an individual can directly exercise control over her data, community data requires a trustee who would exercise such rights on behalf of the community.

The first problem is that the member may not know that they are a part of a community, a speaker pointed out. In that scenario, how is the member supposed to participate in the process of sharing their non-personal data?

The question of who should represent a community is a particularly vexed one. The report assumes that a ministry or government department would be the default data trustee. This has its own set of harms.

For instance, if an individual belongs to a group of women who buy vibrators online, is the state really the best and most sympathetic entity to represent their concerns?

The report also doesn’t acknowledge the intersectional nature of people’s identities and how sub-communities can exist within larger communities, leading to intra-community conflicts. For instance, who is the best entity to represent the concerns of women within a particular religious group, whose interests may not be as dominant as the larger interests of that community?

Overlapping communities is another issue. For instance, if you have education data of school students, who is the data trustee — the Ministry of Human Resource Development or the Ministry of Women and Child Development?

Problems with the stakeholder ecosystem

The stakeholder ecosystem that the report identifies, with several new entities such as data trusts, data trustees, data exchanges, etc., needs more elucidation. “There is a lot of overlap in the way stakeholders have been identified and we also are depending heavily on this idea of trust, on the idea of duty of care, on the idea of fiduciaries,” a speaker said.

This regime of duty of care or fiduciary duties means that there should be no conflict of interest between the person who is the fiduciary and the person who is the principal, the speaker said. Thus, “the fiduciary should not be making any money off the specific service that the principal is asking them to undertake, other than whatever service fee is possible,” they said. That is where “the entire model just breaks down” since there are no checks and balances, they said.

Relationship between community and data trustee not fleshed out

The report proposes the concept of beneficial ownership for the community, wherein the community would have economic rights over the NPD but the decisions would be taken by the data trustee. How the data trustee would consult with the community, or how it would be held accountable for decisions taken on behalf of the community, has also not been specified in the report.

Identification of community data does not mean that the community is “suddenly massively empowered”, a discussant said. Furthermore, this data trustee does not own the data trust and does not have any actual rights against the data trust except to ask for the enforcement of safeguards, another speaker pointed out. “In effect, I am not able to see, in this entire structure, what power even that data trustee who may not be representing the community has,” the latter speaker said.

Potential ways to give communities control over their data

Case: Indigenous peoples in Canada
In Canada, indigenous peoples have special status within the Canadian state owing to the painful legacy of colonialism. They often have different governance structures over which the Canadian government has no control. “They are making it very clear that community data for them means data that can be collected because they feel it’s important for them,” the speaker said. The communities have control over how the data is collected and used.

Case: Representation by consumer groups before TRAI
Another speaker gave the example of consumers as a community. The Telecom Regulatory Authority of India (TRAI) has a mechanism where “they allow these NGOs which are supposed to represent the interests of consumer groups to be enrolled with the authority, and when issues related to consumers come up, these agencies are supposed to represent their interest”. There are, of course, concerns about the incentives of the state and its agents for engaging in such work, but it could still work as a template, the speaker said.

Data exchanges run the risk of excluding the marginalised

A participant pointed out that by engaging in data trade, wherein market forces determine the price of data, chances are that only the most valuable data is what most people would be interested in. “What happens to the rest of it? What happens to the data that belongs to people or groups of people, or whatever you want to call it, NPD or PD or whatever else, that people don’t want? What happens to the effort, especially in India, where everything is being run by the state? Will those communities, will that data, then not be collected at all?” they asked.

A speaker pointed out that data’s value cannot always be priced. “Non-price aspects of data are often the most determinative factor in terms of benefits or harms,” they said. Another speaker pointed out that the report doesn’t consider the fact that “you are not selling the data, you are selling what the data represents”. “Data itself, in a big data economy, serves as a signal, like prices used to be signals for commodities,” they explained.

Read our complete coverage from the two-day discussion here.