By Siddharth Sonkar and Jyotsna Jayaram
Last month, the Gopalakrishnan Committee released its revised report (Report) on the regulatory framework governing non-personal data (NPD) for public consultation. While the European Union regulates the flow of non-personal data across the borders of its member states, India may, through an unprecedented framework, become the first to regulate the flow of NPD within national borders.
The Report defines non-personal data as data that does not contain any personal data (PD). While the Report does not define PD, the Personal Data Protection (PDP) Bill defines PD to include inter alia any information which is directly or indirectly capable of identifying an individual. Significantly, the Report treats PD which has been ‘anonymised’ (i.e. irreversibly de-identified) as NPD.
Perhaps recognising the fuzzy distinction between what constitutes PD and NPD, the Report remains silent on the precise distinction between the two categories of information. Instead, the Report acknowledges that there can exist ‘mixed datasets’ or datasets which have combinations of both PD as well as NPD. Mixed datasets which are inextricably linked to PD would be governed by the PDP Bill instead of the NPD framework.
Mandatory Sharing of Non-Personal Data
The Report seeks to enable the sharing of NPD for defined purposes with appropriate safeguards. Further, private entities may be required to share NPD if another entity (state or private) requests it for a ‘public purpose’, in order to encourage greater use of data in decision-making.
Outside the scope of public purpose, however, the Report does not envisage mandatory data-sharing. To further ensure that the construct of public purpose is flexible and adaptive, the Report understands the concept broadly to include the creation of high value datasets (HVDs) for public good. The Report hopes that the Indian Government identifies HVD domains such as healthcare, geospatial or transportation data. An example of an HVD platform would be the India Urban Data Exchange (IUDX), an open source platform enabling researchers to share smart city data with each other to unlock the value of this data.
Losing the Incentive to Anonymise
While the obligation to share is intended to benefit the community at large, it could have unintended consequences. The obligation to share datasets for a public purpose under certain circumstances may discourage businesses from wanting the NPD framework to apply to them.
As a result, businesses may cherry-pick whether or not to anonymise PD based on their willingness to expose their datasets to mandatory sharing. Further, their incentive to anonymise may diminish, since businesses would lose exclusive access to their datasets, and with it the incentive to invest significantly and innovate to create high value datasets in the first place. The absence of such an incentive could encourage regulatory arbitrage, i.e. endeavours to escape the obligations associated with the NPD framework.
Businesses could adopt either of two practices: (1) they could actively choose not to anonymise PD, so that the information remains within the scope of the PDP Bill; (2) based on (1), they could claim that their datasets (often as a result of deciding not to anonymise PD) constitute ‘mixed datasets’ inextricably linked to PD. The Report does not define what constitutes a mixed dataset which is inextricably linked to PD. Even the regulatory framework in the EU, from which the term is borrowed, does not define the meaning of being ‘inextricably linked’ to PD.
While the European Commission’s Practical Guidance acknowledges the absence of a definition of ‘inextricably linked’, it suggests that for practical purposes, the phrase could refer to a situation where a “dataset contains personal data as well as non-personal data and separating the two would either be impossible or considered by the controller to be economically inefficient or not technically feasible”. These terms, however, are highly open to subjective interpretation, which could lead to a lack of clarity in regulation.
The absence of a clear demarcation of PD from NPD, coupled with the undefined construct of an inextricably linked mixed dataset, undermines the larger purpose of the NPD framework, which is to encourage access to information through regulation. Instead, data subjects may begin to enjoy a reduced degree of informational privacy, since relatively more PD would consciously not be anonymised by businesses.
Anonymisation, however, effectively safeguards privacy even if some degree of re-identification risk remains. Scholars have pointed out that anonymisation is a significant tool in encouraging businesses to operate in a privacy-compatible manner. The obligation to mandatorily share NPD, therefore, could cause detriment not just to businesses but also to data subjects, who may begin to experience a greater risk of re-identification and an increased potential for associated harms.
Is Self-Regulation the Ideal Way Forward?
In the absence of a clear mandate to share, however, businesses may not have the requisite incentive to share valuable datasets they have created through their own efforts. Incentives to voluntarily share NPD could, however, be introduced by other means, such as waivers of specific conditions under state licences or registrations, tax deductions, etc. This could strike a balance between the need to encourage sharing of HVDs and the need to retain the incentive to anonymise, so that efforts to encourage sharing of NPD do not have the unintended consequences of encouraging regulatory arbitrage and hurting privacy.
In the alternative, the Committee could also consider exclusivity over NPD for a limited period of time (as in the case of intellectual property) before the mandate to share applies. In other words, before finalising the obligation to mandatorily share NPD, we must also consider our alternatives.
Since the harms arising from the risk of re-identification associated with insufficiently anonymised PD may outweigh the benefits of mandatory sharing, it is hoped that this regulatory endeavour does not inadvertently encourage regulatory arbitrage.
Jyotsna Jayaram is a Counsel in the Technology, Media and Telecommunications (TMT) team at Trilegal. Siddharth Sonkar is an Associate in the TMT team at Trilegal.