The Committee of Experts on Non-Personal Data Governance Framework has recommended that separate legislation be formulated to govern non-personal data and that a new regulatory body be set up, while also addressing a long-standing question: what exactly is Non-Personal Data?

In its draft report, released on July 12, 2020, the committee has defined non-personal data as any data that is not related to an identified or identifiable natural person, or is personal data that has been anonymised. It has said that Non-Personal Data (NPD) should be regulated by a new regulatory body, the Non-Personal Data Authority (NPDA). This data, the committee recommends, should be further classified into three categories: public NPD, community NPD, and private NPD. The report identifies and defines new stakeholders in the non-personal data ecosystem, including data principal, data custodian, data trustee, and data trust, and outlines their obligations and the mechanisms to enable data sharing. It also sets out the circumstances under which a private organisation that collects non-personal data needs to be remunerated.

Public Consultation

The draft report of the expert committee that MEITY had formed in September 2019 to come up with a data governance framework for non-personal data is available here.

Anyone can submit comments till August 13, 2020, 11:45pm, via the MyGov platform (here).

To submit comments, users have to declare that “The feedbacks submitted here will be kept confidential, no public disclosure will be made at any stage [sic]”. At the time of publication, 43 submissions had been made on the MyGov platform, none of them disclosed.

Here is a summary of the recommendations made by the committee in the 72-page report (our comments have been italicised):

It is important to note that this is a governance framework, or a set of guiding principles, not a draft legislation or bill. It recommends that there should be a separate legislation that governs non-personal data.

Recommendation 1.1: Define Non-Personal Data

Non-personal data: When the data is not ‘personal data’ (as defined under the PDP Bill), or the data is without any personally identifiable information (PII). It could be:

  • Data that never related to an identified or identifiable natural person, such as data on weather conditions, data from sensors installed on industrial machines, data from public infrastructures, etc., or
  • Data that was initially personal data but was later made anonymous. “Data which are aggregated and to which certain data transformation techniques are applied, to the extent that individual specific events are no longer identifiable, can be qualified as anonymous data”.

Recommendation 1.2: Define 3 categories of non-personal data:

  1. Public Non-Personal Data: “Non-personal data collected or generated by the governments, or by any agency of the governments, and includes data collected or generated in the course of execution of all publicly funded works. All Non-Personal Data collected or generated by the Government where such data is explicitly afforded confidential treatment under a law, shall not constitute Public Non-Personal Data.” This includes anonymised data of land records, public health information, vehicle registration data, data on pollution levels collected by a university, etc.
  2. Community Non-Personal Data: “Non-Personal Data, including anonymised personal data, and non-personal data about inanimate and animate things or phenomena — whether natural, social or artefactual, whose source or subject pertains to a community of natural persons. Provided that such data shall not include Private Non-Personal Data.”
    • For instance, besides datasets collected by the municipal corporations and public electric utilities, datasets comprising user-information collected even by private players like telecom, e-commerce, ride-hailing companies, etc., should be considered Community Data. Here, the ‘raw / factual data’, without any processing / derived insights, may be characterised as the Community Data.
    • Community: “any group of people that are bound by common interests and purposes, and involved in social and/or economic interactions. It could be a geographic community, a community by life, livelihood, economic interactions or other social interests and objectives, and/or an entirely virtual community.”
  3. Private Non-Personal Data: “Non-Personal Data collected or produced by persons or entities other than the governments, the source or subject of which relates to assets and processes that are privately-owned by such person or entity, and includes those aspects of derived and observed data that result from private effort.” It includes:
    • inferred or derived data / insights involving application of algorithms, proprietary knowledge.
      MediaNama’s comment: This could come in conflict with the Personal Data Protection Bill, 2019, wherein inferred data is included within the definition of the personal data.
    • data in a global dataset that pertains to non-Indians and which is collected in foreign jurisdictions (other than India).
    • data generated in the case of Generative Adversarial Networks where two AI engines contest against each other and create new data instances that resemble the AI engine’s training data.

Recommendation 1.3: Assess sensitivity of non-personal data

Like personal data, even non-personal data can be sensitive if it:

  • Relates to national security or strategic interests
  • Bears risk of collective harm to a group (collective privacy, etc.)
  • Is business sensitive or confidential information
  • Is anonymised data that bears the risks of re-identification

In addition, non-personal data that is born from sensitive personal data will inherit its sensitive characteristic. Thus, anonymised and aggregated data from sensitive personal data will yield sensitive non-personal data. This is because anonymisation of personal data does not remove the possibility of harm to the data principal and “no anonymisation technique provides perfect irreversibility”.
MediaNama’s comment: It is not clear what happens to anonymised data that arises from the yet undefined critical personal data.

Recommendation 1.4: Require consent for anonymisation and usage of non-personal data

Since the requirement of consent under Section 11 of the PDP Bill does not apply to non-personal data, the committee has recommended that the “data principal should also provide consent for anonymisation and usage of this anonymised data while providing consent for collection and usage of his/her Personal Data.”

“It is clear from industry feedback to the Committee and from its own research that large collections of anonymised data can be de-anonymised, especially when using multiple Non-Personal Data sets. This risk is considered by this Committee to be a valid one. Hence the individual (data principal) needs more protection.”

“The guiding principle in this regard, should be that the Personal Data that is anonymized should continue to be treated as the Non-Personal Data of the data principal. In this manner, any subsequent harms arising from re-identification, or otherwise arising from processing, can be acted upon by the data principal.”

Recommendation 2: Defines roles and stakeholders in the non-personal data ecosystem

Data Principal

Unlike personal data, where the data principal is the natural person to whom the personal data relates, in case of non-personal data, it is determined by the category of non-personal data:

  1. Public non-personal data: When the government collects data related to citizens (like census), companies (like company registration, financial filings) and communities, the data principal is the corresponding entity (individuals, companies, communities) to whom the data relates.
  2. Private non-personal data: When the private sector collects data related to citizens, companies and communities, the data principal is the corresponding entity (individuals, companies, communities) to whom the data relates.
  3. Community non-personal data: The community which is the source and/or subject of community data is the data principal and “should be able to exercise key rights, including economic rights” over it.
    MediaNama’s comment: This is reminiscent of how the Justice B. N. Srikrishna Committee envisioned community data. 

Data custodian

Data custodian undertakes collection, storage, processing, use, etc. of data in a manner that is in the best interest of the data principal. The data custodian may be considered as data fiduciary, “subject to certain directions and control and acting as per the interest of data principal/group/community.”

  • The community’s ‘best interest’ is communicated to data custodians by data trustees on behalf of the data principal community in the form of data advice, recommended data practices requirements/guidelines, etc.
  • Data custodians have a ‘duty of care’ towards the concerned community in how they handle the NPD related to it. “This concept of ‘duty of care’ is a general set of obligations, which can in time be specified better, by regulatory guidelines, practices, rules, legislations etc.” and could include duties in form of anonymisation standards and requirements, protocols and means for safe data sharing, etc.
  • The NPD legislation, while providing community data rights, will also lay down principles and guidelines for various incentives for data custodians, respective data privileges, compensations, where needed, the nature of the required, well-regulated data markets, and so on.
    • Since symmetric data sharing obligations may not work for all data businesses, especially small Indian companies and start-ups, the legislation could have provisions like threshold size for data sharing, and graduated sharing obligations.

MediaNama’s comment: It is not clearly defined whether a government ministry, department or agency that collects and processes data can also be a data custodian, or if only private entities are data custodians. For reference, the PDP Bill, 2019, includes the state in its definition of a data fiduciary. The Data Principal Community has not been defined, and it is not clear who defines the best interest of data principals or the community.

Data Trustees

The data principal group/community will exercise its data rights through an appropriate community data trustee. “In the case of community data, unlike personal data where an individual can directly exercise control over her data, the concept of trustee for community data comes in, who would exercise such rights on the behalf of the community.”

The subsequent legislation on non-personal data will lay down the principles and guidelines for who is an appropriate trustee for group/community data. In principle, it would be the closest and most appropriate representative body for the community concerned. “The legitimate trustee for any set of community data will be the closest and most appropriate representative body for that community, which will, in many cases, be an appropriate community body or Central/ State/ Local government agency” such as the Ministry of Health and Family Welfare as a data trustee of data on diabetes among Indians or Manipur government as a data trustee of data on Meitei language.

MediaNama’s comments: From the examples given in the report, it is clear that both government entities (ministries, state governments, public universities, government departments) and non-government entities (citizen groups, NGOs) can be data trustees. The question is whether private, for-profit companies, or private organisations funded by companies and their representatives, such as iSpirt, can be data trustees. Similarly, can privately incorporated entities within the government such as BECIL be data trustees? Or privatised utility providers such as power suppliers or railways in the future? It is also not clear if data trustees need to be registered somewhere, or need to be made publicly available in a list, or if they would be commonly understood. If it is the last case, that could lead to a fair amount of ambiguity.

In addition, when there are two or more potential data trustees for a community, who determines which of these will be the data trustee who will exercise the community’s rights? In other instances, say, for non-personal data on the types of content watched on online streaming services, how will a data trustee be determined? If there are no governing organisations, private or otherwise, but there is non-personal data, then what is done?

What can a data trustee do?

  • Seek enforcement of safeguards on sharing of community non-personal data, of which it is the trustee, before the Non-Personal Data Authority (NPDA)
  • Recommend enforcement of obligations on data custodians to the NPDA. These obligations, such as transparency and reporting mechanisms, regulations of data practices will be defined by the NPD legislation and would take into account factors including nature of data, kind of data practices, context of data use, nature and sensitivity of the involved sector, nature of expected outcomes, etc.
  • Collaborate with the NPDA to seek and enforce data sharing of community data on specific data requests. For instance, the NPDA may work with the government transport department, the data trustee of transport data, on whether, how and with whom their community data related to commuting through various modes of transportation, is shared.
  • Implement decisions in the interest of the communities on their behalf in a “strict-rules based manner, with adequate checks against abuse of power by government or other representative agencies, which requires an elaborate institutional structure of this purpose”.

Data Trusts

A data trust is defined as “Institutional structures, comprising specific rules and protocols for containing and sharing a given set of data”

  • Sources of data trusts’ data are multiple sources which are relevant to a particular sector, and required to provide digital or data services; sources of data could include:
    • Data shared by data custodians “as many private organisations may come forward to share data held by them”
    • Public organisations producing and holding various public data
  • What would data trusts do? Manage and provide “important data for sector-specific purposes” that government/data trustees seek mandatory sharing of; it could include both mandatorily and voluntarily shared data
  • Who manages data trusts? Public authorities, just as public infrastructure underpins much of industrial economy, or these can be managed by new, neutral bodies, cooperatives, or industry associations, etc. “Different forms may be found fit for different kinds of data and different sharing needs.”
  • ‘Data infrastructure’ includes “corresponding technical-material elements required for data sharing, like actual databases, APIs, organisational systems, etc.”

Recommendation 3.1 and 3.2: Data Ownership

At the beginning of the section on Data Ownership, the report defined ‘data sovereignty’ when it comes to non-personal data, saying,

“The ownership of the Non-Personal Data collected about people in India and collected in India should be defined. The laws, regulations and rules of the Indian State apply to all the data collected in/from India or by Indian entities.”

The recommendations that follow:

Recommendation 3.1: For NPD derived from personal data of an individual, the data principal remains the same for NPD.

Recommendation 3.2: Different owners of different types of Non-Personal Data

  1. Public Non-Personal Data is a national resource since it is derived from public efforts.
  2. Private Non-Personal Data: While it appears that the private organisation would own such data, for the purposes of data sharing, only raw/factual data that is related to a community might be shared, “subject to well-defined grounds at no remuneration”. If data processing adds value to the raw data, remuneration may occur. “Algorithms / proprietary knowledge may not be considered for data sharing.”
    MediaNama’s comment: it remains to be seen how this exception for algorithms and proprietary knowledge would play out against the new draft e-commerce policy that seeks to give government access to algorithms to check for biases.
  3. Community Non-Personal Data collected in India is ‘beneficially owned’ by the related community but rights over it vest with the data trustee of that community. It could be collected by private data custodians or public organisations but may be “a collective or shared asset because many parties have overlapping legitimate contributions to and interests in it”. Moreover, if the community non-personal data about the group/community gives systemic intelligence about it, the “group/community should be able to determine and control how such data and intelligence is used — maximising data’s benefits for itself and eliminating or minimising harms”. Some examples:
    • Raw / factual datasets comprising anonymised user-information data collected by private data custodians (such as telecom, e-commerce, ride hailing companies, etc.), may be considered Community Data.
    • Private data custodians’ drones taking pictures of agriculture farms of local farmers, with or without standing crops, and using it to analyse soil types, health of crops etc. may be considered as community data.

The report mentions that “The principle of raw data is, [that it should be] standards compliant, machine readable and [with] fidelity as collected. The raw data will be made available in usable formats, and only an open, reviewed license-free standard can be used.”

What is required to ensure a community’s rights over the Community Non-Personal Data?

  • Legal framework that incentivises data custodians, and ensures no undue restrictions on its use by others while protecting the community’s rights: Since data is non-rivalrous and thus the same dataset may have multiple data custodians and be valued by multiple communities, it is important to incentivise data collection by data custodians, and ensure that a community’s rights over it do not unduly restrict use of data by others
  • Concept of beneficial ownership/interest be used to allocate primary economic and other statutory rights over data: This ensures that community that produces the raw/factual data also benefits from the processing of Community Non-Personal Data. When it comes to intangible assets such as knowledge and data, the term ‘ownership’ means a set of primary economic and statutory rights whereby many actors may have simultaneous overlapping rights and privileges, that may interfere with each other. Thus, beneficial ownership/interest is used to safeguard the community’s interests and ensure that benefits from non-personal data are accorded to the community.
    • “Accordingly, such data [community non-personal data] may be shared in instances where there are defined grounds or purposes for sharing of Non-Personal Data with citizens, Indian start-ups, Indian companies, Indian public and private universities, Indian public and private research labs, Indian Non-Government organisations, and the Indian Central and State Governments.”

Recommendation 4: Create a new category of business – “data business”

Who is a data business? Any commercial, government, or non-government entity that processes or manages data beyond a certain data-related threshold.

  • This is a horizontal classification and not an independent industry sector. When businesses in different sectors collect or process data beyond a threshold level, they will be categorised as a data business.

Who will determine the thresholds? NPDA which may consult with sector regulators. Thresholds will vary with time, context and need. NPDA will regulate data businesses.

What will be the data business’s obligations?

These obligations must be fulfilled whether or not the business is regulated by another sectoral regulator.

  • Submit metadata about data user and community with details such as classification, closest schema, volume, etc. The directory of data classification and scheme will be published by the NPDA, to which businesses that deal with new types of data are encouraged to make improvements and extensions. Additions to the list will go “through a peer review, academic review process as per IETF framework, guided by a Technical Advisory body created as per Open API guideline”.
  • Integrate raw data pipes with the NPDA for submission of raw data upon request. The NPDA will define the time period for this.
  • Disclose data elements collected, stored and processed, and data-based services offered and what data they collect, process, use, in which manner, and for what purpose, where data is stored, standards adopted to store and secure data, nature of data processing and data services provided (akin to pharma industry and food products).
  • Harmonise directories and disclosures required for personal and non-personal data so that businesses have to give that information only once.
  • Open access to metadata directories within India for Indian citizens and India-based organisations. This includes access to metadata directories by all such data businesses, including governments. “By looking at the meta-data, potential users may identify opportunities for combining data from multiple Data Businesses and/or governments to develop innovative solutions, products and services. Subsequently, data requests may be made for the detailed underlying data.”

“Data businesses will provide, within India, open access to metadata and regulated access to the underlying data.”

How will businesses comply? Businesses will register as a data business, but this is not a license. For businesses below the threshold, registration is voluntary. Compliance and disclosure processes will be lightweight and fully digital. Compliance process must also be easily discoverable for businesses across sectors. The company will also delegate a data officer for periodic disclosure.

What will the registration process look like? Initial registration would require a business ID (or country code and country business ID), digital platform/business name(s), associated brand names, rough data traffic and cumulative data collected in terms of number of users, records and data, and the nature of data business, and kinds of data collection, aggregation, processing, uses, selling, data-based services developed, etc.

Recommendation 5: Allow individuals and organisations controlled access to non-personal data for sovereign, public interest and economic purposes.

Access to the three categories of non-personal data may be requested for three purposes:

1. Sovereign purposes such as national security, law enforcement, legal or regulatory purposes. These purposes could include mapping physical and cyber security vulnerabilities, mapping crime and taking preventive measures, a regulator wanting to stay abreast of developments in the sector, or national security (via telecommunications metadata, geospatial or financial data, etc.)

2. Core public interest purposes such as community uses/benefits or public goods, research and innovation, for policy development, better delivery of public services, etc.

  • Identify high-value datasets at a national level through relevant government departments acting as data trustees of the data sets, such as health, geospatial, transportation data, and identify sectors that would benefit from NPD, such as agriculture, education, skills development, supports, logistics, MSMEs, etc.
  • Use non-personal data for research purposes such as an Indian genome repository. Sectoral data spaces, with sectoral clouds, which bring together private and public organisations should be created, but the report doesn’t specify who should create those data spaces. It is also not clear how an Indian genome repository would be an instance of non-personal data since mapped genomes cannot really be anonymised, and genetic data itself is classified as sensitive personal data under the PDP Bill.

3. Economic purposes to encourage competition, provide level-playing field or encourage innovation through start-ups.

  • Data requests by start-ups to data businesses: When start-ups and other businesses have access to metadata about data collected by data businesses (including governments), they may “identify opportunities for combining data from multiple Data Businesses and/or governments to develop innovative solutions, products and services” thereby spurring innovation. Such requests for access to data from a start-up or a business are private requests to the data custodian. In case of a dispute, the NPDA will determine whether the data is requested for any of these pre-defined purposes. In such cases, a public shared database is typically created so that it can be accessed by all.
    MediaNama’s comment: This seems to suggest that even government entities can be data custodians even though it is not explicitly stated in the definition of a data custodian.
  • Data requests by data trustees/governments: Importance of such community data will be identified by the data trustees/governments in consultation with sector regulators/authorities. The data trustees/governments can then directly seek access to such community data from private actors and place it in appropriate data trusts/data infrastructures for access to relevant parties.
  • Set up data and cloud innovation and research labs for new digital solutions. Stakeholders in industry, research, education, government and policy would collaborate on data-specific issues such as interoperability, 5G, Internet of Things, AI, etc. to address specific issues in different sectors.
  • Leverage data as training data for AI/ML systems: Organisations (public, private, start-ups, research, etc.) may be eligible to run their algorithms on centralised anonymised systems, without getting access to download the underlying data, and thus train their AI systems. This would be done through data trusts/infrastructures wherein the NPDA may have to intervene.
    • Incentivise data collectors to provide AI training datasets wherein the raw/factual data is properly labelled. This can be done via third party data trusts/infrastructures which would integrate the services of specialised data service providers for labelling data.

Use health sector as pilot case

Even though the PDP Bill classifies health data as sensitive personal data, the NPD report proposes that health data be used as a pilot use-case for the NPD governance framework since “large anonymised data sets of health data could lend community level insights into diseases, epidemics, and community genetics — leading to better tailored health solutions for the community”. Large anonymised datasets of health data would be classified as community non-personal data and could be shared for:

  • Regulatory purpose: public health purposes, disease control and prevention
  • Core public interest purpose: better healthcare, accuracy, increased specificity health care models, treatment protocols and diagnostic bots
  • Economic purpose: supporting digital start-ups and domestic digital industry in health sector

The benefit for the community is derived from the ability to develop new diagnostic bots or AI systems for healthcare diagnosis, delivery and patient care via access to community health non-personal data.
MediaNama’s comment: It is not clear how under the PDP Bill, such a use case would be possible.

Recommendation 6: Define data sharing mechanisms

The government needs to improve on Open Government Data initiatives, and make high-quality public non-personal datasets available. Moreover, data sharing principles must be applied uniformly to all three categories of non-personal data.

Mechanisms for sharing private non-personal data:

  1. Only raw/factual data that is collected by a private organisation related to community data needs to be shared, at no remuneration.
  2. If value-add is considerable, then for reasons of overriding public interest, data sharing may be mandated on fair, reasonable and non-discriminatory (FRAND) based remuneration.
  3. If value addition increases further, the concerned data should be brought to a well-regulated data market where the price will be determined by market forces within general frameworks of openness, fairness, etc.
  4. At “a certain level of high value-add”, the private organisation that collected the data will determine how it will use the data as the economic privileges will be considered inherent to the data itself.

MediaNama’s comment: The report does not specify who would regulate such a data market and who would set the threshold for value addition. Moreover, when market forces are allowed to play out, they benefit the largest players with existing data monopolies and those who have the first mover advantage. 
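
As a rough illustration of the four-tier mechanism above, the sketch below maps a value-add score to a sharing mechanism. The 0-to-1 score and the three cut-off constants are hypothetical, introduced only for this example; the report names the tiers but does not define how value-add is measured or where the thresholds lie.

```python
from enum import Enum


class SharingTier(Enum):
    NO_REMUNERATION = "raw/factual data shared at no remuneration"
    FRAND = "sharing mandated on FRAND-based remuneration"
    DATA_MARKET = "price set by a well-regulated data market"
    CUSTODIAN_DECIDES = "private organisation decides how the data is used"


# Hypothetical cut-offs on an arbitrary 0-to-1 value-add scale.
# The report names the tiers but does not define any thresholds.
FRAND_FROM = 0.25
MARKET_FROM = 0.50
CUSTODIAN_FROM = 0.75


def sharing_tier(value_add: float) -> SharingTier:
    """Map a value-add score to one of the report's four sharing mechanisms."""
    if not 0.0 <= value_add <= 1.0:
        raise ValueError("value_add must be between 0 and 1")
    # Check the highest tier first so each score lands in exactly one tier.
    if value_add >= CUSTODIAN_FROM:
        return SharingTier.CUSTODIAN_DECIDES
    if value_add >= MARKET_FROM:
        return SharingTier.DATA_MARKET
    if value_add >= FRAND_FROM:
        return SharingTier.FRAND
    return SharingTier.NO_REMUNERATION


if __name__ == "__main__":
    for score in (0.0, 0.3, 0.6, 0.9):
        print(score, "->", sharing_tier(score).value)
```

Whatever scale a regulator eventually adopts, the structure stays the same: an ordered set of thresholds, each of which moves the dataset further from free sharing and closer to the custodian's full control.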

Process for requesting data:

1. A data request is made to a data business for its metadata. The request could also ask for underlying data.
2. A business/start-up raises a data sharing request to a data custodian on the basis of the latter’s metadata.

MediaNama’s comment: It is not clear how steps 1 and 2 differ: data businesses are a subset of data custodians, so if a data sharing request can be made to a data custodian, it can also be made to a data business.

3. Two situations then arise:

  • Data custodian accepts the request: Transaction is complete.
  • Data custodian rejects the request: The request is then made to the NPDA which evaluates the request from a social, public, economic benefit perspective.
    • If the request results in benefits, NPDA requests the data custodian to share the raw/factual data.
    • If the request won’t result in benefits, NPDA denies the request.

4. The data trustee, who is responsible for the community data, may decide to make the requested data available as a public use database by creating a data trust that will ensure that de-anonymisation concerns are addressed.

5. The data custodian that had initially collected the data and from whom the data is requested may also market value-added services beyond the factual data.
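
The accept/reject branch in step 3 above can be sketched as a small decision function. The function name, its two boolean inputs and the outcome labels are assumptions made for illustration; the report describes the flow only in prose and does not say how the NPDA would assess benefit.

```python
from enum import Enum


class Outcome(Enum):
    SHARED_BY_CUSTODIAN = "custodian accepts; transaction complete"
    NPDA_ASKS_CUSTODIAN = "NPDA requests custodian to share raw/factual data"
    DENIED_BY_NPDA = "NPDA denies the request"


def resolve_request(custodian_accepts: bool, npda_finds_benefit: bool) -> Outcome:
    """Follow step 3 of the request process (hypothetical helper).

    custodian_accepts: whether the data custodian agrees to the request.
    npda_finds_benefit: the NPDA's view of social, public or economic
        benefit; only consulted when the custodian has refused.
    """
    if custodian_accepts:
        return Outcome.SHARED_BY_CUSTODIAN
    if npda_finds_benefit:
        return Outcome.NPDA_ASKS_CUSTODIAN
    return Outcome.DENIED_BY_NPDA


if __name__ == "__main__":
    print(resolve_request(custodian_accepts=False, npda_finds_benefit=True).value)
```

Note that the NPDA only enters the picture after a refusal, so its assessment has no effect on requests that a custodian accepts voluntarily.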

Checks and balances for data sharing

1. Restrictions on cross-border data flows: Since directories/databases are prone to de-anonymisation, they are subject to restrictions on cross-border data flows as defined under Section 33 of the PDP Bill. This means:

  • Sensitive NPD may be transferred outside India, but will continue to be stored within India.
  • Critical NPD, which will be a corollary of yet to be defined critical personal data, can only be stored and processed within India.
  • General NPD can be stored and processed anywhere in the world.
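
The three localisation rules above amount to a small lookup table. The sketch below encodes them as such; the class names, field names and the `transfer_allowed` helper are hypothetical and are not drawn from the report or the PDP Bill.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    GENERAL = "general"
    SENSITIVE = "sensitive"
    CRITICAL = "critical"


@dataclass(frozen=True)
class CrossBorderRule:
    may_leave_india: bool          # can the data be transferred abroad at all?
    copy_must_stay_in_india: bool  # must it continue to be stored in India?


# One rule per sensitivity level, as summarised in the list above.
RULES = {
    Sensitivity.GENERAL: CrossBorderRule(True, False),
    Sensitivity.SENSITIVE: CrossBorderRule(True, True),
    Sensitivity.CRITICAL: CrossBorderRule(False, True),
}


def transfer_allowed(sensitivity: Sensitivity, keeps_copy_in_india: bool) -> bool:
    """Return True if a transfer abroad is consistent with the rule table."""
    rule = RULES[sensitivity]
    if not rule.may_leave_india:
        return False
    return keeps_copy_in_india or not rule.copy_must_stay_in_india


if __name__ == "__main__":
    print(transfer_allowed(Sensitivity.SENSITIVE, keeps_copy_in_india=True))
```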

Jurisdiction: For all community and public NPD, Indian law and regulation will continue to apply to such data and will take precedence over any other law. Safeguards to ensure that could be in the form of obligations, bilateral agreements, etc. on the basis of adequacy in the foreign jurisdictions. The entity that takes such data outside India will be legally responsible for complying with the data sharing requirements.

2. Contract between cloud provider and data business must comply with terms of storage, processing and usage of data as specified by NPDA.

3. Tools: “Testing and probing tools are continuously run on the data in these secure clouds and reports generated, auto-submitted by cloud providers and registered organisations to check compliance.”

4. Expert Probing: Experts, academic labs and Indian organisations registered through a self-serve peer review process “are encouraged” to probe the released data, cloud defences, etc. for vulnerabilities and risk of reidentification, and report them to the NPDA via its APIs.

5. Academic-Industry Advisory Body that is headed by a globally recognised technical expert can suo motu suggest changes to the standards and algorithms, and fund improvements of these tools and systems.
MediaNama’s comment: The composition and selection criteria for admission to this body have not been specified.

6. Liability will not be imposed on organisations that comply with the standards via annual lightweight self-reported, self-audited digital compliance reports, exhibit good faith and have best-effort internal processes in line with the best of industry standards, and swiftly remedy any vulnerability found.

Recommendation 7.1: Create a separate Non-Personal Data Authority

NPDA will be responsible for regulating data principal, data custodian, data trustees and data trusts and will need specialised knowledge of data governance, technology, latest research and innovation, etc.

  • NPDA will work in consultation with DPA, CCI and other sectoral regulators to deal with issues of data sharing, re-identification, and collective privacy.

Composition: Undefined as of now, but it will “have some members with relevant industry experience”.

Recommendation 7.2: Harmonise the roles of the Data Protection Authority, Competition Commission of India, and the Non-Personal Data Authority

How is NPDA different from other regulators?

  • Unlike DPA which focusses on “prevention of personal harm”, NPDA “will focus on unlocking value of Non-Personal Data for India”. MediaNama’s comment: However, DPA’s focus is regulation of personal data, not prevention of personal harm. This seems like an oversight in the language of the report.
  • Unlike CCI, NPDA will be a proactive actor and ensure provision of necessary data for legitimate economic, social and public purposes.
  • Unlike sector regulators, NPDA will have cross-sectoral view and role to ensure data sharing. Sectoral regulators can build additional data regulation, if required, over those developed by NPDA in a horizontal fashion.

What is the role of the NPDA?

1. Enabling role: Ensure that data is shared between data businesses, data trustees, data trusts and data custodians for sovereign, social welfare, economic, regulatory and competition purposes.

2. Enforcing role: Ensure that all stakeholders follow the Non-Personal Data legislation and its rules and regulations:

  • Regulate data businesses: Define the threshold for their registration, supervise data porting and sharing mandates and requests, manage metadata directories, adjudicate on data sharing disputes, etc.
  • Govern mandatory data sharing: The NPDA will determine if mandatory sharing of community NPD fulfils the guiding principles for such sharing. NPDA will also define the codes of conduct for such sharing.
  • Certify rules and standards including for data sharing, data safety, anonymisation, etc.

3. Work with DPA to ex-ante evaluate risk of re-identification of anonymised data prior to approving requests for data sharing. Such evaluation will be governed by the PDP Bill.

4. Address market failures and supervise market for non-personal data: It should address harms such as:

  • “Lack of information in terms of Non-Personal Data usage, the quantum and nature of actual Non-Personal Data assets held by an enterprise, and the consequential potential harms that could result from such Non-Personal Data collection and processing activities.”
  • “Linked to market failure, addressing any potential negative externalities caused by Non-Personal Data collection and processing activities, including re-identification, deanonymisation, and potential discriminatory harms to customers and communities.”
  • “Lack of sufficient levels of competition, and access to Non-Personal Data, resulting in exploitative (discriminatory terms of transactions vis a vis other businesses or customers) or exclusionary (directed at restricting competition, and raising market entry barriers) harms.”
  • Recognise ownership rights and privileges over non-personal data

MediaNama’s comment: The report doesn’t establish which regulator, the Data Protection Authority or the Non-Personal Data Authority, would take precedence when it comes to mixed datasets.

Recommendation 8: Enshrine technology-related guiding principles for creating and sharing data directories

Mechanisms for accessing data:

  • Application Programming Interfaces (APIs), whereby all shareable non-personal data and datasets created/maintained by government agencies, companies, start-ups, universities, research labs, non-government organisations, etc. should have a Representational State Transfer (REST) API for accessing data.
  • Data sandboxes for experimentation and deploying algorithms wherein only the output, not the data itself, is shared.
  • Distributed/federated storage for data security so there is no single point of leakage. All sharing should be done via APIs so that all requests can be tracked and logged.
    • For important non-personal data in different sectors, coordinated management of data trusts and data infrastructures will be required. The report does not specify who will carry out such coordination.
  • Standardised data exchanges regardless of data type, exchange method or platform. Input for the exchange could be in any form and output must be standardised and usable by all stakeholders. The report does not specify who would manage these data exchanges or set standards for them.
  • Prevent de-anonymisation through mechanisms such as differential privacy, homomorphic encryption, and blockchain.
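The report names differential privacy as one such mechanism but does not describe how it would be applied. As a rough illustration only, the sketch below shows one textbook form of the idea, the Laplace mechanism for a counting query: random noise is added to the true count so that the presence or absence of any single record has a limited effect on the published number. The function name, the sample records and the epsilon values are our assumptions, not anything specified in the report.

```python
import random


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching predicate (Laplace mechanism).

    Hypothetical helper for illustration; not taken from the report.
    A smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    true_count = sum(1 for record in records if predicate(record))

    # A counting query changes by at most 1 when one record is added or
    # removed, so its sensitivity is 1 and the noise scale is 1 / epsilon.
    scale = 1.0 / epsilon

    # The difference of two independent exponential samples with mean
    # `scale` follows a Laplace distribution centred on zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Assumed sample data: trips recorded by a hypothetical mobility dataset.
trips = [
    {"city": "Delhi", "minutes": 42},
    {"city": "Mumbai", "minutes": 55},
    {"city": "Delhi", "minutes": 18},
    {"city": "Delhi", "minutes": 73},
]

noisy = private_count(trips, lambda t: t["city"] == "Delhi", epsilon=0.5)
print(f"Noisy count of Delhi trips: {noisy:.2f}")
```

The published figure will differ from the true count of 3 on each run. The choice of epsilon, and who would set it, is the kind of standard the report leaves to the NPDA.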

Recommendation 9: Creation of a Non-Personal Data Policy Switch as a single digital clearing house

To address issues around conflicting rules of data trustees over the same body of non-personal data, the report proposes a digital Non-Personal Data Policy Switch. “Using the Policy Switch, even though regulations can emerge from various institutions and regulatory bodies, the encoding, rationalisation (to ensure no contradiction), implementation and clearance/ compliance enforcement may be with a single authority — who is subject to the regulatory guidelines issued by various data trustees.”

It is not clear who this single authority is and how it would function if it is subject to the regulatory guidelines issued by various data trustees. This section also causes a contradiction in the definition of data trustee: the assumption here is that the data trustee must be a government body even though the definition of data trustee allows for inclusion of NGOs as well.

What is this Policy Switch? It is “a single digital clearing house for regulatory management of non-personal data”. It is defined by “a set of APIs and a Policy Markup Language spanning all aspects of managing Non-Personal Data publicly and privately.”

What does the Policy Markup Language do? It encodes all interactions and transactions relevant to non-personal data, including policies (anonymisation standards, aggregation standards, etc.) and adjudication workflows (verification, certification, etc.).

  • This Markup Language, as per the report, should be evolutionary and capable of being regularly updated.
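The report does not publish a schema or syntax for the Policy Markup Language, so what “rationalisation (to ensure no contradiction)” would involve is left open. Purely as an illustration, the sketch below assumes that each data trustee's rule can be reduced to a (dataset, parameter, value) entry and flags cases where two trustees set different values for the same parameter on the same dataset. The class name, field names and sample rules are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """One encoded rule; an assumed structure, not defined in the report."""
    trustee: str
    dataset: str
    parameter: str
    value: str


def find_contradictions(rules):
    """Group rules by (dataset, parameter) and return groups whose values disagree."""
    by_key = defaultdict(list)
    for rule in rules:
        by_key[(rule.dataset, rule.parameter)].append(rule)

    return {
        key: group
        for key, group in by_key.items()
        if len({rule.value for rule in group}) > 1
    }


# Hypothetical rules issued by two trustees over the same dataset.
rules = [
    PolicyRule("Trustee A", "city-traffic", "min_aggregation_size", "50"),
    PolicyRule("Trustee B", "city-traffic", "min_aggregation_size", "100"),
    PolicyRule("Trustee A", "city-traffic", "retention_days", "365"),
]

for (dataset, parameter), group in find_contradictions(rules).items():
    issuers = ", ".join(f"{r.trustee}={r.value}" for r in group)
    print(f"Conflict on {dataset}/{parameter}: {issuers}")
```

Detecting a conflict is the easy part; deciding which trustee's rule prevails is a governance question that the report leaves unanswered.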

Other comments by MediaNama

  • According to the second draft of the e-commerce policy, any e-commerce regulation has to allow the government to have access to e-commerce data for issues related to security, law and order, law enforcement, taxation and safety of individuals. It also proposes the formulation of a new regulator for e-commerce. The report on NPD governance does not address who will facilitate and adjudicate such access to non-personal data — the e-commerce regulator or the NPDA? In fact, this report does not consider how access to e-commerce related NPD will be regulated if a new e-commerce regulator is also in the picture.
  • The second draft of the e-commerce policy also said that the government might be able to seek disclosure of source code and algorithms for e-commerce to regulate biases within the system. Any such access would be access to proprietary algorithms which, the report on NPD governance says, is not required when it comes to private non-personal data.

***Update (July 14, 2020 11:24 am): Added a section titled “Other comments by MediaNama”. Added a note about this report being a framework, not a draft legislation in the first section. Originally published on July 13, 2020 at 6:08 pm.