What stood out:
- Business purpose requests: Private entity to private entity mandatory data sharing requests are not considered to be in the scope of the committee’s recommendations.
- NPD Legislation: The Committee of Experts has proposed that the Non Personal Data (NPD) framework become the basis of a new legislation for regulating NPD.
- Sovereign purpose requests exempt: the Non Personal Data Authority will not adjudicate the validity of data requests under Sovereign purpose (national security, law enforcement etc).
- Entire raw data databases exempt: The committee has limited data requests to specific data-fields, and no requests can be made for entire databases.
- Private inferred data exempt: Private entities need not make available inferred or derived data, including trade secrets, algorithms and analytics.
- Purpose limitation: Data-fields can only be asked for, for specific, defined purposes.
- Data Processors exempt: Data processors cannot be asked to give NPD belonging to data custodians (whose data they’re processing).
- Meta-data directory: Data Businesses will have to share meta-data on the data fields collected by them, with the Non Personal Data Authority, which will manage a “Meta-data directory”, which will be available under open access.
- Additional member added to the committee: Rahul Matthan, Partner at Trilegal, for “Report Preparation” is named as a committee member in this report.
- Response to the Draft Report: over 1500 submissions in feedback, none of which were made public by the committee. Note that the MyGov page for the previous draft mentions 1458, which indicates that submissions were received outside of the MyGov platform as well.
The Committee of experts on Non Personal Data has released a revised report on the Draft Non Personal Data Framework, addressing several of the concerns raised, including those in around 1500 submissions it received on the previous version (summary; my take) of the report.
- The revised draft report may be viewed here.
- Submissions are being accepted till 27th January 2021, via MyGov here.
- The committee is, once again, shunning transparency, stating “The feedback submitted here will be kept confidential; no public disclosure will be made at any stage.”
Note from MediaNama:
- Submissions: In case you’d like to support transparency in public consultations, please feel free to mail a copy of your submission to MediaNama at email@example.com, with a note in the email text stating “Please feel free to publish my submission on the Draft Non Personal Data Framework consultation on MediaNama.com”. We’ll create a public list in the interest of transparency.
- Writing for us: In case you’d like to write for MediaNama on the intended impact of this version of the proposed framework, please contact me at firstname.lastname@example.org.
Here is a summary of the recommendations made by the committee in the 61-page report:
1. Recommended Regulation
“While public agencies produce a lot of data, much of the required data will be collected by and be in the hands of private companies.”
The committee proposes a single national level regulation to establish rights over NPD collected and created in India, with the NPD Framework becoming the basis of a new legislation for regulating NPD. Guiding principles:
- Sovereignty: India has rights over data of India, its people and organisations and benefit of this data must accrue to India and its people.
- Privacy: Misuse, reidentification and harms must be prevented
- Simplicity: regulations should be simple, digital and unambiguous
- Innovation and entrepreneurship: Data should be freely available for innovation and entrepreneurship
2. Definition of NPD (Section 4)
Non Personal Data: When data is not ‘Personal Data’, as defined under PDP Bill, or the data is without any Personally Identifiable Information:
- Data not related to individuals/natural persons (weather data, data from industrial sensors, data from public infrastructure).
- Data that has been anonymised or aggregated so that individual data is not identifiable.
Personal Data vs Non Personal Data (Section 5)
- No link between personal data and Non Personal Data: The Committee says that there is no overlap between the Non Personal Data and the data sought to be regulated by the Personal Data Protection Bill, 2019. Section 2B of the PDP Bill states that provisions of the bill will not apply to any personal data that has been anonymised.
- Mixed Datasets which have inextricably linked personal data and NPD will be governed by the PDP Bill
- Re-identification: If data is re-identified either as a failure of anonymisation technology, association of anonymised datasets that result in re-identification or conscious re-identification, the data will fall under the purview of the PDP Bill. Non re-identified data will be governed by NPD framework.
- Consent: Consent for personal data doesn’t apply automatically to NPD. Data collectors should provide notice and offer the data principal the option to opt out of data anonymisation. Opt-outs should be prospective. If consent has been provided and the data has not yet been anonymised, then revocation of consent should be an option. MediaNama’s take: this is a potential privacy harm. It incentivises opting out of anonymisation.
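The rules above on which framework governs a dataset can be read as a small decision procedure. The following is a minimal sketch in Python, not anything the report specifies: the `Dataset` flags and the `governing_framework` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Framework(Enum):
    PDP_BILL = "PDP Bill"
    NPD_FRAMEWORK = "NPD framework"


@dataclass(frozen=True)
class Dataset:
    # Hypothetical flags: the report states the rules, not a data model.
    has_identifiable_personal_data: bool  # not anonymised or aggregated
    inextricably_mixed: bool              # personal data and NPD cannot be separated
    re_identified: bool                   # anonymisation failed or was reversed


def governing_framework(ds: Dataset) -> Framework:
    """Return the framework that governs a dataset, per the summary above."""
    # Re-identified data falls back under the PDP Bill, whatever the cause.
    if ds.re_identified:
        return Framework.PDP_BILL
    # Mixed datasets with inextricably linked personal data go to the PDP Bill.
    if ds.inextricably_mixed:
        return Framework.PDP_BILL
    # Data that is still personal data was never NPD in the first place.
    if ds.has_identifiable_personal_data:
        return Framework.PDP_BILL
    # Everything else (never personal, or anonymised and not re-identified).
    return Framework.NPD_FRAMEWORK
```

For instance, an anonymised dataset that has not been re-identified (`Dataset(False, False, False)`) lands under the NPD framework, while the same dataset with `re_identified=True` moves to the PDP Bill.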
Committee Recommendation: Amend provisions of the PDP Bill
“At present the provisions of Section 91(2) and Section 93(x) attempt to establish within the PDP Bill a regulatory framework within which even non-personal data could be regulated under the provisions of the PDP Bill. In order to ensure that the two frameworks are mutually exclusive yet work harmoniously with each other it would be advisable to delete these sections from the PDP Bill and ensure that they are appropriately covered under the NPD framework. If that is done then the words ‘other than the anonymized data referred to in section 91’ in Section 2(B) could also be deleted as infructuous.”
3. Community & Governance of Non Personal Data (Section 7)
- Rights over NPD:
- Right to derive economic and other value and maximising data’s benefits for the community.
- Right to eliminate or minimise harms from the data to the community.
- Who will exercise these rights?
- A community can exercise these rights over NPD. A community is defined as “any group of people that are bound by common interests and purposes, and involved in social and/or economic interactions. It could be a geographic community, a community by life, livelihood, economic interactions or other social interests and objectives, and/or an entirely virtual community.”
- The community (through a non-profit organization: a Section 8 company, Society or Trust) should be able to raise a complaint with a regulatory authority about harms emerging from sharing non-personal data about their community.
- Governance of NPD:
- Location: Non-personal data derived from personal data shall inherit the sensitivity of the underlying personal data for storage requirements as specified in the PDP Bill.
- Tools: Testing and probing tools are continuously run on the data in secure clouds, and reports are generated and auto-submitted by cloud providers and registered organisations to check compliance.
- Liability: Organisations are to be indemnified against any vulnerability found as long as they swiftly remedy it and adopt a standards-driven approach (like annual light-weight, self-reported, self-audited digital compliance reports).
- Academia-Industry Innovation Advisory Body: The Non Personal Data Authority shall establish an innovation advisory body consisting of highly accomplished experts from academia, Government, industry and society to develop/enhance/innovate on aspects like data sharing, data governance, technical standards like interoperability, privacy-protection, and data stewardship.
4. Data Businesses (Section 6)
The NPD Framework proposes the definition of a new classification of business: a Data Business. A data business is any business which collects, processes, stores or manages data, including both personal and non-personal data. This is a horizontal classification and not an industry sector. A data business can be a data processor or data custodian.
- Data Custodian: is a Government or a Private organization that has an obligation/responsibility to share appropriate NPD when data requests are made for defined data sharing purposes.
- Data custodians have a responsibility towards responsible data stewardship and a ‘duty of care’ to the concerned community in relation to handling non-personal data related to it.
- The data custodian has a responsibility to ensure that no harms to persons / groups of persons occur by re-identification of non-personal data.
- Data custodians must have mechanisms to remedy accidental harms emerging from using NPD for innovation.
- Data Processors: A data processor is a company that processes NPD on behalf of a data custodian, including enterprise software, Software-as-a-Service providers, cloud service providers, Global Capability Centres (GCCs), IT and ITeS companies.
- Threshold for classification: Entities may be classified as data businesses based on certain threshold of data collected/processed, as defined by the regulatory authority.
- Threshold parameters: gross revenue, number of consumers/households/devices handled, % revenue from consumer information
- Data businesses above a threshold will be required to be registered in India. Below the threshold, registration will be voluntary.
- Not a license: Registration will be a one-time activity, not a license.
- Thresholds suggested in the PDP Bill for Significant Data Fiduciary should be harmonised with data thresholds for NPD.
- Data processors will not be expected to share data belonging to data custodians.
- Registration information: Information to be provided for registration of a data business will include
- Business ID, business/platform name, associated brand names, rough data traffic, cumulative data collected (number of users, records and data) etc
- Nature of data services provided (data collection, aggregation, processing, uses, selling etc)
- Locations of storage of data and processing. MediaNama’s take: the location of data storage is often sensitive and important information, impacting the security of the dataset. The location of datasets also changes frequently.
- Meta Data Directory:
- Meta-data is to be shared by the data business under regulations, and includes the names of the data fields collected by the data business.
- This meta-data about data has to be stored digitally by the Non Personal Data Authority in a Meta Data Directory. NPDA should provide appropriate time period(s) for the Data Business to submit meta-data.
- Open access is to be provided to meta-data directories, within India. Indian organisations can query the repository but not download the metadata.
- Data Trustees may identify business opportunities for combining data from multiple data businesses for community benefit. Data trustees can make requests for relevant sub-sets of data available through High-value Datasets.
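To make the Meta Data Directory idea concrete, here is a minimal sketch in Python of a directory that can be queried but offers no bulk export, which is how the report describes access. The class name, method names, business IDs and field names are all hypothetical; the report does not define a schema or an API for the directory.

```python
from collections import defaultdict


class MetaDataDirectory:
    """Hypothetical in-memory directory: maps a data-field name to the
    data businesses that report collecting it. It holds only names of
    fields (meta-data), never the underlying data."""

    def __init__(self) -> None:
        self._businesses_by_field = defaultdict(set)

    def register(self, business_id: str, field_names: list) -> None:
        """Record the field names a data business says it collects."""
        for name in field_names:
            self._businesses_by_field[name.strip().lower()].add(business_id)

    def query(self, field_name: str) -> list:
        """Return the businesses collecting one named field.

        Only single-field lookups are exposed; there is deliberately no
        method that dumps the whole directory.
        """
        key = field_name.strip().lower()
        return sorted(self._businesses_by_field.get(key, set()))


directory = MetaDataDirectory()
directory.register("biz-001", ["trip_start_time", "pickup_ward"])
directory.register("biz-002", ["pickup_ward"])
matches = directory.query("pickup_ward")  # ["biz-001", "biz-002"]
```

A data trustee looking to combine data from multiple data businesses could use a lookup like this to find which businesses hold a given field before making a request for it.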
5. Data Trustee
A Data Trustee is an organization, either a Government organization or a non-profit Private organization (Section 8 company / Society / Trust), that is responsible for the creation, maintenance, data-sharing of High-value Datasets in India. A data trustee is a data business, and can be created by the coming together of community members. Importantly:
- Only one data trustee per High Value Dataset (HVD)
- A single data trustee may be responsible for more than one HVD
- A Data Trustee will have to maintain Data infrastructure, including databases, APIs etc.
- Obligations of a data trustee: Data Trustees have a responsibility towards data stewardship and a ‘duty of care’ to the concerned community. Obligations:
- To ensure that high value datasets are only used in the interest of the community.
- To ensure that no harms to persons/groups occur by re-identification of NPD
- To establish grievance redressal mechanisms
6. Non Personal Data Authority
The committee has recommended that the focus of the regulatory body for governing Non Personal Data “will be on unlocking value in non-personal data for India.”
“Unlike CCI”, the Committee report says, the Non Personal Data Authority “will be a proactive actor providing early and continued support for Indian digital industry and startups, and ensuring that necessary data is available for the community.” Some recommendations regarding the NPDA:
- Industry Participation: It must be created with industry participation.
- Regulatory harmonisation: It should be harmonised with other bodies (DPA, CCI etc)
- Functions: The NPDA will create,
- An Enforcing framework for:
- Establishing the rights of India and its communities over NPD
- Address privacy, reidentification of anonymised personal data and prevent misuse of and harms from data
- Adjudicate only when a data custodian refuses to share data with data trustee
- An Enabling framework for:
- Unlocking economic benefit from NPD for India and communities
- Create a data sharing framework
- Manage meta-data directory of data businesses in India
Important: Sectoral regulators can build additional data regulations over those developed by the NPDA
MediaNama’s take: Giving a regulator competing goals (unlocking economic benefit from data versus consumer protection, i.e. addressing privacy and misuse harms) is a bad idea, and will create complications when the regulator has to balance the risk of privacy and misuse harms against enabling access to certain datasets.
7. High Value Datasets (HVD)
HVDs are “datasets that are beneficial to the community at large, and shared as a public good.” They are useful for policy-making, job creation, creating new businesses, research and education, poverty alleviation, financial inclusion, agriculture development, skill development, healthcare, urban planning, environmental planning, energy, diversity and inclusion etc.
A government or NGO may request the creation of a High Value Dataset, in consultation with the Non Personal Data Authority (NPDA).
The NPDA will create guidelines to determine appropriateness of the HVD and the data trustee, covering:
- Objectives and impact
- Is this a valid data trustee
- Has the data trustee secured an expression of interest from a minimum number of community entities
- Does the data trustee have sufficient organisational and technical capability to handle HVD
- A public consultation to map the contours of the HVD.
8. Data Sharing (Section 8)
“Besides data philanthropy, some systematic mechanisms need to be developed to tap the social and public value of data.”
Data sharing refers to the provision of controlled access to non-personal data for defined purposes and with appropriate safeguards in place.
- Purpose of Data Sharing:
- Sovereign Purpose: may include national security and legal purposes, such as mapping security vulnerabilities and challenges, crime mapping, devising anticipatory and preventive measures, and investigations and law enforcement. May include pandemic mapping, prevention, prediction and subsequent interventions. Data requests for sovereign purpose will be made only by a public / Government entity, to public or private data custodians.
- Public Good purpose: for community uses / benefits or public goods, research and innovation, for policy development, better delivery of public-services, etc. India should identify HVD domains like health, geospatial and/or transportation data.
- Research Purpose: non-personal data can also be used by Indian researchers and government agencies for creating public goods and services like Indian language translation etc.
- Business Purpose: data that may be shared between two or more for-profit private entities. “Given that data sharing exists between two private entities, the committee has no recommendations on this.”
- “Outside of a Public Good purpose, private entity to private entity mandatory data sharing is not considered in scope of the Committee’s recommendations.”
- NPDA will not adjudicate validity of data requests in case of data requests under sovereign purpose.
- Guidelines for NPD sharing for high value datasets:
- Purpose limitation: NPD sharing is only for specific purposes
- Public Good: Data sharing should benefit greater public good.
- Charges: reasonable charges may be paid to the data custodian for processing of data (anonymisation, aggregation), but not for data collection.
- Access to “private companies’ trade secrets or proprietary information regarding employees/internal processes and productivity data”
- When data sharing is likely to violate privacy of individuals, groups or communities.
- Data not a part of high value datasets in India are outside the scope of the Committee’s recommendations.
What data can and cannot be sought:
- Raw/Factual/transactional level: Complete datasets should not be made available by public and private entities. Only a specific subset of data fields should be.
- Aggregate data level: Should be made available by public and private entities
- Inferred data level: This is derived data, where insights are developed by combining data points, including trade secrets, algorithms, analytics etc. This should not be made available by private entities. Should be made available by public entities, except in cases of national security.
- Non-discriminatory access to data: Data trustees should request for data from all major data custodians in the corresponding data-category to create HVDs. There should be a non-discriminatory access to data from the ecosystem.
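The three data levels above amount to a small rule table. As a rough illustration only, the sketch below encodes them in Python. The enum and function names are hypothetical, and the `entire_dataset` and `national_security` flags are simplifications of conditions the report describes in prose.

```python
from enum import Enum


class EntityType(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


class DataLevel(Enum):
    RAW = "raw/factual/transactional"
    AGGREGATE = "aggregate"
    INFERRED = "inferred"


def may_be_sought(
    level: DataLevel,
    entity: EntityType,
    *,
    entire_dataset: bool = False,
    national_security: bool = False,
) -> bool:
    """Return True if data at this level may be sought from this entity."""
    if level is DataLevel.RAW:
        # Only specific subsets of data fields, never complete datasets.
        # (entire_dataset is consulted only at the raw level in this sketch.)
        return not entire_dataset
    if level is DataLevel.AGGREGATE:
        # Aggregate data should be made available by public and private entities.
        return True
    # Inferred data: private entities need not share it at all.
    if entity is EntityType.PRIVATE:
        return False
    # Public entities should share it, except in cases of national security.
    return not national_security
```

So a request for a private company's inferred data is refused regardless of other flags, while a request for a few raw data fields from the same company is allowed in principle, subject to the purpose limitation and the other guidelines listed earlier.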
9. Technology Architecture (Section 10)
The guiding principles for such a technology architecture include:
- Mechanisms for accessing data: All sharable non-personal data and datasets created or maintained by entities should have a REST (Representational State Transfer) API for accessing the data.
- Distributed for data security:
- Sharing to be undertaken using APIs only, such that all requests can be tracked and logged.
- Data storage in a distributed format so that there is no single point of leakage.
- All requests for data must be processed only after registering with the company for data access etc.
- Standardized data sharing approach: should be able to take-in any form of data and produce output that is standardized and usable to all stakeholders.
- Prevent de-anonymization: Mechanisms must be put in place to ensure that re-identification of anonymized data does not occur. Best of breed Differential Privacy algorithms may be used to create anonymized data. These algorithms should be jointly evolved by Indian academia and industry, continuously improved using a combination of global open source improvements and with funding to Indian research organisations. These algorithms along with their open-source implementations are made available to Indian organizations along with minimum recommendations for each major type of data.
- The data sets so anonymized are then submitted or, when real-time, streamed into “Secure non-personal data Clouds”.
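The report refers to Differential Privacy algorithms without naming one. As an illustration of the general idea, the sketch below uses the Laplace mechanism, a standard textbook technique, to release a noisy count. The choice of mechanism, the function names and the parameter values are assumptions made for this example and are not taken from the report.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(2021)  # seeded only to make the example repeatable
noisy = private_count(true_count=1200, epsilon=0.5, rng=rng)
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives more accurate counts. This is a teaching sketch: production systems need vetted libraries, careful handling of floating-point arithmetic and a privacy budget across repeated queries.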
10. Policy Switch
A Policy Switch is a single digital clearing house for regulatory management of non-personal data.
- The Policy Switch is defined by a set of APIs and a Policy Markup Language spanning all aspects of managing non-personal data publicly and privately.
- Each data trustee may create a separate window of clearance and rules for using data under their regulation.
- Using the Policy Switch, the encoding, rationalisation (to ensure no contradiction), implementation and clearance/ compliance enforcement may be with a single authority – who is subject to the regulatory guidelines issued by various data trustees.
- The Policy Markup Language encodes all interactions and transactions relevant to non-personal data spanning:
- Policies: e.g. access rules, anonymization standards, aggregation standards, business rules, security standards
- Adjudication workflows: e.g. verification, exception adjudication, certification
- Compliance: e.g. registration, compliance submissions
These are applicable to non-personal data such that non-personal data custodians, both public and private, only have to interface with and comply with the Policy Switch digitally, no matter the types or sources of data with which they are engaged. To reduce the burden on various governance authorities, the Non-Personal Data Authority will create a base set of minimum policies, workflows and compliance rules with which all non-personal data must comply.
“A well-implemented policy switch will continuously capture corner cases that emerge via built-in adjudication workflows and after verification, update the marked-up policies so that corner cases are captured in definitions as whitelists or blacklists; and as conditional exceptions in the rule hierarchy”.
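The report does not publish a syntax for the Policy Markup Language, so the following Python sketch is only one way to picture the quoted idea of a rule hierarchy with whitelists, blacklists and conditional exceptions that is updated after adjudication. Every name here (`AccessPolicy`, `decide`, `record_corner_case`, the request keys) and the order in which the rules are checked are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Request = dict  # hypothetical shape: {"requester": str, "purpose": str}


@dataclass
class AccessPolicy:
    """Hypothetical marked-up policy for one dataset."""

    default_allow: bool = False
    whitelist: Set[str] = field(default_factory=set)
    blacklist: Set[str] = field(default_factory=set)
    # Conditional exceptions: (condition, decision) pairs, checked in order.
    exceptions: List[Tuple[Callable[[Request], bool], bool]] = field(
        default_factory=list
    )

    def decide(self, request: Request) -> bool:
        """Walk the rule hierarchy: blacklist, whitelist, exceptions, default."""
        requester = request["requester"]
        if requester in self.blacklist:
            return False
        if requester in self.whitelist:
            return True
        for condition, decision in self.exceptions:
            if condition(request):
                return decision
        return self.default_allow

    def record_corner_case(self, requester: str, allow: bool) -> None:
        """Fold a verified adjudication outcome back into the lists."""
        if allow:
            self.blacklist.discard(requester)
            self.whitelist.add(requester)
        else:
            self.whitelist.discard(requester)
            self.blacklist.add(requester)


policy = AccessPolicy()
policy.exceptions.append((lambda r: r["purpose"] == "research", True))
```

With this policy, a request with purpose `"research"` is allowed through the conditional exception, anything else falls to the default of refusal, and a requester adjudicated via `record_corner_case("org-42", allow=False)` is refused from then on, even for research.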