The report on non-personal data would lead to endless litigation over whether individual databases fulfil the criterion of originality and are thus protectable under copyright law, a speaker at MediaNama’s discussion on the Governance of Non-Personal Data, held on August 7, cautioned. Similarly, the speaker explained, raw data would be protected by copyright law if it meets the threshold of originality, which courts would again determine on a case-by-case basis.

The discussion was held with support from Centre for Communication Governance (NLU Delhi), Facebook and FTI Consulting. The discussion on data trusts was held under the Chatham House Rule. All quotes have been edited for clarity, brevity and anonymity.

Copyright over databases would be determined on a case-by-case basis …

This is because databases are protected as literary works under copyright law, at least definitionally, the speaker, an intellectual property rights lawyer, said. This means that each database would have to be adjudged on a case-by-case basis on whether it is original and thus capable of receiving IP protection. Moreover, since databases are definitionally protected by copyright law, India has treaty obligations towards them under the WTO's Trade-Related Aspects of Intellectual Property Rights (TRIPS) agreement. As a result, they concluded that the report’s stance on private non-personal data and enabling its sharing with start-ups and potential competitors is an “expropriation of intellectual property”.

The speaker cited Burlington Home Shopping Pvt Ltd v. Rajnish Chibber & Anr., in which the Delhi High Court had accepted the UK standard, the sweat of the brow doctrine, as the criterion for offering databases copyright protection. But in 2008, the Supreme Court ruled that this was not enough. It considered a higher American standard of creativity but eventually settled on the Canadian standard, a mixture of sweat of the brow and a modicum of creativity. As a result, the lawyer explained, a mechanical compilation of data will probably not be protected but anything that requires a bit of curation (careful sorting, ordering, analysing, etc.) could be. They clarified that each database would have its own level of originality that would need to be assessed on a case-by-case basis.

… as would copyright over raw data

In case of raw data as well, the speaker explained, if the argument around originality can be made, or one around effort involved in collecting it or creativity involved in collating or compiling it, raw data could also be protected by copyright.

A case of compulsory licensing?

Since the report talks about FRAND (fair, reasonable and non-discriminatory) obligations to access certain curated or analysed data, the speaker called it an expropriation of data for which price-fixing mechanisms would be required. They explained that if a law seeks to expropriate intellectual property, then as per the international treaties that India is a signatory to, the law would have to offer FRAND pricing, something that the report does mention. Any such expropriation would have to be subjected to the three-step test adopted under the Berne Convention in 1967. If a signatory to the TRIPS agreement, which India is, fails this test, then it is in violation of the agreement.

The speaker warned that when parties don’t agree on FRAND terms, the terms are settled by courts and are thus comparable to an “involuntary licence” that is tantamount to a compulsory licence in effect. They gave the example of a 2010 compulsory licensing case in the Madras High Court, in which the private FM industry had a compulsory licence, with a span of 10 years, issued in its favour against the music industry. The order was stayed by the High Court in 2010 and the matter has not been resolved since, even though the licence will expire in September 2020.

FRAND obligations in the telecom sector: The speaker explained that FRAND obligations, in this context, come from the telecom industry. As per the terms of a standard-essential patent licence, if a telecom patent is declared essential to a standard, then a telco can’t refuse to license it on FRAND terms.

Impact on a company like MyGate: A speaker took the case of MyGate to explain that such an app collects large amounts of data and derives value from that data. This data includes ordering preferences, cab preferences, ride-hailing service preferences, location data, etc. Mandating that such data be shared with another company would act as a disincentive for investment in the company, the speaker said.

How value is derived from data

A speaker from the ad tech industry explained that data is used to glean insights into the market and figure out the relevant market groups. For instance, if a person listened to Bohemian Rhapsody on Spotify, they would be classified in a certain manner. Spotify could use that data to push similar songs to the user, thereby creating more value in its own product for users. It could also sell that data to advertisers along with other data, such as the fact that users who listen to this song tend to buy certain products. This creates an advertising group for the advertiser.

Another speaker cited research from the University of Cambridge, as per which “there is some value in raw data which gets exponentially enhanced and increased when it is combined with other data sets”. “My data may have marginal value but if you combine it with data about my family, about my community, it becomes eminently valuable to advertisers, for example,” they said.

The speaker also cited a Wired article by Gregory Barber in which the author directly sold his individual data to data brokers for cryptocurrency. At the end of that selling spree, the author had amassed a whopping sum of 0.3 cents, that is, 22 paise. The speaker concluded that data doesn’t have much monetary value at an individual, siloed level, but when amassed across thousands of people, aggregated and analysed, it makes platform/data monopolies consistently rank among the top 10 most valuable companies in the world.

In terms of the non-monetary value of data, the speaker pointed out that there are costs and benefits related to it in terms of privacy harms, expropriation of intellectual property, and potential benefit to society. “If we cure cancer, thereby generating immense value for society, what is the cost borne by a company whose patent was appropriated?” they asked.

Abusing objective non-personal data

A speaker highlighted that even objective non-personal data can be used to cause harm. One example is agricultural data, which the report declares to be community non-personal data when collected via private drones. If this data is bought without the consent of the farmers in question, the buyer can use it to sell more expensive products to these farmers, to discriminate between whom it sells to and whom it doesn’t, or to offer one farmer a lower crop insurance premium than another, the speaker said.

How is data priced?

The speaker from the ad tech industry explained that there are two ways in which data is priced:

  1. Through manual negotiation: For instance, when a company like Foursquare sells data, it goes to different companies and tells them that it knows how many people have gone to a given restaurant and how many points of interest it has on them that it can sell. The price negotiation doesn’t happen in an automated manner.
  2. Through an automated exchange of data: A company bids for the kind of users it wants on an automated basis.

Impact on the health sector: A speaker pointed out the immense impact any regulation around non-personal data and its mandatory sharing would have on the health sector, especially in terms of clinical trials. They pointed out that a lot of data is collected during clinical trials, which is also given to the regulator to show efficacy, safety, etc. All of this is regulated through different treaties. Developed countries have pushed for data exclusivity under Article 39 of TRIPS to argue that their clinical data cannot be given to their competitors. By mandating such data sharing, India “will lose the argument from a treaty perspective”, they said.

Problems with the report

  1. Anonymisation is not really possible, and the report acknowledges that: A speaker pointed out that linkability is possible at every point and thus, with more points of linkage, anonymisation is just not possible. The only way out is to encrypt the data, but then it means that nobody, including the data fiduciary that collected the data, can use the data.
  2. Conflict between PDP Bill and NPD report: Since anonymisation is not absolute, the report has set up a conflict between the Data Protection Authority established under the PDP Bill and the Non-Personal Data Authority that the NPD report proposes, multiple speakers concurred.
  3. The report argues for expropriation of intellectual property: “What the report suggests, at least with respect to private data owned by companies, is an expropriation of the intellectual property. And, given that view, I don’t see how the committee addresses the reasonableness, the non-arbitrariness or the pressing requirement for that expropriation,” a speaker said.
  4. Public interests are too broadly defined: A speaker pointed out that the report needs to define public interest and the required outcomes narrowly and in a way that “does not utilise or appropriate private investment to create public infrastructure”. “A specific public interest outcome that is justifiable based on a cost-benefit analysis and addressed upfront as part of licensing agreements” needs to be established, they said. The speaker acknowledged that there might be merit to the open data policy, which already exists, but the problem is that it is not being executed properly, both from a technical and governance perspective.
  5. Lack of certainty for start-ups: The report creates huge uncertainty for start-ups in terms of investment and regulation, two speakers said. It also disincentivises new investment into any company that deals with data collection and analysis of non-personal data.
  6. Data processors have not been considered: Unlike the Personal Data Protection Bill, where data processors at least made a ‘cameo’, as one speaker characterised it, the NPD report does not talk about data processors at all. While ownership over data processed by data processors would be determined by contract, it is not clear what will happen to incidental data that is generated in the course of providing processing services such as cloud services.
  7. Relationships between different stakeholders have not been clearly defined: “While you’ve data trusteeships and/or representatives of the community that will pretty much hold the data as in trust for a community, I don’t believe that the report actually brings out any sort of freedom that the trustee will have to refuse to share data,” a speaker said.
  8. Technocratic approach to public interest: A speaker criticised the report’s “technocratic” and “coercive” approach towards creating “a new paradigm of data sharing”. “They have declared all data to be sort of fair game without talking about the legal basis, without talking about the normative basis, without conducting a cost-benefit analysis,” they said.
  9. Establishing ownership over community data is difficult: A speaker asked, when a community lays claim to ownership over data, how generic the information was before such claims were made and how it was available. They said that data belongs to an individual, and that consent for the sale and sharing of data would need to be taken from that individual.
