- PhonePe’s digital payments data is aggregated at a district level to ensure privacy.
- Only verified individuals have access to PhonePe’s system.
- Google Maps gathers location data only from users who opt in to share it, and anonymises that data at the point of collection.
- Google’s products have no use case for non-anonymised and non-aggregated data.
“Sharing of non-personal data should be flexible and voluntary, not mandatory. While there should be creative encouragement from governmental organizations or policymakers to encourage the sharing of non-personal data, I don’t believe private enterprises or public enterprises, large companies or small companies should be mandated to share such data,” said Karthik Raghupathy, Head of Strategy and Investor Relations at PhonePe.
He was speaking at MediaNama’s ‘Regulating Non-Personal Data’ event held on February 18, 2022. Other speakers in the panel on Startups and Non-Personal Data included Anal Ghosh from Google Maps, Sijo Kuruvilla George from Alliance of Digital India Foundation, and Zainab Bawa from Hasgeek, with MediaNama’s editor Nikhil Pahwa as moderator.
This event was organised with support from Google, PhonePe, Amazon, Meta, and Microsoft.
The non-personal data (NPD) framework proposed by the expert committee set up by the government mandates the sharing of non-personal data at an aggregate level for public value. Many have suggested that the directive can have inadvertent consequences. The impact will be sweeping as India is in the midst of a startup boom, and startups are among the largest generators of non-personal data.
How did PhonePe decide to share data?
Raghupathy shed light on PhonePe’s data-sharing initiative, PhonePe Pulse. It is a website that allows the public to see how Indians are transacting digitally. It contains all the digital payments data that PhonePe has gathered over the last three years.
Why does PhonePe share this data?
- Open philosophy: “We strongly believe in an open philosophy. We think that sharing data fosters identification of new opportunities, enables innovation, and moves the ecosystem forward. Pulse is an altruistic example of non-personal data sharing. The best use of non-personal data is to leverage anonymized data to promote innovation,” Raghupathy elaborated.
- Representative of the economy: “At our scale, as the largest digital payments player in the country, our data is actually representative of the trends in the economy of the country,” he said.
What kind of data is not shared under Pulse?
Raghupathy said that the concerns boiled down to privacy, along with data sanctity or accuracy. The privacy concern was not restricted to consumers’ private data but extended to data that could be inferred, because privacy can be compromised through differential trends as well, he stated.
The company does not share merchant data at a granular level because doing so would raise competitive concerns. Here are some of the considerations listed by Raghupathy:
- Aggregated to a district level: “The kind of data we shared publicly is only aggregated up to a district level. We have data up to a lat-long level, which is the most granular, but we decided to stop at the district level,” Raghupathy asserted in his response.
- Ensure controlled access: “We made sure that there are strict controls to monitor access control because you need to make sure that there are no access control issues where people can manipulate when it comes to data verification and data accuracy. We’ve gone through an entire audit, and have an ongoing control process where only verified individuals have access to the system,” he spelled out.
- Conduct a three-stage process for data accuracy: Raghupathy said that there is a three-stage process conducted by separate teams, with the final sign-off done by a controllership team. This is how PhonePe ensures that no private data is leaked and that no data that may trigger competitive risk is released.
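To make the district-level aggregation described above concrete, here is a minimal Python sketch. It is not PhonePe’s actual pipeline: the record fields (`district`, `lat`, `lon`, `amount`) and the sample values are assumptions made purely for illustration. It shows how point-level records can be collapsed into per-district counts and totals, with the coordinates dropped before anything is published.

```python
from collections import defaultdict

# Hypothetical transaction records; field names and values are illustrative only.
transactions = [
    {"district": "Pune", "lat": 18.5204, "lon": 73.8567, "amount": 250.0},
    {"district": "Pune", "lat": 18.5310, "lon": 73.8446, "amount": 100.0},
    {"district": "Mysuru", "lat": 12.2958, "lon": 76.6394, "amount": 75.5},
]


def aggregate_by_district(records):
    """Collapse point-level records into per-district counts and totals."""
    totals = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for rec in records:
        bucket = totals[rec["district"]]
        bucket["count"] += 1
        bucket["amount"] += rec["amount"]
    # Latitude and longitude are deliberately never copied into the output.
    return dict(totals)


district_summary = aggregate_by_district(transactions)
```

In this sketch the published summary contains only a transaction count and a total amount per district, so no single location or payment can be read back from it.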
What kind of data does Google Maps share with the public?
Anal Ghosh, Senior Program Manager, Google Maps (South Asia), said that Google has been sharing geospatial location data for years. He said that Google Earth Engine, which has been active in India for more than a decade, has more than 700 geospatial datasets freely available for online analysis across a lot of geospatial domains.
The company also provides environmental insights, giving cities access to emissions-related data on which they can base climate action.
Significance of community mobility data: “Community mobility reports are something that we started a couple of years back in 2020. There were large-scale initiatives taken by governments like social distancing, lockdowns, and curfews to control the transmission of the virus. However, there were no measures to see how the public was responding to these measures and whether they were effective. It is how community mobility reports came into being,” Ghosh elaborated.
Determining the impact of government’s pandemic measures: “We worked with public health officials to determine which places to track to understand the impact of measures. We defined six categories of places, and compared to the start of the pandemic, we started showing the change in mobility over time. The reports are updated on a daily basis till now. We started at the country level because we wanted to be confident of the data we were sharing but we have gone down to a district level,” he said.
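The “change in mobility over time” that Ghosh describes can be pictured as a percentage change against a baseline period. The Python sketch below is an illustration under assumptions, not Google’s published method: the category names and visit counts are hypothetical, and the source does not specify how the baseline is computed.

```python
def mobility_change(baseline_visits, current_visits):
    """Percent change in visits relative to a baseline period."""
    if baseline_visits <= 0:
        raise ValueError("baseline_visits must be positive")
    return 100.0 * (current_visits - baseline_visits) / baseline_visits


# Hypothetical aggregated visit counts for two place categories.
baseline = {"retail": 1000, "parks": 400}
today = {"retail": 600, "parks": 500}

changes = {
    category: mobility_change(baseline[category], today[category])
    for category in baseline
}
```

With these made-up numbers, retail visits are 40% below the baseline and park visits are 25% above it.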
Data is anonymised and aggregated: Ghosh described how the data is anonymized and aggregated at the point of collection. “The data is gathered only from users who have chosen to share their location history with Google […] It is off by default on a product so users need to go in and opt in to share it. We apply anonymization and aggregation on top of it,” he said.
“We have seen researchers like ORF, TIFR, IIC analyzing these reports and coming up with insights and governance measures on how the pandemic could be controlled and what should be measures going forward if there is another wave.” — Anal Ghosh
Application of differential privacy: “We apply a technique called differential privacy to ensure that data cannot be tracked. Differential privacy is the second layer that we put when sharing this data. It is to ensure that there is no way that data can be tracked to an individual person. Differential privacy adds more noise to the data. It makes data absolutely random. There is no use case in our products which requires us to use non-anonymous and non-aggregated data,” Ghosh declared.
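For readers unfamiliar with the technique, the sketch below shows one textbook form of differential privacy, the Laplace mechanism applied to a count. The source does not say which mechanism or parameters Google uses, so the function names, the `epsilon` value, and the sample count here are all assumptions for illustration. The noise is calibrated rather than arbitrary: each published figure is perturbed, yet figures built from many contributors stay close to the truth.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism for a counting query (illustrative only)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Seeded generator so the sketch is repeatable; a real deployment would
# need a cryptographically secure noise source, which `random` is not.
rng = random.Random(0)
samples = [noisy_count(1000, epsilon=0.5, rng=rng) for _ in range(10_000)]
mean_estimate = sum(samples) / len(samples)
```

A smaller `epsilon` means more noise and stronger privacy, while a larger one means less noise and more accurate published figures.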
Why are data-sharing requests declined?
Ghosh explained that the data becomes usable at an aggregate level because the objective is to generate insights at a macroeconomic level. “It won’t be usable in the first place if it was at a granular level,” he added.
- Some districts do not have their reports: “We have to consider the quality of the reports being generated. There are a few districts in India where we won’t find these reports. The reason is that there are not enough people sharing their location history with us. There is too much noise to draw insights out of data after differential privacy,” Ghosh disclosed.
- There are no special requests: “We haven’t had any special requests from public health officials because they realize that it’s not available at a personal level as it’s not possible. We are open to sharing data whenever there is a need but each use case needs to be evaluated and seen whether we have usable data. There needs to be a collaborative approach and we can’t take unilateral decisions,” he said.
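Ghosh’s point about districts with too few contributors can be illustrated with a simple publish-or-suppress rule. The Python sketch below is a hypothetical rule, not Google’s actual criterion: it assumes Laplace-style noise whose typical size is `sensitivity / epsilon`, and it withholds a district when that noise is large relative to the number of contributing users. The threshold, the `epsilon` value, and the district names are invented for illustration.

```python
def should_publish(contributors, epsilon, max_relative_error=0.1, sensitivity=1.0):
    """Decide whether a district's figure is stable enough to release.

    For Laplace noise with scale b = sensitivity / epsilon, the expected
    absolute noise equals b, so b / contributors approximates the
    relative error the noise introduces.
    """
    if contributors <= 0:
        return False
    expected_abs_noise = sensitivity / epsilon
    return expected_abs_noise / contributors <= max_relative_error


# Hypothetical contributor counts per district.
counts = {"District A": 5000, "District B": 12}
report = {name: should_publish(n, epsilon=0.5) for name, n in counts.items()}
```

With these made-up numbers, District A would be published and District B withheld, because the same amount of noise swamps a count of 12 but barely moves a count of 5,000.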
Raghupathy said that PhonePe was in the early stages of discovering use cases for Pulse, as the project went live only in September 2021. “We’re scratching the surface in terms of various use cases that can be powered with this data,” Raghupathy said, adding that the company was a little behind in terms of new requests.
Raghupathy, however, did say that there have been merchants who have asked for APIs that provide data about their consumers.
“We have a strong data set of our consumers which is much broader than the interactions our partners have with them. It is very valuable to them from a targeting perspective. We’ve said: absolutely not. You are then selling data for data’s sake. There are requests coming in terms of new data elements that people would like to see on Pulse, and we have started to make a list.” — Karthik Raghupathy