A major part of the discussion on data collection was dedicated to how companies can set up purpose limitation for the information they collect. “How do you make the data you collect relate to the function you are doing?” Malavika Raghavan from IFMR asked the audience. “There are a number of people who are looking at time-limited consent. You know, 10 days later, you automatically delete. If you need that data again, you ask for it again from the user.” She asked whether such an approach would be feasible.
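To make the mechanics concrete, here is a minimal sketch of what time-limited consent could look like, using the 10-day window from Raghavan’s example. All the names (ConsentGrant, DataStore) are hypothetical, not any panelist’s actual implementation:

```python
from datetime import datetime, timedelta, timezone

# Illustrative sketch: each consent grant carries a timestamp, and data
# tied to an expired grant is deleted on the next access, forcing the
# app to ask the user again. The 10-day TTL follows Raghavan's example.
CONSENT_TTL = timedelta(days=10)

def _now() -> datetime:
    return datetime.now(timezone.utc)

class ConsentGrant:
    def __init__(self, user_id: str, purpose: str):
        self.user_id = user_id
        self.purpose = purpose
        self.granted_at = _now()

    @property
    def expired(self) -> bool:
        return _now() - self.granted_at > CONSENT_TTL

class DataStore:
    def __init__(self):
        self._records = {}  # (user_id, purpose) -> (grant, data)

    def put(self, grant: ConsentGrant, data: dict) -> None:
        self._records[(grant.user_id, grant.purpose)] = (grant, data)

    def get(self, user_id: str, purpose: str):
        grant, data = self._records.get((user_id, purpose), (None, None))
        if grant is None:
            return None
        if grant.expired:
            # Auto-delete on expiry; the caller must re-request consent.
            del self._records[(user_id, purpose)]
            return None
        return data
```

On each read, an expired grant causes the stored data to be deleted, so the app has to go back to the user for fresh consent, which is exactly the loop Raghavan describes.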

Mrinal Sinha, Chief Operating Officer at MobiKwik, was on board with time-limited consent but argued that purpose limitation would still hurt businesses in the future.

“I’m personally in the camp of time-limited consent where, if an app is providing useful information, every time they refresh the permission, the user will give it. But what happens is that what we’re trying to do today will be different three months down the line. So purpose limitation often limits the users themselves from getting useful data. There should obviously be governance on access to data. But speaking about purpose limitation, we ourselves as consumers would not know enough about what purposes we would like to limit to or not,” he said.

Sinha argued that there should be a fair exchange between the user and the company when it comes to data. “We (MobiKwik) collect people’s location data as we are a payments platform where a lot of our users are trying to access deals from local offline merchants in their vicinity. Unless someone enables that feature, we are unable to tell which stores in their area would do home delivery, etc. We used to see a lot of people looking for medical stores that would do home delivery at midnight. So there should be give-and-take for data.”

Raghavan pointed out that it is essential to keep the user in the loop regarding the nature of new features.

“I think when you have features which are additional or opt-in, we are missing a trick in how the user is being looped into this decision. We all understand that people have cognitive constraints when it comes to valuing personal data; however, I think we need to consider whether, when we’re offering that feature, the user knows or doesn’t know. So basically, when a provider rolls out a new feature, what can they do to flag it to the user, saying ‘we collected this data from you and we’re using it for something else’?”
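One way to read that suggestion is as a gating check: before already-collected data is reused for a new purpose, the app compares the purpose against what the user consented to and prompts them if it is new. A rough sketch, with every name hypothetical:

```python
# Hypothetical sketch of a purpose-change flag: reuse of data for a
# purpose the user never consented to triggers a prompt instead of
# proceeding silently.
def may_use_data(consented_purposes: set, requested_purpose: str) -> bool:
    """Return True if use may proceed; otherwise the user must be re-asked."""
    if requested_purpose in consented_purposes:
        return True
    notify_user(
        f"We collected this data from you and we're now using it for "
        f"'{requested_purpose}'. Do you agree?"
    )
    return False

def notify_user(message: str) -> None:
    # Placeholder for an in-app prompt or push notification.
    print(message)
```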

Meanwhile, Manav Sethi from ALT Balaji mentioned that since they are not ad-supported, their data collection is limited to making the service better. “We are not looking at monetizing through ads; to that extent, all our data collection is about enhancing customer experience and about content usage: what kind of content is working, what kind of genre is working, what kind of day-parting is happening, what is peak usage. So every element of data that we collect goes into the improvement of the app and the creation of content buckets,” he said.

Control after you give clients access

However, Hitesh Oberoi, CEO of Info Edge, which runs the recruitment platform Naukri, pointed out that companies are often inundated with data that is of no use to them, and that purpose limitation is hard to formulate in such a situation.

“Companies had been listing jobs in newspapers forever before Naukri came along. The truth about India is that companies don’t want all of these resumes. Because when they put up a job, they get a thousand applications and hire one person. For them, the other 999 resumes are junk. What happens to CVs after that, we have no control. There are people who come to us and apply for 100 jobs a day. There is no way for recruiters to even get back to so many people, because they are inundated with CVs,” Oberoi said.

“So what we do instead is give them tools to sort through them. We try to rank applications and use algorithms on our end to tell recruiters ‘these are the ones you should look at first and these are the ones you should look at second’, and try to make their job easier. But you know, the more you keep your job search private, the less likely you are to get a job,” he added. As a possible solution, Oberoi suggested: “What you can do is block companies from the registry to say ‘I don’t want these companies to see my CV’. You can also say ‘for this period of time, make my CV unsearchable’.”
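Those two controls, a block list and a time-boxed unsearchable window, are easy to picture in code. The sketch below is purely illustrative; the field names are assumptions, not Naukri’s actual schema:

```python
from datetime import date
from typing import Optional

# Illustrative per-user visibility settings mirroring the controls
# Oberoi describes: block specific companies, or hide the CV from
# search entirely for a chosen period.
class CVVisibility:
    def __init__(self):
        self.blocked_companies: set = set()
        self.hidden_until: Optional[date] = None

    def visible_to(self, company: str, today: Optional[date] = None) -> bool:
        today = today or date.today()
        if self.hidden_until is not None and today <= self.hidden_until:
            # "for this period of time, make my CV unsearchable"
            return False
        # "I don't want these companies to see my CV"
        return company not in self.blocked_companies
```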

Aggregated data and machine learning

Renuka Sane, an associate professor at the National Institute of Public Finance and Policy, pointed out a thorny issue with aggregated data and the use of machine learning to build credit profiles.

“On purpose limitation, we often hear about these machine learning algorithms where who knows what variable is correlated with what. For example, kite flying predicts good credit risk. I know correlation is not causation. So when we say purpose limitation, ex-ante I may not know that this variable is a good predictor of something. But if I ran that algorithm, there is a benefit to be had in having a proxy for good credit risk. How do companies think about data collection from that perspective?”

Sinha explained that it is crucial to mask the identities of users while running these kinds of algorithms.

“The moment you get to an individual’s data and you are getting to a specific user, it is highly dangerous. Because when an employee has access to it and you don’t have policies in place, they can do all kinds of things with it. So long as you are doing stuff which is aggregated, where identities are masked and you’re looking at location information for 100,000 people, that’s okay for employees or different parties in a company to access. At MobiKwik, we have policies: if an individual’s data has to be accessed, pretty much no one else has access to it. If it has to be accessed, you need 3-4 signatories to authorize it, and a case has to be made on why a certain individual’s data has to be accessed. Some of this is organic, and for some of it you need to decide at what level the monitoring and usage of data gets into dangerous territory,” he said.
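The policy Sinha outlines translates naturally into a two-tier access check: aggregate, identity-masked queries are open, while individual-level access needs a documented case and multiple approvers. A simplified sketch, with the three-approver threshold taken from his “3-4 signatories” remark:

```python
# Simplified access-control check modelled on Sinha's description.
REQUIRED_APPROVERS = 3  # from the "3-4 signatories" in the quote

def can_access(scope: str, approvers: list, justification: str = "") -> bool:
    if scope == "aggregate":
        # e.g. masked location data for 100,000 users
        return True
    if scope == "individual":
        # A case must be made, and several distinct people must sign off.
        return bool(justification) and len(set(approvers)) >= REQUIRED_APPROVERS
    return False
```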

Raghavan added that it is also essential for businesses to use training data sets rather than exposing the entire client data set to such algorithms. She pointed to two examples to illustrate this. “One is Microcred, which only uses behavioural data; they say it’s not who you are but what you do that matters. On the other hand, you have something like Kreditech, which uses 20,000 proxies and they have the gold-standard algorithm in the world. Now, no one knows what’s working for whom yet. We know that there are certain things which are good predictors: past repayments, and transactions related to those payments. And this is what the credit bureaus have always used. Things we are using right now, like whether you sort your contacts by first name or last name, are untested. So I feel like, as you test and learn, you need to use a training set and not your entire customer data set as a way to learn,” she said.
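Her test-and-learn point maps directly onto standard machine-learning practice: evaluate a candidate proxy on a held-out split rather than on the full customer base. A self-contained sketch, using synthetic data in place of real behavioural features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 5 behavioural signals and a repayment outcome
# driven (noisily) by only the first signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# Train on a subset and validate on data the model never saw, rather
# than learning from (and trusting correlations in) the entire customer set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", round(model.score(X_test, y_test), 3))
```

A proxy like kite flying that looks predictive in-sample but collapses on the held-out set is exactly the kind of spurious correlation this guards against.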

Graded approach to data

During the discussion, it emerged that there is a need for a graded approach to what types of data can be collected and what kinds of permissions or rights could be attached to that data. Raghavan explained further:

  • “Certain types of data will have rights associated with them, built on a property framework, where you can tell people that their data can be used for services and so on, and they will be paid for it.”
  • “You can then have data which will be considered intellectual property. You could have something that you have created, which will be similar to copyright and will carry the author’s moral rights, where even after you have sold it, there might be some non-economic interest in terms of its manipulation.”
  • “You could also have a right where you could injunct someone who has said something wrong about you and get them to correct it.”
  • “Then there is a lot of talk about your digital persona as an extension of yourself. Basically, it is saying that if there are some parts of you, whether on the cloud or on your person, they will get human rights protection. So this will have very different implications for a company: if you are withholding someone’s liberty, that is one thing, but if you’re doing a fair exchange with someone and giving them a service, then that’s fine.”