Note: #NAMAprivacy Bangalore is next week. Apply to attend here.
In a system like a blockchain, where all the parties involved know all the information about a transaction, is it even possible to have privacy? If so, what kind of data should be left out of the system? #NAMAprivacy participants discussed the implications of such a system. Blockchain is widely known as the underlying technology for cryptocurrencies such as bitcoin. It’s a distributed ledger where everyone who has access to the blockchain can see exactly what is happening on each other’s ends.
But first, it is important to distinguish between a distributed database and a decentralized ledger system.
Ananth Padmanabhan, a fellow at the Carnegie Endowment for International Peace (CEIP), explained that a distributed database could very well be one where a central authority holds the data and multiple copies of that data are available for others to view. “A classic example of that could be a central government database – say with the taxman, the RBI and others in a governance framework. But that really is not blockchain,” Padmanabhan said.
“In a situation where different actors do not trust each other, what you do normally is that you would set up an intermediary that you all trust. The classic example being currency; ‘I don’t know if the currency you have given me has any value, therefore, the RBI authenticates or certifies that it has value.’ That is the centralized intermediary concept. In the blockchain situation, you are creating an architecture where you are decentralizing the intermediary’s role of certifying a particular transaction or certifying a certain asset class,” he explained.
“In manufacturing, where you have multiple nodes such as procurement, sales, the compliance guys, where all of them can view the data on the system. But we need to see if this is materially different from a distributed database. You don’t have to call it a blockchain and you can just call it an open data model because you don’t need to do any computing on it,” Padmanabhan added.
Public and private blockchains
In this scenario, blockchains can be private or public, and each has its own set of issues. “The concept of public and private in the context of blockchains is less to do with sharing but more to do with who can authenticate a transaction or not. In a public blockchain – the classic example is bitcoin – where every node can authenticate a transaction,” Padmanabhan said.
“But in the case of permissioned ledger or private blockchain, you select a certain number of entities who have preferential rights to authenticate that transaction. So when we come to access, that’s where we come to the privacy bit. So what does this do in terms of security and in terms of confidentiality? On the security front – the database is replicated across different nodes and each can view the transaction – as a result, any change which is made to the database all the nodes in a system have to verify that and authenticate that change. So on the security front, it is a plus on existing centralized databases.
“Even in a private blockchain, for example in a supply chain, confidentiality is an issue. Not everything in a supply chain transaction is meant to be viewed by all the actors and you need to restrict access. And in that space, there is still research going on. We do not have a fool-proof solution even if it is private blockchain because there are possibilities of reidentification and so on,” Padmanabhan added.
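The security and confidentiality points above can be made concrete with a toy model. The sketch below is illustrative only and is not how any production blockchain works: it has no consensus protocol, no signatures and no networking, and every name in it (`submit`, `tampered_nodes`, the node names, the sample records) is a hypothetical choice. It shows a ledger replicated across nodes in which only selected validators may authenticate a transaction, and in which a change made to one node's copy is detectable because each block commits to the hash of the one before it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, data):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "prev": prev_hash, "data": data},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a new block that commits to everything before it."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"index": index, "prev": prev_hash, "data": data,
                  "hash": block_hash(index, prev_hash, data)})


def is_valid(chain):
    """Recompute every hash; an edited block no longer matches its stored hash."""
    prev_hash = GENESIS
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["data"]):
            return False
        prev_hash = block["hash"]
    return True


def submit(replicas, validators, proposer, data):
    """Permissioned ledger: only nodes listed in `validators` may authenticate."""
    if proposer not in validators:
        return False
    for chain in replicas.values():  # every node receives its own copy
        append_block(chain, data)
    return True


def tampered_nodes(replicas):
    """Names of nodes whose local copy no longer verifies."""
    return sorted(name for name, chain in replicas.items() if not is_valid(chain))


# Hypothetical supply-chain nodes; only "compliance" may authenticate.
replicas = {name: [] for name in ("procurement", "sales", "compliance")}
validators = {"compliance"}
submit(replicas, validators, "compliance", "PO-1 approved")
submit(replicas, validators, "sales", "PO-2 approved")  # refused: not a validator
replicas["sales"][0]["data"] = "PO-1 rejected"          # a local edit on one node
print(tampered_nodes(replicas))
```

Note that every node in this toy model still sees the full text of every record, which is exactly the confidentiality gap Padmanabhan describes: restricting who can authenticate does not by itself restrict who can view.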
Blockchain is not for everyone
Padmanabhan explained that blockchain is not a solution for every use case. “It also needs to be looked at what domain you are using it. What asset class are you using?” he said. He added that it could be applied to intellectual property (IP).
“I am an IP lawyer. And for years the issue has been ‘who are these copyright societies who have centralized the whole licensing regime?’ It’s like the house always wins and the copyright societies always win. The authors and musicians, lyricists, composers don’t make money. Today, you’ve got Spotify thinking of using blockchain for content access and sharing of royalties. So can we have a system without a central authority? I think it is possible.”
“So blockchain could also apply to something like public land records or use it for tax filings where everyone’s tax filings are public,” said Aditya Berlia from the Svarn Group. However, he explained that records kept in a public blockchain could become problematic for individuals when it comes to security and privacy.
“The holy grail of any investigator is to get access to a criminal organization’s bank accounts. In a public blockchain, we have that access from day one. So the worry to privacy becomes an issue for other people as well. If all land records are public, if for example, here is someone who is very rich and has a lot of public land records, it could be used to get information to kidnap his kids. Because it becomes a personal security issue to have your land records, bank information online,” he explained.
“So what we need is to have laws on publicizing records on a blockchain. And we also need to look into what the protections of that individual are where I do have default anonymity,” he explained.
Is KYC important in a blockchain?
During the session, audience members discussed how central or regulatory authorities would approach blockchains. In such a scenario, it becomes important for users on a blockchain to have completed know-your-customer (KYC) checks.
“One of the reasons why the global financial system has been fairly centralized for the last 3-4 decades has been to combat money laundering. Once a financial regulator has access to a blockchain system, will it deter money laundering? On some transactions, you want it to be anonymized, I mean we all do stuff which we are not proud of,” an audience member chimed in.
Padmanabhan said that authorities will never allow a public blockchain system to be rolled out. “So I think it will never be a public blockchain system and it will only be a permissioned ledger system if we at all use blockchains as use cases. And by permissioned ledger, it would have a set of hierarchies where the final authority on the transaction will be the bank and the bank is not going to give the go-ahead to the transaction without KYC by actors in that particular transaction.”
However, Prashant Singh from Paytm countered, saying that blockchain systems will not need endorsement from the authorities or KYC.
“The idea of blockchain needing endorsement from the RBI or other authorities is the antithesis of the core idea. Even in the existing system, all nodes are not created equal. In bitcoin, all nodes are equally powered. I don’t think you need KYC to map a bitcoin address or a blockchain address to an individual. So what happens in that scenario, it will be ruled by reputation and the account might get blocked because of reputation. If you are doing something nefarious, forensic accounting will catch up and your bitcoins will be disabled and you lose real money then and there itself,” he said.
However, there still seems to be interest from the Reserve Bank of India, Prasanto Roy from NASSCOM said. “We’ve sat in a committee with the RBI listening to a whole presentation on cryptocurrency from e-currency.net which is also doing Estonia’s digital currency. The central regulator’s concern is that there will be a shadow economy. The principles are similar but it is not a distributed database, the same technologies are used and it is controlled at the central level. The generation of that currency is controlled at the central level.”
On differential privacy
Anonymization of user data has become a lost cause, and to limit the private user data being analyzed, there is a lot of talk about applying differential privacy to large datasets. “Differential privacy was a response where a lot of people have shown that anonymization of data has become impossible in most situations. When we’re talking about blockchain when we’re saying that most of this data is available publicly or privately to certain individuals, what differential privacy was meant to do was to take aggregated data and use that for analysis but try and minimize the privacy risks while doing that,” said Smitha Prasad, from the Centre for Communication Governance at National Law University.
“If your data is available publicly, then that’s a different situation. Differential privacy does not really help there. Where it does help is when there is a data set or study, generally that there is a large amount of data available. And there needs to be an analysis done on that. What differential privacy does is when you normally analyze the data set that is available and give an output on that, they add some random noise and add certain amount of data to make sure that you cannot trace it back. You do this so that you cannot identify the persons in the data set,” she said.
“To give you a very simple example, say you have a data set which consists of the names of students who have enrolled in a certain college and their household incomes. Now if someone publishes that data – say 200 students have an X amount of income. There are 199 students with that amount of income. If you know which student has dropped out, you know how much the student earns. So the idea of differential privacy is that you inject some amount of randomness which would say 205 or 190. So it’s as close to the truth or as accurate as possible. If you’re doing statistical analysis it shouldn’t impact the result. But it’s not the actual number so that you can’t identify who the person was,” Prasad added.
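Prasad's student example can be sketched in a few lines. The code below uses the Laplace mechanism, which is one standard way of adding the kind of random noise she describes; the article does not say which mechanism she had in mind, so treat that choice as an assumption. The privacy parameter `epsilon`, the sample records and the function names are likewise hypothetical. Smaller values of `epsilon` mean more noise and stronger privacy.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Release a count with Laplace noise of scale 1 / epsilon.

    Adding or removing one person changes a count by at most 1, so the
    query's sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def in_bracket_x(student):
    """True if the student falls in the (hypothetical) income bracket X."""
    return student["bracket"] == "X"


# Hypothetical data: 200 students in bracket X, then one of them drops out.
before = [{"student_id": i, "bracket": "X"} for i in range(200)]
after = before[:-1]

rng = random.Random(7)  # fixed seed only so the demo is repeatable
print(round(noisy_count(before, in_bracket_x, epsilon=0.5, rng=rng)))
print(round(noisy_count(after, in_bracket_x, epsilon=0.5, rng=rng)))
```

With exact counts, publishing 200 and then 199 reveals the bracket of the student who left. With noisy counts, the difference between the two published numbers is dominated by the noise, while the average over many records stays close to the truth, which is why aggregate statistics remain usable.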
Laws and standards around differential privacy
“So if you look at the way our data protection framework works right now, it’s mostly based on the nature of data that’s being shared. So you have personally identifiable information or sensitive identifiable information. Differential privacy provides a system where you are taking that kind of data and only giving the result of an analysis which does not contain anything that would be personally identifiable. The problem with differential privacy is that it is not a fool-proof system. Our law does not deal with a system like this. It doesn’t anticipate something like this. It doesn’t even anticipate data analysis the way it is being done right now. What you could look at is some of the standards which are there in other countries where they deal specifically with how you handle data for research. So say, for medical research, you have different standards whether you can publish the data or not, whether you can publish de-identified data or not. So there are some standards but those will need to be developed to deal with a system where you’re conducting data analysis on such a large scale,” Prasad added.
National health and credit registry
Note that the government is looking to build public registries for national health and credit histories. It is looking to use these registries to identify and analyze the pain points in the system and deliver better services. But along with Aadhaar, the scale of data gathering does give cause for concern. The participants discussed whether differential privacy could be applied in this case.
“The issue with differential privacy I would say is that you would need to trust the person who is holding the data and controlling the data. So, say you have private companies which are holding on to this aggregated data. Maybe each hospital has its own patients and what you could use differential privacy for would be, if the State is looking for analysis of the prevalence of diseases, perhaps they could use differential privacy saying that they will not take the data from the hospital but just take the results of the analyses that the hospital can perform on the data it already holds. The problem is that the control lies with the hospital and whether or not you’re giving the data to be used as part of this analysis. Beyond that, it is the person who holds the data who has the control,” Prasad explained.
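The arrangement Prasad outlines, where the State receives only the results of analyses and never the patient records, could look roughly like the sketch below. It is a hedged illustration, not a description of any planned registry: the `Hospital` class, the `estimate_prevalence` function, the use of Laplace noise and the sample diagnoses are all assumptions, and the sketch further assumes that each patient appears in exactly one hospital's records.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class Hospital:
    """Holds raw diagnoses locally and releases only noisy aggregates."""

    def __init__(self, name, diagnoses, rng=None):
        self.name = name
        self._diagnoses = list(diagnoses)  # raw records stay inside the hospital
        self._rng = rng if rng is not None else random.Random()

    def noisy_case_count(self, disease, epsilon):
        """Count local cases of `disease`, then add Laplace(1 / epsilon) noise."""
        true_count = sum(1 for d in self._diagnoses if d == disease)
        return true_count + laplace_noise(1.0 / epsilon, self._rng)


def estimate_prevalence(hospitals, disease, epsilon):
    """What the State sees: the sum of each hospital's noisy count."""
    return sum(h.noisy_case_count(disease, epsilon) for h in hospitals)


# Hypothetical hospitals and diagnoses.
hospitals = [
    Hospital("A", ["flu", "flu", "tb"]),
    Hospital("B", ["flu"]),
    Hospital("C", ["tb", "flu"]),
]
print(estimate_prevalence(hospitals, "flu", epsilon=1.0))
```

The design mirrors the trust caveat in the quote: the noise is added by the hospital, so the guarantee is only as good as the hospital's willingness to run the analysis honestly and keep the raw records to itself.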
The #NAMAprivacy conference was supported by Google, Facebook and Microsoft. To support/sponsor #NAMAprivacy discussions, contact email@example.com