Leading up to our #NAMAprivacy conference on The Future of User Data – for which we’re all full up – we’ve put together a list of things to read, to understand the environment around privacy, and where the conversation is headed. MediaNama is hosting this session with support from Facebook, Google and Microsoft.
Frankly, we’ve never had an event fill up this fast, which is an indication of how important and hot this topic is.
Some of these links are based on recommendations from our speakers, and some are pieces that I’ve written. We’ll probably have another reading list tomorrow as more recommendations come in. This list covers just the first two sessions, on differential privacy, blockchain, anonymisation, app permissions and Internet company databases.
If you’d like to suggest things that we ought to look at, please feel free to share links in the comments, or send them to me on Twitter (I’m @nixxin) or at firstname.lastname@example.org.
For Session 1: Approaches to personal, identification, behavioral and anonymous data
- On differential privacy: Apple tries to find a balance between collecting data about groups (what they do, like and want) to create better software, and not collecting or storing data in a manner that lets anyone extract data on a single, specific user. Read more in Wired. The book that the article refers to is The Algorithmic Foundations of Differential Privacy [pdf]
- Is anonymisation even possible? “[in 2006] Netflix, as part of a public contest to devise a better movie-recommendation algorithm, released a data set of 100 million movie ratings made by 480,000 of its customers. The online DVD provider anonymized the data before releasing it to contestants, by replacing names with random unique identifying numbers to protect the privacy of its customers. But Narayanan and Vitaly Shmatikov were able to unmask some Netflix users simply by taking the anonymized movie ratings – along with timestamps showing when customers submitted them – and comparing them against non-anonymized movie ratings posted at the Internet Movie Database web site.” Read more in Wired.
- The failure of anonymisation: “regulators can protect privacy in the face of easy reidentification only at great cost. Because the utility and privacy of data are intrinsically connected, no regulation can increase data privacy without also decreasing data utility. No useful database can ever be perfectly anonymous, and as the utility of data increases, the privacy decreases. Thus, easy, cheap, powerful reidentification will cause significant harm that is difficult to avoid.” [Read: The broken promise of privacy, PDF]
- On distributed ledgers: “Bitcoin pseudonymity allowed both rapid adoption (by avoiding dependencies on non-existent or fragmented identity infrastructures), and also preserves important aspects of Bitcoin as a currency (ie its status as an unconditional store of value). This pseudonymous relation between users and wallets is, however, not full or perfect anonymity. Chains of transactions in and out of wallets, and from wallet to wallet, are visible to all, and can be traced and tracked in public.”…”This approach could be used to enforce some Know Your Customer rules, because once a particular wallet address is identified and linked with a physical person, it is possible to uncover all of their transactions”…” Most jurisdictions do not have any strong way of linking realworld identities to online transactions, and thus a reliance on the existence of such mechanism would have prevented the deployment of Bitcoin at the time, and even today. Furthermore, given the international nature of the Bitcoin network, it is unclear which jurisdiction would have been entrusted with certifying identity information, and how one could establish whether a legal jurisdiction is entitled to identify a certain user.” [PDF].
- Blockchain and privacy: Privacy and confidentiality are hard to establish on a public blockchain, because any member of the public can obtain a full copy of the whole transaction history and use it without restriction. Even if parties try to use pseudonyms, the contents of a transaction are publicly visible, and reuse or connection of addresses through transfer of digital currency can provide opportunities for linkage attacks to re-identify participants. Read: Risks and opportunities for systems using blockchains and smart contracts [PDF]
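To make the differential privacy item above a little more concrete, here is a minimal sketch of randomised response, a textbook technique for learning about a group without being able to trust any single answer. It is an illustration only and not Apple's actual implementation; the function names and the simulated 30% rate are assumptions made for this example.

```python
import random

def randomised_response(truth, rng):
    """Report the truth with probability 3/4, the opposite with probability 1/4."""
    # First coin flip: heads means answer honestly.
    if rng.random() < 0.5:
        return truth
    # Tails: a second coin flip decides the answer, regardless of the truth.
    return rng.random() < 0.5

def estimate_true_rate(reports):
    """Recover the group-level rate from the noisy reports."""
    observed = sum(reports) / len(reports)
    # E[observed] = 0.5 * p + 0.25, so p = 2 * observed - 0.5
    return 2 * observed - 0.5

# Simulated population (assumption): 30% of 10,000 users have some trait.
rng = random.Random(0)
truths = [i < 3000 for i in range(10000)]
reports = [randomised_response(t, rng) for t in truths]
estimate = estimate_true_rate(reports)
```

Any individual report could be the result of a coin flip, so it says little about that user, while the estimate over the whole group still lands close to the real rate.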
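The re-identification described in the Netflix and blockchain items above comes down to joining an "anonymised" data set with a public one on the fields they share. Here is a minimal sketch of such a linkage attack, using entirely made-up records and a hypothetical `link_records` function. It is not the method from the Narayanan and Shmatikov work, which has to cope with far noisier data.

```python
from collections import defaultdict
from datetime import date

# Made-up "anonymised" ratings: (pseudonym, movie, stars, date rated)
anonymised = [
    ("user_1", "Movie A", 5, date(2005, 3, 1)),
    ("user_1", "Movie B", 2, date(2005, 3, 4)),
    ("user_1", "Movie C", 4, date(2005, 4, 9)),
    ("user_2", "Movie A", 3, date(2005, 3, 1)),
    ("user_2", "Movie D", 5, date(2005, 5, 2)),
]

# Made-up public reviews posted under real names: (name, movie, stars, date posted)
public = [
    ("Asha", "Movie A", 5, date(2005, 3, 2)),
    ("Asha", "Movie C", 4, date(2005, 4, 9)),
    ("Ravi", "Movie D", 5, date(2005, 5, 3)),
]

def link_records(anonymised, public, max_days=2, min_matches=2):
    """Return (pseudonym, name) pairs that share at least min_matches ratings.

    Two ratings match when the movie and stars agree and the dates are
    within max_days of each other.
    """
    matches = defaultdict(int)
    for pseudonym, movie, stars, rated_on in anonymised:
        for name, pub_movie, pub_stars, posted_on in public:
            same_rating = movie == pub_movie and stars == pub_stars
            close_in_time = abs((rated_on - posted_on).days) <= max_days
            if same_rating and close_in_time:
                matches[(pseudonym, name)] += 1
    return {pair: n for pair, n in matches.items() if n >= min_matches}

linked = link_records(anonymised, public)
```

With this toy data, `linked` is `{("user_1", "Asha"): 2}`: two overlapping ratings are enough to tie the pseudonym to a name, and with it the rating of Movie B that was never posted publicly.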
For Session 2: Startups, Apps, Permissions and understanding the need for data
- How companies like Amazon use Big Data to make you love them: In order for interactions to feel individualized and human, they must be well informed. That makes data about the customer you’re talking to right now the most useful data of all. (read)
- On App store permissions and the data Indian apps collect: Wallets (read), Banking apps (read), UPI apps (read).
- Perils in the era of dynamic pricing (and the role of data): “The lending ecosystem is likely to shift to flow based lending”…”the idea that money can be lent based on credit scoring, based on data points such as the persons monthly cash-flow data, and additional data points such as their mobile balance”…”what stops a company from using the same data and patterns for doing something like what the infamous payday loan companies do: target people who are desperate and vulnerable, and use that estimation of consumer surplus to maximise profits. This is predatory, and a problem in the making, and I won’t be surprised if it hits Indian users in a few years.” Read here.
- How Tala Mobile is using phone data to revolutionise microfinance: “The app gives Tala access to a range of data, from basic biographical information to the number of people loan applicants contact on a daily basis. Tala can see the size of the applicant’s network and support system. The data even reveal where the applicant goes during the day, whether she demonstrates consistency, like making a daily call to her parents, and whether she pays her bills on time. The revelation: a person’s routine habits are more meaningful than traditional credit scoring. Once approved, a borrower can get money downloaded onto her smart phone in two minutes.” [read]
- A Survey of Behind the Scenes Personal Data Sharing to Third Parties by Mobile Apps (2015): “We found that many mobile apps transmitted potentially sensitive user data to third-party domains, especially a user’s current location, email, and name. In general, iOS apps were less likely to share sensitive data of nearly every type with third-party domains than were Android apps, except for location data”…”we saw more iOS apps (47%) sending location data to third parties than Android apps (33%)”…”The average Android app sent sensitive data to 3.1 third-party domains, and the average iOS app connected to 2.6 third-party domains. The top domains that received sensitive data from the most apps belonged to Google and Apple.” [read]