“…if data is the new oil, (then) the internet is the new oil field, privacy technologies are the new refineries, and AI systems (are) the new engines. And today, unfortunately, we are missing the refinery layer (of privacy tech), and that is what we are intending to build for businesses,” Abilash Soundararajan, Founder & CEO, PrivaSapien, explained while giving a demo of his company’s products at MediaNama’s PrivacyNama conference.
PrivaSapien is a deep-tech privacy research firm that enables businesses to meet regulatory requirements and protect the privacy of their customers while sharing and analysing user data. It has filed five patents and operates as a B2B firm rather than a B2C one.
Soundararajan stated that the company deals with regulatory requirements across the globe, such as Europe's GDPR (General Data Protection Regulation): it visualises the privacy risk in businesses' data sets and helps mitigate it. He pointed out that businesses require technologies to meet their daily operational requirements, but very few tools exist to meet those needs.
“We are building first-of-its-kind privacy threat modelling and privacy risk visualisation in data,” Soundararajan said in his address.
PrivaSapien offers two products: Privacy X-Ray and Event Horizon. Soundararajan said that Privacy X-Ray tries to address concerns such as linkage, re-identification, and the gap between regulatory requirements and technical requirements.
The demo was a part of day three of PrivacyNama 2022 which saw companies provide an insight into privacy tech. The day also saw demos by two other companies:
- Disecto: Manav Mahajan, Co-founder and CEO
- Doosra: Aditya Vuchi, Founder & CEO
You can check out these presentations, along with Doosra's demo, in the link below. The presentation was edited for purposes of clarity and brevity.
MediaNama is hosting these discussions with support from Mozilla, Meta, Walmart, Amazon, the Centre for Communication Governance at NLU Delhi, Access Now, the Centre for Internet and Society, and the Advertising Standards Council of India.
Key takeaways from the demo
How does Privacy X-Ray work?
Let us take the example of a healthcare product. The data is very sensitive by nature, and it can be useful for planning the supply chain of medicines, accelerating drug discovery, or improving health ecosystems to prevent the spread of diseases. In all these scenarios, data needs to be shared with downstream parties. There is a lot of risk involved, and re-identification is possible despite contemporary methods such as masking, tokenisation, etc. The DPO (Data Protection Officer) or the CISO (Chief Information Security Officer) can run the data through Privacy X-Ray to visualise its privacy risk and receive a report on what kinds of attacks are possible on the data set. The report also explains why this risk is present in the data. Once this is done, the product team or the data controller can decide how to privacy preserve the data.
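To see why masking direct identifiers is not enough, here is a minimal, hypothetical sketch (not PrivaSapien's actual method) of how re-identification risk can be measured: even with names removed, a combination of quasi-identifiers such as pincode, age, and sex can make a row unique, letting an attacker with auxiliary data single out a person. The records and function names are invented for illustration.

```python
from collections import Counter

# Toy health records: names are masked, but quasi-identifiers remain.
records = [
    {"zip": "560001", "age": 34, "sex": "F", "diagnosis": "flu"},
    {"zip": "560001", "age": 34, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "560002", "age": 71, "sex": "M", "diagnosis": "asthma"},
    {"zip": "560003", "age": 29, "sex": "M", "diagnosis": "flu"},
]

QUASI_IDENTIFIERS = ("zip", "age", "sex")

def reidentification_risk(rows, keys):
    """Fraction of rows whose quasi-identifier combination is unique,
    i.e. rows an attacker with auxiliary data could single out."""
    combos = Counter(tuple(r[k] for k in keys) for r in rows)
    unique = sum(1 for r in rows if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(rows)

risk = reidentification_risk(records, QUASI_IDENTIFIERS)
print(f"{risk:.0%} of rows are uniquely identifiable")  # 50% of rows are uniquely identifiable
```

Half the rows here are singled out by (zip, age, sex) alone, which is the kind of linkage risk a Privacy X-Ray-style report would flag.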
What is Event Horizon?
It is used for privacy preservation of data, though there can be other modes of privacy preservation as well. It uses PETs (privacy-enhancing technologies), including differential privacy and generalisation methods, to protect the data. The protected data can once again be given to the DPO, who can visualise the risk, which will have gone down significantly. They can then say: "this is acceptable for business collaboration for this specific use case and the data starts flowing downstream". Companies can thus balance the privacy risk in data ecosystems against the utility of the data, and empower their executives to unlock data without too many restrictions. It's a huge value proposition.
How does it work?
You first design the requirements for privacy preservation and anonymisation in a given context. Event Horizon then runs multiple AI-based models and algorithms and transforms the data based on the requirements of the company set to receive it. For example, a marketing agency that wants to understand patterns in the target audience, like age group, segmentation, wants, needs, etc., can derive these from the data without it being personally identifiable or re-identifiable. Event Horizon can also be plugged into organisations' data pipelines, automating the entire privacy flow within data ecosystems and reducing the time taken for decision-making.
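Differential privacy, one of the PETs mentioned above, can be illustrated with a small sketch. This is a generic textbook example of the Laplace mechanism, not Event Horizon's implementation: an aggregate query (here, a count) is released with calibrated noise, so the marketing agency learns the pattern but no individual's presence can be inferred. All names and the epsilon value are illustrative.

```python
import random

def dp_count(values, predicate, epsilon):
    """Release a count under epsilon-differential privacy using the
    Laplace mechanism. A count query has sensitivity 1, so the noise
    scale is 1/epsilon; smaller epsilon means stronger privacy."""
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise

ages = [34, 34, 71, 29, 45, 52]
# How many people are aged 40+, released with a privacy budget of 0.5.
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5))
```

The true answer is 3; repeated queries return values scattered around it, and the epsilon parameter is the dial a DPO could turn to trade utility for privacy.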
“We are transforming data pipelines, which are privacy blind, and making them privacy aware and privacy conscious in the process,” Soundararajan said.
What kind of a reduction are we talking about?
A data protection impact assessment is a critical requirement, but it takes two to four weeks today because it is done manually. The product can accelerate the process to roughly five minutes, in which it examines possible privacy attacks in detail and provides a report. Moreover, once the data is privacy preserved, cross-border transfer concerns are addressed and the data can flow across boundaries.
What is unified privacy risk score?
For example, a hospital has data that it wants to share with a party. They have location-specific health information. They can run Privacy X-Ray, and then a report would be generated after the processing is completed. It gives a unified risk score in the report. We also do a lot of privacy attack simulation.
What if there is no personally identifiable information (PII)?
The absence of PII does not mean that your data is secure. The AI classifies different kinds of data, such as quasi-identifiers and statistical and numerical data, and provides a heat map of which attributes leak the most risk in different scenarios.
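The idea of an attribute-level heat map can be sketched with a crude, hypothetical scoring rule (PrivaSapien's scoring is proprietary; this is only an assumption for illustration): score each column by the share of rows whose value is unique within that column, so columns whose values single people out rank highest even though none of them is classic PII.

```python
from collections import Counter

rows = [
    {"pincode": "560001", "age_band": "30-39", "visits": 3},
    {"pincode": "560001", "age_band": "30-39", "visits": 5},
    {"pincode": "560001", "age_band": "70-79", "visits": 1},
    {"pincode": "560003", "age_band": "20-29", "visits": 8},
]

def attribute_risk(rows, attr):
    """Crude per-attribute leakage score: the share of rows carrying a
    value that is unique within the column (1.0 = every value unique)."""
    counts = Counter(r[attr] for r in rows)
    return sum(1 for r in rows if counts[r[attr]] == 1) / len(rows)

heat_map = {attr: attribute_risk(rows, attr) for attr in rows[0]}
for attr, score in sorted(heat_map.items(), key=lambda kv: -kv[1]):
    print(f"{attr:10s} {score:.2f}")
```

Here `visits` scores 1.00 (every value unique), `age_band` 0.50, and `pincode` 0.25, so a heat map would colour `visits` hottest despite it containing no PII at all.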
What if the data is not clean as it’s quite often improperly tagged?
The tool handles data sets even when they contain a lot of blank values. The algorithm can identify privacy risk even when information is missing, but it is only applicable to structured data.
Is it necessary to upload CSV files directly?
We deploy this system on customer premises. We typically don't recommend running it as an external service. We integrate our algorithm with the data so that the data does not leave the customer's premises or cloud. The algorithm goes and meets the data in the customer ecosystem when the data is made available. We don't get access to any data.
What are the classifiers used by Privacy X-Ray for identifying and checking privacy risks?
We have a proprietary algorithm in which we look at the attributes of PII defined by the GDPR, and at quasi-identifiers and statistical data as per the Article 29 Working Party guidance. We identify attributes on a spectrum of risk. We build the model, simulate privacy attacks in the ecosystem, and provide the risk score based on that spectrum.
Does a low score mean that the identification is harder?
Correct. The risk of re-identification is much higher if the score is high.
Is the anonymisation of data the only privacy-protecting mechanism used? What about the dangers of re-identification?
Anonymisation and re-identification are umbrella terms, not specific ones, and that is why some anonymisation or de-identification methods are vulnerable to re-identification. We use the Expert Determination method requirements of the US as well as the GDPR's anonymisation requirements, and we make this available as a configurable option for what level of aggregation and privacy preservation is required. We also use differential privacy and local differential privacy, which preserve the privacy of the data for downstream collaboration.
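Generalisation, the configurable aggregation mentioned above, can be shown with a minimal sketch (a generic k-anonymity-style technique, not PrivaSapien's configuration): exact ages are coarsened into bands so that no record's age is unique, at the cost of some analytical precision. The function name and band width are illustrative.

```python
def generalise_age(age, width=10):
    """Replace an exact age with a coarse band, e.g. 34 -> '30-39'.
    Wider bands mean lower re-identification risk but lower utility."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

ages = [31, 34, 38, 45, 47]
print([generalise_age(a) for a in ages])
# ['30-39', '30-39', '30-39', '40-49', '40-49'] — no single age stands out
```

Making the band width configurable is exactly the aggregation-level dial described above: a DPO can widen the bands until the visualised risk score drops to an acceptable level.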
What are the factors you take into account for the privacy risk scale?
A lot of it is proprietary, but what I can disclose at this point is that we simulate multiple attacks. Our classification of attributes is cutting edge today, because the majority of deployments only try to identify risk if PII is present in the data. We go much beyond this, because regulatory requirements do not stop with PII. We try to capture the risk present in indirect identifiers as well and provide a holistic risk score.
Do you assist organisations in data mapping?
No. We don’t do discovery at this point.
Is your primary key always kept in an identifiable manner to maintain referential integrity?
We don’t do anything of the sort, including keeping a primary key. We look at the data and see the risk. We don’t create a primary key for users or try to identify people uniquely. We cater to all kinds of data.
Is there a zero score?
It’s an interesting question. No, there is no zero score, because information that is shared always carries a risk, though it can be very minimal.
This post is released under a CC-BY-SA 4.0 license. Please feel free to republish on your site, with attribution and a link. Adaptation and rewriting, though allowed, should be true to the original.
Also read:
- Doosra seeks to simplify business communication with a layer of privacy #PrivacyNama2022
- How Does Geopolitics Shape Cross-Border Data Flows, And What is India’s Diplomatic Stance? #PrivacyNama2022
- Government says “trust us” with data but must a democracy be expected to trust the government? #PrivacyNama2022
