Modak Analytics, a Hyderabad-based web analytics company, claims to have created a “big data based Electoral data repository” after scraping information on 81.4 crore voters from the Election Commission website. The company now plans to analyze this data to help parties or candidates “raise funds, design a tailored communication to target a select few voters, rework advertisements and create detailed models for voter engagement in battleground states as well as in gender and voter clusters to increase the power of micro-targeted strategy,” the company said in a statement to the Economic Times.

A sample of the information is on its website (pdf). For a given constituency, Modak Analytics was able to provide a caste-based split, the number or percentage of Muslim voters, and a break-up by age, and to list the constituencies with the most celebrities. The company claims it used in-house automation technologies to scan through nine lakh PDFs, running to 2.5 crore pages, to get details of all the voters. Its biggest challenge? The extraction and transliteration of the information, so that it could be merged with other systems.

“Data from multiple sources like Census, Economic and Social surveys were mapped to polling booths. Simultaneously, external and proprietary data sources had to be fused with individual voters’ data. Because of this complex nature, no big IT company ever ventured into this”, Aarti Joshi, EVP and co-founder of Modak Analytics, told ET.

Why is this a problem

The idea of using election data for marketing is not entirely new and has been suggested in the past by Netcore founder Rajesh Jain. It is also worth noting that Cobrapost touched upon the issue of micro-targeting when it released Operation Blue Virus last year.

Modak can now use this data to sort the population on the basis of caste, religion, age and gender, among other demographic information. While micro-targeting sounds good on paper from a marketing perspective, we need to remember that Modak holds this information without the consent of the voters. Does Modak have the right to use information scraped from the EC website to offer such services to political parties? What is the guarantee that political parties will use the data to micro-target audiences only for good purposes?

From a privacy perspective, who has the rights to the data: the Election Commission or the individual? Shouldn’t the Election Commission have looked at privacy issues before making this data so freely available online?

What’s even more shocking is that Modak claims this information is in the public domain, when it is not clear that the Election Commission ever wanted private companies to use the data of all Indian citizens in such micro-targeting campaigns. Then there is also the risk of such information being added to Aadhaar or the National Population Register (NPR). While Aadhaar might have had several setbacks, there is a chance that the BJP will enforce NPR, which has most of the privacy issues UIDAI’s project had.

UIDAI had said that it would only share the information that is pertinent when businesses use Aadhaar to authenticate, but with such scraped data floating around, what is the guarantee that businesses won’t link the two? If something like that happens, there is nothing you can do, since India does not have a privacy law yet.

How did they do it

The Election Commission had set up a tool that lets you search by name or voter ID to find the polling booth assigned to you. It turns out there was a way to use this tool to get the voter rolls for every state and union territory of India. A 17-year-old developer, Raghav Sood, had pointed out these issues a while back on Medium. He had also written about how he managed to write a script to scrape this data from the Election Commission’s website.

From the look of it, Modak Analytics also did the same thing, except they are now offering the data to companies, politicians and parties that want to use this information.

Who is to blame: Election Commission or Modak?

This is a question that needs to be answered now. Was it responsible of the Election Commission to put up all the voter data online in a format that could be exploited by a bot? There is not even a captcha in place to stop such activity. There is no backend process monitoring for scraping either. These are things the EC should have put in place before putting all this information out in the open.
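To give a sense of how little the missing backend monitoring would take, here is a minimal sketch in Python of a sliding-window request counter. It is purely illustrative and does not describe anything the Election Commission runs: the class name `ScrapeMonitor`, the thresholds and the client identifier are all assumptions made for this example.

```python
from collections import defaultdict, deque


class ScrapeMonitor:
    """Flags clients that exceed a request budget within a sliding time window."""

    def __init__(self, max_requests=30, window_seconds=60.0):
        # Both thresholds are illustrative assumptions, not recommended values.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Client identifier (e.g. an IP address) -> timestamps of recent requests.
        self._hits = defaultdict(deque)

    def allow(self, client_id, now):
        """Record a request at time `now` (in seconds).

        Returns False once the client has used up its budget for the window.
        """
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Candidate for a captcha challenge or a temporary block.
            return False
        hits.append(now)
        return True
```

A search page could call `allow()` on every lookup and show a captcha whenever it returns False. A determined scraper can still spread requests across many addresses, so a check like this raises the cost of bulk extraction rather than preventing it.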

Can we blame Modak for scraping all this data and offering services around it? Of course, but did the Election Commission give the impression that the data is actually open source by not putting any security hurdles in place? How long do we have to wait before the EC decides to fix this issue, or before it makes a public announcement against the use of its data for marketing purposes?