In January, the Election Commission ordered all states’ chief electoral officers to put voter rolls listed on ECI’s websites behind a captcha, ETTech reports. The January order also said that voter rolls must be displayed as an image as opposed to text. These changes were made to make it harder for people to access voters’ personal information, and to frustrate attempts at crawling this information automatically. “It has been decided that electoral rolls should be published on (the) website in image PDF only. If presently-available PDF electoral rolls are not image PDF, then the same shall be done immediately,” ET quoted the circular as saying.
Om Prakash Rawat, India’s Chief Election Commissioner, said that these steps were taken in the wake of Facebook’s Cambridge Analytica controversy, where many users’ data was shared without explicit consent and used for election ad targeting. Rawat told ETTech that the combination of no longer storing this information in machine-readable text, and putting it behind a captcha, would “protect voters’ data from data harvesting and data manipulation”.
Captchas are meant to keep automated bots out of webpages that are intended for human visitors. For example, a website may guard against automated logins by generating a new random code on every visit, which a human has to read and type in. Image PDFs, in turn, store each page as a picture rather than as text, which prevents the data from being extracted and processed directly.
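To illustrate the idea of a code that changes on every request, here is a minimal Python sketch. The function names `new_captcha_code` and `check_captcha` are hypothetical and are not taken from any EC website; a real captcha would also render the code as a distorted image, which this sketch leaves out.

```python
import hmac
import secrets
import string

# Characters the code is drawn from (an assumption for this sketch).
ALPHABET = string.ascii_uppercase + string.digits


def new_captcha_code(length: int = 6) -> str:
    """Return a fresh random code; call this once per page load or session."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_captcha(expected: str, typed: str) -> bool:
    """Compare what the visitor typed with the code issued for their session."""
    cleaned = typed.strip().upper()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), cleaned.encode())
```

Because the code is regenerated each time, an answer that worked for one visit is of no use for the next one.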
MediaNama’s take: will this work?
When MediaNama tried to access the voter rolls in Delhi, the captcha was a static set of six numbers which did not vary in presentation at all, so it is possible to use OCR software to read every single captcha and automatically pull data from the EC website. Not that this will be necessary, since the PDF files on the Delhi EC’s server are not protected: once you access a voter roll file, you can share that URL with someone else and they will be able to access the PDF without encountering a captcha. What’s more, the PDF files are named sequentially, which means that you can just change the number of the roll in the URL, and you’ll get a whole new voter roll.
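For contrast, here is a rough sketch of what a protected file link could look like: the server attaches an expiry time and a signature to each link, so a copied URL stops working after a while and changing the roll number invalidates it. This is an illustration under assumptions, not a description of how the EC's servers work. `SECRET_KEY`, `sign_link`, `verify_link` and the example path are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret; it must never be sent to the browser.
SECRET_KEY = b"replace-with-a-server-side-secret"


def _signature(path: str, expires_at: int) -> str:
    """HMAC-SHA256 over the file path and its expiry timestamp."""
    message = f"{path}|{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_link(path: str, expires_at: int) -> str:
    """Build a link that is only valid for this path until expires_at."""
    return f"{path}?expires={expires_at}&sig={_signature(path, expires_at)}"


def verify_link(path: str, expires_at: int, sig: str,
                now: Optional[int] = None) -> bool:
    """Reject links that have expired or whose path or expiry was altered."""
    current = int(time.time()) if now is None else now
    if current > expires_at:
        return False
    expected = _signature(path, expires_at)
    return hmac.compare_digest(expected.encode(), sig.encode())
```

With a scheme like this, a signature issued for one roll does not verify for the next number in the sequence, because the path is part of what is signed.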
As for converting all the rolls into an image format, OCR technology has existed since the 20th century to convert printed text into a machine-readable format with highly precise results.
This bare-minimum form of securing personal data on voters is fundamentally ineffective to begin with, as it only makes the process a little harder overall. But as we saw with the Delhi voter rolls, even this bare-minimum technology to secure voter data has not been implemented effectively. Rather than attempting to restrict automated analysis of voter rolls in the public domain, the EC is better off questioning the presence of such lists on the internet in the first place. Especially when electoral calculations are involved, putting up entire voter rolls on the internet, instead of just letting users verify their presence on the rolls online, can have multiple undesired consequences.