How’s this for timing… a significant part of the discussion at the Technology, Transparency and Accountability Camp yesterday, organized by Accountability Initiative, was around making data public, and the challenges faced with both acquiring and collating data; and today, the Economic Times reports that Data.gov.in is being launched, once the National Data Sharing and Accessibility policy is approved by India’s Cabinet of Ministers. This is in line with what US CIO Vivek Kundra has pushed through in the US with Data.gov. Both Kundra and US CTO Aneesh Chopra were in India in March this year, as well as Tim Berners-Lee, and from what I’ve heard, they met with Indian government representatives – those working with the UID project, as well as Sam Pitroda – on Open Data initiatives. We’d made a case for Open Data earlier.

Benefits: The Powercuts.in Example

The Centre for Internet and Society has outlined benefits and challenges in its report on Open Government Data, but I’ll illustrate the opportunity and benefits with an example – one that I’ve been personally involved with (as a volunteer):

A month ago, taking a cue from an innocuous Twitter discussion on power cuts between Shefaly Yogendra, Netra Parikh and me, Ajay Kumar set up PowerCuts.in (within 28 minutes of the first tweet, incidentally), using the Ushahidi platform. The Ushahidi platform has been used previously for Vote Report India. A crowdsourced document with suggestions and objectives was soon set up, and reports of power cuts have been coming in via Twitter over the past month or so.
What surprises me about these reports is that the southern states in India also have power cuts. There are more reports of power cuts from South India than from the north, because more people are reporting from there, but staying in Delhi, I had assumed that power cuts were primarily a north India phenomenon, where power theft may be high. For me, this is about creating awareness, and busting some myths.

To What End, and How Can Data.Gov.in or RTI Help?

The question that people have been asking of PowerCuts.in consistently is – what are you going to do with this information? For me, it’s primarily about highlighting the power cut problem in India and busting some myths. In India, we’ve almost taken power cuts for granted, and I think we should stop using words like ‘Load Shedding’ and ‘Planned Cuts’, because they almost make power cuts acceptable. Now if we had government data online, here’s how PowerCuts.in (or journalists) could use it:

1. Transformers and Wiring: Get data on the age, replacement and upkeep of transformers in cities and towns across India. I’ve noticed that in North Delhi (where I stay), power cuts have come down considerably since North Delhi Power Ltd (NDPL) took over many years ago. Once they took over, they changed wiring across the area, changed transformers, and there has been very little cause for complaint. Why just journalists or PowerCuts.in – as citizens, you should be able to query that information.
2. Calculate cost of backup power: Given the state of power cuts in India, we all have backup power and generators:
– Cost of Imports: Query the Customs and Excise departments on import/manufacture of inverters and generators, to determine how much these imports cost the country each year.
– Running Cost: Determine the efficiency of specific inverter and generator sets, and estimate the additional running cost of inverters. Remember that the efficiency of running inverters and generators will be much lower than that of power mains – generators include additional fuel cost, while inverters draw extra load from the mains. If we don’t draw that extra power from the mains, then there would be more power for distribution to other areas.
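A rough sketch of the running-cost estimate for inverters, in Python. Every number in it is a placeholder assumption of mine, not measured data: the backup energy used, the inverter-plus-battery efficiency and the tariff are all hypothetical, and would need to be replaced with real figures of the kind open data could provide.

```python
def extra_mains_energy_kwh(energy_delivered_kwh, round_trip_efficiency):
    """Extra energy drawn from the mains to deliver a given amount of
    backup energy through an inverter and battery."""
    if not 0 < round_trip_efficiency <= 1:
        raise ValueError("round_trip_efficiency must be in (0, 1]")
    energy_drawn_kwh = energy_delivered_kwh / round_trip_efficiency
    return energy_drawn_kwh - energy_delivered_kwh


def extra_cost_rs(energy_delivered_kwh, round_trip_efficiency, tariff_rs_per_kwh):
    """Cost in rupees of the energy lost to charging and conversion."""
    lost_kwh = extra_mains_energy_kwh(energy_delivered_kwh, round_trip_efficiency)
    return lost_kwh * tariff_rs_per_kwh


# Hypothetical household: 2 kWh of backup power a day,
# 80% round-trip efficiency, tariff of Rs 4 per kWh.
daily_loss_kwh = extra_mains_energy_kwh(2.0, 0.8)
daily_cost_rs = extra_cost_rs(2.0, 0.8, 4.0)
print(f"Extra energy drawn per day: {daily_loss_kwh:.2f} kWh")
print(f"Extra cost per day: Rs {daily_cost_rs:.2f}")
```

Multiply the per-household loss by the number of inverters in use and you get the power that could have gone to other areas, which is exactly the figure we can't compute without the data.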

The question I’d like answered: can we dispense with all the inverters in the country, and still maintain the status quo of availability of electricity? If we have the data, then someone can use it to answer this question, and I’m sure there are many, many more. For example, some might be interested in information on money spent so far on cleaning up the rivers Ganga and Yamuna (related TEDxDelhi video; disclosure: I was a co-organizer at TEDxDelhi).

If you give access to useful data, you can never tell what good can come of it. This data, incidentally, can still be acquired via the Right To Information Act, but that takes time, and often – as we’ve experienced at MediaNama – departments are not particularly willing to share data. That brings us to challenges.

Challenges

1. Digitization: Availability of data in a digitized format is a key issue: government has records on just about every transaction it does, and almost all of this is paperwork.

2. Willingness to share information: There also appears to be a reluctance to provide information that is already digitized, and we find that sometimes, if the question isn’t worded the right way, the answers are vague.

3. Copyright of information: As you may be aware, we received a takedown notice from a government organization, claiming copyright of data, on the basis of which we had published some reports. The data was publicly accessible, but government departments claim copyright to their information, which means that we cannot release the data. I’m not sure if the raw data we receive from various government departments via RTI can be shared, so we use them for analysis at MediaNama Charts. Our opinion is that the Indian government should not claim copyright over data. Internationally, governments have adopted open data licenses.

4. Government Looking To Monetize Information: At the Technology, Transparency and Accountability camp yesterday, during a discussion, Chakshu Roy of Parliamentary Research Initiative mentioned that the Census data is available online, but they’re charging around Rs. 50 for access to each table: his broad estimate was that if you take all the tables online, then the cost of accessing that information might work out to Rs 11 crore. You have to ask this question – why isn’t it free to use? For example, we could use the census data and compare it with mobile subscriber base in cities, as well as with availability of spectrum per citizen.
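To put that estimate in perspective, here is the back-of-the-envelope arithmetic as a short Python sketch. It takes the two figures quoted at the camp at face value (around Rs 50 per table, roughly Rs 11 crore in all); both are approximate, so the result is only an order of magnitude.

```python
CRORE = 10_000_000  # 1 crore = 10 million

price_per_table_rs = 50      # "around Rs. 50" per table, as quoted
total_cost_rs = 11 * CRORE   # "Rs 11 crore", a broad estimate

# Number of tables implied by the two quoted figures
implied_tables = total_cost_rs // price_per_table_rs
print(f"Implied number of tables: {implied_tables:,}")
```

That works out to about 22 lakh tables behind a paywall.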

The CIS India report mentions other issues:

5. Reliability of Information: if there are inaccuracies in collecting information, then the information shared by the government will be inaccurate as well.

6. Semantic Interoperability: Different governance units might have been used in measuring related data, and this isn’t just an issue with the government. Deepak Shenoy mentioned yesterday that on the National Stock Exchange, companies report dividend in multiple formats: some as ‘div of 40%’, others as ‘dividend of 40%’, ‘div Rs 5 per share’, ‘dividend-Rs.5 Pr Share’. Parsing all of this information is next to impossible. The CIS India report states: “Or the same term might be used with different meanings in different departments’ reports (or even the same department’s in different points of time). Thus making sense of the data becomes difficult if not impossible.”
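To give a sense of what that parsing involves, here is a minimal Python sketch that normalizes just the four dividend strings quoted above. The regular expressions and the `parse_dividend` name are my own illustration, not anything the NSE or CIS provides, and real filings will have far more variants than this handles.

```python
import re

# Both patterns are illustrative assumptions that cover only the four
# sample strings quoted in the post.
PER_SHARE = re.compile(
    r"div(?:idend)?\W*(?:of\s+)?rs\.?\s*(\d+(?:\.\d+)?)\s*pe?r\s+share",
    re.IGNORECASE,
)
PERCENT = re.compile(r"div(?:idend)?\D*?(\d+(?:\.\d+)?)\s*%", re.IGNORECASE)


def parse_dividend(text):
    """Return (kind, value) for a dividend string, or None if unrecognized."""
    match = PER_SHARE.search(text)
    if match:
        return ("rupees_per_share", float(match.group(1)))
    match = PERCENT.search(text)
    if match:
        return ("percent", float(match.group(1)))
    return None


samples = [
    "div of 40%",
    "dividend of 40%",
    "div Rs 5 per share",
    "dividend-Rs.5 Pr Share",
]
for sample in samples:
    print(sample, "->", parse_dividend(sample))
```

Four formats need two patterns and a fallback. Every new spelling a company invents means another rule, which is why a standard reporting format matters more than clever parsing.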

Many More: CIS India also mentions issues related to “lack of transparency and interdepartmental coordination in data collection, Lack of good internal recordkeeping practices, Lack of interconnections between data sets by different departments, and crossverification, Lack of interoperability between the different formats in which data is published, Bottlenecks in web publishing, especially due to not using content management systems and centralizing web publishing authority within a department. This results in delays and often in nonpublication of information. Thus, one department will often not get to find out what other departments are doing, leading to different departments working in silos.” More here.

So, we’ll probably get to know in a month’s time how the data is being released, and in what format. Hope we get something to use for MediaNama Charts.