“Parity is the word,” said Venkatesh Hariharan at #NAMA: The Digital Future of Indic Languages, explaining that “If I use a device in English, we should make it as easy for anyone to use it in Hindi, Marathi, Gujarati. To arrive at that word, what are the things that need to be done at the back end? Some of the fundamental issues such as fonts are going away. Thankfully, one of the good things is that Google has released some exceptionally high quality fonts in the public domain.”
Hariharan mentioned that the Indian government has invested Rs 50-60 crore in creating language resources – fonts, dictionaries, thesauruses, and OCR tools and technologies – and that there is no real way of monetizing them. These resources need to be open sourced, because they are not accessible to people outside government and academic institutions.
“A lot of those investments will become redundant very soon. If you look at the license of the fonts, you can download it for your use. The challenge is how many people download and install it? So far, computing in Indian languages is a niche product. There are only two categories of people who do this: the writers and publishers. The other is the hardcore maniac. There is a category of people who are early adopters, and they will install the fonts etc. Once you cross that group, there is a huge chasm, which it will take a huge effort to cross. Late adopters are not interested in fiddling around. It just has to work.”
There is a surge of hope, because of the BJP’s focus on the e-Bhasha project, which Hariharan believes is being given a mission mode status.
How to measure the success of government investment in Indic
Hariharan believes that “it is time to move from investing in technology to making sure it impacts the lives of millions. The government has done a great job of investing in fundamental technologies in translation and OCR, but they need to measure its success by how many people use it on a daily basis. We have to count it in millions of people. The need for this is going to become increasingly acute. The government of India has published close to 9000 datasets, but no surprise that all of them are in English. People who know Indian languages will soon be second or third class citizens of the digital world. Is that the kind of future we want to build?”
Summit Information Systems MD Rakesh Kapoor said he wants the government to look into five things:
1. Government should have a policy that they will not buy a device that is not localised in all Indian languages. They may not be able to enforce a policy of mandating that all devices should support Indic, but they themselves should not purchase any device that is not localised.
2. All government tenders should be published in Indic languages
3. All the e-governance should be done using a 3 language formula: state language, Hindi and English.
4. Government funded technology should be made open source
5. Any law that is not published in the language that I understand should not be applicable to me. If I don't understand the law, how can I abide by it?
He added that in most of the places where translation work has progressed, it has been done through proceedings of Parliament. “If the same thing can be done for Indian government Parliament proceedings, in parallel languages.
The Digital Library of India has a lot of scanned info, and that can be made available outside.” Kapoor also pointed out that the government has legacy data which it wants to convert into indexable, searchable data. “They haven’t found good quality solutions for that. We are working with 4-5 governments to convert their legacy data into digital formats.”
Sriram Hebbar, CEO of OneIndia, called for Indic publishers to get advertising support from the government via DAVP, saying that “Today, DAVP says you need 5 million pageviews to be empaneled. That needs to be reduced for Indic languages, so that those sites get a boost. Government should say that 70% of DAVP (online) advertising should be on Indic languages for the next 3 years, and 30% to English. State DAVPs should allocate 90% of spends on local language sites. Today we get zero from the state government.” Oneindia is empaneled with DAVP.
Arvind Pani, Founder of Reverie Technologies, believes that Indic has been approached in far too fragmented a manner, and tools and technologies are still being derived from English, or the approach to English. “Swype will not necessarily work with an Indian language keyboard. Indian characters come in different shapes and sizes.” That needs to change.
Manorama Online COO Mariam Mathew pointed out that the Kerala government insists on having their content in Malayalam, while the Karnataka government isn’t doing that. That’s one place to start.
One key wish for Google: it has more access to data on Indic than any publisher, or Comscore. Indic publishers want Google to release an Indic Languages report.
#NAMA Indic: The Digital Future of Indic Languages was supported by Google India.