This January, MediaNama held an open house discussion on the future of Indic languages online, supported by Google. This is Part 4 of our coverage of the discussion. Read Part 1 here, Part 2 here, and Part 3 here.

What would a wishlist for getting Indic languages online look like? This is what Vivekanand Pani from Reverie Language Technologies had to say.

[On discoverability of content,] China has been a very good example: it never did small things like having domain names in Chinese (that feature has never existed), and yet its Internet has grown phenomenally without Chinese domain names. The entire Internet in China practically runs in Chinese. Most users don’t even have email addresses; they have worked around everything, from payments to messaging, and it’s pretty well done.

Content creation

I won’t go into what helped China solve all these problems; I wrote a blog about that after visiting. Here is what we should probably be focusing on. Google says that 56% of all the content on the Internet is in English. How did that much content come to be in English? Who put a gun to everyone’s head and said ‘create content in English’? That didn’t happen. All that content was created voluntarily, because the ecosystem had a solution [for English].

When the Internet was in English, people were working on many different things and building different solutions; the fonts were beautiful. Everyone was working on one part of the problem, so the ecosystem kept growing and people kept creating. Similarly, there are creative people for every language. Everybody wants to write poetry, make art, or keep a blog; people love to express and create. So facilitating content creation is the most important thing, and focusing on that has remained elusive for Indian languages.

I agree with the point that speech is definitely one key solution. The second point is that even with auxiliary support for content creation, we cannot forget the basics. The fundamental comparison of English with Indian languages has been one of the biggest blunders. How do we write English? We learn the 26 letters of the alphabet, learn spelling, and start writing. It’s not that easy in Indian languages.

If we want to write in any Indian language, the learning itself is a three-step process, because that is how our scripts work. We first learn the alphabet; then we learn the barah khadi, joining and combining the vowel signs (matras) with the consonants; and after that we study conjuncts. Only then do we know how to write. If learning to write is a three-step process, how can you expect someone to figure out how to type all on their own? That is the expectation we have of people, and therefore we keep saying ‘it’s difficult’.
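To make that layering concrete, here is a minimal Python sketch (ours, not the speaker’s) of how those steps look at the Unicode level: a matra attaching to a consonant, and a virama fusing two consonants into a conjunct.

```python
# Step 1: letters of the alphabet are individual codepoints.
KA = "\u0915"       # क  DEVANAGARI LETTER KA
SSA = "\u0937"      # ष  DEVANAGARI LETTER SSA

# Step 2 (barah khadi): a vowel sign (matra) combines with a consonant,
# so one written syllable is built from two codepoints.
I_MATRA = "\u093F"  # ि  DEVANAGARI VOWEL SIGN I
print(KA + I_MATRA)        # कि

# Step 3: a virama joins consonants into a conjunct.
VIRAMA = "\u094D"   # ्  DEVANAGARI SIGN VIRAMA
print(KA + VIRAMA + SSA)   # क्ष (the conjunct "kSa")
```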

Teaching Indic in school

The fundamental thing is to start teaching it in school. My son started school three or four years ago; as soon as he crossed pre-K, they were teaching him how to use a computer, teaching him MS Word and Paint. What do people do in MS Word? They type. They taught him Hindi too, but they never taught typing in Hindi. You have assignments in Hindi, you have writing sessions, so why don’t you have a typing session? Why can’t a poem or a question session be typed out in Hindi?

If the kid at least learns the fundamentals of typing, he will know them forever. We don’t have this. We know it is very difficult, and still we don’t cross these barriers.

Technical missteps

The third massive deterrent, I think, is that the majority of good-quality content was created in the ’90s, and from the mid-2000s till now, the Indian-language content on the Internet has been practically bad, not really that good. I’ll give you the reasons. Until the ’90s, publishers depended entirely on digital DTP (desktop publishing), and that industry flourished. How did it flourish so much? Because it depended on 8-bit fonts. When font standards moved to OpenType, the designers who were great at what they did could not also build OpenType tables, because that required programming knowledge.

We have spent a lot of money and time and still haven’t gotten great quality, because these fonts were not created by passionate people; they were created because someone was paid to create them. The older fonts died because we had no vision for backward compatibility. We didn’t want to move the technology forward in a way that also carried along everything that had already been created. The OpenType standard itself was that bad.

We must teach people how to type, we must facilitate content creation, and we must support speech. The government can take some measures to go back over the standards, correct certain things, and enforce backward compatibility. There’s an enormous amount of great content that has already been created but is undiscoverable because of this. So there needs to be some incentivisation to use legacy content. People have created tools to help with legacy discovery and conversion, but a lot of that is missing, and a degree of incentivisation is missing too.
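For a sense of what such conversion tools do: legacy 8-bit fonts stored font-specific glyph codes rather than characters, so a converter remaps each byte to Unicode (real tools also reorder glyphs and resolve ligatures). Here is a minimal sketch; the mapping table is entirely illustrative and not any real font’s encoding.

```python
# Hypothetical mapping from one legacy font's byte values to Unicode.
# Every 8-bit font family had its own, mutually incompatible table.
LEGACY_TO_UNICODE = {
    0x6B: "\u0915",  # illustrative byte for क (KA)
    0x69: "\u093F",  # illustrative byte for ि (vowel sign I)
}

def convert_legacy(data: bytes) -> str:
    """Remap legacy font-encoded bytes to Unicode text.

    Real converters also reorder glyphs (the i-matra was often stored
    before its consonant) and decompose conjunct ligature glyphs.
    """
    return "".join(LEGACY_TO_UNICODE.get(b, "\ufffd") for b in data)

print(convert_legacy(bytes([0x6B, 0x69])))  # कि
```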

The interoperability issue is being solved incorrectly

The last point is about the mandate. With the most recent mandate on mobile phones [supporting local languages], we have mandated something that looks primitive. Who uses SMS? Most people don’t use SMS anymore; they use WhatsApp. If there has to be a mandate, it has to support the will [of the users]. Therefore it has to be 100% localisation plus compatibility. If you send a message from one phone to another, or look at the same content on an Android phone, a Windows phone, and an iPhone, it is likely to look different, because they are not interoperable.

The interoperability problem is also being solved incorrectly. Those same phones are not going to be interoperable even with the feature phones on which local languages are being mandated.

To explain [why this interoperability problem happened] I’ll have to get slightly technical. We had already solved all the script and language display issues back in the 1970s, and in the ’80s we released standards that described how script display should work in the digital medium. When OpenType was introduced by Microsoft and Adobe, it focused on glyph grammar instead of script grammar. Because of that, on your Android phone, your Windows device, or even an Apple device, you can link two or three matras to a single consonant. So if I have to write औरत (woman), there are four or five different ways of typing it, each with different underlying characters. How would you search for that? And if you sent it to someone, one system that takes care of all of this would show it correctly as औरत, while another would show you something else. Interoperability fails right there. Search fails, sorting fails.
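Here is a small Python illustration of that failure mode (our sketch, not the speaker’s; the औरत variants need a rendering engine to show the difference, so this uses two verifiable stand-ins: the precomposed letter क़ versus क plus nukta, and the vowel औ versus the look-alike sequence अ plus the AU matra).

```python
import unicodedata

# Two encodings of क़ (qa): one precomposed codepoint, or KA + nukta.
# They render identically, yet byte for byte they are different strings.
qa_one = "\u0958"        # क़ as a single codepoint
qa_two = "\u0915\u093C"  # क (KA) followed by ़ (NUKTA)
print(qa_one == qa_two)  # False: naive search and sorting miss the match

# Unicode normalization unifies this particular pair...
nfc = lambda s: unicodedata.normalize("NFC", s)
print(nfc(qa_one) == nfc(qa_two))  # True

# ...but not every confusable pair. The independent vowel औ and the
# sequence अ + ौ (vowel sign AU) can look alike on screen, yet they stay
# distinct even after normalization, so search still fails.
au_one = "\u0914"        # औ
au_two = "\u0905\u094C"  # अ + ौ
print(nfc(au_one) == nfc(au_two))  # False
```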

So you end up exactly where you started: you don’t know. You get screwed. We have fundamentally screwed up a lot of things for our languages already, on our own; we did it ourselves. Without fixing some of these mistakes, it’s going to be tough.

This is my wishlist.