Back in 2014, industry heads assembled under one roof at the #NAMAIndic event. They discussed in detail the nature of the ecosystem for digital media in Indian languages and the underlying issues it faces. These issues can be broadly categorized under incompatibility across devices, fonts and typography, and lack of standardization.
Two years and multiple meetings later, the Government of India initiated a directive for all handheld devices to support all 22 official Indian languages. This was to be effective from July 1, 2017, a date that was later shifted to February 1, 2018, since the specific guidelines for the mandate needed more time to be delineated. This mandate ensures that any Indian who is buying a phone or a similar digital device must be able to receive content, messages, or notifications in their own language, and communicate in it as well. This means, by extension, that every literate Indian should be able to read or type in their own language, using the letters as they learned them at school. These sets of letters for each language are termed character sets.
A supposedly conclusive meeting on character sets was held by the Bureau of Indian Standards (BIS) on 12 October 2017. However, when the directive was released, the BIS prodded industry players to implement a standard that included characters no native literate had ever studied or learnt, and this was the case for all 22 official Indian languages. Moreover, the document did not specify how to implement the behaviour patterns for characters.
So why does it matter that character sets and character behaviour be defined properly?
Non-digitally, with slate and chalk, or even pen and paper, learning the basics of one’s ABCs or ka kha ga requires muscle memory in addition to just remembering A for Apple or ka se kamal. While Latin-script languages like English are linear, i.e. written and typed one character after another from left to right, Indian scripts aren’t. Characters interact with each other in multiple ways.
Learners memorize character sets but learn to draw their shapes and repeat them orally to remember how each character looks and sounds. They also learn about how characters behave with other characters, including how characters change their original shape and join others, or get replaced by a different looking character. This is an important part of language learning because the person also learns phonetics and script grammar which aids in forming words and sentences. While typing of Indian languages takes away the muscle memory of shapes, it doesn’t take away the language’s nuances like the distinct sound of each character, and their behaviour. This means that, when languages are implemented on a digital platform, text should not present users with characters or character behaviour they are unfamiliar with.
While most Indians are polyglots, only 10% of the country is English speaking. Although the English alphabet is limited to 26 characters, including vowels and consonants, Indian languages come with additional intricacies in the form of vowel signs or matra(s), diacritics, and an inherent tendency to combine with other characters. Moreover, the properties of vowel signs make Indian scripts more complex than English. These rules are varied, and follow different logical patterns. Vowel signs, when attached to a character, do not always follow the left to right rule. Some of them go before the character, some go above it and some below. This increases the complexity of providing Indian languages with a digital presence, as the underlying system needs to be constantly aware of cursor positions and of what the resultant syllable will look like if the delete or backspace key is pressed. This awareness is still incomplete on devices.
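To make the point about vowel signs concrete, here is a minimal Python sketch using the standard unicodedata module. It takes the Devanagari syllable कि as an example (my choice, not one taken from the BIS document): the vowel sign is drawn to the left of the consonant but is stored after it. The backspace function is a hypothetical, deliberately naive stand-in for what an input system might do, not a description of any real keyboard.

```python
import unicodedata

# The syllable "कि": consonant KA followed by VOWEL SIGN I.
# On screen the vowel sign appears to the LEFT of the consonant,
# but in memory it comes AFTER it.
syllable = "\u0915\u093F"

for ch in syllable:
    # Print code point, Unicode name and general category of each stored character.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

def backspace(text):
    """Hypothetical naive backspace: drop the last stored code point."""
    return text[:-1]

# Deleting the last stored code point removes the vowel sign,
# even though visually it is the leftmost part of the syllable.
after_delete = backspace(syllable)
print(after_delete)
```

The gap between what the user sees and what is stored is exactly where cursor movement and deletion can go wrong if character behaviour is left undefined.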
The problem doesn’t end there. Let’s assume for a minute that we’ve built the Indian internet, but on this flawed base. Even if content and services were easily available, basic computing like searching and sorting, which work so seamlessly with English, would break when used in Indian languages. It all boils down to the characters we ought to use, using them correctly, and eliminating the ones which are absolutely unnecessary from our character sets.
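One way search can break is when the same visible letter can be stored in more than one way. The sketch below is only an illustration under that assumption: it uses the Devanagari letter क़, which Unicode encodes both as a single code point and as KA followed by a nukta sign. The example letter and the helper name normalized are mine, not taken from the BIS document.

```python
import unicodedata

# Two encodings that look identical on screen:
precomposed = "\u0958"        # DEVANAGARI LETTER QA as one code point
sequence = "\u0915\u093C"     # DEVANAGARI LETTER KA + SIGN NUKTA

def normalized(text):
    """Bring text to one canonical form before comparing (NFC here)."""
    return unicodedata.normalize("NFC", text)

# A naive comparison treats the two as different strings, so a search
# for one would miss text typed with the other.
naive_match = precomposed == sequence

# After normalization both sides become the same sequence of code points.
normalized_match = normalized(precomposed) == normalized(sequence)

print(naive_match, normalized_match)
```

Normalization only helps for pairs that Unicode itself declares equivalent. Look-alike characters with no such equivalence would still be treated as different by search and sort, which is why trimming unnecessary characters from the sets matters.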
The mandating body, the BIS, has taken into consideration the views and opinions of the industry and has weighed the pros and cons of the recommendations made by industry experts, and yet it has defined character sets with seldom used or unknown characters under an ‘optional’ tag. This is as good as retaining the problems raised by the industry in 2014.
This option to implement the optional characters not only falls flat in the name of standardisation but also confuses the phone manufacturers when it comes to which characters to use for a language. The Unicode consortium, which assigns unique numbers to characters for their interoperability with devices, has also assigned numbers to the optional characters. As a result, display or input of such characters won’t be an issue as long as they are implemented uniformly.
Let’s say one device manufacturer chooses to implement optional characters while others don’t. A text message sent from the former’s phone won’t render on the latter’s. The receiver won’t be able to read the message part where the optional characters are used, and will see boxes instead. Now let’s assume that all mobile manufacturers choose to implement optional characters as well. This ends up confusing users as they would either see similar looking extraneous characters or ones that no one knows about, hence no one uses.
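The first scenario can be sketched in a few lines of Python. Everything here is hypothetical: the choice of U+0904 as an "optional" character is arbitrary and not taken from the BIS document, the two device sets are invented, and simple set membership stands in for whether a phone's font actually has a glyph for a character.

```python
# All code points in the Devanagari block (U+0900 to U+097F).
DEVANAGARI_BLOCK = {chr(cp) for cp in range(0x0900, 0x0980)}

# Hypothetical "optional" character, chosen only for illustration.
HYPOTHETICAL_OPTIONAL = {"\u0904"}

# Device A implements the optional character, device B does not.
device_a = DEVANAGARI_BLOCK
device_b = DEVANAGARI_BLOCK - HYPOTHETICAL_OPTIONAL

def render(text, supported):
    """Simulate display: unsupported characters show up as a box."""
    return "".join(
        ch if ch in supported or ch.isspace() else "\u25A1"
        for ch in text
    )

# An arbitrary message typed on device A that uses the optional character.
message = "\u0904\u0915"

print(render(message, device_a))  # shown as typed
print(render(message, device_b))  # the optional character becomes a box
```

The same message is readable on one phone and partly replaced by boxes on the other, purely because the two manufacturers made different choices about an optional character.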
The original intention of the mandate, I repeat, was to allow any Indian buying a phone or similar device to receive an intimation, message or notification in their own language, and to communicate in that language. That intention goes for a toss.
For an Indian user, mobile phones are the cheapest way to go digital. The variety in phone specifications and price ranges offers a decent catalogue from which to choose a device as per one’s need. It then becomes important for the Digital India initiative to not ignore the native language adopters of digital platforms. According to a report, by 2021, 536 million internet users will be Indian language users as opposed to 199 million English users. Clearly, these 500 million plus users will fuel the internet with user generated content via the channels of least resistance, read social media and chat apps, before availing other internet based services. It then becomes a matter of utmost importance that the upcoming users be encouraged and engaged through correct display and correct input, and not discouraged by yet another barrier: being unable to comfortably read and input content in their own language.
Vivekanand Pani is the Co-founder and CTO of Reverie Language Technologies Pvt Ltd. He is the key driver behind Reverie’s language as a service platform and its consumer apps initiatives, and leads his teams towards the goal of delivering an unparalleled language experience across devices.