Hot on the heels of the Wall Street Journal’s “The End of Typing: The Next Billion Mobile Users Will Rely on Video and Voice” article, Google has announced voice search in 8 additional Indian languages: Bengali, Gujarati, Kannada, Malayalam, Marathi, Tamil, Telugu, and Urdu. Voice-based dictation and search will be available in Gboard on Android, in search via the Google app, and, more importantly, via Google’s Cloud Speech API. The new languages will also be rolled out across other Google apps and products, including the Translate app.
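For developers, the Cloud Speech API exposure is arguably the biggest piece of this announcement. As a rough illustration, here is a minimal sketch of transcribing a Tamil audio clip with the google-cloud-speech Python client; the exact client surface and the set of supported language codes vary by library version, and the file name here is hypothetical, so treat this as a sketch to check against Google’s documentation rather than a definitive recipe:

```python
# Minimal sketch: transcribing Tamil speech with Google's Cloud Speech API.
# Assumes the google-cloud-speech Python client and valid credentials;
# language-code availability ("ta-IN" here) should be verified in the docs.
from google.cloud import speech

client = speech.SpeechClient()

# Read a local 16 kHz, LINEAR16-encoded WAV recording (hypothetical file).
with open("query.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="ta-IN",  # Tamil (India)
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```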

Where this development leads is the introduction of Google Home in India, a product that is entirely voice-based.

Note that Gboard supports 22 Indian languages, so that still leaves 13 languages to go before voice search/typing is enabled across all of them.

Why is this important?

Voice search is essentially a combination of two processes: the first is converting voice to text, and the second is making sense of that text. The first becomes complex as voices, tonality, phrasing and dialects vary within a language. As Google indicated in its blog post:

“To incorporate the new language varieties, we worked with native speakers to collect speech samples, asking them to read common phrases. This process trained our machine learning models to understand the sounds and words of the new languages and to improve their accuracy when exposed to more examples over time. And voice input for each of these languages will get better over time, as more and more native speakers make use of the product.”

This part essentially covers the voice-to-text bit, but what improves results for Google is understanding context. It’s easier to transliterate words, but translation involves an almost human-level understanding of the sequence of words. While Google hasn’t indicated this, what has really sped up for Google now is translation.
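To make the distinction concrete, here’s a toy illustration (my example, not Google’s): a word-by-word gloss of a romanized Hindi sentence preserves the source word order and comes out ungrammatical, while a real translation has to reorder the whole sentence, since Hindi is subject-object-verb and English is subject-verb-object:

```python
# Toy illustration of why translation needs whole-sentence understanding.
# Romanized Hindi: "main ghar ja raha hoon" = "I am going home".
source = ["main", "ghar", "ja", "raha", "hoon"]

# A transliteration-style, word-by-word gloss keeps the Hindi word order:
gloss = {"main": "I", "ghar": "home", "ja": "go", "raha": "-ing", "hoon": "am"}
print(" ".join(gloss[w] for w in source))  # -> "I home go -ing am": ungrammatical

# A translation must reorder and restructure the sentence as a whole:
print("I am going home")
```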

In April this year, Google announced neural network translation in 9 Indian languages: Hindi, Bengali, Punjabi, Marathi, Gujarati, Tamil, Telugu, Malayalam and Kannada. At that event, Melvin Johnson, an engineer at Google Translate, said that they’ve used neural machine translation to bridge the gap between phrase-based (less efficient) and human (more efficient) translation: “Neural machine translation allows the translation of the entire sentence, instead of on a piecemeal basis.”
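As a sketch of what “translating the entire sentence” means in practice, here is a toy PyTorch encoder-decoder. This illustrates the general sequence-to-sequence idea, not Google’s production architecture; all vocabulary and layer sizes are arbitrary:

```python
# Toy sketch (not Google's system): a sequence-to-sequence encoder-decoder
# that consumes the whole source sentence before emitting any target word,
# in contrast to phrase-by-phrase translation.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 100, 100, 32, 64  # arbitrary toy sizes

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src, tgt):
        # Encode the ENTIRE source sentence into a single hidden state:
        _, h = self.encoder(self.src_emb(src))
        # Decode the target conditioned on that whole-sentence summary:
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq()
src = torch.randint(0, SRC_VOCAB, (1, 7))  # a 7-token source sentence
tgt = torch.randint(0, TGT_VOCAB, (1, 5))  # a 5-token target prefix
print(model(src, tgt).shape)               # torch.Size([1, 5, 100])
```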

Google builds language models, which allow translation from one language to another, and with neural network processing, it is able to translate without building one-to-one models. How does Google build these models? “Google has access to the entire web. We look for parallel documents on the web, which are trying to say the same thing in two different languages. For example, the BBC. We break it down and feed in examples of sentences in one language and in another, and it learns to do this mapping.”…“With neural systems, it takes 2-3 weeks to train per model, on hundreds of GPUs. For it to work really well, it needs hundreds of millions of examples.”
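As a rough sketch of the “parallel documents” idea: given two documents that say the same thing in two languages, training pairs can be formed by aligning their sentences. Production systems use far more robust document and sentence alignment; this naive version (my illustration, with made-up example text) simply splits on sentence boundaries and pairs sentences by position:

```python
# Naive sketch: mining sentence pairs from a pair of parallel documents.
# Assumes (unrealistically) one-to-one sentence correspondence in order;
# real systems use statistical or learned alignment instead.

def split_sentences(text, sep="."):
    return [s.strip() for s in text.split(sep) if s.strip()]

english_doc = "The weather is nice. The market is open."
hindi_doc = "मौसम अच्छा है। बाज़ार खुला है।"  # the same content in Hindi

pairs = list(zip(split_sentences(english_doc),
                 split_sentences(hindi_doc, sep="।")))  # Hindi full stop (danda)
for en, hi in pairs:
    print(f"{en}  <->  {hi}")
# Each pair becomes one (source, target) training example for the model.
```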

What does this combination of voice-to-text and translation enable? Contextual voice search across Google’s products and services, all of which integrate the Google Assistant: the Pixel phone, Google Home, and supported devices like Chromecast. More on Google Assistant and its link with devices here. Give it a few more years, and all Android and linked devices and apps should support usage in Indian languages, both for search and for commands.