Microsoft releases speech datasets for three Indian languages

Microsoft has opened up speech data in Gujarati, Telugu and Tamil so that academics and researchers can use it to build speech-recognition systems for Indian languages. The corpus will be open for speech training and testing under the company's open data initiative, which aims to advance developments in areas such as natural language processing and computer vision.

Microsoft is working on real-time translations for Hindi, Bengali, Tamil and Telugu; the company is integrating AI and Deep Neural Networks to improve real-time translation and enable users to more easily access the internet and Microsoft’s services in Indian languages.

Microsoft also announced support for email in Indian languages, including Hindi, Bodo, Dogri, Konkani, Maithili, Marathi, Nepali, Sindhi, Bengali, Gujarati, Manipuri, Punjabi, Tamil, Telugu, and Urdu.

  • Three days ago, Amazon India launched its Hindi website and app, and indicated that it is keen on expanding into other regional languages such as Bengali and Tamil after a year. An internal team within Amazon called ‘Reach’ is working on how to tap the next million users of the internet, per Livemint.
  • No other e-commerce platform has its services available in an Indian language as yet. However, it is worth noting that in August, Walmart-owned Flipkart acquired speech-recognition startup Liv.ai, which focuses on Indian languages: Hindi, Bengali, Punjabi, Marathi, Gujarati, Kannada, Tamil and Telugu.

Written By

I cover health, policy issues such as intermediary liability, data governance, internet shutdowns, and more. Hit me up for tips.




MediaNama is the premier source of information and analysis on Technology Policy in India. More about MediaNama, and contact information, here.

© 2008-2021 Mixed Bag Media Pvt. Ltd. Developed By PixelVJ
