With the hope of building a universal language translator, Meta on August 22 released a new AI model called SeamlessM4T that can perform translation and transcription services in dozens of languages. The model currently supports:

- Speech recognition for nearly 100 languages
- Speech-to-text translation for nearly 100 input and output languages
- Speech-to-speech translation, supporting nearly 100 input languages and 36 output languages
- Text-to-text translation for nearly 100 languages
- Text-to-speech translation, supporting nearly 100 input languages and 36 output languages

You can try out a demo of the AI model here or get an idea of how it works from the video below:

[video width="1920" height="1080" mp4="https://www.medianama.com/wp-content/uploads/2023/08/01_Seamless-Multitasking-video.mp4"][/video]

The model is publicly available under a research license for researchers and developers; it cannot be used for commercial purposes under this license. Meta also released the metadata of SeamlessAlign, a multimodal translation dataset totalling 270,000 hours of mined speech and text alignments. For more on the technical details of the model, check out the post by Meta here.

Why does this matter: Translation and transcription tools currently available to the public are either limited in features (for example, supporting only text-to-text or speech-to-text translation) or limited in the number of languages they support. SeamlessM4T is multimodal: it supports the various transcription and translation services listed above in a single model, and it already covers a large number of languages. Separately, as Meta pointed out in its announcement, "The world we live in has never been more interconnected, giving people access to…
