xAI unveils preview of Grok-1.5V, its new multimodal model

2024-04-17

Artificial intelligence (AI) company xAI recently unveiled a preview of Grok-1.5V, its first-generation multimodal model. Multimodal AI is a type of AI that can process, understand and/or generate outputs for more than one type of data. In a blog post published on April 12, xAI said the model can process a wide variety of visual information, including documents, diagrams, charts, screenshots and photographs, in addition to its existing text capabilities. The post claims that the model is competitive with existing frontier multimodal models and even outperforms them in real-world spatial analysis. It also included several examples of Grok-1.5V interpreting images such as nutrition labels, memes and a child’s drawing.

xAI has also introduced RealWorldQA, a new benchmark for judging a model’s performance in real-world spatial analysis. The benchmark tests a model on real-world images with questions such as “Which object is bigger?” or “Is there enough space to drive around the grey car in front of us?” According to xAI, Grok performed significantly better than its competitors on this benchmark. RealWorldQA currently consists of over 700 images, each paired with a question and an easily verifiable answer. The dataset includes anonymized images taken from vehicles, along with other real-world images, and has been released to the public.

Last month, Elon Musk announced that he would be open-sourcing parts of his model Grok-1 and releasing the Grok chatbot to all premium…
