Elon Musk launches Grok 1.5 Vision: What is it and can it compete with GPT-4, Gemini 1.5 Pro

Elon Musk's AI company, xAI, has introduced an upgraded version of its Grok 1.5 model, known as Grok 1.5 Vision. This enhanced model now incorporates computer vision capabilities, enabling it to process visual content and respond to related queries.

By: HT TECH
| Updated on: Apr 16 2024, 08:27 IST

Elon Musk's xAI unveils Grok 1.5 Vision, a new AI model with integrated computer vision features, competing directly with GPT-4 Vision and Gemini 1.5 Pro. (Bloomberg)

Elon Musk's AI venture, xAI, has recently introduced an upgraded version of its Grok 1.5 model, called Grok 1.5 Vision. The new model integrates computer vision capabilities, allowing it to interpret visual content and answer questions about images. The launch comes shortly after OpenAI presented its GPT-4 model, which also offers computer vision features.

xAI announced the upgrade through its official X account (formerly Twitter) and described the model's capabilities in a blog post. The core features of Grok 1.5 carry over to the updated version, while the added vision capabilities let the model work with real-world visual input such as images.

https://t.co/A12vgTpnTb
— xAI (@xai) March 29, 2024

Benchmark Scores and Performance

xAI ran benchmark tests comparing Grok 1.5 Vision with rival models across several benchmarks, including the company's own RealWorldQA benchmark, which evaluates a model's "real-world spatial understanding." The model was also assessed on other tests such as MMMU and ChartQA. In RealWorldQA, Grok surpassed OpenAI's GPT-4 with Vision and Google's Gemini 1.5 Pro, although it lagged behind in other tests.

Understanding Computer Vision

Computer vision is an exciting field in computer science focused on enabling computers, including AI models, to recognize and interpret real-world objects through images and videos. Essentially, it aims to empower machines with human-like vision capabilities.

Several leading tech companies are investing heavily in developing vision-centric AI models. Google's Gemini 1.5 Pro and OpenAI's GPT-4 with Vision are notable competitors in this space.

The potential applications for computer vision are vast and transformative. For instance, Healthify, an Indian platform for calorie tracking and nutrition, recently integrated a feature named 'Snap'. Here, users can photograph food items, and the AI suggests healthier recipe modifications and exercise regimens to offset calorie intake. Beyond this, computer vision holds promise for medical diagnostics, autonomous vehicles, and more.
