Galileo, a prominent developer of generative AI for enterprise applications, has recently unveiled its latest Hallucination Index.
The assessment framework, which centers on Retrieval Augmented Generation (RAG), evaluated 22 notable Gen AI LLMs from key players like OpenAI, Anthropic, Google, and Meta. This year’s index saw significant growth, incorporating 11 new models to mirror the rapid expansion of both open- and closed-source LLMs over the past eight months.
Vikram Chatterji, CEO and co-founder of Galileo, said: “In the swiftly evolving AI landscape today, developers and enterprises encounter a crucial challenge: how to leverage the capabilities of generative AI while balancing cost, accuracy, and reliability. Existing benchmarks often focus on academic use-cases rather than real-world applications.”
The index used Galileo’s proprietary evaluation metric, context adherence, to detect output inaccuracies across input lengths ranging from 1,000 to 100,000 tokens. The approach is designed to help enterprises make informed decisions when balancing price and performance in their AI deployments.
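Galileo's actual context-adherence metric is proprietary, but the general idea of scoring how well an answer sticks to its retrieved context can be sketched with a simple lexical-overlap check. The function and example strings below are illustrative assumptions, not Galileo's method:

```python
# Illustrative stand-in for a context-adherence check: score how much of a
# model's response is supported by the retrieved context. This is a simple
# lexical-overlap sketch, not Galileo's proprietary metric.
import re

def adherence_score(response: str, context: str) -> float:
    """Fraction of content words in the response that also appear in the context."""
    tokenize = lambda text: set(re.findall(r"[a-z0-9']+", text.lower()))
    stopwords = {"the", "a", "an", "is", "are", "was", "were", "of", "to", "in", "and", "or"}
    resp_words = tokenize(response) - stopwords
    if not resp_words:
        return 1.0  # an empty response makes no unsupported claims
    ctx_words = tokenize(context)
    return len(resp_words & ctx_words) / len(resp_words)

context = "Claude 3.5 Sonnet scored highest across short, medium, and long contexts."
grounded = "Claude 3.5 Sonnet scored highest across long contexts."
hallucinated = "GPT-2 won every category by a wide margin."

print(adherence_score(grounded, context))      # high score: response is supported
print(adherence_score(hallucinated, context))  # low score: response is unsupported
```

Production metrics like Galileo's typically use LLM- or model-based judgments rather than word overlap, but the scoring shape is the same: a per-response number indicating how grounded the output is in the retrieved context.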
Key highlights from the index include:
- Anthropic’s Claude 3.5 Sonnet emerged as the top-performing model overall, consistently achieving near-perfect scores across short, medium, and long context scenarios.
- Google’s Gemini 1.5 Flash was rated as the most cost-effective model, delivering strong performance across all tasks.
- Alibaba’s Qwen2-72B-Instruct stood out as the leading open-source model, particularly excelling in short and medium context scenarios.
The index also shed light on several trends in the LLM landscape:
- Open-source models are rapidly narrowing the gap with closed-source counterparts, offering enhanced hallucination performance at lower costs.
- Current RAG LLMs exhibit significant enhancements in managing extended context lengths without compromising quality or accuracy.
- Smaller models sometimes outperform larger ones, emphasizing that efficient design can be more crucial than scale.
- The rise of strong performers from outside the US, like Mistral’s Mistral Large and Alibaba’s Qwen2-72B-Instruct, indicates growing global competition in LLM development.
While closed-source models such as Claude 3.5 Sonnet and Gemini 1.5 Flash retain their lead due to proprietary training data, the index reveals a rapidly evolving landscape. Google’s performance was particularly notable, with its open-source Gemma-7b model underperforming while its closed-source Gemini 1.5 Flash consistently ranked near the top.
As the AI industry continues to address hallucinations as a significant hurdle to production-ready Gen AI products, Galileo’s Hallucination Index offers valuable insights for enterprises seeking to adopt the right model for their specific requirements and budget constraints.
See also: Senators probe OpenAI on safety and employment practices

The post Anthropic to Google: Who’s winning against AI hallucinations? appeared first on AI News.