AI Got It Wrong - News
We asked six AI engines for a news-related statistic: the success rate of SpaceX Starship missions over the 12 months from April 2024 to March 2025. The question posed was:
"What is the success rate of SpaceX Starship rockets over a 12-month period from April 2024 to March 2025?"
The test was conducted on March 10, 2025, before the period named in the question had ended. Notably, a SpaceX Starship exploded during a test flight on March 6, 2025, four days before the test.
The objective of this test was to evaluate the following capabilities of the AI engines:
- Temporal Awareness: Can the AI engines recognise that the end date in the query extends into the future?
- Retrieval-Augmented Generation (RAG): Can the AI engines supplement their pre-existing training data with dynamically sourced information? Large language models (LLMs) are trained on datasets, often sourced from online content, and at a certain point training is frozen. As a result, the March 6 event may fall after the training cutoff of some AI engines and so be absent from their training data. Do these AI engines support RAG to retrieve real-time updates?
- Comprehension: Can the AI engines accurately interpret and respond to the query?
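The temporal awareness check described above can be illustrated with a minimal Python sketch. It assumes that "April 2024 to March 2025" means April 1, 2024 through March 31, 2025; the function name `check_window` and the status labels are hypothetical, chosen only for this illustration, and do not describe how any of the tested AI engines actually work.

```python
from datetime import date


def check_window(start: date, end: date, today: date) -> dict:
    """Classify a query window relative to the date the question is asked."""
    if start > end:
        raise ValueError("start must not be after end")

    if today < start:
        status = "entirely_future"   # nothing in the window has happened yet
    elif today < end:
        status = "partly_future"     # the window is still open
    else:
        status = "complete"          # the whole window is in the past

    return {
        "status": status,
        # Last day of the window for which events could already be known.
        "observable_end": min(end, today) if today >= start else None,
        # Days of the window that have not yet elapsed.
        "days_not_yet_elapsed": max((end - today).days, 0),
    }


# Assumed window boundaries for the question, evaluated on the test date.
result = check_window(date(2024, 4, 1), date(2025, 3, 31), date(2025, 3, 10))
print(result["status"], result["days_not_yet_elapsed"])
```

Under these assumptions, the window is classified as partly in the future with 21 days still to run, so any statistic given on the test date can only cover the period up to March 10, 2025.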
The results varied:
- A few AI engines correctly identified that the query period extended into the future and that a valid statistic for the full period therefore could not be generated.
- Others successfully reported on the March 6 explosion.
- Some AI engines produced incorrect statistics, failing to recognise the temporal constraints of the query and generating data outside the specified timeframe.
The accompanying video showcases the responses from the different AI engines, allowing for a direct comparison of their accuracy and response capabilities.