Grounded agentic systems and check-ride.ai
The article itself makes for interesting reading, but the gist is that ungrounded LLM usage (ChatGPT, Claude, Grok) hallucinates around 15-20% of the time. Of course, this is for the specific model versions tested in the article, and the percentage varies wildly depending on how simple the conversation is. Still, I encourage you to try having aviation-related discussions with any of the aforementioned tools; I think you'll find the quoted percentages hold (at the time of writing, anyway).
The article goes on to explain that when these language models are provided the appropriate context for the same queries (the process of injecting this 'correct' context is referred to as grounding), hallucination rates drop to less than 2%. Grounding, however, is not some toggle to be flipped that clamps down on hallucination. It is a technique, and how well it is implemented dictates how much you improve an AI (or regress it, when done wrong).
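To make the idea of grounding concrete, here is a minimal sketch in Python. It is not how check-ride.ai or the article's authors do it. The passage list, the keyword-overlap scoring, and the function names (`retrieve`, `build_grounded_prompt`) are all assumptions made up for illustration, and the passages are paraphrases rather than authoritative regulatory text. The point is only to show the shape of the technique: find relevant context first, then hand the model that context along with the question.

```python
import re

# Toy reference passages, paraphrased for illustration only (an assumption,
# not authoritative regulatory text and not check-ride.ai's actual corpus).
PASSAGES = [
    "14 CFR 61.56: a flight review is required every 24 calendar months to act as pilot in command.",
    "14 CFR 91.151: day VFR flight in an airplane requires fuel to the destination plus 30 minutes at normal cruise.",
    "14 CFR 91.155: basic VFR weather minimums depend on airspace class and altitude.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, k=2):
    """Return up to k passages ranked by naive keyword overlap with the query.

    Passages with zero overlap are dropped. A real system would use a much
    stronger retriever; this scoring is a stand-in.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda p: len(query_tokens & tokenize(p)),
        reverse=True,
    )
    return [p for p in ranked[:k] if query_tokens & tokenize(p)]


def build_grounded_prompt(query, passages, k=2):
    """Build a prompt that injects retrieved context ahead of the question.

    Returns None when nothing relevant is found, so the caller can abstain
    instead of grounding the model on noise.
    """
    context = retrieve(query, passages, k)
    if not context:
        return None
    context_block = "\n".join(f"- {p}" for p in context)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_grounded_prompt("How often do I need a flight review?", PASSAGES))
```

The quality of `retrieve` is where grounding succeeds or fails: if it surfaces the wrong passages, the model is confidently steered toward the wrong answer, which is the regression case mentioned above.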
check-ride.ai is actually an offshoot of our core work on grounding agentic systems. You can review more technical details of the agentic search we are developing, but its objective is just that: formulating the context the language models are provided so that we minimize p(hallucination).
We're constantly working on improving the agentic system, and I've personally found it very motivating to develop it specifically for this purpose, since I was told the day I passed my PPL checkride that it's a license to learn. In that respect, I see building superhuman search as the way we can make this license maximally useful, for all of us.