What you’re alluding to is the Turing test and it hasn’t been proven that any LLM would pass it. At this moment, there are people who have failed the inverse Turing test, being able to acerrtain whether what they’re speaking to is a machine or human. The latter can be done and has been done by things less complex than LLMs and isn’t proof of an LLMs capabilities over more rudimentary chatbots.
You’re also suggesting that it minimises the complexity of its outputs. My determination is that what we’re getting is the limit of what it can achieve. You’d have to prove that any allusion to higher intelligence can’t be attributed to coercion by the user or it’s just hallucinating based on imitating artificial intelligence from media.
There are elements of the model that are very fascinating like how it organises language into these contextual buckets but this is still a predictive model. Understanding that certain words appear near each other in certain contexts is hardly intelligence, it’s a sophisticated machine learning algorithm.
Even the Wayback Machine has limits to what is available.