Imagine you are stranded on a deserted island. You have never been on such an island before. To survive, you have to learn to do everything from scratch. You discover new plants and over time learn which are edible and which are not, which can heal wounds and which are good for building tools. You keep a diary in which you sometimes leave notes on the plants you have seen so far. Later, another person arrives and reads your records. Even with your help, they will have a hard time getting all the information about the plants. Some of it you had but did not keep. Some of it you did not have in the first place because you lacked the right tools to collect it. Where does the plant grow? How does it change with the seasons? Is there an animal that eats it as well? Each new plant in the diary requires an effort to bring your memories of it back to life, so that you can answer as many questions as possible. It is certainly not possible to reconstruct that knowledge in full without the necessary context from you.
Now you are a robot stranded on an island. You are experimenting with plants and recording all of your observations. As before, another robot arrives and reads your records. We are not sure how machine imagination works or what a machine version of reading would mean, but imagine that the robot sees a plant identifier with extra information about the plant encoded right in the word. It sees the plant's identifier and, at the same time, its colour, shape, density, chemical composition, and relationships to other entities, such as the animals that feed on it and the types of soil it grows in. The robot can also see precise records of how this knowledge was collected and can judge its reliability.
Today, if you read an article and see a word like “business”, you are left on your own to work out what it means. If you are interested only in movies, you might imagine “The Wolf of Wall Street”. If you have experience in finance or have worked with data about businesses before, you may know that there are X businesses in country Y; that businesses are best compared by metrics A, B, and C; that each metric varies within the ranges [A-A*], [B-B*], and [C-C*]; and that certain regions of this embedding space mean T while others mean F. If so, you see the information in the article augmented by your prior knowledge. Now, it would be really great if you could perceive all of that just by reading some “hyper”-word that contains the extra information, something like “business:0001100101”. What if all words were like that? Whenever you read an article, you would get a very rich understanding of the issue at hand. That would be great.
But there is a problem. Each time you read this “hyper”-word you have to carry the extra information with it. Basically, it is a kind of wiki page with as many important details as possible shrunk into an embedding that the human mind can understand. Let's assume each word on this wiki page is a normal word, so the page already has the smallest possible size. The “hyper”-word is now “business:<raw text from wiki page here>”. Carrying this blob of text around is extremely redundant and makes reading slow. Is there a way to keep only the minimal amount of information necessary to identify the object the word refers to, something like log2 of the number of words out there? That's just the word itself. There is some redundancy in the encoding (about 50% in English), but that seems to be there for historical reasons. Otherwise, the word contains the minimum amount of information necessary to pick it out from all the other words in the language. It turns out that languages have developed to minimise the amount of information you have to carry around to describe a word. This way, once an agent has attained the necessary starting knowledge, its communication throughput is very high.
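To make the log2 figure concrete, here is a rough back-of-the-envelope sketch in Python. The vocabulary size of 100,000 words and the 26-letter alphabet are assumptions chosen only for illustration, and the function names are made up for this example. It compares the bits needed to pick one word out of the vocabulary with the bits the spelling could carry if every letter were equally likely. This is a crude comparison, not a derivation of the 50% redundancy figure.

```python
import math


def bits_to_identify(vocabulary_size: int) -> float:
    """Minimum bits needed to single out one word among equally likely words."""
    if vocabulary_size < 1:
        raise ValueError("vocabulary_size must be at least 1")
    return math.log2(vocabulary_size)


def bits_in_spelling(word: str, alphabet_size: int = 26) -> float:
    """Bits the spelling could carry if every letter were equally likely."""
    return len(word) * math.log2(alphabet_size)


# Assumed vocabulary size, for illustration only.
vocabulary_size = 100_000
word = "business"

needed = bits_to_identify(vocabulary_size)
carried = bits_in_spelling(word)

print(f"bits needed to identify a word: {needed:.1f}")
print(f"bits the spelling of '{word}' could carry: {carried:.1f}")
```

Under these assumptions a word needs fewer than 17 bits to be identified. A wiki page attached to it would cost many thousands.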
Now, are we learning just the statistics of words? No. If we were, we would be monkeys typing nonsense words. It is more complex than that. Even the rarest, most complex sentences, with loose grammar and unseen words, can make total sense. To make it even weirder, we learn languages incredibly quickly. What is more, languages that developed in disconnected groups share similar traits, are equally powerful, and are equally easy to learn. Did we develop some innate predisposition to language, or did we develop languages that can be learned quickly? It must be both. The class of languages we have is shaped by our evolutionary constraints. And yet the power of language makes one believe that it has a value of its own, one that we just happened to discover: an artifact that goes far beyond the scope of the functions imposed by the constraints and objectives at the time of its initial development.
The world we live in is the experience of the physical world summarised in language. Once you learn a language, you have direct access to the experiences of others and achieve temporal awareness yourself. We can find descriptions of external physical phenomena that capture their nature well enough to accurately predict what will happen in the future. There is a vast number of laws and phenomena we can discover in math, which is pure language. And we are just scratching the surface. What marvels are hidden beyond our reach? The space we have discovered so far is as tiny as the pale blue dot we call home is on the cosmic scale of the universe, and language is the highway entrance to it.