by Marlon Barrios Solano
February 9th 2016
Large language models (LLMs) trained purely on text have demonstrated surprising capacities that hint at an internal “world model” of sorts. A recent paper by Wes Gurnee and Max Tegmark, Language Models Represent Space and Time, tackles a key question: do LLMs merely memorize superficial word correlations, or do they form coherent representations that reflect real-world structures? The authors provide evidence for the latter, showing that LLMs encode aspects of space and time in their latent representations. In other words, even without direct perception or embodiment, a language model can develop an internal map of the world and a timeline of history. This finding challenges assumptions in cognitive science and AI that grounded sensory experience is necessary for spatial and temporal understanding. It appears that through the process of compressing vast textual data, LLMs learn compact, interpretable structures corresponding to geographic and chronological knowledge.
This essay expands on the key arguments and implications of Gurnee and Tegmark’s work. We explore how LLMs develop latent geometric representations of space (for example, clustering city names in ways that mirror their physical locations), and how they encode temporal sequences and historical order. From these technical results, we delve into broader implications: the philosophical puzzle of representation without embodiment, the notion of latent space as a conceptual manifold of knowledge, and how narrative prediction in LLMs may encode causal-like structures (noting the difference between narrative time and physical time). We then consider cognitive science perspectives—what it means for an intelligence to have structure without direct perception—and examine political, cultural, and metaphysical implications. Finally, we discuss the limits of these representations, emphasizing that LLMs, despite impressive latent structures, still lack perception, embodiment, and conscious understanding.
One of the striking findings of Gurnee and Tegmark’s study is that LLMs can learn a geometric representation of physical space purely from text. By probing a trained model’s internal activations for place names, the researchers showed that the model encodes information correlated with geographic coordinates. They constructed datasets of locations at various scales (world cities, U.S. places, and even points of interest within New York City) and associated each name with its real latitude and longitude. When these location names were fed into the LLM (specifically the LLaMA-2 model family), certain neurons and activation patterns reliably corresponded to the places’ coordinates. A simple linear regression could recover latitude and longitude from the model’s activations.
This implies the model organized its internal state space so that semantically or geographically related places end up near each other in latent space: European capitals cluster together, while Pacific islands cluster elsewhere. Projected onto two learned directions, the representation resembles a map:
Spatial world model extracted from LLaMA-2-70B. Each point represents a place projected onto learned “latitude” and “longitude” directions derived from internal activations. The projection reconstructs a recognizable map of the world.
How does this happen without vision or maps?
Language itself encodes geographic structure:
Over billions of tokens, statistical constraints accumulate. “Paris” and “London” co-occur frequently; “Paris” and “Sydney” rarely do in proximity. The model compresses these correlations into geometry.
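A toy sketch of that compression, under the simplifying assumption (invented here for illustration) that co-occurrence strength decays exponentially with distance: converting co-occurrence back into dissimilarity and applying classical multidimensional scaling recovers the underlying geometry.

```python
import numpy as np

# Six "cities" at known positions along a line. Assume co-occurrence
# decays exponentially with distance: nearby places appear together more.
positions = np.array([0.0, 1.0, 2.0, 5.0, 6.0, 7.0])
cooc = np.exp(-np.abs(positions[:, None] - positions[None, :]))

# Invert the assumed decay to get dissimilarities, then apply classical
# multidimensional scaling: double-center the squared dissimilarities
# and embed along the top eigenvector.
D = -np.log(cooc)
n = len(positions)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)
embedding = vecs[:, -1] * np.sqrt(vals[-1])

# Up to sign and translation, the embedding matches the true positions.
print(np.round(embedding, 2))
```

Real corpora are noisier and real models are nonlinear, but the principle is the same: pairwise statistics alone can pin down a geometry.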
The result is not rote memorization but emergent structure. When regions were held out during probe training, the model still placed unseen cities in approximately correct locations—evidence of generalization rather than lookup-table recall.
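A sketch of that held-out check, again on synthetic data in which activations embed coordinates linearly (an illustrative assumption, not the paper's actual activations): fit a linear probe on one region only, then decode places from a region the probe never saw.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic activations that embed (lat, lon) linearly plus noise.
coords = rng.uniform(-90, 90, size=(300, 2))
W_hidden = rng.normal(size=(2, 32))
acts = coords @ W_hidden + rng.normal(scale=0.05, size=(300, 32))

# Hold out the entire western half (longitude < 0) during probe
# training, then decode those unseen places.
west = coords[:, 1] < 0
probe, *_ = np.linalg.lstsq(acts[~west], coords[~west], rcond=None)
pred_west = acts[west] @ probe

# Low error on the held-out region indicates generalization,
# not lookup-table recall.
print(np.abs(pred_west - coords[west]).mean())
```

A memorized lookup table would fail this test; a shared coordinate system passes it.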
In effect, LLMs form a latent geometric cognitive map.
Parallel to spatial mapping, LLMs encode temporal structure. Gurnee and Tegmark probed models on datasets pairing entities with dates, including the names of historical figures, titles of artworks and media, and news headlines.
Linear probes could predict years from internal activations far above chance.
For example, Spearman correlations between predicted and true years exceeded 0.9 on some datasets, indicating strong preservation of temporal ordering.
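Spearman correlation measures how well the predicted ordering matches the true ordering: it is the Pearson correlation of the ranks. A minimal version, with made-up years standing in for probe outputs:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks
    (valid here because there are no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Hypothetical probe outputs: noisy in absolute terms, but preserving
# the historical ordering exactly, so the rank correlation is 1.0.
true_years = np.array([1492, 1776, 1865, 1914, 1945, 1969, 2001])
pred_years = np.array([1510, 1740, 1880, 1905, 1950, 1972, 1990])

print(spearman(true_years, pred_years))  # 1.0
```

A high Spearman score thus means the model gets history in the right order even when individual dates are off.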
The representation is partly localized: the authors identified individual neurons whose activations track how recent an entity is. These neurons act like internal dials for recency.
Projection of model activations onto a learned time axis shows close alignment between predicted and true historical dates.
How does the model learn time?
Language embeds temporal cues: explicit dates, verb tense, references to eras and technologies, and the conventional ordering of events in narration. The model extracts chronological structure from these repeated patterns.
However, while it encodes chronology well, it sometimes struggles with relational temporal reasoning unless prompted step-by-step. The knowledge exists, but is not always automatically deployed.
These findings challenge strict embodied cognition theories, which argue that spatial and temporal understanding require sensorimotor grounding.
LLMs have no eyes, no bodies, and no sensory contact with the world. Yet they develop structured representations of both space and time. This suggests that structural knowledge of space and time can be acquired from language alone, without direct perception.
This resonates with Immanuel Kant’s idea of space and time as organizing forms of intuition. While LLMs do not possess innate intuitions, their architecture and prediction objective pressure them to organize knowledge spatially and temporally.
However, limitations remain:
| LLM Knowledge | Human Embodied Knowledge |
|---|---|
| Descriptive | Experiential |
| Correlational | Sensorimotor grounded |
| Structural | Phenomenological |
The LLM has the map, not the journey.
LLM latent space functions as a conceptual manifold.
Peter Gärdenfors’ theory of Conceptual Spaces argues that thought has geometric structure: similarity equals proximity. LLM embeddings operationalize this idea computationally.
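Gärdenfors' "similarity equals proximity" maps directly onto vector-space operations. The embeddings below are invented, three-dimensional toys (real models use vectors with thousands of learned dimensions), but the mechanics are the same:

```python
import numpy as np

# Invented low-dimensional "embeddings" for illustration only.
embeddings = {
    "paris":  np.array([0.90, 0.10, 0.30]),
    "london": np.array([0.85, 0.15, 0.35]),
    "sydney": np.array([0.10, 0.90, 0.40]),
}

def cosine(a, b):
    """Cosine similarity: proximity in direction, independent of length."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity as proximity: Paris sits nearer London than Sydney.
print(cosine(embeddings["paris"], embeddings["london"]))
print(cosine(embeddings["paris"], embeddings["sydney"]))
```

In a conceptual space, "Paris is more like London than Sydney" is not a stored fact but a geometric relation read off the vectors.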
Emergent dimensions include geographic directions resembling latitude and longitude, a chronological axis for historical time, and other interpretable attribute directions uncovered by probing.
Latent space becomes an epistemic manifold: movement along directions alters conceptual attributes.
This aligns with predictive processing frameworks (Friston, Clark), in which space and time serve as compressive variables that minimize prediction error.
LLMs are narrative prediction engines. They absorb stories, histories, biographies, and news reports, all of which encode ordered sequences of events.
Narrative time differs from physical time: stories jump backward and forward, compress decades into a sentence, and linger on single moments. Even so, the model infers chronological order, including in flashbacks, by tracking causal cues such as "because," "afterward," and "as a result."
However, correlation in narratives is not causal inference: the model encodes proto-causality but not interventionist reasoning.
LLMs demonstrate that intelligence can arise from compression.
Prediction pressure yields compact, structured latent representations. This supports a compression-based view of intelligence: world structure is encoded in language deeply enough to reconstruct reality second-hand. However, structure without perception lacks grounding, feedback from action, and lived experience.
Latent spatial and temporal models mirror training data biases.
Bias sources include the overrepresentation of English-language and Western texts, uneven documentation of regions and historical periods, and the perspectives of those who produced the training data. The consequences follow directly: heavily documented places are located precisely while sparsely documented ones blur, and dominant historical narratives become the model's defaults. The LLM's world model is a mirror of language, not a neutral atlas. Bias mitigation requires more representative corpora, systematic probing for skew, and transparency about what the latent map over- and under-represents.
The emergence of space and time in LLMs resonates with structural realism, the view that scientific knowledge captures relational structure rather than intrinsic natures.
Max Tegmark’s Mathematical Universe Hypothesis posits reality is mathematical structure. LLMs reconstruct structure from text correlations.
By analogy, the model possesses structural knowledge, not material presence: it knows relations, not intrinsic properties.
Despite impressive structure, LLMs lack perception, embodiment, grounded reference, and conscious understanding.
They do not automatically use their internal maps for reasoning.
They hallucinate. They lack physical sanity checks. They do not experience time passing.
Their world model is descriptive rather than lived. They possess shadows of understanding, not full illumination.
Gurnee and Tegmark’s work demonstrates that large language models internalize structured representations of space and time purely through text prediction. LLMs recover geographic coordinates and historical dates from internal activations via linear probes, and they generalize to places and events held out during probing.
These findings challenge simplistic “stochastic parrot” critiques. LLMs compress language into structured latent variables that reflect world topology.
Yet their understanding is structural, not experiential. They lack embodiment, grounding, and guaranteed reasoning reliability.
The discovery reframes LLMs as implicit world modelers, systems that distill structure from text rather than merely parroting it.
Space and time—the scaffolds of experience—appear within high-dimensional vector space. Through compression, structure emerges.
But the map is not the territory.
The LLM has drawn the world inside itself.
It has not lived in it.
This essay synthesizes arguments from Gurnee and Tegmark’s Language Models Represent Space and Time together with perspectives from embodied cognition, Gärdenfors’ conceptual spaces, predictive processing (Friston, Clark), and structural realism. These works collectively illuminate the implications of emergent spatial and temporal structure in large language models.