Embodied Cognition and Enactivism: Implications for Generative AI

by Marlon Barrios Solano

Introduction

In recent decades, embodied cognition and enactivism have emerged as influential frameworks in cognitive science and philosophy of mind. These approaches challenge traditional views that equate cognition with abstract information processing in a disembodied brain. Instead, they emphasize the role of the living body and interactive engagement with the world in shaping mind and meaning. This shift has significant implications for artificial intelligence (AI) research. As generative AI systems like large language models (LLMs) achieve remarkable feats in language generation, questions arise about their disembodied nature and whether genuine understanding is possible without a body or an environment.

This essay takes a researcher’s stance, examining embodied and enactive cognition theories (drawing on thinkers such as Andy Clark, Francisco Varela, Evan Thompson, and others) and critically analyzing what they imply for the design and limits of current AI. We will define each theory, discuss the limitations of disembodied AI models, contrast statistical pattern recognition with enactive sense-making, and explore potential directions for integrating embodied/enactive principles into future AI systems. Throughout, we engage with key academic perspectives and use references to ground the discussion in the scholarly literature.

Embodied Cognition: The Body in Mind

Embodied cognition is the view that the body is not just a vessel for the brain but an integral component of cognitive processes. In a standard definition,

“Cognition is embodied when it is deeply dependent upon features of the physical body of an agent, that is, when aspects of the agent’s body beyond the brain play a significant causal or physically constitutive role in cognitive processing.”

Advocates of this perspective argue that traditional cognitive science made a mistake in treating the body as peripheral to mind. Instead, the body’s form, sensory capacities, and motor abilities fundamentally shape how an agent perceives, thinks, and acts. For example, our concepts may be grounded in bodily metaphors and sensorimotor experience, and our memory and problem-solving can be offloaded to environmental structures (writing, tools, etc.) that we manipulate with our bodies.

Philosopher Andy Clark is a prominent proponent of embodied and extended cognition. In Being There: Putting Brain, Body and World Together Again (1997), Clark surveyed new approaches to AI and cognitive science that moved away from purely internal symbol-crunching. He noted that minds evolved “for getting things done in the world in real time,” not for functioning as disembodied “walking encyclopedias.” Early behavior-based robotics research reinforced this idea: Rodney Brooks’s subsumption architecture demonstrated “intelligence without representation,” using simple sensorimotor loops instead of heavy internal models. Clark highlighted how such robotics, which let a machine navigate and respond directly to its environment, illustrated cognition as “scaffolded, embedded, and extended” in the world. In other words, cognitive processes can be distributed across brain, body, and environment, blurring the boundary of mind. Clark and Chalmers famously argued that even external tools (a notebook, for instance) could become part of one’s cognition—the Extended Mind thesis. The general takeaway of embodied approaches is that an organism’s sensorimotor capacities and environmental context are not afterthoughts but core constituents of thinking. Cognition “deeply depends” on embodiment, meaning that the shape of our mind cannot be understood in isolation from the body that houses it and the world it interacts with.

Enactivism: Cognition as Enactive Sense-Making

Enactivism goes a step further by positing that cognition emerges through a dynamic interaction between an organism and its environment. The term originated with Francisco Varela, Evan Thompson, and Eleanor Rosch’s book The Embodied Mind (1991), which proposed a radical alternative to cognitivism. Enactivism holds that

“Cognition arises through a dynamic interaction between an acting organism and its environment.”

In this view, an organism enacts (brings forth) a world through its sensorimotor activity. Crucially, the world an agent experiences is not a pre-given external reality fully independent of the agent, but is partially “brought about, or enacted, by the active exercise of that organism’s sensorimotor processes.” The environment is inseparable from the organism’s ways of perceiving and acting. Varela and colleagues emphasized:

“The growing conviction that cognition is not the representation of a pre-given world by a pre-given mind but is rather the enactment of a world and a mind based on a history of the variety of actions that a being in the world performs.”

In simpler terms, minds and worlds co-emerge: through continued adaptive activity, a cognitive agent “lays down a path in walking,” constantly shaping and being shaped by its world.

Enactivism is closely related to embodied and situated cognition, but it is distinguished by its rejection of orthodox representational views of mind. Traditional AI and cognitive science often assumed that cognition means processing internal representations of an objective external world (as if the brain were a computer taking input from the senses and constructing an internal model). Enactivists explicitly oppose this view by rejecting the idea that the core business of cognition is to represent and compute. Instead, they offer a vision of minds as active, world-involving processes. In the original enactive framework, the brain is not seen as manipulating symbols according to a program; rather, cognition is embodied action—the ongoing coupling of perception and action in service of the organism’s practical situation. Perception itself is understood as not just receiving data but actively exploring the environment (e.g. the way an animal moves to sense its surroundings). As Thompson and Varela put it:

“Organisms do not passively receive information from their environments… they enact a world” through a “history of structural coupling” with the environment.

A key concept in enactivism is sense-making. Because enactivism often builds on the biology of autonomous living systems, it sees cognition as the way living organisms make sense of their world. This means that organisms interpret environmental stimuli in terms of their own viability and goals. One formulation states:

“Living is a process of sense-making, of bringing forth significance and value.”

An autonomous agent (like a living cell or animal) doesn’t just process data; it assigns meaning to things based on their significance for its continued existence. Di Paolo and colleagues define sense-making as “the capacity of an autonomous system to adaptively regulate its operation and its relation to the environment depending on the virtual consequences for its own viability as a form of life.” In other words, a bacterium “makes sense” of sugar in the environment as food to approach, whereas a toxin is something to avoid—the value (good or bad) is not pre-given in the environment alone but arises from the organism’s autonomous needs. Enactivists argue that this kind of embodied, normative engagement is the origin of meaning and even the basis of basic cognition. Evan Thompson has described sense-making as “intentionality in its minimal and original biological form” – linking it to the philosophical idea of intentionality, the “aboutness” of mental states. Enactive cognition, then, is participatory: the mind is not a mirror of nature, but a participant in nature, bringing forth a world through its embodied activity and interpreting that world in terms of its own perspective.

Disembodied AI: Limitations of Current LLMs

The embodied and enactive paradigms cast current AI systems—especially large language models—in a critical light. Present-day LLMs like GPT-4 or BERT are disembodied: they exist solely as computational patterns in silico, with no physical body or sensorimotor loop connecting them to an environment. They learn from massive text datasets, finding statistical regularities in language, but they do not live in or experience the world that language describes.

From an embodied cognition standpoint, this disembodiment is a fundamental limitation. Human language and thought are grounded in bodily experience; in contrast, an LLM deals with words in a largely ungrounded way. Stevan Harnad famously called this the “symbol grounding problem” – how can symbols (like words or internal data structures) obtain meaning for a system if that system has no access to the referents of those symbols in the world? An AI can store and manipulate the word “apple,” but without sensory access or bodily interaction, does it know what an apple is? For humans, the word “apple” is connected to a rich network of perceptions (shape, color, taste) and possible actions (eating, picking)—in short, sensorimotor knowledge. Disembodied AI lacks this direct grounding.

Researchers in cognitive science and AI have pointed out that disembodied programs are “fundamentally incapable of understanding people’s embodied interactions in the ways that humans understand them.” For example, an education-oriented AI might analyze video of a student solving a puzzle, but if that AI is purely trained on data patterns and has no bodily awareness, it cannot truly interpret the gestures, postures, or physical frustrations of the student the way a human teacher would. More generally, the worry is that however sophisticated LLMs become, they may still be lacking what philosopher Hubert Dreyfus identified as the background of everyday embodied know-how. Dreyfus, an early critic of GOFAI (Good Old-Fashioned AI), argued that human intelligence is deeply rooted in being-in-the-world (a concept from Heidegger’s phenomenology). He asserted that classical AI failed and “fixing it would require making it more Heideggerian”—i.e., giving AI a situation and body so that it can develop the kind of implicit understanding we have.

The limitations of current LLMs highlight this point: LLMs often produce “plausible but inaccurate information” (hallucinations) and struggle with common-sense reasoning or contextual understanding beyond text. These failures can be seen as symptoms of a lack of grounded understanding. Without a body, an AI has no genuine perspective or stakes in the world: it cannot care what is true or false, right or wrong, because nothing matters to it in a literal sense. It has no goals or survival to secure. Vervaeke and Coyne (2024) argue that present-day LLMs “lack the ability to properly care about the truth,” being static systems that cannot actively seek new information or correct themselves in light of experience. All of this is unsurprising from an enactivist view: a machine with no body or autonomy is missing the constitutive conditions of sense-making (no autonomous metabolism or self-preservation that would make truth or error matter). It can only simulate understanding by statistically echoing patterns of human language.

It’s important to note that even some successes of LLMs can be interpreted as highlighting their disembodiment. For instance, an LLM might generate a very coherent story about going to the beach, describing sights and sensations, but it does so by remixing patterns from text it has seen, not by recalling an actual experience. In philosophy of mind terms, the LLM has syntax (it manipulates symbols according to rules and probabilities) but arguably lacks semantics (intrinsic meaning or reference). John Searle’s famous Chinese Room thought experiment is often invoked here: the LLM is like a person in a room shuffling Chinese characters by rules, producing sensible responses without understanding Chinese. Embodied cognition suggests that real understanding would require the kind of causal connection to the world that our brains have via our bodies. Empirical studies in humans show that even abstract thinking often invokes sensorimotor brain areas, and that grounding abstract concepts in imagery or body-based metaphors aids comprehension. AI models that operate purely on text or abstract data might miss out on these embodied structures of knowledge.

Another limitation of disembodied AI is the lack of situated learning. Human cognition develops through continuous embodied interaction from infancy onward: we learn by doing, by exploration, by feedback from the physical and social environment. Current LLMs, in contrast, undergo a one-time training on a fixed dataset—a largely passive absorption of existing text. They do not develop through ongoing sensorimotor engagement in an open-ended world. This makes their knowledge brittle in novel situations. An enactivist might say LLMs have no world of their own—they only have a shadow of the human world reflected in text. As a result, they might be powerful predictors in the context of linguistic input, but they lack the robust, adaptive understanding that comes from real-world sense-making. Their “intelligence” stays within the narrow realm of pattern completion, and they cannot truly generalize in the way a child can from a few interactions, because they lack the embodied ground truth to relate new situations to.

Statistical Pattern Recognition vs. Enactive Sense-Making

The contrast between LLM-type AI and enactive cognition can be framed as statistical pattern recognition versus enactive sense-making. Large language models operate by detecting and reproducing patterns in data. They are essentially very advanced statistical machines: after being trained on billions of words, an LLM like GPT can predict the next word in a sentence with uncanny accuracy, having internalized the probabilistic structure of language usage. This approach has yielded impressive results in generating fluent text and even coding, summarizing, or question-answering. However, it is, at its root, pattern recognition. The LLM does not know what the words mean in the way humans do; it knows how words relate to each other in sentences because that’s what it was trained to optimize. It is often described as a stochastic parrot, recombining bits of text it has “heard” during training without any understanding of the real-world context or truth conditions of those utterances.
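
To make the “very advanced statistical machine” point concrete, the following toy sketch predicts the next word purely from co-occurrence counts in a tiny corpus. It is a deliberately crude stand-in (real LLMs use neural networks over subword tokens and far richer context), but the epistemic situation is the same: the model only knows which words tend to follow which.

    from collections import Counter, defaultdict

    # Toy corpus standing in for the billions of words an LLM is trained on.
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    # Count how often each word follows each preceding word (bigram statistics).
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1

    def predict_next(word):
        """Return the most probable continuation, purely from co-occurrence counts."""
        counts = following[word]
        total = sum(counts.values())
        probs = {w: c / total for w, c in counts.items()}
        return max(probs, key=probs.get), probs

    print(predict_next("the"))   # ('cat', {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25})

Nothing in the resulting probability table refers to cats, mats, or sitting; it encodes only the distributional shadow that embodied language use leaves behind in text.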

Enactive sense-making, on the other hand, is not about detecting statistical patterns in a static dataset; it is about actively generating meaning through interaction. A simple example: when a spider hunts by feeling vibrations on its web, it is engaged in sense-making—distinguishing prey from random breeze, interpreting the vibration in terms of the spider’s embodied context (food vs. non-food). This interpretation is not programmed by enumerating all possible patterns of web vibration; it emerges from the spider’s sensorimotor coupling with its web and the evolutionary history that shaped its nervous system to care about certain patterns (those indicative of prey). In humans, sense-making is even more sophisticated: we interpret a smile in context—as friendly, or as sarcastic, depending on a whole array of embodied and cultural cues. Enactivists would say we are bringing forth meaning in that moment, not just pattern-matching against a database of past smiles. The process is situated, context-sensitive, and involves an element of autonomy—we have an active stake in the situation (e.g., our social rapport) which guides our interpretation.

A crucial difference is that enactive sense-making is grounded in the agent’s perspective. The patterns an organism picks up on are inherently linked to what matters to that organism. As enactivist scholars put it, an autonomous system “establishes a perspective from which interactions with the world acquire a normative status.” That is, the agent’s needs and goals create a normative framework (things can go better or worse for it), and within that framework, certain aspects of the environment become meaningful. For example, a thirsty deer sees a pond as a goal (water to drink)—the pond means something specific to the deer. Meanwhile, an AI vision system might identify “water” in an image with high accuracy (pattern recognition), but it does not desire water or prefer a pond to be in one location or another. The AI has no point of view or interests. Enactivist Evan Thompson explains that in the enactive view, to perceive is inherently to apprehend significance—perception is never neutral reception of data, but part of a cycle of meanings relative to the perceiver. This feature is entirely absent in current LLMs and similar AI. They do not perceive, desire, or fear anything; they merely transform input tokens to output tokens. Any appearance of caring or viewpoint (say, ChatGPT saying “I’m sorry to hear that” or expressing opinions) is a simulacrum learned from human text patterns, not the AI’s own concern.

Furthermore, statistical learning is retrospective—it looks for patterns in past data—whereas sense-making is prospective—it is oriented toward the future, the next action, the anticipated outcome. A sensing agent is constantly adjusting its behavior based on its anticipations and experiments in the environment. Enactive theorists often highlight the predictive and exploratory nature of biological cognition: animals actively probe the world (e.g., an infant banging objects to see what sound they make) and form expectations that get refined. There is a clear parallel here with recent neuroscience models like predictive processing (espoused by Andy Clark in his later work)—the idea that brains are prediction machines updating their models through prediction error feedback. Interestingly, predictive processing brings classical computational modeling closer to enactive ideas by stressing active perception and the role of the body in minimizing prediction errors. But even there, the difference is that enactivists emphasize the embodied and embedded nature of these predictions: the predictions only make sense in the context of a body with particular capabilities engaging a specific world.
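
The prediction-error idea can be reduced to a toy update rule. The sketch below is a deliberately minimal illustration, not Clark’s or Friston’s full predictive-processing machinery: an agent holds an estimate, compares it with what it actually observes, and revises the estimate by a fraction of the mismatch.

    def predictive_update(prediction, observation, learning_rate=0.2):
        """Revise an internal estimate by a fraction of the prediction error."""
        error = observation - prediction            # the surprise: what the world said back
        return prediction + learning_rate * error   # nudge the model toward the evidence

    estimate = 0.0
    for observation in [1.0, 1.2, 0.9, 1.1, 1.0]:   # successive sensory samples
        estimate = predictive_update(estimate, observation)
    print(round(estimate, 3))                        # the estimate has drifted toward ~1.0

The enactivist addendum is that, in a living agent, those observations are not handed over by a dataset: they are consequences of the agent’s own movements, so prediction and action form one loop.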

LLMs, by contrast, do not act in the world and see what happens. They do not truly have a loop of expectation and feedback with an environment; their “training” is essentially batch learning from a static corpus, and their “deployment” is just running the learned model to generate text. They never get to discover new information by doing. They also lack the multi-modal integration that characterizes animal sense-making—a human or animal combines vision, hearing, touch, proprioception, etc., into a coherent model of the world. A text-only model is trapped in the modality of words. To the extent it “makes sense” of input text, it is by relating it to other text, not by relating it to a lived world.

Another way to frame this difference is in terms of interpretation. Pattern recognition systems classify or predict; sense-making systems interpret. Interpretation involves contextualizing information within an experiential worldview. When an enactive AI researcher speaks of an agent enacting a world, they imply that the agent has its own internal norms and perspective that it uses to make sense of raw data. For example, if one built a little robot that maintained its battery level (a simple homeostatic drive) and explored a maze to find recharging stations, that robot would start to interpret certain signals (like a beacon signal indicating a charger) as “good” or “desirable” – not because a programmer explicitly labeled it so in all circumstances, but because through interaction the signal reliably leads to fulfilling the robot’s autonomous need (charging its battery). That would be a rudimentary form of sense-making: the signal has meaning for the robot. A standard pattern recognizer might also detect the signal, but it has no preference or goal attached—it might just output “beacon detected” without caring. In enactivist terms, sense-making is inherently tied to agency: an agent acts in pursuit of maintaining itself, and in doing so, it encounters the world under certain meanings (food, obstacle, friend, danger, etc.). Contemporary LLMs have no agency; they do not set goals or pursue survival. They are more like oracles waiting for a prompt to then produce an answer. All goals and interpretations in an LLM are essentially imposed by the user or designer, not generated from the model’s own being.
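
The charging-station robot described above can be sketched in a few lines. This is a toy under strong assumptions (the signal names, energy costs, and probabilities are all invented for illustration); the point is only that a signal acquires positive or negative significance through what acting on it does to the agent’s own viability variable, not through an externally supplied label.

    import random

    class HomeostaticBot:
        """Toy agent whose only 'concern' is keeping its battery from running down."""

        def __init__(self):
            self.battery = 0.5
            # Learned significance (valence) of each signal, from the agent's own perspective.
            self.valence = {"beacon": 0.0, "noise": 0.0}

        def step(self, signal):
            before = self.battery
            self.battery -= 0.05                   # merely existing costs energy
            if signal == "beacon" and random.random() < 0.8:
                self.battery += 0.3                # the beacon usually leads to a charger
            # The signal 'means' whatever it reliably did for the agent's viability.
            self.valence[signal] += self.battery - before

    bot = HomeostaticBot()
    for _ in range(200):
        bot.step(random.choice(["beacon", "noise"]))
    print(bot.valence)   # beacon ends up with positive valence, noise with negative

An enactivist would immediately add the caveat made in the text above: a valence table updated by a programmer-chosen battery variable is still far from genuine sense-making, where the viability constraint is constitutive of the system rather than bolted on.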

Finally, it is worth noting that these differences lead to divergent error profiles. A statistical system like an LLM might fail in odd ways, e.g., confidently asserting a falsehood because it statistically sounds right (a hallucination). An enactive agent, in theory, would correct false beliefs through practical failure—e.g., if you act on a false belief, reality pushes back (the juice you thought was there isn’t, and you stay thirsty, forcing you to update your expectation in future). LLMs do not act and thus do not get the opportunity to be surprised by their mistakes. They only receive feedback if humans put them in a loop or fine-tune them with reinforcement learning from human feedback (RLHF), which is an external correction, not an intrinsic error signal from the environment as in animal learning. This is why some researchers argue that true understanding and robust intelligence might require the enactive kind of learning, not just reading the statistical traces of others’ experiences. In short, statistical pattern recognition (as impressive as it is in current AI) remains a shallow form of intelligence compared to enactive, embodied sense-making that characterizes natural cognition. The former deals in correlations, the latter in constitutive meanings for an agent.

Toward Embodied and Enactive AI

If embodied cognition and enactivism point out critical gaps in current AI, a natural question is: How can we build AI systems that incorporate embodiment or enactive principles? This is an active area of research and speculation. While LLMs and other deep learning models have thrived in disembodied digital domains, a growing number of researchers believe that AI must be situated in the world—either physically or in rich simulations—to achieve more grounded intelligence. In recent literature, there are calls for “embodied AI” that emphasize exactly this. For instance, Malik et al. (2022) argue that truly advanced AI agents should be “bound to, observe, interact with, and learn from the real world in a continuous and dynamic manner.” The idea is that an AI should not remain a static model trained once; it should be more like a robot or an embodied agent that learns continually through its own experience, adapting to new circumstances as they occur. Such an agent would also have stakes in the world—perhaps programmed drives or values (like self-preservation or curiosity) that make it care about some outcomes over others, thus instilling a kind of intrinsic motivation to seek truth or avoid error.

One practical direction is combining LLM technology with robotics and sensors—effectively giving language models a body. Early steps in this direction include systems like PaLM-E, an “Embodied Multimodal Language Model” which integrates vision and robotic control with language. The idea behind PaLM-E and similar efforts is to have a model that can both converse and perceive/act, so that its words correspond to things it sees or does. For example, a robot with a language model could be instructed “pick up the red cube,” and it would use its camera (vision input) to identify the red cube and its robotic arm to execute the action. Importantly, if the robot tries and fails (say the cube slips), the system has the potential to learn from that interaction, refining its understanding of concepts like “grasping” or the properties of the cube (heavy, slippery, etc.). This is a form of grounding language in sensorimotor experience—a key requirement highlighted by embodied cognition theories. As one recent position paper put it, “experience grounds language,” suggesting that AI might need direct experience to fully grasp linguistic meaning. Multi-modal models that handle images, audio, and text together are a step in this direction, as they tie words to perceptual data. But going further into action (embodiment) allows tying words to consequences and goals, which is even more powerful for grounding meaning.
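
The grounding loop described above can be made concrete with a schematic sketch. Everything here is hypothetical glue code, not PaLM-E’s actual architecture: the FakeRobot class and its detect_objects, grasp, and record_outcome methods are invented stand-ins for real perception and control stacks. What the sketch shows is the flow that matters for grounding: words are matched against what is perceived, acted on, and corrected by the outcome.

    from dataclasses import dataclass

    @dataclass
    class Detection:
        label: str
        color: str
        position: tuple          # (x, y, z) in the robot's workspace

    class FakeRobot:
        """Stand-in for real perception and control stacks (hypothetical API)."""
        def detect_objects(self, image):
            return [Detection("cube", "red", (0.4, 0.1, 0.02)),
                    Detection("ball", "blue", (0.2, 0.3, 0.03))]
        def grasp(self, position):
            return True          # a real robot would report success or failure from its sensors
        def record_outcome(self, instruction, target, success):
            print(f"feedback: '{instruction}' -> {target.label}, success={success}")

    def execute_instruction(instruction, image, robot):
        """Ground a verbal instruction in perception, act on it, and keep the outcome."""
        detections = robot.detect_objects(image)                  # perception
        targets = [d for d in detections                          # grounding 'red cube' in what is seen
                   if d.label == "cube" and d.color == "red"]
        if not targets:
            return False                                          # nothing to ground the phrase in
        success = robot.grasp(targets[0].position)                # action in the world
        robot.record_outcome(instruction, targets[0], success)    # failure becomes training signal
        return success

    execute_instruction("pick up the red cube", image=None, robot=FakeRobot())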

Another promising avenue is the use of simulated environments or games where an AI agent can move and interact. Researchers have begun training agents (incorporating language models) in virtual worlds or using game engines, so that the AI can practice enactive learning: it might, for instance, figure out how to navigate a maze by trial and error, or learn the concept of “falling” by experiencing a virtual fall. These simulation-based approaches strive to give AI a kind of embodied curriculum. The advantage is that simulations can be rich and physically grounded while still being safe and fast to run (no hardware robots required, and time can be compressed). If an AI can learn through such an interactive process rather than just reading text, it might develop more robust, generalized cognitive abilities. We see initial evidence of this: some agents that learn through reinforcement in simulated environments acquire a more practical grasp of language instructions and common-sense physics than pure text-trained models.
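
A minimal version of this trial-and-error learning can be shown with tabular Q-learning in a one-dimensional toy world (the corridor, reward values, and learning rates below are invented for illustration and not drawn from any particular published system). The agent starts knowing nothing, repeatedly acts, and learns the value of actions only from what happens when it takes them.

    import random

    # A toy 'corridor' world: the agent starts at position 0; a charger sits at position 4.
    GOAL, N_STATES, ACTIONS = 4, 5, (-1, +1)

    def step(state, action):
        nxt = max(0, min(N_STATES - 1, state + action))
        reward = 1.0 if nxt == GOAL else -0.01       # small cost per move, payoff at the goal
        return nxt, reward, nxt == GOAL

    # Tabular Q-learning: action values learned purely by acting and seeing what follows.
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha, gamma, epsilon = 0.5, 0.9, 0.2            # learning rate, discount, exploration

    for episode in range(200):
        state, done = 0, False
        while not done:
            if random.random() < epsilon:            # sometimes explore at random
                action = random.choice(ACTIONS)
            else:                                    # otherwise exploit current estimates
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward, done = step(state, action)
            # Update from the mismatch between expected and experienced value.
            Q[(state, action)] += alpha * (reward + gamma * max(Q[(nxt, a)] for a in ACTIONS) - Q[(state, action)])
            state = nxt

    print(max(ACTIONS, key=lambda a: Q[(0, a)]))     # learned policy at the start: move right (+1)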

From an enactivist perspective, one would also emphasize autonomy and self-maintenance in AI design. A truly enactive AI might need something analogous to a metabolic process or self-preservation drive. This could be implemented as a constant imperative to maintain certain variables within bounds (like energy level in a game, or a score that the agent treats as “life”). By giving AI a kind of artificial homeostasis, we could induce it to develop behavior that looks more like sense-making: it would have to interpret things in the environment as good or bad for its “health.” There have been experiments in cybernetics and robotics where robots have “hunger” and must find charging stations, etc., which indeed lead them to act in goal-directed ways that mimic biological autonomy. These are still primitive, but they hint at how one might instantiate the enactive idea of sense-making in a machine: the machine would need to care about something with respect to itself.

In the context of LLMs, some researchers speculate about tying the model to a long-term memory and goals, so that it not only generates text but does so in pursuit of fulfilling certain objectives (thus potentially developing more consistency and truthfulness, as it has a stake in not being wrong if that impedes its goal). For example, an enactive AI language system might be one that actively seeks information to resolve uncertainties—showing curiosity—rather than passively waiting for a prompt. This resembles how humans actively ask questions and explore when they encounter unknowns, guided by a drive to make sense of their situation.
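
As a schematic of that idea only: the control loop below treats low confidence as a cue to go looking for information instead of asserting a plausible guess. The llm and search objects and their methods (answer, formulate_query) are hypothetical placeholders, not a real API; the substance is the loop, not the particular confidence measure.

    def answer_or_inquire(question, llm, search, threshold=0.8, max_queries=3):
        """Schematic curiosity loop: seek evidence when uncertain, instead of guessing.

        Assumes a hypothetical interface where llm.answer returns (text, confidence)
        and search(query) fetches supporting evidence.
        """
        answer, confidence = llm.answer(question)
        for _ in range(max_queries):
            if confidence >= threshold:
                return answer                                # confident enough to commit
            # Low confidence is treated as a signal to act: formulate a query and look.
            evidence = search(llm.formulate_query(question))
            answer, confidence = llm.answer(question, context=evidence)
        return "Uncertain: " + answer                        # report residual uncertainty honestly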

It is also possible to draw inspiration from the predictive processing/active inference framework for guiding AI development. Active inference (a concept from cognitive neuroscience associated with Karl Friston) suggests that agents act to minimize the surprise (prediction error) with respect to their model of the world. This theory, when applied, encourages AI agents to explore actions that reduce uncertainty. Some AI researchers have begun to incorporate active inference principles to imbue agents with an intrinsic motivation to gather information or improve their world model. This could make AI learning more active and less reliant on human-prepared datasets. In fact, a recent paper by Di Paolo et al. (2024) titled Active Inference Goes to School: The Importance of Active Learning in the Age of Large Language Models argues that the impressive performance of LLMs might be further enhanced by enabling them to engage in active, exploratory learning as opposed to just passive training. The authors emphasize curiosity and play-like exploration as keys to advancing AI. Such qualities are hallmarks of embodied cognitive development in humans (think of how children learn by playing and exploring), and implementing them in AI could bridge some gaps between AI and natural cognition.
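
The epistemic, uncertainty-reducing component of active inference can be illustrated with a toy that has none of Friston’s formal apparatus. Under invented numbers, an agent holds a belief about where a charger is and scores each available action by the uncertainty expected to remain after seeing that action’s observation; the informative action wins.

    from math import log2

    def entropy(belief):
        return -sum(p * log2(p) for p in belief.values() if p > 0)

    def posterior(prior, likelihood, obs):
        unnorm = {h: prior[h] * likelihood[h][obs] for h in prior}
        z = sum(unnorm.values())
        return {h: v / z for h, v in unnorm.items()}

    def expected_posterior_entropy(prior, likelihood):
        """Average uncertainty that would remain after observing the action's outcome."""
        observations = next(iter(likelihood.values())).keys()
        total = 0.0
        for obs in observations:
            p_obs = sum(prior[h] * likelihood[h][obs] for h in prior)
            if p_obs > 0:
                total += p_obs * entropy(posterior(prior, likelihood, obs))
        return total

    belief = {"charger_in_A": 0.5, "charger_in_B": 0.5}      # the agent does not know yet

    # Each action induces a likelihood of observations given the hidden state.
    actions = {
        "peek_into_A": {"charger_in_A": {"seen": 0.9, "not_seen": 0.1},
                        "charger_in_B": {"seen": 0.1, "not_seen": 0.9}},
        "wait":        {"charger_in_A": {"seen": 0.5, "not_seen": 0.5},
                        "charger_in_B": {"seen": 0.5, "not_seen": 0.5}},
    }

    scores = {a: expected_posterior_entropy(belief, lik) for a, lik in actions.items()}
    print(min(scores, key=scores.get))    # 'peek_into_A': the uncertainty-reducing action

Full active inference also weighs pragmatic value (preferred outcomes), which is what ties this exploratory drive back to the agent’s own needs rather than to curiosity in the abstract.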

Additionally, the social dimension of cognition (sometimes termed “embedded” cognition) should not be overlooked. Enactivism has a concept of participatory sense-making in which meaning is co-created in social interactions. Future AI might move toward systems that learn not just from solo interaction with an environment, but from interactive dialogue and cooperation with humans or other agents in an embodied context. This could involve robots that learn tasks through imitation and feedback in a social setting (a bit like how human apprentices learn), or AI agents in simulated social environments that develop communication protocols from scratch to coordinate on tasks. By embedding AI in socially embodied scenarios, we might address aspects of intelligence that are collective and culturally shaped—something current individual LLMs, reading text alone, only implicitly capture. Indeed, language itself is a social, embodied practice; learning it in isolation from actual conversation and bodily expression is arguably limiting.

It’s worth noting that incorporating embodiment or enaction into AI is very challenging. The successes of disembodied AI came partly because it’s easier to train models on well-defined data (like text, images) than to build robots that survive in the messy real world. Robotics brings many practical difficulties (noisy sensors, mechanical failures, slow learning, etc.). However, the field of embodied AI is growing, and researchers are finding creative ways to marry the data-rich approaches of deep learning with the grounding of robotics. For example, some projects use telepresence or human demonstration to quickly teach robots (humans physically guide a robot or control it to collect training data for tasks). Others use hybrid models: an LLM might handle abstract planning (“I need to get the object from the shelf”) while a lower-level control system handles the precise movements—together forming an embodied cognitive system that spans symbolic reasoning and sensorimotor skill. Such integrated systems resonate with the embodied cognition idea that “brain, body, and environment” form a single cognitive circuit.
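
The hybrid idea can be shown as a bare dispatch skeleton. The plan and the skill names are invented; in a deployed system the plan would come from the language model and each skill would wrap a real controller, but the division of labor (symbolic step above, sensorimotor execution below) is the point.

    # Hypothetical split: an LLM planner emits abstract steps; a skill library executes them.
    def move_to(location):     print(f"[controller] moving to {location}")
    def grasp(obj):            print(f"[controller] grasping {obj}")
    def place(obj, location):  print(f"[controller] placing {obj} on {location}")

    SKILLS = {"move_to": move_to, "grasp": grasp, "place": place}

    # In a real system this plan would be generated by the language model;
    # it is hard-coded here so the dispatch logic can be seen on its own.
    plan = [("move_to", ["shelf"]),
            ("grasp",   ["object"]),
            ("move_to", ["table"]),
            ("place",   ["object", "table"])]

    for skill_name, args in plan:
        SKILLS[skill_name](*args)      # each symbolic step handed to the sensorimotor layer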

Another possible route is augmenting AI with human embodiment—for instance, using AI as part of human-computer interactive systems where the AI gets continuous feedback from human actions. In a way, this uses humans as the embodied part of a human-AI team. Some argue that near-term AI might best be used in augmented intelligence systems, where a disembodied AI tool works in tandem with human embodied understanding to achieve results (this is already common: e.g., a doctor (embodied intelligence) uses an AI diagnostic assistant (pattern recognizer) to get suggestions but then interprets them in context). This approach accepts the limitations of disembodied AI and uses human sense-making to fill the gap. However, the long-term quest in AI is often to make machines more autonomous on their own, and for that the principles of embodiment and enaction seem indispensable.

It should be acknowledged that not everyone agrees embodiment is strictly necessary for all aspects of intelligence. Some theorists maintain that sufficiently advanced statistical AI might approximate embodied knowledge by training on vast datasets (for example, reading texts that describe physical experiences might give an AI an indirect form of grounding). There is an ongoing debate: are LLMs just a hack that works without understanding, or could they be the scaffolding of a new kind of intelligence that doesn’t need a body? The embodied/enactive viewpoint strongly leans toward the former—that without embodiment, AI will hit a ceiling in terms of genuine comprehension and flexibility. The debate touches on deep philosophical issues of what it means to understand or to be conscious. While this essay cannot resolve that debate, the embodied cognition literature provides a rich critique of the disembodied AI approach and offers many ideas for how to move forward.

Conclusion

Embodied cognition and enactivism redirect our attention from what an intelligent system is (a set of rules or representations) to how intelligence arises through interaction in the world. They remind us that human cognition is not an abstract algorithm in a vacuum; it is situated, embodied, and enactive. The implications for generative AI and LLMs are profound. As powerful as current models are, their disembodied, pattern-based nature might always keep them on the shallow end of the understanding spectrum. They excel at simulating intelligence—stitching together patterns in impressive ways—but, as enactivist critics would point out, they lack the living pulse of cognition: no body to ground them, no world they truly inhabit, no perspective or purpose of their own. This gap explains many of their strange behaviors and limitations.

However, the path forward is not to abandon AI, but to embody it. By integrating principles of embodiment and enaction, we can work toward AI that engages the world it operates in, learns like an organism (through continual sensorimotor feedback), and perhaps even “brings forth” its own form of meaning within a bounded context. Such AI might not only parse language but experience (in some limited sense) the correlates of language in the real world, leading to more robust understanding. Whether through robots that learn by doing, or virtual agents that simulate life-like goals, the next generation of AI systems could move closer to the cognitive style of natural agents. This will also force us to confront questions about autonomy and values in AI—an enactive AI with its own “sense-making” will have interests that need alignment with ours, raising ethical and safety considerations. In effect, to give AI a mind more like ours, we must give it a body (or something analogous to a body) and a world to inhabit.

In closing, embodied cognition and enactivism serve as both a critique and an inspiration for AI research. They critique the disembodied, purely statistical approach of current models as ultimately insufficient for true intelligence, and they inspire alternative designs: AI that is less a disembodied brain in a vat and more an embodied creature in an environment. As Andy Clark and David Chalmers hinted with the extended mind thesis, the mind “ain’t all in the head.” The intelligence of the future may similarly extend into the body of our machines and the environments we let them learn in. Bridging AI with the insights of embodiment and enaction could lead not only to smarter machines, but to a deeper scientific understanding of cognition itself, transcending old dichotomies between thought and life. It is a challenging road, but one increasingly seen as necessary. In the words of Varela, we may need to “transform our view of AI” so that cognition is no longer thought of as recovering pre-packaged information, but as actively generating a world. Only then can our artificial systems move beyond being clever statistical parrots and inch closer to genuine, situated intelligence.

This essay is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

For more details, visit the official license page: https://creativecommons.org/licenses/by/4.0/