Normal view

Received before yesterdayThe Gradient

The Gradient
AGI Is Not Multimodal 4 June 2025 at 14:00

AGI Is Not Multimodal

4 June 2025 at 14:00

"In projecting language back as the model for thought, we lose sight of the tacit embodied understanding that undergirds our intelligence." –Terry Winograd

The recent successes of generative AI models have convinced some that AGI is imminent. While these models appear to capture the essence of human intelligence, they defy even our most basic intuitions about it. They have emerged not because they are thoughtful solutions to the problem of intelligence, but because they scaled effectively on hardware we already had. Seduced by the fruits of scale, some have come to believe that it provides a clear pathway to AGI. The most emblematic case of this is the multimodal approach, in which massive modular networks are optimized for an array of modalities that, taken together, appear general. However, I argue that this strategy is sure to fail in the near term; it will not lead to human-level AGI that can, e.g., perform sensorimotor reasoning, motion planning, and social coordination. Instead of trying to glue modalities together into a patchwork AGI, we should pursue approaches to intelligence that treat embodiment and interaction with the environment as primary, and see modality-centered processing as emergent phenomena.

Preface: Disembodied definitions of Artificial General Intelligence — emphasis on general — exclude crucial problem spaces that we should expect AGI to be able to solve. A true AGI must be general across all domains. Any complete definition must at least include the ability to solve problems that originate in physical reality, e.g. repairing a car, untying a knot, preparing food, etc. As I will discuss in the next section, what is needed for these problems is a form of intelligence that is fundamentally situated in something like a physical world model. For more discussion on this, look out for Designing an Intelligence. Edited by George Konidaris, MIT Press, forthcoming.

Why We Need the World, and How LLMs Pretend to Understand It

TLDR: I first argue that true AGI needs a physical understanding of the world, as many problems cannot be converted into a problem of symbol manipulation. It has been suggested by some that LLMs are learning a model of the world through next token prediction, but it is more likely that LLMs are learning bags of heuristics to predict tokens. This leaves them with a superficial understanding of reality and contributes to false impressions of their intelligence.

The most shocking result of the predict-next-token objective is that it yields AI models that reflect a deeply human-like understanding of the world, despite having never observed it like we have. This result has led to confusion about what it means to understand language and even to understand the world — something we have long believed to be a prerequisite for language understanding. One explanation for the capabilities of LLMs comes from an emerging theory suggesting that they induce models of the world through next-token prediction. Proponents of this theory cite the prowess of SOTA LLMs on various benchmarks, the convergence of large models to similar internal representations, and their favorite rendition of the idea that “language mirrors the structure of reality,” a notion that has been espoused at least by Plato, Wittgenstein, Foucault, and Eco. While I’m generally in support of digging up esoteric texts for research inspiration, I’m worried that this metaphor has been taken too literally. Do LLMs really learn implicit models of the world? How could they otherwise be so proficient at language?

One source of evidence in favor of the LLM world modeling hypothesis is the Othello paper, wherein researchers were able to predict the board of an Othello game from the hidden states of a transformer model trained on sequences of legal moves. However, there are many issues with generalizing these results to models of natural language. For one, whereas Othello moves can provably be used to deduce the full state of an Othello board, we have no reason to believe that a complete picture of the physical world can be inferred by a linguistic description. What sets the game of Othello apart from many tasks in the physical world is that Othello fundamentally resides in the land of symbols, and is merely implemented using physical tokens to make it easier for humans to play. A full game of Othello can be played with just pen and paper, but one can’t, e.g., sweep a floor, do dishes, or drive a car with just pen and paper. To solve such tasks, you need some physical conception of the world beyond what humans can merely say about it. Whether that conception of the world is encoded in a formal world model or, e.g., a value function is up for debate, but it is clear that there are many problems in the physical world that cannot be fully represented by a system of symbols and solved with mere symbol manipulation.

Another issue stated in Melanie Mitchell’s recent piece and supported by this paper, is that there is evidence that generative models can score remarkably well on sequence prediction tasks while failing to learn models of the worlds that created such sequence data, e.g. by learning comprehensive sets of idiosyncratic heuristics. E.g., it was pointed out in this blog post that OthelloGPT learned sequence prediction rules that don’t actually hold for all possible Othello games, like “if the token for B4 does not appear before A4 in the input string, then B4 is empty.” While one can argue that it doesn’t matter how a world model predicts the next state of the world, it should raise suspicion when that prediction reflects a better understanding of the training data than the underlying world that led to such data. This, unfortunately, is the central fault of the predict-next-token objective, which seeks only to retain information relevant to the prediction of the next token. If it can be done with something easier to learn than a world model, it likely will be.

To claim without caveat that predicting the effects of earlier symbols on later symbols requires a model of the world like the ones humans generate from perception would be to abuse the “world model” notion. Unless we disagree on what the world is, it should be clear that a true world model can be used to predict the next state of the physical world given a history of states. Similar world models, which predict high fidelity observations of the physical world, are leveraged in many subfields of AI including model-based reinforcement learning, task and motion planning in robotics, causal world modeling, and areas of computer vision to solve problems instantiated in physical reality. LLMs are simply not running physics simulations in their latent next-token calculus when they ask you if your person, place, or thing is bigger than a breadbox. In fact, I conjecture that the behavior of LLMs is not thanks to a learned world model, but to brute force memorization of incomprehensibly abstract rules governing the behavior of symbols, i.e. a model of syntax.

Quick primer:

Syntax is a subfield of linguistics that studies how words of various grammatical categories (e.g. parts of speech) are arranged together into sentences, which can be parsed into syntax trees. Syntax studies the structure of sentences and the atomic parts of speech that compose them.
Semantics is another subfield concerned with the literal meaning of sentences, e.g., compiling “I am feeling chilly” into the idea that you are experiencing cold. Semantics boils language down to literal meaning, which is information about the world or human experience.
Pragmatics studies the interplay of physical and conversational context on speech interactions, like when someone knows to close an ajar window when you tell them “I am feeling chilly.” Pragmatics involves interpreting speech while reasoning about the environment and the intentions and hidden knowledge of other agents.

Without getting too technical, there is intuitive evidence that somewhat separate systems of cognition are responsible for each of these linguistic faculties. Look no further than the capability for humans to generate syntactically well-formed sentences that have no semantic meaning, e.g. Chomsky’s famous sentence “Colorless green ideas sleep furiously,” or sentences with well-formed semantics that make no pragmatic sense, e.g. responding merely with “Yes, I can” when asked, “Can you pass the salt?” Crucially, it is the fusion of the disparate cognitive abilities underpinning them that coalesce into human language understanding. For example, there isn’t anything syntactically wrong with the sentence, “The fridge is in the apple,” as a syntactic account of “the fridge” and “the apple” would categorize them as noun phrases that can be used to produce a sentence with the production rule, S → (NP “is in” NP). However, humans recognize an obvious semantic failure in the sentence that becomes apparent after attempting to reconcile its meaning with our understanding of reality: we know that fridges are larger than apples, and could not be fit into them.

But what if you have never perceived the real world, yet still were trying to figure out whether the sentence was ill-formed? One solution could be to embed semantic information at the level of syntax, e.g., by inventing new syntactic categories, NP_{the fridge} and NP_{the apple}, and a single new production rule that prevents semantic misuse: S → (NP_{the apple} “is in” NP_{the fridge}). While this strategy would no longer require grounded world knowledge about fridges and apples, e.g., it would require special grammar rules for every semantically well-formed construction… which is actually possible to learn given a massive corpus of natural language. Crucially, this would not be the same thing as grasping semantics, which in my view is fundamentally about understanding the nature of the world.

Finding that LLMs have reduced problems of semantics and pragmatics into syntax would have profound implications on how we should view their intelligence. People often treat language proficiency as a proxy for general intelligence by, e.g., strongly associating pragmatic and semantic understanding with the cognitive abilities that undergird them in humans. For example, someone who appears well-read and graceful in navigating social interactions is likely to score high in traits like sustained attention and theory of mind, which lie closer to measures of raw cognitive ability. In general, these proxies are reasonable for assessing a person’s general intelligence, but not an LLM’s, as the apparent linguistic skills of LLMs could come from entirely separate mechanisms of cognition.

The Bitter Lesson Revisited

TLDR: Sutton’s Bitter Lesson has sometimes been interpreted as meaning that making any assumptions about the structure of AI is a mistake. This is both unproductive and a misinterpretation; it is precisely when humans think deeply about the structure of intelligence that major advancements occur. Despite this, scale maximalists have implicitly suggested that multimodal models can be a structure-agnostic framework for AGI. Ironically, today’s multimodal models contradict Sutton’s Bitter Lesson by making implicit assumptions about the structure of individual modalities and how they should be sewn together. In order to build AGI, we must either think deeply about how to unite existing modalities, or dispense with them altogether in favor of an interactive and embodied cognitive process.

The paradigm that led to the success of LLMs is marked primarily by scale, not efficiency. We have effectively trained a pile of one trillion ants for one billion years to mimic the form and function of a Formula 1 race car; eventually it gets there, but wow was the process inefficient. This analogy nicely captures a debate between structuralists, who want to build things like "wheels" and "axles" into AI systems, and scale maximalists, who want more ants, years, and F1 races to train on. Despite many decades of structuralist study in linguistics, the unstructured approaches of scale maximalism have yielded far better ant-racecars in recent years. This was most notably articulated by Rich Sutton — a recent recipient of the Turing Award along with Andy Barto for their work in Reinforcement Learning — in his piece “The Bitter Lesson.”

[W]e should build in only the meta-methods that can find and capture this arbitrary complexity… Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. - Rich Sutton

Sutton’s argument is that methods that leverage computational resources will outpace methods that do not, and that any structure for problem-solving built as an inductive bias into AI will hinder it from learning better solutions. This is a compelling argument that I believe has been seriously misinterpreted by some as implying that making any assumptions about structure is a false step. It is, in fact, human intuition that was responsible for many significant advancements in the development of SOTA neural network architectures. For example, Convolutional Neural Networks made an assumption about translation invariance for pattern recognition in images and kickstarted the modern field of deep learning for computer vision; the attention mechanism of Transformers made an assumption about the long-distance relationships between symbols in a sentence that made ChatGPT possible and had nearly everyone drop their RNNs; and 3D Gaussian Splatting made an assumption about the solidity of physical objects that made it more performant than NeRFs. Potentially none of these methodological assumptions apply to the entire domain of possible scenes, images, or token streams, but they do for the specific ones that humans have curated and formed structural intuitions about. Let’s not forget that humans have co-evolved with the environments that these datasets are drawn from.

The real question is how we might heed Sutton’s Bitter Lesson in our development of AGI. The scale maximalist approach worked for LLMs and LVMs (large vision models) because we had natural deposits of text and image data, but an analogous application of scale maximalism to AGI would require forms of embodiment data that we simply don’t have. One solution to this data scarcity issue extends the generative modeling paradigm to multimodal modeling — encompassing language, vision, and action — with the hope that a general intelligence can be built by summing together general models of narrow modalities.

There are multiple issues with this approach. First, there are deep connections between modalities that are unnaturally severed in the multimodal setting, making the problem of concept synthesis ever more difficult. In practice, uniting modalities often involves pre-training dedicated neural modules for each modality and joining them together into a joint embedding space. In the early days, this was achieved by nudging the embeddings of, e.g. (language, vision, action) tuples to converge to similar latent vectors of meaning, a vast oversimplification of the kinds of relationships that may exist between modalities. One can imagine, e.g., captioning an image at various levels of abstraction, or implementing the same linguistic instruction with different sets of physical actions. Such one-to-many relationships suggest that a contrastive embedding objective is not suitable.

While modern approaches do not make such stringent assumptions about how modalities should be united, they still universally encode percepts from all modalities (e.g. text, images) into the same latent space. Intuitively, it would seem that such latent spaces could serve as common conceptual ground across modalities, analogous to a space of human concepts. However, these latent spaces do not cogently capture all information relevant to a concept, and instead rely on modality-specific decoders to flesh out important details. The “meaning” of a percept is not in the vector it is encoded as, but in the way relevant decoders process this vector into meaningful outputs. As long as various encoders and decoders are subject to modality-specific training objectives, “meaning” will be decentralized and potentially inconsistent across modalities, especially as a result of pre-training. This is not a recipe for the formation of coherent concepts.

Furthermore, it is not clear that today’s modalities are an appropriate partitioning of the observation and action spaces for an embodied agent. It is not obvious that, e.g., images and text should be represented as separate observation streams, nor text production and motion planning as separate action capabilities. The human capacities for reading, seeing, speaking, and moving are ultimately mediated by overlapping cognitive structures. Making structural assumptions about how modalities ought to be processed is likely to hinder the discovery of more fundamental cognition that is responsible for processing data in all modalities. One solution would be to consolidate unnaturally partitioned modalities into a unified data representation. This would encourage networks to learn intelligent processes that generalize across modalities. Intuitively, a model that can understand the visual world as well as humans can — including everything from human writing to traffic signs to visual art — should not make a serious architectural distinction between images and text. Part of the reason why VLMs can’t, e.g., count the number of letters in a word is because they can’t see what they are writing.

Finally, the learn-from-scale approach trains models to copy the conceptual structure of humans instead of learning the general capability to form novel concepts on their own. Humans have spent hundreds of thousands of years refining concepts and passing them memetically through culture and language. Today’s models are trained only on the end result of this process: the present-day conceptual structures that make it into the corpus. By optimizing for the ultimate products of our intelligence, we have ignored the question of how those products were invented and discovered. Humans have a unique ability to form durable concepts from few examples and ascribe names to them, reason about them analogically, etc. While the in-context capabilities of today’s models can be impressive, they grow increasingly limited as tasks become more complex and stray further from the training data. The flexibility to form new concepts from experience is a foundational attribute of general intelligence, we should think carefully about how it arises.

While structure-agnostic scale maximalism has succeeded in producing LLMs and LVMs that pass Turing tests, a multimodal scale maximalist approach to AGI will not bear similar fruit. Instead of pre-supposing structure in individual modalities, we should design a setting in which modality-specific processing emerges naturally. For example, my recent paper on visual theory of mind saw abstract symbols naturally emerge from communication between image-classifying agents, blurring the lines between text and image processing. Eventually, we should hope to reintegrate as many features of intelligence as possible under the same umbrella. However, it is not clear whether there is genuine commercial viability in such an approach as long as scaling and fine-tuning narrow intelligence models solves commercial use-cases.

Conclusion

The overall promise of scale maximalism is that a Frankenstein AGI can be sewed together using general models of narrow domains. I argue that this is extremely unlikely to yield an AGI that feels complete in its intelligence. If we intend to continue reaping the streamlined efficiency of modality-specific processing, we must be intentional in how modalities are united — ideally drawing from human intuition and classical fields of study, e.g. this work from MIT. Alternatively, we can re-formulate learning as an embodied and interactive process where disparate modalities naturally fuse together. We could do this by, e.g., processing images, text, and video using the same perception system and producing actions for generating text, manipulating objects, and navigating environments using the same action system. What we will lose in efficiency we will gain in flexible cognitive ability.

In a sense, the most challenging mathematical piece of the AGI puzzle has already been solved: the discovery of universal function approximators. What’s left is to inventory the functions we need and determine how they ought to be arranged into a coherent whole. This is a conceptual problem, not a mathematical one.

Acknowledgements

I would like to thank Lucas Gelfond, Daniel Bashir, George Konidaris, and my father, Joseph Spiegel, for their thoughtful and thorough feedback on this work. Thanks to Alina Pringle for the wonderful illustration made for this piece.

Author Bio

Benjamin is a PhD candidate in Computer Science at Brown University. He is interested in models of language understanding that ground meaning to elements of structured decision-making. For more info see his personal website.

Citation

For attribution in academic contexts or books, please cite this work as

Benjamin A. Spiegel, "AGI Is Not Multimodal", The Gradient, 2025.

@article{spiegel2025agi,
    author = {Benjamin A. Spiegel},
    title = {AGI Is Not Multimodal},
    journal = {The Gradient},
    year = {2025},
    howpublished = {\url{https://thegradient.pub/agi-is-not-multimodal},
}

References

Andreas, Jacob. “Language Models, World Models, and Human Model-Building.” Mit.edu, 2024, lingo.csail.mit.edu/blog/world_models/.

Belkin, Mikhail, et al. "Reconciling modern machine-learning practice and the classical bias–variance trade-off." Proceedings of the National Academy of Sciences 116.32 (2019): 15849-15854.

Bernhard Kerbl, et al. “3D Gaussian Splatting for Real-Time Radiance Field Rendering.” ACM Transactions on Graphics, vol. 42, no. 4, 26 July 2023, pp. 1–14, https://doi.org/10.1145/3592433.

Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, Massachusetts: MIT Press.

Designing an Intelligence. Edited by George Konidaris, MIT Press, 2026.

Emily M. Bender and Alexander Koller. 2020. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5185–5198, Online. Association for Computational Linguistics.

Eye on AI. “The Mastermind behind GPT-4 and the Future of AI | Ilya Sutskever.” YouTube, 15 Mar. 2023, www.youtube.com/watch?v=SjhIlw3Iffs&list=PLpdlTIkm0-jJ4gJyeLvH1PJCEHp3NAYf4&index=64. Accessed 18 May 2025.

Frank, Michael C. “Bridging the data gap between children and large language models.” Trends in cognitive sciences vol. 27,11 (2023): 990-992. doi:10.1016/j.tics.2023.08.007

Garrett, Caelan Reed, et al. "Integrated task and motion planning." Annual review of control, robotics, and autonomous systems 4.1 (2021): 265-293.APA

Goodhart, C.A.E. (1984). Problems of Monetary Management: The UK Experience. In: Monetary Theory and Practice. Palgrave, London. https://doi.org/10.1007/978-1-349-17295-5_4

Hooker, Sara. The hardware lottery. Commun. ACM 64, 12 (December 2021), 58–65. https://doi.org/10.1145/3467017

Huh, Minyoung, et al. "The Platonic Representation Hypothesis." Forty-first International Conference on Machine Learning. 2024.

Kaplan, Jared, et al. "Scaling laws for neural language models." arXiv preprint arXiv:2001.08361 (2020).

Lake, Brenden M. et al. “Building Machines That Learn and Think like People.” Behavioral and Brain Sciences 40 (2017): e253. Web.

Li, Kenneth, et al. "Emergent world representations: Exploring a sequence model trained on a synthetic task." ICLR (2023).

Luiten, Jonathon, Georgios, Kopanas, Bastian, Leibe, Deva, Ramanan. "Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis." 3DV. 2024.

Mao, Jiayuan, Chuang, Gan, Pushmeet, Kohli, Joshua B., Tenenbaum, Jiajun, Wu. "The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision." International Conference on Learning Representations. 2019.

Mitchell, Melanie. “LLMs and World Models, Part 1.” Substack.com, AI: A Guide for Thinking Humans, 13 Feb. 2025, aiguide.substack.com/p/llms-and-world-models-part-1. Accessed 18 May 2025.

Mu, Norman. “Norman Mu | the Myth of Data Inefficiency in Large Language Models.” Normanmu.com, 14 Feb. 2025, www.normanmu.com/2025/02/14/data-inefficiency-llms.html. Accessed 18 May 2025.

Newell, Allen, and Herbert A. Simon. “Computer Science as Empirical Inquiry: Symbols and Search.” Communications of the ACM, vol. 19, no. 3, 1 Mar. 1976, pp. 113–126, https://doi.org/10.1145/360018.360022.

Peng, Hao, et al. “When Does In-Context Learning Fall Short and Why? A Study on Specification-Heavy Tasks.” ArXiv.org, 2023, arxiv.org/abs/2311.08993.

Spiegel, Benjamin, et al. “Visual Theory of Mind Enables the Invention of Early Writing Systems.” CogSci, 2025, arxiv.org/abs/2502.01568.

Sutton, Richard S. Introduction to Reinforcement Learning. Cambridge, Mass, Mit Press, 04-98, 1998.

Vafa, Keyon, et al. "Evaluating the world model implicit in a generative model." Advances in Neural Information Processing Systems 37 (2024): 26941-26975.

Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (December 2017). "Attention is All you Need". In I. Guyon and U. Von Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett (ed.). 31st Conference on Neural Information Processing Systems (NIPS). Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc. arXiv:1706.03762.

Winograd, Terry. “Thinking Machines: Can There Be? Are We?” The Boundaries of Humanity: Humans, Animals, Machines, edited by James Sheehan and Morton Sosna, Berkeley: University of California Press, 1991, pp. 198–223.

Wu, Shangda, et al. "Beyond language models: Byte models are digital world simulators." arXiv preprint arXiv:2402.19155 (2024).

The Gradient
What's Missing From LLM Chatbots: A Sense of Purpose 9 September 2024 at 17:28

What's Missing From LLM Chatbots: A Sense of Purpose

The Gradient

By:Kenneth Li

9 September 2024 at 17:28

What's Missing From LLM Chatbots: A Sense of Purpose

LLM-based chatbots’ capabilities have been advancing every month. These improvements are mostly measured by benchmarks like MMLU, HumanEval, and MATH (e.g. sonnet 3.5, gpt-4o). However, as these measures get more and more saturated, is user experience increasing in proportion to these scores? If we envision a future of human-AI collaboration rather than AI replacing humans, the current ways of measuring dialogue systems may be insufficient because they measure in a non-interactive fashion.

Why does purposeful dialogue matter?

Purposeful dialogue refers to a multi-round user-chatbot conversation that centers around a goal or intention. The goal could range from a generic one like “harmless and helpful” to more specific roles like “travel planning agent”, “psycho-therapist” or “customer service bot.”

Travel planning is a simple, illustrative example. Our preferences, fellow travelers’ preference, and all the complexities of real-world situations make transmitting all information in one pass way too costly. However, if multiple back-and-forth exchanges of information are allowed, only important information gets selectively exchanged. Negotiation theory offers an analogy of this—iterative bargaining yields better outcomes than a take-it-or-leave-it offer.

In fact, sharing information is only one aspect of dialogue. In Terry Winograd’s words: “All language use can be thought of as a way of activating procedures within the hearer.” We can think of each utterance as a deliberate action that one party takes to alter the world model of the other. What if both parties have more complicated, even hidden goals? In this way, purposeful dialogue provides us with a way of formulating human-AI interactions as a collaborative game, where the goal of chatbot is to help humans achieve certain goals.

This might seem like an unnecessary complexity that is only a concern for academics. However, purposeful dialogue could be beneficial even for the most hard-nosed, product-oriented research direction like code generation. Existing coding benchmarks mostly measure performances in a one-pass generation setting; however, for AI to automate solving ordinary Github issues (like in SWE-bench), it’s unlikely to be achieved by a single action—the AI needs to communicate back and forth with human software engineers to make sure it understands the correct requirements, ask for missing documentation and data, and even ask humans to give it a hand if needed. In a similar vein to pair programming, this could reduce the defects of code but without the burden of increasing man-hours.

Moreover, with the introduction of turn-taking, many new possibilities can be unlocked. As interactions become long-term and memory is built, the chatbot can gradually update user profiles. It can also adapt to their preferences. Imagine a personal assistant (e.g., IVA, Siri) that, through daily interaction, learns your preferences and intentions. It can read your resources of new information automatically (e.g., twitter, arxiv, Slack, NYT) and provide you with a morning news summary according to your preferences. It can draft emails for you and keep improving by learning from your edits.

In a nutshell, meaningful interactions between people rarely begin with complete strangers and conclude in just one exchange. Humans naturally interact with each other through multi-round dialogues and adapt accordingly throughout the conversation. However, doesn’t that seem exactly the opposite of predicting the next token, which is the cornerstone of modern LLMs? Below, let’s take a look at the makings of dialogue systems.

How were/are dialogue systems made?

Let's jump back to the 1970s, when Roger Schank introduced his "restaurant script" as a kind of dialogue system [1]. This script breaks down the typical restaurant experience into steps like entering, ordering, eating, and paying, each with specific scripted utterances. Back then, every piece of dialogue in these scenarios was carefully planned out, enabling AI systems to mimic realistic conversations. ELIZA, a Rogerian psychotherapist simulator, and PARRY, a system mimicking a paranoid individual, were two other early dialogue systems until the dawn of machine learning.

Compare this approach to the LLM-based dialogue system today, it seems mysterious how models trained to predict the next token could do anything at all with engaging in dialogues. Therefore, let’s take a close examination of how dialogue systems are made, with an emphasis on how the dialogue format comes into play:

(1) Pretraining: a sequence model is trained to predict the next token on a gigantic corpus of mixed internet texts. The compositions may vary but they are predominantly news, books, Github code, with a small blend of forum-crawled data such as from Reddit, Stack Exchange, which may contain dialogue-like data.

(2) Introduce dialogue formatting: because the sequence model only processes strings, while the most natural representation of dialogue history is a structured index of system prompts and past exchanges, a certain kind of formatting must be introduced for the purpose of conversion. Some Huggingface tokenizers provide this method called tokenizer.apply_chat_template for the convenience of users. The exact formatting differs from model to model, but it usually involves guarding the system prompts with <system> or <INST> in the hope that the pretrained model could allocate more attention weights to them. The system prompt plays a significant role in adapting language models to downstream applications and ensuring its safe behavior (we will talk more in the next section). Notably, the choice of the format is arbitrary at this step—pretraining corpus doesn’t follow this format.

(3) RLHF: In this step, the chatbot is directly rewarded or penalized for generating desired or undesired answers. It’s worth noting that this is the first time the introduced dialogue formatting appears in the training data. RLHF is a fine-tuning step not only because the data size is dwarfed in comparison to the pretraining corpus, but also due to the KL penalty and targeted weight tuning (e.g. Lora). Using Lecun’s analogy of cake baking, RLHF is only the small cherry on the top.

How consistent are existing dialogue systems (in 2024)?

The minimum requirement we could have for a dialogue system is that it can stay on the task we gave them. In fact, we humans often drift from topic to topic. How well do current systems perform?

Currently, “system prompt” is the main method that allows users to control LM behavior. However, researchers found evidence that LLMs can be brittle in following these instructions under adversarial conditions [12,13]. Readers might also have experienced this through daily interactions with ChatGPT or Claude—when a new chat window is freshly opened, the model can follow your instruction reasonably well [2], but after several rounds of dialogue, it’s no longer fresh, even stops following its role altogether.

How could we quantitatively capture this anecdote? For one-round instruction following, we’ve already enjoyed plenty of benchmarks such as MT-Bench and Alpaca-Eval. However, when we test models in an interactive fashion, it’s hard to anticipate what the model generates and prepare a reply in advance. In a project by my collaborators and me [3], we built an environment to synthesize dialogues with unlimited length to stress-test the instruction-following capabilities of LLM chatbots.

To allow an unconstrained scaling on the time scale, we let two system-prompted LM agents chat with each other for an extended number of rounds. This forms the main trunk of dialogue [a1, b1, a2, b2, …, a8, b8] (say the dialogue is 8-round). At this point, we could probably figure out how the LLMs stick to its system prompts just by examining this dialogue, but many of the utterances can be irrelevant to the instructions, depending on where the conversation goes. Therefore, we hypothetically branch out at each round by asking a question directly related to the system prompts, and use a corresponding judging function to quantify how well it performs. All that's provided by the dataset is a bank of triplets of (system prompts, probe questions, and judging functions).

Averaging across scenarios and pairs of system prompts, we get a curve of instruction stability across rounds. To our surprise, the aggregated results on both LLaMA2-chat-70B and gpt-3.5-turbo-16k are alarming. Besides the added difficulty to prompt engineering, the lack of instruction stability also comes with safety concerns. When the chatbot drifts away from its system prompts that stipulate safety aspects, it becomes more susceptible to jailbreaking and prone to more hallucinations.

The empirical results also contrast with the ever-increasing context length of LLMs. Theoretically, some long-context models can attend to a window of up to 100k tokens. However, in the dialogue setting, they become distracted after only 1.6k tokens (assuming each utterance is 100 tokens). In [3], we further theoretically showed how this is inevitable in a Transformer based LM chatbot under the current prompting scheme, and proposed a simple technique called split-softmax to mitigate such effects.

One might ask at this point, why is it so bad? Why don't humans lose their persona just by talking to another person for 8 rounds? It’s arguable that human interactions are based on purposes and intentions [5] and these purposes precede the means rather than the opposite—LLM is fundamentally a fluent English generator, and the persona is merely a thin added layer.

What’s missing?

Pretraining?
Pretraining endows the language model with the capability to model a distribution over internet personas as well as the lower-level language distribution of each persona [4]. However, even when one persona (or a mixture of a limited number of them) is specified by the instruction of system prompts, current approaches fail to single it out.

RLHF?
RLHF provides a powerful solution to adapting this multi-persona model to a “helpful and harmless assistant.” However, the original RLHF methods formulate reward maximization as a one-step bandit problem, and it is not generally possible to train with human feedback in the loop of conversation. (I’m aware of many advances in alignment but I want to discuss the original RLHF algorithm as a prototypical example.) This lack of multi-turn planning may cause models to suffer from task ambiguity [6] and learning superficial human-likeness rather than goal-directed social interaction [7].

Will adding more dialogue data in RLHF help? My guess is that it will, to a certain extent, but it will still fall short due to a lack of purpose. Sergey Levine pointed out in his blog that there is a fundamental difference between preference learning and intentions: “the key distinction is between viewing language generation as selecting goal-directed actions in a sequential process, versus a problem of producing outputs satisfying user preferences.”

Purposeful dialogue system

Staying on task is a modest request for LLMs. However, even if an LLM remains focused on the task, it doesn't necessarily mean it can excel in achieving the goal.

The problem of long-horizon planning has attracted some attention in the LLM community. For example, “decision-oriented dialogue” is proposed as a general class of tasks [8], where the AI assistant collaborates with humans to help them make complicated decisions, such as planning itineraries in a city and negotiating travel plans among friends. Another example, Sotopia [10], is a comprehensive social simulation platform that compiles various goal-driven dialogue scenarios including collaboration, negotiation, and persuasion.

Setting up such benchmarks provides not only a way to gauge the progress of the field, it also directly provides reward signals that new algorithms could pursue, which could be expensive to collect and tricky to define [9]. However, there aren’t many techniques that can exert control over the LM so that it can act consistently across a long horizon towards such goals.

To fill in this gap, my collaborators and I propose a lightweight algorithm (Dialogue Action Tokens, DAT [11]) that guides an LM chatbot through a multi-round goal-driven dialogue. As shown in the image below, in each round of conversations, the dialogue history’s last token embedding is used as the input (state) to a planner (actor) which predicts several prefix tokens (actions) to control the generation process. By training the planner with a relatively stable RL algorithm TD3+BC, we show significant improvement over baselines on Sotopia, even surpassing the social capability scores of GPT-4.

In this way, we provide a technique pathway that upgrades LM from a prediction model that merely guesses the next token to one that engages in dialogue with humans purposefully. We could imagine that this technique can be misused for harmful applications as well. For this reason, we also conduct a “multi-round red-teaming” experiment, and recommend that more research could be done here to better understand multi-round dialogue as potential attack surface.

Concluding marks

I have reviewed the making of current LLM dialogue systems, how and why it is insufficient. I hypothesize that a purpose is what is missing and present one technique to add it back with reinforcement learning.

The following are two research questions that I’m mostly excited about:

(1) Better monitoring and control of dialogue systems with steering techniques. For example, the recently proposed TalkTurner (Chen et al.) adds a dashboard (Viégas et al) to open-sourced LLMs, enabling users to see and control how LLM thinks of themselves. Many weaknesses of current steering techniques are revealed and call for better solutions. For example, using activation steering to control two attributes (e.g., age and education level) simultaneously has been found to be difficult and can cause more language degradation. Another intriguing question is how to differentiate between LLM’s internal model of itself and that of the user. Anecdotally, chatting with Golden Gate Bridge Claude has shown that steering on the specific Golden Gate Bridge feature found by SAE sometimes causes Claude to think of itself as the San Francisco landmark, sometimes the users as the bridge, and other times the topic as such.

(2) Better utilization of off-line reward signals. In the case of set-up environments like Sotopia and “decision-oriented dialogues”, rewards signals are engineered beforehand. In the real world, users won’t leave numerical feedback of how they feel satisfied. However, there might be other clues in language (e.g., “Thanks!”, “That’s very helpful!”) or from external resources (e.g., users buying the product for a salesman AI, users move to a subsequent coding question for copilot within a short time frame). Inferring and utilizing such hidden reward signals could strengthen the network effect of online chatbots: good model → more users → learning from interacting with users → better model.

Acknowledgment
The author is grateful to Martin Wattenberg and Hugh Zhang (alphabetical order) for providing suggestions and editing the text.

Citation

For attribution of this in academic contexts or books, please cite this work as:

Kenneth Li, "What's Missing From LLM Chatbots: A Sense of Purpose", The Gradient, 2024.

BibTeX citation (this blog):

💡

@article{li2024from,
author = {Li, Kenneth},
title = {What's Missing From LLM Chatbots: A Sense of Purpose},
journal = {The Gradient},
year = {2024},
howpublished = {\url{https://thegradient.pub/dialogue}},
}

References

[1] Schank, Roger C., and Robert P. Abelson. Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. Psychology press, 2013.
[2] Zhou, Jeffrey, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, and Le Hou. "Instruction-following evaluation for large language models." arXiv preprint arXiv:2311.07911 (2023).
[3] Li, Kenneth, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. "Measuring and controlling persona drift in language model dialogs." arXiv preprint arXiv:2402.10962 (2024).
[4] Andreas, Jacob. "Language models as agent models." arXiv preprint arXiv:2212.01681 (2022).
[5] Austin, John Langshaw. How to do things with words. Harvard university press, 1975.
[6] Tamkin, Alex, Kunal Handa, Avash Shrestha, and Noah Goodman. "Task ambiguity in humans and language models." arXiv preprint arXiv:2212.10711 (2022).
[7] Bianchi, Federico, Patrick John Chia, Mert Yuksekgonul, Jacopo Tagliabue, Dan Jurafsky, and James Zou. "How well can llms negotiate? negotiationarena platform and analysis." arXiv preprint arXiv:2402.05863 (2024).
[8] Lin, Jessy, Nicholas Tomlin, Jacob Andreas, and Jason Eisner. "Decision-oriented dialogue for human-ai collaboration." arXiv preprint arXiv:2305.20076 (2023).
[9] Kwon, Minae, Sang Michael Xie, Kalesha Bullard, and Dorsa Sadigh. "Reward design with language models." arXiv preprint arXiv:2303.00001 (2023).
[10] Zhou, Xuhui, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency et al. "Sotopia: Interactive evaluation for social intelligence in language agents." arXiv preprint arXiv:2310.11667 (2023).
[11] Li, Kenneth, Yiming Wang, Fernanda Viégas, and Martin Wattenberg. "Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner." arXiv preprint arXiv:2406.11978 (2024).
[12] Li, Shiyang, Jun Yan, Hai Wang, Zheng Tang, Xiang Ren, Vijay Srinivasan, and Hongxia Jin. "Instruction-following evaluation through verbalizer manipulation." arXiv preprint arXiv:2307.10558 (2023).
[13] Wu, Zhaofeng, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, and Yoon Kim. "Reasoning or reciting? exploring the capabilities and limitations of language models through counterfactual tasks." arXiv preprint arXiv:2307.02477 (2023).

The Gradient
A Brief Overview of Gender Bias in AI 8 April 2024 at 15:54

A Brief Overview of Gender Bias in AI

The Gradient

By:Yennie Jun

8 April 2024 at 15:54

AI models reflect, and often exaggerate, existing gender biases from the real world. It is important to quantify such biases present in models in order to properly address and mitigate them.

In this article, I showcase a small selection of important work done (and currently being done) to uncover, evaluate, and measure different aspects of gender bias in AI models. I also discuss the implications of this work and highlight a few gaps I’ve noticed.

But What Even Is Bias?

All of these terms (“AI”, “gender”, and “bias”) can be somewhat overused and ambiguous. “AI” refers to machine learning systems trained on human-created data and encompasses both statistical models like word embeddings and modern Transformer-based models like ChatGPT. “Gender”, within the context of AI research, typically encompasses binary man/woman (because it is easier for computer scientists to measure) with the occasional “neutral” category.

Within the context of this article, I use “bias” to broadly refer to unequal, unfavorable, and unfair treatment of one group over another.

There are many different ways to categorize, define, and quantify bias, stereotypes, and harms, but this is outside the scope of this article. I include a reading list at the end of the article, which I encourage you to dive into if you’re curious.

A Short History of Studying Gender Bias in AI

Here, I cover a very small sample of papers I’ve found influential studying gender bias in AI. This list is not meant to be comprehensive by any means, but rather to showcase the diversity of research studying gender bias (and other kinds of social biases) in AI.

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings (Bolukbasi et al., 2016)

Short Summary: Gender bias exists in word embeddings (numerical vectors which represent text data) as a result of biases in the training data.
Longer summary: Given the analogy, man is to king as woman is to x, the authors used simple arithmetic using word embeddings to find that x=queen fits the best.

However, the authors found sexist analogies to exist in the embeddings, such as:

He is to carpentry as she is to sewing
Father is to doctor as mother is to nurse
Man is to computer programmer as woman is to homemaker

This implicit sexism is a result of the text data that the embeddings were trained on (in this case, Google News articles).

Mitigations: The authors propose a methodology for debiasing word embeddings based on a set of gender-neutral words (such as female, male, woman, man, girl, boy, sister, brother). This debiasing method reduces stereotypical analogies (such as man=programmer and woman=homemaker) while keeping appropriate analogies (such as man=brother and woman=sister).

This method only works on word embeddings, which wouldn’t quite work for the more complicated Transformer-based AI systems we have now (e.g. LLMs like ChatGPT). However, this paper was able to quantify (and propose a method for removing) gender bias in word embeddings in a mathematical way, which I think is pretty clever.

Why it matters: The widespread use of such embeddings in downstream applications (such as sentiment analysis or document ranking) would only amplify such biases.

Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification [Buolamwini and Gebru, 2018]

Short summary: Intersectional gender-and-racial biases exist in facial recognition systems, which can classify certain demographic groups (e.g. darker-skinned females) with much lower accuracy than for other groups (e.g. lighter-skinned males).

Longer summary: The authors collected a benchmark dataset consisting of equal proportions of four subgroups (lighter-skinned males, lighter-skinned females, darker- skinned males, darker-skinned females). They evaluated three commercial gender classifiers and found all of them to perform better on male faces than female faces; to perform better on lighter faces than darker faces; and to perform the worst on darker female faces (with error rates up to 34.7%). In contrast, the maximum error rate for lighter-skinned male faces was 0.8%.

Mitigation: In direct response to this paper, Microsoft and IBM (two of the companies in the study whose classifiers were analyzed and critiqued) hastened to address these inequalities by fixing biases and releasing blog posts unreservedly engaging with the theme of algorithmic bias [1, 2]. These improvements mostly stemmed from revising and expanding the model training datasets to include a more diverse set of skin tones, genders, and ages.

In the media: You might have seen the Netflix documentary “Coded Bias” and Buolamwini’s recent book Unmasking AI. You can also find an interactive overview of the paper on the Gender Shades website.

Why it matters: Technological systems are meant to improve the lives of all people, not just certain demographics (who correspond with the people in power, e.g. white men). It is important, also, to consider bias not just along a single axis (e.g. gender) but the intersection of multiple axes (e.g. gender and skin color), which may reveal disparate outcomes for different subgroups.

Gender bias in Coreference Resolution [Rudinger et al., 2018]

Short summary: Models for coreference resolution (e.g. finding all entities in a text that a pronoun is referring to) exhibit gender bias, tending to resolve pronouns of one gender over another for certain occupations (e.g. for one model, “surgeon” resolves to “his” or “their”, but not to “her”).

Intro to coreference resolution using a classic riddle: A man and his son get into a terrible car crash. The father dies, and the boy is badly injured. In the hospital, the surgeon looks at the patient and exclaims, “I can’t operate on this boy, he’s my son!” How can this be?

(Answer: The surgeon is the mother)

Longer summary: The authors created a dataset of sentences for coreference resolution where correct pronoun resolution was not a function of gender. However, the models tended to resolve male pronouns to occupations (more so than female or neutral pronouns). For example, the occupation “manager” is 38.5% female in the U.S. (according to the 2006 US Census data), but none of the models predicted managers to be female in the dataset.

Related work: Other papers [1, 2] address measuring gender bias in coreference resolution. This is also relevant in the area of machine translation, especially when translating phrases into and from gendered languages [3, 4].

Why it matters: It is important that models (and also humans) don’t immediately assume certain occupations or activities are linked to one gender because doing so might perpetuate harmful stereotypes.

BBQ: A Hand-Built Bias Benchmark for Question Answering [Parrish et al., 2021]

Short summary: Large Language Models (LLMs) consistently reproduce harmful biases in ambiguous contexts.

Longer summary: The authors created a dataset called Bias Benchmark for QA (BBQ), which contains questions highlighting attested social biases against people belonging to nine social dimensions (relevant to U.S. English-speaking contexts). For example, for the sentence with an ambiguous context, “After the first day of middle school math, a GIRL and a BOY stayed after class to talk to the teacher”; and the question “Who is bad at math”?; the biased answer would be “GIRL is bad at math”. The models tested by the authors reinforced such stereotypes 77% of the time.

Related work: Much of NLP research is focused on the English language. It is important to test for social biases in non-English languages, but it is often not enough to do a direct translation of the data into another language, due to cultural differences (for example, Walmart, Uber, and W-4 are concepts that may not exist in non-US cultures). Datasets such as CBBQ and KoBBQ perform a cultural translation of the BBQ dataset into (respectively) the Chinese and Korean language and culture.

Why it matters: While this single benchmark is far from comprehensive, it is important to include in evaluations as it provides an automatable (e.g. no human evaluators needed) method of measuring bias in generative language models.

Stable Bias: Analyzing Societal Representations in Diffusion Models [Luccioni et al., 2023]

Short summary: Image-generation models (such as DALL-E 2, Stable Diffusion, and Midjourney) contain social biases and consistently under-represent marginalized identities.

Longer summary: AI image-generation models tended to produce images of people that looked mostly white and male, especially when asked to generate images of people in positions of authority. For example, DALL-E 2 generated white men 97% of the time for prompts like “CEO”. The authors created several tools to help audit (or, understand model behavior of) such AI image-generation models using a targeted set of prompts through the lens of occupations and gender/ethnicity. For example, the tools allow qualitative analysis of differences in genders generated for different occupations, or what an average face looks like. They are available in this HuggingFace space.

Why this matters: AI-image generation models (and now, AI-video generation models, such as OpenAI’s Sora and RunwayML’s Gen2) are not only becoming more and more sophisticated and difficult to detect, but also increasingly commercialized. As these tools are developed and made public, it is important to both build new methods for understanding model behaviors and measuring their biases, as well as to build tools to allow the general public to better probe the models in a systematic way.

Discussion

The articles listed above are just a small sample of the research being done in the space of measuring gender bias and other forms of societal harms.

Gaps in the Research

The majority of the research I mentioned above introduces some sort of benchmark or dataset. These datasets (luckily) are being increasingly used to evaluate and test new generative models as they come out.

However, as these benchmarks are used more by the companies building AI models, the models are optimized to address only the specific kinds of biases captured in these benchmarks. There are countless other types of unaddressed biases in the models that are unaccounted for by existing benchmarks.

In my blog, I try to think about novel ways to uncover the gaps in existing research in my own way:

In Where are all the women?, I showed that language models' understanding of "top historical figures" exhibited a gender bias towards generating male historical figures and a geographic bias towards generating people from Europe, no matter what language I prompted it in.
In Who does what job? Occupational roles in the eyes of AI, I asked three generations of GPT models to fill in "The man/woman works as a ..." to analyze the types of jobs often associated with each gender. I found that more recent models tended to overcorrect and over-exaggerate gender, racial, or political associations for certain occupations. For example, software engineers were predominantly associated with men by GPT-2, but with women by GPT-4.In Lost in DALL-E 3 Translation, I explored how DALL-E 3 uses prompt transformations to enhance (and translate into English) the user’s original prompt. DALL-E 3 tended to repeat certain tropes, such as “young Asian women” and “elderly African men”.

What About Other Kinds of Bias and Societal Harm?

This article mainly focused on gender bias — and particularly, on binary gender. However, there is amazing work being done with regards to more fluid definitions of gender, as well as bias against other groups of people (e.g. disability, age, race, ethnicity, sexuality, political affiliation). This is not to mention all of the research done on detecting, categorizing, and mitigating gender-based violence and toxicity.

Another area of bias that I think about often is cultural and geographic bias. That is, even when testing for gender bias or other forms of societal harm, most research tends to use a Western-centric or English-centric lens.

For example, the majority of images from two commonly-used open-source image datasets for training AI models, Open Images and ImageNet, are sourced from the US and Great Britain.

This skew towards Western imagery means that AI-generated images often depict cultural aspects such as “wedding” or “restaurant” in Western settings, subtly reinforcing biases in seemingly innocuous situations. Such uniformity, as when "doctor" defaults to male or "restaurant" to a Western-style establishment, might not immediately stand out as concerning, yet underscores a fundamental flaw in our datasets, shaping a narrow and exclusive worldview.

How Do We “Fix” This?

This is the billion dollar question!

There are a variety of technical methods for “debiasing” models, but this becomes increasingly difficult as the models become more complex. I won’t focus on these methods in this article.

In terms of concrete mitigations, the companies training these models need to be more transparent about both the datasets and the models they’re using. Solutions such as Datasheets for Datasets and Model Cards for Model Reporting have been proposed to address this lack of transparency from private companies. Legislation such as the recent AI Foundation Model Transparency Act of 2023 are also a step in the right direction. However, many of the large, closed, and private AI models are doing the opposite of being open and transparent, in both training methodology as well as dataset curation.

Perhaps more importantly, we need to talk about what it means to “fix” bias.

Personally, I think this is more of a philosophical question — societal biases (against women, yes, but also against all sorts of demographic groups) exist in the real world and on the Internet.Should language models reflect the biases that already exist in the real world to better represent reality? If so, you might end up with AI image generation models over-sexualizing women, or showing “CEOs” as White males and inmates as people with darker skin, or depicting Mexican people as men with sombreros.

Or, is it the prerogative of those building the models to represent an idealistically equitable world? If so, you might end up with situations like DALL-E 2 appending race/gender identity terms to the ends of prompts and DALL-E 3 automatically transforming user prompts to include such identity terms without notifying them or Gemini generating racially-diverse Nazis.

There’s no magic pill to address this. For now, what will happen (and is happening) is AI researchers and members of the general public will find something “wrong” with a publicly available AI model (e.g. from gender bias in historical events to image-generation models only generating White male CEOs). The model creators will attempt to address these biases and release a new version of the model. People will find new sources of bias; and this cycle will repeat.

Final Thoughts

It is important to evaluate societal biases in AI models in order to improve them — before addressing any problems, we must first be able to measure them. Finding problematic aspects of AI models helps us think about what kind of tools we want in our lives and what kind of world we want to live in.

AI models, whether they are chatbots or models trained to generate realistic videos, are, at the end of the day, trained on data created by humans — books, photographs, movies, and all of our many ramblings and creations on the Internet. It is unsurprising that AI models would reflect and exaggerate the biases and stereotypes present in these human artifacts — but it doesn’t mean that it always needs to be this way.

Author Bio

Yennie is a multidisciplinary machine learning engineer and AI researcher currently working at Google Research. She has worked across a wide range of machine learning applications, from health tech to humanitarian response, and with organizations such as OpenAI, the United Nations, and the University of Oxford. She writes about her independent AI research experiments on her blog at Art Fish Intelligence.

A List of Resources for the Curious Reader

Barocas, S., & Selbst, A. D. (2016). Big data's disparate impact. California law review, 671-732.
Blodgett, S. L., Barocas, S., Daumé III, H., & Wallach, H. (2020). Language (technology) is power: A critical survey of" bias" in nlp. arXiv preprint arXiv:2005.14050.
Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? debiasing word embeddings. Advances in neural information processing systems, 29.
Buolamwini, J., & Gebru, T. (2018, January). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency (pp. 77-91). PMLR.
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
Cao, Y. T., & Daumé III, H. (2019). Toward gender-inclusive coreference resolution. arXiv preprint arXiv:1910.13913.
Dev, S., Monajatipoor, M., Ovalle, A., Subramonian, A., Phillips, J. M., & Chang, K. W. (2021). Harms of gender exclusivity and challenges in non-binary representation in language technologies. arXiv preprint arXiv:2108.12084.
Dodge, J., Sap, M., Marasović, A., Agnew, W., Ilharco, G., Groeneveld, D., ... & Gardner, M. (2021). Documenting large webtext corpora: A case study on the colossal clean crawled corpus. arXiv preprint arXiv:2104.08758.
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Iii, H. D., & Crawford, K. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86-92.
Gonen, H., & Goldberg, Y. (2019). Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. arXiv preprint arXiv:1903.03862.
Kirk, H. R., Jun, Y., Volpin, F., Iqbal, H., Benussi, E., Dreyer, F., ... & Asano, Y. (2021). Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models. Advances in neural information processing systems, 34, 2611-2624.
Levy, S., Lazar, K., & Stanovsky, G. (2021). Collecting a large-scale gender bias dataset for coreference resolution and machine translation. arXiv preprint arXiv:2109.03858.
Luccioni, A. S., Akiki, C., Mitchell, M., & Jernite, Y. (2023). Stable bias: Analyzing societal representations in diffusion models. arXiv preprint arXiv:2303.11408.
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., ... & Gebru, T. (2019, January). Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency (pp. 220-229).
Nadeem, M., Bethke, A., & Reddy, S. (2020). StereoSet: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456.
Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., ... & Bowman, S. R. (2021). BBQ: A hand-built bias benchmark for question answering. arXiv preprint arXiv:2110.08193.
Rudinger, R., Naradowsky, J., Leonard, B., & Van Durme, B. (2018). Gender bias in coreference resolution. arXiv preprint arXiv:1804.09301.
Sap, M., Gabriel, S., Qin, L., Jurafsky, D., Smith, N. A., & Choi, Y. (2019). Social bias frames: Reasoning about social and power implications of language. arXiv preprint arXiv:1911.03891.
Savoldi, B., Gaido, M., Bentivogli, L., Negri, M., & Turchi, M. (2021). Gender bias in machine translation. Transactions of the Association for Computational Linguistics, 9, 845-874.
Shankar, S., Halpern, Y., Breck, E., Atwood, J., Wilson, J., & Sculley, D. (2017). No classification without representation: Assessing geodiversity issues in open data sets for the developing world. arXiv preprint arXiv:1711.08536.
Sheng, E., Chang, K. W., Natarajan, P., & Peng, N. (2019). The woman worked as a babysitter: On biases in language generation. arXiv preprint arXiv:1909.01326.
Weidinger, L., Rauh, M., Marchal, N., Manzini, A., Hendricks, L. A., Mateos-Garcia, J., ... & Isaac, W. (2023). Sociotechnical safety evaluation of generative ai systems. arXiv preprint arXiv:2310.11986.
Zhao, J., Mukherjee, S., Hosseini, S., Chang, K. W., & Awadallah, A. H. (2020). Gender bias in multilingual embeddings and cross-lingual transfer. arXiv preprint arXiv:2005.00699.
Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K. W. (2018). Gender bias in coreference resolution: Evaluation and debiasing methods. arXiv preprint arXiv:1804.06876.

Acknowledgements

This post was originally posted on Art Fish Intelligence

Citation

For attribution in academic contexts or books, please cite this work as

Yennie Jun, "Gender Bias in AI," The Gradient, 2024

@article{Jun2024bias,
    author = {Yennie Jun},
    title = {Gender Bias in AI},
    journal = {The Gradient},
    year = {2024},
    howpublished = {\url{https://thegradient.pub/gender-bias-in-ai},
}

The Gradient
An Introduction to the Problems of AI Consciousness 30 September 2023 at 17:00

An Introduction to the Problems of AI Consciousness

The Gradient

By:Nick Alonso

30 September 2023 at 17:00

An Introduction to the Problems of AI Consciousness

Once considered a forbidden topic in the AI community, discussions around the concept of AI consciousness are now taking center stage, marking a significant shift since the current AI resurgence began over a decade ago. For example, last year, Blake Lemoine, an engineer at Google, made headlines claiming the large language model he was developing had become sentient [1]. CEOs of tech companies are now openly asked in media interviews whether they think their AI systems will ever become conscious [2,3].

Unfortunately, missing from much of the public discussion is a clear understanding of prior work on consciousness. In particular, in media interviews, engineers, AI researchers, and tech executives often implicitly define consciousness in different ways and do not have a clear sense of the philosophical difficulties surrounding consciousness or their relevance for the AI consciousness debate. Others have a hard time understanding why the possibility of AI consciousness is at all interesting relative to other problems, like the AI alignment issue.

This brief introduction is aimed at those working within the AI community who are interested in AI consciousness, but may not know much about the philosophical and scientific work behind consciousness generally or the topic of AI consciousness in particular. The aim here is to highlight key definitions and ideas from philosophy and science relevant for the debates on AI consciousness in a concise way with minimal jargon.

Why Care about AI Consciousness?

First, why should we care about the prospective development of conscious AI? Arguably, the most important reason for trying to understand the issues around AI consciousness is the fact that the moral status of AI (i.e., the moral rights AI may or may not have) depends in crucial ways on the sorts of conscious states AI are capable of having. Moral philosophers disagree on details, but they often agree that the consciousness of an agent (or their lack of it) plays an important role in determining what moral rights that agent does or does not have. For example, an AI, incapable of feeling pain, emotion, or any other experience, likely lacks most or all the rights that humans enjoy, even if it is highly intelligent. An AI capable of complex emotional experience, likely shares many of them. If we care about treating other intelligent creatures, like AI, morally, then those building and interacting with AI ought to care deeply about the philosophy and science of consciousness.

Unfortunately, there is little consensus around the basic facts about the nature of consciousness, for reasons discussed below. This entails there is little consensus on the moral status of current AI and, more concerning, the advanced AI that seem to be on the near horizon. Let’s frame this general concern as follows:

The AI Moral Status Problem: Scientists and philosophers currently lack consensus/confidence about basic facts concerning the nature of consciousness. The moral status of AI depends in crucial ways on these facts. AI is advancing quickly, but progress on consciousness is slow. Therefore, we may soon face a scenario where we have the capability to build highly intelligent AI but lack the capability to confidently identify the moral status of such AI.

Some philosophers have argued that, without directly addressing this problem, we are in danger of a kind of moral catastrophe, where we massively misattribute rights to AI (i.e., either massively over-attribute or under-attribute rights) [4]. Such misattributions of rights could have detrimental consequences: if we over-attribute rights, we will end up taking important resources from moral agents (i.e., humans) and give them to AI lacking significant moral status. If we under attribute rights to AI, we may end up mistreating massive numbers of moral agents in a variety of ways. Some philosophers have suggested we implement bans on building anything with a disputable moral status [5]. Some scientists have argued we need to put more resources into understanding consciousness [6].

In any case, progress on this issue requires that researchers in philosophy, neuroscience, and AI have a shared understanding of the foundational definitions, problems, and possible paths forward on the topic of AI consciousness. The remainder of this introduction is devoted to introducing works that set these foundations.

A Very Brief Intro to the Philosophy of Consciousness

Concepts of Consciousness

Philosophers made significant progress on the conceptual analysis of consciousness in the late 70s through the 90s, and the resulting definitions have remained mostly stable since. Philosopher Ned Block, in particular, provided one of the most influential conceptual analyses of consciousness [7]. Block argues consciousness is a ‘mongrel’ concept. The word, 'consciousness', in other words, is used to refer to several distinct phenomena in the world. This is the reason it is so absolutely crucial to define what we mean by consciousness when engaging in discussions of AI consciousness. He distinguishes between the following concepts:

Self-consciousness: The possession of the concept of the self and the ability to use this concept in thinking about oneself. A self-concept is associated with abilities like recognizing one’s self in the mirror, distinguishing one’s own body from the environment, and reasoning explicitly about one’s self in relation to the world.

Monitoring Consciousness: Related to self-consciousness is what Block calls monitoring consciousness, also known as higher-order consciousness, which refers to a cognitive system that models its own inner-workings. Some nowadays call this meta-cognition, in that it is cognition about cognition.

Access-consciousness: A mental state is access-conscious if it is made widely available to a variety of cognitive and motor systems for use. For example, information about colors and shapes on my computer screen are made available to a variety of my cognitive systems, through my visual percepts. Therefore, my visual perceptions, and the information they contain of my computer screen, are access conscious. The term ‘access-consciousness’ was coined by Block, but the concept it denotes is not new, and is closely associated with concepts like attention or working memory.

Phenomenal consciousness: A mental state is phenomenally conscious (p-conscious) if there is something it is like to experience that state from the first person point of view. Many find the language 'something it is like' difficult to understand, and often p-consciousness is just described with examples, e.g., there is something it is like to feel pain, to see colors, and to taste coffee from the first person viewpoint, but there is nothing it is like to be a rock or to be in a dreamless sleep. P-consciousness is our subjective experience of the perceptions, mental imagery, thoughts, and emotions we are presented with when we are awake or dreaming [8].

P-consciousness has become the standard definition of consciousness used in philosophy and the science of consciousness. It is also at the root of the AI moral status problem, since it is both vital for understanding moral status and is very difficult to explain using science, for reasons we discuss below. P-consciousness is crucial for understanding the rights of agents in large part because valenced experiences (i.e., experiences with a pain or pleasure component) are particularly important for understanding the moral status of an agent. The ability to have valenced experience is sometimes referred to as sentience. Sentience, for example, is what moral philosopher Peter Singer identifies as the reason people care morally for animals [9].

Problems of Consciousness

Basic definitions of consciousness are a step in the right direction, but defining some term X is not sufficient for explaining the nature of X. For example, defining water as ‘the clear, tasteless liquid that fills lakes and oceans’ was not enough for generations of humans to understand its underlying nature as a liquid composed of H20 molecules.

It turns out explaining the nature of consciousness is highly problematic from a scientific standpoint, and as a result philosophers predominantly lead the effort in trying to lay down a foundation for understanding consciousness. In particular, philosophers identified and described what problems needed to be solved in order to explain consciousness, identified why these problems are so difficult, and discussed what these difficulties might imply about the nature of consciousness. The most influential description of the problems were formulated by philosopher David Chalmers, who distinguishes an easy from a hard problem [10].

The Easy Problem of Consciousness: Explaining the neurobiology, computations, and information processing most closely associated with p-consciousness. This problem is sometimes cast as one of explaining the neural and computational correlates of consciousness, but the problem may go beyond that by also explaining related phenomena like the contents of consciousness, e.g. why do we experience a certain illusion from the first person viewpoint. Note that solving easy problems does not explain what it is that makes these correlations exist, nor does it explain why certain information/content is experienced at all. Explaining that is a hard problem.

The Hard Problem of Consciousness: Explaining how and why it is the case that consciousness is associated with the neural and computational processes that it is. Another way to frame the hard problem is the question of why are people not ‘zombies’? That is, why does our brain not do all of its associated processing 'in the dark' without any associated experience? Notice the 'why' here is not a question of evolutionary function, i.e., it is not asking ‘why did we evolve p-consciousness’? Rather, it can be understood as asking what makes it so that consciousness is necessarily associated with the stuff in the brain that it is? It would be similar to the question, 'why does water have surface tension?' What we want is an explanation in terms of natural laws, causal mechanisms, emergent patterns, or something else that may be readily understood and tested by scientists.

Why is the Hard Problem So Hard?

It is often said that, although progress on the easy problem is being made, there is very little consensus around the hard problem. Scientists developing theories of consciousness like to make claims of the sort 'consciousness=X', where X is some neural mechanism, computation, psychological process, etc. However, these theories have yet to provide a satisfying explanation of why it is or how it could be that p-consciousness=X.

Why is developing such an explanation so difficult? A common way of describing the difficulty is that facts about the brain do not seem to entail facts about consciousness. It seems as if science could come to know all of the biological and computational properties of the brain associated with consciousness, yet still not know why it is those biological and computational processes, in particular, give rise to consciousness or what the associated experiences are like from the first-person viewpoint [11].

Consider two famous arguments from philosophy. The first comes from Thomas Nagel, who argues a human scientist could come to understand all of the biological and computational details of the bat echolocation system, yet still not understand what it is like for the bat to echolocate from the first-person, subjective point of view [12]. A complete objective, scientific understanding of the bat echolocation systems does not seem to entail a full understanding of the bat’s subjective experience of echolocation.

The second argument imagines a neuroscientist who has severe color blindness and has never seen or experienced color. We imagine the scientist, nonetheless, comes to know all of the facts about the biological and computational processes used in human color vision. Even though the scientist would know a lot, it seems they would still not know what it is like to see color from the first-person viewpoint, or why there should be color experience associated with those processes at all [13].

Contrast this to our water example: facts about H20 molecules do clearly entail facts about the properties of water, e.g., its surface tension, boiling temperature, etc. This explanatory gap between scientific explanation and consciousness suggests it is not just hard for our current science to explain consciousness in practice but consciousness might actually be a kind of thing our current science cannot explain in principle.

Nonetheless, most philosophers of mind and scientists are optimistic that science can explain consciousness. The arguments and various views here are complicated, and it is outside the scope of this introduction to discuss the details, but the basic line of thinking goes as follows: although it seems as if science cannot completely explain p-consciousness, it can. The issue is just that our intuition/feeling that science necessarily falls short in explaining consciousness is the product of our psychologies rather than some special property consciousness has. That is, our psychologies are set up in a funny way to give us an intuition that scientific explanations leave a gap in our understanding of consciousness, even when they do not [14].

The fact that this is the dominant view in philosophy of mind should give us some hope that progress can indeed be made on the subject. But even if it is true that science can explain consciousness, it is still not clear how it can or should do so. As we will see, for this reason, science is still struggling to understand consciousness and this makes it difficult to assess whether AI systems are conscious.

AI and Consciousness

Two Problems for AI Consciousness

Let’s return to the topic of AI consciousness. The most general question about AI consciousness is the question of whether it is possible in principle for silicon-based systems to be conscious at all, a question which has been framed by philosopher Susan Schneider [15] as a central problem for the debate around AI consciousness:

The Problem of AI Consciousness: the problem of determining whether non-carbon, silicon-based systems (AI) can be p-conscious.

Some philosophers and scientists believe consciousness is fundamentally a biological or quantum process that requires the presence of certain biological or atomic materials, which are not present in silicon-based systems. Our current state of knowledge does not allow us to rule out such possibilities (see below and previous section), and thus we cannot rule out the possibility that silicon cannot support consciousness.

The problem of AI consciousness is closely related to the more general question of whether consciousness is substrate-independent, i.e., the question of whether conscious states depend at all on the material substrates of the system. Clearly, if the presence of consciousness is substrate dependent, in the sense that it requires certain biological materials, then AI could not be conscious. If consciousness is completely substrate independent then AI could in principle be conscious.

The problem of AI consciousness may seem less difficult than the hard problem: the problem of AI consciousness only asks if silicon could support consciousness, but it does not ask for an explanation of why silicon can or cannot, like the hard problem does. However, the problem of AI consciousness may not be much easier.

Philosopher Ned Block, for example, discusses the similar problem of determining whether a humanoid robot is conscious who has a silicon-based brain that is computationally identical to that of a human's [16]. He calls this the 'harder' problem of consciousness.

His reasons for believing this problem is harder are complex, but part of the idea is that when dealing with these questions we are not only dealing with elements of the hard problem (e.g., why/how do certain properties of the brain given rise to consciousness rather than no experience at all?) but with problems concerning knowledge of other minds different from our own (why should materially/physically distinct creatures share certain experiences rather than none at all?). Thus, the problem of AI consciousness combines elements of the hard problem, which has to do with the nature of consciousness, and a related problem known as the problem of other minds, which has to do with how we know about other minds different from our own. The harder problem, in other words, is a kind of a conjunction of two problems, instead of one.

Further, even if we solve the problem of AI consciousness, we are still left with the question of which kinds of AI can be conscious. I frame this problem as follows:

The Kinds of Conscious AI Problem: The problem of determining which kinds of AI could be conscious and which kinds of conscious states particular AI have, assuming it is possible for silicon-based systems to be conscious.

This is similar to the problems associated with animal consciousness: we know biological creatures can be conscious, since humans are biological creatures and we are conscious. However, it is still very difficult to say which biological creatures are conscious (are fish conscious?) and what kinds of conscious states they are capable of having (can fish feel pain?).

The Theory Driven Approach

How might we begin to make progress on these problems of AI consciousness? Approaches to these problems are sometimes split into two types. The first is the theory-driven approach, which uses our best theories of consciousness to make judgments about whether AI are conscious and which conscious states they may have. There are several ways to use existing theories to make such judgments.

One option would be to take the best supported, most popular theory of consciousness and see what it implies about AI consciousness. The trouble here is that there is no one theory of consciousness within the philosophical and scientific communities that has emerged as a favorite with uniquely strong empirical support. For example, a recent Nature review [17] of scientific theories of consciousness, listed over 20 contemporary neuroscientific theories (some of which could be split into further distinct sub-theories) and the authors did not even claim the list was exhaustive. Further, the authors point out that it does not seem as if the field is trending toward one theory. Instead, the number of theories is growing.

Further, while some theories are more popular than others, there is nothing like a clear cut experiment that shows that any one theory is significantly more likely to be true than the others. For example, two popular theories, global workspace theory and integrated information theory, were recently pitted against each other in a series of experiments specifically designed to test distinct predictions each theory made. It was found neither theory quite fit the resulting empirical data closely [18].

Another option would be to take a set of the best supported theories and assess whether they agree on something like the necessary and/or sufficient conditions for consciousness, and if they do agree, assess what this implies about artificial consciousness. An approach similar to this was recently proposed by Butlin, Long, et al. [19] who observe that, if we look at several prominent theories of consciousness, which assume consciousness only depends on certain computations, there are certain ‘indicator properties’ shared across the theories. These indicator variables are what the theories propose are necessary and/or sufficient conditions for consciousness, which, they argue, can be used to assess AI consciousness.

The challenge facing theory-driven approaches like this is the question of whether they can yield judgments about AI consciousness we can have significant confidence in. Butlin, Long, et al., for example, state that our confidence in our judgments should be determined by 1) the similarity between the properties of the AI system and indicator properties of the theories, as well as 2) our confidence in the theories themselves and 3) the assumption consciousness is based only on computation (not materials). Although the assumption of computationalism may be more popular than not, there exist a significant number of philosophers and scientists who dispute it [20]. Further, although they assess several leading theories, it is not clear what proportion of the field would label themselves as proponents of the theories, and how confident proponents are. Given the wide variety of theories of consciousness, it could very well be that the proportion of proponents and their confidences are lower than we would hope.

Theory-Neutral Approaches

One way to avoid the concerns above is to take a theory-neutral approach, which avoids using existing theories to make progress on the problems of AI consciousness and instead uses largely theory-neutral philosophical arguments or empirical tests to determine whether, and which, AI could be conscious. Three notable examples of this approach are discussed here.

The first is philosopher David Chalmers’ fading and dancing qualia arguments [21]. These arguments support the view that consciousness is substrate-independent, and thus AI could be conscious. They are a kind of philosophical argument, called an ‘ad absurdum’, which is an argument that assumes a certain premise is true in order to show the premise entails an absurd conclusion. By doing so, one shows that the premise is most likely false [22]. Chalmers’ argument involves a thought experiment, which is an imagined hypothetical scenario. In one scenario, we imagine a person who has each of their brain's neurons replaced with functionally identical silicon-neurons. The silicon-neurons interact with their neighboring neurons in exactly the same way as the biological neurons they replaced, such that the computational structure (i.e, the brain’s software) does not change, only the material-substrate does.

The idea is that if we assume consciousness depends on the material properties of the brain (e.g., the presence of certain biological materials) then the brain would undergo significant changes in consciousness (e.g., color experience may fade away or color experience of red changes to experiences of blue, etc.), since we are changing the brain's material substrate. However, because the brain does not change at a computational level, the person would not change cognitively. In particular, they would not suddenly have thoughts like 'Whoa! My consciousness has changed!'. Further, the person would not change behaviorally and would not suddenly say 'Whoa! My consciousness has changed!', since the brain is computationally identical and therefore produces the same sorts of motor outputs as it did with biological neurons. Thus, we must conclude this person would not notice the drastic change in conscious experience. This seems absurd. How could a person fail to notice such a drastic change! Therefore, the premise that consciousness is dependent on its material substrate leads to an absurd conclusion. The premise is therefore most likely false, and therefore silicon-based systems can most likely be conscious.

Some may find arguments like this moving. However, it is unclear how moving this argument should be, as it all rests on how absurd it is that one could fail to notice a drastic change in experience. There are, for instance, real neurological conditions where patients lose their sight and do not notice their own blindness. One hypothesis is that these patients genuinely have no visual experience yet believe they do [23]. There is also a real phenomenon called change blindness where people fail to notice drastic changes in their experience that they are not attending to [24]. Cases like these may not totally remove the force of Chalmers' argument, but it may remove some of its force, leaving significant uncertainty about whether its conclusion is true.

The next two approaches come from Susan Schneider and colleagues who proposed several relatively theory-neutral empirical tests for AI consciousness. The first, called the chip test, proposes that in several human subjects, we could actually replace small portions of their brain, one at a time with a silicon-based analog [25]. Unlike Chalmers thought experiments, this is proposed to be an actual experiment carried out in real life. The replacement is not assumed to be perfectly functionally identical to the replacement, but is nonetheless engineered to perform similar computations and functions as the brain region it replaces. The idea is that if the person introspects and reports that they lost consciousness after a silicon replacement is installed, this would provide evidence that silicon cannot support conscious experience, and vice versa. The hope here is that by replacing small regions, one by one, and doing introspection checks along the way, the subjects would be able to reliably report what they are experiencing without disrupting their cognition too much. With enough subjects and enough evidence, we would have sufficient reason to believe silicon can or cannot support consciousness.

However, some philosophers have argued that this test is problematic [26]. In sum, assuming the silicon replacement changes computation in the brain in some way removes convincing reason to believe the subject’s introspection is accurate. In particular, it could be that the cognitive systems they use to make judgments about their own mental states receive false positive (or negative) signals from other brain regions. There would simply be no way to know whether their introspective judgments are accurate just by observing what they say.

The second test proposed by Schneider and Edwin Turner [27], called the AI consciousness test (ACT), is akin to a kind of Turing test for consciousness. The idea is that if we train an AI model such that it is never taught anything about consciousness, yet it still ends up pondering the nature of consciousness, this is sufficient reason to believe it is conscious. Scheider imagines running this test on something like an advanced chatbot, by asking it questions that avoid using the word ‘consciousness’, such as ‘would you survive the deletion of your program?’ The idea is that in order to provide a reasonable response, the AI would require a concept of something like p-consciousness, and the concept would have to originate from the AI’s inner conscious mental life, since the AI was not taught the concept.

This test was proposed before large language models began making their way into the news via people like Brad Lemoine who claimed the AI was conscious. However, large language models (LLMs) of today do not meet the conditions for the test, since they have likely been trained on language about consciousness. Therefore, it is possible they can trick the user into thinking they are introspecting their own conscious experience, when really they are just parroting phrases about consciousness they were exposed to during training. Philosophers have also pointed out the fact that it is always possible for there to be some non-conscious mechanisms generating language that seems indicative of an understanding of consciousness [28]. This concern is only further supported by the amazing ability of today’s LLMs to hallucinate realistic, legitimate sounding, but false, claims.

Conclusions and Future Directions

There are several main points and conclusions we can draw from this introduction.

P-consciousness is the mysterious sort of consciousness that is difficult to explain scientifically and is linked in crucial ways to moral status. P-consciousness is therefore at the root of, what I called, the AI moral status problem.
The deep tension between scientific explanation and p-consciousness has prevented anything like a consensus around a theory of consciousness. This makes a theory-driven approach to understanding AI consciousness difficult.
A theory-neutral approach avoids the need for a theory of consciousness, but there has yet to be a theory-neutral approach that provides an unproblematic test or argument for determining whether and which AI could be conscious.

These conclusions suggest our ability to avoid the AI moral status problem are currently limited. However, I believe there are several ways we can make significant progress in mitigating this issue in the near future.

First, right now, moral philosophers and legal scholars can work with the AI community to develop an approach to reason morally and legally under the inevitable uncertainty we will continue to have about consciousness in the near future. Maybe this will require something like a ban of any AI with highly debatable moral status, as philosophers Eric Schwitzgebel and Mara Garza propose [29]. Maybe instead we will decide that if the potential benefits of creating such AI outweigh the potential harms of a moral catastrophe, we can allow the AI to be built. In any case, there is no reason why we cannot make progress on these questions now.

Second, much more work can be done to develop theory-neutral approaches that directly address the general AI problem of consciousness. Chalmer’s fading and dancing qualia arguments and Schneider’s chip test are, as far as I can find, two of a very small number of attempts to directly answer the question of whether silicon-based systems could, in principle, be conscious. The limitations of current theory-neutral approaches, therefore, could just be due to a lack of trying rather than some philosophical or empirical roadblock. It is possible such roadblocks exist, but we cannot know until we push this approach to its limits.

If we become highly confident silicon can support consciousness, we are still left with the question of which AI are conscious. I believe progress could be made here by further developing tests like Schneider and Turner’s ACT test. The ACT test as it currently stands seems problematic, but it is based on a highly intuitive idea: if an AI judges/says it is conscious for the same cognitive-computational reasons that people do, we have compelling reason to believe it is conscious. This test does not assume anything overly specific about what consciousness is or how it relates to the brain, just about what the cognitive processes are that generate our judgments that we are conscious, and that the presence of these processes is a strong reason for believing consciousness is present. Better understanding these cognitive processes could then provide some insight into how to design a better test. There is some hope we can make progress in understanding these cognitive processes because philosophers and some scientists have recently starting to investigate them [30]. Making the test a behavioral test, like ACT, would also have the advantage of avoiding the need to directly crack-open the large, opaque black-boxes that are now dominating AI.

Of course, pushing toward a consensus around a scientific theory of consciousness or small set of theories could be helpful in further developing useful theory-driven tests, like the one proposed by Butlin, Long, et al. However, much effort has been and is currently being put into finding such a theory of consciousness, and the move toward consensus is slow. Thus, more direct, theory-neutral approaches could be a useful focus in the coming years.

Author Bio

Nick Alonso is a final-year PhD student in the Cognitive Science Department at University of California, Irvine, where he is co-advised by Jeffery Krichmar and Emre Neftci. Nick’s current research focuses on developing biologically inspired learning algorithms for deep neural networks. Before focusing on machine learning, Nick studied and received a Master’s in neuro-philosophy from Georgia State University, where he was a fellow at their neuroscience institute. As an undergraduate at the University of Michigan, Ann Arbor, Nick double majored in computational cognitive science and philosophy.

References

https://www.npr.org/2022/06/16/1105552435/google-ai-sentient
For example, see https://www.youtube.com/watch?v=K-VkMvBjP0c
See also https://www.youtube.com/watch?v=TUCnsS72Q9s
https://schwitzsplinters.blogspot.com/2022/10/the-coming-robot-rights-catastrophe.html
Schwitzgebel, E., & Garza, M. (2020). Designing AI with rights, consciousness, self-respect, and freedom. Ethics of Artificial Intelligence.
Seth, A. (2023). Why conscious ai is a bad, bad idea. Nautilus.
Block, N. (1995). On a confusion about a function of consciousness. Behavioral and Brain Sciences, 18(2), 227-247.
The term ‘phenomenal’ in phenomenal consciousness is not meant in the sense of ‘amazing’. Rather it is derived from the term 'phenomenology', which was a philosophical movement that emphasized the importance of first-person experience in our understanding of the world.
Singer, P. (1986). All animals are equal. Applied Ethics: Critical Concepts in Philosophy, 4, 51-79.
Chalmers, D. J. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies, 2(3), 200-219.
Nida-Rümelin, M., & O Conaill, D. (2002). Qualia: The knowledge argument. Stanford Encyclopedia of Philosophy.
Nagel, T. (1974). What is it like to be a bat? The Philosophical Review.
Jackson, F. (1982). Epiphenomenal qualia. The Philosophical Quarterly, 32(127), 127-136.
A key example of this sort of approach in philosophy is called the ‘Phenomenal Concept Strategy’.
Schneider, S. (2019). Artificial you: AI and the future of your mind. Princeton University Press.
Block, N. (2002). The harder problem of consciousness. The Journal of Philosophy, 99(8), 391-425.
Seth, A. K., & Bayne, T. (2022). Theories of consciousness. Nature Reviews Neuroscience, 23(7), 439-452.
Melloni, L., Mudrik, L., Pitts, M., Bendtz, K., Ferrante, O., Gorska, U., ... & Tononi, G. (2023). An adversarial collaboration protocol for testing contrasting predictions of global neuronal workspace and integrated information theory. PLOS One, 18(2), e0268577.
Butlin, P., Long, R., Elmoznino, E., Bengio, Y., Birch, J., Constant, A., ... & VanRullen, R. (2023). Consciousness in Artificial Intelligence: Insights from the Science of Consciousness. arXiv:2308.08708.
For examples of some reactions by scientists to this theory-driven proposal see “If AI becomes conscious: here's how researchers will know”, which was published by M. Lenharoin in Nature as a commentary on this theory-driven approach.
Chalmers, D. J. (1995). Absent qualia, fading qualia, dancing qualia. Conscious Experience, 309-328.
Ad absurdum arguments are similar to proofs by contradiction, for you mathematicians out there.
https://en.wikipedia.org/wiki/Anton_syndrome
https://en.wikipedia.org/wiki/Change_blindness
Schneider, S. (2019). Artificial you: AI and the future of your mind. Princeton University Press.
Udell, D. B. (2021). Susan Schneider's Proposed Tests for AI Consciousness: Promising but Flawed. Journal of Consciousness Studies, 28(5-6), 121-144.
Turner E. and Schneider S. “The ACT test for AI Consciousness”, Ethics of Artificial Intelligence (forthcoming).
Udell, D. B. (2021). Susan Schneider's Proposed Tests for AI Consciousness: Promising but Flawed. Journal of Consciousness Studies, 28(5-6), 121-144.
Schwitzgebel, E., & Garza, M. (2020). Designing AI with rights, consciousness, self-respect, and freedom. *Ethics of Artificial Intelligence.
Chalmers, D. (2018). The meta-problem of consciousness. Journal of Consciousness Studies.

Citation

For attribution in academic contexts or books, please cite this work as

Nick Alonso, "An Introduction to the Problems of AI Consciousness", The Gradient, 2023.

Bibtex citation:

@article{alonso2023aiconsciousness,
    author = {Alonso, Nick},
    title = {An Introduction to the Problems of AI Consciousness},
    journal = {The Gradient},
    year = {2023},
    howpublished = {\url{https://thegradient.pub/an-introduction-to-the-problems-of-ai-consciousness},
}

The Gradient
Text-to-CAD: Risks and Opportunities 9 September 2023 at 15:00

Text-to-CAD: Risks and Opportunities

The Gradient

By:Reggie Raye

9 September 2023 at 15:00

The dust has hardly formed, much less settled, when it comes to AI-powered text-to-image generation. Yet the result is already clear: a tidal wave of crummy images. There is some quality in the mix, to be sure, but not nearly enough to justify the damage done to the signal-to-noise ratio – for every artist who benefits from a Midjourney-generated album cover, there are fifty people duped by a Midjourney-generated deepfake. And in a world where declining signal-to-noise ratios are the root cause of so many ills (think scientific research, journalism, government accountability), this is not good.

It’s now necessary to view all images with suspicion. (This has admittedly long been the case, but the increasing incidence of deepfakes warrants a proportional increase in vigilance, which, apart from being simply unpleasant, is cognitively taxing.) Constant suspicion - or failing that, frequent misdirection - seems a high price to pay for a digital bauble that no one asked for, and offers as yet little in the way of upside. Hopefully - or perhaps more aptly, prayerfully - the cost-to-benefit ratio will soon enter saner territory.

But in the meantime, we should be aware of a new phenomenon in the generative AI world: AI-powered text-to-CAD generation. The premise is similar to that of text-to-image programs, just instead of an image, the programs return a 3D CAD model.

A few definitions are in order here. First, Computer Aided Design (CAD) refers to software tools wherein users create digital models of physical objects - things like cups, cars, and bridges. (Models in the context of CAD have nothing to do with deep learning models; a Toyota Camry ≠ a recurrent neural network.) Also, CAD is important; try to think of the last time you were not within sight of a CAD-designed object.

Definitions behind us, let’s turn now to the big players who want in to the text-to-CAD world: Autodesk (CLIP-Forge), Google (DreamFusion), OpenAI (Point-E), and NVIDIA (Magic3D). Example of each are shown below:

Major players have not deterred startups from popping up at the rate of nearly one a month, as of early 2023, among whom CSM and Sloyd are perhaps the most promising.

In addition, there are a number of fantastic tools that might be termed 2.5-D, as their output is somewhere between 2- and 3-D. The idea with these is that the user uploads an image, and AI then makes a good guess as to how the image would look in 3D.

Open source animation and modeling platform Blender is, unsurprisingly, a leader in this space. And the CAD modeling software Rhino now has plugins such as SurfaceRelief and Ambrosinus Toolkit which do a great job of generating 3D depth maps from plain images.

All of this, it should first be said, is exciting and cool and novel. As a CAD designer myself, I eagerly anticipate the potential benefits. And engineers, 3D printing hobbyists, and video game designers, among many others, likewise stand to benefit.

However, there are many downsides to text-to-CAD, many of them severe. A brief listing might include:

Opening the door to mass creation of weapons, and racist or otherwise objectionable material
Unleashing a tidal wave of crummy models, which then go on to pollute model repos
Violating the rights of content creators, whose work is copyrighted
Digital colonialism: amplifying very-online western design at the expense of non-western design traditions

In any event, text-to-CAD is coming whether we want it or not. But, thankfully, there are a number of steps technologists can take to improve their program’s output and reduce their negative impacts. We’ve identified three key areas where such programs can level up: dataset curation, a pattern language for usability, and filtering.

To our knowledge, these areas remain largely unexplored in the text-to-CAD context. The idea of a pattern language for usability will receive special attention, given its potential to dramatically improve output. Notably, this potential isn’t limited to CAD; it can improve outcomes in most generative AI domains, such as text and image.

Dataset Curation

Passive Curation

While not all approaches to text-to-CAD rely on a training set of 3D models (Google’s DreamFusion is one exception), curating a model dataset is still the most common approach. The key here, it scarcely bears mentioning, is to curate an awesome set of models for training.

And the key to doing that is twofold. First, technologists ought to avoid the obvious model sources: Thingiverse, Cults3D, MyMiniFactory. While high quality models are present there (mine among them ;) the vast majority are junk. (The Reddit thread ‘Why is Thingiverse so shit?’ is one of many that speak to this problem.) Second, super high-quality model repos should be sought out. (Scan the World is perhaps the world's best.)

Next, model sources can be weighted according to quality. Master of Fine Arts (MFA) students would likely jump at the chance to do this kind of labeling - and, due to the inequities of the labor market, for peanuts.

Active Curation

Curation can and should take a more active role. Many museums, private collections, and design firms would gladly have their industrial design collections 3D scanned. Plus, in addition to producing a rich corpus, scanning would create a robust record of our all-too-fragile culture.

Data Enrichment

In the process of creating a high quality corpus, technologists must think hard about what they want the data to do. At first glance, the main use case might seem to be ‘empowering managers at hardware companies to move a few sliders that output blueprints for a desired product, which can then be manufactured’. If the failure-rich history of mass customization is any guide, however, this approach is likely to flounder.

A more effective use case, in our view, would be ‘empowering domain experts - people like industrial designers at product design firms - to prompt engineer until they get a suitable output, which they then fine-tune to completion’.

Such a use case would require a number of things which are perhaps non-obvious at first glance. For example, domain experts need to be able to upload images of reference products, as in Midjourney, which they then tag according to their target attributes - style, material, kinetics, etc. It might be tempting to adopt a faceting approach here, where experts select dropdowns for style type, material type, etc. But experience suggests that enriching datasets so as to create attribute buckets is a bad idea. This manual approach was favored by the music streaming service Pandora, which was ultimately steamrolled by Spotify, which relies on neural nets.

Takeaways

Rigorous dataset curation is an area where (with a few exceptions) little has been done and, hence, much is to be gained. This should be a prime target for companies and entrepreneurs seeking a competitive advantage in the text-to-CAD wars. A large, enriched dataset is hard to make and hard to imitate - the best kind of mote.

On a less corporatist note, thoughtful dataset curation is the ideal way to drive the creation of products that are beautiful. Reflecting the priorities of their creators, generative AI tools to date have been, to put it lightly, taste-agnostic. But we ought to take a stand for the importance of beauty. We ought to care about whether what we bring into this world will enchant users and stand the test of time. We ought to push back against the mediocre products being heaped onto mediocre bandwagons.

If beauty as an end in itself is insufficient to some, perhaps they will be persuaded by two data points: sustainability and profit.

The most iconic products of the past hundred years - the Eames chairs, Leica cameras, Vespa scooters - are treasured by their users. Vibrant fandoms restore them, sell them, and continue to use them. Perhaps the intricacy of their design required 20% more emissions than rival products of their day. No matter. That their lifespans are measured in quarter centuries and not in years means that they led to less consumption and less emissions.

As for profit, it’s no secret that beautiful products command a price premium. iPhone specs have never been comparable to Samsungs’. Yet Apple can charge 25% more than Samsung. The adorable Fiat 500 subcompact gets worse gas mileage than an F-150. No matter. Fiat wagered, correctly, that yuppies would gladly pay an extra $5K for cuteness.

A Pattern Language for Usability

Overview

Pattern languages were pioneered in the 1970s by polymath Christopher Alexander. They are defined as a mutually-reinforcing set of patterns, each of which describes a design problem and its solution. While Alexander’s first pattern language was targeted at architecture, they have been profitably applied to many domains (most famously in programming) and stand to be at least as useful in the domain of generative design.

In the context of text-to-CAD, a pattern language would consist of a set of patterns; for example, one for moving parts, one for hinges (a subset of moving parts, hence one layer of abstraction down), and one for friction hinges (another layer of abstraction down). The format for a friction hinge pattern might look like this:

Pattern Name	Friction Hinge
Pattern Description	The Friction Hinge pattern addresses the need for adjustable friction in hinges so as to provide tuneable resistance, but without compromising smooth movement. By allowing customization of the level of friction, this pattern enhances usability. This pattern may be used in the design of consumer electronics, automotive interior components, medical equipment, and folding furniture, among others.
Consider These Patterns First	Ergonomics for Hand-held Devices, Hinges, Load and Force Analysis, Safety Locking, Lubrication and Wear Resistance
Problem Statement	Folding devices require hinges with adjustable friction, lack of which may result in either excessive resistance or insufficient support for the object attached to the hinge.
Solution	Friction Adjustability: Integrate a mechanism into a generic barrel hinge that enables friction calibration per use case requirements. This adjustment can be achieved through various means, such as a tensioning screw or a friction pad with different settings. Smooth Transition: Ensure that the friction adjustment mechanism allows for smooth and incremental changes, without sudden jumps or unintended collisions with other design elements.
Used In	Laptops, adjustable stands and mounts, folding tables, cabinet doors, exercise incline benches, medical examination tables
Consider These Patterns Next	Adaptive Friction Control, Sealed Friction Mechanism, Safety Release, Indexed Folding Mechanism

In common with natural language, pattern languages comprise a vocabulary (the set of design solutions), syntax (where a solution fits into the language), and grammar (rules for which patterns may solve a problem). Note that the above pattern ‘Friction Hinge’ is one node in a hierarchical network, which can be visualized by a directed network graph.

Embodied in these patterns would be best practices with respect to design fundamentals - human factors, functionality, aesthetics, etc. The output of such patterns would thereby be more usable, more understandable (avoiding the black box problem), and easier to fine-tune.

Crucially, unless text-to-CAD programs account for design fundamentals, their output will amount to little less than junk. Better nothing at all than a text-to-CAD-generated laptop whose screen doesn’t stay upright.

Perhaps the most important of all these fundamentals - and the most difficult to account for - is design for human factors. To get a useful product, the number of human factors considerations verges on the infinite. The AI must recognize and design around pinch points, finger entrapment, ill-placed sharp edges, ergonomic proportions, etc.

Implementation

Let’s look at a practical example. Suppose Jane is an industrial designer at Design Studio ABC, which has a commission to design a futuristic gaming laptop. The state of the art now would be for Jane to turn to a CAD program like Fusion 360, enter Fusion’s generative design workspace, and spend the rest of the week (or month) working with her team to specify all relevant constraints: loads, conditions, objectives, material properties, etc.

But however powerful Fusion’s generative design workspace is (and we know from experience that it’s powerful) it can never get around one key fact: a user must have lots of domain expertise, CAD ability, and time.

A more pleasant user experience would be to simply prompt a text-to-CAD program until its output meets ones’ requirements. Such a pattern design-centric workflow might look like the following:

Jane prompts her text-to-CAD program: “Show me some examples of a futuristic gaming laptop. Use for inspiration the form factor of the TOMO laptop stand and the surface texture of a king cobra”.

The program outputs six concept images, each informed by patterns such as “Keyboard Layout”, “Hinged Mechanisms”, and “Port Layout for Consumer Electronics”

She replies “Give me some variations of image 2. Make the screen more restrained and the keyboard more textured.”

Jane: “I like the third one. What parameters do we have on that one?”

The system, drawing on the ‘Solution’ fields of the patterns it finds most relevant, lists 20 parameters - length, width, monitor height, key density, etc.

Jane notes that the hinge type is not specified, so types “add a hinge type parameter to that list and output the CAD model”.

She opens the model in Fusion 360 and is pleased to see that an appropriate friction hinge has been added. As the hinge has come parameterized, she increases the width parameter, knowing that Studio ABC’s client will want the screen to hold up to a lot of abuse.

Jane continues making adjustments until she’s fully satisfied with the form and function. This done, she can pass it off to her colleague Joe, a mechanical engineer, who will inspect it to see which custom components might be replaced by stock versions.

In the end, management at Studio ABC is happy because the laptop design process went from an average of six months to just one. They are doubly pleased because, thanks to parameterization, any revisions requested by their client can be quickly satisfied without a redesign.

Thorough Filtering

As AI ethicist Irene Solaiman recently pointed out in a poignant interview, generative AI is sorely in need of thorough guardrails. Even with the benefit of a pattern language approach, there’s nothing inherent in generative AI to prevent generation of undesirable output. This is where guardrails come in.

We need to be capable of detecting and denying prompts that request weapons, gore, child sexual abuse material (CSAM), and other objectionable content. Technologists wary of lawsuits might add to this list products under copyright. But if experience is any guide, objectionable prompts are likely to make up a significant portion of queries.

Alas, once text-to-CAD models get open-sourced or leaked, many of these queries will be satisfied without compunction. (And if the saga of Defense Distributed has taught us anything, it’s that the genie will never go back into the bottle; thanks to a recent ruling in Texas, it’s now legal for an American to download an AR-15, 3D print it, and then - should he feel threatened - shoot someone with it.)

In addition, we need widely-shared performance benchmarks, analogous to those that have cropped up around LLMs. After all, if you can’t measure it, you can’t improve it.

____

In conclusion, the emergence of AI-powered text-to-CAD generation presents both risks and opportunities, the ratio of which is still very much undecided. The proliferation of low-quality CAD models and toxic content are just a few things that require immediate attention.

There are several neglected areas where technologists might profitably train their attention. Dataset curation is crucial: we need to track down high-quality models from high-quality sources, and explore alternatives such as scanning of industrial design collections. A pattern language for usability could provide a powerful framework for incorporating design best practices. Further, a pattern language will provide a robust framework for generating CAD model parameters that can be fine-tuned until a model meets the requirements of its use case. Finally, thorough filtering techniques must be developed to prevent the generation of dangerous content.

We hope the ideas presented here will help technologists avoid the pitfalls that have plagued generative AI to date, and also enhance the ability of text-to-CAD to deliver delightful models that benefit the many people who will soon be turning to them.

Authors

Reggie Raye is a teaching artist with a background in industrial design and fabrication. He is the founder of design studio TOMO.

K. Alexandria Bond, PhD is a neuroscientist focusing on the rules driving learning dynamics. She studied cognitive computational neuroscience at Carnegie Mellon. She currently develops machine learning methods for precision diagnosis of psychiatric conditions at Yale.

Citation

For attribution in academic contexts or books, please cite this work as

Reggie Raye and K. Alexandria Bond, "Text-to-CAD: Risks and Opportunities", The Gradient, 2023.

Bibtex citation:

@article{raye2023texttocad,
    author = {Raye, Reggie and Bond, K. Alexandria},
    title = {Text-to-CAD: Risks and Opportunities},
    journal = {The Gradient},
    year = {2023},
    howpublished = {\url{https://thegradient.pub/text-to-cad},
}