This letter is a brief update on developments of the past week or so that point to the next paradigm shift in AI capabilities: generative programming.1 This concerns crafting connections between instances of Large Language Models (LLMs), like GPT-4. This is “breaking news”; since our main concern is AI in academia, we can offer only a brief exposition here, but it relates directly to the visions we discussed in our last newsletter, and we need to keep these concepts and their consequences on our radar.
Have you heard of Smallville? I don’t mean the peaceful village that was invented in the 1940s as a utopian anywhere of childhood and innocence, the cultural essence of North American superhero mythology. I am talking about a small town with a couple of houses, a college, a store and a bar. A pared-down version of anywhere, an empty canvas for a hypothetical potentiality that never comes to pass. Everything could happen, but nothing really does. It is indeed Smallville.2 It has a population of twenty-five. It was published a week ago.3
Smallville
What would happen if you let twenty-five computed simulacra interact in a virtual Smallville? What would they do? How would they live? If we consider the behavioural repertoire of video game NPCs (Non-Player Characters), or Sims, one can hardly expect surprises. Their behaviour might be complicated, but it would hardly be new, as you would expect of any system that is rule-based. But what would happen if their behaviour is not rule-based, but determined by ChatGPT, i.e. by a system that has been trained to produce language? Because language – according to Martin Heidegger – has its own agency: “Die Sprache spricht.” (Language speaks, 1959, 11). I have mentioned previously: language is not what we think, but how we think. But Heidegger’s word goes beyond that, understanding language not just as something we create, but as something that manifests itself through our thoughts. In this sense, generic language abilities are not conventional rules; they embed thoughts in a hierarchical framework of understanding – one that ultimately becomes language itself.
Is this something that plain, standard ChatGPT can do? Not quite “out of the box”. One needs memory, one needs recall in order to remember, and one needs to make use of what has been remembered. Given the constraints of a small context window of only 2,000 tokens, some creative thinking is required to add these capabilities to ChatGPT. Joon Sung Park and coworkers (2023) designed an architecture for this: it distills perceptions from the current state, creates a memory stream, allows the retrieval of observations, assesses them through reflection, initiates planning, and creates new actions – and interactions. They then spawned 25 agentive simulacra, gave them a brief paragraph’s worth of personality, and populated their Smallville. What do you think would happen?
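Hold that question for a moment. Schematically, each agent’s turn boils down to a loop: perceive, remember, retrieve, reflect, plan, act. The following is only a minimal sketch of that loop in Python, not the authors’ code; the llm() helper, the prompts, and the importance scoring are illustrative assumptions.

```python
# A minimal sketch of the generative-agent loop: perceive, remember, retrieve,
# reflect, plan, act. Not the authors' code -- llm(), the prompts, and the
# importance scoring are illustrative assumptions.
import time

def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in: replace with a call to a chat model")

class Agent:
    def __init__(self, name: str, persona: str):
        self.name, self.persona = name, persona
        self.memory = []                      # the memory stream: (timestamp, importance, text)

    def remember(self, text: str) -> None:
        # assumes the model answers with a bare number
        score = llm(f"On a scale of 1-10, how important is this memory? {text}")
        self.memory.append((time.time(), int(score), text))

    def retrieve(self, k: int = 5) -> list[str]:
        # crude proxy for retrieval: most recent first
        # (the paper also weighs importance and relevance, via embedding similarity)
        ranked = sorted(self.memory, key=lambda m: m[0], reverse=True)
        return [text for _, _, text in ranked[:k]]

    def reflect(self) -> None:
        insight = llm("What high-level insight follows from these memories?\n"
                      + "\n".join(self.retrieve(k=20)))
        self.remember(insight)                # reflections are stored like any other memory

    def step(self, observation: str) -> str:
        self.remember(observation)
        context = "\n".join(self.retrieve())
        plan = llm(f"You are {self.name}. {self.persona}\n"
                   f"Relevant memories:\n{context}\n"
                   f"Current situation: {observation}\nWhat do you do next?")
        return plan                            # the next action, expressed in natural language
```

Twenty-five such loops, each with its own persona and memory stream, running against a shared small-town world. So – what would happen?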
Actually, what does ChatGPT-4 think would happen?
ChatGPT-4:
Creating a virtual environment with multiple ChatGPT-based agents that have memory and recall capabilities, and allowing them to follow a schedule of everyday tasks and interact with each other, would result in an intriguing experiment. Here are a few possible outcomes:
(1) Emergent behavior: As these agents communicate with each other and exchange ideas through language, you might observe novel emergent behaviors or interactions that were not explicitly programmed into the agents. These behaviors could arise from the combination of their individual knowledge bases, memory, and language processing capabilities.
(2) Adaptation and learning: The agents may adapt and learn from their interactions with each other and their environment, potentially developing new strategies or patterns of behavior based on their experiences. This learning could include refining their communication skills, improving their ability to collaborate or coordinate, and becoming more efficient in performing everyday tasks.
(3) Social dynamics: Depending on the complexity of the environment and the tasks the agents are given, you might observe the development of social dynamics, such as cooperation, competition, or even the formation of groups with shared goals or interests. These dynamics could emerge from the agents' interactions and their attempts to complete tasks or share information.
(4) Knowledge expansion: As the agents communicate with each other and exchange ideas, they may collectively generate new insights or understandings that were not present in their initial training data. This could lead to a broader range of responses and interactions between the agents over time.
(5) Limitations and challenges: Despite the potential for interesting outcomes, the experiment would likely encounter limitations and challenges inherent to AI language models like ChatGPT. These include biases present in the training data, difficulty handling ambiguous or highly complex tasks, and a lack of genuine understanding or consciousness that might limit the depth and authenticity of the agents' interactions.
Is it really this obvious? Maybe it is, because that is exactly what happened.4 And it actually does feel obvious, because once we think through this problem in terms of language, not rules, it seems we cannot avoid imagining all manner of self-organized, purposeful, and insightful behaviour. This is a bit profound, and we’ll get back to that “magical” property of language below.
The generative agents that the Stanford/Google team created move about their world, perceive others, initiate interactions, remember past experience and form opinions. According to Park and colleagues, “By interacting with each other, generative agents in Smallville exchange information, form new relationships, and coordinate joint activities. […] these social behaviors are emergent rather than pre-programmed” (2023, 6). Strikingly, when one agent was seeded with the intent to plan a Valentine’s Day party, that information spread through conversations, a number of agents decided to come, tasks were coordinated, and the party actually took place, with a number of agents congregating at the bar at the right time.
The goal of this exercise was to explore ways to make such simulacra believable, and the authors measured exactly that: they concluded that the architecture did quite well, better in fact than human stand-ins. Again, the idea of an inherent agency of language would predict this, but the objective of believability itself also reminds us of the “Turing Trap”. Alan Turing’s “imitation game”, the Turing Test (WP) for “intelligence” (whether an AI can successfully imitate a human in discourse), has become a powerful trope among AI engineers and dominates the space of pursued objectives. But as the Stanford economist Erik Brynjolfsson noted (2022), this thinking leads to a trap: it would be far better to envision AI systems that augment humans, rather than systems that are able to substitute for humans and replace them.
All of this is quite interesting in its own right, but it is significant because of its implications. By demonstrating the integration of memory and agency with a language model, Park and colleagues have opened the gates to applications whose behaviour goes far beyond writing and dialogue. Imagine collaborating agents acting as human simulacra for planning and design purposes. Or working on multi-model marketing campaigns. Or creating and refining a research plan …
Apparently, others have been imagining the same thing.
Generative Programming
The top three trending repositories this week on GitHub, the code-sharing and open software development platform, all had one thing in common: chaining the activities of LLMs. These were AutoGPT (▷) and BabyAGI (▷), both by independent developers, and Microsoft’s Jarvis (▷). A related project that has captured people’s imagination is LangChain (▷).
AutoGPT is an experimental open-source application that is driven by GPT-4. It chains together elementary GPT-4 “thoughts” to autonomously achieve whatever goal the user sets. It is one of the first examples of GPT-4 running fully autonomously. In a video example, the system executes a prompt to create a unique recipe for an upcoming event by (i) searching the internet for pages that might speak about upcoming events and summarizing the contents; (ii) after analyzing several pages, identifying a page with a calendar view of local current events; (iii) noticing that Earth Day might be suitable, and close to the current date; (iv) declaring Earth Day 2023 as the task goal; (v) updating the task goal to define a recipe with sustainable and eco-friendly ingredients; and (vi) creating the prompt for another GPT instance to finally create a quinoa salad recipe and store it in a local file. This self-directed trajectory requires only the initial prompt; all planning and execution steps are handled internally, by creating the commands to use simple APIs for other services, or by creating prompts for GPT instances. This is the key: GPT models can write prompts for other GPT models.
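That key idea is easy to sketch. What follows is not AutoGPT’s implementation, only a bare-bones illustration of the principle: one model call proposes the next step, a second call carries it out, and the loop stops when the model itself declares the goal achieved. The prompts and the run() helper are assumptions; only the chat call reflects the 2023-era OpenAI API.

```python
# Illustration of an AutoGPT-style "thought loop" -- not AutoGPT's actual code.
# Assumes the 2023-era OpenAI Python client and an API key in the environment.
import openai

def llm(prompt: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

def run(goal: str, max_steps: int = 10) -> str:
    context = f"Goal: {goal}"
    for _ in range(max_steps):
        step = llm(context + "\nPropose the single next action, "
                             "or reply DONE: <result> if the goal is achieved.")
        if step.startswith("DONE:"):
            return step[len("DONE:"):].strip()
        # the model writes the prompt for the next model call
        # (here the step is only "simulated" in text; AutoGPT dispatches it to real tools)
        outcome = llm(f"Carry out this action as far as text allows, and report the outcome: {step}")
        context += f"\nAction: {step}\nOutcome: {outcome}"
    return context

# e.g. run("Create a unique recipe for an upcoming event and save it to a file.")
```

In AutoGPT proper, the “carry out” step is dispatched to actual tools (web search, file access, code execution) rather than to another text-only call; the loop structure is the same.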
BabyAGI5 is another example of an AI-powered task management system. It uses API calls to OpenAI servers and to the Pinecone vector database (▷). Pinecone is a key enabling technology for AI: it stores embedding vectors6 as a kind of associative memory. This allows BabyAGI to create, prioritize, and execute tasks. Tasks are chained, connected through natural language, and offloaded to a semantic memory for processing and retrieval of context.
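In outline, the pattern looks like the sketch below. This is not BabyAGI’s code; the prompts are assumptions, and a plain Python list stands in for the Pinecone store so that the sketch stays self-contained.

```python
# Sketch of the BabyAGI pattern: an LLM creates, prioritizes, and executes tasks
# from a queue; completed results feed a memory that supplies context for later
# tasks. BabyAGI keeps that memory in Pinecone and retrieves it by embedding
# similarity; a plain list of recent results stands in for it here.
from collections import deque

def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in: replace with a call to the OpenAI API")

def run(objective: str, first_task: str, max_steps: int = 10) -> list[str]:
    tasks, results = deque([first_task]), []
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()
        context = "\n".join(results[-3:])      # Pinecone would return the most *relevant* results
        results.append(llm(f"Objective: {objective}\nContext:\n{context}\nComplete this task: {task}"))
        # create follow-up tasks ...
        new = llm(f"Objective: {objective}\nLast result: {results[-1]}\n"
                  "List any follow-up tasks, one per line, or reply NONE.")
        if new.strip().upper() != "NONE":
            tasks.extend(t.strip() for t in new.splitlines() if t.strip())
        # ... and re-prioritize the queue with yet another LLM call
        if tasks:
            order = llm(f"Objective: {objective}\nReorder these tasks by priority, one per line:\n"
                        + "\n".join(tasks))
            tasks = deque(t.strip() for t in order.splitlines() if t.strip())
    return results
```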
Jarvis is Microsoft’s chaining experiment. It is a collaborative system with an LLM as the controller and numerous expert models as executors – i.e. dedicated AIs that know how to perform subtasks such as image analysis, image generation, and text-to-speech. The workflow consists of (i) task planning with ChatGPT, to analyze the requests of users, identify their intention, and define a sequence of solvable tasks; (ii) model selection, based on a stored description of the models’ abilities; (iii) task execution, by invoking the selected model and returning the results to ChatGPT; and (iv) response generation, using ChatGPT to integrate the predictions of all models and generate the requested output. Again, this allows natural-language prompting of complex tasks: for example, you can ask the system how many zebras are found in a series of uploaded images. The system then invokes a specialized AI to describe the image contents in natural language, and then interprets the descriptions to infer the number of zebras.
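The four stages map directly onto code. The sketch below is not Jarvis itself; the expert registry, its entries, and the prompts are hypothetical placeholders for the real expert models.

```python
# Sketch of the controller/executor pattern behind Jarvis: an LLM plans the
# sub-tasks, selects an expert model for each, and integrates the results.
# The expert registry and the prompts are hypothetical placeholders.
def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in: replace with a ChatGPT call")

EXPERTS = {
    "image-captioning": lambda data: "a herd of zebras grazing near a waterhole",  # stand-in vision model
    "text-to-speech":   lambda data: b"<audio bytes>",                             # stand-in TTS model
}

def answer(request: str, attachments) -> str:
    # (i) task planning: turn the user's request into a sequence of solvable sub-tasks
    plan = llm(f"Break this request into sub-tasks, one per line: {request}")
    results = []
    for task in plan.splitlines():
        # (ii) model selection, based on stored descriptions of the experts' abilities
        choice = llm(f"Which of {list(EXPERTS)} best solves: {task}? Answer with the name only.").strip()
        # (iii) task execution: invoke the selected expert and collect its result
        results.append(EXPERTS.get(choice, lambda d: "no suitable expert")(attachments))
    # (iv) response generation: ChatGPT integrates all predictions into one answer
    return llm(f"User asked: {request}\nExpert results: {results}\nCompose the final answer.")
```

For the zebra question, the planner would emit one captioning sub-task per image, the vision expert would describe each picture, and the final call would count the zebras from those descriptions.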
Finally, LangChain is a framework for developing applications powered by language models. It enables applications to be data-aware, i.e. to connect a language model to various sources of data, and to be agentic, i.e. to allow a language model to interact with its environment.
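A minimal agentic example, in the spirit of LangChain’s early-2023 quickstart, looks something like this. The interface has evolved over time, so treat the module paths and agent name as approximate; the example also assumes API keys for OpenAI and SerpAPI in the environment.

```python
# "Agentic" in a few lines: the model decides which tool to call and when.
# Approximate early-2023 LangChain interface; assumes OPENAI_API_KEY and SERPAPI_API_KEY are set.
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent

llm = OpenAI(temperature=0)                                    # the language model
tools = load_tools(["serpapi", "llm-math"], llm=llm)           # data-aware: web search, plus a calculator
agent = initialize_agent(tools, llm,
                         agent="zero-shot-react-description",  # the model reasons about which tool to use
                         verbose=True)
agent.run("When is Earth Day in 2023, and how many days away is that from April 13, 2023?")
```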
Why is all of this together a big deal?
(1) Well, for one thing, we are getting the first glimpses of AGI – Artificial General Intelligence. While the LLMs were only built to interpret and produce language, these agent systems use language only as a representation of thought, as an intermediate step to accomplish completely general tasks.
(2) Second, these systems embody a completely novel programming paradigm: I will call this generative programming7 – since the program generates its own pathways of execution, and the programming language is: well, language. The same natural language that you and I speak every day. The language that Heidegger spoke of. One call to an LLM, one thought, is like one step in a computation. String them together, create loops, termination conditions, store data, call specialized functions, spawn whole armies of parallel workers, then collate the results ... you get arbitrarily complex computations (see the sketch after this list).
(3) Third, the systems are simple, can be installed locally, and can be used by anyone with very little specialized knowledge. They are another step towards democratizing AI. Ian Maurer keeps a list of LLM resources that can be used to build products which individuals can “own” (▷). If everyone has access to those abilities, everyone can compete with AI monopolies. I have written about this in our last newsletter.
(4) And finally, the systems enable more users to build more sophisticated systems. And people are beginning to build systems that build better systems. And this gets us to the issue of accelerating returns.
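To make point (2) concrete, here is a toy instance of such a “computation”, in which the “parallel workers” are concurrent LLM calls and a final call collates their results. The llm() helper and the prompts are assumptions; the point is only the shape of the program.

```python
# A toy "generative program": natural-language steps composed like ordinary
# program steps -- a map/reduce over parallel LLM calls. Illustrative only.
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in: replace with a call to your model of choice")

def summarize(documents: list[str]) -> str:
    # map: each worker condenses one document (one "thought" per call)
    with ThreadPoolExecutor(max_workers=8) as pool:
        partials = list(pool.map(lambda d: llm(f"Summarize in two sentences:\n{d}"), documents))
    # reduce: a final call collates the partial results
    return llm("Combine these summaries into one coherent abstract:\n" + "\n".join(partials))
```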
Accelerating Returns
Things rapidly heat up when a result enhances the ability to obtain more results. This is the “law of accelerating returns” that Ray Kurzweil spoke of (2001). It is the process that leads to technological singularities. And we are seeing it in action right now.
“Accelerating returns” are realized when systems improve their ability to acquire abilities. We have assumed that this would become a concern once we use algorithms to design algorithms, when Artificial Intelligences modify other Artificial Intelligences – or themselves. But right now we are seeing an astounding growth in AI-enabled apps, services, and agents – in part because the AI itself is teaching humans to code, in part because code is no longer necessary, now that we have discovered language as a computational principle. These are hybrid systems of AI, developers, and users who pay for them. The system is connected, and its various developments feed back into it. This is why the current rate of change is so breathtakingly fast. All of its parts are making more of it.
It is growing.
TL;DR
We highlight a very recent, further paradigm shift in AI capabilities that we call generative programming, which concerns connecting instances of Large Language Models (LLMs) like GPT-4 with other instances, and with tools. In an experiment called Smallville, researchers demonstrated the integration of memory and agency with a language model, which led to emergent behaviour and opens the door for applications beyond writing and dialogue. Projects such as AutoGPT, BabyAGI, Jarvis, and LangChain allow LLMs to autonomously chain activities, marking a new programming paradigm. This development provides glimpses of Artificial General Intelligence (AGI), democratizes AI, and leads to accelerating returns, where AI enables the rapid creation of increasingly sophisticated systems. 💡
References
BRYNJOLFSSON, Erik (2022) “The Turing Trap: The Promise & Peril of Human-Like Artificial Intelligence”. Dædalus – Special Issue: AI and Society. Spring 2022. (▷)
HEIDEGGER, Martin (1959) “Die Sprache”. In: Gesamtausgabe, Band 12: Unterwegs zur Sprache. Vittorio Klostermann.
KURZWEIL, Ray (2001) “The Law of Accelerating Returns”. Blog. (▷)
PARK, Joon Sung; O’BRIEN, Joseph; CAI, Carrie J.; RINGEL MORRIS, Meredith; LIANG, Percy and BERNSTEIN, Michael S. (2023-04-07) “Generative Agents: Interactive Simulacra of Human Behavior”. arXiv. (▷)
Feedback and requests for interviews, speaking, collaborations or consultations are welcome at sentient.syllabus@gmail.com . Comments are appreciated here.
Sentient Syllabus is a public good collaborative. To receive new posts you can enter your email for a free subscription. If you find the material useful, please share the post on social media, or cite it in your own writing. If you want to do more, paid subscriptions are available. They have no additional privileges, but they help cover the costs.
Cite: Steipe, Boris (2023) “Trains of Thought: Yet another frontier of AI is coming into view”. Sentient Syllabus 2023-04-13 https://sentientsyllabus.substack.com/p/trains-of-thought .
I wish to acknowledge some contributions by ChatGPT (both the GPT-3.5 version of 2023-02-13 and the GPT-4 version 2023-03-14) in response to my prompts, particularly for grammar, expression, and summarization. I take full responsibility for facticity.
Naming in computer science is considered one of two hard things, and the choice of “Smallville” is worth a thought or two. It is not Pleasantview or Pallet Town. It is also not Maycomb, or Medaillon. Perhaps Whoville would be a good alternative. And perhaps Smallville is indeed perfect, with its dialectic of harbouring a nascent authority, while embodying the norms that this authority will live by.
Well – mostly. Point (2), adaptation and learning, did not appear prominently, but that is expected, because the agents were not under any pressure to improve. Undoubtedly, somebody is writing code right now that uses those principles to solve a task that cannot be solved by any individual. And point (5), limitations and challenges, is quite irrelevant to the question and the scenario; these sorts of disclaimers currently appear all over the conversation. They are a consequence of the “alignment” that OpenAI has added to ChatGPT, and they reflect the company’s idea of guardrails and ethical behaviour, rather than the needs of the conversation.
I can’t be the only one to notice that BabyAGI, when pronounced as a word [ˈbeɪbiˈjɑɡi] sounds like a diminutive of Baba Yaga.
Embeddings in AI take a chunk of text and compute a vector of numbers that defines a location in a very large semantic space. This makes it possible to measure the relatedness of two chunks of text: they have similar meaning if their embedding vectors are close together (cf. OpenAI ▷). Using this idea, text can be stored in databases and retrieved by associative recall – i.e. just like we recall our own memories by imagining related thoughts or partial contents.
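In code, “close together” is typically measured as cosine similarity. A minimal sketch, assuming the 2023-era OpenAI embeddings endpoint, an API key in the environment, and the then-current default model name:

```python
# Relatedness of two text chunks, measured by the proximity of their embedding vectors.
import openai  # 2023-era client; assumes OPENAI_API_KEY is set

def embed(text: str) -> list[float]:
    response = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
    return response["data"][0]["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sum(x * x for x in a) ** 0.5 * sum(y * y for y in b) ** 0.5
    return dot / norm          # values close to 1.0 mean similar meaning

print(cosine(embed("The agents plan a Valentine's Day party at the bar."),
             embed("A small crowd congregates for a celebration.")))
```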