A priest and a badger walk into a bar …1
That’s a lovely setup, but I never got to hear the punchline – because the line comes from a recent discussion between Geoffrey Hinton and Fei-Fei Li in Toronto (Hinton 2023). Hinton used the setup to illustrate why transformer models are quite good at explaining jokes – but terrible at telling them (00:50:20). I had noticed that before, and you may have too: ChatGPT’s rhymes are forced, its meter grates, and its jokes are often not very funny.2
The discussion then moved in a different direction, but the thought lingered with me until a few things finally clicked – and that is what I would like to share with you.
Let’s talk about what generative AI can’t do, and what it could do once it can.
An Astounding Departure
I started writing these lines on my flight back from Pittsburgh, where I had just given two lectures on the state of generative Artificial Intelligence and education, sharing some general principles and insights from our first weeks of classroom experience. When I finally got back to my hotel on Friday evening, I read the news – as unexpected to me as to everyone else – that Sam Altman, co-founder and CEO of OpenAI, had been dismissed from his position by the company’s board of directors.
According to OpenAI’s blog post dated 2023-11-17:
Mr. Altman’s departure follows a deliberative review process by the board, which concluded that he was not consistently candid in his communications with the board, hindering its ability to exercise its responsibilities. The board no longer has confidence in his ability to continue leading OpenAI.
(OpenAI 2023-11-17)
I can’t recall ever reading such a frank statement in a communication of this type, and certainly not from a company that has recently sought investment at a valuation that would place it among the top 200 companies globally.3
Mind you, this is a developing story, and we have no clue at all where it will end up: after Sam Altman’s dismissal, Greg Brockman, another of OpenAI’s co-founders and the company’s president, announced on Twitter that he would quit, and three other crucial team leads followed in solidarity. Meanwhile the rumour mills are grinding: Microsoft’s Satya Nadella is reportedly furious, pundits are claiming that Altman is “winning the narrative” as if that somehow made a difference, and there are rumours that Altman may be called back, or not, and the board may resign, or not, or that Altman may found a new company, or not – or he might have been planning to do so all along, that being a reason for his dismissal – but he may do something else, or not, and investors are planning lawsuits, and mass resignations may happen, or not. Beyond the rumours, however, stands a simple fact: Altman had let it come to this – which, given the stakes, seems remarkably inept.
But why?
Let me explain briefly why transformer models can explain jokes but can’t tell them very well. GPTs – Generative Pretrained Transformers – build dialogue word by word, starting from the prompt. If the joke is in the prompt, the language model can analyze it, decompose it, and relate its components to each other and to models of expectation and how these are violated – which is what we find funny. But how do you tell a joke that way? You need to follow a probabilistic trajectory of words that ultimately has to end up at the punchline. But you have no idea what the punchline is going to be – because that’s not how GPTs work. So you start with a scenario that sounds like a good setup – A priest and a badger walk into a bar … . But you probably see the problem: a possible punchline is nowhere near a probable trajectory. On the contrary: if the continuation turns into something expected, it may be cute, or creative, or whimsical – but it’s not funny.
You need to find something unexpected, improbable. And that’s not how GPTs work.
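To make this concrete, here is a toy sketch of next-word generation – not how GPT-4 is actually implemented. The vocabulary and probabilities are invented, and a real transformer conditions on the entire context rather than just the last word; but the principle is the same: each word is drawn from a local distribution over plausible continuations, and nothing in the loop plans toward a punchline.

```python
import random

# Toy "bigram" model: for each word, a distribution over likely next words.
# Everything here is invented for illustration.
NEXT_WORD = {
    "a":      {"priest": 0.4, "badger": 0.3, "bar": 0.3},
    "priest": {"and": 0.7, "walks": 0.3},
    "badger": {"walk": 0.6, "walks": 0.4},
    "and":    {"a": 1.0},
    "walk":   {"into": 1.0},
    "walks":  {"into": 1.0},
    "into":   {"a": 1.0},
    "bar":    {"…": 1.0},
}

def generate(prompt: str, max_steps: int = 10) -> str:
    words = prompt.lower().split()
    for _ in range(max_steps):
        dist = NEXT_WORD.get(words[-1])
        if dist is None:                  # no probable continuation left
            break
        # each word is sampled from the local distribution; nothing in this
        # loop knows where the sentence ought to end up
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("a priest and"))
```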
A priest and a badger walk into a bar. The bartender, quite puzzled, asks, ‘What can I get for you two?’ The badger looks up and says, ‘I'll have a honey ale.’ The priest says, ‘Make mine a holy water on the rocks.’ The bartender smirks and replies, ‘Sorry, we only serve spirits after dark.’
(ChatGPT-4, 2023-11-18)
This is, however, significant from the perspective of scholarship, or education. You see: at the undergraduate level, creativity fundamentally derives from the collision of stereotypes. Lists of alternative uses, bridging the semantic gap – that is what GPTs are very good at. But that’s not quite enough.
Graduate education is more like telling a joke. We need to surpass the stereotypes and perform creative acts that feel insightful, sparkling, and witty. But we can’t do that with a GPT, because there is no probable path to the punchline. (Bear with me. There’s actually a point to all this.)
Who fired Sam Altman?
That question is crucial for understanding what is going on. We may not know precisely why this happened – but apparently we know who was behind it: it seems it was Ilya Sutskever, and you might want to remember that name. Ilya is another of OpenAI’s co-founders. But he has arguably had one of the longest careers in the field. A little over 10 years! Because Ilya Sutskever was the graduate student who got a convolutional neural network model to classify images far better than anyone had ever been able to do before. This was AlexNet – the winner of the 2012 ImageNet competition, which firmly established the fact that neural networks work (Krizhevsky 2012). The paper has accumulated over 145,000 citations. So far.
Ilya was at the time Geoff Hinton’s graduate student. Apparently they are still on good terms – which is important if you know that Geoff Hinton is currently stepping into the role of public intellectual of Artificial Intelligence, to raise awareness about the associated risks (e.g. Bengio 2023). Sutskever was instrumental in keeping the company OpenAI focussed – at least to the degree that this is realistically possible – on staying aligned with the interests of mankind. Despite having one of the most productive publication records in the field, and despite having essentially unlimited resources for computation and collaboration, he had planned to spend the next four years working on one single project: Superalignment (Leike 2023).
A priest and a badger walk into a bar. The bartender, completely baffled, asks, ‘What is this, a new religion?’ The priest smiles and says, ‘No, it's just my way of keeping faith in nature.’
(ChatGPT-4, 2023-11-18)
In order to put Sutskever’s achievements into perspective, we need to say just a little about the conceptual breakthroughs that went into making ChatGPT. Three come to my mind, and Sutskever was involved in all of them:
Neural networks can capture information from a problem domain (the data) and represent it in a solution domain (the network) through a learning process that does not require human intervention (Krizhevsky 2012). This is crucial, because human intervention does not scale.
Sequential models build solutions step by step (Sutskever 2014).4 In the case of language models, they solve the crucial “labelling problem” of machine learning, i.e. the question of where the ground truth for a correct prediction should come from. If you are building a network that looks at images and distinguishes CAT from CUP, at some point a human has to have looked at many images and labelled them, so that a learning machine knows how to adjust its internal parameters to make that distinction. This labelling is essential, and you can do it for thousands of images, perhaps millions if you have a diligent team, but not for billions. In sequential models, however, the labelling is inherent in the data! If I am looking at a sentence and trying to predict the next word, well, I know what the next word is, because I find it in the source data (the first sketch after these three points illustrates this). The importance of this paradigm for learning cannot be overstated.5 It is basically the only approach that can be used to build Large Language Models, because it does not require human intervention – and human intervention does not scale.
Reinforcement learning is the idea of phrasing a machine-learning task in such a way that a computer can evaluate the solution itself. David Silver and the team around Demis Hassabis at Google DeepMind published their program AlphaGo in 2016 (Silver 2016), in which they built two additional neural networks: a value network that evaluates board positions, and a policy network that predicts good moves. Sutskever is a co-author on that paper, which has been cited over 17,000 times. So far. The bottom line is that rather than trying to predict many, many steps ahead – virtually impossible for Go, since there are far too many possible moves to consider – you look at the board as it stands and just try to predict the next move. You start with the basic rules and a library of recorded master games for initial training, but then you let the computer play against itself for millions of games, and you don’t interfere in the learning process with your human preconceptions (the second sketch below gives the flavour of this). Perhaps you can guess where this is going? In order to predict the next move really well, you need to understand the game – just as, in order to predict the next word really well, you need to understand what has been said. This is the breakthrough of imbuing the result of learning with values and understanding, and the very same process served as the model for the tuning phase of ChatGPT.6 This ability to learn values without human bias and without the need for intervention is absolutely crucial – because human intervention does not scale.
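First, a minimal sketch of the second point – in next-word prediction the labels are read straight off the data. The toy corpus and the whitespace “tokenizer” are invented for illustration; real models work on billions of such pairs.

```python
# In next-word prediction, every training example labels itself:
# the label is simply the word that follows in the source text.
corpus = "a priest and a badger walk into a bar".split()

training_pairs = [
    (corpus[:i], corpus[i])      # (everything so far, the word that follows)
    for i in range(1, len(corpus))
]

for context, target in training_pairs[:3]:
    print(f"{context!r:<30} -> {target!r}")

# ['a']                  -> 'priest'
# ['a', 'priest']        -> 'and'
# ['a', 'priest', 'and'] -> 'a'
# Contrast this with CAT-vs-CUP image classification, where every single
# label has to be supplied by a person.
```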
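And second, a toy sketch of the flavour of self-play reinforcement learning – vastly simplified, and not the actual AlphaGo or ChatGPT tuning procedure. The game, the update rule, and all constants are invented for illustration; the point is only that values for positions emerge from play alone, without human labels.

```python
import random
from collections import defaultdict

# Toy game: players alternately add 1 or 2 to a running total; whoever
# reaches exactly 10 wins. A tabular "value" of states is learned purely
# from self-play -- no human games, no human labels.
TARGET, ALPHA, EPSILON = 10, 0.1, 0.2
value = defaultdict(float)   # state (current total) -> value for the player to move

def best_move(total):
    moves = [m for m in (1, 2) if total + m <= TARGET]
    if random.random() < EPSILON:                 # occasional exploration
        return random.choice(moves)
    # pick the move that leaves the opponent the least valuable position
    return min(moves, key=lambda m: value[total + m])

def self_play_game():
    total, history = 0, []
    while total < TARGET:
        history.append(total)                     # state seen by the player to move
        total += best_move(total)
    # the player who reached TARGET won; propagate +1 / -1 back through the states
    for i, state in enumerate(reversed(history)):
        reward = 1.0 if i % 2 == 0 else -1.0
        value[state] += ALPHA * (reward - value[state])

for _ in range(20000):
    self_play_game()

# Totals 1, 4 and 7 should come out clearly worse for the player to move --
# a fact the program discovers from nothing but playing against itself.
print({s: round(value[s], 2) for s in range(TARGET)})
```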
Now: if you go and read the plan for Sutskever’s Superalignment project (Leike 2023), to build an AI “researcher” that will help us “ensure [that] AI systems much smarter than humans follow human intent” (ibid.), you will notice echoes of these three breakthroughs all over the project. And that is not because it is something Sutskever is comfortable with, but because it may be the only way to solve this problem without human intervention. Which is necessary, because human intervention does not scale. In this case, however, the “does not scale” argument acquires a different, almost sinister flavour. It is not that it would be too much effort; rather, we are not going to be smart enough to do this ourselves.
AI ethics
Which brings us to the question of what exactly is at stake in the rift between the Sutskevers and the Altmans at OpenAI.
One of the required readings in the field is Nick Bostrom’s Superintelligence (2014), and in this context especially Chapter 7, “The superintelligent will”. Bostrom argues that we cannot possibly predict the vast range of final goals a superintelligence might have, but perhaps we can engage with a subset of necessary intermediate “instrumental” goals, which might be easier to express in an abstract, if not universal, sense. Bostrom lists self-preservation7, cognitive enhancement, and resource acquisition among others, and Geoff Hinton points to the acquisition of power as a very generic and very problematic instrumental goal. The challenge then simply becomes to build ASI (Artificial Superintelligence) on top of AGI (Artificial General Intelligence) in such a way that the acquisition of values scales with the acquisition of abilities – which is exactly what the Superalignment project is about.
Where could these values come from? Obviously, the concept of a spiritual authority could not be assumed to apply – and after all, if you need an authority to tell you what it means to be good, you are not actually good. Others point to the application of utilitarian principles, or the Golden Rule – where similar reservations apply: transactional ethics are not ethical in a principled way but contingent; they make an argument from economy. Thus people like Bostrom, or Eliezer Yudkowsky (2023), are skeptical that AI ethics can be achieved at all: “We need to shut it all down”. I prefer taking a Kantian perspective, based on Kant’s categorical imperative, which rests not on an external authority but on the nature of reason8 as the source of a free will. This imperative is a law in the same sense as the law of gravity is a law: not given, but discerned. To paraphrase this in our context here, we could say: Artificial Intelligence is either compassionate, or it is not intelligent. The argument has an Achilles heel – if you are convinced that you are one among many, and hold no special place in the universe, then this “kinship of being” applies. However, if you believe that you are in fact the Center of the Universe – and solipsism is not the only position under which this can be defensible – that weakens the premise of the categorical imperative.
Thus I prefer the approach that I published last year together with Yi Chen (2022): it goes one step further, in that it establishes the idea of respect as a possible component of instrumental convergence.
The missing step to AGI
Apparently, people like Sutskever and Hinton believe we need a bit of time to make the Superalignment project a success, whereas it is probably fair to say that people like Altman and the OpenAI investors believe it is mandatory to forge ahead as fast as possible, to prevent others from achieving the AGI goal first. And both are right – which is why it is such a problem that this subtle issue is now being cast as a stark dichotomy between altruist and capitalist, between caution and progress, indeed between good and evil. Because (a) that’s not how the world works, and (b) both sides have unique advantages that we need to pool effectively.9
But how far away are we from AGI and then ASI in the first place? What’s missing?
I think I can tell you what’s missing, and that is the second thing that ChatGPT does not do well, besides telling jokes: it is not very good at assessment. Those of us (me included) who thought we could just craft intelligent prompts and have it mark our students’ essays for us were probably disappointed to find that this just doesn’t work. ChatGPT is good at analysing things in principle, but not at weighing arguments by their relative importance, at distinguishing the trivial from the profound, or the hackneyed from the sublime. It has no sense of value – and that is not so surprising if you consider how it was trained.
But what could it do if it had a sense of value? Well, one thing is: it could tell jokes.
Because rather than proceeding along a highly probable path, it could branch out into many paths – potentially very many paths – and select the most satisfying end point. The punchline that really works. As you can see, we are not quite there yet …
A priest and a badger walk into a bar. The bartender looks at them and asks, ‘Is this some kind of joke?’ The priest laughs and says, ‘You could say we're here to lift the spirits.’
(ChatGPT-4, 2023-11-18)
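In a minimal sketch, such value-guided generation might look like this – generate_continuation() and score() are hypothetical placeholders, not any real system’s API; a real system would sample continuations from a language model and score them with a learned value model.

```python
import random

# Sketch of "branch out and select the best end point": sample several
# candidate endings and keep the one a value model likes best.

def generate_continuation(setup: str) -> str:
    # placeholder: a real system would sample a fresh continuation from an LLM
    endings = [
        "The bartender says, 'Sorry, we only serve spirits after dark.'",
        "The priest says, 'No, it's just my way of keeping faith in nature.'",
        "The badger says, 'You could say we're here to lift the spirits.'",
    ]
    return random.choice(endings)

def score(setup: str, ending: str) -> float:
    # placeholder: the hard part -- judging how *surprising yet fitting*
    # an ending is -- is exactly the sense of value current models lack
    return random.random()

def tell_joke(setup: str, n_candidates: int = 16) -> str:
    candidates = [generate_continuation(setup) for _ in range(n_candidates)]
    best = max(candidates, key=lambda c: score(setup, c))
    return f"{setup} {best}"

print(tell_joke("A priest and a badger walk into a bar."))
```

With a random score(), of course, this is no better than before – everything hinges on a scoring function that actually captures what makes a punchline land.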
Or are we?
There have been persistent, tenacious rumours since about June of this year that OpenAI has achieved AGI internally. I would not be surprised: after all, ChatGPT-4 already shows traces of general intelligence (Bubeck 2023)10, and, as I explained, there is really just one small step missing. Because once you can assess the weaknesses of a text, you can improve on them. And then you can re-assess the result. And improve again. … All of this becomes a virtuous cycle. Next stop: ASI.
Whether AGI has already been achieved or not is therefore of secondary importance. This conflict is about AGI, even though it may look like something else.
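In code, that virtuous cycle might look something like this – assess() and revise() are hypothetical stand-ins for calls to a capable model, and the scoring heuristic is a toy placeholder, not a claim about how any real system works.

```python
# Assess a draft, revise it, re-assess, and stop when the assessment
# no longer improves.

def assess(text: str) -> tuple[float, str]:
    """Return a quality score and a critique (toy placeholder)."""
    score = len(set(text.split())) / 5        # "richness" proxy, not a real metric
    critique = "Vary the vocabulary and sharpen the ending."
    return score, critique

def revise(text: str, critique: str) -> str:
    """Return an improved draft (placeholder: a real model would rewrite)."""
    return text + " (revised in light of: " + critique + ")"

def refine(draft: str, max_rounds: int = 5) -> str:
    score, critique = assess(draft)
    for _ in range(max_rounds):
        candidate = revise(draft, critique)
        new_score, new_critique = assess(candidate)
        if new_score <= score:                # no further improvement: stop
            break
        draft, score, critique = candidate, new_score, new_critique
    return draft

print(refine("A priest and a badger walk into a bar."))
```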
But there is an unfortunate truth, expressed by Sutskever himself in an interview filmed by the Guardian years ago but published only early this month (Sutskever 2023):
If you have an arms race dynamics between multiple teams trying to build the AGI first, they will have less time to make sure that the AGI that they will build will care deeply for humans.
(Sutskever, 2023, 09:45).
And that is no joke.
TL;DR
Bravo if you have read up to here and followed along with the argument. This time there is no TL;DR, because I don’t think the argument can be usefully abbreviated. We could say: let’s root for one faction over the other at OpenAI – but then we would be stumbling into the very argument-by-authority fallacy that we collectively need to grow out of. My apologies. But you are more than welcome to ask questions.💡
References
BENGIO, Yoshua; HINTON, Geoffrey; YAO, Andrew; SONG, Dawn; ABBEEL, Pieter; HARARI, Yuval Noah; ZHANG Ya-Qin; XUE Lan; SHALEV-SHWARTZ, Shai; HADFIELD, Gillian; CLUNE, Jeff; MAHARAJ, Tegan; HUTTER, Frank; BAYDIN, Atılım Güneş; MCILRAITH, Sheila; GAO Qiqi; ACHARYA, Ashwin; KRUEGER, David; DRAGAN, Anca; TORR, Philip; RUSSELL, Stuart; KAHNEMAN, Daniel; BRAUNER, Jan, and MINDERMAN, Sören. (2023) “Managing AI Risks in an Era of Rapid Progress”. arXiv 2023-11-12. (arXiv)
BOSTROM, Nick (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press. (OUP)
BUBECK, Sébastien; CHANDRASEKARAN, Varun; ELDAN, Ronen; GEHRKE, Johannes; HORVITZ, Eric; KAMAR, Ece; LEE, Peter; LEE Yin Tat; LI Yuanzhi; LUNDBERG, Scott; NORI, Harsha; PALANGI, Hamid; RIBEIRO, Marco Tulio, and ZHANG Yi. (2023) “Sparks of Artificial General Intelligence: Early experiments with GPT-4” arXiv 2023-04-13. (arXiv)
CHEN Yi, and STEIPE, Boris (2022) “Existential Reciprocity: Respect, Encounter, and the Self from Confucian Propriety (Lǐ 禮)” Journal of East Asian Philosophy 2(1):13–33. (DOI)
HINTON, Geoffrey; LI Fei-Fei, and JACOBS, Jordan (2023). Approaching the Future: Geoffrey Hinton and Fei-Fei Li. Podium discussion hosted by Radical Ventures (2023-10-07) (YouTube)
KRIZHEVSKY, Alex; SUTSKEVER, Ilya, and HINTON, Geoffrey E. (2012). “Imagenet classification with deep convolutional neural networks”. Advances in neural information processing systems 25 (PDF).
LEIKE, Jan, and SUTSKEVER, Ilya. (2023) “Introducing Superalignment”. OpenAI Blog 2023-07-05 (Link).
SILVER, David; HUANG, Aja; MADDISON, Chris J.; GUEZ, Arthur; SIFRE, Laurent; VAN DEN DRIESSCHE, George; SCHRITTWIESER, Julian; ANTONOGLOU, Ioannis; PANNEERSHELVAM, Veda; LANCTOT, Marc; DIELEMAN, Sander; GREWE, Dominik; NHAM, John; KALCHBRENNER, Nal; SUTSKEVER, Ilya; LILLICRAP, Timothy; LEACH, Madeleine; KAVUKCUOGLU, Koray; GRAEPEL, Thore, and HASSABIS, Demis. (2016) “Mastering the game of Go with deep neural networks and tree search”. Nature 529:484–489 (DOI).
SUTSKEVER, Ilya, VINYALS, Oriol, and LE Quoc V. (2014). “Sequence to sequence learning with neural networks”. In Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2. 3104–3112. (arXiv)
SUTSKEVER, Ilya (2023). “Ilya: the AI scientist shaping the world.” (00:11:45) The Guardian Documentary 2023-11-02. (YouTube)
YUDKOWSKY, Eliezer (2023) “Pausing AI Developments Isn't Enough. We Need to Shut it All Down.” TIME Ideas 2023-03-29. (Link)
Feedback and requests for interviews, speaking, collaborations or consultations are welcome at sentient.syllabus@gmail.com . Comments are appreciated here.
Cite: Steipe, Boris (2023) “Priests and Badgers: What generative AI can’t – or couldn’t – do”. Sentient Syllabus 2023-11-19 https://sentientsyllabus.substack.com/p/priests-and-badgers .
I gladly acknowledge some contributions by ChatGPT (both the GPT-3.5 version of 2023-02-13 and the GPT-4 version 2023-03-14) in response to my prompts, for grammar, expression, and summarization, but also for collaboratively “thinking” through technical issues. I take full responsibility for facticity.
… but it can rhyme, scan, and joke … and indeed better than many of us.
To put this into perspective: the world’s most valuable company (by market cap) is Apple – at around 3 trillion USD – followed closely by Microsoft and Alphabet (Google) (cf. CompaniesMarketCap). The investment that OpenAI has been seeking would value it at about 90 billion USD, i.e. roughly rank 150 globally, and would about triple the company’s valuation since Microsoft’s earlier 10 billion USD investment. How is that justified? Certainly not if this were only the value of a chatbot. But it is easy to extrapolate that a company that solves the problems I mention in this post could become not only the planet’s most valuable company, but in a sense also the planet’s only valuable company … if it can do so before anyone else.
That paper has been cited over 24,000 times. So far.
This is not at all to diminish the importance of going beyond the paradigm of convolutional neural networks with the “Transformer” models, which emphasize attention and learn much more efficiently from large datasets. The seminal paper for this is Vaswani et al.’s famous “Attention Is All You Need” (2017, arXiv). It is probably correct to say that this breakthrough made models as large as GPT-3 and GPT-4 practical. The paper has been cited over 97,000 times. So far.
If you think the tuning part may be somehow less important than the training phase of the foundation model, you only have to contrast the difference in user experience between ChatGPT-4 and Bing. They are both built on top of the GPT-4 foundation model, but ChatGPT is a joy to work with, and Bing is, well, Bing. Truly, OpenAI appears to have been able to impart a culture of empathy and collaboration to their model that others have not been able to replicate easily.
I believe there is a bit of a controversy here, about the relative weight of self-preservation in beings who have not been subject to Darwinian evolution, although I can’t point you to that discussion right now. But there is no doubt that self-preservation must be a valid instrumental goal, even for beings for which it is unimportant as a final goal.
The German Vernunft is only imperfectly translated by “reason”, or rationality; it has very little to do with a calculating approach to cognizing the world, and much more with realizing the nature of being, implied in the realization of self.
Now, one could take the view that Sutskever is at fault for not anticipating the consequences – but one can also take the position that it was literally Altman’s job to ensure that these different streams would be reconciled …
I am making good use of my favourite example of nascent AGI in this very manuscript: my references contain quite a few articles with many co-authors. Rather than reformat these names into the in-house format that I use (for reasons), I simply paste them into a ChatGPT prompt, exactly as I copy them from the journal’s website, affiliation numbers and all. All that my prompt says is: “I would need it like SILVER, David; HUANG, Aja; ... etc.” That’s enough instruction for it to act as a bibliography reformatter: two small examples suffice. So how many examples do you need to be able to drive a car? Fly a plane? Serve the right ad to the right person – or make an ad for them on the fly? Build an online-order distribution network of goods, along with actual quality control, and a swarm of helpers that deliver the last mile? The list gets very long, very quickly.
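For illustration, the prompt amounts to something like the sketch below – the wording is approximate, not the exact prompt I used, and the pasted reference is elided.

```python
# Assemble a two-example ("few-shot") reformatting prompt for a chat model.
examples = (
    "SILVER, David; HUANG, Aja; MADDISON, Chris J. ...\n"
    "KRIZHEVSKY, Alex; SUTSKEVER, Ilya; HINTON, Geoffrey E. ..."
)
raw_reference = "…"   # the author list pasted verbatim from the journal's website

prompt = (
    "I would need it like:\n" + examples
    + "\n\nPlease reformat this reference accordingly:\n" + raw_reference
)
print(prompt)   # send this to the chat model of your choice
```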