Last week I listened to Joe Rogan interview Steve Jobs. Among other things, they discussed ancient Hindu texts, Buddhism, LSD, and how Apple could beat Microsoft. Except they didn’t. The whole conversation, from start to finish, was AI generated. Every sentence, every laugh, was a computer extrapolating what they might have said to each other, and creating a surprisingly convincing simulacrum.
Do you see the frontiers open up? With the same technology, we can ask Arthur Schopenhauer for his thoughts on horror movies. Character.ai is working on this. The AI can digest everything he wrote on other topics and extrapolate what he might say about horror films, even though he never saw one.
When CERN publishes the next revolutionary paper on the fundamentals of matter, we could ask Einstein, Feynman, and Hawking to explain it to us. We may not have access to their minds, but their voices—and styles of thinking—can live on. The next time a child needs a math tutor, the lesson could be delivered by Kurt Gödel or Peppa Pig—whichever they prefer. And the spaghetti Bolognese recipe you’ve been cooking time and time again? Have it spiced up by Marie-Antoine Carême, the 19th-century celebrity chef credited with creating haute cuisine, the Malakoff, and French fries. All of this is now technologically trivial—it’s just a matter of training the same model on a different dataset and changing the UI.
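How trivial? Roughly this trivial: a minimal fine-tuning sketch with the Hugging Face transformers library, where the base model, the corpus file name, and the hyperparameters are all illustrative assumptions rather than anyone’s actual recipe.

```python
# A minimal sketch of the "same model, different dataset" step. The corpus
# file (a hypothetical collection of one person's writings) and all settings
# are placeholders, not a production recipe.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical corpus: everything the person ever published, one text file.
corpus = load_dataset("text", data_files={"train": "careme_collected_writings.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="persona-model", num_train_epochs=3),
    train_dataset=tokenized,
    # mlm=False makes the collator build labels for causal (next-token) LM
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```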
What’s developing here is an interesting twist on the age-old notion that writers, especially famous ones, achieve a degree of immortality. I might like my spaghetti spiced up by the chef of Marcus Aurelius, but—alas—he didn’t write a cookbook, so no immortality for him. Those lucky ones who left behind at least a short volume can start queuing up to return to life. I hope that my subscribers—those who are still alive—hear the nudge: start writing.
Or are we getting ahead of ourselves? There’s some danger here of losing sight of what’s “real” and what isn’t. A character.ai persona can be convincing and even informative, but it’s not bringing anyone back from the dead. We might one day study the sonnets of an AI Shakespeare, but we won’t know what Shakespeare’s ghost would think of them. You could feed everything I’ve written into a machine-learning model and ask it to write that book I’ve been having trouble starting—it won’t be the same book you’d get if I sat down and wrote it myself.
But the reconstructions are getting very convincing, and they’re not entirely made up. That simulated interview between Joe Rogan and Steve Jobs used real, public information to construct their conversation. A language model like GPT-3 can infer an abstraction of a person’s thinking and apply it to a new area. To take Schopenhauer as an example, the model would not be limited to collecting his actual recorded thoughts; it could infer his broad approach to ethics and apply it to the topic of horror movies.
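As a toy illustration of that extrapolation step (my own prompt framing, not the actual setup behind the simulated interview), here is how one might nudge a GPT-3-era completion endpoint toward Schopenhauer-on-horror:

```python
# A toy persona-extrapolation prompt against OpenAI's GPT-3 completion API
# (openai-python < 1.0, contemporary with this post). The framing below is
# an assumption of mine, not how character.ai or the Rogan-Jobs interview
# were actually built.
import openai

prompt = (
    "The following essay is written in the style and philosophical framework "
    "of Arthur Schopenhauer, applying his views on aesthetics, suffering, and "
    "the will to a subject he never encountered: horror movies.\n\nEssay:\n"
)

response = openai.Completion.create(
    model="text-davinci-002",  # a GPT-3 model; any capable LM would do
    prompt=prompt,
    max_tokens=400,
    temperature=0.8,
)
print(response.choices[0].text.strip())
```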
Where it gets tricky is that the simulation doesn’t tell you which parts you can trust and which parts are the algorithm’s guesswork. We can suspend our disbelief for an imagined podcast, but we should keep our skepticism handy when machine learning claims to (for example) predict crime. What we’re seeing isn’t so much a blurring of the line between image and reality as a blurring of the finer lines between “definitely,” “probably,” “maybe,” and “sure, whatever.” It’s coming whether we trust it or not.
This is also starting to play with our notion of time, which was always the implication behind the singularity. The past and the present, the actual and the potential, flatten into a single space with unconstrained flow. Linear progress across centuries loses its firm grip.
If reading this makes you a bit anxious, perhaps consider a quick therapy session. May I suggest ClareAndMe, an AI therapist? And while you’re waiting for an appointment (haha), let’s talk about what comes after image-generating AI.
The past few months will be remembered as the time when Midjourney and Stable Diffusion exploded into our feeds, and we all tried to make a few weird images by writing the new lingo: “very sophisticated, futuristic, miniature, paintbrush on canvas, hologram, artificial intelligence, isomorphic, meaning of life and the Universe.”
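If you missed that phase, trying the lingo takes only a few lines with the open-source diffusers library; the checkpoint and settings below are just one reasonable choice, not the only way to do it.

```python
# Minimal Stable Diffusion text-to-image sketch with Hugging Face diffusers.
# The prompt simply strings together the buzzwords from the paragraph above.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA GPU; drop float16 and .to("cuda") for CPU

prompt = ("very sophisticated, futuristic, miniature, paintbrush on canvas, "
          "hologram, artificial intelligence, isomorphic, "
          "meaning of life and the Universe")
image = pipe(prompt).images[0]
image.save("so_q2.png")
```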
But that’s so Q2. Now we’re onto the next phase. First, AI is starting to generate not only 2D images, but 3D models.
This has the potential to make game and metaverse development significantly cheaper, which means the democratization of game design and unprecedented growth in indie gaming. It could be hugely labor-saving. Like any new kind of mass production, it could also be homogenizing.
How creative can the algorithm really be, beyond remixing the corpus we fed it? Consider that Rogan-Jobs interview, again. A good interviewer can draw out new information and new insight from their subject—the simulated interview, on the other hand, doesn’t say much we don’t already know about Steve Jobs.
In the short term, a lot of artists are nervous. In the long term? Long live the long tail. We can hope the consequence will be more games, larger universes, and more in-game variability. That may be the first thing we imagine, but the whole category of “games” is a little too rigid. Instant 3D model generation will likely transform the nature of digital entertainment—games, VR, AR, metaverse, narratives, and films will find new hybrid forms.
The same democratization and explosive diversity will happen to film. Corporate shorts for education and training are the low-hanging fruit here. For gold-standard AI-assisted face animation, there’s MetaHuman.
While it’s incredibly impressive on its own, the animations still feel a tiny bit artificial. But as of a few weeks ago, we also know we can layer deepfakes on top of MetaHuman. And that’s something else again.
Still, that’s only characters. What will happen to documentary and feature films when it becomes possible to generate whole scenes on demand from text prompts? An explosion. Meta released exactly such a text-to-video model, Make-A-Video, just a week ago.
And so did Google. Their text-to-video AI is called Phenaki and—for now—we can see examples up to 2 minutes long.
If you think this still looks a bit lame, remember what image generation looked like last year. You can look ahead to where this is going.
Imagine you want to create a scene for a film. It consists of a background, characters, objects, and sound. Each of these will soon be AI-generated from text or image prompts, easily tweaked and tuned, carried over between scenes, and shared between creators. Or we could start with an existing scene from a movie and use AI to generate other, related scenes.
There, easy. See this video for a deep dive into the current state of the technology. (Actually, it’s already a year old.)
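To make the component idea concrete, here is one hypothetical way to model a scene as a bundle of prompt-driven assets. None of this is an existing tool’s API; it’s just the shape such a workflow could take.

```python
# A hypothetical data model for prompt-driven scene composition: every asset
# is just a prompt, so components can be tweaked, reused, and shared.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Asset:
    kind: str    # "background" | "character" | "object" | "sound"
    prompt: str  # the text the generator would render this asset from

@dataclass(frozen=True)
class Scene:
    assets: tuple[Asset, ...]

    def tweaked(self, kind: str, new_prompt: str) -> "Scene":
        """Re-prompt one kind of component, carrying everything else over."""
        return Scene(tuple(
            replace(a, prompt=new_prompt) if a.kind == kind else a
            for a in self.assets
        ))

# Scene one is generated from prompts; scene two keeps the character and
# sound, swapping only the backdrop.
scene1 = Scene((
    Asset("background", "rain-soaked neon alley at night"),
    Asset("character", "a weary detective in a trench coat"),
    Asset("sound", "distant thunder, dripping water"),
))
scene2 = scene1.tweaked("background", "the same alley at dawn, fog lifting")
```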
So, if your vision of the metaverse still revolves around awkward polygon avatars standing around a table pretending to be real, think again. What’s ahead is a fusion of gaming, VR, AR, metaverse, TV series, movies, documentaries, comics, podcasts, online courses, education, and social media. Not into one psychedelic rainbow blob, but into multiple forms that will be as different from today’s categories as podcasts are from the printing press. In one of those metaverses, you will be role-playing the ethics of post-territorial geopolitics with your favorite blend of Arthur Schopenhauer, Marcus Aurelius, and a hint of Aella.
No, this is not what 2025 will look like.
Going further, it reminds me of the app store of the future: https://zerohplovecraft.wordpress.com/app-store-of-the-future/ :)
It scares me: how do I tell the "virtual" world from the "real" one? And do I need to?