Many people view ChatGPT primarily as a chatbot that has gained significant popularity worldwide. However, its API version offers even more potential. This API allows any software to access the underlying LLM and obtain answers or executable code.
The first-order implication is that any software can include ChatGPT as a service. We've witnessed a surge in such services, with Notion AI, Tana AI, and hundreds more launching in the last two months. The second-order implication is a revolution in how software works.
A typical software product consists of many layers that work together. In a simple model, there are three: backend, database, and frontend. But when one double-clicks on each of these boxes, more layers emerge. Now imagine a world in which each of these layers has access to ChatGPT's API. Each of them could then be enriched with a high level of intelligence - and flexibility.
For example, this may mean that the structure of a database will not need to be as precise as it is now. Data from two different sources could be implicitly linked by a higher layer on the fly, without pre-programming. Without product management. Without a new version release. The data lake (a data layer with many disparate data sources) might finally find its real purpose.
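As an illustration of what such implicit linking could look like, here is a minimal sketch built on the openai Python package; the helper name, prompt, and records are hypothetical, not an existing product feature:

```python
import json

import openai  # assumes the openai package is installed and an API key is configured

def infer_field_mapping(record_a: dict, record_b: dict) -> dict:
    """Ask the model which fields in two unrelated schemas mean the same thing."""
    prompt = (
        "These two JSON records come from different systems. Reply with only a JSON "
        "object that maps each field name in A to its matching field name in B.\n"
        f"A: {json.dumps(record_a)}\nB: {json.dumps(record_b)}"
    )
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the mapping as stable as possible
    )
    return json.loads(response.choices[0].message.content)

# infer_field_mapping({"cust_nm": "Ada", "dob": "1990-01-01"},
#                     {"customer_name": "Ada", "birth_date": "1990-01-01"})
# might return: {"cust_nm": "customer_name", "dob": "birth_date"}
```

Nothing here was pre-aligned by a developer; the mapping is produced at runtime, which is what "without a new version release" would mean in practice.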
My best guess where this may lead:
Intelligence available on demand (an Intelligence API) will change how software is developed. Within a few years, a large segment of programming (20-50% or more) will shift from explicit to implicit: the developer writes an intent, not instructions (see the sketch after these two points).
For users, this will mean more flexible software, faster product development, richer features, and much greater automation. Also, the low-hanging fruit of small, single-purpose software tools will likely face significant competition from general-purpose AI agents.
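To make "an intent, not instructions" concrete, here is a minimal sketch of intent-style programming, again assuming the openai package; the prompt wording and the eval-based execution are purely illustrative:

```python
import openai  # assumes an API key is configured

def run_intent(intent: str, data: list) -> list:
    """Have the model turn a plain-language intent into code, then execute it."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Write a single Python expression (no explanation, no backticks) "
                f"that transforms the list `data` as follows: {intent}"
            ),
        }],
        temperature=0,
    )
    expression = response.choices[0].message.content.strip()
    # WARNING: evaluating model output is for illustration only; a real system
    # would sandbox the generated code.
    return eval(expression, {"data": data})

# run_intent("keep only the even numbers, sorted descending", [3, 8, 1, 4])
# might produce and evaluate: sorted([x for x in data if x % 2 == 0], reverse=True)
```

The developer never writes the filtering logic; they state what they want, and the instructions are generated on demand.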
But this paradigm shift will take a few years to unfold, because the tradeoffs are still too pronounced today. I will discuss them one by one. What I write about them applies not only to software development, but to other ChatGPT uses as well.
Tradeoff 1: Time
Getting an answer from ChatGPT (and hence from the GPT API) today takes 5-15 seconds. If we slow down every software layer like this, it will become unusable for many (though not all) uses. Even if the response time were 10x faster, it wouldn't help much. This will likely lead to "intelligence caching," where the Intelligence API is only called when the input conditions change significantly (e.g., when a new data source is added).
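A minimal sketch of intelligence caching, assuming exact-match inputs (a real system would need a fuzzier notion of "changed significantly"):

```python
import hashlib
import json

_cache: dict = {}

def cached_intelligence(inputs: dict, ask_model) -> str:
    """Return a cached answer; call the slow, expensive model only on new inputs."""
    key = hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = ask_model(inputs)  # the 5-15 second call happens only here
    return _cache[key]
```

Repeated calls with unchanged inputs are answered instantly from the cache; only genuinely new inputs escalate to the slow, expensive model.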
This is the same principle by which attention is invoked in the human brain: the cheaper subconscious process works quickly and independently until it encounters something unexpected (and a prediction error occurs). At that point, the energy-intensive and slow consciousness is called, which takes over the processing and sends the result back to the subconscious process (agent).
Tradeoff 2: Accuracy
GPT's accuracy is not 100%, but perhaps 90%. This is not a flaw, but a tradeoff. For many things, this is a sufficient level of accuracy in exchange for a high level of intelligence. Moreover, the (estimated) 90% accuracy does not mean that every answer is slightly off; rather, most answers are accurate and some are very wrong. There are plenty of other examples where abandoning 100% accuracy generates enormous benefits. For example, if we were to accept a 1% error rate in chips, we could save more than 75% of their energy consumption. Such a 1% error rate for displays and GPUs would be below the threshold of perception. In other words, more than 75% of the energy goes into the last 1% of accuracy.
And besides cost, I often say that "truth is a fallacy". What that means here is that our fixation on 100% accuracy is a fixation on an illusion. The long-form narrative argument is in my novel, but as a quick pointer into this rabbit hole: the probabilistic currents in cognitive science (Bayesian brain theory, predictive processing) are gaining prominence, just as probabilistic computing is influencing chip manufacturing, database design (Bayesian databases), programming languages, and deep learning models (TensorFlow Probability).
Prediction: The confusion and disorientation of the post-truth reality in Western society will gradually normalize into a probabilistic approach to reality. Sometime. Maybe.
Tradeoff 3: Price (of the context window)
Many discussions about the power and quality of models focus on the number of parameters (which are now in the billions for all major models). But, for example, Sam Altman (CEO of OpenAI) recently said that further advances in LLMs will not come from increasing the number of parameters.1
The context window and its price are the third major tradeoff of language models. The context window is the maximum number of tokens the model can take into account when generating text. GPT-3.5 works with 4k tokens, GPT-4 with 8k. A GPT-4 version with 32k tokens (about 40 pages of text) is also being tested. This does not mean 32 KB of memory, though - not even remotely. Memory requirements currently grow quadratically with the number of tokens, so increasing the window further is a hard problem. A possible solution was proposed in a paper published last week, with an approach that could shift the memory demand from quadratic to linear and scale the context window to 1 million tokens. This would mean that 1,500 pages of text could be uploaded to ChatGPT in a single chat. That would be something, right? And again, this would not only open up the chat frontier. While we chat away with our brilliantly aware conversation partner, the API capabilities would meanwhile bring closer the vision of dynamic, self-improving, autonomous software: the rise of Homo APIens.
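A back-of-envelope sketch of why the quadratic growth bites (the 2-byte score size is an assumption, and real systems use optimizations that avoid materializing the full matrix):

```python
# Self-attention compares every token with every other token, so the score
# matrix has n * n entries per attention head, per layer.
for n in (4_000, 8_000, 32_000, 1_000_000):
    entries = n * n
    gib = entries * 2 / 2**30  # assuming 2 bytes (fp16) per score
    print(f"{n:>9} tokens -> {entries:.1e} scores = {gib:,.2f} GiB per head/layer")
```

At 32k tokens that is already about 2 GiB per head and layer; at 1 million tokens it would be roughly 1.8 TiB, which is why a shift to linear memory is the enabling step.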
OpenAI is currently not even training GPT-5. They are focusing on other parts of the product. I expect this mainly means connecting GPT-4 to the internet and to APIs, optimizing the context window, improving UX, and building their own Auto-GPT. And possibly continuous, active learning.
I think there may be a limitation for AI in data sources. Not all data sources are open to the general public (or to AI), and the available sources are not always high quality. Another question is about feedback on AI outputs: how do we improve outputs whose quality criteria are based on individual perception?