Using GPT for backend logic is 10x harder
(Screenshot from Categorcle, a toy app that uses GPT behind the scenes to train word-finding ability in aphasia patients, and creativity in everyone else!)
Over the past few weeks, we’ve been flooded with examples of GPT doing incredible things. It can write poetry. It can write code for you. It can build you a website. It can build AWS from scratch. It will write your social media posts. It will write all of your emails. It will go to your meetings instead of you.
GPT is a mind-boggling leap forward, and the technology will surely play a major role in the future. But nearly all of the hype-filled examples circulating recently are more akin to magic tricks than how-to guides. What I mean by this: it's a lot more eye-catching to claim that "ChatGPT can X for you in 5 mins" than to say, "I prompted ChatGPT over 167 times trying to get it to say what I wanted, in a domain I was already skilled at, and after several hours, it finally gave me something halfway decent".
I don’t mean to minimize the potential of this tool, but I want to stress that right now it’s just that: a tool. I’ve had the experience several times now of seeing an example of someone pulling a seemingly incredible output from ChatGPT. But what you don’t see is the entire path they took to get there — only the finished product. Contrary to what you see, GPT is hard to use.
First of all, you need to be able to specify exactly what you want. If your prompt is "Write me a poem", you will almost certainly not get what you want. Even when prompts are very specific, you still won't always get what you want, and the failures arrive in unpredictable ways.
Then there is the hallucination problem. When asking GPT about areas that involve specialized expertise you don't have yourself, you'll have to do your own fact-checking. And when you're asking it about areas you are an expert in, the hallucinations aren't as problematic; they're just annoying. Imagine searching Google for "How to write X in Python" and getting articles that recommend you install and use libraries that don't exist.
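The "libraries that don't exist" failure mode is at least easy to screen for mechanically. Here's a minimal sketch, assuming the suggested package names have already been pulled out of the model's response (the names below are illustrative, one real and one made up):

```python
import importlib.util

def package_exists(name: str) -> bool:
    """Return True if `name` is importable in the current environment."""
    return importlib.util.find_spec(name) is not None

# Hypothetical model suggestions: one real module, one hallucinated.
suggestions = ["json", "magic_nlp_toolkit_3000"]
for pkg in suggestions:
    status = "found" if package_exists(pkg) else "NOT FOUND (possible hallucination)"
    print(f"{pkg}: {status}")
```

Of course, this only catches packages missing from your environment; a hallucinated function on a real library still needs a human (or a test suite) to notice.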
Cherry Picking
Whenever I have personally tried to reproduce some amazing GPT or Stable Diffusion behavior I've seen online, I've found it significantly more involved than the posts let on. Even when they include the prompt, getting good results from Stable Diffusion involves a very high degree of chance. It's a lot more like photography than programming: you take 1,000 photos, edit 10–50, then pick one really good one.
Using GPT for backend logic is 10x harder
Most of the well-advertised use cases of the GPT API thus far have been chatbots. This is unsurprising, as it's the proven format for the technology. It's also the least creative route, and the one most likely to increase the value of your company.
The problems I mentioned above aren't that bad in a chatbot interface, because you have a human in the loop who can guide, learn, and correct errors.
In situations like this, where the goal of the application is to generate some end product (a thoughtful essay, a beautiful painting) that the user can evaluate, the user can operate the application in a feedback loop, guiding the AI toward what they want. The program is a tool, and the "errors" are simply prompts the user didn't write perfectly. These hurdles aren't an issue when you sit down to create something. As many have discovered, this process can be quite fun (although, in a way, it can feel similar to gambling).
But this is not the case for a system that uses GPT behind the scenes. If the prompt doesn't produce the right output, all you can tell the user is "oops, hopefully it works next time", with no real recourse. The user might be in the loop, but they can't correct a prompt that runs as backend logic.
Your error rates need to be significantly lower if you intend to make decisions on the fly, especially if you chain multiple dependent decisions in series. If you can build in some way for the user to correct or override the AI's logic, perhaps you won't need near-perfect accuracy. But in general, real-time backend logic is far less forgiving than the reactive loop of a chatbot interface.
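To make the "dependent decisions in series" point concrete: if each backend call is right some fraction of the time, the chance the whole chain is right decays geometrically with its length. A quick back-of-the-envelope calculation (the 95% per-step figure is illustrative, not a measured benchmark):

```python
def chain_accuracy(p: float, n: int) -> float:
    """End-to-end success rate of n independent decisions, each correct
    with probability p. Illustrative numbers, not benchmarks."""
    return p ** n

for n in (1, 3, 5, 10):
    print(f"{n} chained calls at 95% each -> {chain_accuracy(0.95, n):.1%} end-to-end")
```

At ten chained calls, a seemingly respectable 95% per-step accuracy leaves you wrong about four times out of ten, which is why a backend pipeline demands a far lower per-step error rate than a chatbot does.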
Creative software and personalization
The experiences you have with technology are going to get gradually less scripted and less predictable. They'll become more creative and even more personalized. I'm reminded that, in its early days, many saw the internet as a utopian way to connect people better than ever before. In some ways that did happen, but with it came an unforeseen amplification of tribalism and a sterilization of human interaction. With advanced AI, perhaps the technology will become so optimally personalized that we discover the works now popularly thought of as "creative" are actually formulaic; we just didn't yet perfectly understand the formula.