OpenAI GPTs: Affordances Along the Path to Ubiquity
Why designing for specific intents and contexts will make LLMs more useful
Another atypical newsletter this time, although standard Thoughts + Things recommendations should return soon! I hope you enjoy this nonetheless.
Editor’s note: Well, timing can be funny. I wrote most of this essay last week prior to the events of the last few days. Most notably, Sam Altman’s ousting as CEO of OpenAI by the board and the subsequent mayhem. We’re not sure what is to come, especially as the vast majority of OpenAI employees are now threatening to quit and theoretically follow Sam and Greg Brockman to Microsoft.
This essay is not about any of that. Rather, I thought some people might find my thoughts on OpenAI’s launch of GPTs at their developer day interesting. Time will tell how relevant any of this is given the rocky state of OpenAI, but I hope some of these ideas will inspire or challenge how you think about LLMs, their interfaces, design affordances, and the importance of specific tools. To the OpenAI team, I find your work inspiring and hope the progress announced recently can find a home in whatever you all end up working on. Godspeed to you -- onwards!
Also, I hope at least a few of you enjoy the option to listen to the post. My first time trying that. I’d recommend listening at >1x speed.
It's often easier to use a specific tool than a general tool. It helps to know precisely what something is used for. Our tools and goals shape each other: more specific goals make it easier to design the tools and vice versa.
(Images via Bret Victor)
Occasionally, tools can become so ubiquitous, flexible, or frictionless that a general tool outperforms a specific one. But this takes time and lots of iterations.
The smartphone is a good example of this evolution. Now, it’s obvious that one flexible device is better than a range of specific products ("an iPod, a phone, and an internet communicator... are you getting it?"). But it took quite a long time to get there and several failed attempts, such as General Magic, Palm, and a range of touchscreen phones. Even leading up to the iPhone's launch, it was obvious that an iPod and a Blackberry were each superior for their respective jobs.
This is often due to technical or design constraints. Making a product with the focused goal of creating the best music player is much easier than designing an all-purpose mobile computer that also plays music well. Generalized tools have competing design goals that can reduce their efficacy for a specific use case. Good design almost always comes from operating within some kind of constraints.
Specific tools can also win out initially for a different reason: they tell us what to do with them. By limiting our options, they make it easy for us to act. There is something wonderful about the simplest tools in the way they afford intuitive use for a given context. Who knows? Maybe we'll even see new kinds of single-purpose hardware again soon.
We're seeing an example of this dynamic play out now with OpenAI's ChatGPT: an exceptional general tool. Their recent announcement of custom GPTs is an exciting step toward addressing the challenges I've laid out above, using increased specificity to move us toward a panacea-like super tool. While some have already written these off as technically insignificant, I believe they're the foundation for a dramatic expansion in the usability of LLM products like ChatGPT. GPTs afford us more agency by narrowing our intent and context.
ChatGPT's Infinite Options
ChatGPT has taken the world by storm since it launched a year ago. It's a step toward the ultimate general tool: artificial general intelligence in your pocket. The launch had a profound impact by giving OpenAI’s premier language model, GPT-3.5, an intuitive interface for regular users. Importantly, the underlying technology was not new; it was the same LLM that had existed for several months. Rather, the design and chat-style implementation changed everything and made the world realize that a new paradigm of AI products had arrived. In doing so, it also turned OpenAI from a purely research and developer-focused company into a consumer product company (and one of the fastest-growing of all time). A year later, the experience can feel magical--especially with multi-modal functionality across text, voice, and vision.
I like to ask people how they're using it whenever the topic comes up. A recurring theme, even among those who work in technology, is that they think it's fascinating but don't have a great answer. I imagine it is a bit like the first time consumers got their hands on the Macintosh. In that case, its mouse and graphical user interface were a major design improvement for personal computers, but it would still take a range of new applications for many people to “get it.”
That’s not to discount how far we’ve already come with LLMs. A few obvious use cases are thriving: programming (whether ChatGPT or a more specific tool like Github Copilot), homework (whether doing the work outright or serving as a tutor), and now image generation with DALL-E 3 natively integrated. Perhaps it is gaining ground in search as well; I've found myself and friends using it in place of Google search in a range of cases, especially on mobile with voice. Several creative individuals are pushing the frontier of prompt engineering and exploring broader use cases.
Still, I've found it quite rare that someone thinks they're maximizing the potential of such a powerful and flexible tool. It's clearly possible--we’ve all seen a hundred Twitter threads assuring us that we've fallen completely behind if ChatGPT doesn't make all our decisions for us. But we mere mortals seem to be struggling. ChatGPT has an accessibility gap between theoretical capability and practical value for many users: anything is possible, but where do I start? The magic chat box awaits, yet the questions and commands don't come.
GPTs: Agency through Specificity
A major focus of OpenAI’s recent developer keynote was the announcement of GPTs: a new consumer modality for OpenAI's GPT-4 Turbo model. GPTs are fairly simple: they are instances of ChatGPT with "custom instructions, expanded knowledge, and/or actions."
Custom GPTs could have a major impact on proliferating the range of products and uses for LLMs like GPT-4. More importantly--and just like ChatGPT--this is not about a change in the underlying model, and perhaps not even so much about new functionality. Rather, it's about the framing of the tool to the end user and how that can induce more intuitive use and agency across a range of needs.
Even before developers can fully realize what’s possible with new functionality (instructions, knowledge, and actions), I suspect GPTs will produce different expectations and use of the tool. The chatbot interface places user input (questions or commands) at the center of the experience; until the user acts, no context for a specific intent can be built up. Thus, the way the product and interface shape a user's assumptions about how to use it in various contexts is critical.
To make this more explicit, consider a user trying identical instances of ChatGPT (GPT-4 Turbo model) with no custom instructions or functionality--just different names in the style of custom GPTs (laundry, math, etc.). Their relationship to the job at hand and the perceived capability of the assistant will be changed, and thus their prompts and questions will change accordingly.
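To make the thought experiment concrete, here's a minimal sketch of how two "different" GPTs can be little more than different names and system messages wrapped around the same underlying model. The model identifier, names, and prompts are illustrative assumptions, not OpenAI's actual GPT implementation:

```python
# Two "custom GPTs" that differ only in name and system message,
# wrapped around the same underlying model. This is a hypothetical
# sketch, not how OpenAI builds GPTs internally.

BASE_MODEL = "gpt-4-turbo"  # illustrative model identifier


def make_gpt(name: str, persona: str) -> dict:
    """Build a chat payload template for a named 'GPT'."""
    return {
        "name": name,
        "model": BASE_MODEL,
        "messages": [{"role": "system", "content": persona}],
    }


laundry_gpt = make_gpt("Laundry GPT", "You help users with laundry questions.")
math_gpt = make_gpt("Math Tutor", "You tutor users through math problems step by step.")

# Both share the exact same model; only the framing differs --
# which is enough to change what users ask and what they expect.
assert laundry_gpt["model"] == math_gpt["model"]
```

The point of the sketch is that the user-facing framing (name and persona) is doing the work here, not any change to the model itself.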
When a user chooses a custom GPT, they're far more likely to know how to start talking to the assistant. By scrolling a list of available or popular GPTs, they're looking at ChatGPT's brains spilled out, organized as a set of potential tools and available contexts to get help with. You can imagine most users' first experience with GPTs feeling something like, "Wow, I had never considered using it to help me [fill in the blank] (make a cocktail or understand memes)." GPTs can shape intent at scale and make the general tool more accessible by way of focus.
Even with the fairly recent addition of random "get started" prompts, we could still see the same effect. Custom GPTs should quickly produce better starting prompts before the user has done anything. They have more context implicitly because the user has opted into that GPT. In fact, it’s likely the more specific the GPT’s context, the better these starting prompts will be.
Undoubtedly, experimentation with custom instructions, knowledge, and actions will improve the GPT experience over time. As they proliferate, they'll help us understand various form factors for ways we might use LLMs. It's unlikely that the best cooking GPT will be the first one created, but seeing and using it is likely to prompt lots of remixing, especially when these can be created with natural language. Financial incentives will only accelerate exploration. The GPT store will launch later this month, but here’s an initial directory.
Building Toward Agents and the Universal Interface
One major factor limiting ChatGPT’s usefulness has been its inability to take action for you, especially with other services. It’s clear we’re headed toward commanding intelligent agents that can act on complex tasks for the user. ChatGPT Plugins, which enable these types of actions across various services, launched earlier this year to initial excitement and theoretical utility. Unfortunately, they didn’t change things too much, largely due to a range of friction points including discoverability and requiring pre-selection.
GPTs are a step forward here as they can natively include plugins (now known as custom actions). Ben Thompson (Stratechery) discussed why this is a UX improvement in his recap of the OpenAI keynote:
“I still think [plugins were] incredibly elegant, but there was just one problem: the user interface was terrible. You had to get a plugin from the “marketplace”, then pre-select it before you began a conversation, and only then would you get workable results after a too-long process where ChatGPT negotiated with the plugin provider in question on the answer.
This new model somewhat alleviates the problem: now, instead of having to select the correct plug-in (and thus restart your chat), you simply go directly to the GPT in question. In other words, if I want to create a poster, I don’t enable the Canva plugin in ChatGPT, I go to Canva GPT in the sidebar. Notice that this doesn’t actually solve the problem of needing to have selected the right tool; what it does do is make the choice more apparent to the user at a more appropriate stage in the process, and that’s no small thing.”
Ben goes on to argue that GPTs are not enough, however, and are simply a halfway point toward natively integrating all plugins into stock ChatGPT so it can opportunistically use them (just like browsing or DALL-E in the latest update). Put another way, drop the more specific tool and move straight to the most general:
“The best UI, though, is no UI at all, or rather, just one UI, by which I mean “Universal Interface”... ChatGPT will seamlessly switch between text generation, image generation, and web browsing, without the user needing to change context. What is necessary for the plug-in/GPT idea to ultimately take root is for the same capabilities to be extended broadly: if my conversation involved math, ChatGPT should know to use Wolfram|Alpha on its own, without me adding the plug-in or going to a specialized GPT.”
While Ben may be right in the long run, I believe this view still misses the problem the interface has with context, intention, and agency. I'd posit that even if we could accelerate things to this infinitely capable "Universal Interface" now, many users might still not know how to maximize the tool's capability. While we have a metaphor in mind for what ChatGPT in its ultimate, frictionless form should be (the perfect personal assistant), we have a lot of work to do to improve its intuitive usefulness. By focusing on specific areas, GPTs are one way to add more context today. There will surely be others: hardware devices with real-time audio and visual sensors, invisible software that sees what we see, personalized agents with continuous context windows and lots of user data, and so forth.
Many LLM use cases will use different interfaces than chat, of course. As Logan, OpenAI’s head of developer relations, points out, we are moving toward the development of complete agents that can simply and continuously act on our behalf. OpenAI also announced the Assistants API, which similarly unlocks more agent-like experiences across the applications we use. It's likely that many of the best "GPTs" will live inside other apps. Still, custom GPTs may be a breeding ground for new types of AI-native apps and interfaces. Many AI products may even begin as MVPs in the form of GPTs. We need end users and GPT developers to wade through the experimental period of creating specific instances of LLM-based tools. In doing so, we’ll better understand the range of intents and capabilities. It’s early, but even extreme specificity is a great place to start creating value.
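For a sense of what configuring one of these agent-like experiences looks like, here's a hedged sketch of assembling the parameters for the Assistants API. The parameter names follow OpenAI's Python SDK at the time of DevDay; the assistant's name, instructions, and model string are hypothetical examples:

```python
# A sketch of configuring an agent-like assistant. The helper below
# only assembles keyword arguments locally -- no API call is made --
# so the specific values are illustrative assumptions.

def assistant_params(name: str, instructions: str, tools: list) -> dict:
    """Assemble the keyword arguments for creating an assistant."""
    return {
        "name": name,
        "instructions": instructions,
        "tools": tools,  # e.g. code interpreter, retrieval, function calling
        "model": "gpt-4-1106-preview",
    }


params = assistant_params(
    name="Cooking Helper",
    instructions="Walk the user through recipes and suggest substitutions.",
    tools=[{"type": "retrieval"}],  # lets the assistant search uploaded files
)

# With credentials configured, you would then create it via the SDK:
#   from openai import OpenAI
#   assistant = OpenAI().beta.assistants.create(**params)
```

The instructions, knowledge (retrieval over uploaded files), and tools here map directly onto the "custom instructions, expanded knowledge, and/or actions" that define GPTs.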
Much thinking in this domain is focused on the view that the best and only interaction UX with an LLM should be with a single universal tool. Perhaps we'll get there someday, but such claims have been made for technology products and software for a long time (just wait until we get our universal social network that’s been discussed for years! At least there are attempts at the universal messaging app...).
The point is that when we use tools and products, intent and context matter. And for tools as infinitely flexible as LLMs, our agency and goals are defined as much by how the products afford expectations of their usability as by what is actually possible. Even the most dynamic and capable digital "objects should explain themselves."
Here are a few resources if you're interested in diving in. If you build anything and want feedback, please share it with me (my DMs on Twitter/X are open):
OpenAI DevDay, Opening Keynote
The OpenAI Keynote – Stratechery by Ben Thompson
A collection of early examples
AllGPTs - Find All GPTs for ChatGPT in one directory.
GPT Site Search
A GPT for searching GPTs
Thanks to Jonny Cohen, Dylan Eirinberg, Blake Robbins, Andrew Ettinger, Joe Albanese, Ethan Eirinberg, and Linda Dahl, whose feedback improved this essay.