AI’s growth needs the right interface [MIT Tech Review]

If you took a walk in Hayes Valley, San Francisco’s epicenter of AI froth, and asked the first dude-bro you saw wearing a puffer vest about the future of the interface, he’d probably say something about the movie Her, about chatty virtual assistants that will help you do everything from organizing your email to booking a trip to Coachella to sorting your text messages.

Nonsense. Setting aside that Her was about how technology manipulates us into a one-sided relationship, you’d have to be pudding-brained to believe that chatbots are the best way to use computers. The real opportunity is close, but it isn’t chatbots.

Instead, it’s computers built atop the visual interfaces we know, but which we can interact with more fluidly, through whatever combination of voice and touch is most natural. Crucially, this won’t just be a computer that we can use. It’ll also be a computer that empowers us to break and remake it, to whatever ends we want. 

Chatbots fail because they ignore a simple fact that’s sold 20 billion smartphones: For a computer to be useful, we need an easily absorbed mental model of both its capabilities and its limitations. The smartphone’s victory was built on the graphical user interface, which revolutionized how we use computers—and how many computers we use!—because it made it easy to understand what a computer could do. There was no mystery. In a blink, you saw the icons and, without quite realizing it, learned what the machine could and couldn’t do.

Today we take the GUI for granted. Meanwhile, chatbots can feel like magic, letting you say anything and get a reasonable-sounding response. But magic is also the power to mislead. Chatbots and open-ended conversational systems are doomed as general-purpose interfaces because while they may seem able to understand anything, they can’t actually do everything.

In that gap between anything and everything sits a teetering mound of misbegotten ideas and fatally hyped products.

“But dude, maybe a chatbot could help you book that flight to Coachella?” Sure. But could it switch your reservation when you have a problem? Could it ask you, in turn, which flight is best given your need to be back in Hayes Valley by Friday at 2? 

We take interactive features for granted because of the GUI’s genius. But with a chatbot, you can never know up front where its abilities begin and end. Yes, the list of things chatbots can do is growing every day. But how do you remember what does and doesn’t work, or what’s supposed to work soon? And how are you supposed to constantly update your mental model as those capabilities grow?

If you’ve ever used a digital assistant or smart speaker, you already know that mismatched expectations create products we’ll never use to their full potential. When you first tried one, you probably asked it to do whatever you could think of. Some things worked; most didn’t. So you eventually settled on asking for just the few things you could remember that always worked: timers and music. LLMs, when used as primary interfaces, re-create the trouble that arises when your mental model isn’t quite right. 

Chatbots have their uses and their users. But their usefulness is still capped because they are open-ended computer interfaces that challenge you to figure them out through trial and error. Instead, we need to combine the ease of natural-language input with machines that will simply show us what they are capable of.

For example, imagine if, instead of stumbling around trying to talk to the smart devices in your home like a doofus, you could simply look at something with your smart glasses (or whatever) and see a right-click for the real world, giving you a menu of what you can control in all the devices that increasingly surround us. It won’t be a voice that tells you what’s possible—it’ll be an old-fashioned computer screen, and an old-fashioned GUI, which you can operate with your voice or with your hands, or both in combination if you want.

But that’s still not the big opportunity! 

I think the future interface we want is made from computers and apps that work in ways similar to the phones and laptops we have now—but that we can remake to suit whatever uses we want. Compare this with the world we have now: If you don’t like your hotel app, you can’t make a new one. If you don’t want all the bloatware in your banking app, tough luck. We’re surrounded by apps that are nominally tools. But unlike any tool previously known to man, these are tools that serve only the purpose that someone else defined for them. Why shouldn’t we be able to not merely consume technology, like the gelatinous former Earthlings in Wall-E, but instead architect technology to suit our own ends?

That world seemed close in the 1970s, to Steve Wozniak and the Homebrew Computer Club. It seemed to approach again in the 1990s, with the World Wide Web. But today, the imbalance between people who own computers and people who remake them has never been greater. We, the heirs of the original tool-using primates, have been reduced from wielders of those tools to passive consumers of technology delivered in slick buttons we can use but never change. This runs against what it is to be Homo sapiens, a species defined by our love and instinct for repurposing tools to whatever ends we like.

Imagine if you didn’t have to accept the features some tech genius announced on a wave of hype. Imagine if, instead of downloading some app someone else built, you could describe the app you wanted and then make it with a computer’s help, by reassembling features from any other apps ever created. Comp sci geeks call this notion of recombining capabilities “composability.” I think the future is composability—but composability that anyone can command. 

This idea is already lurching to life. Notion—originally meant as enterprise software that let you collect and create various docs in one place—has exploded with Gen Z, because unlike most software, which serves only a narrow or rigid purpose, it allows you to make and share templates for how to do things of all kinds. You can manage your finances or build a kindergarten lesson plan in one place, with whatever tools you need. 

Now imagine if you could tell your phone what kinds of new templates you want. An LLM can already assemble all the things you need and draw the right interface for them. Want a how-to app about knitting? Sure. Or your own guide to New York City? Done. That computer will probably be using an LLM to assemble these apps. Great. That just means that you, as a normie, can inspect and tinker with the prompt powering the software you just created, like a mechanic looking under the hood.
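As a rough sketch of what that could mean in practice (the llm callable and the MicroApp shape below are hypothetical placeholders, not any real product’s API), the prompt that generated an app would simply travel with it, in plain view, ready to be edited and rerun:

```python
from dataclasses import dataclass

@dataclass
class MicroApp:
    name: str
    prompt: str   # the user-editable "source" behind the app
    spec: dict    # the interface description assembled from that prompt

def make_app(request: str, llm) -> MicroApp:
    """Turn a plain-language request into a small app whose prompt stays visible."""
    prompt = f"Assemble a simple app for: {request}. Return its UI spec as JSON."
    spec = llm(prompt)  # placeholder call: any model that returns a dict-shaped spec
    return MicroApp(name=request, prompt=prompt, spec=spec)

# Later, the owner can pop the hood, edit the prompt, and regenerate:
#   app.prompt += " Use large type. No ads."
#   app.spec = llm(app.prompt)
```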

One day, hopefully soon, we’ll look back on this sad and weird era, when our digital tools were both monolithic and ungovernable, as a blip in which technology conflicted with the human urge to constantly tinker with the world around us. And we’ll realize that the key to building a different relationship with technology was simply to give each of us power over how the interface of the future is designed.

Cliff Kuang is a user-experience designer and the author of User Friendly: How the Hidden Rules of Design Are Changing the Way We Live, Work, and Play.


