k.zine


077. The main limitation of current AI interfaces

kzhai.substack.com


In service of the machines, we became like them

Kevin Zhai
Mar 18

In 2011, Bret Victor wrote a great essay, “A Brief Rant on the Future of Interaction Design,” criticizing our limited imagination when it came to creating the future of technology. Even a decade later, his criticism still stands: we have neglected the innate capabilities of our bodies in favor of poking at a glass screen with a single finger.

Microsoft concept video

We take for granted how finely our hands evolved to interact with the physical world. In the following photo, notice how much the position of the fingers varies and how differently each object is balanced in the hand.

credit: Bret Victor

Victor offers examples that sit at the utterly primitive end of the technological spectrum, yet still provide more tactile feedback than the latest iPhone:

Go ahead and pick up a book. Open it up to some page. Notice how you know where you are in the book by the distribution of weight in each hand, and the thickness of the page stacks between your fingers. Turn a page, and notice how you would know if you grabbed two pages together, by how they would slip apart when you rub them against each other.

Go ahead and pick up a glass of water. Take a sip. Notice how you know how much water is left, by how the weight shifts in response to you tipping it.

In comparison, the flat, glassy pane of an iPad has no connection whatsoever with the task it’s performing. Victor calls this paradigm “Pictures Under Glass”, and his stance is clear:

Pictures Under Glass is an interaction paradigm of permanent numbness. It’s a Novocaine drip to the wrist. It denies our hands what they do best. And yet, it’s the star player in every Vision Of The Future.

We may be spiraling towards a similar local maximum with AI and its star player: text.

Those who truly understand the promise of large language models, prompt engineering, and text as a universal interface are retraining themselves to think in a new way…

The most complicated reasoning programs in the world can be defined as a textual I/O stream to a leviathan living on some technology company’s servers.[1]

Text is trivial for computers to store and process, and it’s how programmers interface with computers. However, text takes concentrated mental effort for humans to parse. Which is easier for a human to read: a comic book or a dictionary? How about for a computer?

In contrast, video is expensive for computers to store and process, yet humans easily watch YouTube at double speed. We evolved to navigate three-dimensional spaces intuitively and to identify objects without conscious effort. We can easily recall the layout of our childhood homes, and our ability to recognize faces is almost excessive.

see: pareidolia

So how do we currently leverage these innate abilities with AI? That’s the neat part – we don’t! Instead, we are learning to deform our language to suit the machine. The most cutting-edge technology for creating art is wielded through noun snippets and hyphens on a platform designed for video game chatter. You have to wait in line just to get your results, and the feedback loop consists of choosing from a grid of eight homogeneous buttons.

Welcome to the future!

Compare that to the simple act of pressing a paintbrush to paper and adjusting the pressure as you watch the line it leaves behind, or even just picking the next color from a box of crayons, visually scanning for just the right hue.

In his essay, Victor defines a tool as something that addresses human needs by amplifying human capabilities. A great tool is designed to fit both the problem and the person.

credit: Bret Victor

So is text really the best fit for both sides? Are we really amplified as humans by contorting our thoughts into a little prompt box?

If the best way to predict the future is to invent it, then we still have some work to do.

To me, claiming that Pictures Under Glass is the future of interaction is like claiming that black-and-white is the future of photography. It's obviously a transitional technology. And the sooner we transition, the better.


[1] https://scale.com/blog/text-universal-interface

© 2023 Kevin Zhai