A skeptic’s guide to humanoid-robot videos [MIT Tech Review]

August 27, 2024 Thomas Giboney

This story is from The Algorithm, our weekly newsletter on AI.

We are living in “humanoid summer” right now, if you didn’t know. Or at least it feels that way to Ken Goldberg, a roboticist extraordinaire who leads research in the field at the University of California, Berkeley, and has founded several robotics companies. Money is pouring into humanoid startups, including Figure AI, which raised $675 million earlier this year. Agility Robotics has moved past the pilot phase, launching what it’s calling the first fleet of humanoid robots at a Spanx factory in Georgia.

But what really makes it feel like humanoid summer is the videos. Seemingly every month brings a new moody, futuristic video featuring a humanoid staring intensely (or unnervingly) into the camera, jumping around, or sorting things into piles. Sometimes they even speak.

Such videos have heightened currency in robotics right now. As Goldberg says, you can’t just fire up a humanoid robot at home and play around with it the way you can with the latest release of ChatGPT. So for anyone hoping to ride the AI wave or demonstrate their progress—like a startup or an academic seeking lab funding—a good humanoid video is the best marketing tool available. “The imagery, visuals, and videos—they’ve played a big role,” he says.

But what do they show, exactly? I’ve watched dozens of them this year, and I confess I frequently oscillate between being impressed, scared, and bored. I wanted a more sophisticated eye to help me figure out the right questions to ask. Goldberg was happy to help.

Watch out for movie magic

First, some basics. The most important thing to know is whether a robot is being tele-operated by a human off screen rather than executing the tasks autonomously. Unfortunately, you can’t tell unless the company discloses it in the video, and companies don’t always do so.

The second issue is selection bias. How many takes were necessary to get that perfect shot? If a humanoid shows off an impressive ability to sort objects, but it took 200 tries to do the task successfully, that matters.
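To see why the number of takes matters, here is a rough back-of-the-envelope sketch in Python. The function names are made up for illustration, and the only figure taken from the text is the hypothetical 200 tries; nothing here describes any real robot. It assumes every take is independent and equally likely to succeed, which is a simplification.

```python
def observed_success_rate(successes: int, attempts: int) -> float:
    """Fraction of attempts that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts


def chance_of_at_least_one_clean_take(per_take_rate: float, takes: int) -> float:
    """Probability that at least one of `takes` independent attempts succeeds.

    Assumption: each take succeeds with the same probability,
    independently of the others.
    """
    if not 0.0 <= per_take_rate <= 1.0:
        raise ValueError("per_take_rate must be between 0 and 1")
    return 1.0 - (1.0 - per_take_rate) ** takes


# One usable shot out of 200 tries (the hypothetical from the text).
rate = observed_success_rate(1, 200)
print(f"per-take success rate: {rate:.1%}")

# Even at that low rate, filming 200 takes yields a clean shot more often than not.
print(f"chance of a clean take in 200 tries: "
      f"{chance_of_at_least_one_clean_take(rate, 200):.0%}")
```

Under those assumptions, a robot that succeeds only once in every 200 attempts will still usually produce one flawless clip if the crew keeps filming, which is why a single polished video says little about reliability.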

Lastly, is the video sped up? Oftentimes that can be totally reasonable if it’s skipping over things that don’t demonstrate much about the robot (“I don’t want to watch the paint dry,” Goldberg says). But if the video is sped up to intentionally hide something or make the robot seem more effective than it is, that’s worth flagging. All of these editing decisions should, ideally, be disclosed by the robotics company or lab.

Look at the hands

A trope I’ve noticed in humanoid videos is that they show off the robot’s hands by having the fingers curl gently into a fist. A robotic hand with that many usable joints is indeed more complex than the grippers shown on industrial robots, Goldberg says, but those humanoid hands may not be capable of what the videos sometimes suggest.

For example, humanoids are often shown holding a box while walking. The shot may suggest they’re using their hands the way humans would—placing their fingers underneath the box and lifting up. But often, Goldberg says, the robots are actually just squeezing the box horizontally, with the force coming from the shoulder. It still works, but not the way I’d imagined. Most videos don’t show the hands doing much at all—unsurprising, since hand dexterity requires enormously complicated engineering.

Evaluate the environment

The latest humanoid videos prove that robots are getting really good at walking and even running. “A robot that could outrun a human is probably right around the corner,” Goldberg says.

That said, it’s important to look out for what the environment is like for the robot in the video. Is there clutter or dust on the floor? Are there people getting in its way? Are there stairs, pieces of equipment, or slippery surfaces in its path? Probably not. The robots generally show off their (admittedly impressive) feats in pristine environments, not quite like the warehouses, factories, and other places where they will purportedly work alongside humans.

Watch out for empty boxes

Humanoids are sometimes not as strong as the videos of their physical feats can suggest; I was surprised to hear that many would struggle to hold even a hammer at arm’s length. They can carry more when they hold the weight close to the core, but their carrying capacity drops dramatically as their arms extend. Keep this in mind when you watch a robot move boxes from one belt to the other, since those boxes might be empty.
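The point about outstretched arms follows from basic lever arithmetic: the torque a shoulder joint must resist grows with the distance between the joint and the load. The sketch below is a simplified illustration, not a model of any actual humanoid. The 0.5 kg hammer, the 0.1 m and 0.6 m reaches, and the 10 N·m torque limit are all invented numbers, and the calculation ignores the weight of the arm itself.

```python
G = 9.81  # gravitational acceleration in m/s^2


def shoulder_torque(mass_kg: float, reach_m: float) -> float:
    """Static torque (N*m) at the shoulder from a load held horizontally
    at `reach_m` from the joint. Ignores the arm's own weight."""
    return mass_kg * G * reach_m


def max_payload(torque_limit_nm: float, reach_m: float) -> float:
    """Heaviest load (kg) the joint can hold at a given reach,
    for a hypothetical torque limit."""
    if reach_m <= 0:
        raise ValueError("reach_m must be positive")
    return torque_limit_nm / (G * reach_m)


# Hypothetical 0.5 kg hammer, held close to the body vs. at arm's length.
for reach in (0.1, 0.6):
    print(f"reach {reach} m: torque {shoulder_torque(0.5, reach):.2f} N*m")

# Hypothetical 10 N*m shoulder limit: payload shrinks as reach grows.
for reach in (0.1, 0.6):
    print(f"reach {reach} m: max payload {max_payload(10.0, reach):.1f} kg")
```

With these made-up figures, moving the load from 0.1 m to 0.6 m multiplies the required torque by six and cuts the payload the joint can hold to a sixth, which is the effect the text describes.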

There are countless other questions to ask amid the humanoid hype, not the least of which is how much these things might end up costing. But I hope this at least gives you some perspective as the robots become more prevalent in our world.

Now read the rest of The Algorithm

Deeper Learning

We finally have a definition for open-source AI

Open-source AI is everywhere right now. The problem is, no one agrees on what it actually is. Now we may finally have an answer. The Open Source Initiative (OSI), the self-appointed arbiters of what it means to be open source, has released a new definition, which it hopes will help lawmakers develop regulations to protect consumers from AI risks. Among other details, the definition says that an open-source AI system can be used for any purpose without permission, and researchers should be able to inspect its components and study how the system works. The definition requires transparency about what training data was used, but it does not require model makers to release such data in full.

Why this matters: The previous lack of an open-source standard presented a problem. Although we know that the decisions of OpenAI and Anthropic to keep their models, data sets, and algorithms secret make their AI closed source, some experts argue that Meta and Google’s freely accessible models, which are open to anyone to inspect and adapt, aren’t truly open source either, because licenses restrict what users can do with the models and because the training data sets aren’t made public. An agreed-upon definition could help. Read more from Rhiannon Williams and me here.

Bits and Bytes

How to fine-tune AI for prosperity

Artificial intelligence could put us on the path to a booming economic future, but getting there will take some serious course corrections. (MIT Technology Review)

A new system lets robots sense human touch without artificial skin

Even the most capable robots aren’t great at sensing human touch; you typically need a computer science degree or at least a tablet to interact with them effectively. That may change, thanks to robots that can now sense and interpret touch without being covered in high-tech artificial skin. (MIT Technology Review)

(Op-ed) AI could be a game changer for people with disabilities

It feels unappreciated (and underreported) that AI-based software can truly be an assistive technology, enabling people to do things they otherwise would be excluded from. (MIT Technology Review)

Our basic assumption—that photos capture reality—is about to go up in smoke

Creating realistic and believable fake photos is now trivially easy. We are not prepared for the implications. (The Verge)