Figure (OpenAI) releases a status update on its humanoid robot

Summary

Figure is working on humanoid robots; the company is tied to OpenAI, Nvidia, and Microsoft through funding and partnerships.

 

A few days ago, Figure released a sharply produced status update showcasing a surprisingly natural, high-performance demo: speech-to-text, reasoning, text-to-speech, accurate descriptions of the world, and natural human-machine interaction, all fast, accurate, and precise.
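
To make the moving parts concrete, the interaction in the video looks like one loop of roughly the shape below. This is a minimal sketch of my reading of the demo; every function in it is a hypothetical stub I invented for illustration, not Figure's or OpenAI's actual API.

```python
# Rough sketch of the pipeline the demo appears to show. Every function
# here is a hypothetical stub made up for illustration; none of this is
# Figure's or OpenAI's actual API.

def transcribe(audio: bytes) -> str:
    # Stand-in for a speech-to-text model.
    return "hey Figure 01, what do you see right now?"

def vlm_reason(image: bytes, prompt: str) -> tuple[str, str | None]:
    # Stand-in for a multimodal model that looks at the scene and the
    # transcript, then returns a spoken reply and (optionally) a
    # high-level action for the robot's low-level policies to execute.
    reply = "I see a red apple on a plate in the center of the table."
    return reply, None

def synthesize_speech(text: str) -> bytes:
    # Stand-in for a text-to-speech model.
    return text.encode()

def step(audio: bytes, image: bytes) -> tuple[bytes, str | None]:
    """One conversational turn: listen, look, reason, reply, maybe act."""
    transcript = transcribe(audio)
    reply, action = vlm_reason(image, transcript)
    return synthesize_speech(reply), action

if __name__ == "__main__":
    speech, action = step(audio=b"<mic capture>", image=b"<camera frame>")
    print(speech.decode(), "| action:", action)
```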

 

Quotes

"Hey Figure 01, what do you see right now?"

"I see a red apple on a plate in the center of the table, a drying rack with cups and a plate, and you standing nearby with your hand on the table."

 

My thoughts

 

The previous demo from Figure, a humanoid interacting with a warehouse, was very believable; it's about what you'd expect from state-of-the-art humanoids.

The torso kitchen demo, by contrast, shows an impressive leap forward across the board. Several details make me believe the demo is staged to some extent:

  • The interface should describe the task the robot is about to execute before acting on it; it seems really optimistic to trust the inference to have gotten the task right one-shot. You really want the robot to announce that it's about to pick up a plate before one ends up on the floor (a sketch of this confirm-before-act idea follows the list).
  • At the timestamp, the text-to-speech inserted a very human stutter. It could be the speech layer adding emotional subtext to communicate its level of confidence, but that seems far-fetched, since none of the other responses had this feature. It might be an emergent behaviour of a multimodal GPT-5 trained with sound tokens and transcripts, but that would be a really big leap forward if true.
  • At the timestamp, the prompt involves multitasking; again, that's really optimistic and trusting. "While you pick up the trash" is a very ambiguous command to begin with, and it's optimistic to assume the model would choose the intended interpretation among the ambiguous ones. I would have expected at least a confirmation prompt: "I'm about to pick up the bags in front of me and move them to the basket to my left."
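
Here's a minimal sketch of the confirm-before-act pattern I'd expect, as mentioned in the first bullet. Everything in it is hypothetical: the Action type and the say/listen stubs are placeholders I made up, not any real robot API.

```python
# Sketch of a confirm-before-act wrapper. Hypothetical throughout:
# Action, say(), listen_yes_no(), and execute are placeholders, not a
# real robot API.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    verb: str         # e.g. "pick up"
    target: str       # e.g. "the bags in front of me"
    destination: str  # e.g. "the basket to my left"

def say(text: str) -> None:
    print(f"[robot] {text}")  # stand-in for text-to-speech output

def listen_yes_no() -> bool:
    # Stand-in for speech-to-text plus intent classification of the reply.
    return input("[human] ok? (y/n) ").strip().lower().startswith("y")

def confirm_then_execute(action: Action, execute: Callable[[Action], None]) -> bool:
    """Announce the planned action and run it only after confirmation.

    If the model mis-parsed an ambiguous command, the human hears the
    wrong plan *before* a plate ends up on the floor.
    """
    say(f"I'm about to {action.verb} {action.target} "
        f"and move them to {action.destination}.")
    if listen_yes_no():
        execute(action)
        return True
    say("Okay, cancelling that.")
    return False

if __name__ == "__main__":
    planned = Action("pick up", "the bags in front of me", "the basket to my left")
    confirm_then_execute(planned, execute=lambda a: say(f"done: {a.verb} {a.target}"))
```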

Now, it could be a showcase of an unreleased multimodal reasoning-optimized GPT-5, hooked up to a huge stack of H100s, that has gotten much better at multimodal reasoning; OpenAI was rumored to have a work in progress with much more impressive one-shot reasoning capabilities.

 

I think it's more likely that this is a cherry-picked demo, with at least some elements staged or not run in real time. Getting such a seamless interaction between the humanoid robot and the model is not trivial; other companies are working toward it by filling warehouses with industrial robots practicing picking things up, as DeepMind is doing.

 

Still, it's only a matter of time before this level of interaction becomes realistic, though it'll likely be extremely expensive.

 

Sources

https://www.figure.ai/

https://www.youtube.com/@figureai

 


If it's not able to run locally, I don't want it. Which, given it's being done by OpenAI, it won't be.

