Why do you keep misrepresenting things? The robot was told, “Clean up this mess”. Not, “Pick up the three bits of trash of known types and put the trash in the bins.” There is a huge difference. The robot has to know what a ‘mess’ is, which is ambiguous. It has to be able to tell that an object on the floor is ‘mess’ and not supposed to be there. And it has to figure out that to ‘clean up’ means to put it not just in any trash bin, but in the bin matching the material it found.
A big part of the value of including an LLM in the mix is that you can ‘program’ the robot in English. Saying ‘clean up this mess’ ultimately gets translated by the LLM into hundreds of actions, each of which probably requires many sub-commands, timing loops, feedback loops, object detections, whatever. The LLM handles all of it.
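To make that concrete, here’s a minimal sketch of the idea (entirely hypothetical: the skill names, the `query_llm` stand-in, and the JSON plan format are my own assumptions, not any vendor’s actual stack). The English instruction goes to the LLM, which replies with a plan built from lower-level skills that each hide their own control and feedback loops:

```python
import json

# Primitive skills assumed to exist on the robot; each one would hide its own
# sub-commands, timing loops, feedback loops and object detection. Stubbed
# here with prints so the sketch runs on its own.
SKILLS = {
    "detect_objects":    lambda scene: print(f"scanning {scene}"),
    "classify_material": lambda obj: print(f"classifying {obj}"),
    "pick_up":           lambda obj: print(f"picking up {obj}"),
    "place_in_bin":      lambda obj, bin_type: print(f"placing {obj} in {bin_type} bin"),
}

def query_llm(instruction, scene_description):
    """Stand-in for a real LLM call. In practice this would send the
    instruction plus what the cameras see, and ask the model to reply
    with a JSON plan built only from the allowed skills."""
    return json.dumps([
        {"skill": "detect_objects",    "args": {"scene": scene_description}},
        {"skill": "classify_material", "args": {"obj": "crumpled paper"}},
        {"skill": "pick_up",           "args": {"obj": "crumpled paper"}},
        {"skill": "place_in_bin",      "args": {"obj": "crumpled paper", "bin_type": "paper"}},
    ])

def execute(instruction, scene_description):
    plan = json.loads(query_llm(instruction, scene_description))
    for step in plan:
        SKILLS[step["skill"]](**step["args"])  # each call wraps its own feedback loops

execute("Clean up this mess", "living room floor, a few unidentified objects")
```

The point being that the English instruction is the ‘program’; putting paper in the paper bin falls out of the generated plan rather than anything hard-coded for that exact mess.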
There are other autonomy demos out there. I just picked the first one in the Google search. And a robot could handle walking down a busy street with ease today. Hell, Boston Dynamics robots can do that. They’ve been trialled as pack mules for the military, walking over new terrain all day long, dealing with obstacles, going around people, etc. Navigating unseen terrain has been a solved problem for years.
It should be noted that articles on Forbes are essentially blog posts; it’s more like Substack than an actual news outlet. Even by the poor standards of tech journalism, Forbes is a dumpster fire.
There was no need for a “close examination” of the video. It was stated at the time of the original video that it wasn’t autonomous, so there was no “trick”. It doesn’t require tech knowledge, just reading comprehension.
It’s irrelevant (for the purpose of this thread) whether the Tesla robot was acting autonomously in the demo - it’s only a question of whether or not it showcases any particular set of physical capabilities - I think it does pretty well on the dexterity; not so much on the ambulation.
Other organisations (not necessarily with any hardware to show) have other projects that would contribute to the construction of a (prototype) autonomous robotic ‘person’ of some kind. I suppose that’s part of an answer to the question: ‘if it’s possible, why doesn’t it already exist?’ - the best examples of each of the various parts needed to do it are not all in the same hands.
Agreed. It has the required level of dexterity for many household tasks. As for ambulation: it’s not bad as it is, but could stand improvement. I’m not sure whether software or hardware is the limit here. A more natural gait might require more sophisticated foot/ankle articulation. But walking can’t really be done with teleoperation, so it’s going to be limited by the software as well, which we can expect to improve. In any case, I think it’s mostly a development problem here, not so much a limit of the fundamental tech.
Teleoperation doesn’t necessarily have to mean puppetry of the whole (or even any part of it, all of the time) - a teleoperator of a humanoid robot might have controls to be able to puppet the hands and arms as required, but also some sort of macro functions like ‘pick up that object’ or ‘walk over there and sit down’, which could be delegated to onboard systems.
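Something like this, as a rough sketch (the `Command` type, the macro names and the stubs are invented for illustration, not any actual teleop stack): the operator’s console can either stream direct joint targets or fire a named macro that the onboard autonomy carries out by itself.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Command:
    kind: str      # "direct" (operator puppets joints) or "macro" (delegated)
    payload: Any   # joint targets, or (macro name, parameters)

def apply_joint_targets(targets):
    """Stand-in for low-level joint control driven by the operator's rig."""
    print(f"streaming joint targets: {targets}")

def run_macro(name, **params):
    """Stand-in for onboard autonomy: navigation, grasp planning, balance."""
    print(f"onboard system executing macro '{name}' with {params}")

def handle(cmd: Command):
    if cmd.kind == "direct":
        apply_joint_targets(cmd.payload)   # operator puppets hands/arms directly
    elif cmd.kind == "macro":
        name, params = cmd.payload
        run_macro(name, **params)          # whole behaviour delegated onboard

# Operator puppets the hands for a fiddly grasp...
handle(Command("direct", {"wrist_pitch": 0.3, "finger_curl": 0.8}))
# ...then hands off a whole-body behaviour to the robot itself.
handle(Command("macro", ("walk_to_and_sit", {"target": "armchair"})))
```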
Boston Dynamics appears to be ahead of Tesla on locomotion and environment handling, but not on dexterity. Other organisations have done some pretty impressive work on visual processing and object comprehension, but they’re only working with traditional robot arm manipulators; others are further ahead on decision making, etc. I feel fairly sure that all of the parts exist in different places to at least make a stab at creating a functional device that could do its own thing, within some obviously limited scope, but the holders of all these parts are competitors.
Boston Dynamics Atlas is a hydraulic robot. It’s designed for power and strength, not fine manipulation. There’s room for many types. Spot is fully electric.
Actually, there are a whole bunch, WITH hardware to show. In fact, I think there are even one or two available for sale. One of the humanoid robots can run as fast as a human. Another is full sized but only weighs 99 lbs.
Sure, but it has no hands. Each offering currently out there seems to be good in some ways and not so great in others, but the superset of best functions across all offerings would be quite remarkable - but you can’t pick and mix.
Yeah, those are pretty impressive demos, especially in view of this being learned, not explicitly programmed.
Cynics will probably not accept that this is the machine acting under its own control, or will focus on specific aspects of what it didn’t do (it didn’t plug in the coffee maker or handle the cup, etc.), or on other current shortcomings such as the delay in processing inputs or the speed of movement, but that’s to be expected. I remember having a discussion a while back with someone who, after seeing the capabilities of Honda’s ASIMO, was adamant that it was a suit with a human child inside (despite the fact that you can see daylight through the limb joints) and simply would not accept the reality of what was happening right there.
Given that I am the “cynic” who talked about all the things the robot doesn’t do in that video before, it’s a little alarming to find I’m being mentioned in the same breath as a guy who can’t accept reality.
But never mind plugging the machine in: it is glaringly obvious in that video that the people running the demo don’t expect or trust the robot to pick up the coffee mug, either to put it in the machine or take it out.
What the robot does do is genuinely impressive: what it does not do is significant.
Who here would trust this robot, as it is now, to make a cup of coffee and successfully hand the mug to an elderly person sufficiently frail that it makes sense for them to have help making a cup of coffee? If you do, you are significantly less cynical about it than the people who made the video, who clearly don’t trust the robot that much.
I do believe this will come. You can see the parts falling into place. But in terms of what we can do “right now” this falls short of meeting the use case.
The voice seems more natural than TTS - I would speculate that it might be voiced by a human who is reading out a realtime text response from the model that is operating the robot.
Sure - I don’t think you think I said the Honda robot wasn’t real.
But you also said a cynic would focus on specific aspects of what it didn’t do, like plugging in the coffee maker or handling the cup. Here’s what I said the first time the video was shared in this thread:
… I mean, it was probably unconscious, but from this end it did feel like a fairly specific reference.
I think it’s fair to say that there are, broadly speaking, two opposing views in this discussion, one that is focused on what they believe are possibilities (and in my own case, I believe those possibilities are pretty nearly ready to be joined up into a coherent whole); the other focused on what they perceive as currently-insurmountable obstacles. Is that fair so far?
As for the bit about cynics not accepting parts of various demos, I wasn’t thinking of you at all. I wasn’t thinking of anyone in particular, but scrolling up in this thread, you’ll find examples of posts arguing that a thing isn’t possible just because that specific thing wasn’t evident in one specific demo. That’s kind of the point of making this thread; it seems to me that all of the pieces of the puzzle exist separately, and they could be assembled together, but nobody has yet done that completely.
Of course I’m interpreting that from where I stand, and you are completely free to characterise my position as overly-optimistic or point out places where I am glossing over a currently-unsolved piece of the problem, or where I am joining dots that are not in fact joined or joinable.
So I largely agree with this - I think it will come and you can see all of the pieces. But I do think the assembly bit is a pretty chunky step. Integration of parts into a whole isn’t always seamless, there might be quite a big cost involved in doing it, and it will take longer than we think. (In part this is pure cynicism, I admit: “Everything takes longer and costs more than you think” is a heuristic that comes good depressingly often.)
But things like that coffee cup do give me pause. The researchers are clearly working on a very specific task involving putting pods into the coffee machine and have eliminated anything extraneous to that. I am not going to say that this is the wrong approach, they’re the experts. But it does imply that there are a lot of highly specified tasks that robots will have to be trained in before they can be put together into a seamless whole. Which in turn implies quite a lot of compute.
I can see from your example that an LLM probably can act as executive function and create tasks in response to stimuli. But it’s one thing to create a task saying “scoop hedgehog from pond” and another to have an accessible trained behaviour in scooping hedgehogs from ponds.
Does it need to be specific training? I’ve never been trained in scooping hedgehogs from ponds myself, but I have done some pond-scooping in my time and I’m confident I can rise to the challenge if needed. I am less confident that a robot trained to scoop weeds from a pond could apply that safely to a hedgehog - simply because the training videos I see are so highly specified.