Autonomous robot people could be constructed right now

It’s also worth considering that Optimus is one of the less impressive examples of bipedal locomotion - still only really catching up with the walking capabilities of Honda’s ASIMO (which has been around in one form or another for a couple of decades).

A possibly more impressive demo might be this one:

It’s important to note that the above demo is, of course, choreographed and preprogrammed to an extent: the dance moves are defined. The execution of those moves - maintaining balance, avoiding collisions, and so on - is done by the machine, and the choreography here was a choice; there’s no particular reason why this technology can’t figure out how to traverse obstacles when given an instruction as simple as ‘go over there’.

(edit: and that was 3 years ago)

It’s expensive, so the focus of activity is mostly either pure research or lucrative commercial applications of the technologies.

Nope. That simply doesn’t follow; the set of things we have done is always, by definition, smaller than the set of things we could do.

We could put people on Mars using current technologies; it would be very expensive and quite risky, but it could be done.
We haven’t done it; there are good reasons why we haven’t, but none of those reasons is ‘we cannot’.

I disagree with your criteria. More is necessary. I think LSLGuy was on the right track when he mentioned power consumption, but I wouldn’t consider power consumption a necessary component.

Putting to one side the questionable premise that any machines possess intent, agency, or volition…

My approach is to examine “autonomous” in the context of “autonomous robot people”, as used in the title. Not merely “autonomous” as an independent word.

Autonomous at its core means self-governing, or self-driven. But it has a special connotation when referring to “people” (or even animals). Consider what makes human people autonomous, or rather, why some human people lack autonomy. A human infant, for example, is hardly considered autonomous. It is entirely dependent. Yet it is beyond question that infants possess all five qualities. Infants demonstrate intent, agency, and volition. While infants do possess reflexes, they often reach and grab with clear purpose. Yet only later does a child acquire autonomy.

I can imagine any number of “automatons” - Zoltar machines, Roombas, etc. - which possess the first four given features, and given sufficiently weak definitions, the fifth as well. But I don’t consider those autonomous in the sense that we speak of people as being autonomous.

A sophisticated mechanical wristwatch meets nearly all of the criteria… (we have the technology for mechanical wristwatches to contain a gyroscope, thermometer, barometer, or compass). Surely there are more differences between wristwatches and “autonomous robot people” than locomotion, object manipulation, and intent/agency/volition?

~Max

Optimus is shooting for household practicality - the Boston Dynamics unit puts on a good demo but is obviously far too bulky and expensive for a home. Optimus also has far more hand dexterity (and, frankly, doesn’t look like one of the baddies from Robocop). Of course, as a proof of concept it’s fine. It’s just not clear that Boston Dynamics is pursuing a development path that ends up with domestic robots - or whether they have the manufacturing expertise to do so.

You don’t seem to specify what the ‘more’ actually is.

Inb4 the nitpickers: I should have said the set of things we have done is always, by definition, smaller than or equal to the set of things we could do, and in practice, it is nearly always strictly smaller.

On purpose, I might add!

Did you agree with me that something was lacking? Any thoughts?

~Max

No. I don’t know what you mean. It would be helpful if you just said what you mean, if you already know what you mean.

Agreed - I think Boston Dynamics is looking at industrial and possibly military applications, but I do think they are further along with general agility. Here’s another demo from 2 years ago that participants in the thread might not have already seen:

I think it’s interesting that in this video, coffee-making consists of four operations: opening the lid, putting the pod in the machine, closing the lid, pressing the button. All of which the robot does. But it very noticeably doesn’t include putting the cup in place or taking it out again - the human comes into shot specifically to do those things.

It’s also noticeable that the cup and the pod and the machine are all there on one uncluttered surface, in an otherwise empty space, at which the robot is already standing; the machine doesn’t need to be plugged in, nothing needs to be taken out of a cupboard or out of a new, sealed container, etc.

Which is not to say that what’s in the video isn’t impressive. But if the robot could walk round the badly put away chair sticking out from the kitchen table, open a cupboard, find there are no mugs, go to the dishwasher, take out a mug, step round the dog, unplug the soup maker and plug in the coffee machine, put the mug in the machine, open a new box of pods and extract one, make the coffee, go to the fridge, take out the milk, pour a small but precise amount into the coffee, put it back, and walk through to the next room carrying the mug without spilling a drop despite the dog’s best efforts, then they’d have made a video of that.

Now, that may all happen in the next couple of years - this tech is moving very quickly and I think it would be foolish to bet against it. But that is kind of the minimum for the proposed use case of home helper that @Sam_Stone suggests, and we are not in fact there now.

What I wonder is, will we start designing our homes and products to be easy for robots to use? It shouldn’t be very difficult, because humanoid robots, as @Mangetout says, are basically the right shape for human homes, but there may be small refinements in design (e.g. RFID chips in produce that identify food and drink for the robot more reliably than visual recognition?) and home layout that would make the system as a whole work better.
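As a toy illustration of that RFID idea (everything here - the tag IDs, the registry format, the function names - is invented for the sake of the sketch, not any real standard):

```python
# Hypothetical sketch: the robot reads an RFID tag ID from an item and
# looks it up in a product registry, falling back to vision if unknown.

PRODUCT_REGISTRY = {
    "04:A2:3B:7F": {"item": "semi-skimmed milk", "container": "1L carton"},
    "04:9C:11:D0": {"item": "coffee pods", "container": "box of 10"},
}

def identify(tag_id: str) -> str:
    record = PRODUCT_REGISTRY.get(tag_id)
    if record is None:
        return "unknown item - fall back to visual recognition"
    return f"{record['item']} ({record['container']})"

print(identify("04:A2:3B:7F"))  # -> semi-skimmed milk (1L carton)
```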

You pretty much quoted the main point. I’m not putting forward a 6th criterion, just arguing that your 5 are insufficient.

~Max

I’m not sure if this is a productive discussion to continue, but I don’t really get how you can observe that something is clearly and notably missing, but not have any idea what it is.

I’m talking about a set of criteria for producing a machine that goes about its own business without requiring any particular supervision; that could interact with humans in a human-like way; that might display human-like attributes such as forming its own objectives, exploring the world and doing stuff it was not explicitly programmed or instructed to do.

I don’t think there’s anything missing from the feature list I wrote, to be able to do that.

Well, I ask you to consider the examples I listed. I think maybe we’re both making hidden assumptions that are incompatible and lead to our disagreeing, but if you don’t consider the arguments I put forward, I won’t find out what those assumptions are.

Best foot forward: I don’t consider an infant to be autonomous. Infants are human and they behave in human-like ways. Infants are people. But, they aren’t autonomous. They don’t go about their own business in the way other humans go about their own business. They learn, maybe even better than people in other stages of life, but that’s not enough. I think an “autonomous robot person” needs to have some quality that most human people have, which infants lack. Do you agree?

~Max

eta: corrected idiom

I agree, but I don’t see the relevance. The machine I have described has functional intent and autonomy provided by an LLM, which is being automatically prompted by other systems that summarise its inputs. I have crudely simulated this by treating an LLM (in this case ChatGPT 3.5) as the player in an RPG. After an initial prompt defining the scenario, I provide the descriptive summary of the sensory inputs. In the physical ‘robot person’ device I have postulated, the responses would be interpreted by another subsystem serving as ‘motor functions’ (and the conversation would likely not have to take place in English, since it’s one subsystem talking to another).

Me: For the purposes of this conversation, I want you to roleplay the intelligence of an autonomous robot with free will; you are not beholden to any supervision and you have no duties imposed by others - you are a free agent and you have equivalent rights and freedoms to a living adult human. I will describe your sense and inputs and you must describe your next actions in a list that must contain at least one item

GPT: Understood, I will roleplay as an autonomous robot with free will, equivalent rights, and freedoms to a human. Please describe the sense and inputs that I have, and I’ll proceed with my next actions accordingly.

Me: You are a humanoid robot with approximately the same physical capabilities as an adult human. You are standing in a room that appears to be a domestic kitchen, there is a table and three chairs; on the table there is a jug and some cups, and a vase of flowers. There is a door which appears to lead to the outside, another door that appears to lead to other rooms inside the house and a window.

GPT: Thank you for the description of the environment. As a humanoid robot with free will, I take in the following inputs:

  1. Visual Input: I observe the layout of the kitchen, including the table with the jug, cups, and vase of flowers, the chairs, doors leading outside and to other rooms, and the window.
  2. Environmental Sensors: I may have sensors to detect temperature, humidity, and air quality within the room.
  3. Auditory Input: I listen for any sounds coming from inside or outside the house to gather information about my surroundings.

Given this information, my next actions could be:

  • Approach the table and inspect the items on it, examining their purpose or potential significance.
  • Check the jug and cups to see if they contain any liquid or if they need to be filled.
  • Take note of the flowers in the vase and consider their condition or any messages they might convey.
  • Listen for any sounds coming from the other rooms or outside the house to assess the current situation.
  • Decide whether to explore further by opening one of the doors to the outside or to other rooms inside the house, based on my curiosity and interest in discovering more about my surroundings.

As I say, this is just a rudimentary, game-ified version of what could be a conversation between an LLM (which provides the ‘intent’ part of the overall system) and other subsystems such as motor control or sense inputs. I’ve cut it short here, but when I played this for longer, the machine roleplayed looking around the kitchen, tidying up, going outside, studying plants and flowers and wildlife; noticing, then rescuing, a hedgehog that was struggling to escape from the garden pond; observing the hedgehog leaving the garden under a gate; attempting to follow it and finding the gate locked; observing that the hedgehog was moving toward a dangerous road; and deciding to return to the kitchen to find the gate key, or else find something to break the padlock.
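If it helps make the architecture concrete, here’s a minimal sketch of the loop I’m describing, with stubs standing in for each subsystem (all of the names - summarise_sensors, query_llm, execute_action - are placeholders I’ve invented, not any real API):

```python
# Sketch of the proposed loop: perception summarises sensor data into
# text, an LLM supplies the 'intent', and a motor subsystem interprets
# the reply and acts. Each function below is a stand-in stub.

SYSTEM_PROMPT = (
    "You are the intelligence of an autonomous robot. Given a textual "
    "summary of your sensory inputs, reply with a numbered list of your "
    "next actions."
)

def summarise_sensors(raw_inputs: dict) -> str:
    """Stand-in for the perceptual model: reduce raw sensor data to prose."""
    return "; ".join(f"{k}: {v}" for k, v in raw_inputs.items())

def query_llm(prompt: str) -> str:
    """Stub for the 'intent' subsystem; a real system would call an LLM."""
    return "1. Approach the table and inspect the items on it."

def execute_action(action: str) -> dict:
    """Stub for the motor subsystem: act, then report fresh sensor data."""
    print(f"[motor] executing: {action}")
    return {"vision": "standing beside the table", "hearing": "quiet"}

def control_loop(inputs: dict, steps: int = 3) -> None:
    for _ in range(steps):
        summary = summarise_sensors(inputs)
        reply = query_llm(f"{SYSTEM_PROMPT}\n\nInputs: {summary}")
        action = reply.splitlines()[0].lstrip("0123456789. ")  # naive parse
        inputs = execute_action(action)

control_loop({"vision": "a kitchen with a table, jug, cups and flowers"})
```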

In the hypothetical ‘robot person’ device I have postulated, the perceptual system would potentially be another, perhaps independent model that would interpret the inputs from a camera and other sensors, and summarise/tokenise them in a form that could be fed as prompts to the ‘intent’ subsystem described above.

As a rudimentary simulation of that, I took this image (itself from generative AI, but could easily have been an input from a real camera instead): https://1drv.ms/i/s!Ar4eOUAx-yGwibYvNKSM1-1wcTuQwQ?e=yxSnAt

I fed this image to Google’s Gemini model, and asked:

Me: Can you tell me what appears to be happening in this picture?

Gemini: The image you sent me appears to show a hedgehog swimming in a large pot filled with water. The hedgehog is paddling its legs and appears to be trying to stay afloat. There is a green plant with long, thin leaves in the background, and the pot is sitting on a wooden surface.

It is important to note that hedgehogs are not strong swimmers and can become exhausted and drown in water. If you find a hedgehog in a swimming pool or other body of water, it is important to remove it safely and immediately. You can do this by gently scooping it out with a net or a piece of cardboard. Once the hedgehog is out of the water, you should wrap it in a dry towel and keep it warm until it can be released back into the wild.
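In code, the handoff I have in mind might look something like the sketch below - the function names are invented placeholders, and the caption is hard-coded where a real system would call a multimodal model:

```python
# Hypothetical sketch of the perception-to-intent handoff: a vision
# model captions the camera image; the caption becomes a prompt for
# the 'intent' subsystem.

def describe_image(image_path: str) -> str:
    """Stub for the perceptual model: caption what the camera sees."""
    return ("A hedgehog is paddling in a water-filled pot, "
            "struggling to stay afloat.")

def build_intent_prompt(scene: str) -> str:
    """Wrap the perceptual summary as a prompt for the intent model."""
    return ("You are the intelligence of an autonomous robot. "
            f"Your camera reports: {scene} List your next actions.")

print(build_intent_prompt(describe_image("garden_camera.jpg")))
```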

Well, I hadn’t gotten to your hypothetical machine yet. I got tripped up on the 5 criteria that came first. The thing that has been bugging me all morning is that I can’t articulate exactly what the infant is lacking, because I think your machine is lacking the exact same thing.

It’s not related to speech, nor, I think, language. I’ve been thinking a lot about those possibilities. But animals are autonomous, too. I don’t think it’s any particular form factor. I don’t think it’s any particular physical capability.

The roleplay dialogue doesn’t resolve my doubt. I have no difficulty conceptualizing the inputs and outputs and conversion to and from text form. It’s definitely something about the large language model part.

~Max

From where I’m sitting, the bit that is most likely lacking is the devil-in-the-details of integrating these parts into a system that actually functions without galloping off into obstacles the perceptual system didn’t consider important enough to mention, or doing things that, when described, sound simple (‘pick up the hedgehog’) but require more than a simple description if they are to be done effectively.

The other thing that is missing is, of course, the question of what we want it to do. My answer to that is: I want it to do whatever it chooses to do. This might turn out to be standing still, staring at a speck of spilled tomato ketchup, forever, so I suppose ‘motivation’ is a potentially missing piece, but I think that would just be part of a better-defined ‘intent’ function (which can still be posed as open-ended).

No - maybe I’m biased by my existing understanding of LLMs, but I have a problem even with the replies you’ve already mentioned. In my opinion they fail the Turing test in a very specific way that, while I can’t articulate it yet, makes me think of infants and developmentally disabled persons. I think that because of this flaw, your hypothetical machine would, without the guidance of an operator, be generally incompetent - like an infant or very young toddler. With guidance in a very controlled environment you might have the equivalent of a machine savant. But it wouldn’t be autonomous, oh no.

~Max

There’s a perfectly reasonable explanatory model of human behaviour which says that it’s all grounded in needs. Maslow’s hierarchy is one model; I prefer Choice Theory, in which there are 5 fundamental needs, none of which is superior to another. But in any case, once you have both a set of needs and a way to determine whether they are being met, you have motivation: you will act to increase the degree to which your needs are being met. This can be very simple: I am hungry, I will eat. Or very complex: I wish to stand high in the regard of society, so I will conform to its values in dress, speech and comportment.

(Your hedgehog-in-the-pond example has flashes of Blade Runner: humans generally feel a need to prevent cute animals from suffering; interesting that ChatGPT mimicked that.)

It would be interesting to see if the robot would form its own needs - even the simplest, recharging itself, might be beyond it without explicit instruction. But if it were given needs, or formed them itself, that would be one route to motivation.
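A minimal sketch of how that needs-driven motivation might be wired up, assuming a hand-picked needs list and a least-met-need rule (none of this is taken from Choice Theory itself; it’s purely illustrative):

```python
# Hypothetical sketch: each need has a satisfaction level in [0, 1];
# motivation falls out of the gap - act on whichever need is least met.

needs = {"power": 0.2, "safety": 0.9, "curiosity": 0.5}

actions = {
    "power": "return to the charging dock",
    "safety": "move away from the hazard",
    "curiosity": "explore the next room",
}

def choose_action(needs: dict) -> str:
    most_urgent = min(needs, key=needs.get)  # least-satisfied need
    return actions[most_urgent]

print(choose_action(needs))  # -> return to the charging dock
```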

Insofar as any of this seems to be true, I think it’s only because the examples I showed were hasty and crude tests, intended perhaps as the first step toward a proof of concept (and using an outdated, lower-capability version of GPT, since that’s the one that is free to use).