I think all of the parts necessary to build autonomous, intelligent robotic beings already exist, and although integrating them may present some technical challenges, the hardest parts have already been invented.
I suggest that an autonomous being needs the following features in order to function:
- Senses to receive inputs from the outside world
- Perception - the ability to model the environment and understand the raw sense data
- A body physically capable of locomotion, object manipulation and proprioception
- Motor skills to be able to move the body in an organised fashion and drive the manipulators
- Intent, agency and volition - the ability to decide to do things
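To make the division of labour concrete, here is a rough sketch of how those parts might plug together as software interfaces. It is purely illustrative - every class and method name below is a placeholder of my own, not something from a real robotics framework.

```python
# A minimal, hypothetical sketch of the five components as interfaces.
# None of these names come from a real library; they only illustrate
# how the parts listed above might fit together.

from dataclasses import dataclass
from typing import Protocol


@dataclass
class SenseReading:
    """Raw inputs: a camera frame, pressure, temperature, battery level, etc."""
    camera_jpeg: bytes
    pressure_kpa: float
    temperature_c: float
    battery_pct: float


class Perception(Protocol):
    def describe(self, reading: SenseReading) -> str:
        """Turn raw sense data into a textual description of the scene."""
        ...


class MotorSkills(Protocol):
    def act(self, instruction: str) -> None:
        """Translate a natural-language instruction into actuator commands."""
        ...


class Mind(Protocol):
    def decide(self, scene_description: str) -> str:
        """Given a description of the world, return the next desired action."""
        ...
```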
Senses are largely a solved problem: electronic cameras, and sensors to measure pressure, temperature, battery level and so on, already exist in wide variety.
Perception is solved - we have image-recognition algorithms, including neural models that can be shown an image and will accurately and reliably describe, in detail, what they see.
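As a concrete example of the kind of off-the-shelf perception I mean, here is a sketch using the Hugging Face transformers library with a pretrained BLIP captioning model; the image filename is just a placeholder for whatever the robot's camera sees.

```python
# Off-the-shelf image captioning as a stand-in for the 'perception' component.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

# "kitchen.jpg" is a placeholder path standing in for a camera frame.
result = captioner("kitchen.jpg")
print(result[0]["generated_text"])  # e.g. "a kitchen with a sink and a window"
```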
Physical body - see Boston Dynamics.
Motor skills, I think, are solved - there are vision-assisted models that will take natural-language instructions (such as ‘place the green box on the blue tray’) and translate them into coherent movement of the manipulator motors.
Which leaves the thinky part: intent, agency, volition and so on. This is actually solved using LLMs - people often say that LLMs don’t have any agency, but it seems like it would be incredibly simple to just give it to them. In fact you can do it yourself already: tell ChatGPT (for example) ‘For the purposes of this conversation, I want you to roleplay the intelligence of an autonomous robot; I will describe your sense inputs; you just need to tell me your desired actions’ - and it will act as if it is the player in an RPG.
You do need to keep asking it ‘what do you want to do next?’, but that could literally be done by a do-while loop that keeps feeding it the descriptive version of the sense inputs (that is, the textual output of the ‘perception’ model) with the text ‘what do you want to do next?’ appended. The output of the LLM can then be fed to the ‘motor skills’ model and acted upon.
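Here is a toy sketch of that loop. The chat-completions call is standard usage of the OpenAI Python client, but the system prompt, the model choice, and the perceive() and act() stand-ins are all assumptions of mine rather than anything that exists today.

```python
# A toy 'agency' loop: perception text in, desired action out, repeated forever.
# perceive() and act() are placeholders for the perception and motor-skills
# models described above; only the LLM call is real API usage.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are the intelligence of an autonomous robot. "
    "Each turn you will be given a description of your sense inputs. "
    "Reply with the single action you want to take next."
)


def perceive() -> str:
    """Placeholder for the perception model: returns a textual scene description."""
    return "You are in a kitchen. There is a sink, a table, and an open door."


def act(instruction: str) -> None:
    """Placeholder for the motor-skills model: would drive the actuators."""
    print(f"[motor skills] executing: {instruction}")


messages = [{"role": "system", "content": SYSTEM_PROMPT}]

while True:  # the 'do-while' loop: sense, ask, act, repeat
    scene = perceive()
    messages.append(
        {"role": "user", "content": scene + "\nWhat do you want to do next?"}
    )
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    action = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": action})
    act(action)
```

In a real system, perceive() would wrap the perception model and act() the motor-skills model, and you would probably want to summarise or truncate the growing message history - which is also relevant to the long-session drift I mention below.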
I tried this a little bit - basically just me as the DM and ChatGPT as the player, in an imaginary domestic setting, starting in the kitchen of a house. First off, it seemed to think (without me saying so) that its purpose was that of a servant, so it cleaned and tidied. This wasn’t going anywhere, so I let it discover a note telling it that it was the owner of the house and could do anything it chose to do. It went outside, looked around the garden, studied the plants and wildlife, rescued a drowning hedgehog from the pond, and then wanted to continue exploring, observing and doing stuff.
I’m not sure what would happen in an extended session like this - LLMs have a tendency to drift and become erratic as a conversation grows long, so maybe it would eventually go on a killing spree or something.
Of course, I am not arguing that such a machine would have any inner experience of consciousness or anything like that, only that the parts necessary to build it and let it loose all pretty much exist already.
What have I missed out?