As I understand it, sight, hearing, and (probably) touch are pretty linear sensations. Hearing is simply a matter of detecting fluctuating pressure. There’s more pressure, there’s less pressure, but it’s a single variable. Light waves could, effectively, be classified by two variables: wavelength and strength. Our eyes have opted to specialize receptors for certain ranges of wavelength, so that the variable count is expanded, but it wasn’t technically necessary for that to be done. Touch is basically, again, just pressure. Hearing and touch both have a “location” variable, I suppose, but it’s still pretty straightforward what signals are being sent to the brain.
Now, a molecule is a set of atoms that are bonded together into a particular shape. Swap out the atoms for different ones, but preserve the shape, and you have a different molecule. Use the same atoms but arrange them into a different shape, and you also have a different molecule.
Taste and smell are based on having receptors that react with a wide variety of different molecules. It’s sort of like those wood boards with holes cut out for square blocks, round blocks, and triangular blocks. If you try to put a block that doesn’t fit into one of the holes, well, it doesn’t fit, it doesn’t go in. Find the right shape and it will go through. In the case of our senses, the wood board (e.g., our tongue and olfactory system) has been shaped to have slots for specific molecules, and if a molecule floating by fits into a slot, then a signal will go to the brain. If it doesn’t fit, it will just bounce off and keep going past different slots which, again, may or may not fit the shape of the molecule (or of some branch off of it, like a thiol group).
In a sense, this is a simpler system than the other ones, since each slot is just a “present” or “not present” variable (a boolean, for programmers and mathematicians): either a molecule has slipped into the slot or it hasn’t. The other senses report a range of values. But there may be hundreds or thousands of different slots that support different types of molecule, and there are probably thousands or millions of duplicates of each type of receptor. A “stronger” signal is sent when more of these sensors are triggered at once for one particular type of molecule, so we can tell how large a component of the smell/taste that molecule is.
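For the programmers, here is a minimal sketch of that idea in Python. Everything in it is made up for illustration: the receptor names, the "shapes", and the number of copies per receptor type are assumptions, not real biology. Each receptor copy is a boolean (occupied or not), and the strength of a channel is just the fraction of its copies that are occupied.

```python
from collections import Counter

# Hypothetical receptor types and the molecular "shapes" each slot accepts.
# These names are illustrative only, not a real receptor catalog.
RECEPTOR_SHAPES = {
    "sweet": {"sugar_ring"},
    "sulfur": {"thiol_group"},
}
COPIES_PER_TYPE = 1000  # assumed number of duplicate receptors per type


def signal_strengths(sample):
    """Return, per receptor type, the fraction of its copies that are occupied."""
    counts = Counter(sample)
    strengths = {}
    for receptor, shapes in RECEPTOR_SHAPES.items():
        # Molecules whose shape fits this receptor's slot.
        fitting = sum(counts[shape] for shape in shapes)
        # Each copy is a boolean: it can hold at most one molecule.
        occupied = min(fitting, COPIES_PER_TYPE)
        strengths[receptor] = occupied / COPIES_PER_TYPE
    return strengths


# Molecules that fit no slot simply "bounce off" and contribute nothing.
sample = ["sugar_ring"] * 250 + ["thiol_group"] * 10 + ["unknown_shape"] * 500
print(signal_strengths(sample))
```

With this sample, the "sweet" channel comes out much stronger than the "sulfur" one, and the 500 molecules of unknown shape are ignored entirely.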
So, basically, each molecule that is supported is like its own sense, somewhat independent of every other type of molecule that can be detected. Or, you might think of it like our light sensors, where we have three (or four, if you count our low-light grayscale sensors), each receptive to a particular subset of all possibilities, except we have hundreds or thousands of different types of them instead of just three or four.
A machine would need to work in a somewhat similar way: it would have to interact with each type of molecule that we can detect, and interpret it in a similar way to how we are hardwired to. This means either having machinery that can image molecules and recognize which parts of them could interact with a human receptor, or having microscopic receptors of its own. And it means determining how humans are hardwired to interpret different molecules. Some of those are probably straightforward. Thiols are close to universally bad: nearly all humans are instinctively put off by smelling one. But the meaning of some molecules is probably left to experience, or is defined a bit randomly. Potentially, it varies by country and by what the local cuisine consists of. One molecule might be a marker of “delicious” in one cuisine and a marker of “spoiled” in another. We have to be taught by our upbringing how to interpret those molecules. And so you would need to isolate the molecule and do population studies by country to determine how a person would interpret the sensation, so that you could make a flavor or scent for people in that market.
And that’s assuming that we interpret each molecule independently of the others. As with light, we probably don’t: we don’t interpret each of the three colors independently of the others. They’re merged together, by our brain, into a single “color”. Likely, the sensations from our nose and tongue are merged together in the same way, to create a unified “flavor” that is a mixture of the subcomponents that make it up.
Making a machine that can recreate this, then, means not just isolating the individual molecules and performing market-specific research for each of them, but also mixing those molecules and testing those out, and using data mining to make determinations of what does and doesn’t “go together”.
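As a rough illustration of what that data mining might look like, here is a small Python sketch. It assumes you already have panel ratings for mixtures in one market; the ratings below are invented purely for the example, and averaging ratings per pair of molecules is just one simple approach I picked, not an established method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical panel data for one market: (molecules in the mixture, rating 1-5).
# The molecule names are real compounds, but every rating here is made up.
ratings = [
    ({"vanillin", "limonene"}, 5),
    ({"vanillin", "limonene", "ethyl_butyrate"}, 4),
    ({"vanillin", "methanethiol"}, 1),
    ({"limonene", "methanethiol"}, 2),
]


def pair_scores(ratings):
    """Average the rating of every mixture in which a given pair appears together."""
    totals = defaultdict(list)
    for molecules, score in ratings:
        # Sort so each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(molecules), 2):
            totals[pair].append(score)
    return {pair: sum(scores) / len(scores) for pair, scores in totals.items()}


scores = pair_scores(ratings)
# Print pairs from "goes together best" to "goes together worst".
for pair, average in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, average)
```

A real study would need far more data, and would have to look beyond pairs, since three or more molecules could combine in ways that no pair predicts.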
Overall, it’s going to take some effort. It’s a lot harder than the other senses.