Trying to fuzzify height with thresholds beyond which “tallness” or “shortness” are absolute has its own problems. Suppose that we look at the distribution of heights of adult male humans, conclude that only a minuscule proportion of men are below 5’, and therefore assign any man shorter than 5’ a “shortness value” of 1. But now suppose that, after we do that, it turns out that H. floresiensis didn’t go extinct after all, and there have been hobbits living among us all this time, and compared to them (average height 3’), even the 5’ men are “tall”. That’s exactly the sort of situation that fuzzy logic is supposed to be able to deal with: you have people who can, by one standard, be considered short, but by another standard, be considered tall. And it’s failing spectacularly.
Now, granted, finding a living population of hobbit humans is highly unlikely. We’ve extensively studied human heights, and know the ultimate limits very well. But there are plenty of other domains of knowledge that we don’t know so well, where fuzzy logic should be useful.
But humans don’t, in general, recognise stop signs by first recognising that the object is red, and is octagonal, and says “STOP” in the centre. That is how someone would describe a stop sign to someone else who has never seen one, but normally, if you see a stop sign, you just recognize it as such without going through any inference process involving shapes and colours and text.
It’s similar to how one can “sound out” words by looking at the letters in the word, knowing what each letter sounds like, and making that sound in sequence. But no one with more than the basic competence at reading actually does that. Instead they just recognize words or even sentence fragments without going through the process.
The process by which humans recognise things is just as inscrutable as the process the ML uses.
Chronos, you’ve fought my ignorance on many occasions.
But this seems a terrible attempt at finding a flaw. How can any algorithm, or any model even, deal with a sudden batch of entirely novel empirical data? They would all come unstuck at least at first.
Let’s say we don’t have a threshold under which we classify all heights as “short”.
So, instead, we say an adult male who is 5 feet tall is fuzzified as being 99.6% short and 1.2% tall, say.
How is this better? After we find the hobbits, if no change is made to the fuzzification function, we’re still going to classify almost every 5 foot tall man as “short”.
Yes, any “fuzzification function” will have the same problem. Which is a flaw with fuzzy logic in general. But it’s not a problem with all models. We could, instead, say that a 5’ tall man has a “tallness value” of 60, and a 3’ tall man has a “tallness value” of 36, and a 6’ tall man has a “tallness value” of 72, and a 20’ tall man has a “tallness value” of 240. Our tallness values then do not lie between 0 and 1, but so what? What’s the benefit to restricting them to that range? What benefit does fuzzy logic bring to the table?
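To make the contrast concrete, here’s a minimal sketch of the two representations being compared; the 5’/6’5” breakpoints of the fuzzy ramp are made up purely for illustration:

```
# Minimal sketch contrasting the two representations discussed above.
# The 60"/77" breakpoints of the fuzzy ramp are invented for illustration.

def fuzzy_tallness(height_in):
    """Typical fuzzy membership: ramps from 0 to 1, then saturates."""
    lo, hi = 60.0, 77.0          # below 5' -> 0.0 "tall", above 6'5" -> 1.0 "tall"
    return min(1.0, max(0.0, (height_in - lo) / (hi - lo)))

def linear_tallness(height_in):
    """Unbounded 'tallness value': just the height in inches."""
    return height_in

for h in (36, 60, 72, 240):      # hobbit, 5' man, 6' man, giant
    print(h, round(fuzzy_tallness(h), 2), linear_tallness(h))

# The saturated membership lumps the 5' man in with the hobbit (both 0.0),
# while the unbounded scale still distinguishes them -- which is exactly
# where the hobbit scenario bites.
```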
Good points all, but I’ll go a step further with these proposed additions to your text:
… normally, if you see a stop sign, at the level of consciousness you just recognize it as such without going through any conscious inference process involving shapes and colours and text.
It’s similar to how one can “sound out” words by looking at the letters in the word, knowing what each letter sounds like, and making that sound in sequence. But no one with more than the basic competence at reading actually does that consciously. Instead they just recognize words or even sentence fragments without going through the process at the conscious level of cognition.
The subconscious parts of the process by which humans recognise things are just as inscrutable as the process the ML uses.
As best we can tell, humans have an editor or quality-control function at the conscious level that is usually (not always) informed / alerted when the subconscious recognizer function(s) are ambiguous or contradictory enough. “Enough” being a highly vague idea that nicely ties us back to the thread topic: fuzzy logic.
Switching to specifics …
As to stop signs, you’re right that at the conscious level I’m not ticking the boxes of red, octagonal, etc. But there is research showing that perturbing those key features results in slower recognition, more errors, and greater checkpointing by the conscious level of cognition. Anecdotally:
I’ve lived in a Spanish-speaking country where the stop signs were identical to US signs except they said “ALTO”. At first, it seemed to me I was missing those signs or seeing (read “recognizing”) them later than I would otherwise.
It’s not uncommon for wealthy suburbs to have signage ordinances requiring subdued commercial signage. Which inside shopping center parking lots can result in pale tan or lettuce-green-toned octagonal stop signs mounted low to the ground to avoid cluttering the relaxing and oh-so-tasteful beauty of high-end American retail commerce. Those signs are hard to notice. But once noticed, they are reliably recognized = classified as stop signs. And once learning takes place, a human can adapt their recognizer to the new attributes of these signs.
To a certain degree (of uncertain amount) one can make up for the environment being extra confusing or non-standard versus one’s habits by actively concentrating harder using conscious techniques like deliberate scene scanning and self-talk. But, much like drunks trying to drive extra carefully, that only goes so far. And only comes into play after the person has had the conscious thoughts that a) I’m having a problem with sign recognition, and b) I’m going to bear down hard on my looking and “paying attention” (whatever that is) to counteract that.
ISTM overall the deep problem with AI today is that so far we’ve managed to duplicate the brain power of (total WAG) slugs and bugs. What’s lacking is the higher executive functions.
IOW … What needs to come out of a recognizer is not a result like “this is/is not a cat”. Instead what needs to come out is “I’m X% confident this is a cat and Y% confident this is not a cat”. Note that X+Y != 100%; instead it ranges from 0% to (approaching) 200%.
IMO there also needs to be something above that result in the “call stack” that can do something useful when X+Y is far from 100%. Right now, that is what seems to this interested rank-amateur observer to be lacking. With the result that we have “artificial stupidity”, not reliable AI.
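A minimal sketch of that two-sided confidence idea, not tied to any particular framework; the evidence scores, the sigmoid squashing, and the 0.3 tolerance are all invented for illustration:

```
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def recognize(evidence_cat, evidence_not_cat):
    """Two independently scored hypotheses (separate sigmoids, not one
    softmax), so the confidences need not sum to 100%."""
    x = sigmoid(evidence_cat)        # confidence "this is a cat"
    y = sigmoid(evidence_not_cat)    # confidence "this is not a cat"
    return x, y

def supervisor(x, y, tolerance=0.3):
    """The 'something above it in the call stack': escalate when the two
    confidences are jointly ambiguous or contradictory."""
    if abs((x + y) - 1.0) > tolerance:
        return "escalate: take another look / gather more data"
    return "accept: cat" if x > y else "accept: not a cat"

print(supervisor(*recognize(3.0, -3.0)))   # confident cat -> accept
print(supervisor(*recognize(-2.5, -2.5)))  # neither hypothesis fits -> escalate
print(supervisor(*recognize(2.5, 2.5)))    # both fit (contradiction) -> escalate
```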
It’s early days yet. I am optimistic for the future. But we have a long journey ahead of us, despite the PHB-driven hype blizzard that spins up every few years.
Firstly, as I’ve said, normalizing everything to 0…1, if indeed that’s the range to be used, is purely internal (unless we’re connecting multiple fuzzy systems).
There is no output to the user of 0.325 or whatever. The output is either a classification like “good candidate” or a specific instruction to a control system like “turn thermostat to 23.2 degrees”.
Secondly, as to why we do this: it’s because it can deliver useful answers in situations where we are handling multiple variables that all have an effect, but not in a binary way, such as in control systems and medical decision-making.
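To make that concrete, here’s a minimal sketch of the kind of fuzzy control loop I mean: the memberships live in 0…1 purely internally, and the only thing handed out is a crisp setting. The membership breakpoints and rule setpoints are invented for illustration:

```
def tri(x, a, b, c):
    """Triangular membership function on [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def thermostat_setting(room_temp):
    # Fuzzify: how "cold", "comfortable", "hot" is the room? (0..1, internal only)
    cold  = tri(room_temp, 10, 16, 21)
    comfy = tri(room_temp, 19, 22, 25)
    hot   = tri(room_temp, 23, 28, 35)

    # Rules: each fuzzy label argues for a crisp setpoint.
    rules = [(cold, 25.0), (comfy, 22.0), (hot, 19.0)]

    # Defuzzify: weighted average of the rule outputs -> one concrete number.
    total = sum(w for w, _ in rules) or 1.0
    return sum(w * out for w, out in rules) / total

print(f"turn thermostat to {thermostat_setting(20.5):.1f} degrees")
```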
I’m not here as an evangelist for fuzzy systems btw. AFAIAC it’s just one arrow in the quiver, and one that has lost a lot of its usefulness since deep learning. But your specific argument with the hobbits just. doesn’t. work. – any system would blow up in that hypothetical.
Because, thirdly, for a hypothetical like hobbits being discovered, the impact it would have on any given system depends on what the system does.
Let’s say we’ve made a fuzzy system that outputs a probability of a person having lower back problems in later life. Well…that fuzzy system might well handle crazy new data like hobbits being discovered better than some traditional systems because it would just treat them all as short humans.
In reality though, most real world systems of any type would need to be updated, whether they internally hold height as 0…1, or 36…72 or in units of wuzzle wazzles.
Whatever we mean by “fuzzy logic” (e.g., that truth values are open subsets of [0,1] rather than conforming to classical logic), philosophically this seems related to intuitionistic logic and modal logic, which are arguably useful. And, come to think of it, you don’t have to restrict your values to lie between 0 and 1; subsets of the entire real line would work just as well, for instance.
In my limited experience, machine learning does a horrible job with (what I call) feature extraction; that is, pulling out what is useful from what is noise. My machine learning work would be considered old school, with support vectors and likelihood estimates, using hand-produced feature algorithms. The expert decides what might be useful and the machine decides how to use it.
The latest fad seems to be to let the machine pick its own features, and simply throw more resources at it until it converges. But that’s both wrong and wasteful, in my opinion. Wrong, because an expert needs to sanity check the inputs to avoid false features. And wasteful, because an expert knows some tricks that can be taught to the machine (like doing edge detection and smoothness filters if those are useful, instead of forcing the machine to reinvent image processing ab initio).
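As a rough sketch of that old-school pipeline (the expert writes the feature extractors, the machine decides how to weight them), assuming scikit-learn and entirely made-up toy data and features:

```
import numpy as np
from sklearn.svm import SVC

def expert_features(image):
    """Hand-produced features: overall brightness, edge energy, smoothness."""
    gy, gx = np.gradient(image.astype(float))
    edge_energy = np.mean(np.hypot(gx, gy))     # crude edge detection
    smoothness = 1.0 / (1.0 + np.var(image))    # crude smoothness measure
    return [image.mean(), edge_energy, smoothness]

# Toy data: 20 random 8x8 "images" with made-up labels.
rng = np.random.default_rng(0)
images = rng.random((20, 8, 8))
labels = rng.integers(0, 2, size=20)

X = np.array([expert_features(img) for img in images])
clf = SVC(kernel="rbf").fit(X, labels)          # the machine decides how to use the features
print(clf.predict(X[:3]))
```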
In my experience, a technique is to take a fully trained DNN image-classification model, trained on a large dataset for some recognition task, strip off the top layer, and train a few extra layers on top of the pretrained model to recognize the thing we are actually looking for. The outputs from near the top of the pretrained model serve as abstract features upon which we can train a much simpler model (which requires less computational resources, and less data, to train).
The pretrained model serves as the “expert who knows tricks”.
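For illustration, a hedged sketch of that recipe in PyTorch, using torchvision’s ResNet-18 as a stand-in backbone (any pretrained model would do) and an arbitrary 10-class head:

```
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained layers: they act as fixed feature extractors.
for p in backbone.parameters():
    p.requires_grad = False

# "Strip off the top layer" and train a small new head for our own task.
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 128),
    nn.ReLU(),
    nn.Linear(128, 10),          # 10 classes in the new task (arbitrary example)
)

# Only the new head's parameters are optimized.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
```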
While no one has yet offered to share their Amazon billing statements with me, I’m guessing at least part of the blame is on cloud costs. Many years ago, we had just finished spending $15K in p3.16xlarge charges (about $27 per hour at the time). One of my employees said “I think Inception V3 might have worked better”, at which point the decision becomes do you spend another $15K and 3ish weeks of training, or do you call it good enough and move forward.
That one single statement caused me to move all of our work off of the cloud and onto several purpose built machines on premises instead.
I assure you that it’s not my cleverness; it was someone much cleverer than me who did that.
Another thing I’ve seen is training language models on Wikipedia or Reddit data before specializing to your (probably much smaller) dataset. Or even just using a pretrained word embedding is a similar concept.
It also follows the way humans learn. You don’t teach a baby particle physics by starting them with particle physics books. You start with shapes and colours, then the alphabet, then basic reading, then arithmetic, and only after the baby masters those do you start on non-abelian gauge theory.
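As a minimal sketch of the pretrained-word-embedding version of this idea: start the embedding layer from vectors learned on a huge corpus, then fine-tune (or freeze) it on your much smaller task dataset. The GloVe file name and the tiny vocabulary below are placeholders:

```
import numpy as np
import torch
import torch.nn as nn

vocab = ["the", "cat", "sat"]                   # your (small) task vocabulary

# GloVe-style text format: "word v1 v2 ... vN" per line (placeholder file name).
pretrained = {}
with open("glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        word, *vec = line.split()
        pretrained[word] = np.asarray(vec, dtype=np.float32)

dim = 50
matrix = np.stack([pretrained.get(w, np.zeros(dim, dtype=np.float32)) for w in vocab])

# Initialize the task model's embedding layer from the pretrained vectors.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(matrix), freeze=False)
```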
Except that was exactly the blind alley we went down before we started doing deep learning. People would spend all of this time gently teaching the machine how a human would solve the problem in the face of an absence of training data, and it turns out that, for many problems, the far more profitable use of human time is to figure out how to collect an effectively infinite amount of training data instead. Once the training data was there, the machines figured out how to solve the problem in a way alien to humans but far more effective.
It’s a fundamental shift in philosophy: do you take the data as given and spend your effort trying to improve the algorithm, or take the algorithm as given and figure out how to improve your data collection? Every non-trivial problem is a combination of both, but we’re slowly discovering that, for a surprisingly large class of problems, it tilts far more heavily towards the latter than the former.
I think you misunderstand what I am suggesting. My suggestion is that instead of making the machine learn how to do basic processing, we give the machine inputs that have useful processing already completed. There is no need to waste resources on the machine rederiving how to do wavelet transforms, FFTs, or any other complicated processing.
For example, if the human expert knows that using a particular wavelet transform is helpful, then pass the data through that wavelet transform (as well as the original data) to the machine. The machine can then decide how to use it.
It’s not about teaching the machine what to do, it’s giving it the same tools a human expert already has.
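Something like this, as a sketch (here with an FFT magnitude spectrum standing in for whatever transform the expert actually favours):

```
import numpy as np

def with_expert_features(signal):
    """Concatenate the raw signal with its FFT magnitude spectrum, so the
    model gets the transform for free instead of rediscovering Fourier
    analysis on its own."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.concatenate([signal, spectrum])

raw = np.sin(np.linspace(0, 20 * np.pi, 256)) + 0.1 * np.random.randn(256)
features = with_expert_features(raw)
print(raw.shape, "->", features.shape)   # (256,) -> (256 + 129,)
```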
Which includes “decide not to use it”. There’s generally little cost in throwing everything you have into the model and letting the model tell you that what you thought were useful features actually weren’t.
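One concrete way to let the model do that telling is permutation importance; a sketch with scikit-learn and made-up data, where only the first two of five candidate features actually matter:

```
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.random((200, 5))                   # 5 candidate features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # only the first two actually matter

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)             # the last three come out near zero
```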
I was never able to find any of the papers talking about the issues surrounding lab => real world. Then last night I saw this article. Note, I do not necessarily agree with everything in the article. Their opinions are their own, but I thought it may be of interest to some of you (also, it shows I wasn’t making it up, people really are talking about this issue ;P).