Can big data, ever-present sensors and AI replace the scientific method?

I’m not sure the training set would have to be that large. But one of the reasons image recognition is taking off these days is that you can just direct the system to find pictures labeled cat and dog on Facebook or Google or wherever. You no longer have to assemble and label the examples yourself.
One way you can do this is through clustering (strictly speaking, since the labels are known up front, this is supervised classification). You place the images in an n-dimensional space where n is the number of useful features. Four legs is not a useful feature here, since cats and dogs both have them; whiskers would be. You adjust the weights on these features until you can draw a line between images labeled cat and images labeled dog (with some small degree of mislabeling). Then you apply the result to a larger set to see whether your training worked. So success is defined by the labeling, plus some metrics of how much separation you get and how many mistakes you make.
So in a sense the system does inform you that it figured out the difference between cats and dogs - but not in a verbal way.
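To make the "adjust the weights until you can draw a line" idea concrete, here is a minimal sketch. It swaps in the classic perceptron as one simple way to find such a line; it is not what any particular image system actually uses. The two features (whisker_score, snout_length) and all the data are made up and deliberately cleanly separable, so the perceptron is guaranteed to find a line. Real image features number in the thousands and the classes overlap.

```python
import random

# Hypothetical features for each image: (whisker_score, snout_length), both in [0, 1].
# The numbers are synthetic, chosen so the two labeled groups can be split by a line.
def make_images(n_per_class, seed):
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(((rng.uniform(0.6, 1.0), rng.uniform(0.1, 0.5)), +1))  # labeled "cat"
        data.append(((rng.uniform(0.0, 0.4), rng.uniform(0.5, 0.9)), -1))  # labeled "dog"
    return data

def predict(w, b, x):
    # Which side of the line w[0]*x[0] + w[1]*x[1] + b = 0 the image falls on.
    return +1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1

def train_perceptron(data, max_epochs=1000):
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if predict(w, b, x) != y:
                # Misclassified: nudge the weights so the line moves toward the correct side.
                w[0] += y * x[0]
                w[1] += y * x[1]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with a clean split: stop adjusting
            break
    return w, b

def accuracy(w, b, data):
    return sum(predict(w, b, x) == y for x, y in data) / len(data)

training_set = make_images(100, seed=1)
larger_set = make_images(1000, seed=2)  # the "larger set" used to check the result
w, b = train_perceptron(training_set)
print(f"training accuracy: {accuracy(w, b, training_set):.3f}")
print(f"larger-set accuracy: {accuracy(w, b, larger_set):.3f}")
```

On separable data like this the training accuracy reaches 1.0; the accuracy on the larger set is the number that tells you whether the line generalizes.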

Expert systems were rule-based systems, which was my point. It isn’t really a black box - you can look inside to see the weightings and, for neural networks, what the network looks like. We can understand it at a high level, but maybe not the details - but I don’t think there is any one person who understands the full details of Windows 10 or how the Google algorithms work. I’m not sure I could say I understood the data mining system I wrote myself - it was built over a period of years and was way too complex to fit into my head.

This I agree with. One of my AI textbooks was from 1959, and AI was just around the corner even then. We are going to make AI a theme of our conference next year - it is really machine learning, but AI is a sexier term. That is why we hear so much about AI today, and why people confuse machine learning with real AI.

I would draw a distinction between things that aren’t intuitive and things that are incomprehensible. Quantum mechanics is unintuitive, but you can go do the math and get an answer. You can say why the particle tunnels through the potential barrier or whatever. If you have a 1,000-layer neural network, how the activation of neuron 4 in a given layer affects the output is so obscured that the network is basically a black box. There is no understanding of any underlying mechanism.
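A toy illustration of the black-box point, assuming a tiny randomly weighted network (4 layers, not 1,000) standing in for a trained one. Zeroing out one neuron ("ablating" it) and measuring how the output moves gives you a number for that particular input, but the number changes from input to input and says nothing about why. The names make_network and forward are made up for this sketch, and neuron indices are zero-based.

```python
import math
import random

def make_network(layer_sizes, seed=0):
    # Random weights stand in for a trained network; a real one would have been fit to data.
    rng = random.Random(seed)
    return [
        [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]

def forward(net, x, ablate=None):
    # ablate=(layer_index, neuron_index) forces that neuron's activation to zero.
    activations = x
    for layer_idx, layer in enumerate(net):
        activations = [math.tanh(sum(w * a for w, a in zip(row, activations))) for row in layer]
        if ablate is not None and ablate[0] == layer_idx:
            activations[ablate[1]] = 0.0
    return activations[0]  # the final layer has a single output neuron

net = make_network([3, 8, 8, 8, 1])  # 3 inputs, three hidden layers of 8, one output
rng = random.Random(42)
for _ in range(5):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    effect = forward(net, x) - forward(net, x, ablate=(1, 4))
    print(f"input {[round(v, 2) for v in x]}: neuron 4 in layer 1 contributes {effect:+.4f}")
```

Even at this size the per-input effects are just numbers; scale it to 1,000 layers and the tracing becomes hopeless.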

Now, hopefully people will develop tools that give us some understanding, but for now it’s very vague.

This won’t really be Big Data - there won’t be enough of it. It is tempting to use big data techniques on little data, but you get misleading results. I never had enough volume in the stuff I studied to say anything statistically interesting.
Searches are good for finding correlations between things in different fields, and there is a concept called the Semantic Web which might be of use. It is nothing new - lots of advances have been made when people in one field learn another and bring concepts from the first to the second.
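A small sketch of how big data techniques mislead on little data: scan many candidate features for the one most correlated with a target. Here every feature is pure random noise, so the true correlation is zero by construction; the 50 features and the sample sizes are arbitrary choices for illustration.

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def strongest_spurious_correlation(n_samples, n_features=50, seed=0):
    # Target and features are independent noise, so any correlation found is spurious.
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best

print(f"10 records:     best |r| = {strongest_spurious_correlation(10):.2f}")
print(f"10,000 records: best |r| = {strongest_spurious_correlation(10_000):.2f}")
```

With 10 records, the best of 50 noise features typically looks like a strong correlation; with 10,000 records the same search finds almost nothing.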

Bolding mine.

Not disputing in any way your post or your expertise. But the bolded part brings up an interesting conundrum. Perhaps one worth discussing.

How many of those cat and dog labels on all those pix were applied by humans and how many were applied by Google’s or Facebook’s AIs?

If orgs are training their AIs using input that was generated by other AIs, there’s some risk of a feedback loop developing that goes off the rails. Sorta like the dueling pricing bots on eBay working themselves up to $10 million for a pair of socks.
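The runaway-pricing loop is easy to reproduce with two made-up repricing rules. The 1.25 markup and 0.99 undercut below are hypothetical numbers, not the rules any real seller used; the point is that each bot reacts only to the other's output.

```python
def run_pricing_bots(start_price, markup=1.25, undercut=0.99,
                     ceiling=10_000_000, max_rounds=1_000):
    """Two hypothetical repricing bots that each set their price from the other's.

    Bot A prices itself `markup` times above bot B; bot B then prices itself
    `undercut` times bot A. Returns (round, price) once bot A passes `ceiling`,
    or None if that never happens within max_rounds.
    """
    price_b = start_price
    for round_no in range(1, max_rounds + 1):
        price_a = price_b * markup    # A: "I'm the premium seller"
        price_b = price_a * undercut  # B: "always be slightly cheaper than A"
        if price_a > ceiling:
            return round_no, price_a
    return None

# Neither rule is crazy by itself, but together they multiply the price by
# markup * undercut = 1.2375 every round, so a $10 item runs away.
print(run_pricing_bots(10.00))
# If the product of the two rules is below 1, the same loop settles down instead.
print(run_pricing_bots(10.00, markup=1.01, undercut=0.95))  # None
```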

Yes, humans are in the loop some. But if Facebook starts using Google’s image labels as learning inputs while Google is using Facebook’s, we might get some very “interesting” interpretations of what’s what.
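The same kind of loop applied to labels can be sketched under a deliberately crude assumption: each generation's labeler (alternating between, say, the Google side and the Facebook side) learns only from the previous generation's labels and copies them with an independent 5% error rate. Real models make correlated, systematic errors rather than random flips, so treat this as an illustration of compounding, not a prediction. The function name relabel_chain is made up for this sketch.

```python
import random

def relabel_chain(n_images=100_000, error_rate=0.05, generations=10, seed=0):
    """Hypothetical loop: each generation copies the previous generation's labels,
    flipping each one independently with probability error_rate."""
    rng = random.Random(seed)
    truth = [rng.random() < 0.5 for _ in range(n_images)]  # True = cat, False = dog
    labels = truth
    history = []
    for gen in range(1, generations + 1):
        labels = [(not lab) if rng.random() < error_rate else lab for lab in labels]
        accuracy = sum(lab == t for lab, t in zip(labels, truth)) / n_images
        # A label is still correct iff it has been flipped an even number of times.
        predicted = (1 + (1 - 2 * error_rate) ** gen) / 2
        history.append((gen, accuracy, predicted))
    return history

for gen, acc, pred in relabel_chain():
    print(f"generation {gen:2d}: accuracy vs. truth {acc:.3f} (formula {pred:.3f})")
```

Under that assumption accuracy drifts toward a coin flip, following (1 + (1 - 2ε)^k) / 2 after k generations, because nobody in the loop ever sees the ground truth that would pull it back.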

It might be fun to watch. Then again, if in some future world the police departments’ automatic face recognition stuff gets confused and starts recognizing toddlers as crazed gunmen we *might* have problems.

Ah, you are right. I made the common mistake of talking about “big data” as if it simply means “lots of data”.
But in fact it doesn’t refer to a specific size of data; it’s more about the degree to which we can process records, and how significant each individual record is.