# An AI algorithm that tells you the How and Why of another algorithm

I’ve been working away at my algorithm that finds algorithms using artificial intelligence. It has been pretty successful at finding algorithms that describe mechanically how something works. But just now, perhaps about 2 minutes ago, my latest version not only expressed how something works algorithmically, but why it works that way. It is very cool.

Specifically, I created a sample algorithm that runs one process and then, after a fixed period of time, switches to a second process. My old algorithm could identify both processes, but it could not tell you why the switch happened; the new version can now say “Oh, and it switches after time X” (it doesn’t actually use natural language, of course). Although this example uses a single parameter, time, the algorithm works with more parameters, and the parameter(s) do not need to be time.
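As a toy illustration (not my actual test case — the two processes and the switch time here are just placeholders), the kind of target algorithm I mean looks something like this:

```python
def target(steps=10):
    """Toy target algorithm: run process A until a fixed switch
    time, then run process B instead."""
    switch_time = 5                  # the hidden "time X" to be inferred
    process_a = lambda t: t * 2      # placeholder first process
    process_b = lambda t: t + 100    # placeholder second process
    return [process_a(t) if t < switch_time else process_b(t)
            for t in range(steps)]

target()  # first five values come from A, the rest from B
```

The old version could recover A and B from the output; the new version also recovers the switch condition.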

It does have a few limitations. The biggest is that it requires perfect data. It is also limited to about five parameters, and no more than twelve processes in the target algorithm. Lastly, it is not guaranteed to find a solution if one exists (although it has not failed yet). However, I’m only using a single core of a CPU; if I switch this over to massive parallelism, I should be able to raise those limits quite a bit.


That’s an interesting idea, what’s your loss function?

And it wasn’t clear from your post, but how was it explaining the “why”?

The loss function is straightforward: it is a count of the differences between the expected output and the output of the candidate solution.
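In Python terms, a minimal version of that loss (a sketch, assuming list-like outputs of equal length) would be:

```python
def loss(expected, candidate):
    """Count the positions where the candidate solution's output
    disagrees with the expected output (a Hamming-style distance)."""
    return sum(e != c for e, c in zip(expected, candidate))

loss([1, 2, 3, 4], [1, 9, 3, 0])  # two mismatches -> loss of 2
```

A perfect candidate scores zero.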

As for the second question, suppose you have some mechanism A that outputs B (A -> B). If A always outputs B, then it is deterministic, and if you can infer A -> B, you have, in a sense, determined how something functions: mechanism A occurs and produces a B (suppose the expected output is only a B).

But suppose that A produces B, but also sometimes produces C.

Well, if you can infer that A sometimes produces B and sometimes produces C, then again, in a sense, you’ve inferred the mechanisms of the underlying algorithm. What you would not know is why the output of A varies.

Now, the variation could be stochastic, and I solved that case a few months ago. But suppose A has some associated parameters, A(x, y, z), and again sometimes produces B and sometimes produces C. If the output is not stochastic but actually determined by the parameters, it would be nice to know which condition on those parameters controls the variation.
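As a hypothetical sketch of what “finding the condition” means (a brute-force threshold search, purely for illustration — my actual method is more involved), given observations of A(x, y, z) and its outputs:

```python
from itertools import product

def find_condition(samples):
    """Brute-force search for a threshold rule 'parameter i < theta'
    that explains when A(x, y, z) produced B rather than C.
    `samples` is a list of ((x, y, z), output) pairs, output 'B' or 'C'."""
    n_params = len(samples[0][0])
    thetas = sorted({p[i] for p, _ in samples for i in range(n_params)})
    for i, theta in product(range(n_params), thetas):
        rule = lambda params: 'B' if params[i] < theta else 'C'
        if all(rule(p) == out for p, out in samples):
            return i, theta      # output is B exactly when parameter i < theta
    return None                  # no single-threshold rule fits

# toy usage: here the second parameter (y) secretly controls B vs. C
samples = [((1, 2, 3), 'B'), ((1, 7, 3), 'C'),
           ((0, 4, 9), 'B'), ((2, 6, 0), 'C')]
find_condition(samples)
```

On this toy data it recovers “B exactly when the second parameter is below the threshold”, which is the shape of answer I was after.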

So that’s what I solved today.

(Keep in mind that all the examples above are trivial and could be solved just by looking at them; my algorithm infers the underlying algorithm of a process under much more complex circumstances, but explaining all that would require … well, probably a paper’s worth of space.)

Awesome thread. I’ve been trying to learn about AI for some time now. I’d like to try doing some training myself but I’m not sure where to start. I downloaded Python but haven’t opened it at all because I have no idea how to do anything in Python.

So your algorithm can figure out the algorithm for describing, say, how quickly liquid can fill a volume?

I’m going to say maybe, leaning towards probably. When I’m talking about algorithms in this context it is a very abstract sense of the word. My algorithm is not a code generator per se, although for a specific problem the resulting algorithm implies certain code. It is mainly intended for natural processes, since man-made processes are usually pretty well defined algorithmically (that’s not to say it might not have some purpose even in very complex man-made processes, but just not the intent). So it is more like, what is the algorithm, what are the tasks that execute, that describes how a cancer cell grows? And as of today, given a perfect parametric state for the cell over the execution of its algorithm, what factors are causing specific mechanisms to execute in its algorithm?

Back when I was in high school, we’d be given a problem like this: a sphere rests atop a cylinder, the bottom of which becomes a funnel which is attached to a box. Given dimensions for all of these, how long will it take a liquid to fill this shape if poured at such-and-such a rate?

And then we’d have to solve each of those shapes with the formulas we already knew then add up the sums for a total time. When we got more advanced, we’d try and come up with a formula (algorithm) that described the entire process, so we could just put numbers in and get results no matter what combination or configuration of shapes we were given to fill.

So it sounds to me like your AI is doing the same type of thing; aye? It’s figuring out the mathematical description of a process by constantly testing its equations against the observable real-world results that it’s trying to describe, right?

Although it could, it really isn’t designed to do so. It can infer an equation but that’s not really what it is meant to do (and really it only does it by the happenstance that I’m using somebody else’s technique for inferring equations, so if you put in the data it will spit out an equation).

My algorithm is focused on finding the sequence of tasks that a (natural) process uses to accomplish, well, whatever that process is trying to accomplish.

So if we’re looking at tumor growth, for example, the tumor starts in one state and ends in another. The process that transforms it from the initial state to the final state (through a sequence of steps) is presumably algorithmic, i.e. it isn’t purely random. It is describable, in some way, as “this happened, and then this, and then this,” and so on. That’s what my algorithm finds: the mechanisms that occurred to transform something from an initial state to a final state.
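A toy version of that idea (purely illustrative — the tasks and states here are invented, and real state spaces are far richer) is a search over task sequences:

```python
from itertools import product

def infer_task_sequence(start, goal, tasks, max_len=4):
    """Search for a sequence of named tasks (state -> state functions)
    that transforms the initial state into the final state."""
    for length in range(1, max_len + 1):
        for seq in product(tasks, repeat=length):
            state = start
            for name in seq:
                state = tasks[name](state)
            if state == goal:
                return seq       # first (shortest) sequence that works
    return None

# toy usage: which sequence of mechanisms turns state 1 into state 6?
tasks = {'grow': lambda s: s + 1, 'split': lambda s: s * 2}
infer_task_sequence(1, 6, tasks)
```

The inferred sequence is the “this happened, and then this” description.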

Another example would be plant growth. You start with a bud, and you get a tree. What is the algorithm that describes how that plant grew? And once we know how it grew, then, since two plants of the same species are similar but not identical, why did they grow in different ways? What were the factors that determined the differences in growth (assuming they aren’t stochastic)?

Can you provide more specific details? For plant growth, say, how are you modeling a plant in your system? What data is being used to indicate the plant has changed states?

Seems interesting, but it’s not clear at what level the “how” is being identified. Seems like you would have to be modeling/simulating a lot of physics and chemistry to arrive at the “how”.

Aye; pretty sure we’re describing the same thing. So how did you start? What was the data set you began with?

Now that’s very cool, especially if you’ve put something together that solves for that in reasonable amounts of time on a local machine for problems more complex than toy problems.

Just thinking about it, I’m not sure how you’d even approach that with anything that isn’t a grid-search or something similar (but grid searches could bog down pretty fast depending on your problem).
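For what it’s worth, the bare-bones grid search I have in mind (a sketch over a toy loss, nothing specific to the OP’s problem) would look like:

```python
from itertools import product

def grid_search(loss, grids):
    """Exhaustively score every combination of candidate values
    and keep the one with the lowest loss."""
    best, best_loss = None, float('inf')
    for combo in product(*grids):
        current = loss(combo)
        if current < best_loss:
            best, best_loss = combo, current
    return best, best_loss

# toy usage: recover (3, 7) by minimising squared error over a 10x10 grid
grid_search(lambda c: (c[0] - 3) ** 2 + (c[1] - 7) ** 2,
            [range(10), range(10)])
```

The cost is the product of the grid sizes, which is the “bog down pretty fast” part.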

If you publish that paper, I’d be interested in reading it - I hope you link it here on the Dope, or send me a PM if you remember after it’s done.

If you’re looking to pick up Python from scratch, I highly recommend Codecademy for the from-nothing fundamentals, and then after that solving the simple and intermediate problems on HackerRank to get familiar with it (both of those are free and have pretty good interfaces and pacing). It’s pretty easy to pick up.

It’s a great language, and at least half of ongoing Data Science and AI / neural net stuff is written in it (or at least has libraries in it), so being conversant will greatly help if you’re getting your feet wet with neural nets or predictive modeling.

Thanks a bunch for this recommendation; I started immediately. The thing is, I also stopped pretty quickly. I’ve had this problem with online courses before: they don’t give you all the information you need to solve some problems, or they mark your solution as wrong when it is not. I had Lesson 14 all done, exactly right, but the course said I had it wrong (“points_total is 165 when it should be 100”). When I finally, after 20 different iterations, hit “Solution”, it changed nothing but declared it “solved”. And there’s no teacher to ask “what the fuck is up with this” when these anomalies occur. I find it even more frustrating than sitting in an actual class, to be honest, and it does little to make me want to continue with the lessons.

I did appreciate the faux-console interface; it made some of the formatting worries shrink to nothing.

From what I saw, tho, it’s not really that different than the stuff I used to know back in the mid-80s, so I just need to find a course that agrees with me. Again, big thanks for the lead there even if I’m skeptical that I want to continue with their lessons.

What, in your opinion, is the biggest strength there that might counter the poorly written descriptions and instructions I’ve seen in my first 14 lessons?

Hackerrank is apparently too advanced for me; I couldn’t even do the first Python 3 challenge.

It is “how” in a sense of the word. Certainly, it is not describing things at, for example for plant growth, a cellular level (although you can model cellular growth this way too, but then you lose the larger scale; I suppose you could layer them, maybe). So there isn’t a need to simulate the physics or chemistry. The exact process being used is not as vital for this kind of modelling as knowing that a process is executing to do the work (and for an expert, such as a plant physiologist, this is sufficient, as they know the biological process underlying the abstract description of the model). Similar models using this kind of abstract thinking about the mechanisms have been done for geological formations, arterial growth, etc., and have been found to be quite useful.

My data comes from existing models that were produced by hand and published by domain experts. We’re in the process of getting some data for some plant species that have never been modelled before, so it will be a great test for the algorithm.

My algorithm can infer non-parametric deterministic models (with up to 78 processes in the target algorithm) in under 10 seconds, typically in under 2 seconds. Stochastic models with up to 12 processes can be inferred in about 1.5 hours, and now parametric deterministic models can be inferred in about the same amount of time. 1.5 hours is very reasonable, given that producing such a model by hand usually takes weeks to months. And again, this is all on a single CPU core; with parallelism or massive parallelism, the speed could be greatly improved (or the limits increased).

That’s really cool; thanks.

After I went thru that Code Academy first lesson, I called a friend who went back to school a couple of years ago and got some advice on doing the same. I’ll have some time next Friday to go see about enrolling in a class at CSN for the fall semester.

Good luck learning Python. I cannot offer any real advice since I don’t really know Python that well. I know it, but I so rarely use it. I generally use C/C++ just because that’s what I learnt in my undergrad back in the 90s.

Thank you.

As I said, it doesn’t seem all that different from what I remember of COBOL, PASCAL, Fortran, Basic, etc. Heck, it’s even vaguely similar to what I know of HTML, and I’m like 15 years out of date there.

Right now I can do basic math and print stuff on the screen; it’s a start.

So did you write this whole AI or did you start with some basic AI bot and then train it?

Yeah, I write all my own code. It might not be good code, but at least it is mine.