… because in a way I’m polling for responses from those who have been down this road. I’m really not too confident about this being the correct forum, so if the moderators want to move it, please do so.
In my job I have the task of making sure that a number of programs and other tasks get completed at the right time and in the right sequence, based on their dependencies on one another. We run this system in a very archaic fashion, namely by submitting jobs to the batch processor and eyeballing the log files afterwards to make sure everything is OK. Most jobs are contingent on upstream jobs having completed successfully, and this is one of the things that we have to manually check. (Yes, we are planning to automate this, but for now assume we’re stuck with the manual system). As bad as managing all of this is, it pales in comparison to the task of manually drawing up a projected schedule of these tasks, and publishing it for all the users. So, being the Prometheus that I am, I came up with a little Java application to do the schedule for me. The way it works is that I have two database tables, one of which defines all the tasks, their estimated durations, and various other data, including in some cases a forced start date. The other table defines the dependencies. The first thing the program does is to use these tables to build a structure of task objects which embodies all of these dependencies–I guess you could call this a graph. Each Task object has a list of “predecessor” jobs, i.e. jobs that have to be done before the current Task can be started.
Once the graph has been built, the next step is to take a starting date and time (controlled by a dummy Task with a forced start date), and simply iterate a loop that increments a date-time counter by one hour, and continue doing this until the end of the processing period is reached. With each iteration, the program simply checks each job for startability by testing whether, based on the current value of the date-time counter, all of its predecessors are likely to be complete.
When a Task is apparently startable, a projected end date is calculated based on the estimated duration; future tasks dependent on the current Task won’t be scheduled to start until after the projected end date has been reached.
Though my program works great and I don’t have any plans to change it, I’m just curious as to what others have done when faced with this type of problem.
There is a whole mathematical area used to figure out these problems, it is known as “operations research.” It’s too heavy to deal with the math here. Suffice to say that Nobel Prizes have been awarded for operations researchers.
From your scenario, I can see several obvious problems.
It is impossible to accurately estimate program runtimes, except for very simple (that is, trivial) problems. This is due to “Turing’s Incomputability Problem.”
Slotting a large number of pieces into a schedule may not have ANY optimal solution, if there are dependencies. See the “Bin Packing Algorithm” for details.
Assuming that different job streams with jobs of difffering length must run in sequence, it is a major fallacy to assume that each job must run to completion. Suspending a long job to let a short job in may lead to operational efficiency being improved dramatically.4. Contrary to common sense, some obvious methods to improve throughput (i.e. buying a second computer) may actually INCREASE the total time to job completion.
So you see, there are many pitfalls to Operations Planning. If it was a job I was serious about, I’d go hire a PhD mathematician. I know I’m sure not up for it, so I don’t have any advice, I don’t know the answers, I just know all the problems.
IIRC, some incarnations of the task-scheduling problem are NP-complete. However, there is one variation that can be solved very quickly by a greedy algorithm; this variation is discussed in section 16.5 of the second edition of Introduction to Algorithms, by Cormen, Leiserson, and Rivest, and there are probably references to other versions discussed therein.
Sounds like basic Program Management. I don’t mean the word “basic” to be demeaning, just that there are plenty of books and applications out there to do program managenent, i.e. to help understand the specific problems of a program and marshall it’s execution.
Perhaps I’m misunderstanding your situation; perhaps you’re already steeped in this stuff, but here are some basic things to consider:
The application you built sounds good. A widely available program for this is Microsoft Project. It’ll generate the start times for everybody and draw various graphical representations of the tasks and dependencies that you can pass out so everyone is aware of who they depend on and who depends on them.
An important concept is the “critical path”, which is a dependency path where there is no slack. To minimize the ultimate start to end times, it’s important to understand and carefully manage the critical path. There can be several critical paths.
If the tasks and dependencies are similar across multiple runs, it’s a good idea to track how they do against how they were predicted to do. This can help with future predictions and help build better processes, since you can realign priorities and dependecies to get slackers out of the critical paths.
It’s a good idea to have someone own the role of minimizing the actual time (and money) for completion of these tasks. This person is typically called a Program Manager. Perhaps you are a PM. You can get lots of info on Program Management via Google.
This is a accounting application, so in this case we actually can come up with a useful estimated duration simply based on recent experience. The duration of a given task doesn’t change that much from month to month. Right now, I’m maintaining the durations manually by updating the database table; the number of tasks is such that it is feasible to do so while still being too large to easily write the schedule by hand.
**
I’m not looking to optimize the actual production process; only to see if there is a better algorithm for calculating the schedule. All my scheduling algorithm has to do is write out a projected schedule that will be printed on paper or e-mailed to the interested parties. It doesn’t actually manage the process; rather its purpose is simply to advise people when things are expected to be done by and/or when they need to have certain tasks of their own completed.
After we implement the automation, we’ll still need to calculate these schedules because, even though the system will be handling most of the day-to-day processing automatically, the users will still demand something that
tells them when various milestones in the process will occur.
**
This is in a multi-processor environment, so I don’t think that should be too much of an issue here. A long job may run concurrently with a sequence of shorter ones; it’s up to the Unix server to assign processors as it sees fit.
**
I’m ruefully aware of this, as I’ve seen the addition of processors and disk space not really help much. But what can I do? Dammit, Jim, I’m just a simple country rogrammer.
**
No offense taken. I’ve been programming a long time but this is a new area for me. I don’t have all the mathematical background but can usually come up with a useable – though perhaps not optimal-- algorithm.
**
The problem with Microsoft Project is that it’s Microsoft :D. Actually, I did use that product for a few months, but
found several problems with it. First and foremost, it crashed my computer continually. Secondly, it seemed to be geared more towards projects where the agents are all people, and if you double the number of people you halve the duration. And third, it didn’t seem to have any way of conveniently duplicating one month’s processes into the next.
**
We do log our run times and will certainly be looking into this as part of the automation project. I can see that the schedule writer I developed probably should be enhanced, or cloned, into a general analytical tool that would look at these issues. Probably I should wait until I learn more Java; I don’t want to be too locked into a non-OOP/non-html format.
Ah, well that is a whole nuther kettle of fish. This is a much simpler problem, it seems like you’re on the right track, basically you’re creating a computerized Critical Path chart based on prior experience, plus a bit of guesstimation. If it was me, I’d probably whip one together in a spreadsheet. But if you’re going to all this trouble to write custom programs, you might as well go whole hog and do process optimization too. Or not…
On the other hand, you could always do what my Grandfather did when people asked him when his work would be done. He always replied, “It will be done on the second Tuesday of next week.”