Hi, I am trying to come up with a formula that would let me evenly distribute recordings onto CDs, so each CD has roughly the same amount of time (or number of tracks). For example…
One audio CD holds 80 minutes. Lets say I have 50 recordings or tracks, totaling 85minutes. I’d like to distribute the recordings onto both CDs, so that each CD has 40 or so minutes, vs. the first CD having 80 minutes, and the second CD having 5 minutes. Or, I’d like each CD to have 25 tracks.
The complicating thing is that each recording might vary in length from 10 seconds to 30 minutes. So if (for example) there were 150 minutes of recordings, but a recording in the middle was 30 minutes, we may end up having 3 CDs of recordings instead of 2, because maybe the 30 minute recording came towards the end of the 1st CD and didn’t fit, so it had to go onto the second CD.
In theory, depending on the situation, I’ll have enough recordings to fill up 1 to 5 CDs.
I’m trying to write the code in PHP, but if you just tell me how to iterate through the recordings and determine which CD they go on, I can do the translation to code.
If you can only use two CDs, this is the partition problem, and no efficient algorithm exists, but there are reasonable approximations as described in the link. However, it seems like you’re willing to allow more than two CDs under some circumstances, which could change the problem somehow, but it very likely depends on how exactly you penalize using more than two discs.
Yes, the tracks do have to be in order, and the # of CDs isn’t limited to 2. It depends on how many recordings there are. If there are 161 minutes of recordings, they would definitely have to go onto 3 CDs.
But in that case I wouldn’t want the third CD to have just a solitary 1 minute recording (for example). For a better user experience, I’d want the recordings to be more evenly distributed across the CDs. It doesn’t have to be perfect - I just want to avoid where the last CD just has a small handful of recordings.
There is a dynamic programming algorithm that will solve the problem exactly, but it’s pretty complicated, and probably not worth doing if you just need an approximate solution. There has to be a better heuristic than just fiddling with it, but nothing comes to mind.
How about:
Find minimum number of CDs needed
Find average time per CD
Since the tracks have to be in order, add track times until close to average (±10%?), then split
After this, check times on each CD
If time between CDs varies (±50%?), split tracks earlier/later
If still no solution found, add 1 more CD and try again
There may be problems with long tracks. Maybe you could see if any tracks are longer than the average time per CD at the start, and refuse to work if that’s the case.
What about exhaustive search? Figure out the minimum number of CDs, N, and look at every possible location for the N-1 breaks, keeping the maximum time under 80 minutes. Find the one with the smallest difference between minimum CD time and maximum CD time. Repeat with N breaks, and look at the two best solutions and choose the one you like most.
Horribly inefficient from a computer science standpoint, but should be easy to code, and for N = 5, I’d think it could be run in a reasonable amount of time. Once it’s running, there are probably some easy optimizations you could add if you need to. How often will you need to do this?
ETA: Or you could use the highly efficient 8-track algorithm. Stick the breaks where you want them by fading the song out, then fading it back in on the next disk.
ETA2: For you young dopers, yes, they really did that.
It wasn’t so bad when it occurred at a natural break in the song. I remember that the Magical Mystery Tour 8-track broke “I Am The Walrus” around the two-minute mark, where there is a deliberately abrupt edit in the original recording.
This pretty much sums it up.
Determine min. number of CD`s, X; Average length per CD, Y
Put tracks in order.
Add tracks in order to CD 1 until if you add the next track, total time goes past average Y.
Repeat for CD 2.
For the next CD, do all the tracks fit - if so, done.
If you exceed number of CDs or
If you dont like this layout (too many tracks on last CD, say), repeat above allowing 1 extra track over average on CD1. If that does not work, repeat but put 1 track past average on CD 2 instead. Then 3, then 4; until you like the layout. If no good, try going past average on 2 CDs - 1,2; 1, 3; 1,4…; 2,3;
Or - when the algorithm recalculates the average for remaining CDs after filling firstCD; then recalculate avg for remainging CDs after filling 2 CD`s. etc. to always balance.
Alternatively, if you are going past 2 CD`s, run all algorithm mixes over and over and keep a list of which arrangements have the lowest range (deviation) between CD times.
The Wikipedia article on Bin Packing gives some good, simple heuristics algorithms for approximating an optimal solution. First fit decreasing, best fit decreasing and all that.
I once attended a talk by Coffman about the difficult analysis of various Bin Packing heuristics. He said he went thru 3 grad students. And he was proud of that rather than embarrassed. Ugh.
Because the tracks may not be rearranged, the Wikipedia articles on bin packing and partitioning don’t seem directly applicable.
Here’s how I would do it.
First assign the tracks to ensure you can fit everything: put as many as possible on CD 1, then fill up CD 2, etc. You’ll end up with the last CD partly filled.
Then go backward through the CDs one by one to even them out, each time moving tracks to the immediately previous CD.
Say you have a total of 330 minutes of tracks that fit on 5 CDs. Then your goal for each CD would be 330/5 = 66 minutes. So move tracks from CD 5 to CD 4 until CD 5 is *at or just under *66 minutes.
Then move tracks from CD 3 to CD 4 until CD 4 is *at or just over *66 minutes (while not going over the CD’s limit of 80 minutes, of course).
Repeat, alternating *over *and under, until you get to the first CD. (Alternating over and under helps keep them even when you have a very large number of CDs. Recalculating the goal minutes after each CD, as md2000 suggests, would help too.)
This algorithm should find a good solution in linear time (though not always the ideal one).