I’d imagine someone would have programmed one of these by now, but I can’t seem to find a decently made one. I’d imagine one that did a bit-by-bit comparison of files and flagged anything with, say, 95+% similarity would be extremely accurate (as well as taking forever).
The problem is that MP3s of the same song may share no bit-level similarity if they are at different bitrates, or if one is CBR and the other VBR. The same problem comes up with identifying anonymous MP3s. Generally, the only way to do it properly is to decode the first x seconds of the song and convert that into a checksum-like signature. Then you can compare all of the signatures to check for duplicates (or look them up in a database to identify the MP3).
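To make the signature idea concrete, here is a rough sketch in Python. It assumes some other tool has already decoded the MP3 into a list of PCM samples, and it uses a made-up loudness-contour signature. This is not the MusicBrainz TRM algorithm, just an illustration of "decode, reduce to a signature, compare". The names coarse_signature and similarity are hypothetical.

```python
def coarse_signature(samples, sample_rate, seconds=30, frames_per_second=4):
    """Reduce the first `seconds` of decoded PCM to a string of bits.

    Assumption: `samples` is a mono list of integers produced by an
    external MP3 decoder. The signature scheme is illustrative only.
    """
    window = sample_rate // frames_per_second
    clip = samples[: sample_rate * seconds]
    # Average absolute amplitude (a crude loudness measure) per window.
    energies = [
        sum(abs(s) for s in clip[i:i + window]) / window
        for i in range(0, len(clip) - window + 1, window)
    ]
    # One bit per pair of adjacent windows: did the loudness rise?
    return "".join("1" if b > a else "0" for a, b in zip(energies, energies[1:]))


def similarity(sig_a, sig_b):
    """Fraction of matching bits over the shorter signature."""
    n = min(len(sig_a), len(sig_b))
    if n == 0:
        return 0.0
    return sum(a == b for a, b in zip(sig_a, sig_b)) / n
```

Two encodings of the same song will not decode to identical samples, which is why the comparison is a similarity score against a threshold and not an equality check. What threshold works in practice is something you would have to tune on real files.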
I can’t find any software that explicitly checks for duplicates in this way. The MusicBrainz TRM Generator will create signature files, but it doesn’t use them for anything.
Even if they are at the same bitrate, they will not be bit-for-bit identical if they were made with different encoders.
The best strategy, I believe, would be for a program to look for similarities in the file names.
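A minimal sketch of the file name approach, using Python's standard difflib. The normalization rules (dropping a leading track number, treating punctuation as spaces) and the 0.85 threshold are my own assumptions, and the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path


def normalize(name):
    """Lowercase the name, drop the extension, track number, and punctuation."""
    stem = Path(name).stem.lower()
    stem = re.sub(r"^\d+[\s._-]+", "", stem)  # leading track number, e.g. "01 - "
    stem = re.sub(r"[\W_]+", " ", stem)       # punctuation and underscores -> space
    return stem.strip()


def similar_names(names, threshold=0.85):
    """Return (name_a, name_b, ratio) for every pair at or above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs
```

This compares every pair of names, so it gets slow on a big collection, and it will miss duplicates that were named completely differently.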
There was an fdupes program for the PC many years ago, that ran on DOS. IIRC it only found files that were exactly the same with the same file name, though.
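For comparison, finding files that are exactly the same is the easy case. The sketch below is not how that old DOS program worked (I don't know its internals). It is just one common way to do it, assuming you only want byte-identical copies: group by size first, then hash. Unlike the program described above, it ignores file names. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicates(root):
    """Return lists of paths under `root` whose contents are byte-identical."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

This only catches straight copies of the same file. It will not match the same song ripped twice or encoded at a different bitrate.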
Well, considering I’m a Biotech major and I’m only taking my 2nd programming class for my computer science minor next quarter, I don’t think I’m qualified juuuusst yet…
I was discussing this with a programmer friend of mine, and now I realize how stupid a bit-by-bit comparison would be. He did tell me about a program that compared MP3s against an online database with approximately 90% accuracy. But alas, he forgot what it was.
Question still stands: does anyone know of an MP3 duplicate-pruning program?
OK, so I looked up smack fu’s linked program. It seems their method would work nicely to prune MP3s… but it doesn’t seem to actually do it. Anyone know of anything that does?
You can download a program called Media Jukebox. It will scan your hard drive and create a catalog by album, artist, genre, etc. I believe it reads the ID3 tags in the files to determine artist, album and genre. This may be close to what you’re looking for.
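If you want to see what reading tags involves, here is a small Python sketch. It handles only the old ID3v1 format, which is a fixed 128-byte block at the very end of the file starting with "TAG". Many files carry ID3v2 tags at the start instead, which this does not handle. This is not how Media Jukebox does it, and the function names are hypothetical.

```python
def parse_id3v1(block):
    """Parse a 128-byte ID3v1 block into a dict, or return None if invalid."""
    if len(block) != 128 or block[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(block[3:33]),
        "artist": text(block[33:63]),
        "album": text(block[63:93]),
        "year": text(block[93:97]),
    }


def read_id3v1(path):
    """Read the ID3v1 tag from the last 128 bytes of a file, if there is one."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        if f.tell() < 128:
            return None
        f.seek(-128, 2)
        return parse_id3v1(f.read(128))
```

Files whose artist and title match are candidates for duplicates, but that only works if the tags were filled in, and filled in consistently.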