MacOS software to detect duplicate image files even of different formats?

Hi everyone.

Like many amateur photographers, I have an external SSD drive full of images. Since I haven’t always been the very most diligent of archivists, the filing is… suboptimal. There’s a mix of at least three different filetypes (RAF from my Fuji cameras, NEF from my Nikons, and JPG). Some of the JPGs are final export versions of the raw files, but some aren’t. In many cases the camera captured both RAW and JPG versions of the image, and they all ended up on the drive. Some of the folders are backups of other folders*. There are multiple “work in progress” versions of the same image in many cases. In short, it’s a mess.

In total, MacOS tells me there are ~22,000 files in the whole folder structure, but I imagine that once the duplication is taken care of, the number of actual images will be a lot less.

What has brought this to a head is that I have moved away from Adobe subscriptions and will be using CaptureOne instead (initially, the free version). It works in a similar way to Lightroom in that it maintains a database of the actual location on disk for images. I’d like to get the folder structure tidied up before running any imports, so the CaptureOne catalogue doesn’t contain needless duplication.

What I’m interested in is whether there exists automated software that can run through the entire folder structure and eliminate duplicates leaving only a set of individual images. I think that my requirements are:

  • Where there is a matching pair of RAW and JPG images, keep both images. They may not be in the same folder, since I went through a phase of splitting files into “RAW” and “JPG” folders.
  • Where there are identical RAW / JPG pairs (i.e. identical copies of both the RAW and the JPG image) then only keep one pair. This is most likely to occur when there are folders that are backups of other folders.
  • Where there is only a JPG version of a photo, keep it.
  • Where there are multiple versions of a JPG photo, only keep one version.
  • Ideally, matching would be based on the contents of the file itself rather than the filename.

*The SSD in question is a consolidation of a lot of other sources, that included backups, and is itself backed up. I realise the comment above could be read as saying that the SSD is a primary backup, but it isn’t. I won’t be losing anything that isn’t saved elsewhere.

Thanks!

Checking for actual contents, especially between different filetypes, isn’t easy. You could find files that are exactly the same (same bits), but I’m not sure that would quite get what you want.
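For the exact-copy case, here is a minimal Python sketch of the idea (my own illustration, not a tested tool). It groups files by a SHA-256 hash of their contents, so filenames don't matter. The function names `file_digest` and `find_exact_duplicates` are made up for this example, and it only reports groups; it deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Map digest -> sorted list of paths, for digests shared by 2+ files."""
    # First pass: bucket by size, which is cheap and avoids hashing most files.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Second pass: hash only files whose size is shared with another file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a byte-identical twin
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

You would call it with the top-level folder, e.g. `find_exact_duplicates("/Volumes/PhotoSSD")` (a hypothetical path). This should catch the backup-of-a-folder copies, but not a JPG that was re-exported or edited, since those differ at the byte level.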

However, it’s quite likely that Lightroom will keep the metadata for your photos, like when each one was taken, with what camera settings, and so on. If so, those would likely be unique, and you could search that way.

Searching online, I did find a free plugin for Lightroom (Classic, I believe) which searches the metadata in your photo library and marks duplicates, and the site includes a tutorial on how to delete them. I think what it describes would fit what you want, as long as you go to the Rules tab and check “file type” so it will only count duplicates if they are the same file type. (You might also have to experiment with other settings on that tab.) That would make sure that JPGs would not be counted as duplicates of RAWs and vice versa.
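The general idea behind that kind of metadata matching, as I understand it, can be sketched in a few lines of Python. This is not the plugin's code. The record fields (`path`, `captured`, `camera`) and the function name `group_by_metadata` are hypothetical, and you would need a separate tool to read the metadata out of the files first.

```python
import os
from collections import defaultdict


def group_by_metadata(records, same_file_type=True):
    """Group photo records that share capture time and camera.

    records: iterable of dicts with hypothetical keys
             "path", "captured" and "camera".
    same_file_type: if True, only files with the same extension
                    can count as duplicates of each other.
    Returns a list of sorted path lists, one per group of 2+ files.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["captured"], rec["camera"])
        if same_file_type:
            # Compare extensions case-insensitively (.JPG vs .jpg).
            ext = os.path.splitext(rec["path"])[1].lower()
            key += (ext,)
        groups[key].append(rec["path"])
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

With the file type rule on, a RAF and the JPG shot alongside it land in different groups, so the pair survives, while a backup copy of the RAF is grouped with the original. One caveat: burst shots taken within the same second could share a timestamp, so I would review the groups by eye before deleting anything.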

That said, I can’t test it. Not only do I not have Lightroom, but I don’t have your type of RAW files or anything to test it on. However, since it makes it clear it cannot delete files on its own, and you have backups, it seems that it would be a useful thing to try out.

http://www.bungenstock.de/teekesselchen/doc/v1/en/tutorial.php

…a quick google came up with these possible options.

Looking at them, they all seem capable of doing what you want. I suspect there would be a lot of manually selecting which images to keep and which to let go, though, since some of your requirements (for example, picking one of many versions of a JPG) aren’t something that could be automated unless you have more specific criteria for which version of the JPG you want to keep.

…sorry, ignored the MacOS part of the equation. Maybe this one?

Thanks for all the suggestions. In case it’s useful for anyone in the future, I found an app called Gemini 2 on the App Store which does most of what I need. :slight_smile:

I have been very pleased with PhotoSweeper Lite.