Hi everyone.
Like many amateur photographers, I have an external SSD full of images. Since I haven’t always been the most diligent of archivists, the filing is… suboptimal. There’s a mix of at least three different file types (RAF from my Fuji cameras, NEF from my Nikons, and JPG). Some of the JPGs are final export versions of the raw files, but some aren’t. In many cases the camera captured both RAW and JPG versions of the image, and they all ended up on the drive. Some of the folders are backups of other folders*. In many cases there are multiple “work in progress” versions of the same image. In short, it’s a mess.
In total, macOS tells me there are ~22,000 files in the whole folder structure, but I imagine that once the duplication is taken care of, the number of actual images will be a lot lower.
What has brought this to a head is that I have moved away from Adobe subscriptions and will be using Capture One instead (initially, the free version). It works in a similar way to Lightroom in that it maintains a database of the actual location on disk of each image. I’d like to get the folder structure tidied up before running any imports, so the Capture One catalogue doesn’t contain needless duplication.
What I’m interested in is whether there is automated software that can run through the entire folder structure and eliminate duplicates, leaving only a set of individual images. I think that my requirements are:
- Where there is a matching pair of RAW and JPG images, keep both images. They may not be in the same folder, since I went through a phase of splitting files into “RAW” and “JPG” folders.
- Where there are identical RAW / JPG pairs (i.e. identical copies of both the RAW and the JPG image) then only keep one pair. This is most likely to occur when there are folders that are backups of other folders.
- Where there is only a JPG version of a photo, keep it.
- Where there are multiple versions of a JPG photo, only keep one version.
- Ideally, matching would be based on the contents of the file itself rather than the filename.
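To make the content-based matching in the last requirement concrete, here is a minimal Python sketch of one way it could work. It is an illustration only, not a recommendation of a particular tool: the function names (`file_digest`, `find_exact_duplicates`) and the extension list are assumptions made up for this example. It only finds byte-for-byte identical copies (the backup-folder case), so it would not catch different “work in progress” versions of the same JPG, and it only reports groups without deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import List

# Assumption: these are the only extensions of interest on the drive.
IMAGE_EXTENSIONS = {".raf", ".nef", ".jpg", ".jpeg"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> List[List[Path]]:
    """Group image files under root whose contents are byte-identical."""
    # Step 1: bucket by file size, since files of different sizes
    # cannot be identical. This avoids hashing most of the drive.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.name.startswith("."):
            continue  # skip hidden files such as macOS "._" sidecar files
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            by_size[path.stat().st_size].append(path)

    # Step 2: within each size bucket, group by content hash.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    # Only groups with more than one member are duplicates.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_exact_duplicates(Path("/Volumes/Photos"))` (a hypothetical mount point) would return a list of groups, each containing the paths of files with identical contents regardless of filename or folder. Deciding which copy in each group to keep is left as a manual step.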
*The SSD in question is a consolidation of a lot of other sources that included backups, and is itself backed up. I realise the comment above could be read as saying that the SSD is a primary backup, but it isn’t. I won’t be losing anything that isn’t saved elsewhere.
Thanks!