Purging duplicates from a long list of e-addresses

I hold and organize various local activities. Over the years my email list has grown to nearly 2000 e-addresses. The list exists as continuous text in an MS Word document: just one address after another, separated by a semicolon and a space.

I suspect that there are duplicates in the pile. How can I find them and eliminate them?

One idea I had was to convert the text list into a table, then sort (alphabetize) the table. That would make spotting duplicates relatively easy, but it would still involve manual work – reading down the whole list – that I don’t want to do. Is there an easier, automated way?

Thanks all, in advance.

Do you have Excel?

If so, just copy the whole list into Excel and use the Remove Duplicates function. One click, done. It’s under the Data tab in 2007, probably somewhere else in the other versions.

I missed the part where you said they’re separated by semicolons. In that case, you need to perform one extra step: use the Excel import wizard and specify the semicolon as a delimiter.

Yes, I have Excel. This sounds perfect, but I can’t seem to locate the “Remove Duplicates” function. Where is that?

Hmm, maybe that’s a 2007-only function.

If you have 2003, there’s a similar procedure:
http://www.lytebyte.com/2008/10/30/how-to-remove-duplicates-in-excel-2003/

Thanks for all your help, Reply!

I miss Unix:

$ cat addressfile | sort | uniq >NoDupAddressFile

There are a few problems with that. First, you’ve got a redundant cat and uniq. Second, e-mail addresses are, in practice, treated as case-insensitive. A better approach:

$ sort --ignore-case --unique addressfile > NoDupAddressFile
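
One caveat for the list as the original poster describes it: sort works line by line, and that list is a single run of text with "; " between addresses, so it would need to be split into one address per line first. Here is a rough sketch of how that might look, assuming the Word document has been saved as plain text. The function name dedupe_addresses and the sample addresses are made up for illustration, and addressfile and NoDupAddressFile are just the file names used above.

```shell
# Hypothetical helper: reads a semicolon-separated address list on stdin
# and writes one address per line, case-insensitive duplicates removed.
dedupe_addresses() {
    tr ';' '\n' |                                   # turn each semicolon into a line break
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |    # trim blanks (and stray CRs) around each address
    grep -v '^$' |                                  # drop empty lines, e.g. after a trailing semicolon
    sort -f -u                                      # fold case and keep one copy of each address
}

# Made-up sample input standing in for the exported Word text.
printf 'ann@example.com; Bob@Example.com; bob@example.com; ann@example.com\n' > addressfile

dedupe_addresses < addressfile > NoDupAddressFile
```

To get back to the original "; "-separated layout for pasting into a mail program, the lines can be joined again with something like paste -sd';' NoDupAddressFile, which puts a semicolon (without the space) between addresses.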

Next Week on MST3K – The Human Un-Duplicators!!