Removing duplicates from a list

I’m attempting to generate a list of users from a Windows event log (the System log on a print server). I can export the logs and pull out a list of the users, but because of the high volume of traffic on the print server, there are a large number of entries for each user.

The end result that I’m looking for is to get a list, which represents each user only once, so that I can send a bulk email to the users in my environment who use this particular server.

Currently, the exports go out into a .CSV, but I can export in other formats as well.

I do, however, run into the 65K lines limit in Excel (very busy servers).

Anyone have any thoughts?

You need a sort program. One I’ve used for years is OptTech sort; it easily sorts a 400-megabyte, 1.8-million-record database. I’m sure there are free sort programs available too.

You could open the .csv file in Excel and remove duplicate rows like this.

Perl, PHP, or even a simple shell script using something like “awk -F',' '{print $1}' input.csv | sort | uniq > output.txt” will work.
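Spelled out, a pipeline along those lines might look like this (a sketch — the file names and the assumption that the user name is the first CSV column are mine; adjust the field number to match your export):

```shell
# pull the user name (assumed to be the first field) out of each row,
# then sort so duplicates are adjacent, and collapse them with uniq
awk -F',' '{print $1}' input.csv | sort | uniq > output.txt
```

The sort step matters because uniq only removes *adjacent* duplicate lines; on a sorted list that means all duplicates.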

Here it is in PHP:




<?php
// input and output file names
$inputFile = "InputFile.csv";
$outputFile = "OutputFile.txt";

// create file handles
$ifh = fopen($inputFile, 'r');
$ofh = fopen($outputFile, 'w');

// loop through the input file, stuffing the addresses into an array
$addressArray = array();
while ($theData = fgets($ifh)) {
    $theChunks = explode(",", $theData);
    $addressArray[] = trim($theChunks[0]);
}

// drop the duplicates, then sort by value
$addressArray = array_unique($addressArray);
sort($addressArray);

// output the array, one address per line
foreach ($addressArray as $address)
    fputs($ofh, $address . "\n");

// close the files
fclose($ifh);
fclose($ofh);


I would use a variant on this method - Data/Filter/Advanced Filter/Unique Records Only/Copy to Another Location. Seems simpler than selecting/show all/deleting/pasting.

I’d import the list into Access and do a quick query - something like this:

SELECT Table1.username
FROM Table1
GROUP BY Table1.username;

Then export the results to whatever format is appropriate.

Open the original list and a new file. Go through the list and for each username, think back and remember if you’ve seen it before. If not, type it into the new file. If you have already typed it, don’t type it. If you’re really pressed for time, use your computer’s Copy and Paste functions.

Or what UncleRojelio said.

I think I can save you a few keystrokes:


select distinct username from table1;

I think that would work in Access. I can’t remember if you use “distinct” or “unique” or either one.

If you only need to do this once and the user count is small, even though the record count is large, you could even do the “group by” manually.

Just export the event log as CSV, use the Windows command-line sort utility to sort by the user name (enter sort /? at a command prompt for help), then open it in Notepad and scroll to find your distinct users. Notepad will handle millions of lines, and you can just lean on the Page Down key while watching the name column for changes as the records zoom by.

Probably not practical if you have more than a couple dozen users.

Using just Windows command line utilities, and assuming you have an input file containing just the users’ names, like

Roy
Jen
Moss
Jen
Jen
Moss
Rich
Roy
Den
Roy
etc.

You could copy the file to a new temporary directory, change to that directory and do something like


echo bla>blabla.txt
for /f %a in (input.txt) do copy /y blabla.txt %a
dir /b *. > uniquenames.txt

uniquenames.txt would then look like

Roy
Jen
Moss
Rich
Den

Gets tricky if the names have characters like * in them.
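If you happen to have a Unix-style sort on the box (via Cygwin or the like — an assumption about your setup), you can skip the filename trick entirely and characters like * in the names stop mattering:

```shell
# sort the name list and drop duplicates in one step
sort -u input.txt > uniquenames.txt
```

One difference from the trick above: the output comes back in alphabetical order rather than first-seen order.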