Removing duplicates from a list

I’m attempting to generate a list of users from a Windows event log (the System log on a print server). I can export the logs and pull out a list of the users, but because of the high volume of traffic on the print server, there are a large number of entries for each user.

The end result that I’m looking for is to get a list, which represents each user only once, so that I can send a bulk email to the users in my environment who use this particular server.

Currently, the exports go out into a .CSV, but I can export in other formats as well.

I do, however, run into the 65K lines limit in Excel (very busy servers).

Anyone have any thoughts?

You need a sort program. One I’ve used for years is OptTech sort; it easily sorts a 400-megabyte, 1.8-million-record database. I’m sure there are free sort programs available too.

You could open the .csv file in Excel and remove duplicate rows like this.

Perl, PHP, or even a simple shell script using something like “awk -F',' '{print $1}' input.csv | sort | uniq > output.txt” will work.
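Spelled out, a pipeline along those lines might look like this (a sketch — the file names and the assumption that the user name is the first CSV column are mine; adjust the field number to match your export):

```shell
# pull the user name (assumed to be the first field) out of each row,
# then sort so duplicates are adjacent, and collapse them with uniq
awk -F',' '{print $1}' input.csv | sort | uniq > output.txt
```

The sort step matters because uniq only removes *adjacent* duplicate lines; on a sorted list that means all duplicates.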

Here it is in PHP:




<?php
// input and output file names
$inputFile = "InputFile.csv";
$outputFile = "OutputFile.txt";

// create file handles
$ifh = fopen($inputFile, 'r');
$ofh = fopen($outputFile, 'w');

// loop through the input file, stuffing the addresses into an array
$addressArray = array();
while ($theData = fgets($ifh)) {
    $theChunks = explode(",", $theData);
    $addressArray[] = trim($theChunks[0]);
}

// drop the duplicates, then sort by value
$addressArray = array_unique($addressArray);
sort($addressArray);

// output the array, one address per line
foreach ($addressArray as $address)
    fputs($ofh, $address . "\n");

// close the files
fclose($ifh);
fclose($ofh);


I would use a variant on this method - Data/Filter/Advanced Filter/Unique Records Only/Copy to Another Location. Seems simpler than selecting/show all/deleting/pasting.

I’d import the list into Access and do a quick query - something like this:

SELECT Table1.username
FROM Table1
GROUP BY Table1.username;

Then export the results to whatever format is appropriate.

Open the original list and a new file. Go through the list and for each username, think back and remember if you’ve seen it before. If not, type it into the new file. If you have already typed it, don’t type it. If you’re really pressed for time, use your computer’s Copy and Paste functions.

Or what UncleRojelio said.

I think I can save you a few keystrokes:


select distinct username from table1;

I think that would work in Access. I can’t remember if you use “distinct” or “unique” or either one.

If you only need to do this once and the user count is small, even though the record count is large, you could even do the “group by” manually.

Just export the event log as CSV, use the Windows command-line sort utility to sort by the user name (enter sort /? at a command prompt for help), then open it in Notepad and scroll to find your distinct users. Notepad will handle millions of lines, and you can just lean on the Page Down key while watching the name column for changes as the records zoom by.

Probably not practical if you have more than a couple dozen users.

Using just Windows command line utilities, and assuming you have an input file containing just the users’ names, like

Roy
Jen
Moss
Jen
Jen
Moss
Rich
Roy
Den
Roy
etc.

You could copy the file to a new temporary directory, change to that directory and do something like


echo bla>blabla.txt
for /f %a in (input.txt) do copy /y blabla.txt %a
dir /b *. > uniquenames.txt

uniquenames.txt would then look like

Roy
Jen
Moss
Rich
Den

Gets tricky if the names have characters like * in them.
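If you happen to have a Unix-style sort on the box (via Cygwin or the like — an assumption about your setup), you can skip the filename trick entirely and characters like * in the names stop mattering:

```shell
# sort the name list and drop duplicates in one step
sort -u input.txt > uniquenames.txt
```

One difference from the trick above: the output comes back in alphabetical order rather than first-seen order.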