Background:
I have a huge Excel spreadsheet with something like 5000 rows and 70 columns packed with dense data (text and numbers, no formulae). That was an extract from a data source about 3 months ago. I am just getting another huger spreadsheet current as of yesterday. I am supposed to transform this data in several ways and even upload a portion of it back to another database.
Is there a easy way to compare these two Excel sheets, row by row, highlighting cells that are not the same? I can then work on only those (changed) cells, hugely saving my time.
Is there an easy way to accomplish this? I am no Excel or VBA expert, but can handle small macros or VBA procedures. Any 3rd party tools available?
Paste the first extract in a new workbook, Sheet1.
Paste the second to Sheet2.
On Sheet3 in A1, type “=Sheet1!A1=Sheet2!A1”.
Copy that to all the cells that apply. Any that aren’t exactly the same will return false. Use conditional formatting to make the bad results stand out.
You could use “=if(Sheet1!A1=Sheet2!A1=true,0,1)” which will return a 1 if true and a zero if false. Sum the rows and use autofilter to only see rows that with variant records.
Or return the variant data by changing the “,1)” to “Sheet2!A1”.
I use something like:
=if(Sheet1!A1<>Sheet2!A1,“Different”,"")
That way, the cells that are the same are blank. It just makes it easier to find the different cells. Though conditional formatting would do the same thing, just one more step.
Can you guys expand on this a little? Do these commands compare the exact same cell in both sheets, requiring that both sheets be sorted the same beforehand? If the second sheet is “huger”, aren’t whole sections of it going to be different from the first one? I usually do some sort of lookup and then an +EXACT(x,y) so I know I am comparing info on the same key data point, if that makes sense.
But again, it means you are comparing the exact same rows. I’m concerned that the OP said the second file was “huger” so it may mean they no longer line up…?
Could be. Could be that the size difference is due to appended rows. OP asked for cell to cell comparison, so I went with a way to check that.
Could be that a concatenated match or something is required, due to a lack of unique IDs and no good way to sort the records. It’s impossible to know the requirements from the average message board Excel query, so I tend to just guess a simple solution and see if that works. Often the OP never comes back because the suggested solution worked, a solution was found elsewhere, the project got chucked, the OP figured it out on her own, etc.
Start by identifying a unique value, such as invoice number. If there are no unique values, you may have to create one, for example, take invoice number and item number and concatenate them into one cell (=concatenate(a2,’-’,b2) will give you the value in A2, a dash, then the value in B2 A2-B2. Then you have a unique value for every item in every invoice. Do that to the other sheet, using the same values. Then run a V-lookup on the unique value to the column you’ve created.
Do the V-lookup on both sheets, not just the larger one, so you can tell if there’s any info on the smaller sheet that doesn’t match. People too often only check the sheet they’re expecting discrepancies in, and forget that there can be errors on both sheets.
Thanks to all responders. The second file contains a few hundred additional rows, including duplicates in what I expected would be unique values (employee ID). Since this extract is from a Lotus Notes non-relational database, anything goes.
Now I was looking to simplify my work by comparing matching rows (based on employee ID) and seeing if any cells were changed. Now since rows contain duplicated identifiers this simple scheme will no longer work.
As mentioned, you can concatenate fields to create a unique ID. Employee number and date/time is a typical method. It’s simple: =A2&B2 combines the contents of the references.
People have also mentioned VLOOKUP. I prefer OFFSET and MATCH. Either way the methods mentioned already will work. I’ll use VLOOKUP because people know it better:
…Returns a true or false. I’d suggest that you determine a concatenated identifier, add the column to both sets of data, copy BOTH sets of IDs in a single column to another sheet, and then remove any duplicates, thereby ensuring that you’ve got them all. You’ll get an N/A if the ID does not appear in one of the sets. You could add IF plus ISNA to add a flag telling you which set is missing the record.
All this assumes that it is indeed possible to create a unique ID. In my experience, it always has been.
Simply copy both lots of data to one sheet, sort them by employee ID and then use the function. You will now have only one row for the unchanged ones and multiple rows for the changed ones. The next step depends on what you are actually trying to output.