Excel/Access non-display character?

I have an .xlsx file containing just a bit of header and the sample cell that has the non-display character. (i.e., it’s a 2x2 table.) I’ve emailed it to myself, and can email it to anyone who wants a crack at it.

I’m confused - was the original file .csv or .xlsx ?? Does the mystery character already exist from the beginning, or does it appear at some intermediate stage?

The original file is .xlsx. The NDC is also in the .csv file. Then it’s in Access when I import it. When I export to .txt, the double-quotes and spaces are in that file.

Man it’s hard to follow you. Let’s just talk about what you can SEE, not mystery characters that you infer must be there because of behavior.

When you export the file from Excel to CSV, and look at the CSV file in Notepad, are the quotes in the CSV file?

I had some trouble like this with an Excel file I exported to CSV - it turned out the reason I couldn’t see it in the Excel file was that certain cells contained line feed characters - that is, the cells contained more than one row of text - because the data in that cell had been copied and pasted into place from somewhere else - and the copy operation had accidentally spanned two rows of text

To find the problem, I opened the file in Excel, dragged the height of the formula bar so it could show more than one line of text, then moved the cell highlight around the worksheet until I discovered the double row in one cell.

The other case I had to deal with lately caused an import error on a CSV file - I’m not sure where the file originated, but it was broken by the presence of a single " character at the start of one of the comma separated values - it was like this:

123,abc,987,def
101,"bcd,012,iii
121,rrr,000,aaa

Because the CSV format typically uses double quotes to encapsulate text fields containing commas, the double quote was intepreted as the start of a single text value - and because there wasn’t another closing quote, the entire rest of the file was imported into a single cell.

It sounds like the problem in this thread is a bit different from this, but maybe not that different - I also have used Notepadd ++ to find these things in the raw CSV file. There’s a ‘portable’ version of Notepad++ that runs from a USB drive with (I think) no admin privileges necessary - although if your organisation has locked down software installation, they may also have locked down running executables from removable media (or banned it)

In Excel it looks like this:


Sayaka Hibachi & Sushi Bar

Note that there are no double-quotes or extraneous spaces.

I copied the cell and tried pasting it here. It looks like this:


"Sayaka Hibachi & Sushi Bar			"

This is also how it looks when I open the file in Access, and when I export as a .txt file. Note the double-quotes and the spaces.

So I don’t see the characters, but I can see that they are there because of what they do to the data.

This is what I see when I open the .csv file in Notepad:


SAH900,"Sayaka Hibachi & Sushi Bar			"

I think we may be chasing a wild-goose here.

I don’t know that there actually are any extra non-displayable characters here.
That 2nd version (enclosed in quotes and including trailing spaces) is the proper way such an Excel field should display in a .csv file.

There has to be something there, otherwise there wouldn’t be a problem. And if there was a problem, I suspect it would affect all of the data instead of only a half-dozen or so cells.

Ok, that helps. The quotes appear when you export from Excel to CSV. That’s what I asked in post #8 and in post #9 you denied it. There’s certainly nothing wrong (from a view of conformance with the rather loose CSV specs) with Excel adding quotes whenever it wants. Your desire to avoid quotes is one more example of the fact that you’re using the wrong tools for this job.

But anyway, back to the problem. It looks like Excel must be quoting either because of the ampersand (which you denied but I don’t know whether to believe) or the three tab characters at the end (which you said you tried to deal with in post #15). Are there tabs at the end of the field in the original xlsx file? If you manually delete them, does that field render without quotes in the csv?

They’re also there if I copy and paste into Notepad. So it’s not a function of exporting to .csv. If I copy and paste-as-text into a new spreadsheet, the quotes and spaces are not visible; but if I click on the cell there are extra spaces. Find & Replace does not find them.

These are the tools I have.

No, I have ampersands in every file I work on. The file in question has ampersands in other fields, and they do not have quotes and spaces.

If I click on the cell and manually delete the three spaces at the end, then go to the beginning of the text and click Delete, and then copy into Notepad, then there are no double-quotes or extra spaces. But this doesn’t do me any good, as I cannot see which fields have the problem and the majority of the records that don’t.

As I said, I have a sample Excel file I can email if anyone wants to look at it.

Since tabs themselves can be delimiters, it does seem likely that Excel is quote-wrapping them to keep them inside the single exported cell.

Access and Excel have /always?/ had different rules for importing and exporting text: If you import then export it comes out different. In the case of Access, this was because import was handled by the database engine, and export was handled by a code module.
So what you describe never needed a non-display character: just commas, quotes, spaces and CR and LF characters.

Doesn’t do you any good? You’ve just solved 90% of your problem. The issue is that some fields have tabs in them. Now you just need to figure out how to delete the tabs. I would have thought that the Replace Special you tried in post #15 would have done it. Are you sure you did that correctly?

Holy moly, I’m glad I don’t have to deal with bug reports like this on a regular basis. I asked if quotes appear when you export to csv, and even though you know that they do, you said “no” because they also appear in other circumstance. Geez.

I already knew that. I thought the problem was clear in the OP that there’s something in the field that shouldn’t be there.

Is there some reason why choosing a special character other than Manual Line Break, which I use all the time, would be done differently from Manual Line Break?

I have the .csv file open right now. There are no double-quotes or extra spaces visible. So no, the quotes don’t appear when I export to a .csv file, when I open the .csv file in Excel. You can see them if you open in Notepad, but when I answered your question I opened the .csv file in Excel and they were not visible.

If it annoys you so much, then you don’t have to answer.

I un-zipped the .xlsx file and examined the “sharedStrings.xml” file contained there. It definitely for-real has Tabs in there (UTF-8 code U+0009), as suspected. The line looks like



<t xml:space="preserve">Sayaka Hibachi & Sushi Bar□□□</t>


where the white squares are actually tabs in the file.

Therefore, whatever you are doing to get rid of white space at the end of lines isn’t working.

Ok, sorry, you are apparently rather less technically knowledgeable than I thought. You can’t tell whether there are quotes in the CSV file by opening it with Excel. Excel will process and remove the quotes when it displays it.

Anyway, in your OP your question was, what characters are causing this? The answer has been found, they are TABs.

Also, you said earlier “I cannot see which fields have the problem”, but in fact you can see that, by opening the CSV file in Notepad and looking for the fields that have quotes. If you can’t figure out how to automatically remove the tabs, you can find the offending fields that way and remove the tabs manually.

Thanks for looking at that. I’m not doing anything to get rid of white space. It hasn’t been an issue before. I did try that earlier (post 15), but it changed all of the spaces to pipes. Word Special => White Space does not differentiate between spaces and tabs. I also tried Special => Tabs, but it didn’t find any. Same with Nonbreaking Space.

I’m not familiar with UTF codes (UTF-8 code U+0009). Is there a way to search for and replace them? Again, I connect to my PC via a VPN connection on my Mac, and I can’t do ALT codes from here. That’s why I use Word to get rid of ALT+010 when I’m telecommuting. (In the office I’m on a PC so I can just do a Find & Replace on ALT+010.) So I guess two questions: How do I search for and replace the characters when I’m connected remotely, and how do I do the same when I’m in the office? I could email the file to myself and fix it on my Mac, but I’m reluctant to have files on my personal computer. Nevertheless if it’s something that has to be done that way when I’m working from home, I should know how to do that.

Thanks again!

Sure, if you nuke all white space (which includes spaces, form feeds, line feeds, carriage returns, tabs, and possibly many rarer others*), the ones in the middle of your lines will also disappear. So you only want to eliminate space at the end of lines— except, apparently, some of them (line feeds) you want to get rid of or turn them into something else no matter where they appear. To do all of this automatically, you will have to cook up an appropriate regular expression search-and-replace. I have no idea how to do that in Word or Excel, but likely someone else does. If it is a one-off or occasional issue, then you could also simply delete the characters manually, as suggested.

  • Wikipedia has

U+0009  character tabulation
U+000A  line feed
U+000B  line tabulation
U+000C  form feed
U+000D  carriage return
U+0020  space
U+0085  next line
U+00A0  no-break space
U+1680  ogham space mark
U+180E  mongolian vowel separator
U+2000  en quad
U+2001  em quad
U+2002  en space
U+2003  em space
U+2004  three-per-em space
U+2005  four-per-em space
U+2006  six-per-em space
U+2007  figure space
U+2008  punctuation space
U+2009  thin space
U+200A  hair space
U+200B  zero width space
U+200C  zero width non-joiner
U+200D  zero width joiner
U+2028  line separator
U+2029  paragraph separator
U+202F  narrow no-break space
U+205F  medium mathematical space
U+2060  word joiner
U+3000  ideographic space
U+FEFF  zero width non-breaking space

NB you should be able to cut and paste weird characters from the Mac or Windows Character map, if you cannot otherwise type them in. I think.

Yes, if there are manual line breaks, I turn them into pipes so that I can separate them into their own fields (Text-to-Data in Excel). This specific issue only happens with one company, in its Customer and Aging files (one each), once a month, and only on a few records. I’ve been fixing it in Notepad, but it’s an annoyance and I was hoping to be able to do a global change.

I found a page that says you can search for any character in Word if you know the ASCII code by entering a pipe followed by the ASCII code in the search box. I tried entering |09, but it didn’t find any. I tried |65, which was the example on the page, and fond the 'a’s. Is there a different ASCII code I can try?