Unidentified non-display characters

A client has a new ERP system, so their data has changed. In one field there are pairs of non-display characters. One is a CRLF, which is alt+010. These are easily changed to pipes. The other character, which precedes the alt+010 character, is a mystery. I imported the file into Access, and these mystery characters caused a new record each time they were encountered. (i.e., the record was split.)

I can import the resulting file into Excel and move things around manually so that I have single records for each account, in which each field has its own column. But that’s a pain. It would be much easier if I could change these mystery characters into pipes, and use the pipes in Text-to-columns to separate the fields.

Does anyone have any idea what the alt code for these mystery NDCs might be?

Thanks.

You say that this mystery character always precedes alt+010? It’s probably a CR.

Using your “alt” terminology:
CR = alt+013
LF = alt+010

CRLF = alt+013, alt+010

CR and LF are different characters.

I’m working from home today, so I’m on a Mac. You can’t make alt characters on a Mac, and they’ve cracked down on Internet usage at work so I can’t come to SDMB when I’m in the office. But I asked a coworker to try the alt+013.

I assume she did it correctly, as occasionally I’ll ask her to change the alt+010s when I’m offsite. I checked the file, and the little boxes (NDCs) are still there. So they’re apparently not 013. Maybe an EOR/EOL?

Yes, the NDC always precedes the 010.

If you’re on a Mac with OS X, then you have the full power of the Unix command line. Perform a hexdump on the file and figure it out from there.

od -x filename

will get you output in hexadecimal. Here’s an ASCII table if you don’t have one handy: http://www.asciitable.com/

I don’t know how to do a hex dump.

The file resides on my PC at the office. I connect using RDC. So I don’t actually have the file on my Mac.

I too will vote for the mystery character being a Carriage Return (CR).

If they’re coming in pairs and the second one is a Line Feed (LF), then it’s your most likely culprit.

Download and install a freebie text editor that will show you a hex dump of the file. I use PSPad at work.

Is the client’s new system Windows-based? If so, I’m going to add to the pile-on saying that the first character almost has to be a CR.

You probably don’t see it on your Mac, because all Unix systems, as well as Mac-OS systems (after version 9), only use a LF character as a Line Separator. Windows uses CR+LF.

They’ve become very strict about downloading. Any downloads must be done by out outside IT guy, and he’s expensive. The person who gives the OK for computer work doesn’t know much about computers or data files, so it would be difficult to get approval.

Is there another alt code for CR? alt-013 doesn’t work.

If someone wants to take a look at it, I’ve made a sample file that I can send if someone wants to PM me.

Here’s a Windows command line hexdump program.

Here’s an Internet hexdump program. If you feel you can upload your file to some website.

Give. E-mail address should be in my profile. If you can’t see it either PM me or post here.

ETA: Just checked, not it isn’t in my profile.

PM being composed as you read this.

Email sent.

Thanks!

Thank you, Noone Special for looking at the file.

Noone Special says the NDC is a CR alt-013, but Find & Replace (in Excel) isn’t finding and replacing.

To elaborate – AFAICT, it’s the “find” part that’s failing, due to the fact that, at least for me, any attempt to enter Alt-013 in a user interface (e.g., “Find” box in an editor) results in an actual CRLF being entered (i.e., it’s as if I press the Enter key)

Anyone know a windows shell equivalent to sed or awk that will allow running the file through a filter that will catch and delete/replace instances of Non-printable characters like this? Because this is basically what he needs (and I’m not good enough at Windows Shell programming / Excel VBA(?) to pull it off!)

For this kind of work, I’d just install Cygwin on the Windows machine and use the standard *nix tools. Cygwin is a port of the *nix environment (command line and graphical) to Windows, so you get everything needed to do this in a reasonable fashion.

Agreed (and in fact that’s what I used to determine that it really was a CR) – however Johnny has indicated (see post #9) that he cannot install any new software on the work PC, therefore my shout-out to anyone who may know how to work around this limitation by using Windows Shell built-ins.

Stock Windows doesn’t really have much in the way of command-line tools. It may be better to ask why CR/LF is in the original file to begin with. If they’re there to indicate an actual line break, then the final display app should be programmed to behave appropriately.

I’ve seen Johnny’s sample data. Without going into too much detail, yes it probably was a Line Break originally, but the several “lines” from the original field need to be merged for Data Mining purposes.
Agreed that the best place to nuke the Line Break would have been during data acquisition; but that’s probably spilled milk at this point…

The data mining app should be able to ignore CR/LF or, better, treat it as whitespace. I’m sure whatever tool is being used to process these records (Oracle?, DB2?, generic SQL?) could also convert it itself.

I’ve asked the client a couple of times if it’s something they can fix on their end. I’m dealing with the credit manager and not an IT person, and she’s come back to say she can’t fix it. I don’t know what system they’re using.