A client has a new ERP system, so their data has changed. In one field there are pairs of non-display characters. One is a CRLF, which is alt+010. These are easily changed to pipes. The other character, which precedes the alt+010 character, is a mystery. I imported the file into Access, and these mystery characters caused a new record each time they were encountered. (i.e., the record was split.)
I can import the resulting file into Excel and move things around manually so that I have single records for each account, in which each field has its own column. But that’s a pain. It would be much easier if I could change these mystery characters into pipes, and use the pipes in Text-to-columns to separate the fields.
Does anyone have any idea what the alt code for these mystery NDCs might be?
I’m working from home today, so I’m on a Mac. You can’t make alt characters on a Mac, and they’ve cracked down on Internet usage at work so I can’t come to SDMB when I’m in the office. But I asked a coworker to try the alt+013.
I assume she did it correctly, as occasionally I’ll ask her to change the alt+010s when I’m offsite. I checked the file, and the little boxes (NDCs) are still there. So they’re apparently not 013. Maybe an EOR/EOL?
Is the client’s new system Windows-based? If so, I’m going to add to the pile-on saying that the first character almost has to be a CR.
You probably don’t see it on your Mac, because all Unix systems, as well as Mac-OS systems (after version 9), only use a LF character as a Line Separator. Windows uses CR+LF.
They’ve become very strict about downloading. Any downloads must be done by out outside IT guy, and he’s expensive. The person who gives the OK for computer work doesn’t know much about computers or data files, so it would be difficult to get approval.
Is there another alt code for CR? alt-013 doesn’t work.
If someone wants to take a look at it, I’ve made a sample file that I can send if someone wants to PM me.
To elaborate – AFAICT, it’s the “find” part that’s failing, due to the fact that, at least for me, any attempt to enter Alt-013 in a user interface (e.g., “Find” box in an editor) results in an actual CRLF being entered (i.e., it’s as if I press the Enter key)
Anyone know a windows shell equivalent to sed or awk that will allow running the file through a filter that will catch and delete/replace instances of Non-printable characters like this? Because this is basically what he needs (and I’m not good enough at Windows Shell programming / Excel VBA(?) to pull it off!)
Agreed (and in fact that’s what I used to determine that it really was a CR) – however Johnny has indicated (see post #9) that he cannot install any new software on the work PC, therefore my shout-out to anyone who may know how to work around this limitation by using Windows Shell built-ins.
Stock Windows doesn’t really have much in the way of command-line tools. It may be better to ask why CR/LF is in the original file to begin with. If they’re there to indicate an actual line break, then the final display app should be programmed to behave appropriately.
I’ve seen Johnny’s sample data. Without going into too much detail, yes it probably was a Line Break originally, but the several “lines” from the original field need to be merged for Data Mining purposes.
Agreed that the best place to nuke the Line Break would have been during data acquisition; but that’s probably spilled milk at this point…
The data mining app should be able to ignore CR/LF or, better, treat it as whitespace. I’m sure whatever tool is being used to process these records (Oracle?, DB2?, generic SQL?) could also convert it itself.
I’ve asked the client a couple of times if it’s something they can fix on their end. I’m dealing with the credit manager and not an IT person, and she’s come back to say she can’t fix it. I don’t know what system they’re using.