[QUOTE=Stoid]
I have to work with some transcripts prepared by a court reporter. They are .txt, and he tells me they are somehow “ASCII” related. All I know is that they are a NIGHTMARE to try and work with, change, convert… he does everything in all caps, horrible font, weird line breaks.
Help!
[/QUOTE]
I feel your pain, but despite what others have said here, I doubt anyone could help you much without at least part of the file to look at. From your description, I think you do not have enough of a background in computers to fix the problem yourself, and the court reporter probably doesn’t either. In this situation, it’s rather like trying to both diagnose and fix your car by looking at a picture of the car and listening to a recording of it running!
In other words, not easy.
Here’s my educated guess:
The court reporter recorded the original notes in a computer system with its own file format. He then exported them as files with names ending in “.txt”. They may or may not contain plain text without formatting; it is hard to know without looking at the original file.
You then opened these files on a Macintosh using an editor that you do not mention. This editor may expect a particular file format, or may not. Again, without knowing which editor you used, I cannot say.
This editor may be interpreting some characters in the file as “formatting” characters, and thus displaying the file weirdly, with capitalization, fonts, and other info that is really not there.
So, you may have files that are perfectly fine, but you aren’t looking at them the right way.
Let me clarify.
Microsoft Word is a well-known document editor for both PC and Mac. You can format with fonts, put in line breaks and tables, and so forth. This formatting is stored in a Word file (.doc) as special characters that do not show up on your screen and do not print, yet they are there in the file itself.
If I were to edit a .doc file in another editor that knew nothing of Word, I would see strange behavior. The editor might treat the special characters as formatting, or might assume that they are text. I don’t think this second editor would give you the same result that Word would.
Conversely, with a bit of thinking I could make a file out of characters plus other “stuff” that, if I named it “.doc”, would display strangely in Word.
Remember that to a computer, everything in a file is, in the end, 0s and 1s. By convention (and this is a simplification that assumes only countries that use a Roman alphabet), each set of 8 0s and 1s is called a byte. An editor program assumes that each byte represents a character, in a “code”. A byte can represent 256 codes, so you have a limit of 256 characters.
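To make the byte idea concrete, here is a small Python sketch. The function name `describe_bytes` is just something I made up for illustration; it shows each byte as its eight 0s and 1s, the number those bits stand for, and the character an editor would show for it if it is a printable ASCII character.

```python
def describe_bytes(data: bytes) -> list[str]:
    """Return one line per byte: its 8 bits, its numeric code, and its character."""
    lines = []
    for value in data:
        bits = format(value, "08b")  # the eight 0s and 1s that make up the byte
        if 32 <= value < 127:
            shown = chr(value)  # printable ASCII range
        else:
            shown = "(no printable ASCII character)"
        lines.append(f"{bits}  {value:3d}  {shown}")
    return lines


for line in describe_bytes(b"Hi!"):
    print(line)
```

To try it on a real file, you would read the file in binary mode (`open(path, "rb").read()`) and pass the first few dozen bytes to the function.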
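Almost all programs across all operating systems understand the ASCII coding scheme. Strictly speaking, ASCII is a 7-bit code that defines only the first 128 values; the other 128 values a byte can hold are assigned by various 8-bit extensions that differ between systems (a Mac and a PC, for example, have historically disagreed about them). Most of the ASCII codes have a printable character associated with them, but 33 of them (codes 0 to 31, plus 127) are old-style control codes such as tab, carriage return, and line feed. An editor program should be able to read these codes out of the file and display them. The issue, though, is whether the program interprets the codes the same way the original system did, and whether it has the right fonts to display them.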
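If someone did have the file in hand, a first diagnostic step might look roughly like the sketch below. It sorts every byte into one of four buckets: tab and line-break codes (9, 10, 13), other control codes (the rest of 0 to 31, plus 127), printable ASCII (32 to 126), and values from 128 to 255, which fall outside standard ASCII. The function name and bucket labels are my own invention, and this is only a rough first look, not a full analysis.

```python
from collections import Counter


def summarize_bytes(data: bytes) -> Counter:
    """Count how many bytes fall into each rough category."""
    counts = Counter()
    for value in data:
        if value in (9, 10, 13):          # tab, line feed, carriage return
            counts["newline_or_tab"] += 1
        elif value < 32 or value == 127:  # remaining control codes
            counts["other_control"] += 1
        elif value < 127:                 # printable ASCII
            counts["printable"] += 1
        else:                             # 128-255: not standard ASCII
            counts["non_ascii"] += 1
    return counts


sample = b"THE WITNESS:\r\nYes.\x00\xe9"
print(summarize_bytes(sample))
```

A file that is honest plain text should be almost entirely "printable" plus "newline_or_tab". A lot of "other_control" or "non_ascii" bytes would suggest the file carries formatting codes from the court reporter's system.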
Sigh. Even if a program understands all 256 codes, it has to be able to map them to something you understand. It has to map the ASCII code for lowercase “a” to something in a font file that will put lowercase “a” on the screen and also on a printer.
All the “.txt” ending on the file says is “interpret this file as a set of ASCII characters”. If you put nothing but 0s into a .txt file, the editor will still open it, but it will display weird characters or nothing at all, because the ASCII code made of eight 0s is a control code (NUL) that normally has no visible character in any font.
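You can see the all-0s case for yourself with a few lines of Python. This is only an illustration: it builds eight bytes of nothing but 0 bits in memory rather than reading a real file.

```python
data = bytes(8)               # eight bytes, every bit set to 0
text = data.decode("ascii")   # decodes without error: code 0 is a valid ASCII code

print(len(text))              # eight characters came out of eight bytes
print(text.isprintable())     # False: NUL is a control code, not a visible character
print(repr(text))             # repr shows the escapes instead of the raw characters
```

The decode succeeds, which is the point: the file is "valid ASCII" as far as the computer is concerned, yet there is nothing sensible for an editor to draw.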
This only scratches the surface of what is going on. A good software engineer or tech writer could look at the file and figure out what’s wrong, I am sure. He or she might even be able to fix it automatically. I’m a tech writer and I’ve done stuff like this before.
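As one example of what an automatic fix might look like: if the weird line breaks turned out to be nothing more than a mismatch in line-ending conventions (carriage return, line feed, or both), a few lines of Python could tidy them up. That cause is purely an assumption on my part, since I have not seen the file, and the function name is made up for illustration.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF and lone CR line endings to a single LF."""
    # Replace the two-byte CR LF pairs first so they do not become two line feeds.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(normalize_newlines(b"Q. DID YOU SEE IT?\r\nA. YES.\r"))
```

This would not touch the all-caps text or the font, which are separate problems.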
Alas, I have no way of contacting you, or you me.