Is there a way to condense machine-captured words into paragraphs rather than having to do it by hand, line by line?

Been doing some freelance paid work, which involves transcribing YouTube speeches into edited paragraphs. (AI will surely take this job at some point, but for now isn’t accurate enough, especially when speakers have thick accents or use unusual vocabulary.)

YouTube already has an automatic speech-to-text transcript function. The problem is that 90% of my work involves condensing its output from many short lines into a solid paragraph.

Giving one example: Suppose the speaker says, “Healthcare is really screwed up in America. I had to pay $17,000 for an appendectomy, and that was with insurance. Can you imagine what it would be like if I were uninsured?”

The YouTube function captures it like this:

00:01 Healthcare is really
*00:02 screwed up in *
00:03 America I had
00:04 to pay 17000
00:05 for an
*00:06 appendectomy and *
00:07 that was with
00:08 insurance
00:09 can you imagine
00:10 what it would
00:11 be like if
00:12 I were uninsured

So, the vast majority of my work involves backspacing and re-arranging until it’s a proper paragraph - “Healthcare is really screwed up in America. I had to pay $17,000 for an appendectomy, and that was with insurance. Can you imagine what it would be like if I were uninsured?”

I’m hoping there is some simpler automatic or quick-shortcut way to get this done rather than endless manual backspacing, especially since I may have to handle 4,000 - 7,000 words on some days. Anyone know of any tricks to automatically “paragraph-ize” a bunch of words?

How about this?

To expand on that general idea a bit, since the stuff you want to delete will always be in the same format at the beginning of a paragraph, you could search for this:

^p^#^#:^#^#

and replace it with nothing. The ^p represents the paragraph marker at the end of the previous paragraph. The ^# represents any single numeral, and the : is, of course, literally the colon in the middle of the numbers. Do a “replace all,” and they’re gone pretty much instantly, and you’ll end up with everything in one big paragraph. Then you’ll have to go back through it and “reparagraph” where necessary (or if necessary), but I wouldn’t expect that to be nearly as odious a task as deleting all that stuff manually.

And if some of the lines start with an asterisk, as we see in your example, do a second find-and-replace, adding an asterisk between the ^p and the first numeral, i.e. ^p*^#^#:^#^#.

This assumes the text won’t have anything in that same time-stamp format that should be kept.
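
For anyone who would rather script this than use Word, here is a minimal Python sketch of the same idea. It assumes every caption line starts with an optional asterisk followed by a timestamp such as 00:02, as in the example above; the function name paragraphize is made up for illustration. As with the Word approach, punctuation and capitalization still have to be fixed by hand afterwards.

```python
import re

# Optional leading asterisk, then a timestamp like 00:02 (or 1:02:03), then spaces.
# The timestamp format is an assumption based on the example in this thread.
TIMESTAMP = re.compile(r"^\*?\d{1,2}:\d{2}(?::\d{2})?\s*")


def paragraphize(transcript: str) -> str:
    """Strip leading timestamps and join the caption lines into one paragraph."""
    pieces = []
    for line in transcript.splitlines():
        text = TIMESTAMP.sub("", line.strip())
        text = text.strip("* ")  # drop stray asterisks and spaces around the line
        if text:
            pieces.append(text)
    return " ".join(pieces)


sample = """00:01 Healthcare is really
*00:02 screwed up in *
00:03 America I had
00:04 to pay 17000"""

print(paragraphize(sample))
# Healthcare is really screwed up in America I had to pay 17000
```

The same caveat applies: a line of real speech that happens to begin with something like 10:30 would lose those characters.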

This sounds like genius, but I’m not fully sure what you mean. Do you mean Microsoft Word, using the Search/Replace functions? I tried that, but it only reads the search text literally, as in, literally ^p^#^#:^#^#, so I’m sure I am misunderstanding…

Yes, it’s the standard search and replace function in Word. You should be able to just copy and paste that line of code into Word’s search box, click Replace All, and Bob’s your uncle.

If it doesn’t work, there’s one modification I would suggest. Instead of the usual ^p code to search for a hard return (this is what you get when you hit the Enter key), try ^l for a soft return (you get this if you hit Shift-Enter). The difference is that the latter starts a new line of text, but Word treats it as a continuation of the previous paragraph.
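
The hard/soft distinction matters if you ever script this, too. As far as I know, Word stores a paragraph mark as a carriage return (character 13) and a manual line break as a vertical tab (character 11); how those come through in exported or pasted text can vary, so treat that mapping as an assumption. A small Python sketch showing that str.splitlines() treats both as line boundaries, so a script need not care which kind of return the transcript contains:

```python
# Two versions of the same captions: one with hard returns, one with soft returns.
# Using "\r" and "\x0b" to stand in for Word's ^p and ^l is an assumption.
hard_returns = "00:01 Healthcare is really\r00:02 screwed up in\r00:03 America I had"
soft_returns = "00:01 Healthcare is really\x0b00:02 screwed up in\x0b00:03 America I had"

# str.splitlines() splits on \r, \n, \r\n and \x0b (among others),
# so both versions produce the same list of caption lines.
lines_from_hard = hard_returns.splitlines()
lines_from_soft = soft_returns.splitlines()

print(lines_from_hard == lines_from_soft)  # True
print(len(lines_from_soft))                # 3
```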

Aha, that did work! Thanks. The ^p didn’t, but the ^l did.

Glad to be of help!

I see you already found a solution!

I was going to say that I would use macros to repeat certain commands over and over again. Personally, I would use a text editor to edit text files, not Word (but I realize that at some point you just use what you already know works). E.g., in “vim” typing vip:norm ^dW will delete those 00:03 timestamps from the beginning of a block of lines (here the ^ is a literal caret that jumps to the first non-blank character, not a Word-style code), and vipJ will join the lines. Obviously you would assign such commands to macros so as not to type them out each time.
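
For anyone without vim handy, the same two steps (drop the first whitespace-delimited token of each line, then join the lines of each block) can be sketched in Python. This is only an illustration, assuming every line starts with a timestamp token and that blocks are separated by blank lines; join_block and join_all_blocks are made-up names, not part of any library.

```python
import re


def join_block(block: str) -> str:
    """Drop the first whitespace-delimited token of each line, then join the lines."""
    kept = []
    for line in block.splitlines():
        parts = line.split(None, 1)  # [first token, rest of line]
        if len(parts) == 2:          # lines holding only a timestamp are skipped
            kept.append(parts[1].strip())
    return " ".join(kept)


def join_all_blocks(text: str) -> str:
    """Treat blank-line-separated blocks as paragraphs and join each one."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(join_block(b) for b in blocks)


demo = "00:07 that was with\n00:08 insurance\n\n00:09 can you imagine\n*00:10 what it would"
print(join_all_blocks(demo))
# that was with insurance
#
# can you imagine what it would
```

Because it drops whatever the first token is, this handles the *00:10 style lines as well, but a trailing asterisk at the end of a line would be left in place.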

It’s probably better to do it in Word, but in case you need another option, ChatGPT handled the formatting of your example fine when I asked it to remove the timestamps and present the text as a paragraph.

Thanks. I tried it out and it worked as you advertised. I’m sure ChatGPT has a text limit, but I think I can spit long columns into it now. Great.