Audio editing

Wow. So I’m jumping on the podcast bandwagon and let me tell you I never realized just how painstaking editing audio can be. I listen to my audio and cringe as I consider just how many coughs or throat clears or umms I make. Oy.

– IG

I edit audio for a living, so if you have any questions on how to achieve anything, tips, tricks, methods, what-have-you, I’d be happy to tell you what I know. My first question would be, what program are you using?

I’m playing with Adobe Audition right now (got a hold of 1.5 when I was at Tech). I found a deal for a Samson condenser mic podcast kit on eBay and the mic is great. I’m still trying to figure out the best setup but that will come with time. The tricky part is working through Skype since the podcast is interview driven.

It isn’t too complicated in structure, it’s just a pain to edit out those little sounds and such :wink:

– IG

Just don’t make 'em in the first place. :smiley:

I’m using Audition 1.5 as well, both at work and at home. It’s the best tool I know of.

In the past, I’ve spent three days editing an hour-long session down to twenty minutes, taking out the little unwanted sounds and reading flubs. I hope you never have to do that! The number one thing to keep in mind is that when you play back your edited sequences, they must scan correctly; that is, they must sound exactly the way people talk naturally. I’ve had to teach reporters not to edit out all their breaths, nor to edit out the silences between words and sentences. You’d think that would be blindingly obvious, but it’s evidently not, to those who haven’t edited before.

If you are conducting interviews over Skype and the sound quality is less than desirable, but you must be heard asking the questions, see if you can arrange beforehand with the subject to pause before answering your questions. That way, you can re-read them later and paste the responses after your re-recorded questions.

If you are narrating, and you flub a line, read it again (and again) with the same vocal inflection. Then you can edit out the bad ones, so seamlessly that no one could tell. Eventually, you can learn to edit different readings of a word on a consonant or sibilant sound, but that comes under Advanced.

Your best friend in this regard is Zero Crossings. Look up Keyboard Shortcuts, and on the list you’ll see Adjust Zero Crossing Outward and Adjust Zero Crossing Inward. I map these commands to the Q and W keys, and I use them all the time. What this does is, when you have highlighted a section that you want to delete, hit Q or W and the outer edges of the selection move inward or out to the nearest place where the waveform crosses the center line. Then when you hit Delete, it is seamless. If you don’t do this, it is possible to end up with a click or bump where your edit is. If you edit in the middle of a word, it’s mandatory to use zero crossings, or it sounds very ugly.
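
If you’re curious what that snapping actually does under the hood, the idea is simple: walk the edge of the selection sample by sample until two neighboring samples change sign. Here’s a rough sketch of the technique in Python with NumPy; this is my own illustration with made-up function names, not Audition’s code:

```python
import numpy as np

def nearest_zero_crossing(samples, index, direction):
    """Walk from `index` one sample at a time in `direction` (+1 or -1)
    until the waveform crosses the center line; return that position."""
    i = index
    while 0 < i < len(samples) - 1:
        # A crossing: two neighboring samples have opposite signs (or hit zero)
        if samples[i - 1] * samples[i] <= 0:
            return i
        i += direction
    return index  # no crossing found, leave the edge alone

def snap_selection_inward(samples, start, end):
    """Shrink a selection so both edges sit on zero crossings, so that
    deleting the selection leaves no click or bump at the splice."""
    return (nearest_zero_crossing(samples, start, +1),
            nearest_zero_crossing(samples, end, -1))

# Hypothetical usage: delete a cough between samples 44100 and 66150
# audio = ...                       # mono waveform as a float array
# s, e = snap_selection_inward(audio, 44100, 66150)
# cleaned = np.concatenate([audio[:s], audio[e:]])
```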

There are a few things to consider. As I said, if there are any techniques you’d like to know about, please ask. I really love editing. I’ve done it since the days of razor blades and sticky tape, and computers win hands down.

Wait until you get a good grasp on it, then try editing music. That’s a science unto itself.

I’ve been using Cool Edit to edit music and produce CDs since before Adobe bought it and changed the name (it was recommended to me here on the SDMB), and it’s very easy to master, so to speak. Anyone know if the current version is significantly improved enough to warrant an upgrade or repurchase? I’d recommend it without hesitation. Cheers.

I’m not sure if a large diaphragm condenser like the Samson C01 is a good idea for speech. I’d figure a budget dynamic like the Shure SM58 would be better suited - less noise, pop filter, maybe less pronounced breath sounds too. Any opinions?

Slight hijack. I use Sound Forge, both for editing voice (vocals) and music. Have been since around version 4 when Sonic Foundry still made it. Have you used it? If so, in what ways would you say Audition is better? Can it use the same DirectX plugins?

To Watcher of the Skies, I started out on Cool Edit 96, so each revision has been worth the wait. Audition is the hyper-improved version of Cool Edit Pro. The main benefit of Audition is that the time it takes to do processes has been greatly reduced. It features refinements and upgrades to pre-existing functions, and has several hundred other features - many of which you or I may never have reason to use. If you’re happy with Cool Edit for the purposes you use it, I’d say there isn’t really a pressing need to spend $400 on Audition.

Mindfield, I looked at Sound Forge back around the turn of the century. Having come from Cool Edit, it seemed counterintuitive to me, sort of like a spaceship that I couldn’t figure out how to drive. So I really have no way to compare it to Audition. I don’t know what the differences are, and I’ve never been anywhere since where anyone used Sound Forge. Audition does accept all DirectX plugins. The newest version will not run on any Windows earlier than XP (latest SP), and previous versions are discontinued. If they require the next revisions to run on no less than Vista, I’m screwed. I still use Win2K.

That said, I use Audition primarily for restoring records and tapes at home and assembling radio programming at work. It has the best tools for removing scratches and noise in the industry. The multitrack section is great, and I tried out Loopology, which comes with it, to build songs out of supplied licks, chops and riffs played on real, vintage instruments and amps. If you like Sound Forge, and it meets all your requirements, I can’t think of a convincing argument why you should switch.

Interesting. I also do restoration periodically. Sound Forge has a nice DirectX noise reduction plugin, but results can be mixed depending on the type and amount of noise. Lots of tape hiss, for example, can be a real bear to remove without introducing noticeable artifacts into the audio, or at least cutting down the higher frequencies significantly. It’s not too bad if I can feed it a good, clean, long (1-2 s) noiseprint to work with, but even that has its limitations, especially if the noise isn’t quite so “white.” Pure tape hiss isn’t bad. Clicks and pops, especially subtle ones, are tougher. Really bad tape hiss (especially from old, well-worn tapes) is almost impossible to scrub completely, even with a long noiseprint and a hyper-aggressive NR (-60 dB). I’m curious now as to how well Audition works in that regard, especially in comparison to Sony’s NR.
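
For what it’s worth, the noiseprint trick these plugins rely on boils down to this: average the spectrum of the noise-only clip, then subtract that floor from every frame of the program material while keeping the original phase. A bare-bones sketch of the idea in Python with SciPy, the textbook version rather than what Sound Forge or Audition actually implement:

```python
import numpy as np
from scipy.signal import stft, istft

def noiseprint_reduce(audio, noiseprint, rate, floor_db=12.0):
    """Very rough spectral-subtraction noise reduction.
    `noiseprint` is a clean, noise-only stretch (ideally 1-2 seconds)."""
    nper = 2048
    # Estimate the noise floor per frequency bin from the noiseprint
    _, _, N = stft(noiseprint, fs=rate, nperseg=nper)
    noise_floor = np.mean(np.abs(N), axis=1, keepdims=True)

    # Transform the program audio, subtract the floor, keep the original phase
    _, _, S = stft(audio, fs=rate, nperseg=nper)
    keep = 10 ** (-floor_db / 20)      # never cut a bin below this fraction
    mag = np.maximum(np.abs(S) - noise_floor, keep * np.abs(S))
    _, cleaned = istft(mag * np.exp(1j * np.angle(S)), fs=rate, nperseg=nper)
    return cleaned[:len(audio)]
```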

Don’t get me wrong, I really like Sound Forge. I used CoolEdit Pro and Goldwave back in the late ’90s, and Sound Forge didn’t come across to me as counterintuitive by comparison, just a lot more involved due to the extra capabilities. But I’m always interested if something else can provide better results, even if it’s only in one particular (but important) area.

One of the things about Cool Edit / Audition that is not intuitive is its noise reduction section. If you use the settings that it comes with, it’s horrible and ineffective. I used it with varying, not-quite-perfect results for a few years before I went to the Syntrillium forums, now archived at adobe.com, and read up on everything they had to say about noise reduction and click removal. After you change the settings to those which have been discovered by dedicated users, it works like a dream. You can get smooth, clean hiss reduction with no artifacts, but not by using the Hiss Eliminator, just the Noise Reduction function.

For clicks, it has a great pop/click section. It has a spectral viewing mode, which turns your waveform into colors. You can zoom in and see each click, highlight each one, measure its parameters, and when you hit the button, they go away completely. One of the posters on the forum came up with a method for declicking and decrackling records that works better than any program, plugin or hardware I have ever used, including Sonic Solutions NoNoise.

It has enabled me to remaster records so cleanly that you’d never guess it was a record playing. None of the work I do now has the slightest sonic artifacts of NR manipulation.
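
I can’t speak for the exact recipe from that forum, but the basic mechanics of any declicker are the same: find a stretch of samples that jumps far faster than the surrounding audio ever does, then redraw it from its neighbors. A crude sketch of that idea, assuming Python with NumPy; the straight-line redraw is the crude part, and push the threshold too far and a detector like this will bite on staccato transients:

```python
import numpy as np

def declick(audio, threshold=8.0, patch=16):
    """Crude click repair: flag samples whose sample-to-sample jump is far
    larger than is typical for the track, then redraw them from neighbors."""
    out = np.asarray(audio, dtype=float).copy()
    jumps = np.abs(np.diff(out, prepend=out[0]))
    suspects = np.flatnonzero(jumps > threshold * np.median(jumps + 1e-12))
    for i in suspects:
        lo = max(i - patch, 0)
        hi = min(i + patch, len(out) - 1)
        # Replace the damaged stretch with a straight line between clean samples
        out[lo:hi] = np.linspace(out[lo], out[hi], hi - lo)
    return out
```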

I’ll be restoring some open-reel tapes of classical performances made in the 1950s this week, at work. The previous ones I’ve done just sound stellar. No hiss, no pumping or breathing, no softening of attack on piano notes; the music emerges from total silence. They could market Audition as a noise removal tool, but they don’t, and the help section is about as helpful as the Windows help section. You have to dig to find out how to use it, but once you do, it’s unbelievable how well it works. I could send you some samples, if you’re interested.

On a related note, does anyone have any idea if Audition is any better than Audacity for doing phonetics work? (In general, I use it for fiddling with noise and volume when the recording conditions were crappy and correcting for DC offset when the equipment was particularly crappy.)

Omi no Kami, could you go into a bit more detail about the kind of thing you want to achieve? What is phonetics work? What are your recordings of? How do you manipulate them? What do they sound like beforehand, and what do you want them to sound like when you’re done?

This sounds interesting. So Audition’s NR is purely parametric? I could live with that as long as the spectrum analysis is robust. That’s one area where Sound Forge falls on its face, which is odd, considering how high end the software is supposed to be and how important spectrum analysis is to (re)mastering. SF’s spectrum analysis is neither intuitive nor particularly useful.

The other day I was remastering some very early Information Society (early-to-mid ’80s, from tape and record) and some of it was very hissy. The NR plugin worked, but not nearly well enough, and the spectrum analysis produced an FFT that was virtually unreadable. I fed it a noiseprint, and it turned up an FFT with frequencies all over the spectrum and no obvious spikes in the frequency range where the noise should have been sitting. (They were there, but subtle to the point where it wasn’t clear if it was hiss or something else.) The only semi-reliable method I have is to use a parametric EQ, isolate the noise frequencies manually (very tedious) and adjust the EQ to eliminate it. While I can specify a frequency range and curve to boost/lower the EQ within that range, it’s still relatively imprecise and gives unsatisfactory results, even compared with the NR plugin.

SF’s click removal is completely useless, too. Even with an aggressive attack, it will mistake staccato or pizzicato instrumentation for a click or pop, and there’s no way to feed it a noiseprint of a typical click or pop for it to remove. In all other areas SF is great; it’s just in the area of restoration that it seems to be egregiously lacking.
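
Half the battle is just finding where the hiss actually sits, and averaging the spectrum of a noise-only clip is much more readable than a single FFT frame. A quick check I sometimes do outside the editor, assuming Python with SciPy (nothing Sound Forge specific; the function name is made up):

```python
import numpy as np
from scipy.signal import welch

def hiss_profile(noiseprint, rate):
    """Average the spectrum of a noise-only clip so the hiss band stands out
    instead of being buried in the variance of a single FFT frame."""
    freqs, power = welch(noiseprint, fs=rate, nperseg=4096)
    power_db = 10 * np.log10(power + 1e-20)
    # The ten bins carrying the most noise energy, in ascending frequency order
    worst = np.sort(freqs[np.argsort(power_db)[-10:]])
    return freqs, power_db, worst
```

Plotting power_db against freqs is usually more telling than the top-ten list, but either way, those are the frequencies worth notching with the parametric EQ.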

I’d be interested in that, yes – before and after clips would be great so I could get an idea of what I could expect, especially if the audio was in pretty poor shape in the before shot. I might consider Audition if I can get it to clean up my audio with a higher degree of precision and to a much more effective degree, hopefully without a great deal of hassle.

My recordings are almost exclusively of people speaking, so the relevant audio data falls in the range of 0 Hz to 44,000 Hz. The problem is that a lot of the utterances are hard to hear over background noise (jackhammers, AC vents, dentures being adjusted, the occasional goat), so I spend a lot of time profiling and subtracting background noise.

The second, and substantially more annoying, part has to do with the DC offset. Most of my recordings are done on a solid-state recorder, so it isn’t an issue, but my input is usually coming from a condenser mic with a cheap digitizer on the cord, as close to the actual mic as possible. The problem is that when I activate phantom power on the recorder, a lot of the cheap mics allow some of the voltage to seep into the digitizer, which results in a really awful DC offset that tends to hang out around 20-40 Hz off of 0.

A lot of the analysis the files are used for involves looking at integral intervals of formant center frequencies, so I’ve just been making an adjustment chart for each microphone I use, but it’d help if I could find a way to subtract the offset in Audacity or a better program.

Human voices tend to fall into a frequency range of about 200 Hz (low baritone) to around 1 kHz (soprano)*. Anything below or above this can typically be omitted or EQed down so that the vocal range is highlighted. There are a number of ways to accomplish this. You can use a graphic or paragraphic equalizer to isolate those frequencies and attenuate everything outside of them. Alternately, you can use a band pass filter to accomplish more or less the same thing. Mainly it depends on what program you’re using to edit and what it offers in terms of functionality. It will almost certainly have an equalizer. A graphic EQ will work but, depending on the number of bands it has, this may not be very precise. A paragraphic works better as it allows you to set handles along a line or curve, allowing you to manipulate specific frequencies or frequency ranges. Band pass filters are quite simple: you just tell it the low and high frequency range, and it will shove the entire audio through this filter and eliminate or attenuate (depending on settings) everything beyond.
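
If you ever want to do that outside your editor, a band pass of that sort is only a few lines. A minimal sketch assuming Python with SciPy, using the 200 Hz to 1 kHz figures above (a Butterworth filter here; any editor’s band pass behaves similarly):

```python
from scipy.signal import butter, sosfiltfilt

def band_pass_voice(audio, rate, low_hz=200.0, high_hz=1000.0, order=4):
    """Attenuate everything outside the given band, which is roughly what
    an editor's band pass filter does to isolate the vocal range."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass",
                 fs=rate, output="sos")
    # Filtering forward and backward avoids smearing the speech in time
    return sosfiltfilt(sos, audio)
```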

As for DC Offset, I don’t know what Audition is capable of, but Sound Forge can automatically detect and correct DC offset with one click. No manual adjustments necessary, as it will find the offset by itself.
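
The detect-and-correct step is nothing exotic, by the way: the offset is just the average value the waveform is riding on, so subtracting the mean recenters it. A minimal sketch assuming Python with NumPy, not Sound Forge’s or Audition’s actual routine:

```python
import numpy as np

def remove_dc_offset(audio):
    """Recenter a waveform on zero by subtracting its mean value.
    Works per channel for a stereo array shaped (samples, channels)."""
    return audio - np.mean(audio, axis=0)
```

If the offset drifts rather than sitting still, as a leaky phantom supply can make it do, a very gentle high-pass filter set to a few hertz does the same job continuously.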

* Sibilants (“S” sounds) occupy their own frequency range, so simply filtering everything outside the range of human voice will typically also remove sibilants. You’ll probably need to isolate those separately.

So to give you guys an update on my progress: it took me about 3 hours to edit an hour’s worth of audio, and I cut about fifteen minutes of conversation. I’ve posted it, but I’m going to try and cut its size down some. 52 minutes and approx 50 megs.

Any suggestions on how to best cut down the filesize?

– IG

Reduce the bit rate. If it’s just voice, 64 kb/s would probably be fine without introducing intolerable artifacting, and it will lower the filesize considerably, but you can always play with the bit rate to see what comes up as the best compromise between size and quality.

If the recording is mono, encode it to mp3 at 96 kbps mono. That’s the equivalent of 192 stereo, but half the size. If you’ve encoded it already at 128 stereo, it’ll be a step up in quality and about a quarter smaller in filesize. This bitrate will eliminate the swishing and swirling of low-bitrate mp3 encoding. If you don’t really care that much about clarity, try 64 mono. I personally wouldn’t go that low, but see what it sounds like, and whether you can live with it.
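
The arithmetic behind those numbers is just bitrate times running time. A quick sanity check, sketched in Python, using the 52-minute figure from your post:

```python
def mp3_size_mb(minutes, kbps):
    """Approximate MP3 size: bitrate (kilobits/second) times running time."""
    return kbps * 1000 * minutes * 60 / 8 / 1_000_000

for kbps in (128, 96, 64):
    print(f"52 min at {kbps} kbps is about {mp3_size_mb(52, kbps):.0f} MB")
# 128 kbps -> ~50 MB (about what you have now)
# 96 kbps  -> ~37 MB
# 64 kbps  -> ~25 MB
```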

The other day, I downloaded a podcast that was 128 stereo, and it took 35 minutes on a T-1 line! It was 81 MB! All the audio except the few seconds of music the guy extracted from a CD was mono, so it was really a waste of space. Anyone who is on dialup could just forget about it!

Edited to add: DO NOT re-encode your current mp3 to a lower bitrate!!! Go back to the .wav file and re-encode that to mono at either of the lower bitrates.

Thanks for the input guys, I’ll try those ideas when I get home. The intro has a short clip of music but it’s only twenty seconds or so.

– IG