"Enhanced" Audio/Video

What exactly are the experts doing when they “enhance” these?

I would imagine in the case of audio, it’s possible that there are other sounds (e.g. static) which they can remove and leave other sounds more audible. But what about video? I would think whatever is there is there. Unless maybe the typical showing of the video smooths out the view and the enhancement removes that in favor of showing fine details.

What I’m driving at with all this is that I’m wondering if the enhancements are really judgment calls, and the experts can magnify certain sounds and sights and distort the audio/video, and do this based on their judgment as to what the actual scene sounded/looked like. Or is this incorrect, and the enhancement version is more faithful to the actual?

The two biggest enhancements are (1) brightening the video and (2) increasing contrast. (1) is pretty self-explanatory: they just make everything less dark, which sometimes brings out details in dark areas. (2) is a little more complicated. Imagine you have a spectrum from white to black, with gray in the middle. If you increase the contrast, the dark grays become darker and the light grays become lighter. Eventually, you will just have half being black and half being white. This makes lines look clearer, and can be helpful in seeing things like letters, logos, and other features more clearly.

This is assuming that the video is otherwise essentially fine.
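For a concrete feel, here is a minimal sketch of both adjustments in Python with NumPy, assuming an 8-bit greyscale image stored as an array (the offset and gain values are purely illustrative):

[code]
import numpy as np

def brighten(img: np.ndarray, offset: int = 40) -> np.ndarray:
    """Brighten by adding a constant offset, clipping at pure white."""
    return np.clip(img.astype(np.int32) + offset, 0, 255).astype(np.uint8)

def increase_contrast(img: np.ndarray, gain: float = 1.8) -> np.ndarray:
    """Stretch values away from mid-gray (128): dark grays get darker,
    light grays get lighter. Push the gain far enough and everything
    collapses to pure black or pure white, as described above."""
    vals = (img.astype(np.float64) - 128.0) * gain + 128.0
    return np.clip(vals, 0, 255).astype(np.uint8)
[/code]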

There is a whole range of things that might be covered under “enhancements”. The OP asks about “expert” enhancements, which does at least rule out simple automated enhancement systems.

Enhancement could reasonably be broken down into three categories.
[ul]
[li]Those that seek to extract as much information out of the original as technology allows.[/li]
[li]Those that seek to create new information by making assumptions about the nature of the subject.[/li]
[li]Those that seek to make aesthetic improvements to the content.[/li]
[/ul]
Clearly there is some blurring of the boundaries here.

Examples of the first kind: film transfers done well will go back to the original negatives, scan them at very high resolution (8k if you are really lucky), apply colour corrections based upon the film stock and film age, and also fix defects in the film (scratches and worse). At the end of this process you have a copy of the film that is probably better than any print anyone ever viewed. Scratch removal, however, involves inventing new information.

Authoring of the film to a digital medium, Blu-ray for instance, will resample the video down to 2k resolution, and may perform “enhancements” for the medium. An obvious one is to remove film grain. Another issue is the colour space of the film: your TV has a different colour gamut to the original film. Removing grain is controversial: purists feel it is part of the aesthetic of film and should not be removed. Removing it is also never side-effect free; it may blur detail elsewhere, and may also lead to strange visual issues in the areas where the grain has gone. The colour space is a mix of technical and artistic choices, since the original film may have been shot on a particular film stock for artistic reasons. If the transfer is done on the cheap it won’t be scanned at high resolution, nor will there be money to do all the nice colour grading and fixups. The authoring process may simply automate many of these steps, with automatic scratch removal, smoothing, edge enhancement, and a stock colour space. Edge enhancement is another case of creating new information where none was before: done carefully it can add snap to the image; done badly, you get a whole new range of artefacts.
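To make the edge-enhancement point concrete: one classic approach is unsharp masking, where you blur a copy of the image and add back the difference, exaggerating every transition. A rough NumPy-only sketch for a single-channel image (the 3x3 box blur and the amount are arbitrary choices, not what any particular authoring tool does):

[code]
import numpy as np

def box_blur3(img: np.ndarray) -> np.ndarray:
    """Naive 3x3 box blur with edge padding."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    return sum(p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
               for dy in range(3) for dx in range(3)) / 9.0

def unsharp_mask(img: np.ndarray, amount: float = 0.7) -> np.ndarray:
    """Exaggerate edges by adding back (original - blurred).
    Too much `amount` produces the halo artefacts mentioned above."""
    detail = img.astype(np.float64) - box_blur3(img)
    return np.clip(img + amount * detail, 0, 255).astype(np.uint8)
[/code]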

In the audio realm much the same applies. You can go back to a classic recording (where “classic” typically means something we grew up with) and rebuild the recording almost from scratch. If the 2" multitrack tape exists you can almost reproduce the entire production. There are techniques that can extract more information off an old tape than the original recording engineers ever imagined was there: tricks like sampling at 192 kHz so that the restoration software can see the bias signal, and use that to eliminate tape scrape flutter, and techniques that can correct for the exact physics of the tape formulation and recording head design. If you have the 2" multitrack it is possible to recreate the entire mixdown again, and produce a new digital master of vastly higher quality than the original recording. Even if you only have the 1/4" master tape, you can still do really well here. However there is always the temptation to “re-master” the recording, which is an artistic call. Sadly this often means adding significant compression to “enhance” the punch and slam of the recording, which for many ears is a backwards step. (Counter-intuitively, adding compression to a recording makes many people interpret it as more dynamic.) Even if you go back to the 1/4" master it is important to re-equalise it, since it will have been mixed for vinyl.
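To illustrate the compression point, here is a toy hard-knee compressor in Python with NumPy. The threshold and ratio are arbitrary, and a real mastering compressor has attack/release smoothing that this sketch omits:

[code]
import numpy as np

def compress(samples: np.ndarray, threshold: float = 0.3,
             ratio: float = 4.0) -> np.ndarray:
    """Toy hard-knee compressor: any sample whose absolute value
    exceeds the threshold is reduced by `ratio`, squashing dynamics.
    Re-normalising afterwards ("making it louder") is what gives the
    impression of extra punch and slam."""
    mag = np.abs(samples)
    over = mag > threshold
    out = samples.astype(np.float64).copy()
    out[over] = np.sign(samples[over]) * (
        threshold + (mag[over] - threshold) / ratio)
    return out / np.max(np.abs(out))  # normalise back to full scale
[/code]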

Noise removal is just like grain removal in film. You can do it, but it is never side-effect free. How much, if any, to do is a mix of technical and aesthetic issues. If you go back to very old sources (like lacquer masters), many restorations will still leave a great deal of noise present, as its removal leaves a dead-sounding recording.

In film restoration you need to meld the video and audio. A serious film restoration will seek out as much original audio as it can, and rebuild it. But sadly this is often harder. You might only have an optical track to work with. Here it is probably a matter of doing as good a job as one can, but re-equalising, de-noising, band-selective dynamic range expansion, and so on will all help. This is a case where modern content delivery is much better than the optical audio track, and by making intelligent assumptions about the nature of the original a true enhancement may be possible.
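Dynamic range expansion is the converse of the compression described earlier. A toy downward expander in NumPy might look like this (broadband only, with an arbitrary threshold and ratio; a real restoration would apply it per frequency band):

[code]
import numpy as np

def expand(samples: np.ndarray, threshold: float = 0.05,
           ratio: float = 2.0) -> np.ndarray:
    """Toy downward expander: samples below the threshold are pushed
    further down, suppressing low-level hiss while leaving the louder
    programme material alone. The curve is continuous at the threshold."""
    mag = np.abs(samples)
    under = mag < threshold
    out = samples.astype(np.float64).copy()
    out[under] = np.sign(samples[under]) * threshold * (
        mag[under] / threshold) ** ratio
    return out
[/code]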

On cheap consumer devices, “enhance” tends to mean 3D-effect echo BS for audio, and maybe some color amping on video, which may or may not make something appear nicer to a rube on a very cheap output device. It tends to be a bad idea to turn on such features.

Imagine a low-resolution picture.
First you double the number of pixels in each direction.
But if there were a border in the original, the enlarged edge would be grey rather than one black and one white pixel side by side.
To sharpen borders, you find spots where the image goes white, light grey, grey, dark grey, black, and assume “this is because of blur”. You sharpen by narrowing those transition boundaries; there are complex algorithms and bandpass filters to do that.
Of course, this can cause artifacts like sharp edges where there really ought to be gradual transitions. The parameters applied to sharpening make assumptions about how blurry something should be if it were just a lens effect.
Similarly, there are moderately good filters to remove motion blur, out-of-focus blur, etc.
With some real fancy Photoshop work, you can mask areas and selectively enhance them to get a moderately good combined picture.
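A sketch of the first two steps just described, in Python with NumPy: double the pixels, then sharpen with a standard 3x3 kernel (the kernel weights are the usual textbook choice, not anything forensic):

[code]
import numpy as np

def upscale2x(img: np.ndarray) -> np.ndarray:
    """Double the pixel count in each direction (nearest neighbour)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def sharpen(img: np.ndarray) -> np.ndarray:
    """Convolve with a 3x3 sharpening kernel, steepening the
    white/grey/black transitions the post describes as 'blur'."""
    kernel = np.array([[ 0, -1,  0],
                       [-1,  5, -1],
                       [ 0, -1,  0]], dtype=np.float64)
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * p[dy:dy + img.shape[0],
                                      dx:dx + img.shape[1]]
    return np.clip(out, 0, 255).astype(np.uint8)
[/code]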

However, you can’t get a sharp face from a few pixels except with the “Hollywood filter”.

Audio enhancement does something similar: suppress some frequencies, like higher-frequency white noise or hiss; try to compensate for the poor response of the microphone, or the attenuation of different frequencies because of the recording environment, distance, etc. Basically, it’s like fiddling with a real fancy equalizer.
You might also take a loop of repetitive background noise from a lull and subtractively apply that to the whole tape to see if that cleans things up.
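That background-noise trick is essentially spectral subtraction. A hedged sketch with NumPy FFTs (the frame size and subtraction factor are arbitrary, and a real implementation would use overlapping windows rather than the back-to-back frames here):

[code]
import numpy as np

def spectral_subtract(signal: np.ndarray, noise_sample: np.ndarray,
                      frame: int = 1024, factor: float = 1.0) -> np.ndarray:
    """Estimate the noise spectrum from a 'lull' and subtract its
    magnitude from every frame of the recording, keeping the phase.
    Any tail shorter than one frame is left silent."""
    noise_mag = np.abs(np.fft.rfft(noise_sample[:frame]))
    out = np.zeros(len(signal), dtype=np.float64)
    for start in range(0, len(signal) - frame + 1, frame):
        spec = np.fft.rfft(signal[start:start + frame])
        mag = np.maximum(np.abs(spec) - factor * noise_mag, 0.0)
        out[start:start + frame] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), n=frame)
    return out
[/code]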

Thanks.

But it seems like you’re talking about enhancing in the sense of making some entertainment media better for viewing or listening pleasure. My question was inspired by (but not limited to) the Trayvon Martin story, in which experts enhanced audio and video recordings in order to better determine what they showed.

For example, an initial video of GZ did not appear to show a head wound. Then an enhanced version did. What type of enhancement would make it appear? If the original pixels recorded by the camera did not contain a head wound, how could it later show up? Is it possible that the experts simply detected a very, very fine line in the original recording and then magnified that line so that it showed up better in the next version? Or was there something that was captured by the original camera but just didn’t show up in the display?

And so on, for other instances.

They take a series of frames that show the back of Zimmerman’s head, shift each one so the area they want to enhance is in the same place on the screen, and the software extracts detail from all the frames. Demo here.
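A minimal sketch of that align-and-stack idea in Python with NumPy, assuming the per-frame shifts are already known (real tools estimate them by image registration, and crop rather than wrap at the borders):

[code]
import numpy as np

def stack_frames(frames: list[np.ndarray],
                 shifts: list[tuple[int, int]]) -> np.ndarray:
    """Shift each frame so the region of interest lines up, then
    average: noise (random from frame to frame) cancels out, while
    real detail (consistent across frames) reinforces."""
    aligned = [np.roll(f.astype(np.float64), (-dy, -dx), axis=(0, 1))
               for f, (dy, dx) in zip(frames, shifts)]
    return np.mean(aligned, axis=0)
[/code]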

This technology was developed with your tax dollars by NASA to analyze the Space Shuttle Challenger explosion. It SHOULD be free, but this one company bought the technology, has an exclusive on it, and extracts even more of your tax dollars by selling it back to government agencies.

Here is another product that uses AMD GPUs to enhance in real time.

Anyone watch those videos? Ikena really does seem capable of doing some really CSI-like things, particularly the one on text enhancement or this one where they take you through getting a license plate number from a camera phone.

The camera wobble in the licence plate example actually helps to improve the final quality - it effectively increases the resolution of the capture across multiple frames, because the pixels aren’t each bound to an absolute region of the image. Similar techniques are used in astronomy and microscopy; it’s even something our own eyes do.

It is essentially spatial dither. Although a wobbly camera won’t do as good a job as a dedicated system, the additional information will be captured, and once you know the information is there, it is just a matter of working out how to drag it out. Dither is often badly misunderstood, but it is one of the critical aspects of any sampling system. It is the way the system reaches under the apparent quantisation floor and meets the Shannon limits on information transfer. On the other hand, understanding Shannon is also critical, as a lot of CSI-style enhancements are actually bogus, used more for the sake of telling a story than as a representation of real practice.
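A small NumPy demonstration of dither reaching under the quantisation floor: a value lying between quantiser steps is destroyed by plain rounding, but recovered by averaging many dithered samples (the value, step size, and sample count are arbitrary):

[code]
import numpy as np

rng = np.random.default_rng(0)
true_value = 0.3    # lies between quantisation steps
step = 1.0          # coarse quantiser: outputs only 0.0, 1.0, ...

# Without dither: every sample quantises to the same wrong value.
undithered = np.round(np.full(10_000, true_value) / step) * step

# With dither: add noise before quantising, then average many samples.
dithered = np.round(
    (true_value + rng.uniform(-0.5, 0.5, 10_000)) / step) * step

print(undithered.mean())  # 0.0 -- the 0.3 is simply gone
print(dithered.mean())    # ~0.3 -- recovered from under the floor
[/code]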

Agreed - any time you see ‘enhance’ on TV (and especially in CSI) or in movies, it’s absurdly fanciful and what we’re shown is just a reversed sequence of a detailed image being pixellated.

However, there are certain kinds of unclear image that can be enhanced - where the detail isn’t exactly lost, but has been obscured in a way that can be formulaically undone. Unfocused images, or those affected by linear motion blur, for example, can be deconvolved.
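A hedged sketch of that kind of formulaic undoing: if the blur kernel (point spread function) is known, division in the frequency domain inverts it. Pure NumPy; the regularisation term `eps` is needed because naive division blows up the noise:

[code]
import numpy as np

def deconvolve(blurred: np.ndarray, psf: np.ndarray,
               eps: float = 1e-3) -> np.ndarray:
    """Invert a known blur in the frequency domain (Wiener-style
    regularisation via `eps`). This only works because the blur is a
    formulaic, undoable operation, as described above."""
    H = np.fft.fft2(psf, s=blurred.shape)  # transfer function of the blur
    B = np.fft.fft2(blurred)
    # Wiener-like inverse filter: H* / (|H|^2 + eps)
    restored = np.fft.ifft2(B * np.conj(H) / (np.abs(H) ** 2 + eps))
    return np.real(restored)
[/code]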

Enhance! A nice compilation of the Hollywood Filter in action.