GorillaMan, there is one sentence from your link with which I can half-agree, and that is where he says:
[I agree with the first half. He’s no image specialist.]
IMHO, the crucial phrase in your OP is:
While I have no particular knowledge of the exact compression algorithms used in the CCTV cameras at Luton Station, I do have some experience with video image compression and manipulation.
Good old-fashioned analog video signals used to be recorded on videotape (or, if appropriate, broadcast over the airwaves), with a storage capacity / bandwidth dependent on the spatial resolution, framerate, and intensity depth (including color information). This method could be made fairly free of artifacts, but consumed a large storage capacity or bandwidth for the resulting quality.
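To put numbers on that, here’s a quick back-of-the-envelope calculation. The resolution, frame rate, and color depth below are illustrative assumptions on my part, not the specs of any particular camera:

```python
# Back-of-the-envelope data rate for uncompressed digital video.
# PAL-ish CCTV resolution, 25 fps, 24-bit color -- all assumed figures.

width, height = 704, 576      # pixels per frame (assumed)
fps = 25                      # frames per second (assumed)
bytes_per_pixel = 3           # 24-bit RGB (assumed)

bytes_per_second = width * height * bytes_per_pixel * fps
print(f"{bytes_per_second / 1e6:.1f} MB/s uncompressed")
print(f"{bytes_per_second * 86400 / 1e9:.0f} GB/day per camera")
```

That works out to roughly 30 MB every second, or on the order of two and a half terabytes per day for a single uncompressed camera, which is exactly why nobody records it that way.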
In the realm of digital video, compression is used to minimize the storage and/or bandwidth requirements while maintaining the desired image quality. Without this technology, we wouldn’t have DVDs or streaming media. One of the most frequently used video compression methods is key frames plus delta frames. Key frames are sent every nth frame, and provide intensity information for every single pixel in the original image coming from the camera sensor (CCD or CMOS). The remaining frames are delta frames (called P-frames or B-frames in some naming conventions), which provide information only on those pixels whose intensities have changed since the reference frame (usually the previously decoded frame). Many of the advances in video compression involve determining how often to send key frames, and developing (often very different) algorithms for key frame compression and delta frame compression.
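Here’s a toy sketch of the key/delta idea. This is my own simplified illustration, not any real codec; the interval and threshold parameters are invented for the demo:

```python
import numpy as np

def encode(frames, key_interval=10, threshold=8):
    """Toy key/delta encoder: every key_interval-th frame is stored whole;
    in between, only pixels that changed by more than `threshold` since
    the previously reconstructed frame are stored."""
    encoded = []
    prev = None
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            encoded.append(("key", frame.copy()))
            prev = frame.copy()
        else:
            changed = np.abs(frame.astype(int) - prev.astype(int)) > threshold
            encoded.append(("delta", changed, frame[changed]))
            prev[changed] = frame[changed]   # reference tracks the stream
    return encoded

def decode(encoded):
    """Rebuild full frames by applying each delta to the running image."""
    frames, current = [], None
    for entry in encoded:
        if entry[0] == "key":
            current = entry[1].copy()
        else:
            _, changed, values = entry
            current[changed] = values
        frames.append(current.copy())
    return frames
```

With a static background, the delta frames store almost nothing, which is the whole point: you only pay for the pixels that move.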
Closed-Circuit Television (CCTV) cameras, such as those in the OP’s link for the Luton Station CCTV, are ideally suited for video compression. Although some of them have pan/tilt/zoom capabilities, most unattended ones are focussed on a static image field. Thus, key frames can be recorded fairly infrequently, and most of the information is in the delta frames, which show the movement of people. In addition, there is a huge incentive to reduce the data rate, since one wants to be able to store CCTV video images for several days per camera location, and that could involve hundreds of gigabytes per day for a multi-camera site.
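For a rough sense of scale (camera count, compressed bit rate, and retention period are all assumptions of mine, purely for illustration):

```python
# Rough storage estimate for a multi-camera CCTV site.
# All three figures below are assumed for the sake of the example.

cameras = 16                 # assumed number of cameras on site
compressed_mbit_s = 2.0      # assumed per-camera rate after compression
retention_days = 7           # assumed retention requirement

gb_per_camera_day = compressed_mbit_s / 8 * 86400 / 1000
total_gb = gb_per_camera_day * cameras * retention_days
print(f"{gb_per_camera_day:.1f} GB/camera/day, {total_gb:.0f} GB for the site")
```

Even after heavy compression, a modest site fills terabytes of storage in a week, so the operators have every reason to squeeze the data rate hard.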
The more aggressive the algorithm used to reduce the data rate, the more likely it is that “artifacts” will enter the video image, and the most common artifact is a ghost from the most recent key frame. In many cases, this means that images of static objects appear mixed in with the live image.
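Here’s a toy demonstration of how that ghosting arises when the change threshold is set aggressively; all the numbers are invented for the demo:

```python
import numpy as np

# Toy demo: an aggressive change threshold makes a faint object
# invisible in the delta frame, so the decoded video keeps showing
# the stale key frame -- a "ghost". Numbers are illustrative only.

key = np.zeros((4, 4), np.uint8)          # key frame: empty scene
live = key.copy()
live[1, 1] = 20                           # object appears, but faintly

threshold = 32                            # aggressive: small changes dropped
changed = np.abs(live.astype(int) - key.astype(int)) > threshold
decoded = key.copy()
decoded[changed] = live[changed]          # nothing passes the threshold

print(decoded[1, 1])   # still 0: the key frame "ghosts" over the live scene
```

The real scene has changed, but the reconstructed one hasn’t: the stale key frame shows through wherever the changes were too small to survive compression.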
The above artifact problem can get worse if one does post-acquisition processing on certain “regions of interest” (ROI), at the expense of the rest of the image. It is entirely possible that, while enhancing the image of the guy with the backpack in the foreground of the Luton image, various key+delta frames were added together. If “backpack kid” and “white hat” were walking at different speeds, it wouldn’t be unexpected to find that image enhancement of the former causes the latter to get a railing through his head. It’s also possible that image enhancement of “white hat” would cause “backpack boy” to grow a second head.
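As a toy illustration of what naively adding frames of a moving subject does (my own simplified example, not whatever enhancement pipeline was actually used on the Luton image):

```python
import numpy as np

# Toy demo of "adding key+delta frames together": averaging several
# frames of a walking figure leaves a faint copy of him at each
# position -- the kind of ghost that puts a railing through a head.

frames = []
for x in range(3):                        # figure moves one step per frame
    f = np.zeros((1, 5), float)
    f[0, x] = 255.0                       # bright figure at position x
    frames.append(f)

enhanced = np.mean(frames, axis=0)        # naive multi-frame "enhancement"
print(enhanced)   # three faint ghosts at 85.0 each, no solid figure
```

Nobody deliberately stacks frames like this, but ROI enhancement that trades off the rest of the image can produce exactly this kind of smearing for anything moving at a different speed than the enhanced subject.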
The ideal solution would be to record all parts of the image at maximum frame rate, but that makes storage of the information prohibitively expensive. Since we can’t get at the raw image from the camera, I would at least want to see the various “key” and “delta” frames before claiming that someone is inserting phantom characters.
There’s also the point that, if you were going to go to the trouble of inserting a person into the image, wouldn’t you do a better job of it than the OP’s link photo shows?
[My credentials? I’m currently working on an imaging application in which the information from a high-resolution, high-framerate camera is compressed to allow streaming broadcast of sporting events that have previously been considered difficult to broadcast via TV.]