CMOS Sensors in DSLR Cameras

There are a lot of articles and videos out there about CMOS sensors in digital cameras and the rolling shutter effects they produce. That’s pretty easy to understand: the sensor is activated and read line-by-line from top to bottom, causing distortion in fast-moving subjects. A DSLR in photo mode, however, uses a mechanical shutter with two physical curtains that move across the front of the sensor. In longer exposures, the sensor is completely open to light, as the front/bottom curtain drops out of the way before the rear/top curtain even begins to descend. At higher shutter speeds, the rear curtain follows the front curtain so closely that only a narrow slit is left between the two curtains as they move across the sensor, effectively a mechanical rolling shutter. There’s an excellent video about it all here: Inside a Camera at 10,000fps - The Slow Mo Guys - YouTube
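Just to put rough numbers on that slit (these figures are assumed for illustration, not taken from any particular camera), the back-of-the-envelope math looks something like this:

```python
# Back-of-the-envelope slit math (assumed numbers, not from any specific camera).
curtain_travel_s = 3e-3      # assume a curtain takes ~3 ms to cross the sensor
sensor_height_mm = 24.0      # full-frame sensor height
shutter_speed_s = 1 / 4000   # exposure time seen by any one point on the sensor

curtain_speed_mm_s = sensor_height_mm / curtain_travel_s   # 8000 mm/s
slit_width_mm = curtain_speed_mm_s * shutter_speed_s        # ~2 mm

print(f"curtain speed: {curtain_speed_mm_s:.0f} mm/s")
print(f"slit width:    {slit_width_mm:.1f} mm")
```

So at high shutter speeds the whole frame is swept by a slit only a couple of millimetres wide.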

The thing that I can’t quite wrap my head around is how the sensor in a DSLR functions when in normal photo mode. From 3:49 to 4:40 in the video he explains that the mechanical shutter is a rolling shutter (at high shutter speeds anyway), and that in video mode the electronic shutter is also a rolling shutter. That all sounds fine, but then he mentions that the reason for the electronic rolling shutter is that the camera/sensor doesn’t have enough power to read the entirety of the sensor all at once (which would be a global shutter). So then what is the sensor doing in photo mode? I assume the entire sensor has to be powered while the shutter curtains are moving/open. But then why doesn’t that cause a power issue? Is it because if no light is hitting those pixels on the sensor then there’s effectively no signal/charge to dump out of them? So it’s technically “all on” but only really “active” where the light’s hitting it? I don’t imagine that the sensor would try to “follow” the mechanical shutter with its electronic shutter; that just seems like it would be fraught with timing problems.

Then what of the live display/preview that can be enabled on DSLRs and is the primary raison d’être of mirrorless cameras? Is the active sensor just downsampling the image to such a low resolution that it can handle the constant signal output without issue? Help me out here.

I’m not entirely sure what you’re asking here and I’ve been out of the camera game for a long time, but at least the last time I bought one, you couldn’t get a DSLR that had any type of ‘live preview’. At best, you could probably get a display that simply showed what you’d see looking in the eyepiece. That is, you’d have what would amount to a second camera/sensor that looks where your eye would be looking…reflecting off the mirror and through the lens.

By its very nature, the sensor on a DSLR camera is blocked by the mirror when it’s not being used.
Most cameras have a feature that holds the mirror up, but that’s meant for specific circumstances. Namely, removing some of the vibration for long exposures. Of course, since the mirror is locked up, you can’t see anything so you have to get everything set beforehand.

My WAG is that a DSLR-style camera that has a live video feed isn’t actually an SLR at all. It just behaves like one: the ‘shutter’ is entirely electronic (it just reads/records the sensor for a set amount of time when you press the button) and the video preview is just using whatever percentage of the pixels it’s able to process to give you a picture that’s not lagging.

Kind of a ‘best of both worlds’ type thing. It gives people the ability to use the camera like the point & shoot cameras they’re used to but also gives them access to real lenses.

Live view flips up the mirror, opens the shutter, and shows what it sees on the LCD screen in back, so you can’t look through the viewfinder anymore. It’s basically video playthrough, only it’s not actually recording. When you do want to record video, it has to be in live view mode for the same reason, because then it’s using the electronic shutter exclusively. If you use live view for photo taking, it’s very slow because it then has to de-energize the sensor, close the shutter, do a normal actuation, then open the shutter back up and re-energize the sensor for the live view. It may even put the mirror back down to get a proper light reading or refocus, but I’m not sure about that. Either way there’s a lot of ka-chunk ka-chunking and the whole process can take 1-2 seconds.

Consider a single row of a CMOS sensor. All the pixels in the row are reset simultaneously; then the pixels sit there integrating photons for the exposure time; then the pixel values are read out in parallel. Most of the time the pixels are in the integration phase.

The operation of each row is staggered in time - thus the rolling shutter. For a sufficiently long exposure time there is a substantial period where all of the pixels are integrating photons - the last row has been reset, but the first row has not been read out. If the exposure is controlled by a mechanical shutter, the mechanical shutter operates during this period. So the sequence is (1) reset rows in sequence; (2) operate mechanical shutter; (3) read out all rows in sequence.
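As a toy timing sketch of that sequence (all numbers are purely illustrative, not from any real sensor):

```python
# Toy timeline: rolling reset, mechanical exposure, rolling readout.
# All numbers are purely illustrative.
ROWS = 4000
ROW_RESET_US = 5        # assumed time to reset one row (microseconds)
ROW_READ_US = 5         # assumed time to read out one row (microseconds)
SHUTTER_MS = 10         # assumed time for the curtains to open, sweep, and close

t_ms = 0.0

# (1) reset rows in sequence, shutter still closed
t_ms += ROWS * ROW_RESET_US / 1000.0
print(f"{t_ms:6.1f} ms: all rows reset, every pixel integrating (in the dark)")

# (2) mechanical shutter sets the exposure; the sensor just sits and integrates
t_ms += SHUTTER_MS
print(f"{t_ms:6.1f} ms: curtains have swept past, shutter closed again")

# (3) read out rows in sequence, still in the dark, so the stagger can't smear the image
t_ms += ROWS * ROW_READ_US / 1000.0
print(f"{t_ms:6.1f} ms: all rows read out")
```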

Hope this makes sense. There are likely many variations on this theme.

Marvin the Martian explained it very well. With a mechanical shutter all the CMOS rows are reset while the mechanical shutter is closed, then the rolling mechanical shutter does its thing, and then after the shutter is closed again the CMOS is read row by row. Effectively, the CMOS is behaving like it was film. In this case, only the mechanical shutter is contributing to the rolling shutter effect.

I agree the video is a bit confusing in that regard. First, I’m not sure what he means by “power”. There are CMOS sensors that are “global shutter”, and while they do have extra components on the chip, which will require a bit more power and therefore generate more heat, I don’t think that’s a very big limitation.

The problem with global shutter seems primarily to be component density on the CMOS chip. The global shutter ability requires more tiny transistors than the standard rolling shutter chip. More transistors also means more wires between them (not really wires), and those transistors and wires have to go somewhere. The solution is to put them between the light sensing parts, so you have to make those smaller or not have as many of them. Smaller ones gather less light and have more noise. Having fewer works better, but then the marketing department isn’t happy because pixel count is what most people pay attention to.

(Disclaimer: I’m not an IC designer by any stretch, so I would welcome someone with experience in that area to correct me if I got something wrong.)

My assumption is that the live display is just using standard electronic shutter mode. No need to downsample because of the CMOS or any power issues. Keep in mind, many modern DSLR cameras can record 4K at 60 FPS. The only downsampling will be due to the display itself, not power. If the frame rate and image quality of the display were high enough you could probably see the rolling shutter effect in live display mode as well.

So if I understand correctly, after an electronic clear/wipe pass, the sensor is basically blanked out (let’s call it discharged) and ready to be exposed. Then the mechanical shutter opens/slides by, and the image is charged onto the sensor. Once the mechanical shutter has closed, the image is read out to memory line-by-line. So it’s basically a mechanical shutter actuation followed by an electronic shutter actuation. Do I have that basically right? Does the sensor then need to be wiped again, or does reading out the image also flush it?

As for global vs rolling electronic shutters, as far as I can tell, heat and power delivery were more of an issue in the past but aren’t much of a factor today. What does remain a factor, however, is read-out time. The advantage of a rolling electronic shutter is that it can read out the rows as a basically continuous stream of data; it’s scanning constantly like a CRT display or a VCR. With a global electronic shutter, it has to wait for the entire sensor’s data to be dumped out before it can expose another image, much like a film camera that needs a moment to advance the film to the next unexposed frame. That’s a big burst of data coming all at once that needs to be moved out of the way ASAP, followed by a period of inactivity where no data transfer is happening.

I think your above summary is correct, but with a few caveats.

To avoid confusion I would avoid thinking of it as being followed by an electronic shutter (the “electronic shutter actuation” part). Calling it an electronic shutter in that context implies that the exposure is determined by how long the sensor is allowed to collect light, basically the timing between clear and read, but that’s an incorrect implication with a mechanical shutter. The sensor should not be receiving light while the shutter is covering it, so the delay between clear and read doesn’t dictate the exposure time (although maybe there’s some internal discharge time that might need to be taken into account). The camera could start reading a row as soon as the shutter has passed that row, but I don’t know if any cameras do that. It seems simpler just to wait until the shutter is completely closed.

If I understand the technology correctly, reading the pixel (or cell) doesn’t by itself clear the value, and it does need a clear/reset (or, as you said, a discharge). That’s where I come to the end of my technical understanding, though. It looks like there are many different ways of implementing the reset based on trade-offs like lag and noise.

If you haven’t already, you might want to read the Wiki on Active Pixel Sensor for more information.

That’s an interesting point about the read-out time, and how streaming affects this. Do you have any links to articles discussing this? I don’t have any good numbers on how long current CMOS sensors take to read a row, or the entire array. There are many different strategies and technologies for reading out the data, and I’m sure it varies chip to chip, or camera to camera, so the question is whether this affects the choice between global vs. rolling shutter. My intuition is that it isn’t a primary concern, but I could be wrong.

For a few examples, the video in the OP says his lower-resolution high-speed camera uses a global shutter, and it can record at 10,000 FPS, but his 4K camera uses a rolling shutter. I can imagine that this might have been a compromise due to CMOS read speeds, but when you’re dealing with 10,000 FPS it seems sensors can be designed where the read speed isn’t the bottleneck. Modern DSLR cameras are doing 20-30 megapixel continuous shooting (still photos with shutter) at 8-9 FPS, which makes me think the bottleneck is something other than the CMOS read speed, as they can record 4K at 30 FPS (I didn’t find a 60 FPS camera above 1080p, but I’m sure they are out there).

Another way to look at it is that if you are filming video at 1/120 shutter speed and are recording at 60 FPS, it has half a second, every second, to read data from the sensor. True, it might not be reading every row at video resolution, but it doesn’t seem like CMOS read speed is the issue.
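Roughly sketching those numbers (illustrative figures, counting raw pixels per second and ignoring bit depth, compression, and any line-skipping the camera does in video mode):

```python
# Rough pixel-throughput comparison (illustrative, ignoring bit depth,
# compression, and any line-skipping the camera does in video mode).
stills_mp = 24                     # ~24 MP stills
stills_fps = 9                     # continuous burst rate
video_mp = 3840 * 2160 / 1e6       # ~8.3 MP per 4K frame
video_fps = 30

print(f"stills: {stills_mp * stills_fps:.0f} MP/s")    # ~216 MP/s
print(f"video:  {video_mp * video_fps:.0f} MP/s")      # ~249 MP/s

# Time left over per frame when shooting 60 FPS video at a 1/120 s shutter:
frame_period_s = 1 / 60
exposure_s = 1 / 120
print(f"non-exposing time per frame: {(frame_period_s - exposure_s) * 1000:.1f} ms")
```

The two throughput figures come out in the same ballpark, which is part of why I suspect the limit is somewhere other than raw CMOS read speed.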

After some thought, I think the author of that video probably meant something more akin to how “powerful” the processor is, or to overall data bus bandwidth. If so, those seem to me to be more cost compromises than technical limitations of the CMOS. Also, keep in mind that the author of that video may have less technical understanding of how the camera is actually working than even I do.

If you want to get technical, a paper about global electronic shutter from 1997 can be found here: [PDF] Frame-transfer CMOS active pixel sensor with pixel binning | Semantic Scholar (PDF warning)

Also, what looks to be an interesting lecture about the very low levels of CMOS (but I only skimmed it): http://isl.stanford.edu/~abbas/ee392b/lect04.pdf (PDF warning)

I am an IC designer and have actually just finished a global shutter CMOS image sensor (not for a DSLR or any other consumer application).

Generally everyone’s comments are correct, but I can clear up one or two things.

The photosensitive element in a CMOS pixel is the photodiode. Inside the photodiode, optical photons are absorbed and generate electron-hole pairs, which are swept to opposite sides of the photodiode and accumulate as a stored charge across the photodiode junction.

When the photodiode is reset, it is connected to a defined voltage so that the charge stored is at a known level (actually most CMOS sensors use what is called a pinned photodiode which can be reset to a zero charge state).

During exposure, the photodiode is floating (disconnected from everything else) and photon-induced charge is accumulated. This is the *integration* time, since the charge in the photodiode is proportional to the integral of the photon flux entering the photodiode since the most recent reset.

At the end of the desired exposure time, the photodiode is read out - the charge on the photodiode is converted to a voltage by transferring the charge onto a capacitance (called the floating diffusion) which is connected to the gate of a source-follower buffer amplifier.

The readout is destructive - all charge is removed from the photodiode. In theory, this resets the photodiode; in practice, the readout may not completely remove all the charge, leaving a remnant on the photodiode which causes *lag* (bleeding from one image to the next). Also, using readout as the reset limits the integration time to a fixed value depending on the frame rate (frames/second). An explicit reset is generally used for these reasons.
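A very simplified single-pixel model of that cycle might look like the sketch below (it ignores noise, full-well capacity, and the real pinned-photodiode charge-transfer physics; the transfer_efficiency number is made up purely to show where lag comes from):

```python
# Very simplified single-pixel model: reset, integrate, destructive readout.
# Ignores noise, full-well capacity, and pinned-photodiode details;
# transfer_efficiency is a made-up number just to illustrate lag.

class Pixel:
    def __init__(self):
        self.charge = 0.0            # accumulated photo-charge (arbitrary units)

    def reset(self):
        # Connect the photodiode to a defined voltage: known starting state.
        self.charge = 0.0

    def integrate(self, photon_flux, exposure_s):
        # Stored charge is proportional to the integral of the photon flux.
        self.charge += photon_flux * exposure_s

    def read(self, transfer_efficiency=0.99):
        # Destructive readout: charge is transferred onto the floating diffusion.
        # Imperfect transfer leaves a remnant, which appears as lag next frame.
        signal = self.charge * transfer_efficiency
        self.charge -= signal
        return signal

p = Pixel()
p.reset()
p.integrate(photon_flux=1000.0, exposure_s=0.01)
print(p.read())      # ~9.9 units out
print(p.charge)      # ~0.1 left behind, which would bleed into the next image
```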

In a rolling shutter imager, the source follower outputs of an entire row of pixels are switched into a set of column amplifiers, one per column of the pixel array. The pixel values are sampled and then converted to digital, either by a high-speed analog-to-digital converter (ADC) shared across all columns or by a number slower ADCs shared across a few columns each.

In a global shutter imager, the source follower output of each pixel is sampled within that pixel, requiring at least an extra sampling switch, storage capacitor, and source follower. All the pixel outputs are sampled simultaneously onto these internal nodes, which are then disconnected from the first source follower so the photodiodes can be reset and another integration period can be started. Meanwhile, the stored pixel voltages are switched row-by-row onto the column amplifiers and then converted by the ADC(s).
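As a cartoon of the two readout sequences (purely illustrative; the in-pixel storage here stands in for the extra switch/capacitor/source follower):

```python
# Cartoon of rolling vs. global shutter readout sequencing (purely illustrative).
ROWS = 4

def rolling_shutter():
    # Each row runs its own staggered reset/integrate/read schedule,
    # feeding the column amplifiers as a continuous stream.
    for row in range(ROWS):
        print(f"row {row}: reset -> integrate -> read out to column amps")

def global_shutter():
    # All pixels sample onto their in-pixel storage at the same instant,
    # then the photodiodes reset and the next integration starts right away...
    print("all rows: sample onto in-pixel storage simultaneously")
    print("all rows: reset photodiodes, start next integration")
    # ...while the stored values are read out row by row through the column amps.
    for row in range(ROWS):
        print(f"row {row}: read stored value out to column amps / ADC")

rolling_shutter()
global_shutter()
```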

The main issue with global shutter imagers has already been mentioned: more circuit elements and interconnect are needed for each pixel, which reduces the area available for the photodiode and therefore reduces the quantum efficiency (QE) of the imager - the percentage of photons hitting the imager which are converted to useful output signal. However, advances in *backside processing* (flipping the imager die upside down and grinding it so that the bottom of the photodiode is at the top surface of the chip) prevent the interconnect (and possibly capacitors) from blocking light and degrading QE. Microlens design can improve QE by focusing light from the entire pixel area onto the photodiode, and stacked-die technology can allow some of the extra circuitry to be put onto a different chip.

Power is also increased in a global shutter imager - primarily the power used to turn the additional switches on and off. Simultaneous switching of the entire imager array does not increase power but can put higher peak currents on power supplies, making power distribution inside the chip more of an issue.

Wow, that’s about as thorough an explanation as one could hope for, Marvin. Thanks!