Camera Lenses Part 4: Aperture Effect and Digital Filters

In this tutorial on the science of lenses (Part 4), John Watkinson examines lens resolution and discusses how to determine the lens performance needed to get the maximum performance from your camera sensor.


This article was first published in 2015. It and the rest of this mini series have been immensely popular, so we are re-publishing it for those who missed it first time around.


It’s straightforward to determine from theory or to actually measure how a still image will turn out, because nothing moves. I can see why photographers worry about it, but it’s less obvious why videographers and cinematographers should be hung up on static resolution, because their images are supposed to give the illusion of movement. Although we are discussing static resolution here, whether it tells us much about the portrayal of moving pictures is another matter, to which I will return.

If we want to know the actual static resolution performance of a real digital camera, we have to consider all of the mechanisms that are going to restrict resolution. There are basically four mechanisms in series compounding to soften your still picture: the lens, the anti-aliasing filter, the sensor aperture effect and the effect of any Bayer pattern.

Camera beam splitters separate the image into three components: red, green and blue. Image courtesy How Things Work.com.

Whilst the three-chip camera with a beam splitter has been a mainstay of television for a long time, the move to UHDTV calls that construction into question because the beam splitter becomes very heavy in large sensor sizes (weight goes as the cube of the size) and the power consumption becomes an issue. Power consumed turns into heat, which in the case of image sensors means higher noise, which we don’t want. It may be that future UHDTV cameras will all use single sensors having Bayer patterns. I wrote about these in an earlier article (Pixels, Photosites, Resolution and all that…) where I explained that the Bayer pattern designates different photo-sites to the sensing of different colours. The ratio of photo-sites to output pixels is about four to one.   

The use of discrete photo-sites means we are sampling the image falling on the sensor, and we have to consider sampling theory to understand what is happening. Sampling theory requires that energy above half the sampling rate is kept out of the system. Unless such energy is naturally absent, means to prevent aliasing are required. A physically identifiable anti-aliasing filter may be used, but the lens itself also acts like a filter.

In audio, an anti-aliasing filter is pretty easy, because we are filtering a bipolar electrical waveform, say from a microphone, and we can use an electronic filter that can have a bipolar impulse response; one that goes above and below zero Volts. We need bipolar impulses to get a steep cut-off slope. Unfortunately in the optical domain, bipolar impulse responses are simply impossible because there is no such thing as light of negative brightness. Optical anti-aliasing filters work by birefringence, essentially seeing double in a controlled fashion, which is used to create a positive-only point-spread effect. The problem is that the cut-off slope is poor, so if aliasing is completely prevented, a lot of in-band detail is also lost.
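
To see why, here is a toy model (an illustration only, not the specification of any real filter) that treats the birefringent filter as a point split into two equal spots one photo-site pitch apart. The response of that double image only reaches zero at half the sampling rate and sags heavily well inside the band:

```python
# Toy model of a birefringent anti-alias filter: a point split into two
# equal spots one photo-site pitch apart. Its frequency response is
# |cos(pi * f * pitch)|, which nulls only at half the sampling rate and
# loses a lot of in-band detail on the way there.
import numpy as np

pitch = 1.0                      # photo-site pitch, arbitrary units
nyquist = 0.5 / pitch            # half the spatial sampling rate

f = np.array([0.25, 0.5, 0.75, 1.0]) * nyquist
response = np.abs(np.cos(np.pi * f * pitch))

for freq, r in zip(f, response):
    print(f"{freq / nyquist:.2f} x Nyquist: response {r:.2f}")
# 0.25 x Nyquist: 0.92,  0.50: 0.71,  0.75: 0.38,  1.00: 0.00
```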

Those of us with white hair, reminiscing from our rocking chairs, remember the good ol’ days of NTSC, where a guy in a patterned suit at just the right distance from a camera would generate 3.58 Megacycles per second in our video and turn a funny colour. To test a camera for aliasing, you need to find the same guy, or some other source of repetitive fine detail. Zoom out slowly and see what happens when the modulation depth of the detail starts falling.

In all practical sensors, the photo-sites are arranged very nearly to touch one another in order to gather up all the light the lens delivers. Straight away we are violating sampling theory, which holds that we can only use two samples per cycle if the samples are vanishingly small. There is a parallel here with the Point Spread Function (PSF) of a lens, which for the best resolution should also be vanishingly small. The PSF of a lens is sombrero-like, whereas the photo-site has a rectangular aperture because the sensitivity is the same all over. This aperture is sometimes called the Instrument Spread Function (ISF).

Fig.1. All sensors and displays have a spatial frequency response that falls like this because sampling is done with finite areas instead of with vanishingly small points. Note that the effects of the camera and display are in series.

Figure 1 shows that the frequency response of a 100 per cent rectangular aperture, corresponding to photo-sites that touch, looks rather like a bouncing ball. The part we are interested in is up to half the sampling rate, where the response is 0.64 on a linear scale or -4dB if you want to impress. That’s not all; every electronic display does the same: the light is radiated from photo-emissive sites that practically touch, so the effect is doubled.
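
For anyone who wants to check those numbers, here is a quick sketch of the aperture-effect calculation, assuming an ideal 100 per cent rectangular aperture:

```python
# Aperture effect of a 100 per cent rectangular aperture (photo-sites
# that touch). The response is sin(pi*f*pitch)/(pi*f*pitch); at half
# the sampling rate that is 2/pi, about 0.64 or roughly -4 dB. A display
# with the same aperture effect acts in series, so the losses multiply.
import numpy as np

pitch = 1.0
nyquist = 0.5 / pitch

camera = np.sinc(nyquist * pitch)   # numpy's sinc is sin(pi x)/(pi x)
chain = camera * camera             # camera and display in series

print(f"camera alone  : {camera:.3f} ({20 * np.log10(camera):.1f} dB)")
print(f"camera+display: {chain:.3f} ({20 * np.log10(chain):.1f} dB)")
# camera alone  : 0.637 (-3.9 dB)
# camera+display: 0.405 (-7.8 dB)
```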

It’s easy to compare two cameras without the display confusing the result. Simply pass the pictures through a decent DVE and blow them up by a factor of two before comparing.

Sampling and Anti-Aliasing
The consequence of anti-aliasing and aperture effect is that no n-line TV system can ever have n lines of resolution, so it is pretty inefficient to transmit n lines between camera and TV. When it was all done with vacuum tubes, there wasn’t much alternative, but now that we have as much cheap signal processing as we want, we continue making systems like that for reasons that have nothing to do with technology. Reasons like tradition, which has always been an easy substitute for thought.

Using oversampling, we can do things rather better. Referring to Figure 2, we begin by specifying the resolution we actually want, in terms of lines, then we build a Bayer sensor and a matching anti-aliasing filter having twice as many lines, which means four times as many photo-sites. We set the lens performance so that the MTF starts falling at the edge of the band we want, so that it augments the anti-aliasing. Most of the loss due to lens MTF, aperture effect, Bayer pattern and filter slope is then outside the band we want.
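
The arithmetic is simple enough to sketch; the 1920-line HD target below is just an assumed example:

```python
# Back-of-envelope arithmetic for the oversampling scheme just described.
target_lines = 1920                          # the resolution we actually want
sensor_lines = 2 * target_lines              # oversampled Bayer sensor: 3840 lines
photosite_factor = (sensor_lines / target_lines) ** 2
print(sensor_lines, photosite_factor)        # 3840 lines, 4.0x the photo-sites
```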

Fig.2. The camera aperture effect can be diminished by oversampling. The response, centre, down-sampled from a higher line count camera, top, outperforms the response of a camera working at its native line count, bottom.

As part of the de-Bayering interpolation, the sensor output is down-sampled to the line count we actually want using a digital filter, which can have a sharp, bipolar impulse response. We then have the video signal that would have come from an unobtainable ideal camera having that number of lines. In order to overcome the aperture effect of the display, we simply build the display with twice as many lines as the signal and up-convert. The result is a sharp picture with no visible line structure and a surprisingly low line count in the signal transmitted between the two.
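
As an illustration of that down-sampling step, here is a minimal sketch of a 2:1 decimation using a windowed-sinc FIR; the tap count and the Hamming window are assumptions for the example, not any camera's actual design:

```python
# Minimal sketch of 2:1 decimation of one scan line with a windowed-sinc
# FIR. The point is that the taps go negative (bipolar), giving a sharp
# cut-off that an optical filter cannot achieve.
import numpy as np

def decimate_by_two(line: np.ndarray, taps: int = 31) -> np.ndarray:
    """Low-pass a line to the new Nyquist limit, then keep every second sample."""
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(0.5 * n)             # ideal cut-off at a quarter of the old sample rate
    h *= np.hamming(taps)            # window to control ripple
    h /= h.sum()                     # unity gain at DC
    return np.convolve(line, h, mode="same")[::2]

oversampled = np.random.rand(3840)   # e.g. one row from a 3840-site sensor
output = decimate_by_two(oversampled)
print(output.shape)                  # (1920,) - the line count we actually want
```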

Some digital photographic cameras use an oversampling factor high enough that the sensor bandwidth lies beyond the diffraction limit of any real lens. In that case no anti-aliasing filter is needed because the lens does the filtering. This is easier in photography because the high pixel count and the consequent heavy computing load do not have to be repeated at video frame rates.

Incidentally, the use of oversampling also allows concealment of defective photo-sites to be essentially invisible. Interpolation from adjacent sites produces a value that, after down sampling, is indistinguishable from the correct value. This is important because it means the chip manufacturers have fewer rejects, which lowers product cost. It’s the same reason error correction is used in flash memory cards.
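
As a toy illustration (not any manufacturer's actual algorithm), concealment can be as simple as averaging the nearest same-colour neighbours along a Bayer row before down-sampling:

```python
# Toy defect concealment: a flagged photo-site is replaced by the mean of
# its nearest same-colour neighbours along a Bayer row, which sit two sites
# away. After down-sampling, the repaired value is effectively
# indistinguishable from the true one.
import numpy as np

def conceal_dead_sites(row: np.ndarray, dead: np.ndarray) -> np.ndarray:
    fixed = row.astype(float).copy()
    for idx in np.flatnonzero(dead):
        neighbours = [row[i] for i in (idx - 2, idx + 2) if 0 <= i < row.size]
        fixed[idx] = np.mean(neighbours)
    return fixed

row = np.array([100, 50, 104, 52, 0, 54, 108, 56], dtype=float)
dead = np.zeros(row.size, dtype=bool)
dead[4] = True                                   # photo-site 4 is stuck at zero
print(conceal_dead_sites(row, dead))             # site 4 becomes (104 + 108) / 2 = 106
```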

Lens Resolution
We can lump together the effects of the anti-aliasing filter, the aperture effect of the sensor and the effect of any Bayer pattern after our de-Bayering and down-sampling process, call the result the effective sensor resolution, and compare that with the lens resolution.

Clearly, to obtain the best value for money, we want a system where the loss of sharpness due to both effects is about the same. In other words if we had too many pixels in our sensor, we would be wasting money on the sensor because the picture would be limited by the lens, whereas if we had insufficient pixels in the sensor our lens would be over-specified.

What we are looking for is a reasonable match. This means that the lens point spread function needs to be more or less the same diameter as four Bayer photo-sites in a square, because those four sites produce more or less one output pixel value.

It’s not difficult to estimate the match. You need to know the f-number of the lens, the width of the sensor and the number of pixels in a line. The width of the lens point spread function can never be smaller than the diffraction limit, and for green light, which is in the middle of the visible spectrum, the diameter in micrometres is given by 1.22 x the f-number.

Make the Calculation
Let’s take an actual example that I know works, based on my medium format digital photographic camera. I decided that, with some remarkably good lenses available for medium format (60mm x 45mm film frames), upgrading such a camera to digital would be a killer combination. I figured f8 would be a common aperture to run at, and the lens PSF could not be better than about 10 micrometres at that aperture.

Typical digital sensors for medium format images are about 50mm across, so dividing 50mm by 10 micrometres I came up with an effective pixel count of 5000. Blow me down, as they say, if there wasn’t a 22 Megapixel sensor available with 5344 effective pixels across. Although higher pixel counts are available, I figured they would be beyond what physics allows the lenses to deliver and would simply be noisier.
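
For anyone who wants to repeat the arithmetic, here is a sketch of the calculation. The 0.5 micrometre wavelength assumed for green light is what turns the Airy diameter, 2.44 x wavelength x f-number, into the 1.22 x f-number rule used above:

```python
# Sketch of the lens/sensor matching calculation described above.

def diffraction_spot_um(f_number: float, wavelength_um: float = 0.5) -> float:
    """Smallest possible point-spread diameter, in micrometres."""
    return 2.44 * wavelength_um * f_number

def matched_pixels(sensor_width_mm: float, f_number: float) -> int:
    """Approximate horizontal pixel count that matches the lens at this aperture."""
    return round(sensor_width_mm * 1000 / diffraction_spot_um(f_number))

print(diffraction_spot_um(8))     # 9.76 micrometres - "about 10"
print(matched_pixels(50, 8))      # 5123 - in line with the "about 5000" above
```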

What About 4K?
Smaller formats can use larger apertures, so for a TV camera let’s pick f4. For an f4 lens the PSF would be nearly 5 micrometres across. The sensitive area of a 2/3 inch camera chip is about 9mm across, so if we divide the width by the PSF size we get 9000/5 or 1800. From a resolution standpoint, that’s the rough number of pixels we need to match the lens performance. That’s not far from the 1920 pixels per line of an HDTV format, so we can say that a 2/3 inch sensor is a reasonable choice for an HDTV camera because the pixels are about the same size as the lens PSF. However, if we stop down to f8, the PSF size doubles, the resolution is halved and our HD camera becomes an oversampling SD camera.

It should be clear that in the case of consumer HD cameras that use 1/3 inch sensors the only thing that is HD is the printing on the box. Even though the sensors have the full HD pixel count, the images are lens limited.

What about UHDTV, the so-called 4K format? Well, firstly 4K has 4096 pixels across the screen, as used in digital cinema, whereas UHDTV has only 3840 pixels across the screen but is still referred to as 4K. Don’t ask me why or I might get grumpy. Let’s say we want to work down to f4, where our lens PSF is nearly 5 micrometres. If we want to resolve about 4000 of those, we need a sensor about 20mm across. That’s about the width of a 35mm movie film frame. Isn’t life strange?
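
Running the same arithmetic for the HD and 4K cases gives numbers close to those quoted; the sensor widths used here are the approximate figures from the text:

```python
# The matching rule applied to the HD and 4K cases discussed above,
# with the spot diameter taken as 1.22 x f-number micrometres.
for label, width_mm, f_number, target in [
    ("2/3 inch HD", 9.0, 4, 1920),
    ("UHD / 4K", 20.0, 4, 4000),
]:
    spot_um = 1.22 * f_number                 # diffraction-limited spot diameter
    pixels = width_mm * 1000 / spot_um
    print(f"{label}: {spot_um:.1f} um spot, supports about {pixels:.0f} pixels "
          f"(target {target})")
# 2/3 inch HD: 4.9 um spot, supports about 1844 pixels (target 1920)
# UHD / 4K: 4.9 um spot, supports about 4098 pixels (target 4000)
```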

Links to the preceding three parts of this series on camera lens technology can be found below. 
