Visual Communication - Discussion of Information Theory of Visual Communication

This characterization addresses image systems that combine image gathering and display with digital processing for signal coding and image restoration. These systems enable the user not only to acquire images (take pictures or make videos) but also to transmit them and to alter their appearance. The alterations may vary widely, according to need or taste. Normally, however, the goal is to produce images that are sharp and clear, without perceptually annoying or misleading distortions, and to do so for a wide variety of scenes at the lowest cost in data transmission and storage. This goal, to consistently and economically produce the best possible images, leads the proper characterization of image systems into the realm of communication theory.

Communication theory is concerned explicitly with the transmission and processing of signals. Modern formulations of this theory are based largely on the classical works that Shannon (Ref. 1) and Wiener (Ref. 2) published over 50 years ago, in the late 1940s. Shannon introduced the concept of the rate of transmission of information (or information rate) in a noisy channel, and Wiener introduced the concept of the restoration of signals perturbed by noise with the minimum mean-square error (or, equivalently, the maximum-realizable fidelity). These concepts deal inherently with statistical properties of stochastic (random) processes and probability of occurrences rather than with prescribed periodic functions or transients and deterministic outcomes (Ref. 3).
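The two concepts can be illustrated with the textbook scalar case, which is a minimal sketch and not taken from the cited works: a zero-mean Gaussian signal of variance `signal_var` observed in independent additive Gaussian noise of variance `noise_var`. The function names are illustrative only.

```python
import math

def information_rate_bits(signal_var, noise_var):
    """Shannon information rate, in bits per sample, for a Gaussian
    signal in independent additive Gaussian noise."""
    return 0.5 * math.log2(1.0 + signal_var / noise_var)

def wiener_gain(signal_var, noise_var):
    """Scalar gain g that minimizes the mean of (s - g*(s + n))**2."""
    return signal_var / (signal_var + noise_var)

def minimum_mean_square_error(signal_var, noise_var):
    """Mean-square error that remains after applying the Wiener gain."""
    return signal_var * noise_var / (signal_var + noise_var)

s_var, n_var = 15.0, 1.0
print(information_rate_bits(s_var, n_var))       # 2.0 bits per sample
print(wiener_gain(s_var, n_var))                 # 0.9375
print(minimum_mean_square_error(s_var, n_var))   # 0.9375
```

In this simple case the two quantities are tied together: the minimum mean-square error equals `signal_var * 2 ** (-2 * H)`, where `H` is the information rate, so a higher information rate permits a higher fidelity of restoration.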

Fellgett and Linfoot (Refs. 4, 5) were the first, in 1955, to extend these concepts of communication theory to the characterization of image systems. Their extension addressed photographic image systems in which an objective lens forms a continuous image of the captured radiance field directly on film. The formation of this image is constrained by the blurring due to the objective lens and by the granularity of the film, much as the transmission of a signal in a communication channel is constrained by bandwidth and noise.

Digital image systems, however, are more complex. Normally, these systems encompass the following three stages (see Visual Communications Channel):

  1. Image gathering, to capture the radiance field that is either reflected or emitted by the scene and transform this field into a digital signal.

  2. Signal coding, to encode the acquired signal for the efficient transmission and/or storage of data and decode the received signal for the restoration of images. The encoding may be either lossless (without loss of information) or lossy (with some loss of information) if a higher compression is required.

  3. Image restoration, to produce a digital representation of the scene and transform this representation into a continuous image for the observer. The digital processing first minimizes the mean-square error due to the perturbations in image gathering, lossy coding, and image display; and it subsequently, if desired, enhances the visual appearance of the displayed image according to need or taste.

Human vision may be included as a fourth stage, to characterize image systems in terms of the information rate that the eye of the observer conveys from the displayed image to the higher levels of the brain.

These four stages are similar to each other in one respect: each contains one or more transfer functions, either continuous or discrete, followed by one or more sources of noise, also either continuous or discrete. The continuous transfer functions of both the image-gathering device and the human eye are followed by sampling that transforms the captured radiance field into a discrete signal with analog magnitudes. In most image-gathering devices, the photodetection mechanism conveys this signal serially into an analog-to-digital (A/D) converter to produce a digital signal. In the human eye, however, the retina conveys the signal in parallel paths from each photoreceptor to the cortex as temporal frequency-modulated pulse trains. Analog signal processing in a parallel structure has many advantages that are increasingly emulated by neural networks (Ref. 6).
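The structure of the image-gathering stage (a transfer function, then sampling, then noise, then A/D conversion) can be sketched in one dimension. This is an illustrative toy under stated assumptions, not the model of the cited references: the function name `gather_signal`, the symmetric blur kernel, the Gaussian noise, and the uniform quantizer are all hypothetical choices.

```python
import random

def gather_signal(radiance, blur_kernel, step, noise_sigma, bits, seed=0):
    """Blur, sample, add noise to, and quantize a 1-D radiance profile.

    radiance    : floats in [0, 1] on a fine grid (stand-in for the
                  continuous radiance field)
    blur_kernel : odd-length symmetric weights that sum to 1
    step        : keep every `step`-th blurred value (sampling)
    noise_sigma : standard deviation of the additive Gaussian noise
    bits        : word length of the A/D converter
    """
    rng = random.Random(seed)
    half = len(blur_kernel) // 2
    n = len(radiance)

    # Transfer function: weighted sum over neighbours (edges clamped).
    blurred = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(blur_kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += w * radiance[j]
        blurred.append(acc)

    # Sampling: a discrete signal that still has analog magnitudes.
    sampled = blurred[::step]

    # Noise source that follows the transfer function.
    noisy = [v + rng.gauss(0.0, noise_sigma) for v in sampled]

    # A/D conversion: clip to [0, 1] and round to one of 2**bits levels.
    top = 2 ** bits - 1
    return [round(min(max(v, 0.0), 1.0) * top) for v in noisy]

# A step edge on a fine grid, gathered at one sample per four grid points.
edge = [0.2] * 16 + [0.8] * 16
codes = gather_signal(edge, [0.25, 0.5, 0.25], step=4, noise_sigma=0.01, bits=8)
print(codes)
```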

This model of digital image systems differs from the classical model of communication channels in two fundamental ways:

  1. The continuous-to-discrete transformation in the image-gathering process. Whereas the performance of classical channels is constrained critically only by bandwidth and noise, the performance of image-gathering devices is constrained also by the compromise between blurring due to limitations in the response of optical apertures and aliasing due to insufficient sampling by the photodetection mechanism. This additional constraint requires a rigorous treatment of insufficiently sampled signals throughout the characterization of image systems.

  2. The sequence of image gathering followed by signal coding. Whereas the characterization of classical channels accounts only for the perturbations that occur at the encoder and decoder or during transmission, the characterization of image systems must account also for the perturbations in the image-gathering process prior to coding. These perturbations require a clear distinction between the information rate of the encoded signal and the associated minimum data rate or, more generally, between information and entropy.
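The aliasing named in the first item can be seen in a few lines. The sketch below, with hypothetical frequencies and function names, samples a cosine whose frequency lies above half the sampling rate and shows that its samples cannot be told apart from those of a lower-frequency cosine inside the sampling passband.

```python
import math

def sample_cosine(freq, sample_rate, count):
    """Return `count` samples of cos(2*pi*freq*x) taken at x = n/sample_rate."""
    return [math.cos(2.0 * math.pi * freq * n / sample_rate) for n in range(count)]

def alias_frequency(freq, sample_rate):
    """Frequency inside the passband [0, sample_rate/2] that produces
    the same samples as `freq`."""
    folded = freq % sample_rate
    return min(folded, sample_rate - folded)

sample_rate = 10.0   # samples per unit distance (hypothetical)
fine_detail = 7.0    # cycles per unit distance, above the limit of 5
apparent = alias_frequency(fine_detail, sample_rate)

high = sample_cosine(fine_detail, sample_rate, 20)
low = sample_cosine(apparent, sample_rate, 20)
print(apparent)                                     # 3.0
print(max(abs(a - b) for a, b in zip(high, low)))   # rounding error only
```

Blurring before sampling suppresses the fine detail that would otherwise fold back in this way, but it also attenuates detail inside the passband, which is the compromise described above.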

The extension of modern communication theory to the characterization of digital image systems (Refs. 7 to 10) establishes a cohesive mathematical foundation for optimizing the end-to-end performance of these systems from the scene (the original source) to the observer (the final destination) in terms of the following figures of merit:

  1. The information rate $\mathcal{H}_e$ of the encoded signal, the associated theoretical minimum data rate $\mathcal{E}_e$, and the information efficiency $\mathcal{H}_e/\mathcal{E}_e$. For lossless coding, the information rate $\mathcal{H}_e$ becomes the information rate of the acquired signal.

  2. The information rate $\mathcal{H}_d$ of the displayed image, the associated maximum-realizable fidelity $\mathcal{F}_d$, and the visual information rate $\mathcal{H}_o$ that the human eye conveys from the displayed image to the higher levels of the brain.
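To make the first figure of merit concrete, the following is a rough one-dimensional sketch of the information rate of the acquired signal. It is not the formulation of Refs. 7 to 10. It assumes a Gaussian scene with a simple unit-variance power spectral density, a Gaussian spatial frequency response, white photodetector noise, a unit sampling interval, and it treats the aliased signal components as independent additive noise. All names and parameter values are hypothetical.

```python
import math

def information_rate(rho_c, snr, mu=3.0, n_freq=400, n_alias=20):
    """Approximate information rate, in bits per sample, of a 1-D
    sampled image-gathering process with unit sampling interval.

    rho_c : spatial-frequency scale of the assumed Gaussian response
    snr   : ratio of scene to noise standard deviation
    mu    : assumed mean spatial detail, in sampling intervals
    """
    def scene_psd(v):
        # Assumed unit-variance scene power spectral density.
        return 2.0 * mu / (1.0 + (2.0 * math.pi * mu * v) ** 2)

    def response(v):
        # Assumed Gaussian spatial frequency response.
        return math.exp(-(v / rho_c) ** 2)

    def gathered_psd(v):
        return scene_psd(v) * response(v) ** 2

    noise_psd = 1.0 / snr ** 2   # white noise over the unit-width passband
    dv = 1.0 / n_freq
    total = 0.0
    for i in range(n_freq):
        v = -0.5 + (i + 0.5) * dv   # midpoint rule over the passband
        signal = gathered_psd(v)
        # Sidebands folded into the passband by sampling, treated as noise.
        aliased = sum(gathered_psd(v - k) + gathered_psd(v + k)
                      for k in range(1, n_alias + 1))
        total += math.log2(1.0 + signal / (aliased + noise_psd)) * dv
    return 0.5 * total

for rho_c in (0.2, 0.4, 0.8, 1.6):
    print(rho_c, round(information_rate(rho_c, snr=32.0), 3))
```

Varying `rho_c` in such a sketch is one way to explore the compromise between blurring (small `rho_c`) and aliasing (large `rho_c`); the numbers it prints depend entirely on the assumed spectra and should not be read as results from the cited works.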

To be useful, the theory must account for the physical behavior of the real world and lead to a close correlation between predicted and actual performance. These two basic requirements are not easily satisfied, for there exists a wide disparity between the complexity of the image-gathering process and the simplicity of the model of this process for which the mathematics of communication theory yields expressions for information rate and fidelity. This disparity must be bridged carefully by judicious assumptions and approximations in the development of a suitable mathematical model of digital image systems and of the associated figures of merit for optimizing their end-to-end performance.

The mathematical model of the visual communication channel and the associated figures of merit developed in Refs. 7 to 10 are sufficiently general to account for a wide range of radiance field properties, photodetection mechanisms, and signal processing structures. However, the results that are presented emphasize (a) the salient properties of the irradiance and reflectance of natural scenes that dominate the captured radiance field in the visible and near-infrared region under normal daylight conditions and (b) the photodetector arrays that are used most frequently as the photodetection mechanism of image-gathering devices. This emphasis fosters the comparison of image gathering and early human vision (Ref. 11), which is of interest because both are constrained by the same critical limiting factors:

  1. The sampling lattice formed by the angular separation of the photodetector-array apertures (or the photoreceptors in the eye). This lattice, which determines the angular resolution (or visual acuity), depends on the focal length of the objective lens (or pupil) and the separation of the apertures (or photoreceptors). The Fourier transform of the sampling lattice leads to the sampling passband that, analogous to the bandwidth of a communication channel, sets an upper bound on the highest spatial frequency components that the image-gathering device (or eye) can convey.

  2. The spatial response (or its Fourier transform, the spatial frequency response) and the signal-to-noise ratio. Both depend on the aperture area and focal length of the objective lens (or pupil) and on the aperture area and responsivity of the photodetectors (or photoreceptors). The signal-to-noise ratio and the relationship between the spatial response and sampling lattice (or, equivalently, between the spatial frequency response and sampling passband) determine the sharpness and clarity with which images can be restored (or the scene can be observed) within the fundamental constraint set by the sampling lattice or passband.

  3. The data rate of the transmission link (or of the nerve fibers from the retina to the brain).
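The dependence of the sampling lattice on focal length and aperture separation, described in the first item, reduces to simple arithmetic for a square lattice. The sketch below uses the small-angle approximation and hypothetical camera values; the function names are illustrative.

```python
import math

def angular_sampling_interval(pitch_m, focal_length_m):
    """Angle in radians between adjacent photodetector apertures
    (small-angle approximation)."""
    return pitch_m / focal_length_m

def nyquist_limit_cycles_per_degree(pitch_m, focal_length_m):
    """Highest spatial frequency inside the sampling passband of a
    square lattice, along a lattice axis, in cycles per degree."""
    interval_rad = angular_sampling_interval(pitch_m, focal_length_m)
    cycles_per_radian = 1.0 / (2.0 * interval_rad)
    return cycles_per_radian * math.pi / 180.0

# Hypothetical camera: 5 micrometre aperture separation, 50 mm focal length.
print(angular_sampling_interval(5e-6, 50e-3))          # about 1e-4 rad
print(nyquist_limit_cycles_per_degree(5e-6, 50e-3))    # about 87 cycles/degree
```

Doubling the focal length or halving the separation of the apertures halves the angular sampling interval and so doubles the width of the sampling passband along each axis.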

When these factors are accounted for properly, as demonstrated in Ref. 10, the design of the image-gathering device that is optimized for the highest realizable information rate corresponds closely, under appropriate conditions, to the design of the human eye that evolution has optimized for viewing the real world. This convergence of communication theory and evolution toward the same design not only corroborates the validity of the extension of communication theory to the characterization of image systems, but also demonstrates the robustness of the approximate model of the image-gathering process on which this characterization is based.

Computer simulations are a valuable tool for correlating the predicted performance based on the statistical properties of ensembles of natural scenes with the perceptual and measurable performance based on single realizations (or members) of these ensembles. Moreover, these simulations can illustrate the performance of image systems for features of natural scenes and other targets that communication theory cannot account for directly; and they can relate perceptual degradations in the displayed image to the perturbations that occur in image gathering and signal coding.

The characterizations in Refs. 7 to 10 are constrained, as is commonly done, to a uniform irradiance of the scene and to linear transformations in the image system. Reference 11 extends these characterizations to irradiances that vary across the scene (e.g., due to shadows) and to a nonlinear transformation that seeks to suppress this variation and preserve only the spatial detail and reflectance (or color) of the surface itself. This transformation, which is based on a model of color constancy in human vision, illustrates several aspects of the characterization of image systems that are of general interest. One, this transformation addresses the goal of many currently evolving coding strategies for high data compression, which is to preserve mostly the edges and boundaries in the scene together with their contrast. Two, it stresses the close relationship that must be maintained between the characterization of image systems and the appropriate model of the real world (which, in this case, requires the distinction between the properties of the irradiance and the reflectance of natural scenes). And three, it continues to foster the comparison of image systems with human vision.

Computations and simulations demonstrate that an image system ordinarily can be expected to produce the best possible images at the lowest data rate only when the following two conditions are met: one, that the design of the image gathering and display devices is optimized for the highest information rate from the scene to the observer; and two, that the digital processing algorithm for image restoration accounts properly for the salient properties of the captured radiance field and for the perturbations in the image system. A vitally important trait of the image system that emerges from this optimization is the robustness of its performance. The optimization leads to image systems that not only provide the best possible performance for scenes with some a priori prescribed statistical properties, but that also provide nearly the best possible performance for a wide range of other random scenes and even for targets with periodic detail.

References

  1. C. E. Shannon, Bell Sys. Tech. J. 27: 379-423 and 623-656 (1948); C. E. Shannon and W. Weaver, The Mathematical Theory of Communication (U. Illinois Press, Urbana, 1964).
  2. N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series (John Wiley and Sons, New York, 1949).
  3. Y. W. Lee, Statistical Theory of Communication (John Wiley and Sons, New York, 1964).
  4. P. B. Fellgett and E. H. Linfoot, Philos. Trans. Roy. Soc. London 247: 369-407 (1955).
  5. E. H. Linfoot, J. Opt. Soc. Am. 45: 808-819 (1955).
  6. C. Mead, Analog VLSI and Neural Systems (Addison-Wesley Pub. Co., Reading, 1989).
  7. C. L. Fales and F. O. Huck, Information Sciences 57-58: 245-285 (1991).
  8. F. O. Huck, C. L. Fales and Z. Rahman, Philos. Trans. Roy. Soc. London A354: 2193-2248 (1996).
  9. C. L. Fales, F. O. Huck, R. Alter-Gartenberg and Z. Rahman, Philos. Trans. Roy. Soc. London A354: 2249-2287 (1996).
  10. F. O. Huck, C. L. Fales and Z. Rahman, Visual Communication: An Information Theory Approach (Kluwer Academic Publishers, Boston, 1997).
  11. F. O. Huck, C. L. Fales, R. E. Davis and R. Alter-Gartenberg, Appl. Opt., April 2000.

Web site curator: Glenn Woodell