Hans P. Moravec
Robotics Institute, Carnegie-Mellon University
January 1981
This is the Scribe source for the paper (edited; missing figures & bibliography).
"3D Graphics and the Wave Theory" by Hans Moravec, presented at the 1981 Siggraph conference, Dallas, TX, August 1981, published in "Computer Graphics, v15#3 August 1981, pp. 289-296.
A continuing trend in computer representation of 3D synthetic scenes is the ever more accurate modelling of complex illumination effects. Such effects provide cues necessary for a convincing illusion of reality. The best current methods simulate multiple specular reflections and refractions, but handle at most one scattering bounce per light ray. They cannot accurately simulate diffuse light sources, nor indirect lighting via scattering media, without prohibitive increases in the already very large computing costs.
Conventional methods depend implicitly on a particle model; light propagates in straight and conceptually infinitely thin rays. This paper argues that a wave model has important computational advantages for the complex situations. In this approach, light is represented by wave fronts which are stored as 2D arrays of complex numbers.
The propagation of such a front can be simulated by a linear transformation. Several advantages accrue. Propagations in a direction orthogonal to the plane of a front are convolutions which can be done by FFT in O(n log n) time rather than the n² time for a similar operation using rays. The wavelength of the illumination sets a resolution limit which prevents unnecessary computation of elements smaller than will be visible. The generated wavefronts contain multiplicities of views of the scene, which can be individually extracted by passing them through different simulated lenses. Lastly the wavefront calculations are ideally suited for implementation on available array processors, which provide more cost effective calculation for this task than general purpose computers.
The wave method eliminates the aliasing problem (the wavefronts are inherently spatially filtered), but substitutes diffraction effects and depth of focus limitations in its stead.
The realism in computer synthesized images of 3D scenes has gradually increased from engineering drawings of simple shapes to representations of complex situations containing textured specular reflectors and shadows. Until recently the light models have been to first order; the effect of only one bounce of light from the light sources to the eye via a surface was simulated. The stakes have recently gone up again with the introduction of ray tracing methods that simulate multiple non-dispersive refractions and specular reflections ([Kay79], [Whitted80]).
With each increment in the faithfulness of the process new properties of real light had to be added. Thus far the evolution has been in the direction of the particle theory of light, probably because straight rays preserve most of the properties of 3D computer graphics' engineering drawing roots.
An obvious and desirable next step in the process is the simulation of scattering, as well as specular, reflection of light from surface to surface. That such effects are important is shown graphically by the subjective impressions of increasing realism as the models become more complex (see references in [Whitted80]) and objectively in visual psychology experiments such as those by Gilchrist [Gilchrist79], in which subjects were able to intuitively separate the effects of surface albedo and light source brightness by noting the relative intensity of scattered illumination.
While a specular/refractive bounce can convert a ray from a simulated camera or lightsource into two rays, a scattering can make hundreds or thousands, heading off in a multitude of directions. Since pictures modelling only specular bounces themselves take about an hour of computer time to generate, simulation of multiple scattering (and incidentally, extended light sources) by extension of conventional methods is out of reach for the immediate future.
One solution may be a different approach in simulating the physical reality. The ray representation implicitly assumes an effectively infinite precision in the position of objects and the heading of rays. This leads to some unnecessary calculations, as when several rays bouncing from nearly the same surface point in nearly the same direction (perhaps originating from different light sources) are propagated by separate calculations.
We can evade these and other inefficiencies by using the particle theory's major competitor, the wave theory of light. Rather than multitudes of rays bouncing from light sources and among objects, the illumination is done by complex wavefronts described over large areas. Such wavefronts can be represented by arrays of complex numbers, giving the amplitude and relative phase of portions of the front, embedded in the simulated space. Lesem et al. used this approach [Lesem68] to make physical holograms (though of disappointingly low quality).
Note that a plane array of complex wave coefficients is very like a window into a scene. The wavefront, fully described by the coefficients, can represent multiple point and extended sources of light at various distances behind (or even in front of) the window.
Figure 1 is a simple example, a slice through a window generating light that seems to come from a point source very far away in the lower left direction. A picture of the scene behind the window can be formed by passing the wavefront through a simulated lens onto a simulated screen. Moving the lens center changes the apparent point of view. The size of the lens and its distance from the scene determine the depth of focus of the image (Finite depth of focus is one of the "features" of this method. The wave representation could be used to do the lighting for a scene later imaged by a ray method if infinite depth of focus is desired). The maximum number of resolvable pixels in the image can be no greater than the number of wavefront coefficients - no more information is present and any attempt to get more resolution in the image than in the wavefront will result in blurring. Also, no object feature smaller than a wavelength of the simulated light can be represented. Thus the wavelength must not be too large. It should also not be smaller than necessary, since the computational cost goes up dramatically as the wavelength drops.
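As a concrete illustration of such a window description, here is a minimal sketch in modern NumPy notation (added to this transcription, not part of the original paper): a point source very far away in a given direction appears at the window as a tilted plane wave, one complex coefficient per half-wavelength cell, with unit amplitude and linearly advancing phase. The grid size and arrival angle below are arbitrary choices for the example.
@begin(verbatim)
import numpy as np

n = 512                        # coefficients per side of the window
wavelength = 1.0               # work in units of the simulated wavelength
dx = wavelength / 2.0          # one coefficient every half wavelength
k = 2.0 * np.pi / wavelength   # wavenumber

theta = np.radians(30.0)       # arrival direction, 30 degrees off axis (arbitrary)
c = np.arange(n) * dx
X, Y = np.meshgrid(c, c, indexing="ij")

# Distant point source toward the lower left: a plane wave whose phase
# advances along the diagonal of the window, with unit amplitude everywhere.
wf = np.exp(1j * k * np.sin(theta) * (X + Y) / np.sqrt(2.0))
@end(verbatim)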
Here's one way to use the light wave idea. It deviates from a true physical simulation in the interests of simplicity and computational efficiency. Goodman [Goodman68] presents the fundamental mathematics. Feynman [Feynman63] is a good source for aspects of real light I've maligned.
We will use light of a single wavelength, and we assume that all optical effects are linear. The light is propagated according to scalar diffraction theory, which behaves realistically except for very short paths. The wavelength must be no larger than the smallest object feature we wish to see, but should not be too small because the amount of computation increases dramatically as the wavelength drops.
To faithfully represent a wavefront, there must be a coefficient every half wavelength across the width of a description, so the number of complex numbers needed in a description increases as the square of the inverse wavelength. We also assume that only the steady state of the illumination is important. This allows us to play cavalierly with time. For instance, a wavefront may be propagated from one plane description to a parallel one some distance away in a single step, even though different portions of the effect have different transit times.
The scene is represented as a box of cells, each half a wavelength on a side, roughly 1000 cells cubed. The content of each cell is characterized by 2 complex coefficients, one giving the absorption and phase shift (simulating refractive index) for transmitted light, the other giving the reflection attenuation and phase shift (simulating surface scattering). Some cells may also contain a term to be unconditionally added during a wavefront calculation, making them light sources. This 3D array need not be explicitly stored, but 1000² cell slices perpendicular to at least one of the major axes should be efficiently extractable at will. One suitable storage scheme is the recursively subdivided "oct-tree" representation [Reddy78], which permits large uniform volumes to be kept as single nodes of a tree.
The pictures in this paper are of scenes contrived to make them computationally cheap. Each consists of a number of very thin objects confined to one of a small number of parallel planes. The wavefront calculations are done in a direction parallel to these object planes. The lenses in the scenes are actually thin regions of slowly varying transmissive phase shift (a bit like Fresnel optics).
The light sources in the first object plane (plane @b(a)) generate a wavefront which we will call @b(WF)@-(a). @b(WF)@-(a)(i,j), where i and j are integers between 0 and 511, represents one complex coefficient in a half wavelength grid across the area of plane @b(a).
The effect of @b(WF)@-(a) on @b(b), the second plane, (call it @b(WF)@-(ab)) is the sum of the effects of every @b(WF)@-(a)(i,j) on every coefficient @b(WF)@-(ab)(k,l). Each such contribution involves noting the distance between the @b(WF)@-(a)(i,j) and the @b(WF)@-(ab)(k,l) cells and adjusting the phase (because of the time delay introduced by travelling over such a distance) and the amplitude (because of the inverse square attenuation or equivalently because of the amount of solid angle cell @b(WF)@-(ab)(k,l) occupies when seen from @b(WF)@-(a)(i,j)). This operation is equivalent to a multiplication of the complex valued wavefront coefficient by a complex valued propagation coefficient to obtain a complex valued contribution to the new wavefront. Doing this process straightforwardly means multiplying each of our 1024@+(2) wavefront coefficients by 1024@+(2) propagation coefficients and summing the products appropriately into 1024@+(2) resultant wavefront coefficients. This is over a trillion complex multiplications and additions, and too expensive for present day conventional hardware. Fortunately there is a cheaper way to achieve the same result. Note that for a given k and l the distance from @b(WF)@-(a)(i,j) to @b(WF)@-(ab)(i+k,j+l), and thus the propagation coefficient, is the same for all values of i and j. Call this coefficient @b(P)@-(ab)(k,l). We can now express the resultant wavefront by
@b(WF)@-(ab)(m,n) = Sum@-(i,j) @b(P)@-(ab)(m-i,n-j) @b(WF)@-(a)(i,j)
This is the convolution of @b(WF)@-(a) with @b(P)@-(ab), and will be denoted by @b(WF)@-(a)@b(*)@b(P)@-(ab). Such a convolution can be accomplished by taking the Fourier transforms, multiplying them term by term then taking the inverse Fourier transform of the result:
@b(WF)@-(ab) = @b(WF)@-(a) @b(*) @b(P)@-(ab) =
@b(F)@+(-1)(@b(F)(@b(WF)@-(a)) @b(.) @b(F)(@b(P)@-(ab)))
where @b(F) denotes Fourier transform and @b(.) denotes the termwise product of two arrays of like dimension to obtain a third. Since @b(P)@-(ab) is an invariant of the scene, @b(F)(@b(P)@-(ab)) can be calculated once and for all when the scene is created. The symmetries
@b(P)@-(ab)(i,j) = @b(P)@-(ab)(j,i) = @b(P)@-(ab)(-i,j) = @b(P)@-(ab)(j,-i)
can help cut down that cost a little.
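In modern NumPy notation the propagation might be sketched as follows (added here for illustration only; the amplitude factor is a rough approximation of the solid angle argument above, and the circular convolution wraps light leaving one edge around to the other, one of the "details of the convolution" mentioned later).
@begin(verbatim)
import numpy as np

def propagation_kernel(n, dx, d, wavelength):
    """P_ab(k, l) for two parallel planes a distance d apart, sampled every
    dx (= wavelength / 2).  Phase advances with path length; amplitude falls
    roughly with the solid angle a receiving cell subtends (approximate)."""
    wavenumber = 2.0 * np.pi / wavelength
    idx = np.fft.fftfreq(n, d=1.0 / n)     # signed offsets 0..n/2-1, -n/2..-1
    KX, KY = np.meshgrid(idx * dx, idx * dx, indexing="ij")
    r = np.sqrt(KX**2 + KY**2 + d**2)
    return (d / r**2) * np.exp(1j * wavenumber * r) * dx**2 / wavelength

def propagate(wf, P):
    """WF_ab = WF_a * P_ab evaluated as F^-1( F(WF_a) . F(P_ab) ).
    F(P_ab) is an invariant of the scene and could be precomputed once."""
    return np.fft.ifft2(np.fft.fft2(wf) * np.fft.fft2(P))
@end(verbatim)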
The Fast Fourier Transform (FFT) is a method of arranging the calculations so that only about @i(n) log@-(2)@i(n) multiplications and a similar number of additions are required. A two dimensional transform like the one we require can be accomplished by taking a one dimensional FFT of each row, replacing each row by its transform, then doing one dimensional FFT's of the columns of the result.
In our case the one dimensional transforms have 1024 points, and require about 10,000 complex multiplies each. There are 2x1024 of them to be done per 2D transform, for a total of about twenty million multiplies. The inverse transform is computationally equivalent to the forward one, so the FFT approach requires 60 million multiplies, compared to the million million of the direct method. If @b(F)(@b(P)) is assumed to be precomputed the cost drops to 40 million.
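The row-then-column decomposition is easy to check numerically (a NumPy sketch added for illustration).
@begin(verbatim)
import numpy as np

wf = np.random.randn(1024, 1024) + 1j * np.random.randn(1024, 1024)

rows_done = np.fft.fft(wf, axis=1)        # 1024 one-dimensional FFTs of the rows
two_dim = np.fft.fft(rows_done, axis=0)   # 1024 more, down the columns

assert np.allclose(two_dim, np.fft.fft2(wf))   # same as the library's 2D transform

# Each 1024-point FFT takes about 1024 * log2(1024) ~ 10,000 complex multiplies;
# 2 x 1024 of them per 2D transform gives the roughly twenty million of the text.
@end(verbatim)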
The FFT approach is thus about 20,000 times faster than the direct convolution method. If the naive ray tracing schemes were applied to the same scene they would work essentially by the direct method, though probably in a less orderly fashion. No organization analogous to the FFT method for wavefronts has been offered for the ray schemes, so this 20,000 times speedup is a major advantage for the wave approach.
Getting back to our scene, we have now calculated @b(WF)@-(ab), the wavefront impinging on plane @b(b). The transmission coefficients of plane @b(b) (@b(T)@-(b)) which tell of the attenuation and refraction caused by the substances at @b(b) are multiplied term by term with the incoming wave front (a mere million complex multiplies) to generate a modified front, to which is added the effect of any light sources at @b(b) (@b(L)@-(b)) to give us @b(WF)@-(b), the wavefront leaving plane @b(b).
@b(WF)@-(b) = @b(WF)@-(ab)@b(.)@b(T)@-(b)+@b(L)@-(b) =
(@b(WF)@-(a)@b(*)@b(P)@-(ab))@b(.)@b(T)@-(b)+@b(L)@-(b)
The reflection coefficients of @b(b) (@b(R)@-(b)) are also multiplied by @b(WF)@-(ab) to obtain the wave reflected back towards @b(a) from @b(b). We call it a waveback, and designate it @b(WB)@-(a). @b(WB)@-(a) is stored for future reference.
We are now ready to carry the wavefront from @b(WF)@-(b) to and through the next plane @b(c) in exactly the same manner in which we moved it from @b(a) to @b(b):
@b(WF)@-(c) = (@b(WF)@-(b)@b(*)@b(P)@-(bc))@b(.)@b(T)@-(c)+@b(L)@-(c)
we also save for future reference
@b(WB)@-(b) = (@b(WF)@-(b)@b(*)@b(P)@-(bc))@b(.)@b(R)@-(c)
(doing the convolution just once for the two results, of course).
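One plane-to-plane move, with the convolution shared between the transmitted wavefront and the stored waveback, might be sketched as follows (NumPy, with illustrative names only).
@begin(verbatim)
import numpy as np

def move(wf_prev, P, T, L, R):
    """One step of the sweep: propagate the outgoing wavefront of the previous
    plane across the gap (kernel P), then apply this plane's transmission T,
    light sources L and reflection R.  Returns the wavefront leaving the plane
    and the waveback heading back the way the light came."""
    incoming = np.fft.ifft2(np.fft.fft2(wf_prev) * np.fft.fft2(P))   # WF * P
    wf_out = incoming * T + L       # (WF * P) . T + L
    waveback = incoming * R         # (WF * P) . R, saved for the return sweep
    return wf_out, waveback
@end(verbatim)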
Each such move costs about 50 million complex multiplies (= 200 million real multiplications) and can be done in about five compute minutes on a typical medium sized machine like our VAX 11/780. With a 15 megaflop array processor like the CSPI MAP-300 the time is less than 15 seconds.
After a complete sweep across the scene (Figure 2) we have all the first order lighting effects if the light sources were all in plane @b(a), and some higher order ones, as when light is scattered by @b(b) then @b(c) to illuminate @b(d). We now sweep backwards, from the far end of the scene towards @b(a). Each step is similar to the forward sweep except that at each plane the waveback left behind in the first pass is added to the outgoing wavefront, like the light sources at that plane, simulating the contribution of reflected light from the last pass. For example, the return sweep modifies the wavefront leaving plane @b(c) into the one leaving @b(b) by the transformation
@b(WF)@-(b) =
(@b(WF)@-(c)@b(*)@b(P)@-(bc))@b(.)@b(T)@-(b) + @b(L)@-(b) + @b(WB)@-(a)
In the same step @b(WB)@-(c), the effect of light from the return sweep reflected from @b(b) to @b(c), is stored
@b(WB)@-(c) = (@b(WF)@-(c)@b(*)@b(P)@-(bc))@b(.)@b(R)@-(b)
After several back and forth passes, each involving @b(WB)'s from the previous pass, the wavefront contains the effects of highly indirect light. The iteration can simply be stopped after a fixed number of sweeps, or the wavefront can be checked for significant changes between forward/back cycles. If @b(WF)@-(a) is nearly the same before and after two sweeps (by a sum of squares of coefficient differences, for instance), the process has settled down. Ten passes is usually enough, though pathological cases like scenes containing two facing mirrors may require more.
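The back-and-forth relaxation, with the settling test suggested above, might look like this (a NumPy sketch with deliberately simplified bookkeeping: one stored waveback per plane, overwritten on each visit, and equal plane spacing so a single kernel serves every gap; the settling test here compares the wave leaving the near face between cycles, the same idea as comparing @b(WF)@-(a)).
@begin(verbatim)
import numpy as np

def propagate(wf, P):
    return np.fft.ifft2(np.fft.fft2(wf) * np.fft.fft2(P))

def relax(planes, P, max_passes=10, tol=1e-6):
    """planes: list of (T, L, R) coefficient arrays, near plane first.
    P: propagation kernel for the (equal) gap between adjacent planes."""
    n = planes[0][0].shape[0]
    waveback = [np.zeros((n, n), complex) for _ in planes]
    previous = None
    for _ in range(max_passes):
        for direction in (+1, -1):                    # forward sweep, then backward
            wf = np.zeros((n, n), complex)
            for i in range(len(planes))[::direction]:
                T, L, R = planes[i]
                incoming = propagate(wf, P)
                wf = incoming * T + L + waveback[i]   # add the last pass's reflection
                waveback[i] = incoming * R            # store this pass's reflection
        change = None if previous is None else np.sum(np.abs(wf - previous) ** 2)
        if change is not None and change < tol:
            break                                     # illumination has settled
        previous = wf.copy()
    return wf
@end(verbatim)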
Images are generated from the ultimate wavefront in a final step by passing it through a new plane which contains a simulated lens that projects the image onto a virtual surface, where the magnitudes (absolute values) of the complex wavefront coefficients are extracted, scaled and transcribed into a pixel array.
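The imaging step amounts to multiplying the arriving wavefront by a thin-lens phase factor and propagating it once more to the screen (a NumPy sketch with illustrative names; the focal length, gap kernel and normalization are the reader's choice).
@begin(verbatim)
import numpy as np

def image_at_screen(wf_at_lens, dx, wavelength, f, P_to_screen):
    """Apply a simulated thin lens of focal length f to the wavefront arriving
    at the lens plane, FFT-convolve with the kernel for the lens-to-screen gap,
    and read off the magnitudes as pixel values."""
    n = wf_at_lens.shape[0]
    k = 2.0 * np.pi / wavelength
    c = (np.arange(n) - n / 2) * dx
    X, Y = np.meshgrid(c, c, indexing="ij")
    lens = np.exp(-1j * k * (X**2 + Y**2) / (2.0 * f))   # thin-lens phase factor
    at_screen = np.fft.ifft2(np.fft.fft2(wf_at_lens * lens) * np.fft.fft2(P_to_screen))
    img = np.abs(at_screen)                              # magnitude -> brightness
    return img / img.max() if img.max() > 0 else img     # scaled into pixels
@end(verbatim)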
Color pictures are made of three such images, one each for red, green and blue, with different reflection, transmission and emission properties of the things in the object planes. The simulated wavelength is the same in all cases, so the simulated color has nothing whatever to do with the physiological color property of real light, which depends on wavelength.
Figure 4 contains two planes, and was subjected to eight wavefront passes (four each way). The first plane contains a coherent light source which projects a beam onto the scattering "lampshade" in the second plane. The scattered light reflects back to the first plane and illuminates the "checkerboard" there, and so on for eight bounces. A thin lens in the second plane distorts some of the checkerboard behind it. The two planes are 300 wavelengths apart, and 512 half-wavelengths on a side. Light from the scene was intercepted by a square lens of the same size 10,000 wavelengths away, which formed the displayed image yet another 10,000 wavelengths further on.
The illumination is monochromatic, but color is used in one version of the image to represent the relative phase of the incident light. The red, green and blue components of the color are varied with phase like the three components of three phase power; red varies with the sine of the phase angle, green with the angle shifted 120 degrees, blue by phase shifted 240 degrees.
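The phase-to-color mapping is straightforward (a NumPy sketch added for illustration).
@begin(verbatim)
import numpy as np

def phase_to_rgb(wf):
    """Color each coefficient by its phase, the three channels varying like
    three-phase power (sin of the phase, and the phase shifted by 120 and 240
    degrees); brightness is still carried by the magnitude."""
    phase = np.angle(wf)
    mag = np.abs(wf)
    if mag.max() > 0:
        mag = mag / mag.max()
    rgb = np.stack([0.5 * (1.0 + np.sin(phase)),
                    0.5 * (1.0 + np.sin(phase + 2.0 * np.pi / 3.0)),
                    0.5 * (1.0 + np.sin(phase + 4.0 * np.pi / 3.0))], axis=-1)
    return rgb * mag[..., None]          # (n, n, 3) array of values in [0, 1]
@end(verbatim)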
We don't yet have an array processor, so the compute time was about an hour. With an FPS-100 it could have been generated in 8 minutes. CSPI's MAP-300, with twice as many multipliers, could have done it in 4. It should also be noted that the calculations are highly suitable for division among parallel processors (most of the cost is in the 1024 point complex FFT's, and 1024 of these, one for each row or column, could be carried out in parallel). Since the scenes contain extended light sources and model multiple indirect light scattering, all conventional ray methods would have required at least thousands of compute hours to produce the same result.
Since the illumination is by simulated monochromatic light, potentially all in phase, there is a tremendous opportunity for probably undesirable effects such as interference fringes and "laser speckle" to make themselves manifest. These are made worse by the maximally long wavelength of the light chosen to reduce the amount of computation to the bare minimum required for the image and object resolution desired.
If the scene is lit mainly by light sources extending over many half-wavelength cells, most of the problem can be eliminated by randomizing the phase of the light emitted by the individual cells. This effectively simulates a spatially non-coherent source, and blurs interference patterns into oblivion. If the light comes mostly from a moderate number of very localized regions, the picture can be recomputed a few times with different relative phases between the sources, and the results averaged. This simulates phase non-coherence between the different sources.
If the scene is lit by a very small number of very localized sources simple randomization of light source phases won't help. If much of the actual illumination in those scenes arrives indirectly by scattering of the light from objects of high albedo, randomizing the reflective and transmissive phase shifts of the scattering objects will improve things a little.
In general it also helps if objects have "soft" edges which blend gradually over the distance of a few cells from the properties of their surrounding to the optical properties of their interiors.
The above techniques remove low frequency interference fringes but do not significantly affect the high frequency phase interferences analogous to the "laser speckle" seen when viewing objects in coherent light. This pixel graininess can be smoothed by computing the image to higher resolution than needed and averaging, or by recomputing it several times with different random phases in the extended light sources or the light scattering surfaces, and averaging the results.
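Both remedies, randomizing source phases and averaging several realizations, are a few lines each (a NumPy sketch; the render callback is a hypothetical hook standing in for one complete illumination-and-imaging pass, not something from the paper).
@begin(verbatim)
import numpy as np

def randomize_source(amplitude, rng):
    """Give every emitting cell a uniformly random phase, simulating a
    spatially non-coherent extended source."""
    return amplitude * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, amplitude.shape))

def averaged_image(render, n_trials=8, seed=0):
    """Recompute the image several times with fresh random phases and average
    the intensities, blurring fringes and speckle.  render(rng) is assumed to
    perform one illumination-and-imaging pass and return one image."""
    rng = np.random.default_rng(seed)
    total = None
    for _ in range(n_trials):
        img = render(rng)
        total = img if total is None else total + img
    return total / n_trials
@end(verbatim)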
The problems are highly analogous to the "aliasing" problem encountered in conventional 3D methods.
So far we have dealt with very simple scenes in which all the matter was confined to a small number of parallel planes. Not coincidentally, these are particularly suitable subjects for the wave approach.
More general scenes can be constructed by increasing the number of matter-containing planes. In the most general case there is such a plane every half wavelength, and fully three dimensional objects are described as a stack of parallel slices.
This increases the required computation by a factor of several hundred, and also creates a major storage problem. A special purpose Fourier device able to compute a 1024 point transform in 100 usec would allow fully general scenes with complete illumination to be calculated in about an hour by the straightforward application of the above method. It would be necessary to store about one billion complex numbers, representing the approximately 1000 wavebacks generated, temporarily during the computation.
Storage for a billion complex numbers (the precision need not be great) may not be excessive in a machine that can compute at 500 megaflops, but it doesn't make the problem easier. It may be possible to avoid explicit storage of the wavebacks by combining them as they are generated, dragging them along backwards with the wavefronts in a time-reversed way. The composite waveback would be subjected to transformations inverse to the ones it will experience when the sweep reverses. Or maybe not. All versions of this idea I've examined have some unavoidable high order light interactions that do not have analogs in the physical world.
The above schemes, particularly the fully general one, have an unpleasant feature. They do as much work on the empty spaces in a matter-containing plane as they do on the substance. Propagating a wave through a dense stack of matter planes involves hundreds of half-wave baby steps, even if the planes contain only a speck of matter each.
We have seen that a wavefront can be transmitted across an arbitrarily large gap of empty space by a single FFT convolution. In fact, it is just as easy to send it through an arbitrarily large mass of any substance of uniform attenuation and phase shift; only the propagation coefficients change. Since typical scenes contain large homogeneous volumes, we may be wasting some effort, computing over volumes when only the surface areas require the full power of the incremental method.
The fundamental space underlying our 3D models is a cubic array of half-wavelength cells. We can organize this into convenient large regions by the following process. Suppose our scene is contained in a cube 1024 half wavelengths on a side. If this cube is homogeneous, stop; we have our simple description. If it is not, divide it into eight 512 on a side subcubes, and represent it by an eight way branching node of a tree pointing to the results of the same procedure applied recursively to the subcubes. The division process at every level stops if the subcube it is handed is homogeneous, or if it is a primitive half-wavelength cell (in which case it is homogeneous by definition).
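The recursive subdivision is compactly expressed (a sketch in Python, added for illustration; the paper's two complex coefficients per cell are reduced to one value here for brevity).
@begin(verbatim)
import numpy as np

def build_octree(cells):
    """cells: a cubic array (power-of-two side) of per-cell coefficients.
    A homogeneous cube becomes a single leaf; anything else becomes a node
    with eight recursively built children."""
    value = cells.flat[0]
    if np.all(cells == value):                 # homogeneous, or a single cell
        return ("leaf", cells.shape[0], value)
    h = cells.shape[0] // 2
    children = [build_octree(cells[x:x + h, y:y + h, z:z + h])
                for x in (0, h) for y in (0, h) for z in (0, h)]
    return ("node", cells.shape[0], children)

# Example: a 16-cube of empty space containing one small absorbing block.
space = np.zeros((16, 16, 16), dtype=complex)
space[2:4, 2:4, 2:4] = 0.5
tree = build_octree(space)
@end(verbatim)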
A wavefront can be propagated through such an "oct-tree" structure irregularly, advancing through large undivided subcubes quickly, creeping through the hopefully few heavily subdivided regions. There is a serious complication.
While the wavefronts in the "stack of planes" approach of the last section were sent between parallel planes, with light leaving at the sides assumed lost (or wrapped around, depending on the details of the convolution), waves leaving at right angles to the principal wavefront direction in a subcube of the oct-tree must be accounted for, because they affect adjacent subcubes. Part of the problem is a matter of bookkeeping, solved by having two wavefront storage arrays at every boundary surface between distinct subcubes, one for each direction of light travel (an alternative way of looking at it is that each distinct subcube of the tree has one outgoing wave coefficient array on each of its six faces. Incoming waves are stored in the outgoing wave arrays of adjacent subcubes).
The other half of the problem is calculating the outgoing wavefront on a surface orthogonal to the plane of the incoming wavefront description. A wave entering one face of a cube leaves by one parallel face and four at right angles. While the parallel face calculation is a simple convolution, the orthogonal ones are not so easy.
Whereas the propagation coefficients between two co-ordinates in an incoming and outgoing wavefront in a parallel propagation are a function simply of displacement in X and Y, of the form @b(F)(@i(x@-(out)-x@-(in),y@-(out)-y@-(in))), the coefficients in an orthogonal transfer look like @b(F)(@i(x@-(out)@+(2)+x@-(in)@+(2),y@-(out)-y@-(in))), reflecting that the rows (say) of the outgoing array are parallel to the rows of the incoming array, but the columns in this 3D situation are at right angles. In this example the Y direction can be done by an FFT convolution, but the transformation over the X coordinate must be done directly (unless I've missed something). For a face size of 1024 by 1024 we must first do 1024 FFT convolutions for the Y direction (30 million multiplies) giving us 1024 vectors each with 1024 elements. These vectors must then each be multiplied by 1024 coefficients and summed into the output array, for a total of over a thousand million multiplications, a number which dominates the FFT step. A cube has four such difficult faces, but the calculations can be shared between parallel pairs of outgoing faces and a 1024 on a side cube requires a total of about two billion multiplications.
A cube of side @i(n) needs about 2@i(n)@+(3) multiplies. Since the amount of work thus goes up approximately linearly in the volume, we gain little by subdivision, and the method is of limited interest unless a faster way of calculating the sidegoing wavefronts is found.
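Under one reading of the text, in which the orthogonal-transfer coefficients factor into a part depending only on the two X coordinates and a part depending only on the Y difference (an assumption of this sketch, not a claim of the paper), the computation pattern is one FFT convolution per row followed by a direct n-cubed matrix product.
@begin(verbatim)
import numpy as np

def sideways_transfer(wf_in, Ky, Cx):
    """wf_in: incoming face wavefront, indexed (x_in, y).
    Ky[x_in]: y-direction kernel for that row (FFT convolution, cheap).
    Cx[x_out, x_in]: coefficients for the direct X transfer (the dominant,
    roughly n**3, part of the cost).  Names are illustrative only."""
    rows = np.fft.ifft(np.fft.fft(wf_in, axis=1) * np.fft.fft(Ky, axis=1), axis=1)
    return Cx @ rows            # sum over x_in for every x_out
@end(verbatim)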
Note that in the stack of planes approach, the wavefront propagation consists of a chain of linear transformations; a wavefront is alternately convolved with a propagation array then multiplied by an array of transmission coefficients. Both these operations are special cases of a general linear transform which can be expressed as a huge array. If we treat our 1024 by 1024 wavefront arrays as a 1024² (call it a million) element vector (name it @b(V)), then a general linear transform is a million by million matrix (@b(A)) to multiply this vector by, plus a million element vector offset (@b(B)) to be added (@b(V'=AV+B)). A convolution is simply a special case of such a matrix @b(A) where every row is the same as the previous one except shifted right one place. The transmission coefficients can be represented in a matrix that is zero everywhere except on the main diagonal where the coefficients reside. Light sources are simply represented in the constant vector @b(B).
Now a chain of linear operations can be condensed into a single one by multiplying the matrices and transforming and adding the vectors. This implies that a complex scene consisting of a stack of planes (or otherwise described) can in principle be multiplied out into a single matrix and offset. The effect on a wavefront of a scene so expressed can be determined by a single matrix multiply.
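The condensation is just the usual composition of affine maps, checked here on a tiny stand-in for the million-element wavefront vector (a NumPy sketch added for illustration).
@begin(verbatim)
import numpy as np

rng = np.random.default_rng(0)
n = 4                                  # stand-in size for the 1024**2 vector
A1 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A2 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B1 = rng.standard_normal(n).astype(complex)
B2 = rng.standard_normal(n).astype(complex)
V = rng.standard_normal(n).astype(complex)

# Two steps  V -> A1 V + B1 -> A2 (A1 V + B1) + B2  condense into one:
A = A2 @ A1
B = A2 @ B1 + B2
assert np.allclose(A @ V + B, A2 @ (A1 @ V + B1) + B2)
@end(verbatim)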
The matrix may contain the effect of multiple passes through the scene, and thus provide the effect of high order light indirection in a single multiply. Multiplying a complex wavefront (representing external illumination) by such a matrix produces an image-bearing front which shows the scene encoded in the matrix freshly illuminated.
Although manipulation of such matrices, with a trillion elements, is beyond the means of present day machinery, they may be practical someday when computers become another factor of one million more powerful. For 1024 on a side planes they become cheaper than the explicit scene representation when the stack of planes is more than a million deep. More importantly, the matrix representation is far more general than the matter plane approach, and can represent any linear effect.
A new approach to generation of shaded 3D computer imagery has been presented. It models light as wavefronts rather than rays, and has major computational advantages when representation of extended light sources and multiply scattered light is desired. Even so, the simplest applications of the method tax the resources of present day computers. The saving grace is that conventional ray techniques don't even come close.
Some half baked future applications were also presented, useful only when several times more computation power becomes available. They barely scratch the surface.