Sketch in Perspective


Interface issues

There are 3 main phases with respect to the "interface" look and feel:

In summary, we start with a "simple" 2D drawing mode. Then, we go to the "usual" JOT mode (with some modifications). Finally, we have a hybrid mode combining the 2D perspective cues drawn initially with the 3D drawing mode of JOT. In the following we provide more details for each of these 3 phases or drawing modes.


Phase 1 : 2D drawing (no ground plane)

In phase 1 we will have up to 4 windows:

  1. Main window to process the canvas. Its coordinates range from -1 to 1 in both X (Left --> Right) and Y (Bottom --> Up) directions.
  2. Information window to incorporate specific data (numerical mostly).
  3. Help window to guide the user, provide definitions, examples, etc.
  4. Cartographic visualization window (used only for one set of methods).

Note that in Phase 1, the usual 3D drawing/display capacity of JOT has to be turned OFF.

1. Main Window

The size of the Main window is defined by the "Working space" (see below); the "canvas" sits centered within it. The user will "draw" or select features (lines, points) in this window, in 2D only.

Canvas: This is the image, or painting, or postcard, or snapshot we work upon. Say the canvas is of size MxM.

Working space (WS): The rectangular area in which we can draw; ideally its extent would be defined by how far away the vanishing points turn out to be. A working session can be "saved" and continued later, i.e., we save the filename of the image used as canvas, as well as the information drawn upon it (and the other inputs, like numeric data).
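The saved state described above could be sketched as follows. This is only an illustration, not the JOT format: the `Session` class, its field names, and the choice of JSON are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Session:
    """Hypothetical saved state of a Phase 1 working session."""
    canvas_file: str                                    # filename of the canvas image
    lines: list = field(default_factory=list)           # drawn lines as [x0, y0, x1, y1]
    points: list = field(default_factory=list)          # selected points as [x, y]
    numeric_inputs: dict = field(default_factory=dict)  # data typed in by the user


def save_session(session, path):
    """Write the session to a JSON file (assumed storage format)."""
    with open(path, "w") as f:
        json.dump(asdict(session), f, indent=2)


def load_session(path):
    """Read a session back from a JSON file written by save_session."""
    with open(path) as f:
        return Session(**json.load(f))
```

Only the image filename is stored, not the pixels, which matches the description above: the canvas is reloaded from its file when the session is resumed.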

Thus, in practice, we need a 2D working space much larger than the canvas.

The working space can be of size 2M x 2M for example - this really depends on how far the vanishing points (VP) lie from the canvas. However, note that the farther away these are, the lower the precision on the location of the VP. For now, we will limit ourselves to a fixed size. We could imagine having sliders (horizontal & vertical) to access areas of the working space that are off screen ... in later versions.
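To make the relation between canvas pixels and the -1 to 1 window coordinates concrete, here is a minimal sketch. It assumes an M x M canvas centered in a square working space of side `ws_factor * M` that fills the Main window, with pixel row 0 at the top of the canvas; the function names and the `ws_factor` parameter are hypothetical.

```python
def canvas_pixel_to_window(col, row, m, ws_factor=2.0):
    """Map a canvas pixel (col, row) to window coordinates in [-1, 1].

    Assumptions: the canvas is m x m pixels, centered in a working space of
    ws_factor * m pixels per side; pixel row 0 is the top of the canvas,
    while the window Y axis points up.
    """
    half_ws = ws_factor * m / 2.0
    x = (col - m / 2.0) / half_ws
    y = (m / 2.0 - row) / half_ws
    return x, y


def window_to_canvas_pixel(x, y, m, ws_factor=2.0):
    """Inverse mapping: window coordinates back to canvas pixel (col, row)."""
    half_ws = ws_factor * m / 2.0
    return x * half_ws + m / 2.0, m / 2.0 - y * half_ws
```

With a 2M x 2M working space the canvas covers only the central square from -0.5 to 0.5 on each axis, leaving the rest of the window free for vanishing points that fall outside the image.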

In this mode, we do not want the ground plane to appear and be used (no drawing of 3D objects either).

2. Information window

This second window contains what has been computed, and retrieved through user inputs. It will also be used to display messages.

This window is initially used to load image(s): for the canvas, for a previously exploited WS and possibly for the cartographic window (to load a map). Ideally we need a browser-like pop-up menu.

Once an image is loaded (which defines the canvas and max. WS extent), the user selects a (calibration) method to apply to the canvas. We will have a "bag" of tricks, for each of the following 4 classes of methods:

  1. One VP : frontal perspective.
  2. Two VP : angular perspective.
  3. Three VP : oblique perspective.
  4. Map : calibration through Chasles' theorem.

For each of the above, different scenarios are possible depending on the information most easily perceived in the scene. The user picks the scenario which seems the most appropriate and the machine starts asking for data in a certain sequence. The information about each method and associated scenarios will always be available through the Help window (see below).

Thus, we need a window with a frame to display information (on choices between methods and scenario, on the next piece of information needed, on the actual calibration data computed), a frame to permit the user to make choices (buttons or menus), a frame to input data (numeric) and some associated pop-up menus (at least one browser for loading an image, a map, or a previously saved session).

3. Help window

We will also have a Help window, but this is independent from JOT for now: it will simply be an HTML document loaded via Netscape.

In this first implementation, information displayed on the Help window will not be directly correlated to the Information window; thus the user will have to take care of the sequencing of information (the navigation through the document).

4. Cartographic window

This is a window whose size is defined by the size of the loaded map (loaded through the Information window above). The user needs to be able to select point features and relate these to corresponding feature points in the WS (on the canvas: in this case, the WS is of max size equal to the canvas).

In this case we require JOT to be able to draw conics, each going through a set of 4 points taken from the selected features (usually 5 of them). For each set of 4 points we derive a conic and compute its equation, which gives 5 possible conics. What we are interested in are their intersections: one of these is the camera position (in X-Y), and it also provides an estimate of the camera attitude. In general, each pair of conics will intersect at 4 different points, but 3 of these are the features shared by the two sets of 4 points used to draw the conics. The fourth remaining intersection is the grail.

In summary, the user picks 5 points on the map. JOT draws the associated conics. The user then selects the intersection of interest (maybe this can be done automatically ...). Thus we need JOT to be able to draw ellipses, hyperbolas and parabolas (the equations will be provided to JOT from the coordinates of each set of 4 points). These conics are drawn on the map only.
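Since the conics are handed to JOT as equations, the drawing code first has to decide which kind of curve it is dealing with. A minimal sketch, assuming the equation is supplied in the general form A x^2 + B xy + C y^2 + D x + E y + F = 0 (the function name and the tolerance are hypothetical, and degenerate conics such as line pairs are not handled):

```python
def classify_conic(a, b, c, tol=1e-12):
    """Classify a non-degenerate conic from its quadratic coefficients.

    Uses the sign of the discriminant B^2 - 4AC of the general equation
    A x^2 + B xy + C y^2 + D x + E y + F = 0.
    """
    disc = b * b - 4.0 * a * c
    if abs(disc) <= tol:
        return "parabola"
    return "ellipse" if disc < 0 else "hyperbola"
```

For example, `classify_conic(1, 0, 1)` (the circle x^2 + y^2 = 1) is reported as an ellipse, while `classify_conic(0, 1, 0)` (the curve xy = 1) is a hyperbola.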

Vanishing Points Retrieval

This represents a major challenge here.

VPs are retrieved, in most cases, as the intersection of pencils of rays. In fact this is the most difficult way of retrieving them, both because of the pixel-based array we use for drawing and because of the sensitivity to user errors in drawing direction lines upon the canvas. These lines are meant to correspond to parallel lines in 3D Euclidean space. In projective space, they meet at "infinity", that is, on a horizon defining VPs (visual loci of convergence). Many such lines meeting at a VP form a pencil (or cone) of directions.
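Because hand-drawn lines of a pencil never meet at exactly one pixel, one possible way to turn a pencil into a single VP estimate is a least-squares intersection. The sketch below is an assumption for illustration, not a method prescribed here: it returns the point that minimizes the sum of squared perpendicular distances to all the drawn lines, and the function name and input layout are hypothetical.

```python
import math


def estimate_vanishing_point(segments):
    """Least-squares intersection of a pencil of 2D lines.

    segments: list of ((x0, y0), (x1, y1)) pairs, one per drawn line.
    Returns (x, y), or None when the lines are (nearly) parallel in the
    image, i.e. the VP is at or near infinity.
    """
    sxx = sxy = syy = bx = by = 0.0
    for (x0, y0), (x1, y1) in segments:
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # skip degenerate segments (both endpoints identical)
        nx, ny = -dy / length, dx / length  # unit normal of the line
        d = nx * x0 + ny * y0               # line equation: nx*x + ny*y = d
        # Accumulate the 2x2 normal equations of the least-squares problem.
        sxx += nx * nx
        sxy += nx * ny
        syy += ny * ny
        bx += nx * d
        by += ny * d
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None
    return (syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det
```

A small determinant means the lines are close to parallel on the canvas, so the estimate is poorly conditioned; this is the numerical counterpart of the loss of precision for distant VPs.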

For the moment, the drawing of such pencils of lines will be manually-driven. Thus we need:

To be more precise, a Horizon:

Indeed, when looking at the (Euclidean) 3D space, at most 3 horizons (loci of convergence for parallel lines) are possible. These define the "plane at infinity" (a triangle for 3 VPs, a rectangle otherwise). This plane is always parallel to the image and camera planes (this is the idea of a "back plane" in TIP). We will need:

Other Inputs

Other inputs will be provided by the user to achieve camera calibration. This is done through the "Information window", but may require drawing lines or selecting features (points or pairs of points) over the canvas at the same time as numerical data is provided via the keyboard.


Phase 2 : Sketching in Perspective

Phase 1 ends when camera parameters have been retrieved. Then we are back in the usual "Sketch" mode. Thus, we hide from the user the perspective cues that were just drawn in Phase 1 (but we keep the traces in memory ... for use in Phase 3). However, with respect to the present JOT interface, we require certain modifications:


Phase 3 : Calibration refinement

After the user has positioned the ground plane and extracted some 3D objects, it may become apparent that the projection is not quite right. This is a sign that the calibration was inaccurate somehow (or that the scene is not quite in linear perspective ...). Thus, the user now enters a third phase where calibration (camera parameters) will be refined interactively by modifying the perspective cues obtained in Phase 1.

Thus, we need to re-display some of the cues obtained in Phase 1, as an overlay in the WS:

The plane at infinity should be semi-transparent (warning: that plane is not the ground plane).

All these elements can still be moved as in Phase 1 (which interactively modifies the camera parameters, which in turn changes the projection of reconstructed 3D primitives ...).

However, the pencils are not made visible here (we think they are no longer useful, but that remains to be demonstrated in practice).

After the user has modified the perspective cues and is satisfied with the obtained matching of 3D objects over the canvas, we can go back to Phase 2 mode and pursue 3D reconstruction.

It might prove useful to go back and forth between Phase 2 and 3 modes of interaction more than once, as new 3D primitives are built.