Sketch in Perspective

Interface issues

There are 3 main phases with respect to the "interface" look and feel:

Phase 1 : 2D Drawing mode for initial Camera calibration
The goal here is to retrieve camera parameters via perspective (or projective geometry) cues.
During this phase we are in a 2D drawing mode only; thus, we need to turn OFF some of JOT's functions. Essentially, we draw "over" the loaded image some lines, points (vanishing points: intersections of lines), and a plane ("at infinity", defined by specific lines: the horizons, up to three of them). The latter "plane" is not to be mistaken for the "ground plane"; in fact, it is rather similar to the "back plane" in TIP.
Note: we need to be able to draw outside the range of the image as well; thus the drawing window (or main JOT window) needs to be larger than the loaded image (more details below).
Phase 2 : 3D Drawing mode for Object/Scene reconstruction
This is essentially equivalent to the currently available JOT 3D drawing mode with, however, some adjustments such as transparency and the use of wire-frame visualisation. Camera parameters are now provided by the work done in Phase 1. Thus the perspective cues drawn in the previous phase are kept hidden here (but will be used again in Phase 3).
Note: rather than opaque or photo-textured 3D primitives being drawn by the user (with respect to the ground plane), we need a semi-transparent or, more simply, wire-frame representation of 3D primitives. The same applies to the ground plane itself (e.g. to be displayed as a grid).
Phase 3 : 3D Drawing mode for Calibration refinement
Interactive refinements of the calibration will be needed in practice. Thus, once the user has reconstructed some 3D primitives in Phase 2, it may be necessary to readjust the calibration, as these primitives may not match the contours seen in the image closely enough. In this phase we (re-)display the perspective cues first drawn in Phase 1, as an overlay.
Note that the user will be allowed to interactively modify the positions of these cues as in Phase 1. This in turn allows the camera parameters to be recomputed and the "perspectiveness" of the scene to be adjusted so that the 3D primitives better match the image (this is the main reason why we need a semi-transparent or wire-frame display of the 3D objects).

In summary, we start with a "simple" 2D drawing mode. Then, we go to the "usual" JOT mode (with some modifications). Finally, we have a hybrid mode combining the 2D perspective cues drawn initially with the 3D drawing mode of JOT. In the following we provide more details for each of these 3 phases or drawing modes.
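The three modes can be summarised as a small table of what is enabled or visible in each. The sketch below is illustrative only: the names Phase, ModeSettings, SETTINGS and NEXT_PHASES are hypothetical and are not part of JOT. The flag values are read off the descriptions above, except that showing the ground plane in Phase 3 is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    CALIBRATION_2D = 1  # Phase 1: 2D drawing of perspective cues
    SKETCH_3D = 2       # Phase 2: 3D drawing of primitives
    REFINEMENT = 3      # Phase 3: 3D drawing plus overlay of the cues


@dataclass(frozen=True)
class ModeSettings:
    drawing_3d: bool       # usual JOT 3D drawing enabled
    ground_plane: bool     # ground plane shown (as a grid)
    cues_visible: bool     # VPs, horizons and plane at infinity shown
    pencils_visible: bool  # pencils of lines shown


# Hypothetical settings table; Phase 3 ground_plane=True is an assumption.
SETTINGS = {
    Phase.CALIBRATION_2D: ModeSettings(False, False, True, True),
    Phase.SKETCH_3D: ModeSettings(True, True, False, False),
    Phase.REFINEMENT: ModeSettings(True, True, True, False),
}

# Transitions described in the text: 1 -> 2, then back and forth between 2 and 3.
NEXT_PHASES = {
    Phase.CALIBRATION_2D: {Phase.SKETCH_3D},
    Phase.SKETCH_3D: {Phase.REFINEMENT},
    Phase.REFINEMENT: {Phase.SKETCH_3D},
}
```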

Phase 1 : 2D drawing (no ground plane)

In phase 1 we will have up to 4 windows:

Main window to process the canvas. Its coordinates range from -1 to 1 in both X (Left --> Right) and Y (Bottom --> Up) directions.
Information window to incorporate specific data (numerical mostly).
Help window to guide the user, provide definitions, examples, etc.
Cartographic visualization window (used only for one set of methods).

Note that in Phase 1, the usual 3D drawing/display capacity of JOT has to be turned OFF.

1. Main Window

The size of the Main window is defined by the "Working space" (see below), which contains the "canvas" at its center. The user will "draw" or select features (lines, points) in this window, in 2D only.

Canvas: This is the image, or painting, or postcard, or snapshot we work upon. Say the canvas is of size MxM.

Working space (WS): The rectangular area in which we can draw; ideally its extent would be defined by how far away the vanishing points turn out to be. It can be "saved" in order to continue a working session later, i.e., we save the filename of the image used as canvas, as well as the information drawn upon it (and the other inputs, like numeric data).

Thus, in practice, we need a 2D working space much larger than the canvas.

The working space can be of size 2M x 2M for example; this really depends on how far from the canvas the vanishing points (VPs) lie. However, note that the further away these are, the lower the precision on the location of the VPs. For now, we will limit ourselves to a fixed size. We could imagine having sliders (horizontal and vertical) to access areas of the working space that are off screen, in later versions.
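As an illustration of how canvas pixels could relate to the -1 to 1 coordinates of the Main window, here is a minimal sketch. It assumes a square canvas centred in a square working space, and pixel row 0 at the top of the image; the function names are hypothetical and not taken from JOT.

```python
def canvas_pixel_to_window(col, row, canvas_size, ws_size):
    """Map a canvas pixel to main-window coordinates in [-1, 1].

    Assumes a canvas of canvas_size x canvas_size pixels centred in a
    working space of ws_size x ws_size pixels, with row 0 at the top.
    """
    half = ws_size / 2.0
    dx = col - canvas_size / 2.0  # offset from the canvas centre
    dy = canvas_size / 2.0 - row  # flipped: window Y grows upwards
    return dx / half, dy / half


def window_to_canvas_pixel(x, y, canvas_size, ws_size):
    """Inverse mapping; the result may lie outside the canvas."""
    half = ws_size / 2.0
    return x * half + canvas_size / 2.0, canvas_size / 2.0 - y * half
```

With M = 512 and a 2M x 2M working space, the top-left canvas pixel maps to (-0.5, 0.5), while the window corner (-1, 1) maps back to pixel (-256, -256), i.e. a point outside the image where a VP may still be drawn.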

In this mode, we do not want the ground plane to appear and be used (no drawing of 3D objects either).
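Saving a working session, as mentioned for the working space above, could amount to writing out the canvas filename together with the drawn cues and numeric inputs. The sketch below uses JSON purely as an assumption; the field names and the two functions are hypothetical, and no particular file format is prescribed by this document.

```python
import json


def session_to_json(canvas_file, lines, vanishing_points, numeric_inputs):
    """Serialise a working session (hypothetical layout) to a JSON string.

    lines: list of ((x1, y1), (x2, y2)) segments in window coordinates.
    vanishing_points: list of (x, y) positions.
    numeric_inputs: dict of named numeric data typed in by the user.
    """
    session = {
        "canvas_file": canvas_file,
        "lines": [[list(p), list(q)] for p, q in lines],
        "vanishing_points": [list(vp) for vp in vanishing_points],
        "numeric_inputs": dict(numeric_inputs),
    }
    return json.dumps(session, indent=2)


def session_from_json(text):
    """Load a session previously written by session_to_json."""
    return json.loads(text)
```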

2. Information window

This second window contains what has been computed, and retrieved through user inputs. It will also be used to display messages.

This window is initially used to load image(s): for the canvas, for a previously exploited WS and possibly for the cartographic window (to load a map). Ideally we need a browser-like pop-up menu.

Once an image is loaded (which defines the canvas and max. WS extent), the user selects a (calibration) method to apply to the canvas. We will have a "bag" of tricks, for each of the following 4 classes of methods:

One VP : frontal perspective.
Two VP : angular perspective.
Three VP : oblique perspective.
Map : calibration through Chasles' Theorem.

For each of the above, different scenarios are possible depending on the information most easily perceived in the scene. The user picks the scenario which seems the most appropriate and the machine starts asking for data in a certain sequence. The information about each method and associated scenarios will always be available through the Help window (see below).
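To give an idea of what a scenario in the two-VP and three-VP classes might compute, the sketch below uses two standard results, which are not necessarily the methods intended here. They assume square pixels, zero skew, and VPs that correspond to mutually orthogonal 3D directions. Under those assumptions the focal length f satisfies (v1 - p).(v2 - p) = -f^2 for principal point p, and with three VPs p is the orthocenter of the VP triangle. The function names are hypothetical.

```python
import math


def focal_from_two_vps(v1, v2, principal_point=(0.0, 0.0)):
    """Focal length from two VPs of orthogonal directions (same units as the VPs)."""
    px, py = principal_point
    dot = (v1[0] - px) * (v2[0] - px) + (v1[1] - py) * (v2[1] - py)
    if dot >= 0:
        raise ValueError("VPs are not consistent with orthogonal directions")
    return math.sqrt(-dot)


def orthocenter(a, b, c):
    """Orthocenter h of triangle abc, from (h-a).(b-c)=0 and (h-b).(c-a)=0."""
    a11, a12 = b[0] - c[0], b[1] - c[1]
    a21, a22 = c[0] - a[0], c[1] - a[1]
    r1 = a[0] * a11 + a[1] * a12
    r2 = b[0] * a21 + b[1] * a22
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("degenerate (collinear) triangle")
    # Cramer's rule on the 2x2 linear system.
    return ((r1 * a22 - a12 * r2) / det, (a11 * r2 - a21 * r1) / det)
```

In the two-VP case the principal point has to be supplied (the image centre is a common assumption); in the three-VP case it can be estimated with orthocenter and then passed to focal_from_two_vps.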

Thus, we need a window with a frame to display information (on choices between methods and scenario, on the next piece of information needed, on the actual calibration data computed), a frame to permit the user to make choices (buttons or menus), a frame to input data (numeric) and some associated pop-up menus (at least one browser for loading an image, a map, or a previously saved session).

3. Help window

We will also have a Help window, but this is independent from JOT for now: it will simply be an HTML document loaded via Netscape.

In this first implementation, information displayed on the Help window will not be directly correlated to the Information window; thus the user will have to take care of the sequencing of information (the navigation through the document).

4. Cartographic window

This is a window whose size is defined by the size of the loaded map (loaded through the Information window above). The user needs to be able to select point features and relate these to corresponding feature points in the WS (on the canvas: in this case, the maximum size of the WS is equal to that of the canvas).

In this case we require JOT to be able to draw conics, each going through a set of 4 points taken from the selected features (usually 5 of them). For each set of 4 points we derive a conic and compute its equation, which gives 5 possible conics. What we are interested in are their intersections. One of these is the camera position (in X-Y); it also provides an estimate of the camera attitude. In general, each pair of conics will intersect at 4 different points, but 3 of these are the features shared by the two sets of 4 points used to draw the conics. The remaining fourth intersection is the grail.

In summary, the user picks 5 points on the map. JOT draws the associated conics. The user then selects the intersection of interest (maybe this can be done automatically ...). Thus we need JOT to be able to draw ellipses, hyperbolas and parabolas (the equations will be provided to JOT from the coordinates of each set of 4 points). These conics are drawn on the map only.
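Two small helpers illustrate the bookkeeping involved. This is a sketch only, with hypothetical names; it does not compute the conic equations themselves. The first enumerates the 5 sets of 4 points obtained from 5 selected features. The second tells which kind of conic JOT would have to draw, using the standard discriminant test on the general equation A x^2 + B xy + C y^2 + D x + E y + F = 0, and ignoring degenerate cases.

```python
from itertools import combinations


def four_point_subsets(points):
    """All sets of 4 points taken from the selected features."""
    return list(combinations(points, 4))


def classify_conic(A, B, C, tol=1e-12):
    """Type of a non-degenerate conic from the sign of B^2 - 4AC."""
    disc = B * B - 4.0 * A * C
    if disc < -tol:
        return "ellipse"
    if disc > tol:
        return "hyperbola"
    return "parabola"
```

Any two of the 5 subsets share exactly 3 points, which is why 3 of the 4 intersections of a pair of conics are already known features.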

Vanishing Points Retrieval

Retrieving the vanishing points is a major challenge.

VPs are retrieved, in most cases, as the intersection of pencils of rays. In fact this is the most difficult way of retrieving them, because of the pixel-based array we use for drawing and because of the sensitivity to errors (by the user) in drawing direction lines upon the canvas. These lines are meant to correspond to parallel lines in 3D Euclidean space. In projective space, they meet at "infinity", that is, on a horizon defining VPs (visual loci of convergence). Many such lines meeting at a VP form a pencil (or cone) of directions.
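Because hand-drawn lines rarely meet at exactly one point, one possible way to estimate a VP from a pencil is a least-squares fit. The sketch below is an assumption about how this could be done, not a description of JOT. It builds each line in homogeneous form from its two endpoints, intersects two lines with a cross product, and for a whole pencil finds the point minimising the sum of squared distances to all lines.

```python
import math


def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersect(l1, l2):
    """Intersection of two lines, or None if they are parallel in the image."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        return None  # the VP lies at infinity in the drawing plane
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)


def estimate_vp(segments):
    """Least-squares VP for a pencil given as ((x1, y1), (x2, y2)) segments."""
    saa = sab = sbb = sac = sbc = 0.0
    for p, q in segments:
        a, b, c = line_through(p, q)
        n = math.hypot(a, b)
        if n == 0.0:
            continue  # skip zero-length segments
        a, b, c = a / n, b / n, c / n  # so a*x + b*y + c is a distance
        saa += a * a; sab += a * b; sbb += b * b
        sac += a * c; sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        return None  # all lines (nearly) parallel
    # Normal equations of min sum (a*x + b*y + c)^2, solved by Cramer's rule.
    return ((sab * sbc - sac * sbb) / det, (sab * sac - saa * sbc) / det)
```

The further the VP lies from the drawn segments, the closer det gets to zero and the less stable the estimate becomes.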

For the moment, the drawing of such pencils of lines will be manually-driven. Thus we need:

to be able to draw lines, and move (in 2D) each of them;
to be able to define a VP for a set of lines which intersect or come close to each other;
such a VP then snaps the lines together at the point where it is defined;
a default horizon line should then be displayed (aligned with the horizontal axis of the WS); this horizon can be roamed around by the user (in 2D), with associated VP(s) staying attached;
a VP can then be moved around, the lines of the pencil staying attached at their other end.
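The snapping and moving behaviour listed above can be sketched with two small classes. The names Line2D and VanishingPoint are hypothetical and do not refer to existing JOT types; the point is only that each line keeps one end fixed on the canvas while the other end follows the VP.

```python
from dataclasses import dataclass, field


@dataclass
class Line2D:
    anchor: tuple    # end drawn on the canvas; stays where the user put it
    free_end: tuple  # end that snaps to, and then follows, the VP


@dataclass
class VanishingPoint:
    position: tuple
    pencil: list = field(default_factory=list)  # attached Line2D objects

    def attach(self, line):
        """Snap a line to this VP: its free end jumps to the VP position."""
        line.free_end = self.position
        self.pencil.append(line)

    def move_to(self, new_position):
        """Move the VP; attached lines pivot about their anchors."""
        self.position = new_position
        for line in self.pencil:
            line.free_end = new_position
```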

To be more precise, a Horizon:

is a special line to which many other VPs may be attached;
can be moved, but then all attached VPs move as well as the associated pencils;
there are either 3 (3VP case) or 1 (2VP or 1VP cases) such horizon(s) in any perspective view one may get when drawing or looking at a 3D Euclidean world.

Indeed, when looking at the (Euclidean) 3D space, at most 3 horizons (loci of convergence for parallel lines) are possible. These define the "plane at infinity" (a triangle for 3 VPs, a rectangle otherwise). This plane is always parallel to the image and camera planes (this is the idea of a "back plane" in TIP). We will need:

To be able to move the plane at infinity by moving one of the 3 VPs at a time: this moves 2 horizons and all associated intermediate VPs and pencils.
In the case of 2 finite VPs, be able to move only one of the VPs.
For 1 finite VP, be able to move this VP (the "central" VP) along the horizon (sliding), or move the horizon itself around in 2D, like a "rod".
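The horizon interactions described above (moving a horizon with its VPs attached, and sliding a VP along it) can be sketched as follows. The Horizon class and its fields are hypothetical; pencils are left out, since they would simply follow their VP.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Horizon:
    point: tuple      # any point on the horizon line
    direction: tuple  # direction of the line (need not be unit length)
    vps: list = field(default_factory=list)  # positions of attached VPs

    def translate(self, dx, dy):
        """Move the horizon; all attached VPs move with it."""
        self.point = (self.point[0] + dx, self.point[1] + dy)
        self.vps = [(x + dx, y + dy) for x, y in self.vps]

    def slide_vp(self, index, target):
        """Slide a VP to the point of the horizon closest to target."""
        ux, uy = self.direction
        n = math.hypot(ux, uy)
        ux, uy = ux / n, uy / n
        # Signed distance of the projection of target along the horizon.
        t = (target[0] - self.point[0]) * ux + (target[1] - self.point[1]) * uy
        self.vps[index] = (self.point[0] + t * ux, self.point[1] + t * uy)
```

Projecting the cursor position onto the horizon keeps the VP constrained to the line while the user drags it freely.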

Other Inputs

Other inputs will be provided by the user to achieve camera calibration. This is done through the "Information window", but may require drawing lines or selecting features (points or pairs of points) over the canvas while numerical data is being provided via the keyboard.

Phase 2 : Sketching in Perspective

Phase 1 ends when camera parameters have been retrieved. Then we are back in the usual "Sketch" mode. Thus, we hide from the user the perspective cues that were just drawn in Phase 1 (but we keep the traces in memory ... for use in Phase 3). However, with respect to the present JOT interface, we require certain modifications:

The ground plane should be displayed as a non-opaque grid (the foreshortening is now provided through the retrieved camera parameters).
This ground plane can be moved around, but initially shall be aligned with one horizon.
The ground plane, which is the reference to "build" 3D primitives, can be rotated and moved. It can also be resized.
Object primitives should be drawn as either:
- opaque photo-textured (already provided)
- wire-frame (transparent)
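As an illustration of a ground plane displayed as a wire-frame grid, the sketch below projects grid lines through a simple pinhole camera. The camera model is an assumption made for the example (camera at the origin looking along +Z, ground plane at Y = -height, focal length f in window units) and is not JOT's renderer; the function names are hypothetical.

```python
def project(point, f):
    """Pinhole projection of a 3D point (X, Y, Z), Z > 0, to the image."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point is behind the camera")
    return (f * X / Z, f * Y / Z)


def ground_grid_segments(f, height, half_width, z_near, z_far, step):
    """2D segments of a wire-frame grid lying on the plane Y = -height."""
    segments = []
    n_x = int(round(2.0 * half_width / step))
    n_z = int(round((z_far - z_near) / step))
    # Lines receding from the camera: they converge towards the VP at (0, 0).
    for i in range(n_x + 1):
        x = -half_width + i * step
        segments.append((project((x, -height, z_near), f),
                         project((x, -height, z_far), f)))
    # Lines across the view: they bunch up towards the horizon y = 0.
    for j in range(n_z + 1):
        z = z_near + j * step
        segments.append((project((-half_width, -height, z), f),
                         project((half_width, -height, z), f)))
    return segments
```

Drawing only these segments, rather than a filled plane, leaves the canvas visible underneath, which is what the refinement in Phase 3 relies on.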

Phase 3 : Calibration refinement

After the user has positioned the ground plane and extracted some 3D objects, it may become apparent that the projection is not quite right. This is a sign that the calibration was somehow inaccurate (or that the scene is not quite in linear perspective). Thus, the user now enters a third phase where the calibration (camera parameters) will be refined interactively by modifying the perspective cues obtained in Phase 1.

Thus, we need to re-display some of the cues obtained in Phase 1, as an overlay in the WS:

Make the VPs, the plane at infinity and the horizon(s) visible.

The plane at infinity should be semi-transparent (warning: that plane is not the ground plane).

All these elements can still be moved as in Phase 1 (which interactively modifies the camera parameters, which in turn changes the projection of the reconstructed 3D primitives).

However, the pencils are not made visible here (we think they are no longer useful, but that remains to be demonstrated in practice).

After the user has modified the perspective cues and is satisfied with the obtained matching of 3D objects over the canvas, we can go back to Phase 2 mode and pursue 3D reconstruction.

It might prove useful to go back and forth between Phase 2 and 3 modes of interaction more than once, as new 3D primitives are built.