As described in the introduction, the bulk of this thesis addresses the issue of cloth motion capture. In this chapter, some of the details of the first stage of the cloth motion capture system are discussed, covering the construction of a disparity map from input multibaseline stereo images.
The output of this stage of our system is three images of equal size: a rectified greyscale camera image of the cloth and backdrop; a mask to distinguish the cloth from the backdrop; and a disparity map, from which the depth at every pixel can be inferred. The greyscale image and disparity map can be generated with a standard stereo vision system, and the mask can be easily defined using background subtraction.
Rectified images and epipolar geometry are a wellunderstood subject in computer vision. Given a suitable system for camera calibration, it is easy to produce rectified images [84]. Stereo correspondence algorithms take two (or sometimes more) rectified greyscale images as input, and produce a disparity map d(x,y) for each pixel in one of the input images, typically stored as a greyscale image. Figure 4.1 demonstrates this process, although the disparity map shown here is somewhat idealised. The term disparity was originally used to describe the 2D vector between the positions of corresponding features seen by the left and right eyes. It is inversely proportional to depth, and it is possible to define a mapping from an (x,y,d) triple to a threedimensional position.

There are a wide range of stereo correspondence algorithms. We refer the reader to the excellent survey by Scharstein and Szeliski [97,98,99] for a taxonomy of the available techniques. We used the Sum of Absolute Differences (SAD) correlation method to reconstruct disparity maps. This is a very simple approach with a number of major artefacts, and in the remainder of this chapter we discuss our solutions to the shortcomings of SAD correlation. However, it should be noted that a more sophisticated stereo correspondence algorithm (such as the graph cuts approach) might be a more suitable solution, and could be easily substituted.
The SAD correlation method yields three major types of artefacts. First, in some regions disparities are uncertain, and are left as "holes'' in the disparity map, as demonstrated in Figure 4.2(a). Uncertainty can occur for a variety of reasons, including insufficient texture, depth discontinuities or noisy images.
Second, most disparity maps are only computed to integer precision, i.e. . When these disparities are inverted to obtain depth, the resulting depthmap is visibly quantised, yielding a very jagged surface. Some algorithms attempt to calculate a fractional part for each disparity using subpixel estimation, but such techniques are still tentative and can produce incorrect results [100]. An example of the errors corrected by subpixel estimation is shown in Figure 4.2(b).
Third, windowbased stereo correspondence algorithms often exhibit a "foreground fattening'' effect near depth discontinuities between two objects. When this happens, samples from the far object are mistakenly measured as having the same disparity as samples on the near object, as demonstrated in Figure 4.3.
The stereo system we used is prone to all three of these problems. We have developed a technique that smoothly fills holes and finds a fractional part to each disparity to create a smoother surface, but we have no way to solve the problem of foreground fattening. The fractional part is not measured from the input images, as in the traditional subpixel estimation algorithms used by the vision community, but is instead smoothly interpolated from the measured integer disparities.
We do not claim that our solution is a novel contribution to the field; we merely document our approach in the interest of thoroughness. Our solution is adapted to the particular needs of the stereo system we used, but it is likely possible to find an existing stereo system that exhibits none of these problems. In the future, we expect that standard stereo systems will solve these problems, and output from a stereo system can be used directly without the modifications described here.
We have the option of operating on either the twodimensional disparity map, or the corresponding threedimension surface. Given the structured nature of our input data, we choose to operate directly on the disparity map in a twodimensional manner for the sake of efficiency.
Both holefilling and subpixel estimation can be formulated as image interpolation problems. Image interpolation is an imagebased technique that involves filling holes in an image with plausible data. The hole is not necessarily filled with smooth data, but may sometimes involve extending discontinuities at the hole edge into the hole. It has been well studied by researchers such as Bertalmio et al. [10], Caselles et al. [26] and Pérez et al. [86], and is also studied as part of the larger problem of image inpainting. The general approach involves fixing the boundary of the hole, and then solving a boundaryvalue partial differential equation (PDE) to interpolate the interior of the hole. This is equivalent to applying a diffusion process, such as isotropic diffusion or one of the many forms of anisotropic diffusion. For more details on isotropic and anisotropic diffusion, refer to Black et al. [12]. For a detailed description of the relation between diffusion and partial differential equations, see Sapiro's excellent book [96].
Caselles et al. also considered a different problem. They started with a quantised image, i.e. only a limited set of intensity values, such as multiples of 30. From this, they wished to produce a smooth image with integer intensity values using an image interpolation algorithm. This would also produce a satisfactory solution to our subpixel estimation problem; the disparity map can be viewed as a quantised image, and smooth interpolation of the data would be a satisfactory solution.
Caselles' approach can be explained using a onedimensional example, as shown in Figure 4.4. Suppose that the image I(x) is quantised to image I_{q}(x) by rounding image intensities to the nearest integer multiple of δ. We aim to interpolate I_{q}(x) to form a smooth interpolated image I_{i}(x). Consider two adjacent points, x_{0} and x_{1} at a step edge, . Clearly, the intensity of the unquantised image crossed the midintensity point somewhere between x_{0} and x_{1}. We can take advantage of this and force the high point of each step edge down to the midintensity point, i.e. fix . Finally, these fixed points are interpolated. A special case is required for step edges larger than δ, in which case both the high and low side of the edge are fixed. Extension to a twodimensional image is straightforward, with the fixed points forming closed Jordan curves in the plane. Caselles called these the boundaries of the level sets, but it should be noted that his level sets are not directly related to standard levelset methods in computer graphics [83].
We implemented Caselles' approach, using a simpler isotropic diffusion process for both holefilling and interpolation of the quantised disparity map. An isotropic diffusion process finds a solution to
Initially, the results from this approach seemed reasonable. Performance was quick, requiring only about two minutes per frame. Holes were smoothly filled, showing C^{0} continuity and C^{1} continuity with the fixed hole boundary. Quantisation artefacts were resolved, with C^{0} continuity. However, there were C^{1} continuity problems at the fixed boundaries of the level sets used to interpolate the quantised disparity map. The diffusion process did yield C^{1} continuity on either side of the levelset boundary line, but the derivatives on either side of the boundary were not identical. Consequently, the image gradient had clearly visible discontinuities along the boundaries of the level sets.
Ultimately, more control was needed at the boundaries during diffusion. Like Caselles, we fixed the value of the function at the boundary, known in the PDE literature as imposing Dirichlet conditions on the boundary value partial differential equation. Cauchy conditions are a standard alternative, where both the value and the gradient are specified at the boundary as described in [87]. Unfortunately, Cauchy conditions cannot be used, since the gradient at the boundary is unknown; we only want the gradient on either side of the levelset boundaries to be the same. It is possible that the biharmonic equation could provide this control over the system, but we leave this as future work.
PDE methods were satisfactory for holefilling, but could not solve the subpixel estimation problem. Using a diffusionbased approach, the only way to interpolate the existing data to perform subpixel estimation was by introducing fixed boundaries of the level sets. However, the interpolation can be achieved in a different manner.
As before, a solution to Laplace's equation (Equation 4.1) is desired, starting from a variant,
(4.3) 
We use a finitedifference representation of the Laplacian,
As shown in Figure 4.5 the matrix A is sparse with the standard 5point Laplacian structure, sometimes known as "tridiagonal with fringes.'' The main diagonal represents the central count in the finitedifferencing scheme, the upper and lower diagonals correspond to the right and left neighbours respectively, and the fringe diagonals correspond to the lower and upper neighbours. The value a_{i} on the main diagonal is initially adjusted to ensure that the sum of each row is zero.
The problem can now be expressed in a simple, classic form. Find x that minimises
(4.4) 
This is a constrained linear leastsquares minimisation problem. We use Matlab to solve this equation, and Matlab uses a subspace trustregion method based on the interiorreflective Newton method, as described by Coleman et al. [29].
Some small modifications are made to this scheme for practical purposes. Not all entries in I are valid, and this must be accounted for. Invalid samples are excluded from both the A and b matrix. We adjust the central count a_{i} to include only the number of valid neighbour samples. Finally, for holefilling, the boundary values surrounding the hole are included in b but excluded from A.
In practical terms, this algorithm performs quite slowly. A hierarchical version performs much better, and some minor edits to Matlab's source files can improve performance to about three minutes per frame. The appearance of the results is satisfactory, exhibiting both C^{0} and C^{1} continuity.
As noted before, the optimisation approach is only needed for subpixel estimation. Both the PDE and optimisation approaches produce satisfactory results for holefilling, and the PDE approach yields better performance.