Trifocal Relative Pose From Lines at Points

We present a method for solving two minimal problems for relative camera pose estimation from three views, which are based on three view correspondences of (i) three points and one line and the novel case of (ii) three points and two lines through two of the points. These problems are too difficult to be efficiently solved by the state of the art Gröbner basis methods. Our method is based on a new efficient homotopy continuation (HC) solver framework MINUS, which dramatically speeds up previous HC solving by specializing hc methods to generic cases of our problems. We characterize their number of solutions and show with simulated experiments that our solvers are numerically robust and stable under image noise, a key contribution given the borderline intractable degree of nonlinearity of trinocular constraints. We show in real experiments that (i) sift feature location and orientation provide good enough point-and-line correspondences for three-view reconstruction and (ii) that we can solve difficult cases with too few or too noisy tentative matches, where the state of the art structure from motion initialization fails.


INTRODUCTION
S CIENTIFIC research on 3D reconstruction from multiple views has made an impact [1] by mostly relying on points in Structure from Motion (SfM) [2], [3], [4], [5].Still, even productionquality SfM technology fails [1] when the images contain (i) large homogeneous areas with few or no features; (ii) repeated textures, like brick walls, giving rise to a large number of ambiguously correlated features; (iii) blurred areas, arising from moving cameras or objects; (iv) large scale changes where the overlap is not sufficiently significant; or (v) multiple and independently moving objects each lacking a sufficient number of features.
The failure of bifocal pose estimation using RANSAC on hypothesized correspondences, e.g., using 5 points [6], is highlighted in a dataset of images of mugs, Fig. 1 (similar to the dataset in [7] but without a calibration board), for which the failure rate using Manuscript received Month Day, Year; revised Month Day, Year.
Fig. 1.A deficiency of the traditional two-view approach to bootstraping SfM: not enough features detected (red dots) and thus a SOTA SfM pipeline COLMAP fails to reconstruct the relative camera pose.In contrast, the proposed trinocular method requires only three features corresponding in three views: two point-tangents (points with SIFT orientation, green and cyan) and one point without orientation (purple, also SIFT).
Red cameras are computed by our approach and green is ground truth.
the standard SfM pipeline COLMAP [63] is 75%.The failure of just directly applying the 5-point algorithm in this example is even higher.A similar situation exists for images containing repeated patterns where there are plenty of features, but determining correspondences is challenging.Most traditional multiview pipelines estimate the relative pose of the two best views and then register the remaining views using a P3P algorithm [2], for robustness.The focus of this paper is to address the failure of traditional bifocal algorithms in such cases, while tackling strategic problems that have long-term potential for breakthrough for a myriad of other minimal problems we jointly discovered and tackled [8], [9], [10], [11], and in the case of curve features for SfM which critically depend on trifocal geometry [12], [13], [14], [15], [16].
The failure of bifocal algorithms motivates the use of (i) more complex features, i.e., having additional attributes and (ii) more diverse features (partial visibilitiy also helps in robustness, see [17]).We propose that orientation (in the sense of inclination) is a key attribute to disambiguate correspondences and we show that SIFT orientation in particular is a stable feature across views for trifocal pose estimation.Orientation can also arise from curve tangents [14], [15], [18], and the orientation of a straight line in multiple views also constrains pose.Observe, however, that orientation cannot be constrained in two views alone: SIFT orientation or line orientations in two views are uncorrelated, but together can identify their 3D counterparts and thus can constrain orientation in a third view.This motivates trinocular pose estimation based on point features endowed with orientation or including straight line features [12], [13], [14].
Camera estimation from trifocal tensors is long believed to augment two-view pose estimation [19], [20].Although no significant improvements over bifocal pairwise estimation have been documented [21], recent work reiterate the advantages of wellcrafted trifocal algorithms for relevant near-degenerate configurations such as approximately collinear camera centers [20], [22].The calibrated trinocular relative pose estimation from four points, 3V4P, is notably difficult to solve [15], [23], [24], [25], and is not a minimal problem -it is over-constrained.The first working trifocal solver [23] effectively parametrizes the relative pose between two cameras as a curve of degree ten representing possible epipoles.A third view is then used to select the epipole that minimizes reprojection errors.In this sense, trinocular pose estimation has not truly been tackled as a minimal problem.
Trifocal pose estimation requires the determination of 11 degrees of freedom: six unknowns for each pair of rotation R and translation t, less one for metric ambiguity.Three types of constraints arise in matching triplets of point features endowed with orientation.First, the epipolar constraint provides an equation for each pair of correspondences in two views.Second, in a triplet of correspondences, each pair of correspondences are required to match scale, providing another constraint; a total of three equations per triplet.It is easy to see, informally, that three points are insufficient to determine trifocal pose, while four points are too many.Third, each triplet of oriented feature points provides one orientation constraint.Thus, with three points, only two points need to be endowed with orientation, giving a total of 11 actual constraints for the 11 unknowns.We refer to this novel problem of three triplets of corresponding points, with two of the points having oriented features as "Chicago", which evolved out of the work by Fabbri, Giblin and Kimia on absolute pose estimation from two points endowed with tangents [13], [14].In the second scenario, i.e., using straight lines as features, with three points, only one free (unattached to a point) straight line feature is required.We refer to the problem of three triplets of corresponding points and one triplet of corresponding free lines as "Cleveland."This paper addresses trifocal pose estimation for the above two scenarios, shows that both are minimal problems, and develops efficient solvers for the resulting polynomial systems.
Specifically, each problem comprises eleven trifocal constraints that in principle give systems of eleven polynomials in eleven unknowns.These systems are not trivial to solve and require techniques from numerical algebraic geometry [26], [27], [28] (i) to probe whether the system is over or under constrained or otherwise minimal; (ii) to understand the range of the number of real solutions and estimate a tight upper bound; and (iii) to develop efficient and practically relevant methods for finding solutions which are real and represent camera configurations.This paper shows that the Chicago problem is minimal and has up to 312 solutions (the area code of Chicago) of which typically 3-4 end up becoming relevant to camera configurations.Similarly, we show that the Cleveland problem is minimal and has up to 216 solutions (the area code of Cleveland).The minimality of combinations of points and lines for the general case [29] is a parallel development to the more concrete treatment presented here.
The numerical solution of polynomial systems with several hundred solutions is challenging.We devised a custom-optimized Homotopy Continuation (HC) framework MINUS which iteratively tracks solutions with a guarantee of global convergence [27].Our framework specializes the general HC approach to minimal problems typical of multiple view geometry, thereby dramatically speeding up the implementation.Specifically, our Chicago and Cleveland solvers are not only the first solvers for such high degree problems, but are orders of magnitude faster than solvers for such scale of problems: 660ms on average on an Intel core i7-7920HQ processor with four threads.They share the same generic core procedure with plenty of room to be further optimized for specific applications.Most significantly, since finding each solution is a completely independent integration path from the others, the solvers are suitable for implementation on a GPU, as a batch for RANSAC, which would then reduce the run time by the number of tracks, i.e., by two orders of magnitude.We hope that our developments can be a template for solving other computer vision problems involving systems of polynomials with a large number of solutions, and in fact the provided C++ framework is fully templated to include new minimal problems seamlessly.
Experiments are initially reported on complex synthetic data to demonstrate that the system is robust and stable under spatial and orientation noise and under a significant level of outliers.Experiments on real data first demonstrates that SIFT orientation is a remarkably stable cue over a wide variation in view.We then show that our approach is successful in all cases where the traditional SfM pipeline succeeds, but of course at higher computational cost.What is critically important is that the proposed approach succeeds in many other cases where the SfM pipeline fails, e.g., on the EPFL [30] and Amsterdam Teahouse datasets [31], Fig. 9 and 10.Those cases where the bifocal scheme fails -flagged by the number of inliers, for example -can consider the application of a currently more expensive but more capable trifocal scheme to allow for reconstructions that would otherwise be unsolved.

Literature Review
Trifocal Geometry.Calibrated trifocal geometry estimation is a hard problem [23], [24], [25], [32].There are no publicly available solvers we are aware of.The state of the art solver [23], based on four corresponding points (3V4P), has not yet found many practical applications [33].A solver for a relaxed version of this problem has been recently made available by our coauthors based on techniques originated in the present paper [34].
For the projective case, 6 points are needed [35], and Larsson et al. solved the longstanding trifocal minimal problem using 9 lines [36].The case of mixed points and lines is less common [37], but has seen a growing interest in related problems [38], [39], [40].The calibrated cases beyond 3V4P are largely unsolved, spurring sophisticated theoretical work [41], [42], [43], [44], [45], [46], [47].Kileel [43] studied minimal problems in this setting, such as the Cleveland problem solved in the present paper, and reported studies using HC.He stated that the full set of ideal generators, i.e., a set of polynomial equations provably necessary and sufficient to describe calibrated trifocal geometry, was unknown.
Seminal works used curves and edges in three views to transfer differential geometry for matching [48], [49], and for pose and trifocal tensor estimation [16], [50], beyond straight lines for uncalibrated [51], [52] and calibrated [38], [53] SfM.Pointtangents -not to be confused with point-rays [54] -can be framed as quivers (1-quivers), or feature points with attributed directions (e.g., corners), initially proposed in the context of uncalibrated trifocal geometry but de-emphasizing the connection to tangents to curves [55], [56].Point-tangent fields can be framed as vector fields, so related technology applies to surface-induced correspondence data [15].In the calibrated setting, point-tangents were first used for absolute pose estimation by Fabbri et.al. [13], [14], from only two points, later relaxed for unknown focal length [57].The trifocal problem with three point-tangents as a local version of trifocal pose for global curves was first formulated by Fabbri [15], presented here as a minimal version codenamed Chicago.
As an early example, HC was used to find an early bound of 600 solutions to trifocal pose with 6 lines [64].In the vision community HC is mostly used as an offline tool to carry out studies of a problem before crafting a symbolic solver.Kasten et.al. [79] recently compared a general purpose HC solver [61] against their symbolic solver.However, their problem is one order of magnitude lower degree than the ones presented here, and the HC technique chosen for our solver [27] is more specific than their use of polyhedral homotopy, in the sense that fewer paths are tracked (cf.. the start system hierarchy in [59]).

TWO TRIFOCAL MINIMAL PROBLEMS
We formulate a new minimal problem for points and incident lines in three views, codenamed Chicago.We present its fundamental equations in explicit parametric form that shed light on the geometric properties relevant to vision, as well as a more specific set of equations with 14 unknowns used in our best-performing solver MINUS.While we focus on the Chicago problem, our formulations, analysis and solver framework generalize to important similar problems, and has lead to companion work by our coauthors [29].To illustrate this, we present a second trifocal problem for points and a free line, codenamed Cleveland.The formulation, characterization and practical solver approach for Cleveland, in direct analogy to Chicago, are also a contribution of this paper.Specific details on Cleveland are left for the appendix, since our focus is on Chicago and the analysis is analogous.
Fig. 2. Notation illustrated for a single point with a curve tangent vector or feature orientation, e.g., SIFT.Multiple features may be explicitly indexed with an additional first subscript.

Formulation and Notation
We follow notational style from Hartley and Zissermann [51] with explicit projective scales.A more elaborate notation [14], [50] can be used to express the equations in terms of tangents to curves and derivatives of relevant quantities such as depth.Fig. 2 illustrates the notation for a single feature consisting of a point and an incident line in three views.Symbols may be given two subscripts p, v = 1, 2, 3 to index multiple feature points and views, respectively; indices p may be omitted for brevity.Let R v , t v denote the rotation and translation transforming coordinates from camera 1 to camera v (so that R 1 is identity and t 1 = 0).Symbols X p and Y p denote inhomogeneous coordinates of 3D points, and x pv , y pv homogeneous coordinates of their respective projections on P 2 at view v, with α pv , β pv their respective depths.Let l pv and L p denote column vectors of homogeneous coordinates of image lines and underlying 3D lines in (P 2 ) ∨ and (P 3 ) ∨ , resp.We use both parametric and homogeneous equations for lines, the latter obtained by eliminating the line parameter from the former.Symbol d pv represents a line direction or unit curve tangent vector in homogeneous coordinates at view v (point at infinity, i.e., third coordinate is zero); and D p is the underlying 3D line direction or space curve tangent in inhomogeneous world coordinates.Displacements ε p along D p correspond to displacements δ pv along d pv .Let π pv denote the homogeneous coordinates of the backprojection plane in (P 3 ) ∨ of l pv .For simplicity, we use concrete coordinate representations even in coordinate-independent statements.By default, all coordinates are assumed real and without the action of intrinsic parameters.
Definition 1 (Chicago trifocal problem).Given three corresponding points x 1v , x 2v , x 3v and two lines l 1v , l 2v in views v = 1, 2, 3, such that l pv meets x pv for p = 1, 2 and v = 1, 2, 3, Examples of data for Chicago: 1) Three oriented features (e.g., SIFT) corresponding across three views, using feature orientations; 2) General curves in three views (e.g., linked subpixel edges), and three corresponding curve points (e.g., subpixel edgels), using tangent vectors; 3) Trajectories of three moving points observed by three cameras, using velocity vectors.While a third orientation triplet is usually available and exploited in practice, we show the core pose solution requires only two.

Essential Equations
The essential equations of Chicago (and Cleveland) are obtained by writing constraints per feature independently, while keeping the pose unknowns in general form.They are used for analyzing the fundamental properties of the new problems and as a basis for further variable elimination and exploring other formulations.See [80] for a general framework for navigating different formulations.The final solver that offered the best performance uses a formulation that further eliminates variables across these perfeature equations using specific algebraic manipulations connecting features pairwise, as described further in Section 2.3.
Theorem 2.1 (Essential trifocal constraints for points and incident lines, parametric form).The constraints on relative pose from points and incident lines observed in three views are given by for v = 2, 3 (point indices omitted, R 1 = I and t 1 = 0).We call (1) the parametric essential trifocal point constraints, and (2) the parametric essential trifocal incident line constraint.Moreover, (1) imposes three constraints per triplet point, while (2) imposes one constraint per incident line triplet: 1) Point epipolar constraints: (1).Lines in space through X are modeled here in parametric form by a displacement parameter and points Y = X + εD, which are projected as and eliminate εD using the equation for v = 1 and . It follows that the trifocal essential point constraints in parametric form (1) are logically equivalent to the existence of a triangulation X from views 1 and 2 equal that from views 1 and 3.In parametric form, it simply means that these solutions can be linked by the same depth α 1 .By construction, these imply the existence of a triangulation from views 2 and 3, also equal to X, so (2) for views 2 and 3 does not provide an additional constraint. 1  The trifocal essential incident line constraints in parametric form are logically equivalent to the existence of a 3D line direction D that, when rooted at X, projects to direction d 1 and d 2 , and that D also projects to d 3 .In the point case the equation from views 1 and 2 provides a constraint, i.e., (1) for v = 2 does not always have a solution, while the incident line equation from views 1 and 2 does not provide a constraint on pose -there is always a solution µ and η for (2) for v = 2 that parametrizes some consistent D irrespective of R and the data x and d.Each triplet of oriented point features provides a single orientation constraint expressed as two coupled equations (2) in η and µ in addition to pose.
1. Conversely, having three pairwise epipolar constraints is not equivalent to two pairwise epipolar constraints and a relative scale constraint [22].
Corollary 2.2.The correspondence of points across three views constrain relative rotations and translations, while the additional correspondence of an incident line constrains only rotation.
Proof.This is a direct consequence of Theorem 2.1.
Having an incident line thus works like an additional point correspondence -in a precise sense like a third of a point -yet constraining only rotations.This allows us to construct Chicago as an exactly constrained trifocal problem that can be applied, e.g., with conventional SIFT features.We can get an expression of these constraints free of auxiliary parameters by further elimination.
The parametric point epipolar constraints of Theorem 2.1, in particular, state that x 1 , x v and the first camera center t v are coplanar when written in the coordinates of camera v; this is the classical Essential constraint, readily expressed without parameters via a scalar triple product trilinear in t v and the points, the standard expression that is bilinear in image coordinates.Although we arrived at this constraint explicitly from first principles through but the simplest logic, it is a general constraint of two-view geometry with recent results in trifocal geometry [20].Algebraically, the classical expression for the Essential constraint ammounts to eliminating depths α v from (1) while keeping R v and t v .However, there are successful arguments for eliminating R v and t v first in camera pose problems, writing the equations in terms of depths only α [13], [14] (e.g., the classical P3P equations).Though not performed here, this further motivates stating the trifocal essential constraints in parametric form.Moreover, the parametric form more readily lends itself to modeling general curves [12] for which trifocal geometry plays a pivotal role.
The trifocal relative scale constraint in Theorem 2.1 guarantees that 3D rays converge, which may not be the case if we had used three pairwise epipolar constraints instead; in fact, this scale constraint is a fundamental and classical condition of photogrammetry, called the scale-restraint equations, see [22] for general results.Such a constraint may be substituted by an additional epipolar constraint between views 2 and 3, but it turns out that this is only adequate for oriented points, i.e., together with the incident line constraint, which guarantees a consistent 3D incident line.Without this, having three pairwise epipolar constraints is not enough to guarantee there is a 3D point X that projects to the observed points, specially near non-generic configurations [22], namely 1) if the camera centers are far from collinear, when the corresponding rays lie in or near the trifocal plane 2) if the centers are approximately collinear, when the rays lie near any plane containing the baseline [22].In this sense, points with incident lines are natural features in trifocal geometry.

Corollary 2.3 (Chicago Essential Equations, Parametric Form).
The Chicago problem is equivalent to finding the solutions of for v = 2, 3, which are 30 scalar equations in the relative camera pose R 2 , t 2 , R 3 , t 3 , along with 9 unknown depths (α p v) and 12 unknown line parameters (6 each for η p v and µ p v).
Proof.Theorem 2.1 lists all the available constraints that arise.
That actual equations used in our solver amount to an elimination of the auxiliary parameters in ( 4) and ( 5), leading to vanishing minors, Section 2.3.Note that (4) are homogeneous in α and Fig. 3. Visible line diagrams for Chicago.Cleveland uses the same numbering for pairwise lines and l4 is a free line.
t, so that a multiple of a particular solution are also solutions, i.e., translations and depths are constrained up to scale, giving 11 constrainable degrees of freedom.By Theorem 2.1, the essential equations used in Chicago express 3 independent constraints per point, and 1 per incident line, yielding 11 constraints on 11 degrees of freedom.Rigorous computational arguments in Section 3 confirm that these constraints are also independent across points.In other words, Chicago is a minimal problem.
One can also see the parametric trifocal essential equations for Chicago as a square system of 30 scalar equations in the 30dimensional space SO(3) × SO(3) × P 14 × P 5 × P 5 of unknowns We model the 9 depths α v and t 2 , t 3 as a single point in P 14 , since they are unknown up to a common scale.Similarly, since only the directions of tangents matter, we regard these solution components as points in two P 5 factors, one per oriented feature.
There are many ways to proceed with elimination from the essential parametric equations to obtain alternate formulations, as discussed above.A particular eliminated formulation based on vanishing minors, which produced the first working solver for Chicago, and which are used in MINUS, is described in Section 2.3.

Equations based on minors used in our solver
Experiments show that judicious elimination of additional variables from the basic equations leads to faster and more reliable solvers, with tradeoffs, e.g., in the number of variables vs. nonlinearity and degeneracy of the resulting representations.This section describes a particular way to eliminate variables down to a 14×14 system that has proven most successful and general to date.
Futher elimination of certain variables from the basic equations leads to minor-based constraints, i.e., enforcing the determinants of certain sub-matrices to vanish.Examples are coplanarity or multilinear constraints, e.g., the essential constraint.In particular, this eliminates parameters describing coordinates of vectors in constraints on lines (depths α's) and planes (η's and µ's).While this approach has long been used for describing trifocal constraints for points [22], in full generality it is novel and has spawned companion work by our coauthors [29].Additionally, equations based on minors are multilinear, allowing for possible numerical improvements, Section 4.1.
An instance of Chicago may be described by a configuration of 5 visible lines in each view, Fig. 3.We denote each line by l 1v , . . ., l 5v for v = 1, 2, 3, where the first three l 1v , l 2v , l 3v pass through all pairs of points in each view, and the last two l 4v , l 5v represent the associated point-tangent pairs.The minorbased equations split into three sets summarized as: Lines correspond: π i,1 , π i,2 , π i,3 meet at a 3D line L i .Pairwise lines meet: L 1 , L 2 , L 3 meet pairwise in 3D.
The latter two are so-called common point constraints.
Line correspondence constraint.These equations express that there must be an underlying 3D line L j , j = 1, . . ., 5 associated to the set of backprojection planes ].These planes define a single line if the underlying system of equations has a 1D solution, leading to the rank constraint Equivalently, we obtain a polynomial system by setting all 3 × 3 minors of each L j to zero.As explained Section 4, MINUS employs a heuristic to select one such minor per L j , fixed for given HC starting solutions, yielding 5 final equations for this constraint.
Pairwise line intersection constraint.That L 1 , L 2 , L 3 intersect pairwise can be expressed by or that all maximal 4 × 4 minors vanish.We use only rank [L 2 L 3 ] ≤ 3 corresponding to X 3 , as the other pairwise intersections will be implicit in the constraint of incident tangents.
For MINUS we pick only one minor equation for this constraint using the aforementioned heuristics.
Incident tangents constraint.That tangents intersect at the same point with two other lines can be expressed by forming matrices and requring All 4 × 4 minors must vanish, 5 of which are used in MINUS.The final number of equations consists of 11 fixed, specific vanishing minors.The total number of minors associated with the rank constraints ( 6),( 7),( 8) far exceeds the number of unknowns used in our formulation of Chicago.The number of unknowns, as described in the next section, is 14, and the total number of equations implied by these rank constraints is 287 = 5 4 3 + 2 9 4 + 6 4 .Nevertheless, these 287 equations together with 3 dehomogenization equations (12) will have 312 solutions for almost all line configurations encoding an instance of Chicago.In our HC solvers, we work with a 14 × 14 subsystem of these equations which determine a full-rank submatrix of the 290 × 14 Jacobian matrix.
In this approach, the selection of the actual equations out of a large pool of possibilities is done through computer-assisted heuristics, Section 4. While these general tools aid in understanding the underlying geometry, this becomes concealed.Selecting the appropriate subset of minors, e.g., that ensures the 3D rays for matching points always intersect, is a known problem in the projective case [22].In that scenario, a different subset of minors may be used depending on a priori assumptions on camera configuration (e.g., collinear vs non-collinear camera centers) [20].
An explicit set of vanishing minors for point trifocal geometry and the resulting constraints is studied in a general setting by Trager et.al. [20].A geometric interpretation is that four minors encode constraints that are trilinear in image coordinates and express that 3D rays must meet at a single point.When 3D rays are viewed from four different appropriate image planes, each vanishing minor may be expressed as requiring three coplanar projected lines meeting at a point [20].We verify experimentally that our chosen set of minors provides a working solver.

PROBLEM ANALYSIS
A general camera pose problem is defined by a list of labeled features in each image, which are in correspondence.The image coordinates of each feature are given, and we aim to determine the relative poses of the cameras.The concatenated list of all the feature coordinates from all cameras is a point in the image space Y , while the concatenated list of the features' locations and orientations in the world frame or camera 1 is a point in the world feature space W .The scale of the relative translations is indeterminate, so relative translations are treated as in projective space.For N cameras, the combined poses of cameras 2, . . ., N relative to camera 1 are points in SE(3) N −1 .Let the pose space be X, the projectivized version of SE(3) N −1 , and so dim X = 6N −7.Given the 3D features and the camera poses, we can compute the image coordinates of the features by considering a viewing map V : W × X → Y .A camera pose problem is: given y ∈ Y , find (w, x) ∈ W × X such that V (w, x) = y.The projection π : (w, x) → x is the set of relative poses we seek.Definition 3. A camera pose problem is minimal if V : W ×X → Y is invertible and nonsingular at a generic y ∈ Y .
A necessary condition for a map to be invertible and nonsingular is that the dimensions of its domain and range must be equal.Let us consider three kinds of features: a point, a point on a line (equivalently a point with tangent direction), and a free line (a line with no distinguished point on it).For each feature, say F , let C F be the number of cameras that see it.The contributions to dim W and dim Y of each kind of feature are in the table below, where a point with a tangent counts as one point and one tangent.Thus, a point feature has several tangents if several lines intersect at it.= max(0, x).A necessary condition for a N -camera pose problem to be minimal is For trifocal problems where all cameras see all features, i.e., C P = C T = C L = 3, a pose problem with 3 feature points and 2 tangents meets the condition.A pose problem with 3 feature points and 1 free line also meets the condition.Adding any new features to these problems will make them overconstrained, having dim Y > dim W × X. Definition 4. The algebraic degree of a minimal pose problem is the number of solutions (w, x) ∈ V −1 (y) for generic y ∈ Y .
Both Gröbner bases and HC offer probability-one methods for computing all solutions for a particular problem instance specified by y ∈ Y .Gröbner bases also offer an exact method, when working over Q.However, it is difficult to say when any particular y ∈ Y will satisfy the necessary genericity conditions to have have this many solutions without knowing the algebraic degree a priori.Thus, the following statement has two components: that both problems are minimal (rigorously proven) and that their algebraic degrees are as stated (true with probability one).Theorem 3.2 (Computational).The Chicago trifocal problem is minimal with algebraic degree 312, and the Cleveland problem is minimal with algebraic degree 216.Proof.To show that a N -camera pose problem is minimal, it is enough to find (w, x) ∈ W × X where the Jacobian of V (w, x) is full rank.For exact values of (w, x) ∈ W × X in rational arithmetic, we compute the exact rank of this Jacobian.This proves that the problem is minimal.To compute the algebraic degree of a given problem, we write down a system of polynomial equations in unknowns (w, x) ∈ W × X for a randomly chosen y.Since the problem is minimal, we expect that the ideal generated by these polynomials is 0-dimensional.Gröbner bases give standard methods [81] both for checking that this ideal is 0-dimensional and computing its degree.Finally, to verify that the degree of the ideal is equal to degree of the minimal problem, we have computed all solutions to the system of polynomials specified by y ∈ Y and verified that they correspond to valid points (w, x) ∈ W × X.We carried out this procedure with the minors equations and confirmed the degree using the essential equations and HC.

Remark:
The previous argument depends on the system of equations chosen to model the problem.For instance, if (4),(5) are used, then there exist 312 solutions corresponding to valid points in W × X, plus a small number of degenerate solutions where certain values of the depths α equal zero.Additional polynomial equations which exclude these solutions may be generated using the symbolic technique of saturation [81,Sec 4.4].Such a saturation step is also necessary if rotation matrices are modeled with the quaternion parametrization in (11), since we must rule out degenerate solutions with w 2 i + x 2 i + y 2 i + z 2 i = 0.A companion work by our coauthors [8] provides Macaulay2 tutorial for the Gröner basis degree proof and other general techniques presented in this section for analyzing Chicago, Cleveland, and a number of related minimal problems using the minors approach.Since Gröbner bases can be used to compute the algebraic degrees of both minimal problems, it is natural to hope that they also can be used to design effective minimal solvers.However, the current leading methods for building minimal solvers (eg.[82], [83], [84]) do not scale well for problems of degree 100 or larger.This is our main motivation for using optimized HC.

OPTIMIZED HOMOTOPY CONTINUATION SOLVER
Like other minimal problems in vision, the Cleveland and Chicago problems require us to solve a system of polynomial equations.Crucially, these equations are polynomial in both the input data (points and lines in images) and the unknown quantities to be estimated (cameras and world features.)It is common to call these systems parametrized polynomial systems, as the input data parametrize the space of all instances of a given problem.In Section 4.1, we review basic facts about coefficient parameter homotopy, a very general framework for solving parametrized polynomial systems based on HC methods.The parameter homotopies arising in this framework lie at the core of our HC solvers.To make this general framework concrete, Section 2.3 describes in precise detail one possible strategy for formulating the Cleveland and Chicago problems, in which the depths and displacements are eliminated from the essential equations of Section 2.2.Although these formulations are used in our best-performing solvers to date, we stress that the exact formulation is not essential to the underlying technique.Other formulations of the problem will also give rise to parameter homotopies which can be successfully used within general-purpose software [26], [28] or within our optimized C++ framework MINUS described in Section 4.2.
Acknowledging the promise of further speedups brought by experimenting with different formulations, we observe that our specific parameter homotopies can already be used to solve Chicago and Cleveland in a relatively efficient manner, Section 5. We attribute relatively good run times to two factors.First, the inherent specificity of parameter homotopies when compared to other HC methods; the number of paths to track in a parameter homotopy is precisely the algebraic degree of the problem.Second, we optimize various aspects of HC, such as polynomial evaluation and numerical linear algebra, Section 4.2, along with more aggressive optimization opportunities and tradeoffs.

Algorithm
We assume that F (R; A) is a system which is polynomial in both the variables R and the parameters A. One is interested in efficiently computing the solutions for many instances of the parameters.To compute all nonsingular complex isolated solutions of F (R; A) = 0 for any given set of target parameters A * , one may use the parameter homotopy for s ∈ [0, 1), Algorithm 1.It is assumed that solutions for some starting parameters A 0 have already been computed via some offline, ab initio phase, described below, by default hardcoded in MINUS.This initial phase determines representatives of nonsingular isolated solutions, making for faster, more efficient solves for any other parameter values desired, e.g., within RANSAC.Generically, the homotopy paths are smooth and do not intersect each other.To ensure this (genericity) condition for every homotopy path with probability 1, we may employ the so-called gamma trick.This consists in choosing a (random) γ ∈ C so that the homotopy equation becomes where φ(s) parametrizes an arc, depending on γ, connecting A 0 to A * in the parameter space.More explicitly, we define φ(s) = (1− τ (s))A 0 + τ (s)A * , with τ (s) = γs 1+(γ−1)s , as in Algorithm 1.In this way, φ(s) is a generic path in the complex space without singularities, even if the endpoints are real.However, even though the circular arc depending on γ misses the non-generic points in C with probability 1, it might happen that the arc is close to these non-generic points; this can cause instability, increase the error or decrease speed in computations.If we run MINUS multiple times with the same data but using different (random) γ's, it results in a dispersion of run times and even occasional failures.The slower running times and the occasional failures happen when γ lands close to certain rays in C which intersect an appropriately-defined discriminant in the tracking parameter s.
For systems which are linear in the parameters A, it is possible to adapt the gamma trick to work with a simpler linear segment homotopy, due to the following calculation: where the coefficient 1 1+(γ−1)s is never zero for real s ∈ [0, 1) and can thus be ignored when solving H(R; s) = 0.This variant of the gamma trick may be preferable to the general version, since it results in cheaper evaluation of the homotopy and its derivatives, and may also lead to better numerical stability.
Minor-based constraints are multilinear in the coordinates of each line l suggesting that a simple variant of the aforementioned "linear" gamma trick will work for related formulations.This will indeed work for Cleveland, where we may treat each coordinate of each line as an independent parameter.However, for Chicago there is an additional subtlety due to the fact that the associated configuration of lines is not general and must satisfy For Chicago, treating each coordinate of each line as an independent parameter will not give a valid parameter homotopy; even if A and A * encode valid configurations of lines, points on a circular arc or linear segment connecting them will not.We thus represent the lines encoding tangents with 2 independent parameters as a pencil of lines.
A full accounting of the variables and parameters used for Chicago in MINUS is as follows: 14 variables: Each translation vector has three unknown components, and the entries of matrices R 2 and R 3 are written as rational homogeneous functions in four unknowns (homogenized Cayley): 56 parameters: 27 = 3 × 3 × 3 parameters represent three independent lines l 1v , l 2v , l 3v in each view; 12 = 3×2×2 parameters of the form a iv b iv represent two dependendent lines l 4v , l 5v in each view; The remaining 17 = 56 − 39 parameters consist of v 1 , v 2 ∈ C 5 and v 3 ∈ C 7 which are random coefficients of 3 inhomogeneous linear equations that determine affine charts on homogeneous coordinates given by r 1 = (w 2 , x 2 , y 2 , z 2 ), r 2 = (w 3 , x 3 , y 3 , z 3 ), and (t 2 t 3 ).In summary, Chicago may be formulated as a system of 290 equations in 14 variables and 56 parameters.A similar accounting lets us formulate Cleveland as a system of 64 equations in 14 variables and 53 parameters.As previously remarked, we may select a square subsystem F to define the homotopy in (9), provided that the Jacobian d F d R (R 0 ; A 0 ) has full rank for every starting solution R 0 .We note that the 276 excess equations need not be algebraic consequences of the 14 that are selected.
Nevertheless, the fact that each initial solution R 0 satisfies all 290 equations implies that we do not need to enforce these excess equations explicitly -see, e.g., the discussion in [85, SM Section 16], or the discussion of "side conditions" in [59,Section 7.4].
For Chicago, a precomputed set of 312 starting solutions to the 290 × 14 system for starting parameters A 0 may be numerically continued to 312 solutions for target parameters A via the parameter homotopy (9), where F is a suitable 14 × 14 square subsystem.To obtain the starting solutions, we first compute a single, random problem-solution pair (R 0 , A 0 ), first computing R 0 by fabricating a random scene and cameras, then A 0 by projecting features in each image.From this initial problemsolution pair, we may then generate a complete set of 312 solutions by parameter continuation along random monodromy loops in the space of parameters.Such monodromy-based heuristics are standard in numerical algebraic geometry.A complete description is beyond the scope of this paper, see e.g., [86] or [87], where the latter work describes the implementation we used.
For the minors-based formulation of Chicago, an ad-hoc variant of the gamma trick may be be used with the linear segment homotopy (10).The variant is used in the implementation of MINUS, and is based on the following idea: pick γ 1 , γ 2 , . . ., at random from the complex unit circle, and consider the parameter values A γ 1 ,γ 2 ,... obtained by the following replacements These replacements are designed so that systems parametrized by A and A γ 1 ,γ 2 ,... have the same solution sets.Thus, for generic starting and target parameters A 0 and A * , real or complex, we may numerically continue the solutions of F (R; A 0 ) = 0 to those of F (R; A * ) = 0 using the linear segment connecting A γ 1 ,γ 2 ,... 0 and (A * ) γ 1 ,γ 2 ,... in the space of parameters.We conclude this section with Algorithm 1, which contains a high-level description of our HC solver in pseudocode.

Implementation
We devised an optimized open source package called MINUS -MInimal problem NUmerical Solver, available at github.com/rfabbri/minus.This is an HC framework specialized for minimal problems, templated in C++ enabling efficient specialization for different problems, formulations, and precisions.The most reliable and high-quality solver to date uses a 14 × 14 minors formulation in double precision (64-bit).The most important optimization is exploiting fixed-length C-style arrays to optimize memory layout for size and locality.We also hardcoded evaluators and used Eigen [88]'s LU decomposition with partial pivoting for linear algebra solves, which proved accurate as long as double precision is used.The most important compile flag is --ffast-math; despite aggressive floating point optimizations, this only affected output within 10 −10 error.
As shown in Section 5, MINUS runs on average at hundreds of miliseconds and up to 100× faster than general-purpose HC.
It can run at a few miliseconds at the cost of reduced success rate in finding the solution, due to more aggressive optimization parameters.Such reduced success rate might be mitigated within RANSAC, if adequately assessed.For instance, we successfully devised a "lossy" HC parameter to constrain the number of predictor iterations per solution path, which have yielded an effective speedup at negligible loss in success rates, Section 5.
The second most important algorithm parameter to vary is the maximum number of correction steps; 4 is the current safe default.Increasing it to 5-7 cuts the runtime down to 280 ms.Another is corrector tolerance, which affects how many correction iterations are performed: increasing it 10 4 × brings the runtime down to less than 200 ms.The error rate for these extreme cases can be as high as 50%, although testing reprojection error to larger practical levels of 1 px precision may bring this figure up.
Like MINUS, widespread fast numerical algorithms to compute simple functions such as sqrt solve polynomial equations iteratively, and the key lies in the starting point [89].The start system in MINUS is by default precomputed from random parameters; it could instead be sampled from our synthetic data, and the closest camera could be selected matching a similar configuration of correspondences.See also companion work by our coauthors [34].Varying the problem formulations also has potential for speedup.Further eliminating variables to, say 6 × 6, could bring improvements since linear solves could be explicitly inverted.A GPU implementation is explored in companion work by our coauthors [90].

EXPERIMENTS
Experiments are conducted first for synthetic data for a controlled study, followed by challenging real data.We present results for the more challenging Chicago problem, since the exact same core solver is used for Cleveland.Synthetic data experiments: The synthetic data from [12], [14] consists of 3D curves in a 4 × 4 × 4 cm 3 volume projected to 100 cameras (Fig. 4), and sampled to get 5117 points endowed with orientations (tangents of curves) that are projections of the same 3D analytic points and tangents, and then degraded with noise and outliers.Camera centers are randomly sampled on an average sphere around the scene along normally distributed radii of mean 1 m and σ = 10 mm.Rotations are constructed via normally distributed look-at directions with mean along the sphere radius looking to the object, and σ = 0.01 rad such that the scene does not leave the viewport, followed by uniformly distributed roll.This sampling is filtered such that no two cameras are within 15°o f each other.Each camera encompasses a 500 × 500 px viewport, where the entire dataset is visible at sub-pixel precision with no more than one sample per pixel.
Our first experiment studies the numerical stability of the MI-NUS solver.The dataset provides veridical point correspondences, which inherit an orientation from the tangent to the analytic curve.For each sample set, three triplets of point correspondences are randomly selected with two endowed with the orientation of the tangent to the curve.Only real solutions that generate positive depth are retained.The unused tangent of the third triplet is used to verify the solution as it provides an unused equation.For each of the remaining solutions only one pose is determined.
The error in pose estimation is compared with ground-truth as the angular error between the normalized translation vectors and between the quaternions.The process of generating the input to pose computation is repeated 10 3 times and averaged.This experiment demonstrates that: (i) pose estimation errors are negligible, Fig. 5(a); (ii) the number of actual solutions is small: 35 real solutions on average, pruned down to 7 on average by enforcing positive depth, and even further to about 3-4 physically realizable solutions on average employing the unused tangent of the third point as verification, Fig. 5(b); note that these extra solutions can then be detected by RANSAC; solver fails about of cases, which, while for RANSAC, can eliminated by running the solver for that solution path with higher accuracy more parameters at a higher computational cost.
The second experiment shows that we can reliably and accurately determine camera pose with correct but noisy correspondences.Using the same dataset and a subset of the selection of three triplets of points and tangents -200 in total -zero-mean Gaussian noise at various levels was added both to the feature locations and to the orientation of the tangents, reflecting expected feature localization and orientation localization error.The noise levels on points and tangents reflect those found in curve extraction methods [91].A RANSAC scheme determines the feature set that generates the highest number of inliers.Experiments indicate that the translation and rotation errors are reasonable.Fig. 6 (top) shows how localization error affects pose under a fixed orientation perturbation of 0.1 rad; Fig. 6 (bottom) shows how the extent of orientation error affects pose under a fixed localization error of 0.5 px (pixels).The more meaningful reprojection error, i.e., the distance of a point from the location determined by the other two points, is shown in Fig. 6(bottom), averaged over 100 triplets.
The third experiment shows the system can consistently es- The ratio of outliers is varied over 10%, 25% and 40%, with the experiment repeated 100 times each.The resulting reprojection error is small and extremely stable, with median 2 px and maximum 3.6 px for all outlier ratios.Computational efficiency: Each solve using our software MI-NUS with conservative parameters takes 440 ms (660 ms in the worst case), compared to over 1 minute on average for the best prototypes using general purpose HC software [26], [28], on an Intel core i7-7920HQ with processor, GCC 5, and four threads.More aggressive but potentially unsafe optimizations towards microseconds are feasible, but require assessing failure rate.
To assess putting a cap N max on the number of predictor iterations per root, we first observe that after 10 4 random solves on synthetic data, the maximum number of iterations for paths leading to ground-truth was close to 10 3 , vs. about 254 × 10 3 for the wasted paths.Given that the solve is ≈ 1 − 4 µs per iteration, this leads to concrete routes to optimization.Fig. 7 shows that the time for roots leading to ground truth vs. undesired paths differ but remain strikingly stable across 140 different random input configurations.For each configuration out of 140, MINUS was run 500 times with different randomizations γ's and affine patch parameters.The minimum number of iterations for all configurations was 18.
Setting N max < 10 3 costs a decrease in success rate.However, we can regain success rate by re-runing N rep times with different randomizations.Fig. 8 shows that running once with N max = 500 yields a success of 97%, which is the current default for MINUS, providing the average figure of 401ms.Running twice with N max = 200 yields a similar success rate.For each (N max , N rep ) operating point, a success is counted if MINUS found the solution in any N rep runs; the final success rate is averaged by performing this procedure 7000 times.If all points have tangents, e.g., 3 SIFT features, as soon as a root reached an HC stop condition we test for positive depth and stop upon compliance with the third tangent to produce a hypothesis for RANSAC, cutting down average execution time further with a modest decrease in success rate.The run time remains on the order of 100 ms.
Real data experiments: Much like the standard pipeline, SIFT features are first extracted from all images.Pairwise features are found by rank-ordering measured similarities and making sure each feature's match in another image is not ambiguous and is above accepted similarity.Pairs of features from the first and second views are then grouped with the pairs of features from the second and third views into triplets.A cycle consistency check enforces that the triplets must also support a pair from the first and third views.Three feature triplets are then selected using RANSAC and the relative pose of the three cameras is determined from two SIFT orientations and a third point without orientation.
Fig. 9 shows that camera pose is reliabily and accurately found using triplets of images from the EPFL dense multi-view stereo image dataset [30].Our quantitative estimates on 150 random triplets from this dataset give pose errors of 1.5 × 10 −3 rad in translation and 3.24 × 10 −4 rad in rotation.The average reprojection error is 0.31 px.These are comparable to or better than the interest pointbased trifocal relative pose estimation methods reported in [21].Our conclusion for this dataset, whose purpose is to validate the solver, is that our method is at least as good and often better than the traditional ones.Note that we do not advocate replacing the traditional method for this dataset.We simply state that our method works just as well, of course at a higher cost.
The EPFL dataset is feature-rich, typically yielding on the order of 10 3 triplet features per image triplet.As such it does not portray some of the typical problems faced in challenging situations when there are few features available.The Amsterdam Teahouse Dataset [31], which also has ground-truth relative pose data, depicts scenes with fewer features.Fig. 10 (top) shows a triplet of images from this dataset where there is a sufficient set of features (the soup can) to support a bifocal relative pose estimation followed by a P3P registration to a third view (using COLMAP [3]).However, when the number of features is reduced, as in Fig. 10 (bottom) where the soup can is occluded, COLMAP fails to find the relative pose between pairs of these images.In contrast, our approach, which relies on three and not five features, is able to recover the camera pose for this scene.
We also created another featureless dataset similar to the one in [7] but with the calibration board manually removed.This scene lacks point features, which is extremely challenging for traditional structure from motion.We built 20 triplets of images within this dataset.Within these 20 triplets, camera poses of only 5 triplets can be generated with COLMAP, but with our method, 10 out of 20 camera poses can be estimated.We reached a 100% improvement over the standard pipeline on image triplets.The sample successful cases are shown in Figs. 1 and 11.
A quantitative comparison with other trifocal methods reported in [21] on datasets Fountain P-11 and Herz-Jesu-P8 is shown in Table 1 for the Chicago problem, illustrating that our method is comparable to or better than other trifocal methods.

CONCLUSION
We presented a new calibrated trifocal minimal problem, an analysis demonstrating its number of solutions, and a practical solver by specializing computation techniques from numerical algebraic geometry.We showed our approach generalizes to characterize and  solve a similar difficult minimal problem with mixed points and lines in three views.Both problems are representative of a myriad of similar minimal problems in multiple views analyzed with the techniques initiated with the present work [8], [9], [10], [11], [29].
The increased ability to solve trifocal problems with points and lines is key to future work on broader problems appearing when observing general 3D curves, e.g., in scenes without enough point features, using differential geometry [12], [15].As a first step, our trifocal solvers have been partially integrated into the SfM pipeline OpenMVG [92] for use with SIFT orientation, and we are working to integrate and verify their robustness advantages also with COLMAP.Our "100 lines of custom-made solution tracking code" have also already been employed to build practical, fast solvers [34] for other minimal problems which have not been efficiently solved with Gröbner bases [84].

•
R. Fabbri is with the Department of Computational Modeling, Polytechnic Institute, Rio de Janeiro State University, Nova Friburgo, Brazil.He is supported by UERJ Prociência award, NSF IIS-1910530 and FAPERJ Jovem Cientista do Nosso Estado E-26/201.557/2014.E-mail: rfabbri@gmail.com• A. Leykin is at Georgia Tech and are supported by NSF DMS-1151297.• T. Duff is at University of Washington.He acknowledges support from an NSF Mathematical Sciences Postdoctoral Research Fellowship (DMS-2103310), as well as partial support from NSF DMS-1151297.• B. Kimia and H. Fan are with the School of Engineering, Brown University.They are supported by the NSF grant IIS-1910530.• D. de Pinho is with UENF, Brazil • P. Giblin is with the University of Liverpool.• M. Regan is with Duke University and is supported by NSF CCF-1812746 and the Schmitt Leadership Fellowship in Science and Engineering.• E. Tsigaridas is with INRIA Paris.• C. Wampler is with the University of Notre Dame.• J. Hauenstein is with the University of Notre Dame and is supported by NSF CCF-1812746 and ONR N00014-16-1-2722.• T. Pajdla is with the Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, and is supported by the EU Regional Development Fund IMPACT CZ.02.1.01/0.0/0.0/15003/0000468 and EU H2020 project ARtwin 856994.• This work initiated while the authors were in residence at Brown University's Institute for Computational and Experimental Research in Mathematics -ICERM, in Providence, RI, during the Fall 2018 and Spring 2019 semesters, with early versions at arXiv 23Mar2019 04:26UTC and CVPR 2020.(NSF DMS-1439786 and the Simons Foundation grant 507536).

2 •
C L Summing all the contributions to dim Y − dim W , we have Theorem 3.1.Let x .

Fig. 5 .Fig. 6 .
Fig. 5. (a) Errors of computed pose are small showing that the solver is numerically stable.(b) The distributions of the numbers of solutions.

Fig. 9 .
Fig. 9. Trifocal relative pose results for EPFL dataset.Each row shows images with ground truth (green) and estimated poses (red outline).

Fig. 10 .
Fig. 10.Trifocal relative pose results for Amsterdam Teahouse: a triplet of images that COLMAP is able to tackle (top) and where it fails (bottom).Results: COLMAP (blue outline), ours (red), and ground truth (green).

Fig. 11 .
Fig. 11.Trifocal relative pose results for a dataset comprising three mugs, which is challenging for traditional SfM.Each shows images with ground truth (green) and estimated poses (red outline).
Pose error of our method vs. other trifocal methods.Our method has better rotation error and comparable translation error.