Learning contact corrections for handle-based subspace dynamics

This paper introduces a novel subspace method for the simulation of dynamic deformations. The method augments existing linear handle-based subspace formulations with nonlinear learning-based corrections parameterized by the same subspace. Together, they produce a compact nonlinear model that combines the fast dynamics and overall contact-based interaction of subspace methods, with the highly detailed deformations of learning-based methods. We propose a formulation of the model with nonlinear corrections applied on the local undeformed setting, and decoupling internal and external contact-driven corrections. We define a simple mapping of these corrections to the global setting, an efficient implementation for dynamic simulation, and a training pipeline to generate examples that efficiently cover the interaction space. Altogether, the method achieves unprecedented combination of speed and contact-driven deformation detail.


INTRODUCTION
Subspace simulation models define a compact space for the animation of complex objects, without the constraints of mesh resolution. They have demonstrated the ability to produce expressive simulations under low computational cost ; Barbič and James 2005;Hauser et al. 2003;Krysl et al. 2001;Pentland and Williams 1989]. They are not free of limitations though, as they suffer to produce high-frequency details, e.g., resulting from contact.
Learning methods find effective parameterizations to model complex nonlinear functions. Despite recent important breakthroughs [Battaglia et al. 2016;Chang et al. 2017;Greydanus et al. 2019;Li et al. 2019;Sanchez-Gonzalez et al. 2018;Wiewel et al. 2019], it is yet unclear whether learning approaches can represent the generality of high-resolution dynamics under contact. However, they have succeeded at capturing high-resolution skeletal and rig-based animations [Patel et al. 2020;Santesteban et al. 2020Santesteban et al. , 2019Song et al. 2020].
In our work, we want to combine the best features of subspace and learning models for dynamic simulation of deformable objects. To do so, we design a new subspace simulation model, presented in Section 3. The model aggregates a linear global deformation and a nonlinear local correction, both parameterized by a common set of subspace handle-based degrees of freedom. We also define an efficient mapping of the nonlinear corrections to the global setting, as a function of the subspace. Thanks to learning-based modeling of the local corrections, we attain highly detailed and accurate contact-driven deformations. At the same time, thanks to the subspace parameterization of the aggregate deformation, we attain fast dynamics and overall contact-based interaction. Combining the features of subspace and learning models, we achieve dynamic simulations with unprecedented combination of speed and contactdriven deformation detail.
Our method captures in a consistent way nonlinear corrections due to both internal deformations and external interactions. We have designed a data-generation and training pipeline that supports different types of corrections and interactions. As we describe in Section 4, this pipeline requires mapping interactions and full-space deformations to the linear subspace, and we discuss how the choice of subspace can simplify this task.
All in all, we introduce a simple method that allows the efficient simulation of many interesting phenomena. We showcase simulations where dynamics are efficiently resolved in a subspace, and they are enriched with accurate data-driven quasi-static corrections. As the human eye is less perceptive of detail under motion, and highfrequency oscillations tend to dampen quicker than low-frequency oscillations, we find that our approach produces simulations that are barely distinguishable from full-space results. We also tackle the simulation of local contact deformations, a classic challenge for subspace methods. We showcase simulations where we learn these high-detail deformations as a function of the relative configuration between colliding objects and the subspace, and seamlessly aggregate the corrections with the subspace dynamics. We demonstrate the method on examples that signify its applicability, such as the simulation of microtextures, soft robots, or soft skeletal bodies, as shown in Figure 1 and throughout the paper.

RELATED WORK
Our work is related to multiple methods for simulation and animation of deformable objects. We classify them into three categories: methods that combine global and local deformations for simulation or animation; subspace simulation or model reduction; and learning-based simulation.

Aggregation of Global and Local Deformations
Our subspace deformation model can be regarded as a relative of pose-space deformation (PSD) [Lewis et al. 2000], in the sense that local deformations are parameterized by the subspace pose, they are defined in an unposed setting, and they are mapped to the global setting as a function of pose. There are many differences though. PSD is typically used for artist-or mocap-driven animation, not for dynamic simulation; the subspace of PSD is a skeletal pose, not a generic linear deformation model; local corrections model internal deformations, not external interactions; and the mapping of local corrections is explicitly defined by the skinning transformation, not derived from an arbitrary global deformation model as in our case.
Nevertheless, the basic PSD scheme has been extended in multiple ways, and some bring closer ties to some of our model's features. EigenSkin [Kry et al. 2002], for instance, reduces the dimensionality of the correction field for human hands using principal component analysis, as we do. SMPL [Loper et al. 2015] models corrections of human bodies using blendshapes, which are trained from multiple scans. Bailey et al. [2018] recently proposed a machine-learning method to efficiently approximate complex character skinning rigs as a local nonlinear correction to the base linear skinning. Many other works also train correction models from example data, and the approach has been successfully applied to the skeletal animation of cloth [Wang et al. 2010]. As an alternative to learning corrections, the method of Wang et al. [2007] learns deformation gradients and then reconstructs the deformation. Other animation methods use more diverse definitions of pose, to extend beyond skeletal animation, such as local surface deformation for faces [Bickel et al. 2008] or cloth [Kavan et al. 2011;Zurdo et al. 2013]. The recent work of Song et al. [2020] uses an animation rig as a generalization of pose, and learns both global and local deformation as a function of the rig parameters. In contrast, we use a linear subspace model for global deformation and learn only nonlinear corrections, which largely simplifies the model, in particular for its use in dynamic simulation. Recent works leverage machine-learning methods to learn dynamic corrections as a function of pose and its time evolution. The approach has been applied to bodies  and cloth [Santesteban et al. 2019].
Three major differences stand out in our work in contrast to PSD and its many derivatives. First, we define a compact subspace model of global plus local deformations for general deformable objects, not just skeleton-driven shapes. Second, our local corrections are also parameterized by external interactions, and hence allow datadriven contact simulation. And third, we use the subspace model for dynamic simulation, which requires derivatives of the model, and careful interpretation and approximation of these derivatives for efficiency.
If we look at dynamic simulations, there are other ways of combining global and local deformations. In this area, the focus is not necessarily on dimensionality reduction, and the local corrections are represented in a high-dimensional space. Separation into global and local deformations may have other advantages, such as better modeling of mechanical phenomena or faster solvers. Two prominent examples of aggregate global-local dynamic simulation methods are Eulerian-on-Lagrangian simulation [Fan et al. 2013] and multifarious hierarchies [Malgat et al. 2015]. Our model shares with these works the need to define a mapping for the local corrections, and we define the mapping similar to Malgat et al., through linearization of the global deformation. However, because in our case both global and local deformations are parameterized by the same subspace, we pay more attention to the interpretation of the mapping and its approximation in the context of dynamic simulation. To the best of our knowledge, no previous work applied this idea to linear subspace models. We show that the mapping can be defined by a modulation of the linear subspace, computed by finding gradients of the basis components.

Subspace Simulation Methods
In the context of dynamic simulation, the creation of subspace models allows fast approximation to the equations of motion, by ignoring high-frequency deformations. The most popular approach is to use a linear subspace model to approximate global deformations, built from modal analysis [Pentland and Williams 1989], principal component analysis of deformation examples [Krysl et al. 2001], modal derivatives [Barbič and James 2005], sparse frames [Brandt et al. 2018;Gilles et al. 2011], or sparse handles [Wang et al. 2015]. Some notable exceptions, which use nonlinear subspaces, are based on animation rigs [Hahn et al. 2012[Hahn et al. , 2013 or rotation-strain coordinates [Pan et al. 2015]. Recently, Fulton et al. [2019] introduced the use of variational autoencoders to automatically infer compact nonlinear subspaces for dynamic deformations. On the contrary, we use machine learning to model only nonlinear corrections, and we parameterize these corrections explicitly, as a function of both the linear subspace and external interactions. We achieve detailed deformations not shown by purely learning-based subspace models.
Linear subspace models have also been used to simulate only local deformations on top of some different global deformation model [Kim and James 2011]. Moreover, subspace local deformations can be aggregated with other deformations, such as pose-based blendshapes [Tapia et al. 2021]. As a way to increase the accuracy of subspace deformations, several works [Hahn et al. 2014;Xu and Barbič 2016] blend local linear subspace models in a pose-dependent manner. This approach can be regarded as an alternative to our local nonlinear model. However, finding the appropriate local linear models and blending functions is not a simple task for general deformable objects. We demonstrate that modern machine-learning methods allow one to bypass this task, learning a nonlinear model instead. Note also that in our model local deformations are parameterized by a combination of the global deformation and an interaction state, while the methods cited above use additional degrees of freedom.
A different approach to increase the accuracy of subspace simulation methods, particularly for contact-based interactions, is to locally enrich the simulation model. Harmon and Zorin [2013] enrich a linear subspace model with locally supported basis functions precomputed using a Boussinesq contact model. Teng et al. [2015] select submeshes that are simulated with nodal degrees of freedom, while the rest of the object uses a linear subspace representation. Both regions are coupled accurately and efficiently using a condensation method. Enrichment methods present pros and cons with respect to our approach, and we see them better suited for different types of applications. Our approach is highly optimized for modeling external interactions that admit a compact parameterization, such as interaction with rigid colliders. Enrichment methods, on the other hand, support general interaction, but cannot be optimized for particular interactions. An additional challenge for contact simulations with subspace methods is the efficient yet accurate evaluation of contact forces. To this end, Teng et al. [2014] presented a cubature method for self-collisions in subspace skeletal deformations.

Learning-Based Deformable Simulation
Prior to the explosion of neural-network methods, de Aguiar et al.
[2010] designed a learning-based second-order model of cloth deformation with stability guarantees. Kim et al. [2013] showed how to encode complex dynamics of cloth using motion graphs. In recent years, and mostly based on neural networks, machine learning methods have been used in very diverse ways in deformable object simulation.
NNWarp [Luo et al. 2020] learns the correction between linear and nonlinear materials as a warping function, and thus simplifies the simulation of complex nonlinear materials. Holden et al. [2019] propose a learning-based representation of the full dynamic interaction between a dynamic object and some collider(s). Their method bears some similarities to ours; however, by trying to learn the full dynamic behavior, they pay a loss of detail and accuracy. They also model the deforming object using a subspace representation, but deformation detail is limited to the global linear subspace, whereas our approach learns nonlinear corrections with fine detail. Their learning model also takes as input the interaction space described by the configuration of the collider, but dynamics are strongly damped on test scenes, whereas our approach retains full dynamics of the linear subspace. As shown in Figure 2, learning local nonlinear corrections, as we do, is a simpler problem, and leads to higher detail and better generalization.
Several other works have modeled deformations driven by skeletal motion, a problem that falls in the scope of the PSD methods described above in Section 2.1. Some of the interesting developments include the use of convolutional networks for mesh-based deformations [Chentanez et al. 2020], and robust learning of deformation dynamics under scarce training data [Santesteban et al. 2020].
Beyond computer graphics, recent efforts on machine learning look at representations of the common invariants and/or processes Global corrections are more difficult to learn, and suffer artifacts (right). In the example, both local and global corrections use the same training data and neural-network architecture.
involved in mechanics. Some of the examples include modeling collisions and deformations using graph representations [Battaglia et al. 2016], producing generic neural-network representations of mechanical evolution using composable objects and their interactions [Chang et al. 2017], modeling multi-physics phenomena through learned particle-based models [Li et al. 2019], or modeling physical processes by learning invariants and training with measurable functions of these invariants [Greydanus et al. 2019].

CORRECTED SUBSPACE DEFORMATIONS
In this section, we present our subspace deformation model. We start by formulating the combination of a linear subspace model, nonlinear local corrections, and the mapping of these corrections to the global setting. All these components are parameterized by the same reduced handle-based degrees of freedom (DoFs). To allow the computation of forces and velocities, we also derive the Jacobian of our aggregate subspace model, and we analyze computationally efficient approximations. We conclude by discussing the application of variational solvers for dynamic and static deformations.

Formulation of the Subspace
As outlined in the introduction, we construct the subspace model as the addition of a reduced-order linear deformation and a nonlinear local correction. For the linear deformation, we choose the biharmonic generalized barycentric coordinates (BGBC) [Wang et al. 2015]. BGBC allow an intuitive definition of the linear subspace basis, formed by the transformations of points and rigid frames (referred to as handles), and we leverage this intuitive basis to construct compact parameterizations of the corrections in Section 4. Other choices of frame-based models [Brandt et al. 2018;Gilles et al. 2011] would also be suited for the definition of the subspace. With handle DoFs q and BGBC basis U, the linear portion of our subspace model is U q. To extend the accuracy of the linear handle-based  (4). However, deformation errors are evident (bottom) if we use the Jacobian of the linear subspace and ignore the change in the corrections r q ; hence we retain this term.
subspace, we construct a nonlinear correction. In Section 4 we will discuss the details of this correction; for now we consider a general correction with nonlinear dependency on the reduced DoFs, r(q). Similar to pose-space deformation [Lewis et al. 2000], we express the correction in a local setting. This choice simplifies learning and hence maximizes the accuracy of the correction field, as shown in Figure 3. We map the local corrections to the full space using the deformation gradient F(q) of the linear subspace deformation. Notice that this mapping corresponds to a first-order approximation of a correction applied to the undeformed setting [Malgat et al. 2015]. Adding the linear and nonlinear components together, we can express our nonlinear subspace deformation model as: (1) In practice, we compute the deformation gradient on tetrahedral elements [Irving et al. 2004] and then perform a moving least-squares approximation on nodes ]. In the remainder of the section, we drop the explicit dependency of q from the various terms in (1). Interestingly, the nonlinear correction to the linear subspace deformation can also be interpreted as a modulation of the linear basis. To this end, we rewrite (1) by reversing the order of F r as mat(r) vec(F). Furthermore, the deformation gradient can be expressed through a linear operation mat(∇) on the subspace deformation u. Using the matrices mat(r) and mat(∇), we rewrite our corrected subspace model (1) as with W = mat(r) mat(∇). As evidenced in this expression, the nonlinear correction can be interpreted as an incremental modulation W U to the linear subspace basis U. This modulation weights the gradient of the subspace basis by the corrections. In practice, for the evaluation of full-space positions x, we use (1), after computing F explicitly. For the transformation of forces to the subspace, however, it is convenient to analyze the basis modulation (2), as we see next.

Jacobian of Subspace Kinematics
A key ingredient of the subspace model is the Jacobian J that linearizes the mapping between the subspace DoFs q and the full-space deformation x. Differentiating (1) and (2), we obtain: Using this Jacobian, one can transform subspace velocities q to the full-space as x = J q, and full-space forces f x to the subspace as Most of the computational overhead of our subspace corrections lies in the evaluation of the Jacobian. Therefore, we pay attention to the relevance of the terms W and r q in (3). Figure 4 shows a representative example where we evaluate different approximations of J. We have observed that the term r q bears an important role in the computation of forces and resulting deformations, hence it should not be ignored.
On the other hand, the term W, which carries the Jacobian of the deformation gradient, can be safely discarded in the computation of forces. This is no surprise; as subspace deformations are smooth, the Jacobian of their deformation gradient is comparatively small. Dropping this term can be paralleled to ignoring the derivative of rotations in corotational elasticity models Xu et al. 2015], but the effect is even milder for subspace deformations.
Based on our experiments, we conclude to approximate the Jacobian (3) as We have also experimented with using this Jacobian for force computations and the approximation J ≈ U to build the Hessian. However, this approximation results in excessive damping and slows down the convergence of Newton solves.

Dynamics and Integration
In To time-step the rigid frames in the BGBC reduced DoFs q, we parameterize the rotations in their tangent-space [Taylor and Kriegman 1994]. M denotes the full-space mass matrix and the potential energy. Full-space forces are defined as f x = −∇ . Our work admits general elasticity models and discretizations for the definition of full-space forces. In our examples, we have used a Neo-Hookean material [Smith et al. 2018] with tetrahedral FEM discretization. The optimality of (5) yields the following nonlinear equations: We solve these equations using a quasi-Newton method, where we approximate the Hessian of (5) as J 1 ℎ 2 M − f x x J. As done often for subspace methods, we use a cubature approximation of forces and Hessians ]. After training cubature points {x } and weights { } [von Tycowicz et al. 2013], one can approximate subspace forces (and similarly their Jacobian) as f q ≈ J f x, , where f x, and J are, respectively, the force and the Jacobian at the cubature point. In our implementation, we use the same cubature approximation to project the mass matrix M to the subspace. Fig. 7. Data generation pipeline. First, a) we interactively record a linear-subspace dynamic simulation, and b) use the recorded interaction to generate an offline full dynamic simulation. c) For each frame, we extract a representative subspace stateq. Then, we fix the DoFs corresponding to the subspace (in purple) and run two full static simulations, d) ignoring and f) including, external interactions. Nonlinear corrections are then computed by considering the difference between these full static deformations and the linear subspace solution Uq in e). Internal corrections are generated by g) mapping the difference to the undeformed setting using F −1 . Finally, external corrections are generated in two steps: first, h) the difference w.r.t. the linear subspace solution is again mapped to the undeformed setting; and second, i) internal corrections are substracted to account only for the effect of external interactions.

LEARNING CORRECTIONS
In a fully dynamic setting, the deformation of an object depends on its velocity, acceleration, and external forces. We represent dynamics in the linear subspace, but we want to retain the accuracy of nonlinear (quasi-)static deformations. We consider two sources of error in the linear subspace, and therefore model two separate corrections: internal corrections r int , which correct the linear subspace deformation in the absence of contact, and external corrections r ext , which correct the additional deviation introduced by contact. This separation into internal and external corrections, highlighted in Figure 5, simplifies the generation of representative training data, and hence maximizes the accuracy of the aggregate correction, as shown in Figure 6.
We start this section with a detailed definition and formulation of the internal and external corrections. Then, we describe the generation of training data for both types of corrections, following the pipeline outlined in Figure 7. And we conclude with a discussion of implementation details of the learning architecture.

Internal and External Corrections
Given a subspace state q and constant external forces (i.e., gravity), but no other external interactions, the internal corrections r int represent the deviation between the full-space equilibrium deformation and the full-space positions given by the linear subspace, U q. On the other hand, given a subspace state q and an external interaction state, the external corrections r ext represent the deviation between the full-space equilibrium deformation and the full-space positions given by the internally corrected subspace, U q + F r int . Figure 5 demonstrates the aggregation of internal and external corrections.
We have considered external interactions due to kinematic colliders, but the formulation could be extended to other types of interactions. Note that interactions produced by prescribing some subspace DoFs q (e.g., moving handles of the subspace model) can be represented as part of internal corrections. We denote the interaction state as z, which in our case may include the state and size of rigid colliders. For better learning ability, we parameterize the corrections expressing the interaction state relative to the subspace state q. Here, handle-based reduced models such as BGBC come handy. We can define rigid transformations A(q) for the handles, and invert them to define relative external interactions A(q) −1 z.
Formally, our nonlinear correction is then split into internal and external corrections as: r = r int (q) + r ext (A(q) −1 z).
By separating internal and external corrections, we avoid the combinatorial complexity of training for all possible internal and external interaction states. We can train internal corrections free of external interactions, and we can train external interactions only in the vicinity of the deformable object. Next, we describe our data generation pipeline.

Data Generation
The generation of training data follows a strategy parallel to the decoupling of internal and external corrections. We visit separately (i) the configuration space of the deformable object, and (ii) the relative configuration space of the collider. For (i), as the space is very large and difficult to predict, we follow a user-guided sampling approach [Barbič and James 2005]. For (ii), we follow an automated sampling approach, and traverse with the collider the surface of the deformable object on the configurations obtained in (i). Our decoupled sampling of (i) and (ii) is beneficial in two ways: it removes the need to explore (i) and (ii) together, which is hard even through user interaction, and it naturally produces training data to separately learn internal and external corrections. Based on this decoupling, the data generation pipeline proceeds in three steps: generation of representative states, generation and training of internal corrections, and generation and training of external corrections. A detailed representation of the data generation pipeline is shown in Figure 7.
To generate representative states, we first execute fast dynamic simulations using the baseline linear subspace model (Figure 7, a). These simulations are interactive in our examples, and one can move colliders and apply forces to quickly visit a large number of states. Next, we replay the same interactions, but we simulate deformations using the full-space model (Figure 7, b). For each frame of these simulations, we project the full-space positions x to the subspace, using a least-squares mapping q = (U U) −1 U x. This projection yields a set of representative subspace states {q } and the corresponding full-space positions {Uq } (Figure 7, c and e).
To generate internal correction targets, we must remove the effect of dynamics and external interactions from the full-space deformations described above, but leaving the subspace state unchanged. To this end, we compute constrained static deformations (Figure 7, d). For each representative subspace stateq , we compute the static full-space deformationx int, , such that it is constrained to the given subspace state. With our choice of handle-based subspace model, enforcing the constraints is as simple as fixing the full-space DoFs corresponding to the handles. From the subspace and full states, we obtain target internal corrections simply by undoing our nonlinear subspace formulation (1): At this point, we use these internal correction targets to train the internal correction r int (q) (Figure 7, g).
To generate external correction targets, we need to reintroduce the effect of external interactions on the representative subspace states. For each representative subspace state, we generate multiple interaction states, traversing with the collider the surface of the deformable object at varying depths. Without loss of generality, in the remainder we refer to one pair of subspace and interaction states. Given an interaction z , we compute the static full-space deformationx ext, that is constrained to a given subspace stateq (Figure 7, f). From the subspace and full states, we obtain target external corrections simply by undoing the subspace formulation (1) (Figure 7, h). However, this time we also subtract the internal corrections: At this point, we use these external correction targets to train the external correction r ext (A(q) −1 z) (Figure 7, i).

Learning Architecture and Training
We learn separate models for internal and external corrections, but we follow the same methodology for both. Therefore, in this section we refer to arbitrary corrections r. We have observed that corrections exhibit high coherence, hence we use principal component analysis (PCA) to reduce their dimensionality.
We use a fully connected, 2-layer neural network to model each type of nonlinear correction. For internal corrections, the input is the subspace state q, and for external corrections, the input is the relative interaction state A(q) −1 z, as shown in (7). In both cases, the output of the network is the PCA representation of the corrections. We use tanh as activation function, and we have implemented the neural networks using PyTorch.
We use as training data the target corrections discussed in the previous section, together with their corresponding subspace and interaction states. We use as loss function the 2 norm of the difference between target and estimated corrections, and we optimize the networks using Adam, 1000 to 2000 epochs, a batch size of 512, and learning rate of 1e-3. As done typically in machine learning methods, we separate a random subset of the training data and we use it as test data to monitor the convergence of the optimization of the neural network. This test data is different from that shown in the examples and video, which is made of completely new interactions, not used during training at all.
At runtime, the neural network is needed for the evaluation of displacements, but also for the transformation of full-space forces and the system Hessian to the subspace, as discussed in Section 3.3. Recall that the approximation of the Jacobian of our subspace model (3), requires the Jacobian of nonlinear corrections r q , as shown in (4). We use a matrix-free implementation of a conjugate-gradient solver, which in turn uses products r q v and r q v with vectors v. Our implementation of the neural network on PyTorch includes gradient back-propagation capabilities, which address the evaluation of r q T v. For r q v, we do the following. We implement a function y = r q w once per system solve, by back-propagation of an arbitrary vector w through the network. Then, on each conjugate gradient iteration, we back-propagate the vector v through the function y(w) to obtain r q v = y w v.  Table 1 summarizes the settings, model size, and performance of the examples shown in the paper. Please watch the accompanying video for results produced by spontaneous user interactions, which demonstrate generalization outside the training data. All the examples were executed on an Intel Core i7-7700K 4-core 4.20 GHz PC with 32 GB of RAM. Next, we discuss in detail the different experiments.

EXPERIMENTS AND DISCUSSION
Microstructures. The combination of different materials at a microscopic level can produce interesting macroscopic mechanical behaviors. However, the simulation of such microstructures at full resolution yields a very high computational cost. One approach to avoid this cost is to use numerical coarsening methods [Schumacher et al. 2015]. Nevertheless, numerical coarsening methods assume a linear response of the microstructure with respect to coarse DoFs [Chen et al. 2018;Kharevych et al. 2009;Torres et al. 2016].
We have explored the use of our model for the simulation of microstructures, by augmenting the linear subspace model with nonlinear internal corrections. In Figure 8 we show the application of our model to two different microstructures: an auxetic microstructure (top), and heterogeneous accordion-like stripes (bottom). In both cases, the difference with respect to the full-resolution simulation is almost imperceptible. With just 2 rigid handles (controlled by the user) and 16 point handles, both dynamics and detailed static deformations are reproduced very accurately. Notice how the purely linear model misses the fundamental behavior of the auxetic material (negative Poisson's ratio) and the ripples of the accordion-like stripes. Both effects are matched with our learning-based nonlinear corrections.
Jelly. This example (see Figure 1) showcases soft 2D dynamics augmented with data-driven contact. The object is modeled on a subspace defined by just 1 rigid handle (controlled by the user) and 8 point handles, and we learn separately, as two disjoint models, external corrections produced by a comb-like collider which combines both a large contact area and small protrusions, and a star with pointy features. The purely linear subspace model suffers notable distortions and misses detailed contact deformations. Previous methods for local enrichment of subspace models [Harmon and Zorin 2013;Teng et al. 2015] assume a moderate contact area to be efficient, and would not scale well on the comb example. Our model, on the other hand, matches accurately the deformations of the full model, with a performance that comes close to the linear subspace model. Note also that dynamics are well captured in the subspace, i.e., high-resolution dynamics of the full model are quickly damped.
The proposed subspace model succeeds to capture detailed contactdriven deformations, but the challenge to accurately learn these deformations grows with the complexity and configuration space of the collider. In Table 2, we compare quantitatively the accuracy of the subspace jelly model for three different colliders: (i) the comblike collider of Figure 1, which produces a large and complex contact area and has a 3D configuration space (translation and rotation in 2D); (ii) the same comb-like collider but restricted to a 2D configuration space (with no rotation); (iii) and a small circle-like collider, which produces a small contact area and has a 2D configuration space. As summarized in the table, our model learns well the interaction with the small circle even with a small neural network Table 2. Evaluation of model accuracy as a function of the complexity of the collider and its configuration space, the size of the training data set, and the complexity of the neural network architecture. The benchmark for the comparisons is the jelly object in Figure 1, using as colliders a small circle and a large comb-like object. Accuracy is measured as the RMSE of vertex displacements w.r.t. the linear model across all vertices in the object and all frames of the test data set, normalized by the RMS of the same vertex displacements. See also Figure 9 for a visual comparison of some cases. Comb (no rotation) Comb Fig. 9. Our subspace model successfully represents contact deformations due to both small and large colliders with high-resolution features. Nevertheless, large colliders with larger configuration space (e.g., the comb-like object on the right) require a larger training set and larger network architecture. A quantitative analysis of the error is summarized in Table 2. and a small training set. However, as the complexity and configuration space of the collider grow, both the complexity of the neural network and the training set must grow. With small network and training set, the model captures well the global correction to the linear deformation, but fails to learn high-frequency details of the interaction with the complex comb. A qualitative comparison of results is shown in Figure 9. The star collider of Figure 1 also requires a complex network and a large training set, like the comb, due to the size of its configuration space and its pointy features, as indicated in Table 1.
Soft-Robot Worm. We have modeled a worm-like soft robot, with three bones surrounded by soft material (See Figure 10). We simulate the worm using just 3 rigid handles, colocated with the bones, and no point handles. Even under such a compact subspace, we show that our corrected model succeeds to match the dynamic and contact-driven deformations of a full simulation.
We produce training data by pulling with springs from the bones, and interacting with just one circular pin. Note that we use up to six pins in one example at runtime, as discussed below. We train internal and external corrections, following the procedure described in Section 4. Figure 5 showcases both the internal and external corrections during spontaneous interactions outside the training data. Internal corrections are most evident in the soft regions between the bones. Conversely, the purely linear model suffers evident locking, and as a result it cannot stretch as it should. External corrections are most evident at the head of the worm. Conversely, the head of the purely linear model remains locally rigid.
We also test the worm model on a more complex setting, well outside the training settings. Figure 10 shows the worm being pulled through a narrow passage, where it collides against six pins. By modeling external corrections separately from internal corrections, their effect is mostly local. Then, if multiple colliders act sufficiently far from each other, we can safely assume superposition of their effects. Therefore, in this example, we train with just one pin, but we run the simulation with six pins, reusing six times the same neural network of external corrections. As we pull the worm through the passage, we monitor the necessary pulling force. With our model, the peak force is 1.8× larger than with the full model. With the purely linear model, however, the peak force is 5.6× larger. The linear model suffers strong distortions in the regions between bones, and cannot deform locally to conform to the shape of the pins. Our model does not suffer any of these limitations, and hence the force and overall motion are closer to the full model.
In our model, we decouple different sources of deformation, as we hypothesize that this explicit disentanglement simplifies learning and provides higher accuracy. To validate this hypothesis, we try to learn a fully nonlinear subspace model for the worm, on a simple example with no contact, as shown in Figure 2. We use the same DoFs as in our model, i.e., the rigid transformations of the bones, and we use a neural network with the same complexity. To define the output of the model, we run PCA on the full positions of the training data. With our subspace model, the RMSE of vertex displacements w.r.t. the linear model across all vertices in the worm object and all frames of the test data set, normalized by the RMS of those same vertex displacements, is just 15%. With the fully learned model this error grows beyond 4, 000%. Learning corrections on a global frame is not sufficient, and the error remains high at 75%, as depicted in Figure 3. We cannot claim that it is not possible to learn full deformations directly; in fact Holden et al. [2019] managed to Fig. 12. We model a finger with just 3 frame handles located at the phalanges. The full nonlinear deformation of the surrounding tissue is captured by our learning-based corrections. Moreover, in this example we learn external corrections as a function of the size of the spherical collider, opening the possibility of using parametric shape models.
learn full deformations, albeit with a more complex neural network, and with poor generalization and overdamped dynamics. Nevertheless, we confirm that our explicit disentanglement simplifies the problem. The decoupling of internal and external corrections is also critical for the accuracy of our model. Figure 6 shows a different comparison, this time including contact, of our model vs. a model of the same total network complexity with coupled learning of internal and external corrections. In this comparison, the normalized RMSE with our model is 19%, and grows to 51% with coupled corrections.
Bunny. In Figure 11 we show how we use our model to augment a linear subspace model of Big Buck Bunny with data-driven contact deformations.
The model contains a rigid core surrounded by a soft layer, and the linear subspace model is built using the inner rigid core as a frame handle, together with 24 point handles on the outer surface. We train external corrections due to contact with a spherical collider, following the pipeline described in Section 4.2. In the test simulation, it becomes apparent that the linear subspace model fails to produce correct contact deformations, as the point handles are too sparse. Our model, on the other hand, succeeds at producing contact deformations very close to those of the full model. Please watch the video to see contact deformations together with dynamics.
Finger. To conclude, we have also used our model to simulate deformations of a soft skeletal finger model, shown in Figure 12. The finger is modeled with rigid anthropomorphic phalanges, surrounded by homogeneous soft tissue. We have built the linear model using just the 3 moving phalanges and the fixed palm as rigid handles, with no point handles. We fix the pose of the finger with springs, and as a result the change of finger pose in the example is small. Therefore, we have opted not to model internal corrections, and we have trained external corrections on a fixed pose of the phalanges. The results demonstrate that our approach produces an extremely compact subspace model, with contact-driven deformations that are far from the resolution that can be achieved with the purely linear model. At this point, the limiting factor was the resolution of the mesh, not the size of the collider. We would need to increase the resolution of the mesh to ensure smooth contact as a smaller collider traverses the surface.
On this example, we also explored the ability to learn corrections as a function of other interaction parameters, such as the size of the collider. We generated training data with 4 different sphere radii, and at runtime we tested arbitrary in-between values. Even though the example explores a very limited shape parameterization, it opens up the possibility of training interactions with parametric and generative shape models [Loper et al. 2015;Wu et al. 2016].

LIMITATIONS AND FUTURE WORK
We have presented an approach to design compact yet accurate subspace simulation models, by aggregating global linear handle-based subspace deformations and local nonlinear corrections. We have shown that local corrections can be effectively learned from deformation examples, through a separation into internal and external (contact-driven) corrections. The model enables fast simulations with a combination of interaction dynamics and deformation detail that is unprecedented to the best of our knowledge. Nevertheless, there are interesting avenues for future work, which could address current limitations and extend the applicability of the approach.
The method is heavy on preprocessing, as it requires extensive precomputation of high-resolution contact simulations in order to accurately learn contact-driven deformations. This could be alleviated through more sparse sampling of contact simulations, perhaps thanks to changes to the neural network architecture and/or optimization method to achieve better generalization under sparse data, through self-supervised training methods, or by designing more atomic correction strategies not at the whole object level as we do.
The model cannot handle arbitrary contact, and it is currently limited to rigid kinematic colliders. The formulation is general and it admits deformable colliders, by inputting their state to the external correction model. However, scalability is unclear, both in terms of training complexity and generalization ability. At a theoretical level, the formulation can also be extended to support simulated colliders, and in that case the elastic energy of the object under study would depend on the state of the collider through the corrections. At a practical level, this dependency could complicate runtime efficiency though, potentially introducing dense coupling in the Hessian of the full simulation.
Our implementation of the subspace model uses a frame-based approach for the linear basis, but the formulation admits more general linear subspace models, such as modal bases [Pentland and Williams 1989] or modal derivatives [Barbič and James 2005]. The parameterization of nonlinear corrections and the data generation process exploit the handle-based basis, and would need to be reformulated for more general linear models. As described in Section 4.1, we learn external corrections as a function of relative transformations of the collider, A(q) −1 z. A(q) −1 works only for handle-based models, but for general linear models relative transformations could be encoded using more general feature vectors, such as pairwise distances between sets of points in the object and the collider. As described in Section 4.2, in a couple steps of the data generation process we must constrain the full-space deformation to the subspace. For general linear subspace models, this can be executed using the least-squares mapping q = (U U) −1 U x from full space to subspace, and setting a constraint on the resulting subspace configuration. Currently, the model admits only quasi-static corrections, but no dynamic corrections. Learning-based dynamic corrections could be approached in two ways, as done by other methods: by explicitly inputting previous states to the learning architecture [Casas and Otaduy 2018;Holden et al. 2019], or by building a recurrent learning architecture [Santesteban et al. 2019]. Friction is another source of trajectory-dependent deformations. Friction could be handled in a way similar to dynamics, e.g., by introducing previous states of the collider to the learning architecture, or also by modeling the friction state as an input explicitly [Verschoor et al. 2020].