Hi guys,

I've been working on a small step XPBD-based implementation of FEM off and on for the last several months, and I think I finally have something that someone might find interesting. Here's a WASM demo showing the current state of things: https://jak-xyz.github.io/xpbd-fem/. At some point I had delusions of grandeur of building an interactive walkthrough to show off the features, but unfortunately right now you're just presented with a wall of controls and an even bigger wall of text explaining them.

I think the two most interesting things about the demo are that it attempts to tackle incompressible FEM and it shows off higher-order elements (like quadratic quads/hexahedra). Until you peek at the demo this next sentence won't make too much sense, but the performance of the Sub energy function with Q4 and H8 element types seems promising. You can keep the number of steps/second pretty low and still maintain good stability and volume preservation, especially with 2-3 extra volume constraint passes.

Interestingly, I've found the WASM version running in Chrome to perform indistinguishably from native version. I have not attempted to add WASM SIMD support, but with an older version of the code I was getting a full 8x speedup with a trivial structure of arrays AVX2 implementation in a native build. The one issue I had to work around is that small step XPBD tends to require double precision floats and, in fact, single precision floats were insufficient for storing nodal position values. The constraints themselves, on the other hand, are calculated in normalized element space, which is not so sensitive, so the vast majority of calculations can be on regular floats.

The code is here: https://github.com/jak-xyz/xpbd-fem, though I abandoned all attempts at tutorializing anything, so it's probably only readable to someone with a pretty good handle on FEM to begin with. That said, someone implementing their own XPBD FEM may find it useful to look at the code for normalizing things like compliance and damping against element size.

In the not-too-distant future I intend to add support for higher-order triangle and tetrahedron element types. I may also investigate storing additional values than just positions per node. I briefly tried storing volume per node, and that fixed hourglassing in the fully integrated mixed energy function, but I didn't really dive into its performance characteristics, so I might do that later.

Finally, I'd like to give a shout out to Matthias Müller and his SIGGRAPH(?) 2008 course notes on linear, co-rotational FEM (https://cgl.ethz.ch/Downloads/Publicati ... enotes.pdf). I've attempted to learn about FEM several times in the past, but always bounced off. Coming from a game development background, this is still the clearest intro to FEM I've seen, and it was enough to let me get a toe-hold into the wider subject.

## Small Step XPBD FEM Demo

### Re: Small Step XPBD FEM Demo

Found a sudden burst of inspiration to post an update to the demo ;p. The 3 major additions are a few new element types (second-order triangles and first- and second-order tetrahedrons), Stanford armadillo meshes for each of the element types, and a new set of energy functions that constrain per-vertex instead of per-element. The demo is still at https://jak-xyz.github.io/xpbd-fem/. If you've loaded the page before, you may need to press the "Reset Settings" button to clear your settings cache.

In terms of the new element types, there's not much to say, in isolation.

Adding the armadillo meshes was a learning experience, however. Now, it started gently enough with the tri and tet meshes. For the tri armadillo, I followed an old blog post by Miles Macklin. For tets, I was able to use TetGen without much fuss. The quad mesh was a bit more of a journey, but I eventually got gmsh to spit one out. The boundary could still use some refinement, but it seems like getting good-quality automatically-generated 2D quad meshes is tractable. The hex "armadillo," on the other hand, is where things really took a turn. Hex meshing is a much more difficult problem than tet meshing, and given that, the progress people have made on it is impressive. I tried a variety of automatic meshers, and I got farthest with the PolyCut toolchain from the University of British Columbia here: http://www.cs.ubc.ca/labs/imager/tr/2018/HexDemo/. It was able to generate some nice meshes, but not at a low-enough resolution for a browser demo. In the end, I ended up just hand-modeling the armadillo-adjacent creature you see in the demo.

Now, up to this point, the H8 linear hex element seemed like by far the most promising direction, since it's locking-resistant and it's the most performant element on a per-vertex basis by a large margin. And I still think if you're in a situation where a hex mesh is available, it's the thing to use. For instance, adding a hex-based flesh layer to a skinned character seems possible. For arbitrary volume meshes, though, hex mesh generation is too much of an impediment, so I decided to look a bit more into getting tets up to snuff.

Like with triangles, if you use the basic technique of constraining the energy function per-element on linear tets, you get pretty bad volumetric locking. Quadratic tets don't lock too much, but they are vastly more costly. Eventually, inspired by this paper: https://graphics.pixar.com/library/Inco ... /paper.pdf, I tried moving the constraint from the element to the one-ring around each vert. Each constraint is now quite a bit more costly, but you only have about 1/6th as many (in the case of tets, at least). Also, with the Gauss-Seidel solver, you need fewer Steps/Second to converge. With my current implementation, performance is a wash for tets between using element-centric constraints and one-ring-centric constraints, with the big tie-breaker being that one-ring constraints don't lock. All other element types are at least a bit slower. For the very simple mixed Neo-Hookean energy function (represented as "V Mix" in the demo), that's all there is to it: integrate each of the energies of the surrounding elements, divide by 4 (to represent the portion of the tet that belongs to the given vert), and you're set. For more complex energy functions, like Pixar or Yeoh Skin, you have to be a bit more careful. Integrating the energy function per-element still leads to locking. Instead it seems like you can just integrate the hyperelastic invariants I1 and J, and then use the weighted sum of those to calculate the energy function. I hacked in this technique for the T3 triangle version of "V Pix," though T3 "V Skin" and both T4 "V Pix" and "V Skin" still integrate the full energy function per-element, and thus lock.

Looking towards the future, I'm thinking of ditching the higher-order elements. It was educational to implement them, but I haven't been able to get them performance-competitive with the first-order elements on a per-vert basis. Even first-order tets are about 3x slower to achieve convergence compared to first-order hexes, but I think if I gave them a specialized pathway I could at least make up some of that difference. Along those lines, I might investigate WASM SIMD, since browsers are starting to roll out support for that, and it would be a good way to keep me honest about the complications of parallelizing one-ring constraints vs. simple element-constraints.

In terms of the new element types, there's not much to say, in isolation.

Adding the armadillo meshes was a learning experience, however. Now, it started gently enough with the tri and tet meshes. For the tri armadillo, I followed an old blog post by Miles Macklin. For tets, I was able to use TetGen without much fuss. The quad mesh was a bit more of a journey, but I eventually got gmsh to spit one out. The boundary could still use some refinement, but it seems like getting good-quality automatically-generated 2D quad meshes is tractable. The hex "armadillo," on the other hand, is where things really took a turn. Hex meshing is a much more difficult problem than tet meshing, and given that, the progress people have made on it is impressive. I tried a variety of automatic meshers, and I got farthest with the PolyCut toolchain from the University of British Columbia here: http://www.cs.ubc.ca/labs/imager/tr/2018/HexDemo/. It was able to generate some nice meshes, but not at a low-enough resolution for a browser demo. In the end, I ended up just hand-modeling the armadillo-adjacent creature you see in the demo.

Now, up to this point, the H8 linear hex element seemed like by far the most promising direction, since it's locking-resistant and it's the most performant element on a per-vertex basis by a large margin. And I still think if you're in a situation where a hex mesh is available, it's the thing to use. For instance, adding a hex-based flesh layer to a skinned character seems possible. For arbitrary volume meshes, though, hex mesh generation is too much of an impediment, so I decided to look a bit more into getting tets up to snuff.

Like with triangles, if you use the basic technique of constraining the energy function per-element on linear tets, you get pretty bad volumetric locking. Quadratic tets don't lock too much, but they are vastly more costly. Eventually, inspired by this paper: https://graphics.pixar.com/library/Inco ... /paper.pdf, I tried moving the constraint from the element to the one-ring around each vert. Each constraint is now quite a bit more costly, but you only have about 1/6th as many (in the case of tets, at least). Also, with the Gauss-Seidel solver, you need fewer Steps/Second to converge. With my current implementation, performance is a wash for tets between using element-centric constraints and one-ring-centric constraints, with the big tie-breaker being that one-ring constraints don't lock. All other element types are at least a bit slower. For the very simple mixed Neo-Hookean energy function (represented as "V Mix" in the demo), that's all there is to it: integrate each of the energies of the surrounding elements, divide by 4 (to represent the portion of the tet that belongs to the given vert), and you're set. For more complex energy functions, like Pixar or Yeoh Skin, you have to be a bit more careful. Integrating the energy function per-element still leads to locking. Instead it seems like you can just integrate the hyperelastic invariants I1 and J, and then use the weighted sum of those to calculate the energy function. I hacked in this technique for the T3 triangle version of "V Pix," though T3 "V Skin" and both T4 "V Pix" and "V Skin" still integrate the full energy function per-element, and thus lock.

Looking towards the future, I'm thinking of ditching the higher-order elements. It was educational to implement them, but I haven't been able to get them performance-competitive with the first-order elements on a per-vert basis. Even first-order tets are about 3x slower to achieve convergence compared to first-order hexes, but I think if I gave them a specialized pathway I could at least make up some of that difference. Along those lines, I might investigate WASM SIMD, since browsers are starting to roll out support for that, and it would be a good way to keep me honest about the complications of parallelizing one-ring constraints vs. simple element-constraints.

### Re: Small Step XPBD FEM Demo

The excellent paper Miles Macklin posted yesterday (A Constraint-based Formulation of Stable Neo-Hookean Materials) elucidated some things about the relationship between constraints and energy in XPBD that I had been fuzzy on, and I couldn't resist jamming the corrected mixed Neo-Hookean energy function into the demo. The updated version is still at https://jak-xyz.github.io/xpbd-fem/, and I moved the previous version to https://jak-xyz.github.io/xpbd-fem/v2. The biggest issue with the previous version is that I was constraining the full Neo-Hookean energy, instead of the square root. The irreducible Pixar Neo-Hookean energy function was similarly squared compared to what it should be, and I went ahead and fixed that as well. With the fixed energy function, the material is more permissive of large extensions and Poisson's ratio is respected in most modes. The "Mixed" energy function (which is a fully integrated mixed energy function) also now works correctly with quadratic elements.

### Re: Small Step XPBD FEM Demo

Updated the demo. The current version is still at https://jak-xyz.github.io/xpbd-fem, and the previous version is now at https://jak-xyz.github.io/xpbd-fem/v3. Additionally, I uploaded the latest code to: https://github.com/jak-xyz/xpbd-fem.

Took a deeper dive into the Rayleigh damping suggested in the original XPBD paper. It always looked quite nice with the single-equation irreducible Pixar Neo-Hookean, but I'd never been happy with the results for mixed energy functions. I eventually stumbled on the idea of switching over to working with gamma instead of beta, and feeding identical gamma values into both the deviatoric and volumetric constraints. This got the behavior matching between the irreducible and mixed energy functions pretty well. Still, there was one big problem, which is that the damping completely breaks down at high values, or even under high tension. Intuitively, it seems like by folding damping into the constraint you can get into an equilibrium where the velocity term cancels out the constraint term, netting zero force and causing your FEM object to melt slowly into formlessness, or at the very least massively violating incompressibility.

I have surfaced a couple mitigations via the "Rayleigh Type" tweakable. "Orig" lets you view damping as presented in the paper. The first mitigation, called "Lim," breaks the damping constraint out on its own and then limits how large it can get relative to the constraint function. This reduces the maximum strength of damping, but also greatly reduces the artifacts. Both constraints are still applied at the same time, so the cost is barely increased compared to the original. Another approach that allows for maximum damping and completely removes artifacts is to apply damping directly to velocity after all position constraints are calculated. This is selectable with "Vel." While this looks great, it is very expensive, as it's essentially doubling the number of constraints that are run each substep. To address the slowness, I added a final option, "Amor," which amortizes the cost of damping by only running it every 8 substeps. This reduces the maximum possible damping, but otherwise appears to generate very similar behavior in the demo (though I'd hesitate to recommend this as a generally viable solution, as this kind of thing feels like it has the potential to introduce weird resonances). In general, for all the mixed method energy functions, max damping is much stronger when the two mixed constraints are solved simultaneously, rather than serially.

All vert-centric constraints on T3 and T4 elements now integrate invariants and use the results to calculate energy, rather than integrating energy directly. This means VPix, VMix and VSkin all have reduced locking.

Inspired by that idea, I added an element-based constraint version of Yeoh skin called "YFast" which integrates invariants instead of energies and the results are promising. Per-constraint, it's nearly the same speed as regular Neo-Hookean, but it behaves very similarly to full Yeoh (or in the case of Q4/H8 elements, which severely lock with full Yeoh, the selectively integrated "YSel"). At high Steps/Second and high Compliance YSel and YFast are indistinguishable. As you lower Steps/Second, YFast actually seems to stay converged longer. As you lower Compliance, YFast also avoids locking, which is an issue for YSel, though it does exhibit slightly more visible hourglassing. (And a little tip when playing with the Yeoh skin models: unlike the Neo-Hookean model, it's only designed for the nearly-incompressible range, so Poisson's Ratio needs to be 0.4999 or higher.)

Added two new Energy Functions, "CubeNeo" and "CubeSkin." These only work for Q4 and H8 elements, and only work correctly for perfect squares/cubes. The main draw is how simple they are. The most basic version of CubeNeo for Q4s looks like this:
The deviatoric constraint simply pulls all the nodes to the element's centroid. This is only an approximation of the true Neo-Hookean constraint, but for squares and cubes, it's shockingly close. Setting Rayleigh Damp to 0 and Poisson's Ratio to 0.5 runs the above version of CubeNeo, so you can see how fast it is. Setting the values to anything other than exactly 0 and 0.5 switches to a more generalized version of the algorithm which respects individual node weights and very closely mimics the serial version of the MSel energy function. CubeSkin is the same idea, but it mimics YFast. Looking at the terms of the prefactored integral of the Neo-Hookean energy suggests why this approximation works so well. For squares and cubes, the weights of the centripetal gradients account for two-thirds of the total and the rest of the terms multiply by dot products of generally perpendicular vectors.

Given the limitations of this approximation, I'm not sure it has any use in production, but it could be a good stepping stone for learners ("Wow, look! You're doing real FEM in 16 lines of code!"). I've been toying with a writeup along those lines, but we'll see ...

Got rid of the concept of subregion constraints, since, with the correct treatment of energies, they didn't seem to add anything. This improved the speed of higher-order elements to the point where I'm not quite ready to write them off just yet.

The "Mixed XPBD" tweakable allows you to choose between Serial and Simultaneous solvers for the Mixed, MSel, Yeoh, YSel and YFast Energy Functions. Simultaneous means the deviatoric and volumetric constraints are solved simultaneously with the matrix form of XPBD, though the individual elements are still solved serially in a Gauss-Seidel fashion. The Simultaneous solver seems to be able to hold its shape better at low Steps/Second.

Internally, switched to a modified version of the XPBD equation which more directly uses potential energies (avoiding some square roots and divisions when solving constraints). I'll describe it very sketchily, here. Given the original small step XPBD, where:

Δ𝜆 = -C / (∇CM⁻¹∇Cᵀ + ᾶ) and

Δ𝑥 = M⁻¹∇CᵀΔ𝜆

Assuming a fully diagonal ᾶ, we can say U′ is the vector of all constraint energies with their compliances factored out. Then

U′ = ½C² and

Δ𝜆′ = -2U′ / (∇U′M⁻¹∇U′ᵀ + 2U′ᾶ) and

Δ𝑥 = M⁻¹∇U′ᵀΔ𝜆′ where

Δ𝜆 = Δ𝜆′√(2U′) (Though in this formulation raw Δ𝜆 is never needed.)

**Changelog:****Rayleigh Damping**Took a deeper dive into the Rayleigh damping suggested in the original XPBD paper. It always looked quite nice with the single-equation irreducible Pixar Neo-Hookean, but I'd never been happy with the results for mixed energy functions. I eventually stumbled on the idea of switching over to working with gamma instead of beta, and feeding identical gamma values into both the deviatoric and volumetric constraints. This got the behavior matching between the irreducible and mixed energy functions pretty well. Still, there was one big problem, which is that the damping completely breaks down at high values, or even under high tension. Intuitively, it seems like by folding damping into the constraint you can get into an equilibrium where the velocity term cancels out the constraint term, netting zero force and causing your FEM object to melt slowly into formlessness, or at the very least massively violating incompressibility.

I have surfaced a couple mitigations via the "Rayleigh Type" tweakable. "Orig" lets you view damping as presented in the paper. The first mitigation, called "Lim," breaks the damping constraint out on its own and then limits how large it can get relative to the constraint function. This reduces the maximum strength of damping, but also greatly reduces the artifacts. Both constraints are still applied at the same time, so the cost is barely increased compared to the original. Another approach that allows for maximum damping and completely removes artifacts is to apply damping directly to velocity after all position constraints are calculated. This is selectable with "Vel." While this looks great, it is very expensive, as it's essentially doubling the number of constraints that are run each substep. To address the slowness, I added a final option, "Amor," which amortizes the cost of damping by only running it every 8 substeps. This reduces the maximum possible damping, but otherwise appears to generate very similar behavior in the demo (though I'd hesitate to recommend this as a generally viable solution, as this kind of thing feels like it has the potential to introduce weird resonances). In general, for all the mixed method energy functions, max damping is much stronger when the two mixed constraints are solved simultaneously, rather than serially.

**Energy from Pre-Integrated Invariants**All vert-centric constraints on T3 and T4 elements now integrate invariants and use the results to calculate energy, rather than integrating energy directly. This means VPix, VMix and VSkin all have reduced locking.

Inspired by that idea, I added an element-based constraint version of Yeoh skin called "YFast" which integrates invariants instead of energies and the results are promising. Per-constraint, it's nearly the same speed as regular Neo-Hookean, but it behaves very similarly to full Yeoh (or in the case of Q4/H8 elements, which severely lock with full Yeoh, the selectively integrated "YSel"). At high Steps/Second and high Compliance YSel and YFast are indistinguishable. As you lower Steps/Second, YFast actually seems to stay converged longer. As you lower Compliance, YFast also avoids locking, which is an issue for YSel, though it does exhibit slightly more visible hourglassing. (And a little tip when playing with the Yeoh skin models: unlike the Neo-Hookean model, it's only designed for the nearly-incompressible range, so Poisson's Ratio needs to be 0.4999 or higher.)

**Square/Cube approximations**Added two new Energy Functions, "CubeNeo" and "CubeSkin." These only work for Q4 and H8 elements, and only work correctly for perfect squares/cubes. The main draw is how simple they are. The most basic version of CubeNeo for Q4s looks like this:

Code: Select all

```
void ConstrainSquare(
float dt, float compliance, float nodeMass, float area0,
dvec2& X0, dvec2& X1, dvec2& X2, dvec2& X3
) {
// Apply Neo-Hookean compression
dvec2 centroid = (1.0 / 4.0) * (X0 + X1 + X2 + X3);
float compressionScale = -2.0f * (dt * dt) / (compliance * nodeMass);
X0 += compressionScale * (X0 - centroid);
X1 += compressionScale * (X1 - centroid);
X2 += compressionScale * (X2 - centroid);
X3 += compressionScale * (X3 - centroid);
// Apply volume conservation
vec2 A = vec2(X0 - X2);
vec2 B = vec2(X1 - X3);
float area = 0.5f * (A.x * B.y - A.y * B.x);
vec2 gA = 0.5f * vec2(B.y, -B.x); // Partial derivative of area with respect to A
vec2 gB = 0.5f * vec2(-A.y, A.x); // Partial derivative of area with respect to B
float volumeScale = -2.0f * (area - area0) / (dot(A, A) + dot(B, B));
X0 += dvec2(volumeScale * gA);
X1 += dvec2(volumeScale * gB);
X2 -= dvec2(volumeScale * gA);
X3 -= dvec2(volumeScale * gB);
}
```

Given the limitations of this approximation, I'm not sure it has any use in production, but it could be a good stepping stone for learners ("Wow, look! You're doing real FEM in 16 lines of code!"). I've been toying with a writeup along those lines, but we'll see ...

**Smaller changes**Got rid of the concept of subregion constraints, since, with the correct treatment of energies, they didn't seem to add anything. This improved the speed of higher-order elements to the point where I'm not quite ready to write them off just yet.

The "Mixed XPBD" tweakable allows you to choose between Serial and Simultaneous solvers for the Mixed, MSel, Yeoh, YSel and YFast Energy Functions. Simultaneous means the deviatoric and volumetric constraints are solved simultaneously with the matrix form of XPBD, though the individual elements are still solved serially in a Gauss-Seidel fashion. The Simultaneous solver seems to be able to hold its shape better at low Steps/Second.

Internally, switched to a modified version of the XPBD equation which more directly uses potential energies (avoiding some square roots and divisions when solving constraints). I'll describe it very sketchily, here. Given the original small step XPBD, where:

Δ𝜆 = -C / (∇CM⁻¹∇Cᵀ + ᾶ) and

Δ𝑥 = M⁻¹∇CᵀΔ𝜆

Assuming a fully diagonal ᾶ, we can say U′ is the vector of all constraint energies with their compliances factored out. Then

U′ = ½C² and

Δ𝜆′ = -2U′ / (∇U′M⁻¹∇U′ᵀ + 2U′ᾶ) and

Δ𝑥 = M⁻¹∇U′ᵀΔ𝜆′ where

Δ𝜆 = Δ𝜆′√(2U′) (Though in this formulation raw Δ𝜆 is never needed.)

### Re: Small Step XPBD FEM Demo

Hi Jak, I just wanted to say that I’m a big fan of your explorations here. The framerate and smoothness is hypnotic! Your implementation (and Miles’s most recent paper) inspired me to have a go porting an XPBD sim to run in WebGL2 Shaders: https://zalo.github.io/TetSim/

This uses simple polar decomposition-based tet constraints with jacobi iterations (instead of the paper’s neohookean energy function with gauss-seidel iterations).

I think the example dragon model has too high of a max vertex valence (31) for graph coloring-based parallelization to work nicely, but uniform hexahedral meshes might work great (~8).

I haven’t worked out the physical correctness or damping yet, but I’m hoping to build out a more powerful framework for GPGPU calculations in WebGL2 before spending too much time on the details.

This uses simple polar decomposition-based tet constraints with jacobi iterations (instead of the paper’s neohookean energy function with gauss-seidel iterations).

I think the example dragon model has too high of a max vertex valence (31) for graph coloring-based parallelization to work nicely, but uniform hexahedral meshes might work great (~8).

I haven’t worked out the physical correctness or damping yet, but I’m hoping to build out a more powerful framework for GPGPU calculations in WebGL2 before spending too much time on the details.

### Re: Small Step XPBD FEM Demo

Love it! That GPU sim is so smooth ~~

Great to see other people working in the area!

Great to see other people working in the area!