Ineffective Theory

An Introduction to the Sign Problem

I recently uploaded a paper to the arXiv on the partial removal of sign problems with perturbation theory. This seems like a good time to write up a little introduction to what the sign problem is and why we care (with no familiarity with Lattice QFT assumed!). Please be aware of the obvious: this is an introduction to the aspects of the sign problem that I care about. For the rest, go read something else. Also, for brevity, it covers only a small fraction even of that.

A sensible starting point for all this is the Ising model. The Ising model takes place on a lattice, like the one shown below. We can assign to each site of the lattice a value, either \(+1\) or \(-1\) (or spin up/down, if you prefer). Such an assignment, of one value at every site, is termed a configuration of the lattice. Of course, a \(V\)-site Ising lattice has \(2^V\) configurations.

To each configuration is associated an energy. In the Ising model, the core intuition is that neighboring spins should like to point in the same direction. Therefore, we impose an energy penalty (of size \(J\)) on edges where one site is \(+1\) and the other is \(-1\). In the configuration above, each red line marks an energy penalty.

$$ E = - J \sum_{\langle i j \rangle} s_i s_j $$
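In code, the energy of a configuration is a one-liner. Here's a minimal sketch in Python (the function name and the choice of periodic boundary conditions are mine, for illustration):

```python
import numpy as np

def ising_energy(config, J=1.0):
    """Energy of a 2D Ising configuration of +/-1 spins.

    Periodic boundary conditions are assumed here. Each edge is
    counted exactly once, by pairing every site with its right and
    down neighbors.
    """
    right = np.roll(config, -1, axis=0)
    down = np.roll(config, -1, axis=1)
    return -J * np.sum(config * right + config * down)

# Ground state: all spins aligned. A 4x4 periodic lattice has
# 2 * 4 * 4 = 32 edges, each contributing -J.
aligned = np.ones((4, 4), dtype=int)
print(ising_energy(aligned))  # -32.0
```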

At this point, we can talk about things like “the ground state”, but that’s incredibly boring. The ground state has all spins pointed in the same direction, and has no interesting properties whatsoever. Instead, we’re going to go to finite temperature. The Ising model is a statistical system, and the probability of the system being in configuration \(s\) is proportional to \(e^{-E_s}\).

$$ p(s) = Z(J)^{-1} e^{J \sum_{\langle ij \rangle} s_i s_j} $$

The factor \(e^{-E_s}\) is the Boltzmann factor, and the normalization \(Z(J)\) is termed the partition function. The temperature is of course degenerate with the coupling \(J\), so we can set it to \(1\) without losing anything. Note that we’re not talking about a dynamical system. There’s no notion of time evolution here. The “thermal equilibrium” specified by the Boltzmann factor is the defining property of the theory, rather than being an emergent property of some equations of motion.
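For a lattice small enough to enumerate all \(2^V\) configurations, the partition function and the probabilities can be computed exactly by brute force. A toy sketch in Python, again assuming periodic boundaries (everything here is illustrative, not any particular lattice code):

```python
import itertools
import numpy as np

def boltzmann_weight(config, J=1.0):
    # exp(-E_s) with E_s = -J * sum of s_i s_j over nearest-neighbor
    # pairs, counting each edge once (periodic boundaries).
    right = np.roll(config, -1, axis=0)
    down = np.roll(config, -1, axis=1)
    return np.exp(J * np.sum(config * right + config * down))

L = 3  # 2^(3*3) = 512 configurations: small enough to enumerate
configs = [np.array(s).reshape(L, L)
           for s in itertools.product([1, -1], repeat=L * L)]

def partition_function(J):
    return sum(boltzmann_weight(c, J) for c in configs)

# Probability of the all-up configuration at J = 0.2.
p_up = boltzmann_weight(np.ones((L, L)), 0.2) / partition_function(0.2)
print(p_up)
```

At \(J = 0\) every configuration has weight \(1\), so the partition function is just the number of configurations, \(512\) — a quick sanity check.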

Given a probability distribution, what can we do? Evaluate expectation values, of course! The simplest one is the expectation value of the energy.

$$ \langle E \rangle = \frac {\sum_s E_s e^{J \sum_{\langle ij\rangle} s_i s_j}} {\sum_s e^{J \sum_{\langle ij\rangle} s_i s_j}} $$

Of course, there are exponentially many (in volume) configurations \(s\), so we’re not going to explicitly evaluate the numerator and denominator. Rather, one uses Monte Carlo sampling, treating \(e^{J \sum s_i s_j}\) as an (unnormalized) probability distribution to be sampled from. The precise algorithm used to sample from that distribution isn’t very important, but it’s usually some form of Markov chain Monte Carlo.
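As a concrete (and deliberately naive) example of the Markov chain approach, here is a single-spin-flip Metropolis sampler estimating \(\langle E \rangle\) on a small lattice. The lattice size, coupling, and sweep counts are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_sweep(config, J=1.0):
    """One sweep of single-spin-flip Metropolis over the lattice."""
    L = config.shape[0]
    for i in range(L):
        for j in range(L):
            # Flipping spin (i, j) only changes the energy of its four
            # edges (periodic boundaries): dE = 2*J*s_ij*(neighbor sum).
            nn = (config[(i + 1) % L, j] + config[(i - 1) % L, j]
                  + config[i, (j + 1) % L] + config[i, (j - 1) % L])
            dE = 2 * J * config[i, j] * nn
            # Accept the flip with probability min(1, exp(-dE)).
            if rng.random() < np.exp(-dE):
                config[i, j] *= -1

def energy(config, J=1.0):
    right = np.roll(config, -1, axis=0)
    down = np.roll(config, -1, axis=1)
    return -J * np.sum(config * right + config * down)

# Estimate <E> on an 8x8 lattice at J = 0.3.
L, J = 8, 0.3
config = rng.choice([-1, 1], size=(L, L))
for _ in range(200):                   # thermalize
    metropolis_sweep(config, J)
energies = []
for _ in range(1000):                  # measure
    metropolis_sweep(config, J)
    energies.append(energy(config, J))
print(np.mean(energies))               # negative: aligned spins dominate
```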

So much for the Ising model. Other field theories look much the same. There’s some big space of configurations (let’s call a configuration \(\phi(x)\), with \(x\) a point on the lattice), and we want an expectation value that looks like $$ \langle\mathcal O\rangle = \frac{\int \mathcal D \phi\;\mathcal O[\phi]e^{-S[\phi]}}{\int \mathcal D\phi\;e^{-S[\phi]}} \text. $$ The functional \(\mathcal O\) is the observable; the functional \(S\) (which plays the role of the energy above) is termed the action.

The Ising model has no sign problem. For other models, the action \(S[\phi]\) can be complex-valued, rather than real. As a result, the Boltzmann factor \(e^{-S}\) is no longer a non-negative real number which can be interpreted as a probability. (The fact that the sign can be positive or negative is the origin of the term “sign problem”.) This means that the strategy of sampling with Markov chain Monte Carlo will no longer work.

The standard workaround is reweighting. We’ll sample with respect to \(|e^{-S}| = e^{-\Re S}\), which is non-negative and real. Of course, expectation values with respect to \(|e^{-S}|\) are not the same as those with respect to \(e^{-S}\). This is where the reweighting comes in: the original expectation values are given by $$ \langle \mathcal O \rangle = \frac{\int \mathcal D\phi\;\mathcal O[\phi] e^{-S[\phi]}}{\int \mathcal D\phi\;e^{-S[\phi]}} = \frac{\langle \mathcal O e^{-i \Im S}\rangle_{\Re S}}{\langle e^{-i \Im S}\rangle_{\Re S}} \text. $$ Now, we can perform Monte Carlo sampling for both numerator and denominator. Problem solved!
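A zero-dimensional toy model shows reweighting at work. Take a single variable with complex action \(S(x) = x^2/2 + iax\): sampling with respect to \(|e^{-S}|\) is just sampling a standard normal, and the leftover phase gets reweighted in. (Completing the square gives \(\langle x^2 \rangle = 1 - a^2\) exactly, so the answer can be checked.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model with one degree of freedom and complex action
#   S(x) = x**2 / 2 + 1j * a * x .
# |exp(-S)| = exp(-x**2 / 2), so sampling the magnitude is just
# sampling a standard normal; the phase exp(-1j*a*x) is reweighted in.
a = 1.0
x = rng.standard_normal(200_000)
phase = np.exp(-1j * a * x)

numerator = np.mean(x**2 * phase)      # <x^2 e^{-i Im S}> under Re S
denominator = np.mean(phase)           # <e^{-i Im S}> under Re S

print((numerator / denominator).real)  # close to the exact 1 - a**2 = 0
print(abs(denominator))                # close to exp(-a**2/2), about 0.607
```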

The difficulty here is that the denominator can be very small, even though each sample contributing to it is a number of magnitude \(1\). If the denominator is \(0.01\), say, then most samples simply serve to cancel out other samples. Around ten thousand samples must be obtained before even the order of magnitude of the denominator can be estimated.
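To see the ten-thousand-samples claim directly: take samples of magnitude \(1\) whose true mean is \(0.01\), and watch how slowly the estimate converges. The statistical error after \(N\) samples is roughly \(1/\sqrt N\), which only shrinks to the size of the signal at \(N \sim (1/0.01)^2 = 10^4\).

```python
import numpy as np

rng = np.random.default_rng(2)

# Samples of magnitude 1 (signs +/-1 here), with probabilities tuned
# so the true mean -- the "average sign" -- is 0.01.
mean_sign = 0.01
signs = rng.choice(
    [1, -1],
    p=[(1 + mean_sign) / 2, (1 - mean_sign) / 2],
    size=1_000_000,
)

# The statistical error after N samples is roughly 1/sqrt(N); it only
# drops to the size of the signal at N ~ (1/0.01)**2 = 10**4.
for n in [100, 10_000, 1_000_000]:
    print(n, signs[:n].mean(), 1 / np.sqrt(n))
```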

To make this more visceral, try integrating the two functions above in your head. Human “by-eye” integration isn’t really so different from Monte Carlo integration. On the left, it’s pretty clear that the integral is about \(1\). Maybe it’s \(0.5\), maybe it’s \(2\) — it’s not \(10\), and it’s not \(10^{-2}\) (or, god forbid, \(-1\)). Now, what’s the integral of the one on the right? Note the change in scale! Is the integral \(1\)? Is it even positive? You don’t know, you’ll never know, you’re cursed to die in ignorance.

(I think when I made these figures I made sure the integrals were the same. Maybe I made a mistake though.)

Is the denominator characteristically small? Yes: for fermionic systems, it scales like \(e^{-V}\), where \(V\) is the volume of the system. The denominator is a ratio of two partition functions, and in large-volume systems a partition function approximately factorizes into the product of two partition functions, each of the same system at half the volume. In other words, the free energy is extensive, so the ratio decays exponentially in the volume. This is the nail in the coffin for simulations of a lot of systems we care about, like the material in neutron stars.
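The factorization argument can be checked in a toy model: if the imaginary part of the action is a sum of independent per-site terms (here \(ax\) with \(x\) a standard normal at each of \(V\) sites), the phase factorizes across sites, and the average sign is the per-site average raised to the power \(V\) — exponential decay in the volume.

```python
import numpy as np

rng = np.random.default_rng(3)

# With Im S = a * (x_1 + ... + x_V) and independent standard-normal
# x_i, the phase factorizes across sites, so the average sign is
#   <e^{-i Im S}> = (e^{-a**2/2})**V ,
# decaying exponentially in V.
a = 1.0
results = {}
for V in [1, 3, 6]:
    x = rng.standard_normal((200_000, V))
    results[V] = abs(np.mean(np.prod(np.exp(-1j * a * x), axis=1)))
    print(V, results[V], np.exp(-V * a**2 / 2))
```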

How hard is the sign problem? Troyer and Wiese showed that it’s NP-hard. It’s important to be careful about what exactly that means, though. Under the assumption that \(P \ne NP\) (a standard assumption of computational complexity), there exists no general algorithm that solves every fermion sign problem in time polynomial in the number of lattice sites. To show this, Troyer and Wiese designed a “malicious” family of fermion sign problems, not unrelated to the Ising model above. By adjusting the bond strengths (making the coupling \(J\) a function of space), they encoded a previously known NP-hard problem within the sign problem.

All is not lost! A critical part of their construction is the ability to adjust bond strengths arbitrarily. But in field theories of great interest, the action is homogeneous: the bond strengths do not vary throughout the lattice. A solution to these sorts of sign problems would not run afoul of Troyer and Wiese’s result. In fact, a solution to these homogeneous sign problems can never be ruled out with a trick like that. There’s simply no opportunity to encode an NP-hard problem in a homogeneous action, which is parameterized by only two or three numbers.

There’s an endless list of strategies for attacking the sign problem in field theories. To summarize them all: they don’t work. Not in any reliable, generalizable sort of way (with the exception of quantum computing, which doesn’t exist). The two big sign problems everybody cares about are QCD (neutron star matter) and the Hubbard model away from half-filling. These remain as unsolved as ever.