Quant Out of Water
Solving Equations with Jacobi Iteration

Solving Equations with Jacobi Iteration

Jacobi iteration is a natural idea for solving certain types of nonlinear equations, and reduces to a famous algorithm for linear systems. This post discusses the algorithm, its convergence, benefits and drawbacks, along with a discussion of examples and pretty pictures 🖼️.

Introduction

The method of Jacobi Iteration, named after the 19th century mathematician Carl Gustav Jacobi (the same Jacobi for whom the Jacobian in calculus is named after), is a numerical method for solving systems of equations. It is particularly famous as a classical iterative algorithm for solving linear systems. One of the main reasons one might want to use Jacobi iteration in practice is that it admits of a naturally parallel implementation, and can thus scale to very large and complex systems, and even to problems without any closed form representation of the system we want to solve (like a simulator, or a black-box machine learning algorithm).

In this post, I’ll introduce the basic mathematical intuition of Jacobi iteration, along with some examples of how and where it might arise. I’ll also give a brief analysis of convergence for linear systems of equations, where the concept of a diagonally dominant matrix arises. I also attempt to extend the intuition of the linear case to nonlinear systems by using techniques from the theory of ordinary differential equations.

Systems of Equations

We are all familiar with the grade-school notion of a mathematical equation. Particularly famous are the quadratic polynomial equations, e.g., \(x^2 - x - 2 = 0\), which has exactly the solutions \(x = 2\) and \(x = -1\). For another example, the trigonometric equation \(\mathsf{sin}(\frac{(2x + 1) \pi}{2}) - 1 = 0\) has any \(x\) being an integer multiple of \(2\), i.e., \(x \in 2\Z\) as a solution.

What is meant by a system of equations is simply a multitude of ordinary equations that need to be satisfied simultaneously. For instance, if we combined the above two examples into the system of two equations and one variable we would be left with the system

\begin{equation} \notag \begin{aligned} x^2 + x - 2 &= 0\\ \mathsf{sin}\bigl(\frac{(2x + 1) \pi}{2}\bigr) - 1 &= 0, \end{aligned} \end{equation}

which now has a single unique simultaneous solution \(x = 2\).

In general, we can write systems of equations using a single multivariate function \(F: \R^m \rightarrow \R^n\) which takes \(m\) variables \(x = (x_1, x_2, \ldots, x_m)\) as in \(F(x)\), and outputs \(n\) values \(F(x) = \bigl(F_1(x), F_2(x), \ldots, F_n(x)\bigr)\). It is a rule of thumb (certainly not an actual rule) that if there are \(n\) equations, you can expect to be able to solve for \(n\) variables, i.e., the function \(F\) is “square” with \(m = n\). Assuming this square case is by no-means essential, but it simplifies many of our examples, so we will run with this case.

The ultimate goal of solving systems of equations is to find some \(x \in \R^n\) such that \(F(x) = 0\). We can motivate such problems with a few illustrative examples.

Minimizing Functions

Consider a function \(f: \R^n \rightarrow \R\) which we want to minimize, i.e., to find an \(x^\star \in \R^n\) such that \(f(x^\star) \le f(x)\) for every other \(x\) in some neighbourhood of \(x^\star\). One might want to think of \(f(x)\) perhaps as a design objective (find the best design according to the cost function \(f\)), or as a machine learning loss function, etc. It is a theorem (the first order necessary conditions) that for differentiable functions \(f\), any minimizer \(x^\star\) must necessarily satisfy the derivative condition \(\mathsf{D} f(x^\star) = 0\), where \(\mathsf{D} f: \R^n \rightarrow \R^n\) is the derivative of \(f\).

Economic Equilibria

In economics, many believe that prices of goods in an economy are determined by the matching of supply and demand. That is, if \(\mathcal{D}(p)\) is a demand curve and \(\mathcal{S}(p)\) is a supply curve, then we expect the price \(p\) to be found from solving the nonlinear equation \(\mathcal{D}(p) = \mathcal{S}(p)\). I have much more to say about this example in Section Example: Supply and Demand with Substitution , where I apply Jacobi iteration to a simple coffee and tea economy where the two goods serve as partial substitutes for one and other.

Robotic Manipulators

Robotic arms with stiff linkages can be modeled by the angles \(\theta\) of their joints. For example, the position of your hand (your end affector) in space can be determined as a function of the length of your upper arm and your forearm, along with the angles formed at your elbow and shoulder (including the multiple dimensions of rotation your shoulder is capable of). Specifically, we might wish to describe the position of your hand in space by a function \(F(\theta)\) of these joints. The problem of determining the appropriate angular settings of your joints, in order to place your hand at a point \(p \in \R^3\) in space, is a problem of solving the system of equations \(F(\theta) - p = 0\), and one which your brain apparently solves with remarkable ease.

Jacobi Iteration

The ideal of Jacobi iteration is to split up the system \(F(x) = 0\) of \(n\) equations in \(n\) unknowns, into a sequence of simpler equations of one variable and one unknown. For equation \(i\), we imagine that every variable except \(x_i\), let’s call them \(x_{-i}\) is held fixed, and that we don’t care about the value of any function except \(F_i\). We then solve the univariate equation \(F_i(x_i; x_{-i}) = 0\) (This notation is common in game theory, I hope that it is understood) to find the single value of \(x_i\) that results in function \(F_i\) being satisfied.

This process is carried out, possibly in parallel, simultaneously for each equation to obtain a new set of points which we hope is closer to satisfying the full system of equations.

Remark: Proceeding sequentially, rather than simultaneously, by updating each \(x_i\) value immediately after finding a univariate solution, and before finding the next one, is an algorithm called Gauss-Seidel Iteration. The reader is encouraged to meditate upon the difference.

A pseudo-code algorithm implementing Jacobi iteration is given as follows:

  • Initialize \(x(0) \in \R^n\)
  • for \(t = 1, 2, \ldots\)
    • parallel for each \(i \in [n]\)
      • find \(\tilde{x}_i\) such that \(F_i(\tilde{x}_i, x_{-i}(t)) = 0\) // i.e., solve the $i^{th}$ equation
    • set \(x(t + 1) = \tilde{x}\)

In order that this algorithm be an appropriate choice for your problem, it is at least necessary that the step requiring that we “solve the \(i^{th}\) equation” can be reliably carried out. However, even if solving each of the sub-problems is challenging, we may benefit from the straight-forward parallelism offered by finding each \(\tilde{x}_i\) simultaneously.

The other major issue is convergence, i.e., does the algorithm actually find a solution? We can get fairly clear answers to this question in the linear case by applying dynamical systems theory. As is so often the case, the theory for linear functions serves as a stepping stone to building intuition more generally.

The Case of Linear Equations

Whether or not the algorithm is actually able to find some \(x\) such that \(F(x) = 0\) (assuming at the very least that such an \(x\) exists!) is highly problem dependent and in practice may require various “clever tweaks”, or benefit from restarting the algorithm from a wide range of initial points \(x(0)\). However, for the case of linear systems of equations \(Ax = b\) (with \(A \in \R^{n \times n}\) a square matrix with real entries), the algorithm is both elegantly simple, and admits of an easily verifiable sufficient condition for convergence known as diagonal dominance.

To understand how this works, lets solve for \(x_i\) such that the \(i^{th}\) equation is satisfied. That is,

\begin{equation} \notag \begin{aligned} \sum_{j = 1}^n A_{ij} x_j &= b_i\\ \iff A_{ii} x_i &= b_i - \sum_{j: j \ne i} A_{ij} x_j\\ \iff x_i &= \frac{1}{A_{ii}}\bigl(b_i - \sum_{j: j \ne i}A_{ij} x_j \bigr), \end{aligned} \end{equation}

where we should notice that we already require the diagonal elements of \(A\) to be non-zero (otherwise we couldn’t divide).

Focusing for a moment on the term \(\sum_{j \ne i} A_{ij} x_j\), this is nothing but the \(i^{th}\) element in the matrix multiplication \(Ax\) minus the component \(A_{ii} x_i\), that is, \(\sum_{j \ne i} A_{ij} x_j = (Ax)_i - A_{ii} x_i\). Using this, we can write the parallel updates to the entire vector \(x\), and the entire Jacobi iteration algorithm as:

\begin{equation} \notag x(t + 1) = D^{-1} (b - Mx(t)), \end{equation}

where \(D = \mathsf{dg}(A)\) (the diagonal elements of \(A\)), \(M = A - D\) is all of the off-diagonal elements of \(A\), and \(x(0) \in \R^n\) is an arbitrary starting point for the sequence of candidate solutions \(x(t)\). When people refer to “Jacobi iteration”, it is usually this algorithm in particular that they are referring to.

Convergence

The hope is that the iterations \(x(t)\) converge to a solution \(x^\star\), i.e., \(x(t) \rightarrow x^\star\), where \(Ax^\star = b\).

To see why we might expect this to happen, imagine that \(x(t)\) converges to a fixed point of the algorithm, i.e., some \(x(\infty)\) satisfying \(x(\infty) = D^{-1} (b - M x(\infty))\). For such a point it holds that:

\begin{equation} \notag \begin{aligned} x(\infty) &= D^{-1} (b - M x(\infty))\\ Dx(\infty) &= b - M x(\infty)\\ (D + M)x(\infty) &= b\\ Ax(\infty) &= b, \end{aligned} \end{equation}

which is a solution to the equation \(Ax = b\)!

To establish the convergence of the iterations, it is well known (this is likey to be a topic of a future post! 😜) that for linear systems \(x(t + 1) = a + K x(t)\), \(x(t)\) will converge whenever \(\rho(K) < 1\), where \(\rho(K) = \text{max}_i\ |\lambda_i(K)|\) is the largest eigenvalue magnitude, a quantity called the spectral radius. For the case of Jacobi iteration, we require that \(\rho(D^{-1} M) < 1\).

Using the Gershgorin circle theorem , and the fact that the diagonal values of \(M\) are all \(0\) (by construction), it holds that every eigenvalue of \(D^{-1} M\) lies within a distance \(|D_{ii}^{-1}|\sum_{j \ne i} |M_{ij}|\) of \(0\). A condition on \(A\) which guarantees this is diagonal dominance: \(\sum_{j \ne i} |A_{ij}| < |A_{ii}|\) for each \(i\). If this holds, then the eigenvalues \(\lambda_i\) of \(A\) are “close” to the diagonals of \(A_{ii}\), so such matrices are in some sense “almost diagonal”. Since linear systems \(Dx = b\), for diagonal \(D\), are easy to solve, it is not surprising that there is a simple iterative algorithm for solving linear systems which are diagonally dominant. Moreover, if the diagonals of \(A\) are also non-zero, \(A\) will necessarily have non-zero eigenvalues, and therefore it will be full-rank, invertible, and admit of unique solutions. This is all summarized with the following theorem.

Theorem: Let $A \in \R^{n \times n}$ be a square matrix with real entries. Suppose that $A$ has a non-zero diagonal and is diagonally dominant. Then, for any vector $b \in \R^n$, there exists a unique solution $x^\star$ to the linear equation $Ax = b$ and Jacobi iteration converges, from any initial condition, to the solution $x^\star$.

Linear \(2 \times 2\) Example

Some straightforward Python code implementing linear Jacobi iteration is provided in the listing below. A realistic implementation should have a method of detecting divergence. As well, checking the norm of the distance to the solution on every iteration is relatively expensive – it essentially doubles the computational effort.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
def solve_linear_system(A, b, x0=None, eps=1e-6, maxiter=np.inf):
    """Solves the linear system Ax = b by Jacobi iteration.

    The algorithm's starting point is x0.  The algorithm is only guaranteed
    to converge if A is diagonally dominant.
    """
    Dinv_M, Dinv_b = (A - np.diagflat(D)) / np.diag(A)[:, None], b / np.diag(A)

    x = np.array(x0) if x0 is not None else np.zeros_like(b)

    it = 0
    while it < maxiter and np.linalg.norm(A @ x - b) > eps:
        x = Dinv_b - Dinv_M @ x
        it += 1

    return x

For problems in \(\R^2\) (i.e., with just two variables) it is quite straight-forward to produce nice-looking figures. The following figure plots the flow of an associated ODE, as well as an “SOR” modification (both to be explained shortly 💁), along with the discrete iterates of the Jacobi iteration algorithm for the matrix

\[ A = \begin{bmatrix} 4/5 & 3 / 5 \\ -6 / 5 & 7/5\end{bmatrix}, \]

and \(b = 0\). The case \(b = 0\) is without loss of generality for the purpose of plotting. Even though finding a solution to the equation \(Ax = 0\) is trivial in this case, it is equivalent to re-orienting the center of our coordinate system upon the solution \(A^{-1} b\). It is also worth noting that this matrix is row-wise diagonally dominant (corresponding to our definition) but not column wise.

If you squint closely at this figure, you might even believe that the lines joining the discrete iterates are tangent to the flow lines in the background…

Relaxation and Associated Flows

Our figure above is constructed by plotting the vector field from the flow of a closely related ordinary differential equation. Let’s use \(t\) to denote time, and as in the previous example, assume without loss that \(b = 0\).

Merely as a device to construct a differential equation, imagine that each step of the algorithm takes \(\Delta\) “algorithm time”. Precisely, let us write

\begin{equation} \notag \begin{aligned} x(t + \Delta) &= -D^{-1} Mx(t)\\ \iff x(t + \Delta) - x(t) &= -(I + D^{-1} M)x(t))\\ \end{aligned} \end{equation}

Now, we have an analogy with continually taking steps \(-(I + D^{-1} M)x(t)\) to update the value of \(x(t)\). If we reduce the size of these steps by \(\Delta\) and then take a limit as \(\Delta \rightarrow 0\)

\begin{equation} \notag \begin{aligned} x(t + \Delta) - x(t) &= -\Delta(I + D^{-1} M)x(t))\\ \iff \frac{1}{\Delta} [x(t + \Delta) - x(t)] &= -(I + D^{-1} M)x(t))\\ \overset{\Delta \rightarrow 0}{\implies} \dot{x}(t) &= -(I + D^{-1} M)x(t)), \end{aligned} \end{equation}

we obtain a linear ordinary differential equation.

Remark: The stablity of linear ODEs can be established by checking that the eigenvalues of the corresponding system matrix have negative real part. In our case, due to the negative sign, we require that $\mathfrak{R}[\lambda_i(I + D^{-1} M)] > 0$ hold for every $i$, where $\lambda_i(A)$ is the $i^{th}$ eigenvalue of $A$. Since $D^{-1}M$ has $0$ on the main diagonal (by construction) the Gershgorin circle theorem tells us that the eigenvalues lie within Gershgorin discs centered at $1$. Clearly, the condition $\rho(D^{-1} M) < 1$ is sufficient for the stability of this ODE. But, thinking intuitively, should one expect that this ODE be “more likely” to converge than the discrete algorithm? The reader is encouraged to visualize this situation, and to think about this stability condition.

Let’s play with this a bit. What if we didn’t take a full limit towards \(\Delta \rightarrow 0\)? Rearranging the above equations results in another discrete algorithm

\begin{equation} \notag x(t + \Delta) = (1 - \Delta) x(t) -\Delta D^{-1} Mx(t)), \end{equation}

where \(\Delta > 0\) is a parameter of the algorithm, with \(\Delta = 1\) corresponding to ordinary Jacobi iteration.

This algorithm is now picking the next point to lie somewhere along the line connecting \(x\) and the nominal next step \(s(x)\), that is: as \(x \leftarrow (1 - \Delta) x + \Delta s(x)\). This is a general pattern called Successive Over-relaxation (SOR), and can be applied to any iterative algorithm which takes as input a point \(x\), and outputs the next point as \(x \leftarrow s(x)\). When \(\Delta < 1\), the next point will lie somewhere in the interval \([x, s(x)]\) between the current point and the nominal next step (the “relaxation” part of SOR); and \(\Delta > 1\) means that the step goes beyond \(s(x)\) to move a further distance (the “over” part). The value \(\Delta = 0\) corresponds to the vanilla version of the algorithm. I don’t think that the case \(\Delta < 0\) has any sensible use (at least not directly in this context) as you would be going backwards in some sense, but maybe it’s an interesting possibility to think about.

The reason that the SOR algorithm can be expected to converge for a broader collection of \(A\) matrices than ordinary Jacobi iteration, is that the space of matrices satisfying \(\mathfrak{R}[\lambda_i(I + D^{-1} M)] > 0\) is larger than those which satisfy \(\rho(D^{-1} M) < 1\) (can you see why?). However, for a matrix \(A\) such that \(D^{-1}M\) has an eigenvalue with real part less than \(-1\), both algorithms will diverge.

The case of Nonlinear Equations

For the case of nonlinear equations, we can use tools from ordinary differential equations to say something about the convergence of the associated Jacobi flow, and then extend that intuition through successive over-relaxation to try to understand something about how the discrete algorithm will behave.

To this end, let us go back to the nonlinear equation \(F(x) = 0\) that we want to solve. Supposing that the Jacobi iteration algorithm can be implemented for \(F\) (i.e., each equation in the system can be solved in the “diagonal” variable), we can write the algorithm with the notation

\begin{equation} \notag x(t + 1) = \mathcal{M}(x(t)), \end{equation}

where for any \(x\), \(y = \mathcal{M}(x)\) is a vector such that \(F_i(x_{-i}, y_i) = 0\), that is, solves the \(i^{th}\) equation in the \(i^{th}\) variable. Following the same trick as we had in the linear case, we can construct an ODE:

\begin{equation} \notag \begin{aligned} \frac{x(t + \Delta) - x(t)}{\Delta} &= \mathcal{M}(x(t)) - x(t)\\ \overset{\Delta \rightarrow 0}{\implies} \dot{x}(t) &= \mathcal{M}(x(t)) - x(t). \end{aligned} \end{equation}

Thus, we can hope to understand the convergence of nonlinear Jacobi iteration by analyzing the convergence of the nonlinear system of ODEs associated to the function \(\mathcal{M}\). Firstly, a fixed point of this ODE is, similarly to the linear case, a solution to the nonlinear equation. So, perhaps we can understand the convergence of \(x(t)\) for points where \(\mathcal{M}(x(t)) \approx x(t)\)? The method for doing this type of local analysis of ODE convergence is the Hartman-Grobman Theorem (HGT from hereon). Intuitively, our hope is that if we initialize the algorithm somewhere “close” to a solution, that it will actually converge to that solution. That is, we hope that the ODE is locally asymptotically stable.

The HGT essentially tells us that if the Jacobian \(A = \mathsf{D}\mathcal{M}(x^\star) - I\) of the system at an equilibrium point \(x^\star\) is stable, i.e., \(\dot{x} = Ax\) converges to \(x^\star\), then there is a local neighbourhood around \(x^\star\) such that the original nonlinear ODE \(\dot{x} = \mathcal{M}(x) - x\) converges to \(x^\star\).

This statement is still pretty abstract, since we don’t really have a handle on what \(\mathcal{M}\) is explicitly. Another abstract tool that we could apply here is the Implicit Function Theorem , which would allow us to calculate \(\mathsf{D}\mathcal{M}\) in terms of derivatives of the original function \(F\). While this approach can certainly get us somewhere, I’d like to make a closer analogy with the linear case. To this end, let’s make a simplifying assumption about the structure of \(F\):

\begin{equation} \notag F_i(x) = f_i(x_i) + g_i(x_{-i})\ \forall i, \end{equation}

that is, the \(i^{th}\) equation consists of an individual function \(f_i: \mathbb{R} \rightarrow \mathbb{R}\) of \(x_i\), along with additional coupling function \(g_{i}: \mathbb{R}^{n - 1} \rightarrow \mathbb{R}^{}\) involving the remainder of the variables. We can now write down what \(\mathcal{M}_i\) is explicitly:

\begin{equation} \notag \mathcal{M}_i(x) = f_i^{-1} (-g_i(x_{-i})). \end{equation}

In the linear case, \(f_i^{-1}(z) = z / A_{ii}\) and \(g_i(x_{-i}) = \sum_{j: j \ne i} A_{ij} x_j - b_i\). Considering now the derivatives of \(\mathcal{M}_i,\) we just need to combine the Inverse Function Theorem along with the chain rule to obtain

\begin{equation} \notag \mathsf{D}\mathcal{M}_i(x) = -\frac{1}{f_i^\prime \circ \mathcal{M}_i(x)} \begin{bmatrix}\frac{\partial g_i(x_{-i})}{\partial x_1} & \cdots & \frac{\partial g_i(x_{-i})}{\partial x_{i - 1}} & 0 & \frac{\partial g_i(x_{-i})}{\partial x_{i + 1}} & \cdots &\frac{\partial g_i(x_{-i})}{\partial x_{n}}\end{bmatrix} \end{equation}

where \(f^\prime \circ \mathcal{M}_i(x) = f^\prime (f_i^{-1} (-g_{i}(x_{-i})))\). Thus, in exact analogy with the linear case, we can write

\begin{equation} \notag \mathsf{D}\mathcal{M}(x) = -D(x)^{-1} M(x), \end{equation}

where \(D(x)\) is a diagonal matrix consisting of the derivatives \(f_i^\prime\) evaluated at the point \(\mathcal{M}_i(x)\) and \(M(x)\) is the Jacobian \(\mathsf{D}F\) of \(F\) itself, but with the diagonal elements zeroed out.

Returning to the HGT, we need to consider the stability of the linear ODE defined by the matrix \(A = \mathsf{D}\mathcal{M}(x^\star) - I\). Since \(\mathcal{M}(x^\star) = x^\star\), we have that \(D_i(x^\star) = f_i^\prime(x^\star_i)\). Using this fact, we are inspired to make a definition of a locally diagonally dominant function. We will say that \(F\) is a locally diagonally dominant (around an equilibrium point \(x^\star\)) if it has the form \(F_i(x) = f_i(x_i) + g_i(x_{-i})\) and \[|f_i^\prime(x_i^\star)| > \sum_{j: j \ne i} |\frac{\partial g_i (x^\star_{-i})}{\partial x_j}|\] for every \(i\).

Using what we know from the linear case, combined with the HGT, and the fact that the SOR scheme is an Euler Method for the ODE (which converges towards the ODE itself as \(\Delta \rightarrow 0\)), we have a theorem about local convergence of nonlinear Jacobi iteration:

Theorem: Let $F: \mathbb{R}^n \rightarrow \mathbb{R}^n$ be a smooth function. Suppose there exists some $x^\star$ such that $F(x^\star) = 0$ and that $F(x^\star)$ is locally diagonally dominant in a neighbourhood of $x^\star$. Then, there exists a neighbourhood $\mathcal{N}$ of $x^\star$ and a $\Delta > 0$ such that SOR nonlinear Jacobi iteration with step-size $\Delta$ converges to $x^\star$ from any initial point $x \in \mathcal{N}$.

Of course, the linear case corresponds to the functions \(f_i(x_i) = A_{ii} x_i\) and \(g_i(x_{-i}) = \sum_{j: j \ne i} A_{ij} x_j - b_i\), wherein we recover from above the convergence theorem of linear Jacobi iteration.

Example: Supply and Demand with Substitution

Let’s cook up an example from economics where we’ll try to work out the equilibrium prices of goods in an economy. A typical economic model (at least so far as I know… I’m no economist!), is to find the prices at which the supply of a good is matched to the demand for that good. The story is that when prices are high, manufacturers will scramble to produce that good in order to sell it for a large profit, which will drive down the price for the good through competition. Similarly, when prices are low, demand will be very high, since you can get a lot of utility out of consuming the good in comparison to keeping the small amount of money. This can be expected to drive up prices. Through these competing effects pushing prices up and down, it is hoped that prices will equilibriate at some fixed value through a process of tâtonnement .

The critical aspect that will make this model an interesting example for solving nonlinear equations is a substitution effect . To get more specific, suppose that when the prices for all \(n\) goods are given by \(p \in \R^n\), the supply for good \(i\) is described by a function \(\mathcal{S}_i(p) = S_i(p_i)\) and the demand for this good is \[\mathcal{D}_i(p) = D_i(p_i) + T_i(p_{-i}).\] What this latter equation says is that there is some nominal demand \(D_i(p_i)\) given by a scalar function of the price of that good, as well as the functions \(T_i(p_{-i})\) which serves to model the substitution effect: the demand for good \(i\) also depends upon the prices of other goods in the economy. A set of equilibrium prices \(p^\star \in \R^n\) (which may not be unique!) in this economy are prices such that \[\mathcal{S_i}(p^\star) - \mathcal{D}_i(p^\star) = 0\ \forall i \in [n],\] a nonlinear equation.

To make this more concrete, suppose we have an economy with two goods: coffee and tea. As the price of coffee increases, demand for coffee also falls. But, the demand for tea rises, since some coffee drinkers that are tightening their belt and drinking less coffee, start drinking more tea. Let’s write this all down mathematically. For the demand I’ll use functions \(D_i(p_i) = \frac{k_i}{1 + p_i}\), modelling the fact that demand should be decreasing toward zero as price increases, and that there is some maximum demand \(k_i\) when the good is free (I don’t think it’s possible to consume an infinite amount of tea or coffee! ☕). Picking another perfectly good function, I’ll model the substitution effect with a hyperbolic tangent: \(T_i(p_{-i}) = c_i \mathsf{tanh}(p_{-i})\). This function has a plateau at \(c_i\), as \(p_{-i} \rightarrow \infty\), so the ratio \(c_c / k_\tau\) is measuring the proportion of tea drinkers that would switch to coffee as the price of tea increases. On the supply side, I’ll use a nice S-shaped function \(S_i(p_i) = \frac{m_i p_i}{1 + p_i}\) where the maximum producible amount of the good reaches another plateau at \(m_i\). This results in the nonlinear system of equations (using \(c\) for coffee and \(\tau\) for tea):

\begin{equation} \notag \begin{aligned} \text{coffee:}\quad& \frac{m_c p_c}{1 + p_c} - \frac{k_c}{1 + p_c} - c_c \mathsf{tanh}(p_\tau) = 0,\\ \text{tea:}\quad& \frac{m_\tau p_\tau}{1 + p_\tau} - \frac{k_\tau}{1 + p_\tau} - c_\tau \mathsf{tanh}(p_c) = 0. \end{aligned} \end{equation}

Solving these equations individually for \(p_c, p_\tau\) we obtain a nonlinear Jacobi algorithm:

\begin{equation} \notag \begin{aligned} p_c(t + 1) &= \frac{k_c + c_c \mathsf{tanh}(p_\tau(t))}{m_c - c_c \mathsf{tanh}(p_\tau(t))}\\ p_\tau(t + 1) &= \frac{k_\tau + c_\tau \mathsf{tanh}(p_c(t))}{m_\tau - c_\tau \mathsf{tanh}(p_c(t))}. \end{aligned} \end{equation}

In order for this to constitute a locally diagonally dominant system, we need to check the derivatives. It is a fairly straightforward calculation which, after cancelling some terms, results in the requirements \[\frac{m_i + k_i}{(1 + p_i^\star)^2} > c_i(1 - \mathsf{tanh}^2 (p_{-i}^\star)).\] The idea of “diagonal dominance” is clear in this equation: \(m_i, k_i\) are coefficients controlling the importance of the “diagonal” part of the function, with \(c_i\) controlling the coupling. If the coupling is weak: \(m_i + k_i \gg c_i\), then it can be expected that the system is diagonally dominant, and an equilibrium should be close to the “nominal” equilibrium \(p_i^\star \approx k_i / m_i\). Incidentally, this should give us a decent starting point for initializing Jacobi iteration.

Here’s a generic implementation of nonlinear Jacobi iteration:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
from typing import List, Callable


def jacobi_iteration(x, f_inv: List[Callable], g: List[Callable]):
    x_next = np.zeros_like(x)
    all_i = np.arange(len(x))
    for i, (f_inv_i, g_i) in enumerate(zip(f_inv, g)):
        x_next[i] = f_inv_i(-g_i(x[all_i != i]))
    return x_next


def jacobi_flow(x, f_inv: List[Callable], g: List[Callable]):
    return jacobi_iteration(x, f_inv, g) - x


def solve_nonlinear_system(
    F: Callable,
    f_inv: List[Callable],
    g: List[Callable],
    x0=None,
    eps=1e-6,
    maxiter=np.inf,
    D=0
):
    """Solves the nonlinear system F_i(x) = f_i(x_i) + g_i(x_{-i}) = 0.

    The algorithm's starting point is x0.  The algorithm is only guaranteed
    to converge if F is locally diagonally dominant around an equilibrium, and
    x0 is chosen nearby that equilibrium point.
    """
    x = np.array(x0) if x0 is not None else np.zeros(len(g))
    _iter = partial(jacobi_iteration, f_inv=f_inv, g=g)
    all_i = np.arange(len(x))

    it = 0
    while it < maxiter and np.linalg.norm(F(x)) > eps:
        x = D * x + (1 - D) * _iter(x)
        it += 1
    return x

The code follows a functional programming style that mimics the mathematical description as closely as possible. A particular implementation of our coffee-tea economy, including useful functions for plotting the progress of the algorithm, might look something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
from math import tanh
from functools import partial


def coffee_tea_solver(k, m, c, D=0.0):
    """A particular example with coffee and tea..."""

    def F(p):
        x0 = (m[0] * p[0] - k[0]) / (1 + p[0]) - c[0] * tanh(p[1])
        x1 = (m[1] * p[1] - k[1]) / (1 + p[1]) - c[1] * tanh(p[0])
        return (x0, x1)

    def f_inv(z, _k, _m):
        return (_k + z) / (_m - z)

    def g(p, _c):
        return -_c * tanh(p)

    f_invs = [
        partial(f_inv, _k=k[0], _m=m[0]),
        partial(f_inv, _k=k[1], _m=m[1]),
    ]
    gs = [partial(g, _c=c[0]), partial(g, _c=c[1])]

    _jacobi_iterator = partial(jacobi_iteration, f_inv=f_invs, g=gs)
    _jacobi_flow = partial(jacobi_flow, f_inv=f_invs, g=gs)
    _solver = partial(
        solve_nonlinear_system,
        F=F,
        f_inv=f_invs,
        g=gs,
        eps=1e-6,
        maxiter=1000,
        D=D
    )
    return _jacobi_iterator, _jacobi_flow, _solver, F

Here’s a plot of the flow of this system for the parmeter settings

\begin{equation} \begin{aligned} &k_c = 3/2, k_\tau = 2\\ &m_c = 5/3, m_\tau = 1/2\\ &c_c = 4/3, c_\tau = 1/3, \end{aligned} \end{equation}

which, at least intuitively, results in a system corresponding to our intuition of diagonal dominance. Increasing the values of \(c_c, c_\tau\) results in a breakdown of the algorithm and a failure to converge, either as a result of domain errors, or running to infinity. The SOR technique described earlier can potentially be used to increase the domain of convergence, but that is a matter for experimentation.

For this particular coffee-tea system, there is surely a lot more analysis that one could do in order to determine clearer convergence criteria, the ranges of parameters that will result in convergence, etc. Such analysis is worthwhile in actual practice, but we’re here for fun 🙃.

Conclusion

Our main point is in understanding the Jacobi iteration algorithm for solving systems of equations. The key assumption for this algorithm is that the system be diagonally dominant. Intuitively, this means that the equation associated to the \(i^{th}\) variable be only weakly coupled with the remaining variables \(x_{-i}\). We’ve seen how to formalize this in the case of linear equations, and how similar “rough guide” criteria can be obtained from the Hartman-Grobman Theorem in the nonlinear case.

In actual practice, Jacobi iteration is not likely to be the best nonlinear equation solver to use, though it depends upon the problem domain. The fact that it breaks down a system of equations into the parallel solution of univariate equations (for which completely general black-box algorithms like the Bisection Method are available) gives it a rather large potential domain of applicability. If you find yourself confronted with the need to solve a large system of nonlinear equations, available only through slow and black-box evaluation, and where the equations can be expected to be weakly coupled, Jacobi iteration might be for you 🤞.