Lyapunov Stability, Linear Systems, and Semidefinite Programming

Dynamical systems are ubiquitous models occuring in science, engineering, and mathematics. Not only are they used to model real-world dynamic phenomena like the dynamics of chemical plants, population growth, and physical engineered systems, they can also be applied to model algorithms themselves. This post focuses on linear dynamical systems, their analysis by means of semidefinite programming, and connections with control theory through the computation of quadratic functionals of their paths.

Introduction

This post introduces the Lyapunov approach to stability and convergence analysis. While this approach and its connection to semidefinite programming may seem like “overkill” for linear systems (where you can just look at the eigenvalues of the system matrix), I believe it is an insightful starting point. Analysis based on Lyapunov functions generalizes naturally to both nonlinear and stochastic systems, and the connection with semidefinite programming can be exploited to derive an analysis which may be, in some sense, optimal.

References

Much of the material in this post is derived or inspired from: Boyd, Stephen, Laurent El Ghaoui, Eric Feron, and Venkataramanan Balakrishnan. Linear matrix inequalities in system and control theory. Society for industrial and applied mathematics, 1994. This is an easily accessible book on linear matrix inequalities. I have also drawn some ideas and inspiration from the Convex Optimization book (also freely available ) Boyd, Stephen, Stephen P. Boyd, and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004. and have made use of cvxpy for many some of the examples: Diamond, Steven, and Stephen Boyd. “CVXPY: A Python-embedded modeling language for convex optimization.” The Journal of Machine Learning Research 17, no. 1 (2016): 2909-2913.

Linear Dynamical Systems

Linear dynamical systems are generally characterized by a matrix $A \in \R^{n \times n}$ which describes how the state vector $x \in \R^n$ changes through time. For continuous time dynamical systems, we work with ordinary differential equations

\begin{equation} \frac{\mathsf{d}}{\mathsf{d} t} x(t) = Ax(t), \end{equation}

usually written simply as $\dot{x} = Ax$ where it is to be understood that $x$ is a continuous function of time $t$ and $\dot{x}$ denotes differentiation with respect to $t$. In this case, $A$ describes a velocity field in state space – each point $x \in \R^n$ is associated to some velocity $Ax$. Analogously, for discrete time dynamical systems, we have an iterative relationship

\begin{equation} x(t + 1) = Ax(t), \end{equation}

where in this case the matrix $A$ specifies the next state of the system.

Remark: I like to use $t$ for both discrete and continuous time. The reader is trusted to recognize which is which (or if it matters at all) from the context.

Somewhat more generally, we could work with systems having some constant offset $b \in \R^n$ as in $\dot{x} = Ax + b$. However, if $A$ is a full-rank matrix (meaning that it is invertible), then this additional offset term is not interesting. Indeed, we can just as-well shift the coordinates of the system and instead work with $\dot{z} = Az$ where $z = x - A^{-1} b$. Anything interesting that can be established for $x$ can be done so by first doing it for $z$. If the matrix $A$ is not invertible, then there is a subspace which the matrix $A$ has no effect upon, and this subspace can be analyzed separately. It will become more-or-less obvious how to do this as we proceed, so I will make the standing assumption that $A$ is full-rank and $b = 0$.

Remark: The case of linear systems with a general input $\dot{x}(t) = Ax(t) + Bu(t)$ on the other hand are quite interesting, and is likely to be the topic of some future post.

Examples

Firstly, it should be understood that much of the analysis of nonlinear dynamical systems, i.e., ordinary differential equations of the form $\dot{x} = f(x)$, comes down to the linear case. The reason lies in the Hartman-Grobman Theorem (which we have also already seen ): the nonlinear system $\dot{x} = f(x)$ can be approximated by the linear dynamical system $\dot{x} = Ax$ for in a neighbourhood of an equilibrium point (i.e., point $x_e$ such that $f(x_e) = 0$) with the matrix $A = \mathsf{D} f(x_e)$, the derivative of $f$. More examples follow.

Gradient Descent

One of the modern drivers of interest in systems theory is for applications in machine learning, or more specifically, optimization algorithms. One of the most famous classical algorithms for optimization is gradient descent: given a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ that we want to minimize, the gradient descent algorithm proceeds through the iterates \[x(t + 1) = x(t) - \alpha \nabla f(x(t)),\] where $\nabla f(x) = (\mathsf{D} f(x))^{\mathsf{T}}$ is the gradient and $\alpha > 0$ is a step-size parameter. In general, this is a nonlinear system. However, there is a particularly important special case when $\nabla f(x)$ is a linear function of $x$: when $f(x) = \frac{1}{2}x^{\mathsf{T}} Q x$ is a quadratic function. In this case, $\nabla f(x) = Qx$ and gradient descent \[x(t + 1) = (I - \alpha Q)x(t)\] is a linear system with $A = I - α Q.$

Electronic Circuits

Another interesting example comes from electrical engineering. In the design of electronic circuits, there are devices called capacitors (basically two metal plates sandwiched next to each other) which can store a small electric charge and then subsequently release it. Similarly, inductors (coils of wire) store magnetic energy, and later release it.

Using these devices, along with simple resistors, we can construct an RLC circuit – a circuit consisting of resitors, inductors, and capacitors (the ‘L’ for inductors coming from the scientist Heinrich Lenz). One of the simplest such circuits is to just place these three elements in a loop with each other.

Using Kirchoff’s voltage law , which says that the total voltage differences around a loop must sum to zero, we can obtain the equation $V_R(t) + V_L(t) + V_C(t)$ where $V_R, V_L, V_C$ are the voltages across the resistor, inductor, and capacitor at time $t$, respectively.

The needed facts for this analysis are the the current-voltage relationships: $V_R(t) = RI_R(t),$ $V_L(t) = L\dot{I}_L(t)$, and $V_C(t) = \frac{1}{C}\int_0^t I_C(\tau)\mathsf{d}\tau$ with $I_R, I_L, I_C$ being the currents passing through the devices, and $R, L, C$ denoting the “size” of the devices in units of Ohms (resistance), Henries (inductance), and Farads (capacitance). These are nothing but functions which describe how the devices operate. The other key observation is that, since these devices are all placed in series one after the other along the same wire: $I_R = I_L = I_C$! Thus, we have obtained the integro-differential equation:

\begin{equation} \begin{aligned} RI(t) + L\dot{I}(t) + \frac{1}{C}\int_0^t I(\tau) \mathsf{d} \tau &= 0\\ \implies R\dot{I}(t) + L\ddot{I}(t) + \frac{1}{C}I(t) &= 0, \end{aligned}\notag \end{equation}

which by differentiating through the whole thing results in a second order ODE. We now intend to play a clever trick with this equation – a common source of linear systems. We think of the state variables as being $x(t) = \bigl(\dot{I}(t), I(t)\bigr)$, and by using the relationship between these derivatives described above we obtain:

\begin{equation} \begin{bmatrix} \ddot{I}\\ \dot{I} \end{bmatrix}= \begin{bmatrix} -\frac{R}{L} & -\frac{1}{CL}\\ 1 & 0\\ \end{bmatrix} \begin{bmatrix} \dot{I}\\ I \end{bmatrix}, \notag \end{equation}

which is a linear dynamical system. Simulation of this system is straightforward.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


from scipy.integrate import solve_ivp

def simulate(t, R, C, L, I0=1.0, I_dot0=0.0):
    A = np.array([[0, 1.0], [-1.0 / (C * L), -R / L]])

    def ode(I):
        return A @ I

    res = solve_ivp(
        fun=lambda t, y: ode(y),
        t_span=(t[0], t[-1]),
        y0=np.array([I0, I_dot0]),
        t_eval=t
    )
    return (res.y[0, :], res.y[1, :])

The reader unfamiliar with this trick is encouraged to study this example, as the technique is very commonly used. The behaviour of the system can now be understood by analyzing the dynamics of this ODE given some set of initial conditions $I(0), \dot{I}(0)$.

Semidefinite Programming

In convex optimization, a semidefinite program is an optimization problem involving a matrix variable $X$, a linear cost function $J(x) = \mathsf{tr}\ C^{\mathsf{T}} X = \sum_{i, j} C_{ij} X_{ij}$, a linear equality constraint $\mathcal{A}(X) = 0$, and a constraint $X \succeq 0$, which means that $X$ must be a positive semidefinite matrix. If you are not familiar with what this is, it is a matrix which is (1) symmetric and (2) has non-negative eigenvalues. A generic SDP is as follows:

\begin{equation} \begin{aligned} \underset{X \in \R^{n \times n}}{\text{minimize}}&\quad \mathsf{tr}\ C^{\mathsf{T}} X\\ \text{subject to} &\quad \mathcal{A}(X) = 0\\ &\quad X \succeq 0. \end{aligned}\notag \end{equation}

The linear function $\mathcal{A}(X) = 0$ can be any linear function of $X$ – examples include $AX = 0$, $AXA^{\mathsf{T}} - X + Q = 0$, etc. Of course, much more general SDPs are possible, but this will be adequate for our purposes.

Practically speaking, solving (relatively small: $n < 100$) SDPs is fairly straightforward. Here is a basic CVX Program to solve an SDP we will encounter shortly:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


from math import inf
import scipy.linalg as la
import cvxpy as cvx


def solve_lyapunov_sdp(A, g, verbose=True):
    """Solves the feasibility problem P > 0, (1 - g) * P - A' P A > 0."""
    n, _ = A.shape
    P = cvx.Variable(shape=(n, n), symmetric=True)
    I = np.eye(n)

    objective = cvx.Minimize(0)
    constraints = [
        P >> I,  # P is positive semi-definite
        (1 - g) * P - A.T @ P @ A >> I
        ]
    problem = cvx.Problem(objective, constraints=constraints)
    problem.solve(solver="CVXOPT", verbose=verbose)
    return problem.value, P.value

Given a matrix $A$, and a constant $\gamma$, this program will solve a semidefinite feasibility program to find a matrix $P \succeq I$ such that $(1 - \gamma) P - A^{\mathsf{T}} P A \succeq I$. This is actually just a computational means of solving a special type of feasability problem called a linear matrix inequality (LMI). Usually, LMIs are strict inequalities where it is required to find $P \succ 0$ and $(1 - \gamma) P - A^{\mathsf{T}} P A \succ 0$, and any such $P$ will do. However, since optimization problems are not usually well defined when strict inequalities are involved (since the feasible region is then not closed, c.f., Weierstrass’ Theorem ) it is necessary to modify the LMI to involve non-strict inequalities. Doing this in a naive way, simply replacing $\succ$ by $\succeq$, may result in a trivial output $P = 0$. However, since the LMI is linear in $P$, we can just scale the inequality and ask that it be $\succeq I$ – indeed, if any $P \succ 0$ exists which satisfies the inequality, there must also exist one satisfying $P \succeq I$.

The meaning of this LMI will be explained next.

Lyapunov Stability Theory

The most important question to answer when it comes to linear systems is stability. The term stability is used to invoke the idea of a system which, when slightly perturbed, returns back to its nominal position. However, it might be somewhat of a misnomer, as what is often really meant is convergence: given some starting point $x(0)$, does the linear system satisfy $x(t) \rightarrow 0\ \text{as}\ t \rightarrow \infty$. There are many methods of determining this, the most natural being an analysis of the eigenvalues of the system matrix $A$. Analyzing the eigenvalues however, is a method which is quite particular to the case of simple linear systems. Another method, which has a much greater deal of potential for generalization, is the method of analysis by means of Lyapunov Stability Theory.

Lyapunov Stability

One of the most important techniques for establishing system stability is Lyapunov’s direct method – I’ll explain the idea for discrete time systems, although much of the literature focuses on the continuous time case.

The idea is to find a so-called Lyapunov function $V: \R^n \rightarrow \R$ on the state space which captures some generalized notion of “kinetic energy” of the system. A Lyapunov function must satisfy a few properties. The first of which is that $V(0) = 0,$ but $V(x) > 0$ everwhere else (i.e., everywhere that $x \ne 0$). I will call this property positivity, and it is this property that makes a Lyapunov function analogous to a generalized notion of kinetic energy: it is positive except at an equilibrium point when the system stops moving. The next key property is that $V$ must be constructed such that its value decreases along the path traced by the system. That is, if it can be shown that $V(x(t + 1)) < V(x(t))$ for the system $x(t + 1) = Ax(t)$, with $x(0) = x_0 \in \R^n$ being an arbitrary initial condition, then the system can be expected to converge towards $0$.

These two properties (positivity and decreasing along trajectories) are not the whole story though. The additional technical property of coerciveness is required: $||x|| \rightarrow \infty \implies V(x) \rightarrow \infty$. This appears like an esoteric technical requirement, but it is essential. If $V$ is not coercive, then the level sets of $V$, the sets $L_\alpha = \{x \in \R^n\ |\ V(x) \le \alpha\}$ may be unbounded, and the trajectory $x(t)$ can drift off $x(t) \rightarrow \infty$ while $V(x(t))$ is stil decreasing.

In the linear case, excellent Lyapunov function candidates (i.e., functions $V$ which you hope will serve as Lyapunov functions) are quadratic forms $V(x) = x^{\mathsf{T}} P x$, for some matrix $P \in \R^{n \times n}$. It is quite easy to determine when these functions are positive – this is the case exactly when $P \succ 0$ (the matrix $P$ is positive-definite), and moreover, $V(x)$ will be coercive when $P \succ 0$ (the matrix $P$ is positive definite). Thus, it makes sense to refer to positive coercive functions $V$ as being positive definite; we can refer to a Lyapunov function for a system as being a positive definite function which is decreasing along trajectories of the system.

So, in the linear case we have the candidate function $V(x) = x^{\mathsf{T}} P x$ and the requirement $P > 0$. All that remains is to figure out if $V(x(t + 1)) < V(x(t))$ along trajectories of the system. This can be established by verifying $V(Ax) < V(x)$ (why?), that is:

\begin{equation} \begin{aligned} \forall x \in \R^n:\ V(Ax) - V(x) < 0 &\iff \forall x \in \R^n:\ x^{\mathsf{T}}A^{\mathsf{T}}PAx - x^{\mathsf{T}}Px < 0\\ &\iff \forall x \in \R^n:\ x^{\mathsf{T}}\bigl(A^{\mathsf{T}} P A - P\bigr)x < 0\\ &\iff A^{\mathsf{T}} P A - P \prec 0. \end{aligned}\notag \end{equation}

If we are able to simply find some matrix $P \succ 0$ such that $V(x)$ is a Lyapunov function, then we have established the convergence of our system. The astute reader will notice that this is a linear matrix inequality and can be solved by means of the above semidefinite feasibility program (with $\gamma = 0$).

Remark: Lyapunov stability is a fundamental tool for establishing convergence in nonlinear systems. One of the most elegant examples is that of the gradient flow $\dot{x} = -\nabla f(x)$ for a strongly convex function $f$ (making this assumption for simplicity). In this case, the function $f$ itself can serve as a Lyapunov function: That is, $\frac{\mathsf{d}}{\mathsf{d}t} f(x) = \nabla f(x)^{\mathsf{T}} \dot{x} = -||\nabla f(x)||_2^2 < 0$ except at $x = x^\star$, the minimizer where $\nabla f(x^\star) = 0$. Incidentally, this also tells us that the gradient flow monotonically decreases the function $f$. Convergence proofs of many algorithms come down to constructing an appropriate Lyapunov function.

Example

Consider a simple example with

\begin{equation} A = \begin{bmatrix} 0.750 & 1.00\\ -0.667 & 0.111 \end{bmatrix}. \notag \end{equation}

Notice that $A$ does not have any particularly simple properties (triangularity, diagonal dominance…) that would allow us to determine at a glance (unless we are rather speedy when it comes to calculating eigenvalues of $2 \times 2$ matrices!) that the system $x(t + 1) = Ax(t)$ is stable. However, we can plug this matrix into our trusty semidefinite program and obtain a matrix

\begin{equation} P = \begin{bmatrix} 8.419 & 3.636\\ 3.636 & 12.109 \end{bmatrix}, \notag \end{equation}

which proves stability by Lyapunov’s method.

Figure 3: Demonstrating that (V(x(t))) decreases

I’ve plotted above a figure showing the value of various functions of the state $x(t)$ for the example. The main observation is that while they all are roughly “decreasing”, it is only the Lyapunov function which is decreasing monotonically. That is not to say that this Lyapunov function is the only function with this property – there are many Lyapunov functions. More subtly, it may also be possible to find starting points such that some other arbitrary function decreases monotonically towards zero from that particular system initialization, but a Lyapunov function will do so from any starting point.

Fast Convergence

Can we modify Lyapunov’s method to somehow establish fast convergence? The answer is yes. Consider the stronger condition:

\begin{equation} \begin{aligned} V(Ax) - V(x) &< -\gamma V(x)\\ \implies V(Ax) &< (1 - \gamma)V(x)\\ \implies V(A^2 x) &< (1 - \gamma)V(Ax) < (1 - \gamma)^2 V(x)\\ &\vdots\\ \implies V(A^T x) &< (1 - \gamma)^T V(x). \end{aligned}\notag \end{equation}

which is establishing a rate of convergence on the Lyapunov function – after $T$ iterations of the system, the value must have decreased by a factor of $(1 - \gamma)^T$. It is quite convenient to recognize that the existence of a quadratic Lyapunov function $V_\gamma$ which verifies this fast convergence is equivalent to finding an ordinary quadratic Lyapunov function for the modified system with $A_\gamma = \frac{1}{\sqrt{1 - \gamma}} A$.

How do we incorporate the search for some such $\gamma$ into our SDP? All we need is that $A^{\mathsf{T}}PA - (1 - \gamma)P \prec 0$, and indeed, we would like to find the largest satisfactory $\gamma$. Unfortunately, I do not believe it is possible to directly encode optimization over $\gamma$ into a convex SDP, since the product $γ P $, where both $\gamma$ and $P$ are variables, is non-convex. Instead, since $\gamma \in \R$ is just a single parameter, we can apply bisection to find an optimal $\gamma$:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


from math import inf

def is_feasible(A, g, verbose=True):
    """Checks if P > 0, (1 - g) * P - A' P A > 0 is feasible."""
    val, _ = solve_lyapunov_sdp(A, g, verbose=verbose)
    return val < inf

def find_optimal_convergence_rate(A, num_iter=20):
    right = 1.0
    left = 0.0
    for it in range(num_iter):
        g = left + (right - left) / 2
        if is_feasible(A, g, verbose=False):
            left = g
        else:
            right = g
    return g

By solving the feasability SDP multiple times, we obtain a value of $\gamma$ such that $V(A^T x) < (1 - \gamma)^T V(x)$, establishing a fast convergence rate for the system. It should be noted that specialized algorithms to solve the Lyapunov inequality $A^{\mathsf{T}} P A - P \prec 0$, but doing so by semidefinite programming opens up a more general array of possibilities, which I turn to next.

Remark: It is possible to represent this optimization problem as a joint optimization problem over $P, \gamma$, but not as a convex optimization problem. However, the resulting problem is quasiconvex.

Stability of Uncertain Linear Systems

So far, I’ve talked about linear systems with a single and perfectly well known matrix $A$. However, in many applications it may be the case that the matrix $A$ is not known exactly, or that it is only an approximation of a real system. For this reason, we would like to establish some degree of robust stability guarantee. To this end, let us suppose that we have some uncertainty set $\mathbf{A} \subset \R^{n \times n}$ and that our system matrix $A \in \mathbf{A} \subset \R^{n \times n}$. If this is the case, then we can expect our linear system to satisfy a linear recursive inclusion:

\begin{equation} x(t + 1) \in \mathbf{A} x(t) \overset{\Delta}{=} \{A x(t)\ |\ A \in \mathbf{A}\}. \notag \end{equation}

That is, the next point $x(t + 1)$ is obtained from $x(t)$ by simple multiplication by a matrix $A$, we just don’t know which matrix $A$; all we know is that $A \in \mathbf{A}$. Amazingly, establishing the stability of this system can still be done by looking for a Lyapunov function $V(x) = x^{\mathsf{T}} P x$ with $P \succ 0$ and such that $\forall x \in \R^n, A \in \mathbf{A}:\ V(A x) - V(x) < 0$ and this is equivalent to the collection of LMIs: \[P \succ 0, \forall A \in \mathbf{A}:\ A^{\mathsf{T}} P A - P \prec 0.\]

Of course, not all such LMIs can be solved – it depends on the model of $\mathbf{A}$. A quite flexible model is furnished by polytopic sets: $\mathbf{A} = \mathsf{conv}\{ A_k\ |\ k \in [K] \}$, that is, the convex hull of a finite number of matrices $A_k$, i.e., for any $A \in \mathbf{A}$ there exists some $\mu \in [0, 1]$ and matrices $A_i, A_j$ such that $A = \mu A_i + (1 - \mu) A_j$. These sets can be used to construct arbitrarily accurate approximations of any convex set $\mathbf{A}$.

Then, using the fact that $V(x)$ is convex if $P \succ 0$, we have $V(\mu A_i x + (1 - \mu) A_j x) \le \mu V(A_i x) + (1 - \mu) V(A_j x)$ and therefore a sufficient condition for the stability of the recursive inclusion is to find some $P \succ 0$ which satisfies the system of $K$ LMIs: \[P \succ 0, \forall k \in [K]:\ A_k^{\mathsf{T}} P A_k - P \prec 0.\]

Speaking intuitively, if such a $P$ exists for some uncertain linear system, we can make a conclusion about the robust stability of the system.

Example – Gain Margins and Momentum Gradient Descent

One of the main motivations for considering uncertain linear systems arises from control theory. A very common situation is that we have some system matrix $A$, as well as a gain matrix $K$ which we have designed. The gain matrix $K$ is intended to be used to steer the system $x(t + 1) = A x(t)$ towards zero by means of a feedback control $x(t + 1) = Ax(t) - Kx(t)$. Importantly, the system $A$ may very well be unstable, in which case the matrix $K$ is absolutely essential.

Suppose that we have some $K$ in hand which makes the controlled system $x(t + 1) = (A - K)x(t)$ stable. The idea of a gain margin is to determine if for some interval $[\alpha_1, \alpha_2]$ that the modified system $x(t + 1) = (A - \alpha K) x(t)$ remains stable, for any $\alpha \in [\alpha_1, \alpha_2]$. If this interval is wide, then it may give us some assurance that the matrix $K$ is robust, in some sense. The collection of system matrices \[\mathbf{A} = \{A - \alpha K\ |\ \alpha_1 \le \alpha \le \alpha_2\}\] forms a natural polytopic set.

Gradient descent constitutes a perfectly good example of this situation. For a quadratic function $f(x) = \frac{1}{2}x^{\mathsf{T}} Q x$ we have non-stable natural dynamics $x(t + 1) = x(t)$, a controller $Q$, and the stepsize $\alpha$ fills in for the gain margin, giving us $x(t + 1) = (I - \alpha Q)x(t)$. To make this a bit more interesting, we can augment this system with a momentum term by introducing a velocity $v(t)$:

\begin{equation} \begin{aligned} v(t + 1) &= \gamma v(t) + \alpha Q x(t)\\ x(t + 1) &= x(t) - v(t + 1). \end{aligned}\notag \end{equation}

These equations are known as gradient descent with momentum . The idea is that if we are heading in some “nominal” direction $v(t)$, we might as well keep going in that direction, and just add the gradient step $\alpha Qx(t)$ to our velocity. A slight modification of this momentum scheme, Nesterov’s momentum, is known to be optimal in a certain sense.

In any case, the ordinary Momentum gradient descent algorithm corresponds to the (block) system matrix

\begin{equation} A_{\gamma, \alpha} = \begin{bmatrix} \gamma I & \alpha Q\\ -\gamma I & I - \alpha Q \end{bmatrix}\notag \end{equation}

for the joint system $\bigl(v(t), x(t)\bigr)$.

Now, if you can construct a Lyapunov function (by solving a semidefinite program) for $\gamma, \alpha \in \{\gamma_1, \gamma_2\} \times \{\alpha_1, \alpha_2\}$ (i.e., all four corners of a square) then you will have proof that the algorithm will converge for every pair $\gamma, \alpha \in [\gamma_1, \gamma_2] \times [\alpha_1, \alpha_2]$.

Remark: This example is quite simplistic, but it is illustrative of general and powerful techniques . Specifically, if you can construct and solve an appropriate SDP analytically, then it can serve as a constructive proof of convergence. As well, SDPs can potentially be used to find optimal parameters for algorithms by similar means as were used earlier to calculate convergence rates by means of bisection search.

Quadratic Integrals

As a final illustration of the power of the Lyapunov approach, let’s consider how to evaluate infinite quadratic integrals (or summations, in the discrete case). Say we have a linear system $x(t + 1) = Ax(t)$, and are interested in computing the value of a of the path of this system according to the quadratic functional

\begin{equation} \begin{aligned} J_Q(x_0) = \sum_{t = 0}^\infty x(t)^{\mathsf{T}} Q x(t); x(0) = x_0, \end{aligned}\notag \end{equation}

where $Q \succ 0$ is some arbitrary positive definite matrix. Such functions arise often in control theory.

To see the connection with Lyapunov theory, recognize that we can fully expand out the summation in terms of $x_0$ by using the fact that $x(t) = A^t x_0$:

\begin{equation} \begin{aligned} J_Q(x_0) &= \sum_{t = 0}^\infty \bigl(A^t x_0 \bigr)^{\mathsf{T}} Q \bigl(A^t x_0)\\ &= x_0^{\mathsf{T}} \Bigl( \sum_{t = 0}^\infty (A^t)^{\mathsf{T}} Q A^t \Bigr)x_0\\ &= x_0^{\mathsf{T}} Q x_0 + \Bigl( \sum_{t = 1}^\infty (A^t)^{\mathsf{T}} Q A^t \Bigr)x_0\\ &= x_0^{\mathsf{T}} \Bigl(Q + \sum_{t = 1}^\infty (A^t)^{\mathsf{T}} Q A^t \Bigr)x_0\\ &= x_0^{\mathsf{T}} \Bigl(Q + A^{\mathsf{T}} \bigl[\sum_{t = 0}^\infty (A^t)^{\mathsf{T}} Q A^t \bigr]A \Bigr)x_0\\ &= x_0^{\mathsf{T}} Q x_0 + J(Ax_0). \end{aligned}\notag \end{equation}

Thus, since $J$ is a quadratic function of $x_0$, there must be some $P$ such that $J(x_0) = x_0^{\mathsf{T}} P x_0$ and therefore $J(x)$ satisfies, for any starting point $x \in \R^n$:

\begin{equation} \begin{aligned} J(x) &= x^{\mathsf{T}} P x\\ &= x^{\mathsf{T}}[Q + A^{\mathsf{T}} P A] x, \end{aligned}\notag \end{equation}

or in other words, $P - A^{\mathsf{T}} P A = Q$. What this tells us is that if we have a matrix $P \succ 0$ verifying the Lyapunov inequality $P - A^{\mathsf{T}} P A \succ 0$, then not only do we have a Lyapunov function, but there is some positive definite matrix $Q = P - A^{\mathsf{T}} P A$ such that $x^\mathsf{T} P x$ is the value of the quadratic functional $J_Q(x)$. Conversely, if we can solve the Lyapunov equation $P - A^{\mathsf{T}} P A = Q$ for some $P \succ 0$, then we have a Lyapunov function, as well as a function for easily evaluating the value of $J_Q(x)$, namely, $J_Q(x) = x^{\mathsf{T}} P x$.

I intend to explore the ideas surrounding quadratic functionals in more detail in a future post. These functions can be used to tie together semidefinite programming duality, stochastic linear systems (with random disturbances), as well as the classical methods of stochastic optimal control: the Kalman filter and the linear quadratic regulator.

Remark: Lyapunov’s equation $P - A^{\mathsf{T}} P A = Q$ is a linear equation in $P$ and can be solved efficiently. In Python, the function scipy.linalg.solve_discrete_lyapunov can be used for this purpose. Lyapunov equations arise in a number of different ways, so it is a useful pattern to have in mind.

Conclusion

My purpose in writing this has been to explore the Lyapunov approach to analyzing the stability of linear dynamical systems. This is only one of many possible approaches to this analysis, but it also generalizes (in various directions) much more easily than does a direct analysis of the eigenvalues of the system matrix. Indeed, the Lyapunov approach can also lead to global convergence theorems for nonlinear systems, whereas the Hartman-Grobman theorem combined with an eigenvalue analysis can only lead to a local convergence result. Moreover, the semidefinite programming perspective on this problem can also generalize greatly, and opens the door to constructing optimal algorithms. The tradeoff with the Lyapunov approach is that constructing an appropriate Lyapunov function can be extremely difficult, and there is no ready-made recipe for doing so – it often comes down to your creativity!