Nonlinear Trajectory Optimization - Lecture 3
Getting started
Review
- Second order methods are important: We should always do this.
- Problem structure gave us a way to solve optimal control problems efficiently, with "shooting" or "sparsity".
- Direct optimal control is best if we aren't worried about real-time deployment.
- Recall the Newton's method with equality constraints, and the KKT system
Goals
- We can go in two directions:
- second-order indirect (iLQR, DDP)
- second-order direct (Direct Collocation, Multiple Shooting).
- We will focus on second-order direct.
I. Nonlinear Programming
- A nonlinear program is a cost, equality constraint, and inequality constraint,
\[\begin{align} \min_z &\quad f(z) \\ \text{s.t.}&\quad c(z) = 0 \\ &\quad d(z) \le 0 \end{align}\]
- Grab an off-the-shelf solver: IPOPT (free), SNOPT, KNITRO (commercial)
SQP (Sequential Quadratic Programming)
- Take a second order Taylor expansion of the cost and linearize the constraints (locally about a guess):
\[\begin{align} \min_z &\quad J(z) + g \Delta z + \tfrac{1}{2} \Delta z^T H \Delta z \\ \text{s.t.}&\quad c(z) + C \Delta z = 0 \\ &\quad d(z) + D \Delta z \le 0 \end{align}\]
- Lagrangian, $\mathcal{L}(z, \lambda, \mu) = J(z) + \lambda^T c(z) + \mu^T d(x)$
- Gradient: $g = \frac{\partial \mathcal{L}}{\partial z}$, Hessian: $H = \frac{\partial^2 \mathcal{L}}{\partial z^2}$, Jacobians: $C = \frac{\partial c}{\partial z}$, $D = \frac{\partial d}{\partial z}$
- Solve a QP to compute the primal-dual search direction,
\[\Delta s = \begin{bmatrix} \Delta z \\ \Delta \lambda \\ \Delta \mu \end{bmatrix}\]
Comments
- Play lots of tricks to leverage sparsity.
\[\text{SQP} \subset \text{SCP}\]
: Sequential Convex Programming puts constraints directly into the solver, without linearization.- We don't actually need rollouts: Direct Collocation maximally leverages this constraint structure.
Splines
- Cubic splines,
\[\begin{align} x(t) &= a_0 + a_1 t + a_2 t^2 + a_3 t^3 \\ \dot{x}(t) &= \phantom{a_0 + } a_1 + 2 a_2 t + 3 a_3 t^2 \end{align}\]
- Hermite splines use the left and right endpoint ($x_n = x(t)$, $x_{n+1}=x(t + h)$), and their derivatives,
\[\begin{align} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & h & h^2 & h^3 \\ 0 & 1 & 2h & 3 h^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} &= \begin{bmatrix} x_n \\ \dot{x}_n \\ x_{n+1} \\ \dot{x}_{n+1} \end{bmatrix} \\ \Rightarrow \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -\tfrac{3}{h^2} & -\tfrac{2}{h} & \tfrac{3}{h^2} & -\tfrac{1}{h} \\ \tfrac{2}{h^3} & \tfrac{1}{h^2} & -\tfrac{2}{h^3} & \tfrac{1}{h^2} \end{bmatrix} \begin{bmatrix} x_n \\ \dot{x}_n \\ x_{n+1} \\ \dot{x}_{n+1} \end{bmatrix} &= \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} \end{align}\]
- The collocation point enforces the dynamics constraint using the value at some other point (the spline required free variables!),
\[\begin{align} x(t + \tfrac{h}{2}) &= \tfrac{1}{2}(x_n + x_{n+1}) + \tfrac{h}{8} (\dot{x}_n - \dot{x}_{n+1}) \\ &= \tfrac{1}{2}(x_n + x_{n+1}) + \tfrac{h}{8} (f(x_n, u_n) - f(x_{n+1}, u_{n+1}))\\ \, \\ u(t + \tfrac{h}{2}) &= \tfrac{1}{2} (u_n + u_{n+1}) \\ \, \\ \dot{x}(t + \tfrac{h}{2}) &= -\tfrac{3}{2h}(x_n - x_{n+1}) - \tfrac{1}{4} (\dot{x}_n + \dot{x}_{n+1}) \\ &= -\tfrac{3}{2h}(x_n - x_{n+1}) - \tfrac{1}{4} (f(x_n, u_n) + f(x_{n+1}, u_{n+1})) \end{align}\]
- Putting the dynamics into the constraint,
\[\begin{align} C_n(z) &= C_n(x_n, u_n, x_{n+1}, u_{n+1}) \\ &= f(x_{n + 1/2}, u_{n + 1/2}) - \dot{x}_{n + 1/2} \\ &= f\left( \tfrac{1}{2}(x_n + x_{n+1}) + \tfrac{h}{8} (f(x_n, u_n) - f(x_{n+1}, u_{n+1})), \tfrac{1}{2} (u_n + u_{n+1}) \right) \\&\quad- \left(-\tfrac{3}{2h}(x_n - x_{n+1}) - \tfrac{1}{4} (f(x_n, u_n) + f(x_{n+1}, u_{n+1})) \right) \\&\quad \overset{!}{=} 0 \end{align}\]
Achieves 3rd order accuracy (RK4 is Runge-Kutta 4th Order).
Requires fewer $f$ calls than RK3. Exercise: How?
II. IPOPT
From the IPOPT documentation (https://coin-or.github.io/Ipopt/OUTPUT.html):
inf_pr: The unscaled constraint violation at the current point. This quantity is the infinity-norm (max) of the (unscaled) constraints ($g_L≤g(x)≤g_U$ in (NLP)). During the restoration phase, this value remains the constraint violation of the original problem at the current point. The option infproutput can be used to switch to the printing of a different quantity.
inf_du: The scaled dual infeasibility at the current point. This quantity measure the infinity-norm (max) of the internal dual infeasibility, Eq. (4a) in the implementation paper [12], including inequality constraints reformulated using slack variables and problem scaling. During the restoration phase, this is the value of the dual infeasibility for the restoration phase problem.
lg(mu): log10 of the value of the barrier parameter μ.
||d||: The infinity norm (max) of the primal step (for the original variables x and the internal slack variables s). During the restoration phase, this value includes the values of additional variables, p and n (see Eq. (30) in [12]).
lg(rg): log10 of the value of the regularization term for the Hessian of the Lagrangian in the augmented system (δw in Eq. (26) and Section 3.1 in [12]). A dash ("-") indicates that no regularization was done.
alpha_du: The stepsize for the dual variables (αzk in Eq. (14c) in [12]).