滤波方法(六):无迹卡尔曼滤波(UKF)

上一篇介绍了 EKF。它通过一阶泰勒展开(雅可比矩阵)把非线性系统线性化后代入卡尔曼方程。但线性化有两个根本缺陷:强非线性下截断误差大,且要求函数处处可微。Julier 和 Uhlmann 在 1997 年提出了完全不同的思路:不在函数上做近似,而是在分布上做近似。这就是无迹卡尔曼滤波(Unscented Kalman Filter, UKF)。

与 EKF 的对比

EKF 近似的是函数——用切线代替曲线。UKF 近似的是分布——用一组采样点代替高斯分布,让这些点各自通过真实的非线性函数,再用变换后的点集重建新的均值和协方差。这避开了雅可比计算,也保留了更高阶的统计信息。对高斯分布而言,UKF 的精度达到三阶泰勒展开等价,而 EKF 仅为一阶。

核心思想

UKF 在状态的均值和协方差周围选取一组称为西格玛点(sigma points)的确定性采样点,将这些点各自独立地通过非线性函数,然后用变换后的点集重建输出的均值和协方差。对于 $n$ 维状态,共生成 $2n+1$ 个西格玛点。

西格玛点采样

设当前估计为 $\hat{x}$,协方差为 $P$,尺度参数 $\lambda = \alpha^2(n + \kappa) - n$,其中 $\alpha$ 和 $\kappa$ 是控制西格玛点扩散范围的参数。西格玛点定义为:

$$\mathcal{X}_0 = \hat{x}$$

$$\mathcal{X}_i = \hat{x} + \left( \sqrt{(n+\lambda)P} \right)_i, \quad i = 1, \dots, n$$

$$\mathcal{X}_{i+n} = \hat{x} - \left( \sqrt{(n+\lambda)P} \right)_i, \quad i = 1, \dots, n$$

这里 $\left( \sqrt{(n+\lambda)P} \right)_i$ 表示矩阵 $(n+\lambda)P$ 的 Cholesky 分解的第 $i$ 列。

对应的权重为:

$$W_0^{(m)} = \frac{\lambda}{n+\lambda}, \quad W_0^{(c)} = \frac{\lambda}{n+\lambda} + (1 - \alpha^2 + \beta)$$

$$W_i^{(m)} = W_i^{(c)} = \frac{1}{2(n+\lambda)}, \quad i = 1, \dots, 2n$$

其中 $\beta$ 是刻画先验分布高阶矩的参数(对高斯分布取 $\beta = 2$ 最优)。

预测-更新递推

预测步——将西格玛点通过非线性状态转移函数 $f$,加权重建先验估计:

$$\hat{x}{k|k-1} = \sum_i)$$}^{2n} W_i^{(m)} f(\mathcal{X

$$P_{k|k-1} = \sum_{i=0}^{2n} W_i^{(c)} \left[ f(\mathcal{X}i) - \hat{x}} \right] \left[ f(\mathcal{Xi) - \hat{x} \right]^T + Q_k$$

更新步——将先验西格玛点通过非线性观测函数 $h$,加权重建观测预测和交叉协方差:

$$\hat{z}{k|k-1} = \sum_i)$$}^{2n} W_i^{(m)} h(\mathcal{X

$$P_{zz} = \sum_{i=0}^{2n} W_i^{(c)} \left[ h(\mathcal{X}i) - \hat{z}} \right] \left[ h(\mathcal{Xi) - \hat{z} \right]^T + R_k$$

$$P_{xz} = \sum_{i=0}^{2n} W_i^{(c)} \left[ \mathcal{X}i - \hat{x}} \right] \left[ h(\mathcal{Xi) - \hat{z} \right]^T$$

$$K_k = P_{xz} P_{zz}^{-1}$$

$$\hat{x}{k|k} = \hat{x})$$} + K_k (z_k - \hat{z}_{k|k-1

$$P_{k|k} = P_{k|k-1} - K_k P_{zz} K_k^T$$

可视化

下图的 GIF 在同一个非线性追踪场景中对比了 EKF 和 UKF 的表现。UKF 通过西格玛点传播分布,不需要计算雅可比矩阵,在转弯模型下的估计精度明显优于 EKF。

滤波06-图1 UKF与EKF对比:非线性转弯模型追踪

Python 实现

下面的代码在同一个圆周运动追踪场景中对比 EKF 与 UKF。UKF 采用 $2n+1=9$ 个西格玛点(状态维数 $n=4$),通过非线性状态转移函数 $f$ 后加权重建均值和协方差,再通过非线性观测函数 $h$ 重建预测观测与交叉协方差。输出轨迹对比图与误差图(SVG/PNG)、逐步展开的 GIF,以及记录 $k$、true、ekf、ukf、errors 的 CSV。

import os
os.makedirs('images', exist_ok=True)

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import csv

np.random.seed(42)

# ---------- 圆周运动真值 ----------
DT = 0.1
N = 120
OMEGA = 0.3
R_TRUE = 1.0
Q = np.diag([0.01, 0.01, 0.01, 0.01])
R_OBS = np.diag([0.05**2, 0.02**2])
N_STATE = 4

def true_trajectory():
    tr = []
    for k in range(N):
        th = OMEGA * k * DT
        px, py = R_TRUE * np.cos(th), R_TRUE * np.sin(th)
        vx, vy = -R_TRUE * OMEGA * np.sin(th), R_TRUE * OMEGA * np.cos(th)
        tr.append(np.array([px, py, vx, vy]))
    return np.array(tr)

def h_obs(x):
    px, py = x[0], x[1]
    return np.array([np.hypot(px, py), np.arctan2(py, px)])

def H_jac(x):
    # Jacobian matrix of observation function h(x)
    px, py = x[0], x[1]
    r = np.hypot(px, py) + 1e-12
    H = np.zeros((2, 4))
    H[0, 0] = px / r; H[0, 1] = py / r
    H[1, 0] = -py / (r * r); H[1, 1] = px / (r * r)
    return H

def f_ekf(x):
    px, py, vx, vy = x
    c, s = np.cos(OMEGA * DT), np.sin(OMEGA * DT)
    vx_n = c * vx - s * vy; vy_n = s * vx + c * vy
    return np.array([px + vx_n * DT, py + vy_n * DT, vx_n, vy_n])

def F_jac(x):
    # Jacobian matrix of state transition function f(x)
    c, s = np.cos(OMEGA * DT), np.sin(OMEGA * DT)
    return np.array([[1, 0, c * DT, -s * DT],
                     [0, 1, s * DT,  c * DT],
                     [0, 0, c,      -s],
                     [0, 0, s,       c]])

# ---------- EKF ----------
def ekf(zs):
    x = np.array([1.0, 0.0, 0.0, OMEGA * R_TRUE])
    P = np.eye(4) * 0.1
    out = []
    for z in zs:
        F = F_jac(x); x = f_ekf(x)
        P = F @ P @ F.T + Q
        H = H_jac(x); zp = h_obs(x)
        S = H @ P @ H.T + R_OBS
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - zp)
        P = (np.eye(4) - K @ H) @ P
        out.append(x.copy())
    return np.array(out)

# ---------- UKF ----------
ALPHA, BETA, KAPPA = 1e-3, 2.0, 0.0
LAMBDA = ALPHA ** 2 * (N_STATE + KAPPA) - N_STATE

def sigma_points(x, P):
    sqrt = np.linalg.cholesky((N_STATE + LAMBDA) * P)
    pts = [x]
    for i in range(N_STATE):
        pts.append(x + sqrt[:, i])
        pts.append(x - sqrt[:, i])
    return np.array(pts)

def ukf_weights():
    Wm = np.full(2 * N_STATE + 1, 1.0 / (2 * (N_STATE + LAMBDA)))
    Wc = np.full(2 * N_STATE + 1, 1.0 / (2 * (N_STATE + LAMBDA)))
    Wm[0] = LAMBDA / (N_STATE + LAMBDA)
    Wc[0] = LAMBDA / (N_STATE + LAMBDA) + (1 - ALPHA ** 2 + BETA)
    return Wm, Wc

def ukf(zs):
    x = np.array([1.0, 0.0, 0.0, OMEGA * R_TRUE])
    P = np.eye(4) * 0.1
    Wm, Wc = ukf_weights()
    out = []
    for z in zs:
        # predict
        sig = sigma_points(x, P)
        sig_f = np.array([f_ekf(p) for p in sig])
        x = np.sum(Wm[:, None] * sig_f, axis=0)
        P = np.zeros((4, 4))
        for i in range(2 * N_STATE + 1):
            d = (sig_f[i] - x).reshape(-1, 1)
            P += Wc[i] * d @ d.T
        P += Q
        # update
        sig = sigma_points(x, P)
        sig_h = np.array([h_obs(p) for p in sig])
        zp = np.sum(Wm[:, None] * sig_h, axis=0)
        Pzz = np.zeros((2, 2)); Pxz = np.zeros((4, 2))
        for i in range(2 * N_STATE + 1):
            dz = (sig_h[i] - zp).reshape(-1, 1)
            dx = (sig[i] - x).reshape(-1, 1)
            Pzz += Wc[i] * dz @ dz.T
            Pxz += Wc[i] * dx @ dz.T
        Pzz += R_OBS
        K = Pxz @ np.linalg.inv(Pzz)
        x = x + K @ (z - zp)
        P = P - K @ Pzz @ K.T
        out.append(x.copy())
    return np.array(out)

# ---------- 生成观测 ----------
truth = true_trajectory()
zs = np.array([h_obs(truth[k]) + np.random.multivariate_normal(np.zeros(2), R_OBS)
               for k in range(N)])

ekf_traj = ekf(zs)
ukf_traj = ukf(zs)

ekf_err = np.linalg.norm(ekf_traj[:, :2] - truth[:, :2], axis=1)
ukf_err = np.linalg.norm(ukf_traj[:, :2] - truth[:, :2], axis=1)

# ---------- 轨迹图 ----------
fig, ax = plt.subplots(figsize=(6, 6))
ax.plot(truth[:, 0], truth[:, 1], 'k-', label='True')
ax.plot(ekf_traj[:, 0], ekf_traj[:, 1], 'r--', label='EKF')
ax.plot(ukf_traj[:, 0], ukf_traj[:, 1], 'b-.', label='UKF')
ax.set_xlabel('px'); ax.set_ylabel('py')
ax.set_title('UKF vs EKF: Constant Turn Model')
ax.legend(); ax.axis('equal'); ax.grid(True)
fig.tight_layout()
fig.savefig('images/ukf_trajectory.svg'); fig.savefig('images/ukf_trajectory.png')
plt.close(fig)

# ---------- 误差图 ----------
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(ekf_err, 'r-', label='EKF')
ax.plot(ukf_err, 'b-', label='UKF')
ax.set_xlabel('k'); ax.set_ylabel('Position Error')
ax.set_title('Position Error: UKF vs EKF')
ax.legend(); ax.grid(True)
fig.tight_layout()
fig.savefig('images/ukf_error.svg'); fig.savefig('images/ukf_error.png')
plt.close(fig)

# ---------- GIF:EKF 和 UKF 轨迹逐步展开 ----------
fig, ax = plt.subplots(figsize=(6, 6))
ax.set_xlim(-1.5, 1.5); ax.set_ylim(-1.5, 1.5)
ax.set_xlabel('px'); ax.set_ylabel('py')
ax.set_title('UKF vs EKF: Constant Turn Model')
ax.grid(True)
ln_t, = ax.plot([], [], 'k-', label='True')
ln_e, = ax.plot([], [], 'r--', label='EKF')
ln_u, = ax.plot([], [], 'b-.', label='UKF')
ax.legend()
def update(i):
    ln_t.set_data(truth[:i, 0], truth[:i, 1])
    ln_e.set_data(ekf_traj[:i, 0], ekf_traj[:i, 1])
    ln_u.set_data(ukf_traj[:i, 0], ukf_traj[:i, 1])
    return ln_t, ln_e, ln_u
ani = animation.FuncAnimation(fig, update, frames=N, interval=50, blit=True)
ani.save('images/滤波06-图1.gif', writer='pillow')
plt.close(fig)

# ---------- CSV ----------
with open('images/ukf_comparison.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['k', 'true_px', 'true_py', 'ekf_px', 'ekf_py', 'ukf_px', 'ukf_py', 'ekf_err', 'ukf_err'])
    for k in range(N):
        w.writerow([k, truth[k, 0], truth[k, 1], ekf_traj[k, 0], ekf_traj[k, 1],
                    ukf_traj[k, 0], ukf_traj[k, 1], ekf_err[k], ukf_err[k]])

演化与局限

UKF 相比 EKF 是显著进步:不需要计算雅可比矩阵、精度更高、可处理不可微函数。但 EKF 和 UKF 共享一个根本假设——后验分布可以用高斯分布近似表示。当系统的噪声严重非高斯(如多峰分布、重尾分布),或当后验分布本身高度非高斯时,这个假设导致两者都会失效。

要彻底放弃高斯假设,需要用大量随机样本直接表示任意分布——不假设任何参数形式。这就是粒子滤波,下一篇也是最后一篇的主题。

参考文献

[1] arXiv:1903.09247 — The Hitchhiker's Guide to Nonlinear Filtering.

[2] arXiv:2201.08180 — Sequential Bayesian Inference for Uncertain Nonlinear Dynamic Systems: A Tutorial.

Category
Tagcloud
Junck Kalman Tool Science Moving Average Exponential Smoothing UKF Nonlinear Filtering FckZhiHu Wiener Filter Code GIS Probability Code Generation Bayesian Story Architecture Non-Gaussian Nonlinear Hackintosh Hadoop Matplotlib OSX-KVM Optimal Estimation History Lens Cellular Automata Translate Unscented Transform Discuss Sequential Monte Carlo Bayesian Estimation Photo OpenWebUI Time Series Windows AI PVE Computability State Space Algorithm QGIS Kalman Filter Photography CUDA Signal Processing Pyenv RX590 PHD C Radio Communicate Prerequisites Ubuntu Sigma Point Simulation LLM Tools Geology Ollama Visualization GlumPy Programming University Scholar Particle Filter ChromeBook Hack RTL-SDR Kivy Guide Qwen3 Camera Math EKF Turing LlamaFactory AMD Jacobian Book Data Game Ventoy Hardware VirtualMachine Mac Frequency Domain Memory Computer GPT-OSS Learning Linux Filtering ML Python Life NumPy Mathematical Modeling Poem Microscope