Commit 0b38856

MNT fix same y_init used for different sessions (#37)

* MNT fix same y_init used for different sessions
* DOC add example of msa NARX
1 parent 711673e commit 0b38856

10 files changed: +1082 -1182 lines

doc/multioutput.rst (+2 -2)

@@ -19,7 +19,7 @@ multioutput data may result in better accuracy in the following classification t
 than applying it directly to the original single-label data. See Figure 5 in [2]_.

 Relationship on multiclass data
--------------------------------
+===============================
 Assume the feature matrix is :math:`X \in \mathbb{R}^{N\times n}`, the multiclass
 target vector is :math:`y \in \mathbb{R}^{N\times 1}`, and the one-hot encoded target
 matrix is :math:`Y \in \mathbb{R}^{N\times m}`. Then, the Fisher's criterion for
@@ -54,4 +54,4 @@ coefficient is one-to-one correspondence to each Fisher's criterion.
 .. rubric:: Examples

 * See :ref:`sphx_glr_auto_examples_plot_fisher.py` for an example of
-  the equivalent relationship between CCA and LDA on multiclass data.
+  the equivalent relationship between CCA and LDA on multiclass data.

doc/narx.rst (new file, +142)

.. currentmodule:: fastcan.narx

.. _narx:

=======================================================
Reduced polynomial NARX model for system identification
=======================================================

:class:`NARX` is a class to build a reduced polynomial NARX (Nonlinear AutoRegressive eXogenous) model for system identification.
The structure of a polynomial NARX model is shown below:

.. math::
    y(k) = 0.5y(k-1) + 0.3u_0(k)^2 + 2u_0(k-1)u_0(k-3) + 1.5u_0(k-2)u_1(k-3) + 1

where :math:`k` is the time index, :math:`u_0` and :math:`u_1` are input signals, and :math:`y` is the output signal.

The NARX model is composed of three parts (see the sketch after the list):

#. Terms: :math:`y(k-1)`, :math:`u_0(k)^2`, :math:`u_0(k-1)u_0(k-3)`, and :math:`u_0(k-2)u_1(k-3)`
#. Coefficients: 0.5, 0.3, 2, and 1.5
#. Intercept: 1
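
A minimal sketch of how the three parts combine, assuming made-up input data (plain NumPy, not the :class:`NARX` estimator):

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    u0, u1 = rng.normal(size=n), rng.normal(size=n)  # two input signals
    y = np.zeros(n)
    for k in range(3, n):  # start after the longest delay (3)
        y[k] = (
            0.5 * y[k - 1]                 # term y(k-1), coefficient 0.5
            + 0.3 * u0[k] ** 2             # term u0(k)^2, coefficient 0.3
            + 2 * u0[k - 1] * u0[k - 3]    # cross-delay term, coefficient 2
            + 1.5 * u0[k - 2] * u1[k - 3]  # interaction term, coefficient 1.5
            + 1                            # intercept
        )
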
To build a reduced polynomial NARX model, we normally need the following steps (a usage sketch follows the list):

#. Build the term library:

   #. get time-shifted variables, e.g., :math:`u(k-2)` and :math:`y(k-1)`
   #. get terms by nonlinearising the time-shifted variables with a polynomial basis, e.g., :math:`u_0(k-1)u_0(k-3)` and :math:`u_0(k-2)u_1(k-3)`

#. Search for useful terms by feature selection, i.e., `Reduced Order Modelling (ROM) <https://www.mathworks.com/discovery/reduced-order-modeling.html>`_
#. Learn the coefficients by least squares
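
A sketch of this workflow, based on the call signature used in the bundled examples (`u` and `y` are assumed measurement arrays):

.. code-block:: python

    from fastcan.narx import make_narx

    narx_model = make_narx(
        X=u,                      # input time series, shape (n_samples, n_inputs)
        y=y,                      # output time series
        n_features_to_select=10,  # number of terms kept by the feature selection
        max_delay=3,              # largest time shift used in the term library
        poly_degree=3,            # polynomial degree of the term library
    )
    narx_model.fit(u, y)          # learn the coefficients by least squares
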
.. rubric:: Examples

* See :ref:`sphx_glr_auto_examples_plot_narx.py` for an example of the basic usage of the NARX model.

Multiple time series and missing values
=======================================

For time series modelling, the continuity of the data is important:
continuity means that all neighbouring samples have the same time interval.
A discontinuous time series can be treated as multiple pieces of continuous time series.
The discontinuity can be caused by

#. Different measurement sessions, e.g., one measurement session is carried out on Monday and
   lasts for one hour at 1 Hz, and another is carried out on Tuesday and lasts for two hours at 1 Hz
#. Missing values

No matter how the discontinuity is caused, :class:`NARX` treats the discontinuous time series as multiple pieces of continuous time series in the same way.

.. note::
    It is easy to inform :class:`NARX` that the time series data come from different measurement sessions by inserting np.nan to break the data. For example,

    >>> import numpy as np
    >>> x0 = np.zeros((3, 2)) # First measurement session has 3 samples with 2 features
    >>> x1 = np.ones((5, 2)) # Second measurement session has 5 samples with 2 features
    >>> u = np.r_[x0, [[np.nan, np.nan]], x1] # Insert np.nan to break the two measurement sessions

    It is important to separate different measurement sessions by np.nan, because otherwise
    the model will assume the time interval between the two sessions is the same as the time interval within a session.
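
    The output signal should be broken at the same sample; a sketch with assumed placeholder values, mirroring the bundled examples:

    >>> y = np.r_[np.zeros(3), [np.nan], np.ones(5)] # Break y at the same position as u
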
One-step-ahead and multiple-step-ahead prediction
=================================================

In system identification, two types of prediction can be made by NARX models:

#. One-step-ahead prediction: predict the output at the next time step given the measured input and the measured output at the past time steps, e.g., :math:`\hat{y}(k) = 0.5y(k-1) + 0.3u_0(k-1)^2`, where :math:`\hat{y}(k)` is the predicted output at time step :math:`k`
#. Multiple-step-ahead prediction: predict the output at the next time step given the measured input and the predicted output at the past time steps, e.g., :math:`\hat{y}(k) = 0.5\hat{y}(k-1) + 0.3u_0(k-1)^2`

Obviously, multiple-step-ahead prediction is more challenging, because the predicted output is fed back to predict the output at the next step, which may accumulate the error.
The `predict` method of :class:`NARX` gives the multiple-step-ahead prediction.
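
The two recursions can be illustrated with a toy first-order model (a minimal sketch with made-up coefficients and data, not the estimator's internals):

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    u = rng.normal(size=n)
    y = np.zeros(n)
    for k in range(1, n):  # simulate the "true" system y(k) = 0.5y(k-1) + 0.3u(k-1)^2
        y[k] = 0.5 * y[k - 1] + 0.3 * u[k - 1] ** 2

    y_osa = np.zeros(n)
    y_msa = np.zeros(n)
    for k in range(1, n):
        # one-step-ahead: the past *measured* output y is used
        y_osa[k] = 0.5 * y[k - 1] + 0.3 * u[k - 1] ** 2
        # multiple-step-ahead: the past *predicted* output y_msa is fed back
        y_msa[k] = 0.5 * y_msa[k - 1] + 0.3 * u[k - 1] ** 2

Here the model coefficients are exact, so the two predictions coincide; with estimated coefficients, the multiple-step-ahead recursion feeds its own errors back, which is how they accumulate.
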
It should also be noted that the two types of prediction lead to different model training schemes.

#. If we assume the NARX model has a one-step-ahead prediction structure, all input data for training are known in advance,
   and the model can be trained by the simple ordinary least-squares (OLS) method
#. If we assume the NARX model has a multiple-step-ahead prediction structure, some input data, like :math:`\hat{y}(k-1)`, are
   unknown in advance. Therefore, the training data must first be generated by multiple-step-ahead prediction with
   the initial model coefficients, and then the coefficients can be updated recursively

ARX and OE models
-----------------

To better understand the two types of training, it is helpful to know two linear time series model structures,
i.e., the `ARX (AutoRegressive eXogenous) model <https://www.mathworks.com/help/ident/ref/arx.html>`_ and
the `OE (output error) model <https://www.mathworks.com/help/ident/ref/oe.html>`_.

The ARX model structure is

.. math::
    (1-A(z))y(t) = B(z)u(t) + e(t)

where :math:`A(z)` and :math:`B(z)` are polynomials of :math:`z`, e.g.,
:math:`A(z) = 1.5 z^{-1} - 0.7 z^{-2}`, :math:`z` is a complex variable from the `Z-transform <https://en.wikipedia.org/wiki/Z-transform>`_,
:math:`z^{-1}` can be viewed as a time-shift operator, and :math:`e(t)` is white noise.
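
For example, with :math:`A(z) = 1.5 z^{-1} - 0.7 z^{-2}` as above, applying :math:`A(z)` to :math:`y(t)` simply shifts the signal in time:

.. math::
    A(z)y(t) = 1.5 y(t-1) - 0.7 y(t-2)
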
The OE model structure is

.. math::
    y(t) = \frac{B(z)}{1-A(z)} u(t) + e(t)

The ARX model and the OE model are very similar. If we rewrite the ARX model as :math:`y(t) = \frac{B(z)}{1-A(z)} u(t) + \frac{1}{1-A(z)} e(t)`,
we can see that both the ARX model and the OE model have the same transfer function :math:`\frac{B(z)}{1-A(z)}`.
The two key differences are:

#. the noise assumption
#. the `best linear unbiased estimator (BLUE) <https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem>`_

For the noise assumption, the noise of the ARX model passes through a filter :math:`\frac{1}{1-A(z)}`, while the noise of the OE model does not.
The structure of ARX models implies that the measurement noise is a very special colored noise, which is not common in practice.
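
To make the filtering explicit, write the ARX noise term as :math:`v(t) = \frac{1}{1-A(z)} e(t)`, i.e., :math:`(1-A(z))v(t) = e(t)`. With the example :math:`A(z)` above, this is the recursion

.. math::
    v(t) = 1.5 v(t-1) - 0.7 v(t-2) + e(t)

so the white noise :math:`e(t)` is recursively filtered into colored noise.
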
For the difference in the BLUE, to get the BLUE of a model, we need to rewrite the model in the form

.. math::
    e(t) = y(t) - \hat{y}(t)

where :math:`\hat{y}(t)` is a predictor, which describes how to compute the predicted output such that the error is white.
It is easy to find that for an ARX model

.. math::
    \hat{y}(t) = A(z)y(t) + B(z)u(t)

while for an OE model

.. math::
    \hat{y}(t) = \frac{B(z)}{1-A(z)} u(t) = A(z)\hat{y}(t) + B(z)u(t)

The difference between the two predictors is that the one for ARX models uses both the measured input and the measured output to make predictions,
while the one for OE models uses only the measured input.
In other words, the predictor of the BLUE for ARX computes a one-step-ahead prediction,
while the predictor of the BLUE for OE computes a multiple-step-ahead prediction.

In summary, using multiple-step-ahead prediction in model training is better aligned with the real noise conditions, but makes the model harder to train,
while using one-step-ahead prediction makes the model easier to train, but its noise assumption is too strong to be true in practice.
The `fit` method of :class:`NARX` uses one-step-ahead prediction for training by default,
but multiple-step-ahead prediction can be used by setting the parameter `coef_init`, as the sketch below shows.
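
A sketch of the two training modes, following the bundled examples (`narx_model`, `u`, and `y` are assumed to be defined as above):

.. code-block:: python

    # One-step-ahead (OLS) training is the default
    narx_model.fit(u, y)

    # Multiple-step-ahead training, initialised with the one-step-ahead coefficients
    narx_model.fit(u, y, coef_init="one_step_ahead")
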
.. rubric:: Examples

* See :ref:`sphx_glr_auto_examples_plot_narx_msa.py` for an illustration of the multiple-step-ahead NARX model.

doc/user_guide.rst (+2 -1)

@@ -11,4 +11,5 @@ User Guide
     unsupervised.rst
     multioutput.rst
     redundancy.rst
-    ols_and_omp.rst
+    ols_and_omp.rst
+    narx.rst

examples/plot_narx.py (+2 -2)

@@ -44,7 +44,7 @@
 # %%
 # Build term library
 # ------------------
-# To build a polynomial NARX model, there are normally two steps:
+# To build a reduced polynomial NARX model, there are normally two steps:
 #
 # #. Search the structure of the model, i.e., the terms in the model, e.g.,
 # :math:`u_0(k-1)u_0(k-3)`, :math:`u_0(k-2)u_1(k-3)`, etc.
@@ -115,7 +115,7 @@
 # %%
 # Build NARX model
 # ----------------
-# As the polynomial NARX is a linear function of the nonlinear terms,
+# As the reduced polynomial NARX is a linear function of the nonlinear terms,
 # the coefficient of each term can be easily estimated by ordinary least squares.
 # In the printed NARX model, it is found that :class:`FastCan` selects the correct
 # terms and the coefficients are close to the true values.

examples/plot_narx_msa.py (new file, +182)

1+
"""
2+
===========================
3+
Multi-step-ahead NARX model
4+
===========================
5+
6+
.. currentmodule:: fastcan
7+
8+
In this example, we will compare one-step-ahead NARX and multi-step-ahead NARX.
9+
"""
10+
11+
# Authors: Sikai Zhang
12+
# SPDX-License-Identifier: MIT
13+
14+
# %%
15+
# Nonlinear system
16+
# ----------------
17+
#
18+
# `Duffing equation <https://en.wikipedia.org/wiki/Duffing_equation>` is used to
19+
# generate simulated data. The mathematical model is given by
20+
#
21+
# .. math::
22+
# \ddot{y} + 0.1\dot{y} - y + 0.25y^3 = u
23+
#
24+
# where :math:`y` is the output signal and :math:`u` is the input signal, which is
25+
# :math:`u(t) = 2.5\cos(2\pi t)`.
26+
#
27+
# The phase portraits of the Duffing equation are shown below.
28+

import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import odeint


def duffing_equation(y, t):
    """Non-autonomous system"""
    y1, y2 = y
    u = 2.5 * np.cos(2 * np.pi * t)
    dydt = [y2, -0.1 * y2 + y1 - 0.25 * y1**3 + u]
    return dydt


def auto_duffing_equation(y, t):
    """Autonomous system"""
    y1, y2 = y
    dydt = [y2, -0.1 * y2 + y1 - 0.25 * y1**3]
    return dydt


dur = 10
n_samples = 1000

# Initial conditions spread around the unit circle
n_init = 10
x0 = np.linspace(0, 2, n_init)
y0_y = np.cos(np.pi * x0)
y0_x = np.sin(np.pi * x0)
y0 = np.c_[y0_x, y0_y]

t = np.linspace(0, dur, n_samples)
sol = np.zeros((n_init, n_samples, 2))
for i in range(n_init):
    sol[i] = odeint(auto_duffing_equation, y0[i], t)

for i in range(n_init):
    plt.plot(sol[i, :, 0], sol[i, :, 1], c="tab:blue")

plt.title("Phase portraits of Duffing equation")
plt.xlabel("y(t)")
plt.ylabel("dy/dt(t)")
plt.show()

# %%
# Generate training-test data
# ---------------------------
#
# The phase portraits show that the system has two stable equilibria.
# We use one to generate the training data and the other to generate the test data.

dur = 10
n_samples = 1000

t = np.linspace(0, dur, n_samples)

# Training data: forced system, starting near one equilibrium
sol = odeint(duffing_equation, [0.6, 0.8], t)
u_train = 2.5 * np.cos(2 * np.pi * t).reshape(-1, 1)
y_train = sol[:, 0]

# Test data: autonomous system, starting near the other equilibrium
sol = odeint(auto_duffing_equation, [0.6, -0.8], t)
u_test = 2.5 * np.cos(2 * np.pi * t).reshape(-1, 1)
y_test = sol[:, 0]

# %%
# One-step-ahead vs. multi-step-ahead NARX
# ----------------------------------------
#
# First, we use :meth:`make_narx` to obtain the reduced NARX model.
# Then, the NARX model will be fitted with the one-step-ahead predictor and the
# multi-step-ahead predictor, respectively. Generally, the training of the
# one-step-ahead (OSA) NARX is faster, while the multi-step-ahead (MSA) NARX
# is more accurate.

from sklearn.metrics import r2_score

from fastcan.narx import make_narx

max_delay = 2

narx_model = make_narx(
    X=u_train,
    y=y_train,
    n_features_to_select=10,
    max_delay=max_delay,
    poly_degree=3,
    verbose=0,
)


def plot_prediction(ax, t, y_true, y_pred, title):
    ax.plot(t, y_true, label="true")
    ax.plot(t, y_pred, label="predicted")
    ax.legend()
    ax.set_title(f"{title} (R2: {r2_score(y_true, y_pred):.5f})")
    ax.set_xlabel("t (s)")
    ax.set_ylabel("y(t)")


# One-step-ahead training (default), then multi-step-ahead prediction
narx_model.fit(u_train, y_train)
y_train_osa_pred = narx_model.predict(u_train, y_init=y_train[:max_delay])
y_test_osa_pred = narx_model.predict(u_test, y_init=y_test[:max_delay])

# Multi-step-ahead training, initialised with the one-step-ahead coefficients
narx_model.fit(u_train, y_train, coef_init="one_step_ahead")
y_train_msa_pred = narx_model.predict(u_train, y_init=y_train[:max_delay])
y_test_msa_pred = narx_model.predict(u_test, y_init=y_test[:max_delay])

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
plot_prediction(ax[0, 0], t, y_train, y_train_osa_pred, "OSA NARX on Train")
plot_prediction(ax[0, 1], t, y_train, y_train_msa_pred, "MSA NARX on Train")
plot_prediction(ax[1, 0], t, y_test, y_test_osa_pred, "OSA NARX on Test")
plot_prediction(ax[1, 1], t, y_test, y_test_msa_pred, "MSA NARX on Test")
fig.tight_layout()
plt.show()

# %%
# Multiple measurement sessions
# -----------------------------
#
# The plot above shows that the NARX model cannot capture the dynamics at
# the left equilibrium shown in the phase portraits. To improve the performance, let us
# combine the training and test data for model training to include the dynamics of both
# equilibria. Here, we need to insert `np.nan` to indicate to the model that the training
# data and the test data are from different measurement sessions. The plot shows that the
# prediction performance of the NARX model on the test data is largely improved.

u_all = np.r_[u_train, [[np.nan]], u_test]
y_all = np.r_[y_train, [np.nan], y_test]
narx_model = make_narx(
    X=u_all,
    y=y_all,
    n_features_to_select=10,
    max_delay=max_delay,
    poly_degree=3,
    verbose=0,
)

narx_model.fit(u_all, y_all)
y_train_osa_pred = narx_model.predict(u_train, y_init=y_train[:max_delay])
y_test_osa_pred = narx_model.predict(u_test, y_init=y_test[:max_delay])

narx_model.fit(u_all, y_all, coef_init="one_step_ahead")
y_train_msa_pred = narx_model.predict(u_train, y_init=y_train[:max_delay])
y_test_msa_pred = narx_model.predict(u_test, y_init=y_test[:max_delay])

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
plot_prediction(ax[0, 0], t, y_train, y_train_osa_pred, "OSA NARX on Train")
plot_prediction(ax[0, 1], t, y_train, y_train_msa_pred, "MSA NARX on Train")
plot_prediction(ax[1, 0], t, y_test, y_test_osa_pred, "OSA NARX on Test")
plot_prediction(ax[1, 1], t, y_test, y_test_msa_pred, "MSA NARX on Test")
fig.tight_layout()
plt.show()
