Deep Learning project for trajectory prediction on the nuScenes dataset. DIAG, Sapienza University of Rome.
Co-author:
The dataset folder is available at https://drive.google.com/drive/folders/118Z18sWEg4CqHAhFcYmDqXvDj2qk4YtI?usp=sharing. Select "Add shortcut to Drive" so that you can import it.
The main file is "DL_Project.ipynb"; download it and run it directly in your own notebook environment or on Google Colab. The other files supplement the main notebook.
Trajectory prediction is the problem of predicting the short-term (1-3 seconds) and long-term (3-5 seconds) spatial coordinates of various road agents, such as cars, buses, pedestrians, rickshaws, and animals. These road agents have different dynamic behaviors, which may correspond to aggressive or conservative driving styles.
The nuScenes prediction task requires 6-second predictions at 2 Hz, so the predicted trajectory consists of 6 × 2 = 12 points.
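As a quick check of that arithmetic, the small helper below (a hypothetical name, not taken from the notebook) computes the time offsets, in seconds after the last observed frame, of the points to be predicted:

```python
def future_timestamps(horizon_s: float = 6.0, rate_hz: float = 2.0) -> list[float]:
    """Return the time offsets (in seconds) of the future points to predict."""
    n_points = round(horizon_s * rate_hz)  # 6 s * 2 Hz = 12 points
    step = 1.0 / rate_hz                   # 0.5 s between consecutive points
    return [step * (i + 1) for i in range(n_points)]

ts = future_timestamps()
print(len(ts), ts[0], ts[-1])  # 12 0.5 6.0
```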
The nuScenes dataset is huge, so we use only part of it: for each instance, its position (x, y), velocity v, acceleration a, and heading change rate r.
Each row of the raw dataset roughly has the form [frame_id, instance_token, x, y, z, v, a, r].
At this stage, the rows follow the chronological order of the scene, but what we need is, for each instance, a continuous chronological sequence of points that forms its trajectory.
The next step therefore collects the unique instance ids and uses them to regroup the trajectory points instance by instance.
The final dataset is arranged instance by instance, with each instance's points in chronological order, so it is a concatenation of per-instance trajectories.
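A minimal sketch of this reorganization, assuming the rows are plain Python tuples in the layout above and that `frame_id` increases over time within a scene. The function name `group_by_instance` and the tokens `"car_a"` / `"ped_b"` are hypothetical placeholders (real nuScenes instance tokens are hex strings):

```python
from collections import defaultdict

def group_by_instance(rows):
    """Regroup scene-ordered rows into per-instance chronological trajectories.

    rows: iterable of tuples laid out as
          (frame_id, instance_token, x, y, z, v, a, r)
    Returns (trajectories, ordered): a dict mapping each instance_token to its
    rows sorted by frame_id, and a flat list with the rows concatenated
    instance by instance (instances in order of first appearance).
    """
    trajectories = defaultdict(list)
    for row in rows:
        trajectories[row[1]].append(row)       # key by instance_token
    for token_rows in trajectories.values():
        token_rows.sort(key=lambda r: r[0])    # chronological within an instance
    ordered = [row for token_rows in trajectories.values() for row in token_rows]
    return dict(trajectories), ordered

rows = [
    (0, "car_a", 1.0, 2.0, 0.0, 5.0, 0.1, 0.00),
    (0, "ped_b", 10.0, 3.0, 0.0, 1.2, 0.0, 0.00),
    (1, "car_a", 3.5, 2.1, 0.0, 5.1, 0.2, 0.01),
    (1, "ped_b", 10.6, 3.1, 0.0, 1.2, 0.0, 0.00),
]
trajectories, ordered = group_by_instance(rows)
print([(r[0], r[1]) for r in ordered])
# [(0, 'car_a'), (1, 'car_a'), (0, 'ped_b'), (1, 'ped_b')]
```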
We tried several model architectures, including baselines and our modified architectures, and compared their performance. Our final modified model achieves the lowest average displacement error (ADE).
- First, we tested a basic LSTM model with incremental prediction: at each iteration it predicts the next point (x, y) of the trajectory, then appends that (x, y) to the input sequence before predicting the following point.
- This basic method has a major flaw when predicting a long sequence of frames: errors accumulate, because each new point is predicted from earlier predictions rather than from observed data (error accumulation in recursive, or autoregressive, multi-step forecasting). We therefore output the whole predicted position sequence at once through a Linear layer.
- Code Link
- This is a plain baseline that suffers from accumulated error due to iterative prediction. It is the only baseline; all the following models use modified architectures.
How the baseline predicts (iteratively).
After switching to predicting the whole sequence at once.
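The accumulated error can be illustrated with a toy example. This is not the project's LSTM: it assumes a hypothetical one-step predictor on 1-D positions that makes a small constant error (`step_bias`) at every step. Because each prediction is fed back as the next input, the per-step errors add up over the 12-point horizon:

```python
def ground_truth(last_pos, velocity, n_steps):
    """Constant-velocity future positions (1-D for simplicity)."""
    return [last_pos + velocity * (k + 1) for k in range(n_steps)]

def rollout_iterative(last_pos, velocity, n_steps, step_bias):
    """Autoregressive rollout with a hypothetical, slightly biased one-step model."""
    preds = []
    pos = last_pos
    for _ in range(n_steps):
        pos = pos + velocity + step_bias  # the prediction becomes the next input
        preds.append(pos)
    return preds

gt = ground_truth(0.0, 1.0, 12)
pred = rollout_iterative(0.0, 1.0, 12, step_bias=0.1)
errors = [abs(p - g) for p, g in zip(pred, gt)]
print([round(e, 2) for e in errors])
# [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
```

A head that outputs all 12 points in one forward pass never consumes its own outputs, so this particular feedback path for error growth is removed.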
- Code Link
- Use a CNN (Conv1d) layer instead of an LSTM to extract features from the sequence.
- Same as 5.3, but increasing the effect of the CNN on the loss.
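To make the feature-extraction step concrete, this NumPy sketch computes what a `Conv1d` layer does on trajectory features laid out as (channels, time): each output column summarizes all features over `k` consecutive time steps. In PyTorch the equivalent layer is `torch.nn.Conv1d(in_channels, out_channels, kernel_size)`; the sizes used here (5 input features, 16 filters, kernel size 3) are illustrative assumptions, not the notebook's hyperparameters:

```python
import numpy as np

def conv1d_valid(x, weight, bias):
    """Compute what a Conv1d layer does (stride 1, no padding, no dilation).

    x:      (in_channels, T)          e.g. 5 features over T time steps
    weight: (out_channels, in_channels, k)
    bias:   (out_channels,)
    Returns (out_channels, T - k + 1): one feature vector per time window.
    """
    out_ch, in_ch, k = weight.shape
    if x.shape[0] != in_ch:
        raise ValueError("x must have in_channels rows")
    t_out = x.shape[1] - k + 1
    y = np.empty((out_ch, t_out))
    for t in range(t_out):
        window = x[:, t:t + k]  # all features over k consecutive steps
        # Sum over input channels and kernel positions (cross-correlation).
        y[:, t] = np.tensordot(weight, window, axes=([1, 2], [0, 1])) + bias
    return y

x = np.random.rand(5, 8)       # 5 features, 8 observed steps
w = np.random.rand(16, 5, 3)   # 16 filters, kernel size 3
print(conv1d_valid(x, w, np.zeros(16)).shape)  # (16, 6)
```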
- Code Link
- Keep the LSTM to extract temporal information from the sequence.
- To crop the data into samples of the same shape and to enlarge the training set, we split each trajectory into several samples using a sliding window.
- We feed five features (target x, y, speed, acceleration and heading change rate) into the model; its output is the predicted (x, y) sequence, on which the loss is computed and backpropagated.
- avg_ade: 1.152
- avg_fde: 2.143
- avg_missRate: 0.762
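The sliding-window sampling described above can be sketched as follows. It assumes each trajectory is an array of shape (T, 5) with columns [x, y, v, a, r]; the function name, the stride, and the observation length of 4 steps are hypothetical choices for illustration, while the 12-step prediction length matches the nuScenes horizon:

```python
import numpy as np

def sliding_window_samples(traj, obs_len, pred_len, stride=1):
    """Split one trajectory into fixed-shape (input, target) samples.

    traj: array of shape (T, 5), columns [x, y, v, a, r] in time order.
    Returns X of shape (N, obs_len, 5) with the observed features and
    Y of shape (N, pred_len, 2) with the future (x, y) positions.
    """
    traj = np.asarray(traj, dtype=float)
    window = obs_len + pred_len
    n_feat = traj.shape[1]
    if len(traj) < window:
        # Trajectory too short to yield even one full window.
        return np.empty((0, obs_len, n_feat)), np.empty((0, pred_len, 2))
    starts = range(0, len(traj) - window + 1, stride)
    X = np.stack([traj[s:s + obs_len] for s in starts])
    Y = np.stack([traj[s + obs_len:s + window, :2] for s in starts])
    return X, Y

# Example: a 20-step trajectory, 4 observed steps, 12 predicted steps.
traj = np.random.rand(20, 5)
X, Y = sliding_window_samples(traj, obs_len=4, pred_len=12)
print(X.shape, Y.shape)  # (5, 4, 5) (5, 12, 2)
```

With stride 1, a trajectory of length T yields T - obs_len - pred_len + 1 samples, which is how a single long trajectory enlarges the training set.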
In the final section, we compare the metrics of 7 different models.
Model | LSTM+LinearX,LinearY | LSTM+Linear+LinearX,LinearY | LSTM+LinearXY | Conv1d+Linear+LinearX,LinearY | Conv1d+LinearXY | Conv1d+LSTM+Linear+LinearX,LinearY | Conv1d+LSTM+LinearXY |
---|---|---|---|---|---|---|---|
Average ADE | 4.03 | 1.35 | 1.35 | 1.32 | 1.32 | 1.16 | 1.15 |
Average FDE | 7.34 | 2.59 | 2.59 | 2.42 | 2.74 | 2.11 | 2.14 |
Miss Rate | 66.61% | 74.96% | 75.12% | 74.10% | 74.39% | 76.19% | 76.27% |
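For reference, the three metrics can be computed as in the sketch below. It assumes single-mode predictions of shape (N, 12, 2) and a nuScenes-style miss definition (a sample is a miss when its maximum pointwise L2 error exceeds 2 m); the notebook's exact implementation may differ:

```python
import numpy as np

def ade_fde_miss(pred, gt, miss_threshold=2.0):
    """Displacement metrics for a batch of single-mode predictions.

    pred, gt: arrays of shape (N, 12, 2) with (x, y) positions.
    ADE: mean L2 error over all predicted points.
    FDE: mean L2 error at the final predicted point.
    Miss rate: fraction of samples whose maximum pointwise L2 error
    exceeds miss_threshold (assumed nuScenes-style definition).
    """
    dist = np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=-1)  # (N, 12)
    ade = dist.mean()
    fde = dist[:, -1].mean()
    miss_rate = (dist.max(axis=1) > miss_threshold).mean()
    return float(ade), float(fde), float(miss_rate)

gt = np.zeros((2, 12, 2))
pred = np.zeros((2, 12, 2))
pred[0, :, 0] = 1.0  # sample 0: off by 1 m at every step
pred[1, :, 1] = 3.0  # sample 1: off by 3 m at every step
print(ade_fde_miss(pred, gt))  # (2.0, 2.0, 0.5)
```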