This project provides a comprehensive framework for processing, visualizing, and modeling accelerometer and gyroscope data from fitness trackers. The goal is to create machine learning models that can classify barbell exercises and count repetitions.
## Table of Contents

- Introduction
- Data Conversion and Cleaning
- Data Visualization
- Outlier Detection
- Feature Engineering
- Predictive Modeling
- Repetition Counting Algorithm
- Final Results
- Interesting Techniques Used
- Libraries and Technologies
- Project Structure
- Key Directories
- External Libraries
## Introduction

This project analyzes sensor data collected from a MetaMotion fitness tracker to classify different barbell exercises and accurately count repetitions. It sits in the quantified-self domain, applying machine learning techniques to interpret wearable-device data.
## Data Conversion and Cleaning

Raw data from the MetaMotion sensor is converted and read from CSV files. The data is split and cleaned to prepare it for analysis, ensuring that it is in a usable format for subsequent steps.
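As a sketch of this step (the `epoch (ms)` column name, the axis column names, and the 200 ms resampling interval are illustrative assumptions, not the project's actual settings), the epoch timestamps can be promoted to a datetime index and the readings resampled to a uniform rate:

```python
import numpy as np
import pandas as pd

# Hypothetical MetaMotion-style export: an epoch (ms) column plus x/y/z readings.
raw = pd.DataFrame({
    "epoch (ms)": np.arange(0, 2000, 80),  # ~12.5 Hz accelerometer
    "acc_x": np.random.randn(25),
    "acc_y": np.random.randn(25),
    "acc_z": np.random.randn(25),
})

# Promote the epoch column to a proper datetime index, then drop it.
raw.index = pd.to_datetime(raw["epoch (ms)"], unit="ms")
raw = raw.drop(columns=["epoch (ms)"])

# Resample to 200 ms bins, averaging readings that fall in the same bin.
resampled = raw.resample("200ms").mean().dropna()
```

Resampling to a fixed rate makes later windowed feature extraction straightforward, since every window then spans the same amount of time.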
## Data Visualization

Time series data from the accelerometer and gyroscope are visualized to understand patterns and behaviors. Plotting this data helps in identifying trends and anomalies that may affect model performance.
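A minimal plotting sketch (the signal below is synthetic; the real analysis plots the MetaMotion accelerometer and gyroscope columns):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted runs
import matplotlib.pyplot as plt
import numpy as np

# Fake ~1.2 Hz repetition signal standing in for a sensor column.
t = np.linspace(0, 5, 500)
acc_y = np.sin(2 * np.pi * 1.2 * t)

fig, ax = plt.subplots()
ax.plot(t, acc_y, label="acc_y")
ax.set_xlabel("time (s)")
ax.set_ylabel("acceleration (g)")
ax.legend()
fig.savefig("acc_y.png")
```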
## Outlier Detection

Outliers in the dataset are detected and handled using techniques like Chauvenet’s Criterion and the Local Outlier Factor. This step is crucial to improve the quality of the data and the reliability of the models.
- Chauvenet’s Criterion: A statistical method to identify and remove outliers from normally distributed data.
- Local Outlier Factor: An algorithm to find anomalies in data based on density deviations.
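Chauvenet’s Criterion can be sketched as follows: a point is rejected when, under a normal distribution fitted to the sample, fewer than half an observation that extreme would be expected. The 0.5 threshold is the conventional choice; the helper name is an illustration, not code from the project:

```python
import numpy as np
from scipy.special import erfc

def chauvenet_mask(values):
    """Return a boolean mask: True where a value is flagged as an outlier."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    mean, std = values.mean(), values.std()
    # Two-sided probability of a deviation at least this large under N(mean, std).
    prob = erfc(np.abs(values - mean) / (std * np.sqrt(2)))
    # Reject if fewer than 0.5 such points are expected among n samples.
    return n * prob < 0.5

data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 25.0])
```

Applied to `data`, only the 25.0 reading is flagged, since the remaining values cluster tightly around 10.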
## Feature Engineering

Features are engineered from the raw data to enhance model performance. Techniques include:
- Frequency Analysis: Using the Fast Fourier Transform (FFT) to transform data into the frequency domain.
- Low Pass Filter: Applying filters to remove noise and smooth the data.
- Principal Component Analysis (PCA): Reducing data dimensionality while retaining significant variance.
- Clustering: Grouping similar data points to identify patterns.
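The filtering and frequency-analysis steps above can be sketched together (the 25 Hz sampling rate, 2 Hz cutoff, and filter order are assumptions for illustration, not the project's actual parameters):

```python
import numpy as np
from scipy.signal import butter, filtfilt

np.random.seed(0)
fs = 25.0  # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
# Synthetic signal: ~1 Hz repetition cycle plus high-frequency noise.
signal = np.sin(2 * np.pi * 1.0 * t) + 0.3 * np.random.randn(len(t))

# Low-pass Butterworth filter at 2 Hz to smooth out sensor noise.
b, a = butter(N=4, Wn=2.0 / (fs / 2), btype="low")
smooth = filtfilt(b, a, signal)

# FFT to identify the dominant frequency of the smoothed signal.
freqs = np.fft.rfftfreq(len(smooth), d=1 / fs)
spectrum = np.abs(np.fft.rfft(smooth))
dominant = freqs[spectrum[1:].argmax() + 1]  # skip the DC component
```

For a repetition signal the dominant frequency recovered here corresponds directly to the repetition tempo, which is what makes frequency features useful for both classification and counting.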
## Predictive Modeling

Multiple machine learning models are trained and evaluated:
- Naive Bayes Classifier: A probabilistic model based on Bayes' theorem with strong independence assumptions.
- Support Vector Machines (SVMs): Models that find the optimal hyperplane for classification tasks.
- Random Forest: An ensemble learning method using multiple decision trees.
- Neural Networks: Models inspired by the human brain structure, capable of capturing complex patterns.
- K-Nearest Neighbors (KNN): A non-parametric method used for classification based on feature similarity.
- Decision Trees: Models that make decisions based on feature values, forming a tree-like structure.
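The model comparison can be sketched with scikit-learn; the synthetic dataset and hyperparameters below are placeholders for the project's engineered features, and the neural network is omitted for brevity:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in feature matrix; the project uses engineered sensor features.
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "naive_bayes": GaussianNB(),
    "svm": SVC(kernel="rbf"),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}
# Fit each model and score it on the held-out split.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
```

Scoring every candidate on the same held-out split keeps the comparison fair before committing to a single model.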
## Repetition Counting Algorithm

A custom algorithm is developed to accurately count exercise repetitions. This involves analyzing the processed sensor data to detect cycles corresponding to individual repetitions.
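One common approach, assumed here as a sketch rather than the project's exact algorithm, is to count peaks in an already-smoothed signal while enforcing a minimum height and spacing:

```python
import numpy as np
from scipy.signal import find_peaks

fs = 25.0  # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
# Synthetic "set" of 10 repetitions at 1 Hz standing in for a smoothed sensor column.
acc_r = np.sin(2 * np.pi * 1.0 * t)

# Each repetition shows up as one peak; the minimum distance between peaks
# prevents residual noise within a rep from being double-counted.
peaks, _ = find_peaks(acc_r, height=0.5, distance=fs * 0.5)
rep_count = len(peaks)
```

The `height` and `distance` thresholds would in practice be tuned per exercise, since a slow deadlift and a fast row cycle at very different tempos.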
## Final Results

The final results of the classification models are presented using a confusion matrix, which visually represents the performance by showing the correctly and incorrectly classified instances for each exercise category. This allows for a comprehensive evaluation of the model's accuracy and the identification of any misclassifications.
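Computing such a matrix with scikit-learn might look like this (the exercise labels are illustrative placeholders, not necessarily the project's class names):

```python
from sklearn.metrics import confusion_matrix

# Illustrative true vs. predicted labels for five hypothetical exercise classes.
y_true = ["bench", "squat", "squat", "deadlift", "row", "ohp", "bench"]
y_pred = ["bench", "squat", "deadlift", "deadlift", "row", "ohp", "bench"]

labels = ["bench", "deadlift", "ohp", "row", "squat"]
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)
```

The diagonal holds the correctly classified instances, so off-diagonal cells immediately show which exercise pairs the model confuses.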
## Interesting Techniques Used

- Fast Fourier Transform (FFT): Utilized in the `FrequencyAbstraction.py` module for frequency domain analysis. FFT helps identify dominant frequencies in sensor data.
- Principal Component Analysis (PCA): Implemented in the `DataTransformation.py` module for dimensionality reduction. PCA reduces data complexity while retaining significant features.
- Rolling Window Calculations: Used in the `TemporalAbstraction.py` module to compute statistical aggregations over time windows, capturing temporal patterns.
- Forward Feature Selection: Employed in the `LearningAlgorithms.py` module to iteratively select the most significant features for classification.
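The rolling-window temporal abstraction mentioned above can be sketched as follows (the window size and column naming convention are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in sensor column; the project applies this to accelerometer/gyroscope data.
df = pd.DataFrame({"acc_y": np.sin(np.linspace(0, 4 * np.pi, 100))})

window = 5  # samples per rolling window (an assumed value)
# Aggregate each value with its preceding window to capture local temporal context.
df["acc_y_temp_mean"] = df["acc_y"].rolling(window).mean()
df["acc_y_temp_std"] = df["acc_y"].rolling(window).std()
```

The first `window - 1` rows are NaN because a full window is not yet available; those rows are typically dropped before modeling.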
## Libraries and Technologies

- NumPy: For numerical computations and array operations.
- Pandas: For data manipulation and analysis.
- scikit-learn: Machine learning library providing algorithms like SVM, Decision Trees, and PCA.
- Matplotlib: Used for data visualization during exploratory analysis.
- SciPy: For scientific and technical computing tasks.
## Project Structure

```
├── README.md # Project documentation
├── confusion_matrix.png # Confusion matrix image
├── data
│ ├── external # Data from third-party sources
│ ├── interim # Intermediate transformed data
│ ├── processed # Final datasets ready for modeling
│ └── raw # Original, unprocessed data
├── docs # Documentation for the project
├── models # Trained models and model outputs
├── notebooks # Jupyter notebooks for experimentation
├── references # Manuals and explanatory materials
├── reports
│ └── figures # Generated graphics and figures
├── src # Source code for the project
│ ├── __init__.py # Initializes the Python module
│ ├── data # Scripts to download or generate data
│ ├── features # Feature extraction scripts
│ │ ├── FrequencyAbstraction.py # Frequency domain features
│ │ ├── DataTransformation.py # Data normalization and PCA
│ │ ├── TemporalAbstraction.py # Temporal feature abstraction
│ │ └── remove_outliers.py # Outlier detection and removal
│ ├── models # Machine learning models and algorithms
│ │ └── LearningAlgorithms.py # Classification algorithms
│ └── visualization # Data visualization scripts
├── LICENSE # License information
└── setup.py # Setup script for package installation
```
## Key Directories

- `src/features`: Contains modules for feature extraction and data preprocessing, including frequency abstraction, PCA, temporal abstraction, and outlier removal.
- `src/models`: Includes machine learning algorithms for classification, implementing techniques like Support Vector Machines, Neural Networks, K-Nearest Neighbors, and Decision Trees.
- `notebooks`: Jupyter notebooks for exploratory data analysis and experimentation.
## External Libraries

- NumPy: https://numpy.org/
- Pandas: https://pandas.pydata.org/
- scikit-learn: https://scikit-learn.org/
- Matplotlib: https://matplotlib.org/
- SciPy: https://www.scipy.org/