- AYUSH KUMAR AGARWAL (202151037)
- PARMAR HET (202151103)
- PATEL SHUBH UMESHKUMAR (202151111)
- POPAT RITIK MANISH (202151114)
- RAJ JIGNESHKUMAR SHAH (202151127)
- JENNY KAMLESHBHAI BHUT (202152314)
This project leverages Software-Defined Networking (SDN) for traffic classification in 5G networks using supervised and unsupervised learning techniques. By analyzing traffic patterns and characteristics, it aims to enhance:
- 🛠 Network Efficiency
- 🔒 Security
- ⚡ Resource Allocation
Follow these steps to set up and run the project locally:
First, clone the repository to your local machine:
git clone https://github.com/IIITV-5G-and-Edge-Computing-Activity/SDN-Traffic-Classification.git
cd SDN-Traffic-Classification
Then install the required dependencies:
pip install -r requirements.txt
- 🧹 Dataset Preparation:
  - Cleaned and encoded the data.
  - Split the data into features (independent variables) and labels (dependent variables).
  - Used `train_test_split` for a 70-30 split between training and testing data.
- ⚙️ Model Initialization:
  - Chose Logistic Regression for its simplicity and effectiveness in multiclass problems.
  - Configured parameters:
    - Penalty: L2 regularization.
    - Solver: `liblinear`.
- 📊 Training Process:
  - Trained the model using the one-vs-rest approach to handle multiclass classification.
- 🔄 Cross-Validation:
  - Performed 5-fold cross-validation for robust evaluation.
  - Compared metrics across folds to detect overfitting or underfitting.
- 📈 Evaluation Metrics:
  - Evaluated using overall Accuracy and per-class Precision, Recall, and F1 Score.
  - Generated a confusion matrix to highlight misclassifications.
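The supervised steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data comes from `make_classification` as a synthetic stand-in for the encoded traffic features, the four classes and the `random_state=42` seed are assumptions, and `OneVsRestClassifier` is used here to make the one-vs-rest scheme explicit around a `liblinear` Logistic Regression.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for the cleaned, encoded traffic data (assumption:
# 4 traffic classes, 10 numeric features; not the project's dataset).
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=6,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=42,
)

# 70-30 split between training and testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# One-vs-rest Logistic Regression with the liblinear solver.
# L2 regularization is scikit-learn's default penalty for this estimator.
model = OneVsRestClassifier(LogisticRegression(solver="liblinear"))

# 5-fold cross-validation on the training portion; a large spread across
# folds, or a gap to the test score, hints at over- or underfitting.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

# Fit on the full training split and evaluate on the held-out 30%.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)              # rows: true, cols: predicted
report = classification_report(y_test, y_pred)     # precision/recall/F1 per class

print("CV accuracy per fold:", cv_scores.round(3))
print(cm)
print(report)
```

With real data, `X` and `y` would be replaced by the prepared feature matrix and traffic labels; the diagonal of `cm` then gives the correctly classified count for each traffic class.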
- 📏 Dataset Standardization:
  - Used `StandardScaler` to standardize feature values.
- 🛠 Clustering Initialization:
  - Explored multiple cluster numbers (`k`) to uncover the data structure.
  - Used `random_state` for reproducibility.
- 📉 Optimal Clusters:
  - Applied the Elbow Method to find the ideal `k`.
- 🔗 Cluster Formation:
  - Grouped data into clusters and assigned labels.
- ✔️ Cluster Validation:
  - Evaluated clustering with the Silhouette Score.
- 🌈 Visualization:
  - Reduced dimensions using Principal Component Analysis (PCA) for 2D scatter plots.
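The clustering steps above could look something like the sketch below. It assumes K-Means as the clustering algorithm (the steps mention `k`, `random_state`, and the Elbow Method but do not name the algorithm here), uses `make_blobs` as a synthetic stand-in for the traffic features, and picks the final `k` by the best Silhouette Score as a programmatic substitute for reading the elbow off a plot. The range of `k` values and the seed are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the traffic features (assumption, not project data).
X, _ = make_blobs(n_samples=600, centers=4, n_features=8, random_state=42)

# Standardize features so each has zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Try several cluster counts, recording inertia (for the elbow curve)
# and the Silhouette Score (for validation) at each k.
inertias = {}
silhouettes = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X_scaled)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X_scaled, labels)

# The elbow is normally judged visually from the inertia curve; as a
# stand-in, choose the k with the highest Silhouette Score.
best_k = max(silhouettes, key=silhouettes.get)

# Form the final clusters and assign a label to every sample.
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=42)
cluster_labels = final_model.fit_predict(X_scaled)

# Project to two principal components for a 2D scatter plot.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

print("inertia by k:", {k: round(v, 1) for k, v in inertias.items()})
print("chosen k:", best_k)
```

To visualize the result, `X_2d[:, 0]` and `X_2d[:, 1]` can be passed to Matplotlib's `plt.scatter` with `c=cluster_labels`, and `inertias` plotted against `k` gives the elbow curve.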
- Confusion Matrix Highlights:
- 🟢 Ping Traffic: 569 correctly classified.
- 🟢 Voice Traffic: 885 correctly classified (the highest count of the four classes).
- 🟢 DNS Traffic: 574 correct, minor confusion observed.
- 🟢 Telnet Traffic: 571 correctly identified samples.
- 🟡 Misclassifications: Minimal errors, primarily between closely related traffic types (e.g., Telnet and DNS).
- Optimal clusters determined using the Elbow Method.
- Silhouette Scores indicated strong cluster separation.
- PCA Visualization: Clearly illustrated data separation.
- 🐍 Python: Core programming language.
- 📚 Libraries: NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, TensorFlow.
- 🖧 SDN Simulation: Mininet and OpenFlow.
- 🔍 Dimensionality Reduction: PCA.