The Business Intelligence E-Commerce Project is designed to process, analyze, and extract valuable insights from e-commerce data. This project covers data cleaning, transformation, and analysis, culminating in a star schema data warehouse. The insights support decision-making processes for optimizing business operations.
```
business-intelligence-ecommerce
├── data
│   ├── processed
│   │   ├── cleaned_events_chunks
│   │   │   ├── cleaned_events_chunk_1.csv
│   │   │   ├── cleaned_events_chunk_2.csv
│   │   │   ├── cleaned_events_chunk_3.csv
│   │   │   └── cleaned_events_chunk_4.csv
│   │   ├── cleaned_distribution_centers.csv
│   │   ├── cleaned_inventory_items.csv
│   │   ├── cleaned_order_items.csv
│   │   ├── cleaned_orders.csv
│   │   ├── cleaned_products.csv
│   │   └── cleaned_users.csv
│   └── raw
│       ├── events_chunks
│       │   ├── events_1.csv
│       │   ├── events_2.csv
│       │   ├── events_3.csv
│       │   └── events_4.csv
│       ├── distribution_centers.csv
│       ├── inventory_items.csv
│       ├── order_items.csv
│       ├── orders.csv
│       ├── products.csv
│       └── users.csv
├── src
│   ├── analysis
│   │   ├── association.py
│   │   ├── clustering.py
│   │   ├── Elbow_Silhouette.png
│   │   ├── kmeans.png
│   │   └── user_data_with_clusters.csv
│   ├── ETL
│   │   ├── insights.ipynb
│   │   ├── pipeline.ipynb
│   │   ├── visualisation.pbix
│   │   └── visualisations.ipynb
│   └── Warehouse
│       ├── schema.pdf
│       └── schema.sql
├── README.md
└── requirements.txt
```
- **Data Processing:**
  - Raw data stored in `data/raw/`.
  - Cleaned and transformed data stored in `data/processed/`.
- **ETL Pipeline:**
  - Found in `src/ETL/pipeline.ipynb`.
  - Automates data extraction, transformation, and loading into a warehouse-ready format.
- **Data Warehouse Schema:**
  - Star schema SQL scripts available in `src/Warehouse/schema.sql`.
- **Analysis:**
  - Clustering analysis performed in `src/analysis/clustering.py`.
  - Data visualizations in `src/ETL/visualisations.ipynb`.
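The cleaning stage of the pipeline can be sketched with pandas. This is a hypothetical illustration, not the project's actual code: the column names (`user_id`, `event_type`, `created_at`) and the cleaning rules are assumptions.

```python
from pathlib import Path

import pandas as pd


def clean_events_chunk(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning: drop duplicates, drop rows missing key fields,
    and parse the timestamp column (column names are assumptions)."""
    df = df.drop_duplicates()
    df = df.dropna(subset=["user_id", "event_type"])
    df = df.copy()
    df["created_at"] = pd.to_datetime(df["created_at"])
    return df


def clean_all_chunks(raw_dir: str, out_dir: str) -> None:
    """Read each raw events chunk, clean it, and write the processed copy
    under the naming scheme used in data/processed/cleaned_events_chunks/."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for i, path in enumerate(sorted(Path(raw_dir).glob("events_*.csv")), start=1):
        chunk = pd.read_csv(path)
        clean_events_chunk(chunk).to_csv(
            Path(out_dir) / f"cleaned_events_chunk_{i}.csv", index=False
        )
```

The same pattern extends to the other tables (`orders.csv`, `users.csv`, and so on), each with its own cleaning rules.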
The project depends on the following Python packages:

- pandas
- numpy
- matplotlib
- seaborn
- geopandas
- shapely

Install all required packages using:

```
pip install -r requirements.txt
```
1. Clone the repository:

   ```
   git clone https://github.com/Albaforce/business-intelligence-ecommerce.git
   cd business-intelligence-ecommerce
   ```

2. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

3. Execute the ETL pipeline: open `src/ETL/pipeline.ipynb` in Jupyter Notebook and run all cells.

4. Run the analysis scripts: for clustering analysis, execute `src/analysis/clustering.py`.

5. Access the data warehouse schema: SQL scripts for creating and populating the star schema are available in `src/Warehouse/schema.sql`.
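As a rough sketch of the kind of K-Means workflow the clustering step suggests (the `Elbow_Silhouette.png` and `kmeans.png` artifacts hint at elbow and silhouette diagnostics): scikit-learn is assumed here even though it is not in the dependency list above, and the feature columns are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_users(df: pd.DataFrame, feature_cols: list, k: int = 4) -> pd.DataFrame:
    """Standardise the chosen features, fit K-Means, and attach cluster labels
    (as in user_data_with_clusters.csv). Feature columns are assumptions."""
    X = StandardScaler().fit_transform(df[feature_cols])
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    df = df.copy()
    df["cluster"] = model.fit_predict(X)
    # Silhouette score near 1 means well-separated clusters; near 0, overlap.
    print(f"silhouette score: {silhouette_score(X, df['cluster']):.3f}")
    return df
```

Choosing `k` by plotting inertia (elbow) and silhouette scores over a range of `k` values is the usual diagnostic the PNG filenames point to.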
The data directory includes:

- Raw Data: unprocessed CSV files in `data/raw/`.
- Processed Data: cleaned and transformed datasets in `data/processed/`.
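To illustrate how the processed data maps onto a star schema, the sketch below splits a flat orders extract into a user dimension with a surrogate key and a fact table that references it. Table and column names here are illustrative assumptions, not taken from `schema.sql`.

```python
import pandas as pd


def build_star(orders: pd.DataFrame):
    """Split a denormalised orders extract into dim_user and fact_orders.
    Column names (user_id, user_name, order_id, order_total) are illustrative."""
    # Dimension table: one row per distinct user, keyed by a surrogate id.
    dim_user = (
        orders[["user_id", "user_name"]]
        .drop_duplicates()
        .reset_index(drop=True)
        .assign(user_key=lambda d: d.index + 1)
    )
    # Fact table: measures plus foreign keys into the dimensions.
    fact_orders = orders.merge(dim_user, on=["user_id", "user_name"])[
        ["order_id", "user_key", "order_total"]
    ]
    return dim_user, fact_orders
```

The real schema adds further dimensions (products, distribution centers, dates), but the split-and-rekey pattern is the same.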
- Fork the repository and create a new branch for your feature or bug fix.
- Follow proper coding standards and comment your code.
- Submit a pull request with detailed explanations.
- Inspiration and support from the open-source community.
- Datasets used for analysis sourced from simulated e-commerce activities.
- Dataset link: https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset?select=users.csv