This notebook predicts house prices in California with machine learning regression models. The dataset provides features such as the number of rooms, median income, and location, which are used to train models that predict house prices.
The dataset used for this project is the California Housing Prices dataset, which is commonly used in machine learning and statistics literature. The dataset contains features such as median income, housing median age, average rooms, average bedrooms, population, and geographical information (latitude and longitude) for each block group in California.
Link to dataset: California Housing Prices Dataset
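A minimal sketch of loading the downloaded CSV with pandas, reporting which columns have missing values (useful input for the preprocessing step), and splitting off the target. The function name `load_housing`, the file path, and the target column name are hypothetical; pass whatever the notebook actually uses.

```python
import pandas as pd


def load_housing(csv_path, target):
    """Read the housing CSV and split it into features X and target y.

    Hypothetical helper: csv_path and target are supplied by the caller,
    so nothing here assumes a particular release of the dataset.
    """
    df = pd.read_csv(csv_path)
    # Report only the columns that contain missing values, so the
    # preprocessing step knows which ones need imputation.
    missing = df.isna().sum()
    print("Missing values per column:")
    print(missing[missing > 0])
    X = df.drop(columns=[target])
    y = df[target]
    return X, y
```

For example, with the Kaggle CSV saved locally as `housing.csv` (an assumed file name) and `median_house_value` as the assumed target column: `X, y = load_housing("housing.csv", target="median_house_value")`.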
- Data Preprocessing: Handling missing values and outliers, and scaling features.
- Exploratory Data Analysis (EDA): Exploring the dataset to gain insights into the distribution of features and their relationship with the target variable (house prices).
- Feature Engineering: Creating new features or transforming existing ones to improve model performance.
- Model Selection: Evaluating various regression models such as Linear Regression, Decision Tree Regression, Random Forest Regression, etc., to find the best-performing model for predicting house prices.
- Model Evaluation: Assessing the performance of each model using appropriate evaluation metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-squared, etc.
- Hyperparameter Tuning: Fine-tuning the hyperparameters of selected models to improve their performance.
- Prediction: Using the trained model to make predictions on new data or the test set.
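The preprocessing, model selection, and evaluation steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook's code: the helper `compare_models`, the 80/20 split, median imputation, and the three candidate models are assumptions, and `X` is assumed to be all-numeric (any categorical column must be encoded first). Wrapping the imputer and scaler in a pipeline means they are fitted on the training split only, so no test-set statistics leak into training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, random_state=42):
    """Fit several regressors on a training split and score them on a held-out test split.

    Hypothetical helper; assumes X contains only numeric columns.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    candidates = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=random_state),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=random_state),
    }
    results = {}
    for name, model in candidates.items():
        # Median imputation and standard scaling, fitted on the training split only.
        pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), model)
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        mse = mean_squared_error(y_test, pred)
        results[name] = {
            "MSE": mse,
            "RMSE": np.sqrt(mse),  # same units as the target (house price)
            "R2": r2_score(y_test, pred),
        }
    return results
```

Usage: `for name, scores in compare_models(X, y).items(): print(name, scores)`. RMSE is often the easiest metric to read here because it is expressed in the same units as the price being predicted.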
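Hyperparameter tuning can be sketched with scikit-learn's `GridSearchCV`, scoring each candidate by cross-validated RMSE. The helper name `tune_random_forest` and the parameter grid are hypothetical illustrations; the notebook may tune different models or ranges.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def tune_random_forest(X_train, y_train, cv=5, random_state=42):
    """Grid-search a few Random Forest hyperparameters with k-fold cross-validation.

    Hypothetical helper; the grid below is illustrative only.
    """
    param_grid = {
        "n_estimators": [50, 100],
        "max_depth": [None, 10],
        "min_samples_leaf": [1, 5],
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=random_state),
        param_grid,
        cv=cv,
        # scikit-learn maximizes scores, so RMSE is exposed as its negative.
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X_train, y_train)
    # Flip the sign back so the returned value is a positive RMSE.
    return search.best_estimator_, search.best_params_, -search.best_score_
```

Because `GridSearchCV` refits the best configuration on all of `X_train` by default (`refit=True`), the returned estimator can be used directly for the prediction step, for example `best_model.predict(X_test)`.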
- Python 3.x
- Jupyter Notebook
- NumPy
- pandas
- matplotlib
- seaborn
- scikit-learn
- Download the California Housing Prices dataset from the provided link.
- Clone this repository.
- Open the Jupyter Notebook House_Price_Prediction_California.ipynb.
- Follow the instructions provided in the notebook to execute each cell.
- Analyze the results, evaluate model performance, and make necessary adjustments.
- The California Housing Prices dataset was obtained from Kaggle.
- Inspiration and guidance for this project were drawn from various online tutorials, documentation, and literature on machine learning and regression.
This project is licensed under the MIT License.