Tweak house price regression interpretation tutorial #1514

Open · wants to merge 2 commits into base: master
30 changes: 15 additions & 15 deletions tutorials/House_Prices_Regression_Interpret.ipynb
@@ -11,9 +11,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates how to apply `Captum` library on a regression model and understand important features, layers / neurons that contribute to the prediction. It compares a number of attribution algorithms from `Captum` library for a simple DNN model trained on a sub-sample of a well-known California house prices dataset.\n",
"This notebook demonstrates how to apply the `Captum` library on a regression model and understand important features, layers / neurons that contribute to the prediction. It compares a number of attribution algorithms from the `Captum` library for a simple DNN model trained on a sub-sample of a well-known California house prices dataset.\n",
"\n",
"Note that in order to be able to run this notebook successfully you need to install scikit-learn package in advance.\n"
"Note that in order to be able to run this notebook successfully you need to install matplotlib and scikit-learn packages in advance.\n"
]
},
{
@@ -29,7 +29,7 @@
"\n",
"import matplotlib.pyplot as plt\n",
"\n",
"#scikit-learn related imports\n",
"# scikit-learn related imports\n",
"import sklearn\n",
"from sklearn.datasets import fetch_california_housing\n",
"from sklearn.datasets import fetch_openml\n",
@@ -90,7 +90,7 @@
"(a block group typically has a population of 600 to 3,000 people).\n",
"\"\"\"\n",
"\n",
"#take first n examples for speed up\n",
"# take first n examples for speed up\n",
"n = 600\n",
"X = california.data[:n]\n",
"y = california.target[:n]\n"
@@ -140,7 +140,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's visualize dependent variable vs each independent variable in a separate plot. Apart from that we will also perform a simple regression analysis and plot the fitted line in dashed, red color."
"Let's visualize the dependent variable vs each independent variable in a separate plot. Apart from that, we will also perform a simple regression analysis and plot the fitted line in dashed red."
]
},
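The per-feature scatter plots with a fitted dashed red line can be sketched as follows; `X`, `y`, and `feature_names` here are synthetic stand-ins for the tutorial's data, not its actual variables:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Synthetic stand-ins for the tutorial's X, y, and feature names
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
feature_names = ["f0", "f1"]

fig, axes = plt.subplots(1, X.shape[1], figsize=(8, 3))
for i, ax in enumerate(axes):
    ax.scatter(X[:, i], y, s=10)
    slope, intercept = np.polyfit(X[:, i], y, deg=1)  # simple linear fit
    xs = np.linspace(X[:, i].min(), X[:, i].max(), 50)
    ax.plot(xs, slope * xs + intercept, "r--")  # fitted line, dashed red
    ax.set_xlabel(feature_names[i])
    ax.set_ylabel("target")
fig.tight_layout()
```

`np.polyfit` with `deg=1` is one simple way to get the fitted line; the tutorial may use a different regression routine.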
{
@@ -179,7 +179,7 @@
"metadata": {},
"source": [
"From the diagram above we can tell that some of the most influential features that are correlated with the output average house value are: \n",
" - MedInc, median income in block group\n",
" - MedInc, median income in block group.\n",
 If MedInc increases, the house value increases too.\n",
" - AveRooms, average number of rooms per household.\n",
" This variable is positively correlated with the house value. The higher the average number of rooms per household the higher the average value of the house. "
@@ -219,7 +219,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Defining default hyper parameters for the model.\n"
"Defining default hyperparameters for the model.\n"
]
},
{
@@ -321,7 +321,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Defining the training function that contains the training loop and uses RMSprop and given input hyper-parameters to train the model defined in the cell above."
"Defining the training function that contains the training loop and uses RMSprop and given input hyperparameters to train the model defined in the cell above."
]
},
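A minimal sketch of such a training loop with RMSprop and MSE loss; the function name and the defaults (`num_epochs`, `lr`) are illustrative, not the tutorial's exact hyperparameter values:

```python
import torch
import torch.nn as nn

def train(model, X_train, y_train, num_epochs=200, lr=1e-2):
    """Sketch of a training loop using RMSprop and MSE loss."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.RMSprop(model.parameters(), lr=lr)
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()
    return model

# Quick check on synthetic data: training should reduce the loss
torch.manual_seed(0)
X = torch.randn(64, 8)
y = X @ torch.randn(8, 1)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
loss_before = nn.functional.mse_loss(model(X), y).item()
train(model, X, y)
loss_after = nn.functional.mse_loss(model(X), y).item()
```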
{
@@ -359,7 +359,7 @@
"source": [
"If the model was previously trained and stored, we load that pre-trained model; otherwise, we train a new model and store it for future use.\n",
"\n",
"Models can found here: https://github.com/pytorch/captum/tree/master/tutorials/models"
"Models can be found here: https://github.com/pytorch/captum/tree/master/tutorials/models."
]
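The load-or-train logic can be sketched like this; `MODEL_PATH` and the function names are hypothetical, not the tutorial's actual identifiers:

```python
import os
import torch
import torch.nn as nn

MODEL_PATH = "models/house_prices_model.pt"  # hypothetical path

def load_or_train(model, train_fn, path=MODEL_PATH):
    """Load a saved state dict if one exists; otherwise train and save it."""
    if os.path.exists(path):
        model.load_state_dict(torch.load(path))
    else:
        train_fn(model)
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        torch.save(model.state_dict(), path)
    return model
```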
},
{
@@ -412,7 +412,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's perform a simple sanity check and compute the performance of the model using Root Squared Mean Error (RSME) metric."
"Let's perform a simple sanity check and compute the performance of the model using the Root Mean Square Error (RMSE) metric."
]
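As a reminder of the metric, RMSE is the square root of the mean squared residual; a minimal, dependency-free sketch:

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error: sqrt(mean((y_true - y_pred)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# e.g. residuals of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5)
```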
},
{
@@ -447,7 +447,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's compute the attributions with respect to the inputs of the model using different attribution algorithms from core `Captum` library and visualize those attributions. We use test dataset defined in the cells above for this purpose.\n",
"Let's compute the attributions with respect to the inputs of the model using different attribution algorithms from the core `Captum` library and visualize those attributions. We use the test dataset defined in the cells above for this purpose.\n",
"\n",
"We mainly use default settings, such as default baselines, number of steps, etc., for all algorithms; however, you are welcome to play with the settings. For GradientSHAP specifically, we use the entire training dataset as the distribution of baselines.\n",
"\n",
@@ -485,7 +485,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's visualize attribution scores with respect to inputs (using test dataset) for our simple model in one plot. This will help us to understand how similar or different the attribution scores assigned from different attribution algorithms are. Apart from that we will also compare attribution scores with the learned model weights.\n",
"Now let's visualize attribution scores with respect to inputs (using the test dataset) for our simple model in one plot. This will help us to understand how similar or different the attribution scores assigned from different attribution algorithms are. Apart from that we will also compare attribution scores with the learned model weights.\n",
"\n",
"It is important to note that we aggregate the attributions across the entire test dataset in order to retain a global view of feature importance. This, however, is not ideal since the attributions can cancel each other out when we aggregate them across multiple samples."
]
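The cancellation caveat can be seen in a tiny numeric example: averaging signed attributions across samples can shrink a feature's apparent importance, while averaging magnitudes does not (the numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical per-sample attributions: rows = samples, cols = features
attr = np.array([[ 0.5, -0.2],
                 [ 0.3,  0.4],
                 [-0.1,  0.1]])

signed_mean = attr.mean(axis=0)             # opposite signs partially cancel
magnitude_mean = np.abs(attr).mean(axis=0)  # cancellation-free alternative
```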
@@ -569,11 +569,11 @@
"\n",
"From the plot above we can see that attribution algorithms sometimes disagree on assigning importance scores and that they are not always aligned with weights. However, we can still observe that two of the most important features, `MedInc` and `AveRooms`, are also considered important both by most attribution algorithms and by the weight scores.\n",
"\n",
"It is interesting to observe that the feature `Population` has high positive attribution score based on some of the attribution algorithms. This can be related, for example, to the choice of the baseline. In this tutorial we use zero-valued baselines for all features, however if we were to choose those values more carefully for each feature the picture will change. Similar arguments apply also when the signs of the weights and attributions mismatches or when one algorithm assigns higher or lower attribution scores compare to the others.\n",
"It is interesting to observe that the feature `Population` has a high positive attribution score based on some of the attribution algorithms. This can be related, for example, to the choice of the baseline. In this tutorial we use zero-valued baselines for all features; however, if we were to choose those values more carefully for each feature, the picture would change. Similar arguments apply also when the signs of the weights and attributions mismatch or when one algorithm assigns higher or lower attribution scores compared to the others.\n",
"\n",
"In terms of least important features, we observe that `AveBedrms` and `AveOccup` are voted least important both by most attribution algorithms and by the learned coefficients.\n",
"\n",
"Another interesting observation is that both Integrated Gradients and DeepLift return similar attribution scores across all features. This is associated with the fact that although we have non-linearities in our model, their effects aren't significant and DeepLift is close to `(input - baselines) * gradients`. And because the gradients do not change significantly along the straight line from baseline to input, we observe similar situation with Integrated Gradients as well.\n",
"Another interesting observation is that both Integrated Gradients and DeepLift return similar attribution scores across all features. This is associated with the fact that although we have non-linearities in our model, their effects aren't significant and DeepLift is close to `(input - baselines) * gradients`. Because the gradients do not change significantly along the straight line from baseline to input, we observe a similar situation with Integrated Gradients as well.\n",
"\n",
"We also note that GradientShap behaves differently than the other methods for this data and model. Whereas the other methods in this tutorial are calculated on test inputs and a reference baseline of zero, GradientShap is calculated with a baseline of the training distribution, which might be the cause of the behavior observed."
]
@@ -589,7 +589,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's beside attributing to the inputs of the model, also attribute to the layers of the model and understand which neurons appear to be more important.\n",
"Besides attributing to the inputs of the model, let's now also attribute to the layers of the model and understand which neurons appear to be more important.\n",
"\n",
"In the cell below we will attribute to the inputs of the second linear layer of our model. Similar to the previous case, the attribution is performed on the test dataset."
]