Feature importance: incompatibility between eli5 and scikit-survival #426

mittyone · 2024-01-12T06:36:04Z

I ran a Random Survival Forest (RSF) analysis with the data split into train, validation, and test sets. I was able to compute the concordance index, but because of compatibility issues between eli5 and scikit-survival I couldn’t compute feature importance.

Things get complicated when survival functions are involved, but how is everyone solving the compatibility issues between eli5 and scikit-survival? Feature importance is essential for writing research papers.

Below is the code I used in Google Colab:

!pip install scikit-survival

import pandas as pd
from sklearn.model_selection import train_test_split
from sksurv.ensemble import RandomSurvivalForest
from sklearn.model_selection import GridSearchCV
from sksurv.metrics import concordance_index_censored
import numpy as np

# Load your data

data = pd.read_excel('your_data_file.xlsx')

# Preparing the data

target = np.array([(e == 2, t) for e, t in zip(data['Event'], data['DFS'])], dtype=[('Event', '?'), ('DFS', '<f8')])
features = data.drop(columns=['Event', 'DFS'])

# Splitting the data

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

# Hyperparameter tuning

param_grid = {
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_leaf': [1, 2, 4],
    'n_estimators': [100, 200, 300],
    'min_samples_split': [2, 5, 10]
}
rsf = RandomSurvivalForest(random_state=42)
# No `scoring` argument: GridSearchCV then falls back to RandomSurvivalForest.score,
# which is the concordance index. 'roc_auc' does not apply to structured survival targets.
grid_search = GridSearchCV(rsf, param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

# Best parameters and model

best_params = grid_search.best_params_
best_rsf = grid_search.best_estimator_

# Evaluation on test set

prediction = best_rsf.predict(X_test)
c_index = concordance_index_censored(y_test['Event'], y_test['DFS'], prediction)

# Output best parameters and concordance index

print("Best Parameters:", best_params)
print("Concordance Index on Test Set:", c_index[0])

!pip install eli5

rsf = RandomSurvivalForest(random_state=42, **best_params)
rsf.fit(X_train, y_train)

# Feature importance (model refit with the best parameters)

import eli5  # needed for eli5.show_weights below
from eli5.sklearn import PermutationImportance
perm = PermutationImportance(rsf, n_iter=15, random_state=42)
perm.fit(X_test, y_test)
eli5.show_weights(perm, feature_names=X_test.columns.tolist())

# Feature importance (best estimator from the grid search)

import matplotlib.pyplot as plt
import eli5
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(best_rsf, n_iter=15, random_state=42)
perm.fit(X_test, y_test)
eli5.show_weights(perm, feature_names=X_test.columns.tolist())

sebp (Maintainer) · 2024-01-12T17:10:56Z

If eli5 doesn't work, you can try permutation_importance from sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html#sklearn.inspection.permutation_importance
