Replies: 1 comment
-
If eli5 doesn't work, you can try permutation_importance from sklearn:
https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html#sklearn.inspection.permutation_importance
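A minimal sketch of that call, in case it helps. The synthetic data and the `RandomForestRegressor` are stand-ins I made up so the snippet runs without scikit-survival. With `scoring=None`, `permutation_importance` falls back to the estimator's own `score` method, and scikit-survival estimators report the concordance index there, so a fitted `RandomSurvivalForest` with its structured `y` should be able to take the model's place.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): only the first two of five columns matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=400)
feature_names = [f"x{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Stand-in model (assumption). For the survival case, the fitted
# RandomSurvivalForest and its structured y would go here instead.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# scoring=None means each shuffled copy is scored with model.score(X, y).
result = permutation_importance(
    model, X_test, y_test, n_repeats=15, random_state=42
)

# Rank features by the mean drop in score after shuffling each column.
ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"{feature_names[idx]}: {mean:.3f} +/- {std:.3f}")
```

`result.importances` holds one row per feature and one column per repeat, which is handy if you want box plots for a paper rather than just the mean.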
-
I am running a Random Survival Forest (RSF) analysis with the data split into train, validation, and test sets. I can compute the concordance index, but because of compatibility issues between eli5 and scikit-survival, I cannot compute feature importance.
Things get complicated when survival functions are involved, so how is everyone working around the compatibility issues between eli5 and scikit-survival? Feature importance is essential for writing research papers. Below is the code I used in Google Colab:
!pip install scikit-survival
import pandas as pd
from sklearn.model_selection import train_test_split
from sksurv.ensemble import RandomSurvivalForest
from sklearn.model_selection import GridSearchCV
from sksurv.metrics import concordance_index_censored
import numpy as np
# Load your data
data = pd.read_excel('your_data_file.xlsx')
# Prepare the data: the target is a structured array of (event indicator, time)
target = np.array(
    [(e == 2, t) for e, t in zip(data['Event'], data['DFS'])],
    dtype=[('Event', '?'), ('DFS', '<f8')],
)
features = data.drop(columns=['Event', 'DFS'])
# Split the data: 60% train, 20% validation, 20% test
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)
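# Hyperparameter tuning
param_grid = {
    # 'auto' is left out: it was deprecated and newer releases no longer accept it
    'max_features': ['sqrt', 'log2'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_leaf': [1, 2, 4],
    'n_estimators': [100, 200, 300],
    'min_samples_split': [2, 5, 10]
}
rsf = RandomSurvivalForest(random_state=42)
# No scoring argument: GridSearchCV then uses RandomSurvivalForest.score,
# which returns the concordance index. 'roc_auc' is a classification metric
# and does not work with a structured survival target.
grid_search = GridSearchCV(rsf, param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)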
# Best parameters and model
best_params = grid_search.best_params_
best_rsf = grid_search.best_estimator_
# Evaluation on test set
prediction = best_rsf.predict(X_test)
c_index = concordance_index_censored(y_test['Event'], y_test['DFS'], prediction)
# Output best parameters and concordance index
print("Best Parameters:", best_params)
print("Concordance Index on Test Set:", c_index[0])
!pip install eli5
rsf = RandomSurvivalForest(random_state=42, **best_params)
rsf.fit(X_train, y_train)
# Feature importance (model refit with the best parameters)
import eli5
from eli5.sklearn import PermutationImportance
perm = PermutationImportance(rsf, n_iter=15, random_state=42)
perm.fit(X_test, y_test)
eli5.show_weights(perm, feature_names=X_test.columns.tolist())
# Feature importance (same attempt with best_rsf from the grid search)
import matplotlib.pyplot as plt
import eli5
from eli5.sklearn import PermutationImportance
perm = PermutationImportance(best_rsf, n_iter=15, random_state=42)
perm.fit(X_test, y_test)
eli5.show_weights(perm, feature_names = X_test.columns.tolist())
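For reference, the quantity the eli5 calls above are meant to produce is the drop in concordance index after shuffling one feature column at a time. It can be sketched with NumPy alone. Everything below is an illustration under stated assumptions: the toy data, the linear `predict` function (a hypothetical stand-in for `best_rsf.predict`), and the simplified concordance index (pairs with tied times are skipped) are mine, not scikit-survival's implementation.

```python
import numpy as np


def concordance_index(event, time, risk):
    """Harrell's C on right-censored data (simplified: tied times are skipped)."""
    # Pair (i, j) is comparable when i has an observed event before time j.
    comparable = (time[:, None] < time[None, :]) & event[:, None]
    higher = risk[:, None] > risk[None, :]
    tied = risk[:, None] == risk[None, :]
    concordant = np.sum(comparable & higher) + 0.5 * np.sum(comparable & tied)
    return concordant / np.sum(comparable)


def permutation_importance_cindex(predict, X, event, time, n_repeats=15, seed=42):
    """Drop in C-index per shuffled column; rows are features, columns are repeats."""
    rng = np.random.default_rng(seed)
    baseline = concordance_index(event, time, predict(X))
    drops = np.zeros((X.shape[1], n_repeats))
    for col in range(X.shape[1]):
        for rep in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            permuted = concordance_index(event, time, predict(X_perm))
            drops[col, rep] = baseline - permuted
    return baseline, drops


# Toy data (assumption): risk depends strongly on x0, weakly on x1, not on x2.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
coef = np.array([2.0, 0.5, 0.0])
event_time = rng.exponential(scale=np.exp(-(X @ coef)))
censor_time = rng.exponential(scale=2.0, size=n)
time = np.minimum(event_time, censor_time)
event = event_time <= censor_time  # True where the event was observed


def predict(X):
    # Hypothetical stand-in for rsf.predict: a higher score means higher risk.
    return X @ coef


baseline, drops = permutation_importance_cindex(predict, X, event, time)
print(f"baseline C-index: {baseline:.3f}")
for name, d in zip(["x0", "x1", "x2"], drops):
    print(f"{name}: {d.mean():.3f} +/- {d.std():.3f}")
```

With the data in this post, `predict` would be `best_rsf.predict`, and `event` and `time` would be `y_test['Event']` and `y_test['DFS']`. The shuffling step assumes a NumPy array, so a DataFrame would need adapting, and the pairwise comparison uses memory quadratic in the number of test samples.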