cumulative_dynamic_auc，RuntimeWarning: invalid value encountered in divide true_pos = cumsum_tp / cumsum_tp[-1] #507

yangfan0404 · 2025-01-24T02:52:14Z

yangfan0404
Jan 24, 2025

Dear author,
When using the cumulative_dynamic_auc function to calculate the AUC, I notice that the first time point value returned by the function is NaN.

Upon investigation, I find that there's a RuntimeWarning at line 484 in metrics.py indicating an invalid value encountered in divide: true_pos = cumsum_tp / cumsum_tp[-1], where cumsum_tp[-1] contains a zero value.

The data I'm using is from the public METABRIC dataset. For calculating c-index and IBS (Integrated Brier Score), I use pycox. The code for calculation directly follows the example of LogisticHazard in pycox. Both the training set and test set contain data with events marked as true and false across different time intervals.

I want to know what causes the NaN values in the AUC result and how to solve this problem. If you can help answer this, I would be very grateful.

The complete code is as follows:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn_pandas import DataFrameMapper

import torch
import torchtuples as tt

from pycox.datasets import metabric
from pycox.models import LogisticHazard
from pycox.evaluation import EvalSurv
import sksurv
from sksurv.metrics import cumulative_dynamic_auc


np.random.seed(1234)
_ = torch.manual_seed(123)

df_train = metabric.read_df()
df_test = df_train.sample(frac=0.2)
df_train = df_train.drop(df_test.index)
df_val = df_train.sample(frac=0.2)
df_train = df_train.drop(df_val.index)


cols_standardize = ['x0', 'x1', 'x2', 'x3', 'x8']
cols_leave = ['x4', 'x5', 'x6', 'x7']

standardize = [([col], StandardScaler()) for col in cols_standardize]
leave = [(col, None) for col in cols_leave]

x_mapper = DataFrameMapper(standardize + leave)

x_train = x_mapper.fit_transform(df_train).astype('float32')
x_val = x_mapper.transform(df_val).astype('float32')
x_test = x_mapper.transform(df_test).astype('float32')


num_durations = 10

labtrans = LogisticHazard.label_transform(num_durations)

get_target = lambda df: (df['duration'].values, df['event'].values)


y_train = labtrans.fit_transform(*get_target(df_train))
y_val = labtrans.transform(*get_target(df_val))

train = (x_train, y_train)
val = (x_val, y_val)

durations_test, events_test = get_target(df_test)
durations_train, events_train = get_target(df_train)

in_features = x_train.shape[1]
num_nodes = [32, 32]
out_features = labtrans.out_features
batch_norm = True
dropout = 0.1

net = tt.practical.MLPVanilla(in_features, num_nodes, out_features, batch_norm, dropout)
model = LogisticHazard(net, tt.optim.Adam(0.01), duration_index=labtrans.cuts)

batch_size = 256
epochs = 100
callbacks = [tt.cb.EarlyStopping()]
log = model.fit(x_train, y_train, batch_size, epochs, callbacks, val_data=val)


surv = model.interpolate(10).predict_surv_df(x_test)


ev = EvalSurv(surv, durations_test, events_test, censor_surv='km')
ctd = ev.concordance_td('antolini')
print(f'Cindex Score for test: {ctd}')
# time_grid = np.linspace(durations_test.min(), durations_test.max(), 100)
# IBS = ev.integrated_brier_score(time_grid)
# print(f'IBS Score for test: {IBS}')

start_t = min(durations_test)
end_t = max(durations_test)
times = np.arange(start_t, end_t, ((end_t - start_t) / (model.predict(x_test).squeeze().shape[1])))


def transform_to_struct_array(times, events):
    return sksurv.util.Surv.from_arrays(events, times)


estimate = model.predict(x_test).squeeze()

survival_train = transform_to_struct_array(durations_train, events_train)
survival_test = transform_to_struct_array(durations_test, events_test)


AUC_metric = cumulative_dynamic_auc(
    survival_train,
    survival_test,
    estimate,
    times)

print(f'AUC metric is:{AUC_metric[1]}')

Answered by sebp

Jan 31, 2025

The problem is times, i.e. the time points where the time-dependent ROC is evaluated. The first entry is 0, but the first event in survival_test occurs at time 3.5. Hence, at time point 0 all samples are censored and there is no information which are the cases and which are the controls.

This should be checked in cumulative_dynamic_auc and throw an exception that explains why this happened.

View full answer

sebp · 2025-01-29T19:35:19Z

sebp
Jan 29, 2025
Maintainer

Thanks for the detailed report.

Would it be possible to provide the actual values, you passed to cumulative_dynamic_auc, e.g. as a CSV?

3 replies

yangfan0404 Jan 30, 2025
Author

Thank you for your reply. The four CSV files input into the cumulative_dynamic_auc function are as shown in the attachments. The minimum (end_t) and maximum (start_t) times in the times.csv file are derived from the test data in the survival_test.csv file. The time points are calculated using the code times = np.arange(start_t, end_t, ((end_t - start_t) / 10)). The test code is as follows:

import pandas as pd
import numpy as np
from sksurv.metrics import cumulative_dynamic_auc

# Load times from CSV
times = np.loadtxt('times.csv', delimiter=',')

estimate_df = pd.read_csv('estimate.csv')
estimate = estimate_df.values.astype(np.float32)

# Read survival_train and survival_test from CSV
df_train = pd.read_csv('survival_train.csv')
df_test = pd.read_csv('survival_test.csv')

# Function to convert DataFrame to structured array
def dataframe_to_structured_array(df, dtype):
    return np.array([tuple(x) for x in df.values], dtype=dtype)

# Convert DataFrames to structured arrays
survival_train = dataframe_to_structured_array(df_train, dtype=[('event', '?'), ('time', '<f8')])
survival_test = dataframe_to_structured_array(df_test, dtype=[('event', '?'), ('time', '<f8')])

# Calculate mean_AUC metric
AUC_metric = cumulative_dynamic_auc(survival_train, survival_test, estimate, times)
print(f'mean_AUC metric is:{AUC_metric[1]}')

Attachments:
estimate.csv
survival_test.csv
survival_train.csv
times.csv

sebp Jan 31, 2025
Maintainer

The problem is times, i.e. the time points where the time-dependent ROC is evaluated. The first entry is 0, but the first event in survival_test occurs at time 3.5. Hence, at time point 0 all samples are censored and there is no information which are the cases and which are the controls.

This should be checked in cumulative_dynamic_auc and throw an exception that explains why this happened.

Answer selected by yangfan0404

yangfan0404 Feb 1, 2025
Author

Thank you for your response. I have modified the initial time in the times to the first event time (uncensored) and set the maximum time to the maximum value in the test data. The AUC calculations at each time point and the average AUC are now performing normally. Thank you very much for your assistance.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cumulative_dynamic_auc，RuntimeWarning: invalid value encountered in divide true_pos = cumsum_tp / cumsum_tp[-1] #507

{{title}}

Replies: 1 comment 3 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

cumulative_dynamic_auc，RuntimeWarning: invalid value encountered in divide true_pos = cumsum_tp / cumsum_tp[-1] #507

yangfan0404 Jan 24, 2025

Replies: 1 comment · 3 replies

sebp Jan 29, 2025 Maintainer

yangfan0404 Jan 30, 2025 Author

sebp Jan 31, 2025 Maintainer

yangfan0404 Feb 1, 2025 Author

yangfan0404
Jan 24, 2025

Replies: 1 comment 3 replies

sebp
Jan 29, 2025
Maintainer

yangfan0404 Jan 30, 2025
Author

sebp Jan 31, 2025
Maintainer

yangfan0404 Feb 1, 2025
Author