I just ran into a crash when trying to cluster over 1 million objects with fast_hdbscan. My file contains around 2.5M objects in total, and I am only using two of its columns. I load it into a pandas DataFrame, making sure the columns are float64. If I subsample the dataset to below ~1.1M objects, everything works fine (and fast), but above that point it just crashes. The full error it returns is:
Windows fatal exception: stack overflow
Thread 0x0000d814 (most recent call first):
File "C:\Users\fasen\anaconda3\Lib\site-packages\zmq\utils\garbage.py", line 47 in run
File "C:\Users\fasen\anaconda3\Lib\threading.py", line 1038 in _bootstrap_inner
File "C:\Users\fasen\anaconda3\Lib\threading.py", line 995 in _bootstrap
Main thread:
Current thread 0x00010ba0 (most recent call first):
File "C:\Users\fasen\anaconda3\Lib\site-packages\fast_hdbscan\hdbscan.py", line 168 in fast_hdbscan
File "C:\Users\fasen\anaconda3\Lib\site-packages\fast_hdbscan\hdbscan.py", line 236 in fit
File "c:\users\fasen\documents\universidad\master\1er_semestre\tecniques\p6\code.py", line 91 in <module>
File "C:\Users\fasen\anaconda3\Lib\site-packages\spyder_kernels\py3compat.py", line 356 in compat_exec
File "C:\Users\fasen\anaconda3\Lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 473 in exec_code
File "C:\Users\fasen\anaconda3\Lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 615 in _exec_file
File "C:\Users\fasen\anaconda3\Lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 528 in runfile
File "C:\Users\fasen\AppData\Local\Temp\ipykernel_68656\701512081.py", line 1 in <module>
Restarting kernel...
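For reference, I think a minimal script along these lines hits the same code path (a sketch: random values stand in for the real VR/Vphi columns, and I have not checked whether random points overflow at exactly the same threshold):

import numpy as np
import fast_hdbscan

# Two float64 columns, standing in for the VR/Vphi selection in the full script below
rng = np.random.default_rng(0)
X = rng.normal(size=(1_300_000, 2))

# With the real data, the fit crashes above roughly 1.1M rows and runs fine below
fast_hdbscan.HDBSCAN(min_cluster_size=20).fit(X)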
I also attach the full code:
from sklearnex import patch_sklearn
patch_sklearn()
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import keras
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
import time
import matplotlib as mpl
from matplotlib.colors import Normalize, LogNorm
from sklearn.cluster import DBSCAN, HDBSCAN
import fast_hdbscan
mpl.rcParams['figure.dpi'] = 400
start = time.time()
dataset = pd.read_csv("C:/Users/fasen/Documents/Universidad/master/1er_semestre/tecniques/p6/data_gaia_edr3_reduced.csv",
                      header=0, dtype=np.float64)
n = 1_300_000
X_train = dataset.sample(n)
X_train = X_train[['VR', 'Vphi']]
# =============================================================================
# clustering = DBSCAN(eps=0.5, min_samples=4,algorithm='ball_tree', metric='haversine').fit(X_train)
# DBSCAN_dataset = X_train.copy()
# DBSCAN_dataset.loc[:,'Cluster'] = clustering.labels_
# =============================================================================
start = time.time()
clustering = fast_hdbscan.HDBSCAN(min_cluster_size=20).fit(X_train)
finish = time.time()
print('Computation time for', n, 'samples of the total:', finish - start, 's')
DBSCAN_dataset = X_train.copy()  # name kept from the commented-out DBSCAN run above
DBSCAN_dataset.loc[:, 'Cluster'] = clustering.labels_
DBSCAN_dataset.Cluster.value_counts().to_frame()
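In case it helps with debugging, one workaround that might avoid the crash is running the fit in a thread with a larger stack (a sketch I have not verified; it assumes the overflow comes from deep recursion inside fast_hdbscan exhausting the default ~1 MB Windows thread stack, and the 512 MB figure is arbitrary):

import threading

result = {}

def fit_in_thread():
    # Same fit as above, just executed on the worker thread
    result['model'] = fast_hdbscan.HDBSCAN(min_cluster_size=20).fit(X_train)

# stack_size() must be set before the thread is created
threading.stack_size(512 * 1024 * 1024)
worker = threading.Thread(target=fit_in_thread)
worker.start()
worker.join()

clustering = result['model']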
The file I am using is a copy of the Gaia EDR3 data, in case it is important.
Thanks!!