all the input array dimensions except for the concatenation axis must match exactly when running FinRL_PortfolioOptimizationEnv_Demo with CAC40 Data
#1257
I wanted to run FinRL_PortfolioOptimizationEnv_Demo with the data source changed to the CAC40 here:
Unfortunately, I got an error when calling the train_model method of DRLAgent:
ValueError Traceback (most recent call last)
<ipython-input-50-63f854a52e04> in <cell line: 1>()
----> 1 DRLAgent.train_model(model, episodes=40)
4 frames
/usr/local/lib/python3.10/dist-packages/finrl/agents/portfolio_optimization/models.py in train_model(model, episodes)
78 An instance of the trained model.
79 """
---> 80 model.train(episodes)
81 return model
82
/usr/local/lib/python3.10/dist-packages/finrl/agents/portfolio_optimization/algorithms.py in train(self, episodes)
118
119 # run simulation step
--> 120 next_obs, reward, done, info = self.train_env.step(action)
121
122 # add experience to replay buffer
/usr/local/lib/python3.10/dist-packages/finrl/meta/env_portfolio_optimization/env_portfolio_optimization.py in step(self, actions)
301 # load next state
302 self._time_index += 1
--> 303 self._state, self._info = self._get_state_and_info_from_time_index(
304 self._time_index
305 )
/usr/local/lib/python3.10/dist-packages/finrl/meta/env_portfolio_optimization/env_portfolio_optimization.py in _get_state_and_info_from_time_index(self, time_index)
454 tic_data = tic_data[self._features].to_numpy().T
455 tic_data = tic_data[..., np.newaxis]
--> 456
457 state = tic_data if state is None else np.append(state, tic_data, axis=2)
458 state = state.transpose((0, 2, 1))
/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py in append(arr, values, axis)
5615 arr = arr.ravel()
5616 values = ravel(values)
-> 5617 axis = arr.ndim-1
5618 return concatenate((arr, values), axis=axis)
5619
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 49 and the array at index 1 has size 50
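For readers who want to see the failure in isolation, here is a minimal sketch that reproduces the same `ValueError` with plain NumPy. The shapes are assumptions inferred from the traceback (features x time window x 1 per ticker, stacked along axis 2). The variable names are made up for illustration and are not FinRL's.

```python
import numpy as np

n_features, window = 3, 50  # assumed sizes, chosen to mirror the 49 vs 50 mismatch

# One ticker has a full time window; the other is missing one date.
full_tic = np.zeros((n_features, window, 1))
short_tic = np.zeros((n_features, window - 1, 1))

# Stacking two complete tickers along axis 2 works.
state = np.append(full_tic, full_tic, axis=2)
print(state.shape)  # (3, 50, 2)

# Stacking a short ticker with a complete one fails, because dimension 1
# (the time window) differs between the two arrays.
try:
    np.append(short_tic, full_tic, axis=2)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

This suggests the environment expects every ticker to have a row for every date in the window.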
I found my problem, and it revealed another underlying issue. I will first explain my problem and how I resolved it, then raise a possible problem with how FeatureEngineer processes the data.
My problem was that the data downloaded by YahooDownloader had no entries for certain dates. The missing data did not show up as empty values (NaN); the row for the given date was simply absent. As a result, grouping by date gave 49 data points instead of 50.
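A quick way to spot this kind of gap is to count tickers per date. The sketch below is only an illustration on toy data: it assumes the long format with `date`, `tic` and `close` columns used in the notebook, and the prices are invented.

```python
import pandas as pd

# Toy long-format data; OR.PA has no row at all for 2023-01-03.
df = pd.DataFrame(
    {
        "date": ["2023-01-02", "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-04"],
        "tic": ["AIR.PA", "OR.PA", "AIR.PA", "AIR.PA", "OR.PA"],
        "close": [110.0, 330.0, 111.0, 112.0, 331.0],
    }
)

# Number of distinct tickers present on each date.
n_tickers = df["tic"].nunique()
tickers_per_date = df.groupby("date")["tic"].nunique()

# Dates on which at least one ticker has no row.
incomplete_dates = tickers_per_date[tickers_per_date < n_tickers]
print(incomplete_dates)

# For each incomplete date, list the tickers that are absent.
all_tics = set(df["tic"].unique())
missing = {
    date: sorted(all_tics - set(df.loc[df["date"] == date, "tic"]))
    for date in incomplete_dates.index
}
print(missing)
```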
During my investigation, I also tried to run the Stock_NeurIPS2018_2_Train.ipynb file with CAC40 data and encountered another type of error, but once again related to the data.
When FeatureEngineer processes the data, it executes this code to clean the data:
```python
def clean_data(self, data):
    """
    clean the raw data
    deal with missing values
    reasons: stocks could be delisted, not incorporated at the time step
    :param data: (df) pandas dataframe
    :return: (df) pandas dataframe
    """
    df = data.copy()
    df = df.sort_values(["date", "tic"], ignore_index=True)
    df.index = df.date.factorize()[0]
    merged_closes = df.pivot_table(index="date", columns="tic", values="close")
    merged_closes = merged_closes.dropna(axis=1)
    tics = merged_closes.columns
    df = df[df.tic.isin(tics)]
    # df = data.copy()
    # list_ticker = df["tic"].unique().tolist()
    # only apply to daily level data, need to fix for minute level
    # list_date = list(pd.date_range(df['date'].min(),df['date'].max()).astype(str))
    # combination = list(itertools.product(list_date,list_ticker))
    # df_full = pd.DataFrame(combination,columns=["date","tic"]).merge(df,on=["date","tic"],how="left")
    # df_full = df_full[df_full['date'].isin(df['date'])]
    # df_full = df_full.sort_values(['date','tic'])
    # df_full = df_full.fillna(0)
    return df
```
This removes every ticker that lacks data for any date. That is, if a single data point is missing on a single date, all the data for that ticker is deleted.
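To make the effect concrete, here is a small sketch that applies the same pivot and `dropna(axis=1)` steps to toy data. The tickers and prices are placeholders chosen for illustration.

```python
import pandas as pd

dates = ["2023-01-02", "2023-01-03", "2023-01-04"]
tics = ["AIR.PA", "MC.PA", "OR.PA"]

# Toy data: every (date, ticker) pair exists except OR.PA on 2023-01-03.
rows = [(d, t) for d in dates for t in tics if not (d == "2023-01-03" and t == "OR.PA")]
df = pd.DataFrame(rows, columns=["date", "tic"])
df["close"] = 100.0

# Same steps as in clean_data: pivot to a date x ticker table,
# then drop every column (ticker) that contains a NaN.
merged_closes = df.pivot_table(index="date", columns="tic", values="close")
kept_tics = merged_closes.dropna(axis=1).columns
cleaned = df[df.tic.isin(kept_tics)]

removed_rows = len(df) - len(cleaned)
print(list(kept_tics), removed_rows)
```

One missing date is enough to remove both remaining OR.PA rows from the result.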
I am not familiar with the entire project, which is why I ask the question: why not modify it to delete only the rows that do not have all the data (as in my code above) rather than all the data related to the ticker?
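One possible shape for such a change is sketched below. It is not FinRL code: `drop_incomplete_dates` is a hypothetical helper, and it assumes the same `date`, `tic` and `close` columns. It keeps every ticker and drops only the dates on which some ticker has no close price.

```python
import pandas as pd


def drop_incomplete_dates(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical alternative: keep all tickers, drop dates missing any ticker."""
    df = data.sort_values(["date", "tic"], ignore_index=True)
    merged_closes = df.pivot_table(index="date", columns="tic", values="close")
    # dropna(axis=0) removes rows (dates) with a NaN instead of columns (tickers).
    complete_dates = merged_closes.dropna(axis=0).index
    return df[df["date"].isin(complete_dates)].reset_index(drop=True)


# Toy data: OR.PA has no row for 2023-01-03.
toy = pd.DataFrame(
    {
        "date": ["2023-01-02", "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-04"],
        "tic": ["AIR.PA", "OR.PA", "AIR.PA", "AIR.PA", "OR.PA"],
        "close": [110.0, 330.0, 111.0, 112.0, 331.0],
    }
)
result = drop_incomplete_dates(toy)
print(result)
```

The trade-off is that a ticker listed partway through the period would cause every earlier date to be dropped for all tickers, which may be why the current code removes the ticker instead.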
I have time and, having already investigated the subject, I am willing to make the necessary changes, but I am aware that this may have impacts that I am unaware of on the viability of the model or other aspects.