-
Hi @lamhintai, thanks for using mlforecast. The joins with the dynamic features use a left join, so if you don't provide them for all your series you'll probably get some warnings about nulls, but it should be able to continue with the prediction. Can you provide a small example?
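A minimal sketch of that left-join behavior. This assumes a recent mlforecast where `predict` takes future exogenous values via `X_df` (older releases used a `dynamic_dfs` argument instead), and the toy data, series ids, and the `price` column are all made up for illustration:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from mlforecast import MLForecast

# Two toy series with one future-known exogenous feature, "price".
rng = np.random.RandomState(0)
dates = pd.date_range("2022-01-01", periods=100, freq="D")
df = pd.concat(
    pd.DataFrame({"unique_id": uid, "ds": dates,
                  "y": rng.rand(dates.size), "price": rng.rand(dates.size)})
    for uid in ["id_0", "id_1"]
)

fcst = MLForecast(models=[lgb.LGBMRegressor()], freq="D", lags=[1, 7])
fcst.fit(df, id_col="unique_id", time_col="ds", target_col="y", static_features=[])

# Provide future prices for id_0 only. The left join leaves id_1's price as
# NaN, so warnings about nulls are expected, but LightGBM tolerates NaN
# features and the prediction should still go through.
h = 7
future_dates = pd.date_range(dates[-1] + pd.Timedelta("1D"), periods=h, freq="D")
X_df = pd.DataFrame({"unique_id": "id_0", "ds": future_dates,
                     "price": rng.rand(h)})
preds = fcst.predict(h, X_df=X_df)
```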
-
Hi @jmoralez, I'm having a similar problem. There are indeed warnings about nulls when using LightGBM, but when trying other models, such as RandomForestRegressor or AdaBoostRegressor, an error is raised (example below). I'm thinking about using …

Example of the warning:

Example of the error:
An example of my code:

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from mlforecast import MLForecast
from window_ops.rolling import rolling_mean, rolling_max, rolling_min
from window_ops.expanding import expanding_mean

# `data` is a long-format dataframe with unique_id, ds, y and exogenous columns
train = data.loc[data['ds'] < '2022-01-01'].sort_values(by=["ds", "unique_id"]).reset_index(drop=True)
valid = data.loc[(data['ds'] >= '2022-01-01') & (data['ds'] <= '2022-12-31')].reset_index(drop=True)
h = valid['ds'].nunique()

models = {
    'rf': RandomForestRegressor(n_estimators=10),
    'ada': AdaBoostRegressor(estimator=DecisionTreeRegressor()),
}
model = MLForecast(
    models=models,
    freq="M",
    lags=[3, 6, 12],
    lag_transforms={
        3: [expanding_mean],
        6: [(rolling_mean, 12), (rolling_max, 12), (rolling_min, 12)],
        12: [(rolling_mean, 24), (rolling_max, 24), (rolling_min, 24)],
    },
    target_transforms=[NullImputer()],  # NullImputer is a custom target transform (definition not shown)
    date_features=["month"],
    num_threads=6,
)
model.fit(train, id_col="unique_id", time_col="ds", target_col="y", static_features=[])
p = model.predict(horizon=h)
p = p.merge(valid[["unique_id", "ds", "y"]], on=["unique_id", "ds"], how="inner")
```
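If the error here is sklearn's usual "Input contains NaN" complaint (RandomForestRegressor and AdaBoostRegressor reject NaN inputs, unlike LightGBM), one possible workaround is to wrap each model in a Pipeline that imputes missing feature values first. This is only a sketch against that assumed failure mode, so check it matches your actual traceback:

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# NaNs in the feature matrix (e.g. from missing future exogenous values) are
# imputed before they reach the regressor. A Pipeline is still a valid
# sklearn-style estimator, so it should be usable as an MLForecast model.
models = {
    'rf': make_pipeline(SimpleImputer(strategy='median'),
                        RandomForestRegressor(n_estimators=10)),
    'ada': make_pipeline(SimpleImputer(strategy='median'),
                         AdaBoostRegressor(estimator=DecisionTreeRegressor())),
}
```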
-
Is there any way that we can produce forecasts from a global model (e.g. LightGBM) using mlforecast for just a few specific time series (`id_col` values)?

This case arises when training a global model on several thousand series of mobile network cell site usage. Some sites got decommissioned, and we no longer have the projected exogenous variables (e.g. resource blocks configured) for those `id_col` values. Or think of it as a model forecasting the demand for many products with price as an exogenous variable: eventually some products get discontinued and therefore no longer appear in the future projected price book (dynamic_dfs).

It looks like the use of GroupedArray in the implementation of forecast requires making up future exogenous variables for every `id_col` in order to forecast at all (maybe just for 1 series), even if many of those ids no longer exist in the business sense. (I don't have the experiment Jupyter notebook from work at hand, but I recall receiving a warning like "new must be of size 17538".)
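For what it's worth, a sketch of one possible way out: newer mlforecast releases appear to accept an `ids` argument in `predict` to restrict forecasting to a subset of the training series, together with `X_df` for future exogenous values, so projected values would only be needed for the series that still exist. Both `ids` and `X_df` are assumptions about the current API (the code above uses the older `horizon`/`dynamic_dfs` style), so check `MLForecast.predict`'s signature in your installed version; the data below is made up:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from mlforecast import MLForecast

# Toy setup: one surviving cell site and one decommissioned one.
rng = np.random.RandomState(1)
dates = pd.date_range("2021-01-01", periods=60, freq="D")
train = pd.concat(
    pd.DataFrame({"unique_id": uid, "ds": dates,
                  "y": rng.rand(dates.size),
                  "resource_blocks": rng.rand(dates.size)})
    for uid in ["active_site", "decommissioned_site"]
)

fcst = MLForecast(models=[lgb.LGBMRegressor()], freq="D", lags=[1])
fcst.fit(train, id_col="unique_id", time_col="ds", target_col="y", static_features=[])

# Future exogenous values only for the series we still care about; no need
# to make anything up for the decommissioned site.
h = 5
future = pd.DataFrame({
    "unique_id": "active_site",
    "ds": pd.date_range(dates[-1] + pd.Timedelta("1D"), periods=h, freq="D"),
    "resource_blocks": rng.rand(h),
})
preds = fcst.predict(h, X_df=future, ids=["active_site"])
```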