cv = KFold(n_splits=5, shuffle=False)  # random_state only applies with shuffle=True; recent scikit-learn raises a ValueError otherwise
dataset = Dataset(df=sample_df, target="sales", features=features)
# define the validation scheme and scorer
params = {"objective": "count:poisson", "learning_rate": 0.075, "max_depth": 6, 'n_estimators': 200, 'min_child_weight': 50, "tr...
These lag features turn out to be the most important features in my dataset, according to the gradient-boosting model's feature importances. More information can be found in this notebook, under ‘Generate lag feature new_sales_lag_after12.pickle’.

3. Holiday Boolean features

As mentioned above, I look up ...
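Lag features have to be shifted within each store-item series so that one series' history never leaks into another's. The following pandas sketch builds two lags per series plus a simple holiday flag. The toy data, the `store_item` key, and the holiday list are hypothetical, because the notebook's actual columns and holiday source are not shown here.

```python
import pandas as pd

# Toy daily sales for two hypothetical store-item series
df = pd.DataFrame({
    "store_item": ["A"] * 5 + ["B"] * 5,
    "date": list(pd.date_range("2017-01-01", periods=5)) * 2,
    "sales": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})
df = df.sort_values(["store_item", "date"])

# Lag features: sales 1 and 2 days earlier, shifted within each series only
for lag in (1, 2):
    df[f"sales_lag_{lag}"] = df.groupby("store_item")["sales"].shift(lag)

# Holiday Boolean feature from a hypothetical list of holiday dates
holidays = pd.to_datetime(["2017-01-01"])
df["is_holiday"] = df["date"].isin(holidays)
print(df)
```

The first rows of each series get NaN lags because no earlier history exists; gradient-boosting libraries such as LightGBM and XGBoost can usually consume those missing values directly.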
return create_dataset_part(df, promo_df, cat_features, item_group_mean, store_group_mean, timesteps, first_pred_start, reshape_output, aux_as_tensor, is_train)

Both of them call the function create_dataset_part:

def create_dataset_part(df, promo_df, cat_features, item_group_mean, store_group_mean, ...
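`create_dataset_part` takes a history length (`timesteps`) and a prediction start date (`first_pred_start`). Its body is elided here, so what follows is only a hypothetical sketch of the windowing idea. It assumes a wide frame with one row per store-item series and one column per date. `make_window` and its 16-day default horizon (the length of the Favorita test period) are assumptions, not the original code.

```python
import numpy as np
import pandas as pd


def make_window(wide, first_pred_start, timesteps, horizon=16):
    """Hypothetical helper: take `timesteps` days of history ending the day before
    `first_pred_start` as inputs, and the following `horizon` days as targets."""
    start = first_pred_start - pd.Timedelta(days=timesteps)
    x_cols = pd.date_range(start, periods=timesteps)
    y_cols = pd.date_range(first_pred_start, periods=horizon)
    X = wide[x_cols].to_numpy()  # shape (n_series, timesteps)
    y = wide[y_cols].to_numpy()  # shape (n_series, horizon)
    return X, y


# Toy wide frame: 2 series x 30 days, columns are dates
wide = pd.DataFrame(np.arange(60).reshape(2, 30), columns=pd.date_range("2017-01-01", periods=30))
X, y = make_window(wide, pd.Timestamp("2017-01-21"), timesteps=7, horizon=5)
print(X.shape, y.shape)
```

A promotion frame with the same date columns (like `promo_df`) could be sliced with the same `x_cols`/`y_cols` to produce aligned auxiliary inputs.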
X_test = prepare_dataset(df_2017, promo_2017, date(2017, 8, 16), is_train=False)
X_test2 = prepare_dataset(df_2017_item, promo_2017_item, date(2017, 8, 16), is_train=False, name_prefix='item')
X_test2.index = df_2017_item.index
X_test2 = X_test2.reindex(df_2017.index....
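The last two lines align item-level features (`X_test2`, built from `df_2017_item`) with the store-item rows of `df_2017`. A toy pandas sketch of that alignment follows. The frames, column names, and index level names (`store_nbr`, `item_nbr`) are assumptions for illustration, and because the original call is truncated this is not necessarily the exact reindex the author used.

```python
import pandas as pd

# Store-item level features, indexed by (store_nbr, item_nbr)
idx = pd.MultiIndex.from_tuples([(1, 100), (1, 200), (2, 100)], names=["store_nbr", "item_nbr"])
df_store_item = pd.DataFrame({"sales_mean_7": [1.0, 2.0, 3.0]}, index=idx)

# Item-level aggregates: one row per item_nbr
item_feats = pd.DataFrame({"item_mean_7": [10.0, 20.0]}, index=pd.Index([100, 200], name="item_nbr"))

# Repeat each item's row for every store that sells it, then reuse the store-item index
aligned = item_feats.reindex(df_store_item.index.get_level_values("item_nbr"))
aligned.index = df_store_item.index
combined = pd.concat([df_store_item, aligned], axis=1)
print(combined)
```

`reindex` accepts duplicate labels in the target, which is what lets a single item-level row be copied to several stores.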
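https://www.kaggle.com/c/favorita-grocery-sales-forecasting/overview

Overall, the competition is about forecasting product sales. The organizers provide the sales of each item at each store from 2013 to 2017, and teams must use this data to predict store-item sales over a future period. The number of training samples per year is shown below:

Figure 1: Number of training samples per year ...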
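Counting training rows per year, as in Figure 1, takes a single `groupby` in pandas. The sketch below assumes `date` and `unit_sales` columns as in the competition's train.csv and uses a tiny made-up frame in place of the real file.

```python
import pandas as pd

# Tiny stand-in for train.csv (made-up rows)
train = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2013-06-01", "2014-03-15", "2017-08-15"]),
    "unit_sales": [3.0, 1.0, 2.0, 5.0],
})

# Number of training samples per calendar year
per_year = train.groupby(train["date"].dt.year).size()
print(per_year)
```

On the full file, passing `parse_dates=["date"]` to `pd.read_csv` gives the datetime column this relies on.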
    create              Create a new dataset
    version             Create a new dataset version
    init                Initialize metadata file for dataset creation
    metadata            Download metadata about a dataset
    status              Get the creation status for a dataset

List datasets

usage: kaggle datasets list [-h] [--sort-by SORT_BY] [--min-size MIN_SIZE...
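The `kaggle datasets list` usage above accepts `--sort-by` and `--min-size`. When driving the CLI from Python, one option is to assemble the argument list first. The helper below is hypothetical, only wraps those two flags, and does not call the CLI itself.

```python
import shlex


def kaggle_datasets_list_cmd(sort_by=None, min_size=None):
    """Hypothetical helper: build argv for `kaggle datasets list`."""
    cmd = ["kaggle", "datasets", "list"]
    if sort_by is not None:
        cmd += ["--sort-by", sort_by]
    if min_size is not None:
        cmd += ["--min-size", str(min_size)]  # value passed through unchanged
    return cmd


cmd = kaggle_datasets_list_cmd(sort_by="votes", min_size=1_000_000)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run it, provided the `kaggle` CLI is installed and API credentials are configured. Building a list instead of a shell string avoids quoting problems.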
NYC Property Sales
Description: This dataset is a concatenated and slightly cleaned-up version of the New York City Department of Finance's Rolling Sales dataset.
Download: https://www.kaggle.com/new-york-city/nyc-property-sales

Gas Sensor Array Under Dynamic Gas Mixtures
Description: This data set contains ...
"1C Sales Dataset": {
    "source": "kaggle",
    "name": "competitive-data-science-predict-future-sales",
    "path": "1c_sales_dataset",
    "filename": "competitive-data-science-predict-future-sales.zip"
},
"Montreal Bixi Bike Data": {
    "source": "kaggle",
    ...
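A registry like the one above can drive downloads. This sketch assumes `name` is a Kaggle competition slug (true for competitive-data-science-predict-future-sales, the 1C "Predict Future Sales" competition) and that `path` is relative to a data root. `download_plan` and the `data` root are hypothetical.

```python
import os

# One entry copied from the registry above
DATASETS = {
    "1C Sales Dataset": {
        "source": "kaggle",
        "name": "competitive-data-science-predict-future-sales",
        "path": "1c_sales_dataset",
        "filename": "competitive-data-science-predict-future-sales.zip",
    },
}


def download_plan(key, root="data"):
    """Hypothetical helper: return the CLI command and the expected archive path."""
    entry = DATASETS[key]
    target_dir = os.path.join(root, entry["path"])
    cmd = ["kaggle", "competitions", "download", "-c", entry["name"], "-p", target_dir]
    archive = os.path.join(target_dir, entry["filename"])
    return cmd, archive


cmd, archive = download_plan("1C Sales Dataset")
print(cmd)
print(archive)
```

Checking `os.path.exists(archive)` before running the command is a simple way to skip downloads that already happened.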
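Dataset(vl_x, label=v_y)
estimator = lgb.train(
    lgb_params,
    train_data,
    valid_sets=[train_data, valid_data],
    callbacks=[lgb.log_evaluation(500)],  # verbose_eval was removed in LightGBM 4.0; this logs metrics every 500 rounds
)
return estimator

After building the baseline model, create lag features for the past 7 days, i.e. the columns sales_lag_1 through sales_lag_7. Then test LightGBM on the data with the lag features to evaluate performance...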
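The 7-day lag features and a chronological hold-out can be prepared with pandas before handing the frames to LightGBM. This sketch uses a single made-up series. The 7-day validation window and the `date`/`sales` column names are assumptions.

```python
import numpy as np
import pandas as pd

# Toy single-series daily sales (made-up data)
sales = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=30),
    "sales": np.arange(30, dtype=float),
})

# sales_lag_1 ... sales_lag_7: sales on each of the previous seven days
for lag in range(1, 8):
    sales[f"sales_lag_{lag}"] = sales["sales"].shift(lag)

# The first 7 rows lack a full week of history, so drop them
sales = sales.dropna().reset_index(drop=True)

# Chronological split: the last 7 days are held out for validation
cutoff = sales["date"].max() - pd.Timedelta(days=6)
train = sales[sales["date"] < cutoff]
valid = sales[sales["date"] >= cutoff]

feature_cols = [f"sales_lag_{lag}" for lag in range(1, 8)]
X_train, y_train = train[feature_cols], train["sales"]
X_valid, y_valid = valid[feature_cols], valid["sales"]
print(len(train), len(valid))
```

`X_train`/`y_train` and `X_valid`/`y_valid` are what would be wrapped in `lgb.Dataset(...)` to form the `train_data` and `valid_data` passed to `lgb.train` above. Splitting by date rather than at random keeps future days out of the training set.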
usage: kaggle datasets create [-h] [-p FOLDER] [-u] [-q] [-t] [-r {skip,zip,tar}]

optional arguments:
  -h, --help            show this help message and exit
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special dataset-metadata.json file (https://github.com/...
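The `-p` folder must contain a `dataset-metadata.json` file. `kaggle datasets init -p FOLDER` generates a template for it. The sketch below writes a minimal version directly, with `title`, `id` (`username/dataset-slug`), and a `licenses` list. The helper name, title, and id are placeholders.

```python
import json
import os
import tempfile


def write_dataset_metadata(folder, title, dataset_id, license_name="CC0-1.0"):
    """Hypothetical helper: write a minimal dataset-metadata.json into `folder`."""
    meta = {
        "title": title,
        "id": dataset_id,  # "<username>/<dataset-slug>"
        "licenses": [{"name": license_name}],
    }
    path = os.path.join(folder, "dataset-metadata.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(meta, f, indent=2)
    return path


# Demo in a throwaway folder; in practice this is the folder holding the data files
folder = tempfile.mkdtemp()
path = write_dataset_metadata(folder, "My Sales Features", "your-username/my-sales-features")
print(open(path, encoding="utf-8").read())
```

With the data files and this metadata file in place, `kaggle datasets create -p FOLDER` uploads the folder as a new dataset.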