Added multi_predict() method for catboost engine, enabling efficient tuning of the trees parameter (#115).
catboost tuning parameters were updated (#119):
mtry; it used the parsnip name mtry_prop.
min_n, sample_size, and stop_iter. min_n is only enabled for GPU computations.
boost_tree() (#70).
The lightgbm engine now warns that arguments passed to the params argument in set_engine() are ignored (#110).
Automatic handling of num_classes argument when specifying a multiclass classification objective for the lightgbm engine (#109).
Increased the minimum R version to R 4.1.
num_threads argument was ignored for the lightgbm engine (#105).
Resolves a test failure ahead of an upcoming parsnip release (#95).
lightgbm models can now accept sparse matrices for training and prediction (#91).
"aorsf" models would not successfully fit in socket cluster workers (i.e. with plan(multisession)) unless another engine requiring bonsai had been fitted in the worker (#85).
Introduced support for accelerated oblique random forests for the "classification" and "regression" modes using the new "aorsf" engine (#78 by @bcjaeger).
Enabled passing dataset parameters to the "lightgbm" engine. To pass an argument that would usually be passed as an element of the params argument in lightgbm::lgb.Dataset(), pass the argument directly through the ellipsis in set_engine(), e.g. boost_tree() %>% set_engine("lightgbm", linear_tree = TRUE) (#77).
Enabled case weights with the "lightgbm" engine (#72 by @p-schaefer).
Fixed issues in metadata for the "partykit" engine for rand_forest() where some engine arguments were mistakenly protected (#74).
Addressed type check error when fitting lightgbm model specifications with arguments mistakenly left as tune() (#79).
num_leaves engine argument! The num_leaves parameter sets the maximum number of nodes per tree, and is an important tuning parameter for lightgbm (tidymodels/dials#256, tidymodels/parsnip#838). With the newest version of each of dials, parsnip, and bonsai installed, tune this argument by marking the num_leaves engine argument for tuning when defining your model specification:
boost_tree() %>% set_engine("lightgbm", num_leaves = tune())
num_threads was overridden when passed via params rather than as a main argument. By default, then, lightgbm will fit sequentially rather than with num_threads = foreach::getDoParWorkers(). The user can still set num_threads via engine arguments with engine = "lightgbm":
boost_tree() %>% set_engine("lightgbm", num_threads = x)
Note that, when tuning hyperparameters with the tune package, detection of the parallel backend will still work as usual.
The boost_tree argument stop_iter now maps to the lightgbm::lgb.train() argument early_stopping_round rather than its alias early_stopping_rounds. This does not affect parsnip's interface to lightgbm (i.e. via boost_tree() %>% set_engine("lightgbm")), though it will introduce errors for code that uses the train_lightgbm() wrapper directly and sets the lightgbm::lgb.train() argument early_stopping_round by its alias early_stopping_rounds via train_lightgbm()'s ....
Disallowed passing main model arguments as engine arguments to set_engine("lightgbm", ...) via aliases. That is, if a main argument is marked for tuning and a lightgbm alias is supplied as an engine argument, bonsai will now error, rather than supplying both to lightgbm and allowing the package to handle aliases. Users can still interface with non-main boost_tree() arguments via their lightgbm aliases (#53).
sample_size argument to boost_tree (#32 and tidymodels/parsnip#768). The following docs, now available in ?details_boost_tree_lightgbm, describe the interface in detail:
The sample_size argument is translated to the bagging_fraction parameter in the params argument of lgb.train. The argument is interpreted by lightgbm as a proportion rather than a count, so bonsai internally reparameterizes the sample_size argument with dials::sample_prop() during tuning.
To effectively enable bagging, the user would also need to set the bagging_freq argument to lightgbm. bagging_freq defaults to 0, which means bagging is disabled, and a bagging_freq argument of k means that the booster will perform bagging at every kth boosting iteration. Thus, by default, the sample_size argument would be ignored without setting this argument manually. Other boosting libraries, like xgboost, do not have an analogous argument to bagging_freq and use k = 1 when the analogue to bagging_fraction is in $(0, 1)$. bonsai will thus automatically set bagging_freq = 1 in set_engine("lightgbm", ...) if sample_size (i.e. bagging_fraction) is not equal to 1 and no bagging_freq value is supplied. This default can be overridden by setting the bagging_freq argument to set_engine() manually.
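As a minimal sketch of the bagging behaviour described above, the two specifications below differ only in whether bagging_freq is supplied. The object names spec_default and spec_custom and the values 0.5 and 5 are illustrative assumptions, not taken from the package documentation; nothing is fitted here, so the automatic bagging_freq = 1 default would only take effect at fit time.

```r
library(parsnip)
library(bonsai)

# sample_size below 1 and no bagging_freq supplied: per the entry above,
# bonsai is expected to set bagging_freq = 1 when the model is fitted.
spec_default <-
  boost_tree(sample_size = 0.5) %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# Overriding the default: request bagging at every 5th boosting iteration.
spec_custom <-
  boost_tree(sample_size = 0.5) %>%
  set_engine("lightgbm", bagging_freq = 5) %>%
  set_mode("regression")
```

Supplying bagging_freq explicitly, as in spec_custom, is only needed when bagging at every iteration is not what you want.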
Corrected mapping of the mtry argument in boost_tree with the lightgbm
engine. mtry previously mapped to the feature_fraction argument to
lgb.train but was documented as mapping to an argument more closely
resembling feature_fraction_bynode. mtry now maps
to feature_fraction_bynode.
This means that code that set feature_fraction_bynode as an argument to
set_engine() will now error, and the user can now pass feature_fraction
to set_engine() without raising an error.
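The corrected mapping might be exercised as in the sketch below, where mtry controls per-node feature sampling (feature_fraction_bynode) and feature_fraction is passed separately as an engine argument for per-tree feature sampling. The object name spec and the values 3 and 0.8 are illustrative assumptions.

```r
library(parsnip)
library(bonsai)

# mtry is the main argument; with the lightgbm engine it now maps to
# feature_fraction_bynode. feature_fraction can be supplied alongside it
# as an engine argument without conflicting with mtry.
spec <-
  boost_tree(mtry = 3) %>%
  set_engine("lightgbm", feature_fraction = 0.8) %>%
  set_mode("classification")
```

Passing feature_fraction_bynode to set_engine() instead would collide with mtry, which is the error case the entry describes.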
Fixed error in lightgbm with engine argument objective = "tweedie" and
response values less than 1.
A number of documentation improvements, increases in testing coverage, and
changes to internals in anticipation of the 4.0.0 release of the lightgbm
package. Thank you to @jameslamb for the effort and expertise!
Initial release!