Title: | Tools for Creating Tuning Parameter Values |
---|---|
Description: | Many models contain tuning parameters (i.e. parameters that cannot be directly estimated from the data). These tools can be used to define objects for creating, simulating, or validating values for such parameters. |
Authors: | Max Kuhn [aut], Hannah Frick [aut, cre], Posit Software, PBC [cph, fnd] |
Maintainer: | Hannah Frick <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.0.9000 |
Built: | 2024-11-18 03:38:29 UTC |
Source: | https://github.com/tidymodels/dials |
Activation functions between network layers
activation(values = values_activation)
activation_2(values = values_activation)
values_activation
values |
A character string of possible values. See |
An object of class character of length 5.
This parameter is used in parsnip models for neural networks such as parsnip::mlp().
values_activation
activation()
This parameter can be used to moderate smoothness of spline or other terms used in generalized additive models.
adjust_deg_free(range = c(0.25, 4), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
Used in parsnip::gen_additive_mod().
adjust_deg_free()
Used in themis::step_bsmote().
all_neighbors(values = c(TRUE, FALSE))
values |
A vector of possible values (TRUE or FALSE). |
all_neighbors()
Parameters for BART models
These parameters are used for constructing Bayesian adaptive regression tree (BART) models.
prior_terminal_node_coef(range = c(0, 1), trans = NULL)
prior_terminal_node_expo(range = c(1, 3), trans = NULL)
prior_outcome_range(range = c(0, 5), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
These parameters are often used with Bayesian adaptive regression trees (BART) via parsnip::bart().
In equivocal zones, predictions are considered equivocal (i.e. "could go either way") if their probability falls within some distance on either side of the classification threshold. That distance is called the "buffer."
buffer(range = c(0, 0.5), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
A buffer of .5 is only possible if the classification threshold is .5. In that case, all probability predictions are considered equivocal, regardless of their value in [0, 1]. Otherwise, the maximum buffer is min(threshold, 1 - threshold).
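The bound on the buffer can be checked with a few lines of base R. This is only a sketch: `max_buffer()` is a hypothetical helper written for illustration and is not part of dials.

```r
# Hypothetical helper (not a dials function): the largest buffer that keeps
# the equivocal zone inside [0, 1] for a given classification threshold.
max_buffer <- function(threshold) {
  stopifnot(all(threshold >= 0), all(threshold <= 1))
  pmin(threshold, 1 - threshold)
}

# With a threshold of 0.5 the buffer can reach 0.5, covering all of [0, 1].
max_buffer(0.5)

# With a threshold of 0.7 the buffer is limited by the distance to 1.
max_buffer(0.7)
```

A buffer sampled from the default range of `buffer()` may therefore need to be capped when the threshold is not .5.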
buffer()
This parameter can be used to moderate how much influence certain classes receive during training.
class_weights(range = c(1, 10), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
Used in brulee::brulee_logistic_reg() and brulee::brulee_mlp().
class_weights()
Possible engine parameters for partykit models.
conditional_min_criterion(
  range = c(1.386294, 15),
  trans = scales::transform_logit()
)
values_test_type
conditional_test_type(values = values_test_type)
values_test_statistic
conditional_test_statistic(values = values_test_statistic)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. |
trans |
A |
values |
A character string of possible values. |
An object of class character of length 4.
An object of class character of length 2.
The range of conditional_min_criterion() is given on the logit scale and corresponds to roughly 0.80 to 0.9999997 in the natural units. For several test types, this parameter corresponds to 1 - {p-value}.
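The correspondence between the transformed and natural units can be checked in base R. The sketch below assumes that the logit transformation is the usual log-odds function, so that `stats::qlogis()` and `stats::plogis()` act as the transformation and its inverse.

```r
# Default bounds of conditional_min_criterion(), on the logit scale.
logit_range <- c(1.386294, 15)

# Back-transform to natural units (values of 1 - p-value).
natural_range <- stats::plogis(logit_range)
natural_range

# Forward check: a criterion of 0.80 maps to log(0.8 / 0.2) = log(4).
stats::qlogis(0.80)
```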
Each of these functions returns an object with classes "param" and either "quant_param" or "qual_param".
These parameters are auxiliary to tree-based models that use the "C5.0" engine. They correspond to tuning parameters that would be specified using set_engine("C5.0", ...).
confidence_factor(range = c(-1, 0), trans = transform_log10())
no_global_pruning(values = c(TRUE, FALSE))
predictor_winnowing(values = c(TRUE, FALSE))
fuzzy_thresholding(values = c(TRUE, FALSE))
rule_bands(range = c(2L, 500L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
values |
For |
To use these, check ?C50::C5.0Control to see how they are used.
confidence_factor()
no_global_pruning()
predictor_winnowing()
fuzzy_thresholding()
rule_bands()
Parameters related to the SVM objective function(s).
cost(range = c(-10, 5), trans = transform_log2())
svm_margin(range = c(0, 0.2), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
cost()
svm_margin()
The number of degrees of freedom used for model parameters.
deg_free(range = c(1L, 5L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
One context in which this parameter is used is spline basis functions.
deg_free()
These parameters help model cases where an exponent is of interest (e.g. degree() or spline_degree()) or a product is used (e.g. prod_degree()).
degree(range = c(1, 3), trans = NULL)
degree_int(range = c(1L, 3L), trans = NULL)
spline_degree(range = c(1L, 10L), trans = NULL)
prod_degree(range = c(1L, 2L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
degree() is helpful for parameters that are real number exponents (e.g. x^degree) whereas degree_int() is for cases where the exponent should be an integer.
The difference between degree_int() and spline_degree() is their default ranges (which are based on the context of how and where they are used).
prod_degree() is used by parsnip::mars() for the number of terms in interactions (and generates an integer).
degree()
degree_int()
spline_degree()
prod_degree()
Used in parsnip::nearest_neighbor().
dist_power(range = c(1, 2), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
This parameter controls how distances are calculated. For example, dist_power = 1 corresponds to Manhattan distance while dist_power = 2 is Euclidean distance.
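The two special cases can be illustrated with a small base R sketch. It assumes that the parameter is the exponent of a Minkowski distance; `minkowski()` is a hypothetical helper written for this illustration and is not a dials or parsnip function.

```r
# Hypothetical helper: Minkowski distance between two numeric vectors.
minkowski <- function(a, b, power) {
  sum(abs(a - b)^power)^(1 / power)
}

a <- c(0, 0)
b <- c(3, 4)

# power = 1: Manhattan distance, |3| + |4|
minkowski(a, b, power = 1)

# power = 2: Euclidean distance, sqrt(3^2 + 4^2)
minkowski(a, b, power = 2)

# Intermediate powers fall between the two
minkowski(a, b, power = 1.5)
```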
dist_power()
These functions generate parameters that are useful for neural network models.
dropout(range = c(0, 1), trans = NULL)
epochs(range = c(10L, 1000L), trans = NULL)
hidden_units(range = c(1L, 10L), trans = NULL)
hidden_units_2(range = c(1L, 10L), trans = NULL)
batch_size(range = c(unknown(), unknown()), trans = transform_log2())
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
dropout(): The parameter dropout rate. (See parsnip::mlp()).
epochs(): The number of iterations of training. (See parsnip::mlp()).
hidden_units(): The number of hidden units in a network layer. (See parsnip::mlp()).
batch_size(): The mini-batch size for neural networks.
dropout()
These parameters are auxiliary to models that use the "Cubist" engine. They correspond to tuning parameters that would be specified using set_engine("Cubist", ...).
extrapolation(range = c(1, 110), trans = NULL)
unbiased_rules(values = c(TRUE, FALSE))
max_rules(range = c(1L, 100L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
values |
For |
To use these, check ?Cubist::cubistControl to see how they are used.
extrapolation()
unbiased_rules()
max_rules()
These functions take a parameter object and modify the unknown parts of ranges based on a data set and simple heuristics.
finalize(object, ...)

## S3 method for class 'list'
finalize(object, x, force = TRUE, ...)

## S3 method for class 'param'
finalize(object, x, force = TRUE, ...)

## S3 method for class 'parameters'
finalize(object, x, force = TRUE, ...)

## S3 method for class 'logical'
finalize(object, x, force = TRUE, ...)

## Default S3 method:
finalize(object, x, force = TRUE, ...)

get_p(object, x, log_vals = FALSE, ...)
get_log_p(object, x, ...)
get_n_frac(object, x, log_vals = FALSE, frac = 1/3, ...)
get_n_frac_range(object, x, log_vals = FALSE, frac = c(1/10, 5/10), ...)
get_n(object, x, log_vals = FALSE, ...)
get_rbf_range(object, x, seed = sample.int(10^5, 1), ...)
get_batch_sizes(object, x, frac = c(1/10, 1/3), ...)
object |
A |
... |
Other arguments to pass to the underlying parameter
finalizer functions. For example, for |
x |
The predictor data. In some cases (see below) this should only include numeric data. |
force |
A single logical: should the ranges be updated even if the parameter object is already complete? |
log_vals |
A logical: should the ranges be set on the log10 scale? |
frac |
A double for the fraction of the data to be used for the upper
bound. For |
seed |
An integer to control the randomness of the calculations. |
finalize() runs the embedded finalizer function contained in the param object (object$finalize) and returns the updated version. The finalization function is one of the get_*() helpers.
The get_*() helper functions are designed to be used with the pipe and update the parameter object in-place.
get_p() and get_log_p() set the upper value of the range to be the number of columns in the data (on the natural and log10 scale, respectively).
get_n() and get_n_frac() set the upper value to be the number of rows in the data or a fraction of the total number of rows.
get_rbf_range() sets both bounds based on the heuristic defined in kernlab::sigest(). It requires that all columns in x be numeric.
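As a minimal sketch of how a helper and finalize() relate, the code below fills in the unknown upper bound of mtry() in two ways. It assumes the dials package is installed and uses the built-in mtcars data with the outcome column removed, which leaves 10 predictors; the object names are chosen only for this illustration.

```r
library(dials)

# Predictor data: mtcars without the outcome column (10 columns).
car_pred <- mtcars[, -1]

# mtry() ships with an unknown upper bound.
has_unknowns(mtry())

# Call the helper directly: the upper bound becomes the number of columns.
mtry_by_helper <- get_p(mtry(), car_pred)
range_get(mtry_by_helper)

# Or let finalize() run the finalizer stored in the parameter object.
mtry_by_finalize <- finalize(mtry(), car_pred)
range_get(mtry_by_finalize)
```

Both routes should give the same range here, since finalize() delegates to the finalizer stored in the object.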
An updated param object or a list of updated param objects depending on what is provided in object.
library(dplyr)
car_pred <- select(mtcars, -mpg)

# Needs an upper bound
mtry()
finalize(mtry(), car_pred)

# Nothing to do here since no unknowns
penalty()
finalize(penalty(), car_pred)

library(kernlab)
library(tibble)
library(purrr)

params <-
  tribble(
    ~parameter, ~object,
    "mtry", mtry(),
    "num_terms", num_terms(),
    "rbf_sigma", rbf_sigma()
  )
params

# Note that `rbf_sigma()` has a default range that does not need to be
# finalized but will be changed if used in the function:
complete_params <-
  params %>%
  mutate(object = map(object, finalize, car_pred))
complete_params

params %>%
  dplyr::filter(parameter == "rbf_sigma") %>%
  pull(object)
complete_params %>%
  dplyr::filter(parameter == "rbf_sigma") %>%
  pull(object)
These parameters control the specificity of the filter for near-zero variance predictors in recipes::step_nzv().
freq_cut(range = c(5, 25), trans = NULL)
unique_cut(range = c(0, 100), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
Smaller values of freq_cut() and unique_cut() make the filter less sensitive.
freq_cut()
unique_cut()
Random and regular grids can be created for any number of parameter objects.
grid_regular(x, ..., levels = 3, original = TRUE, filter = NULL)

## S3 method for class 'parameters'
grid_regular(x, ..., levels = 3, original = TRUE, filter = NULL)

## S3 method for class 'list'
grid_regular(x, ..., levels = 3, original = TRUE, filter = NULL)

## S3 method for class 'param'
grid_regular(x, ..., levels = 3, original = TRUE, filter = NULL)

grid_random(x, ..., size = 5, original = TRUE, filter = NULL)

## S3 method for class 'parameters'
grid_random(x, ..., size = 5, original = TRUE, filter = NULL)

## S3 method for class 'list'
grid_random(x, ..., size = 5, original = TRUE, filter = NULL)

## S3 method for class 'param'
grid_random(x, ..., size = 5, original = TRUE, filter = NULL)
x |
A |
... |
One or more |
levels |
An integer for the number of values of each parameter to use
to make the regular grid. |
original |
A logical: should the parameters be in the original units or in the transformed space (if any)? |
filter |
A logical: should the parameters be filtered prior to generating the grid. Must be a single expression referencing parameter names that evaluates to a logical vector. |
size |
A single integer for the total number of parameter value combinations returned for the random grid. If duplicate combinations are generated from this size, the smaller, unique set is returned. |
Note that there may be a difference in grids depending on how the function is called. If the call uses the parameter objects directly, the possible ranges come from the objects in dials. For example:
mixture()
## Proportion of Lasso Penalty (quantitative)
## Range: [0, 1]
set.seed(283)
mix_grid_1 <- grid_random(mixture(), size = 1000)
range(mix_grid_1$mixture)
## [1] 0.001490161 0.999741096
However, in some cases, the parsnip and recipes packages override the default ranges for specific models and preprocessing steps. If the grid function uses a parameters object created from a model or recipe, the ranges may have different defaults (specific to those models). Using the example above, the range of the mixture argument is different for glmnet models:
library(parsnip)
library(tune)

# When used with glmnet, the range is [0.05, 1.00]
glmn_mod <-
  linear_reg(mixture = tune()) %>%
  set_engine("glmnet")

set.seed(283)
mix_grid_2 <- grid_random(extract_parameter_set_dials(glmn_mod), size = 1000)
range(mix_grid_2$mixture)
## [1] 0.05141565 0.99975404
A tibble. There are columns for each parameter and a row for every parameter combination.
# filter arg will allow you to filter subsequent grid data frame based on some condition.
p <- parameters(penalty(), mixture())
grid_regular(p)
grid_regular(p, filter = penalty <= .01)

# Will fail due to unknowns:
# grid_regular(mtry(), min_n())

grid_regular(penalty(), mixture())
grid_regular(penalty(), mixture(), levels = 3:4)
grid_regular(penalty(), mixture(), levels = c(mixture = 4, penalty = 3))
grid_random(penalty(), mixture())
Experimental designs for computer experiments are used to construct parameter grids that try to cover the parameter space such that no observed combination is unnecessarily close to any other point.
grid_space_filling(x, ..., size = 5, type = "any", original = TRUE)

## S3 method for class 'parameters'
grid_space_filling(
  x,
  ...,
  size = 5,
  type = "any",
  variogram_range = 0.5,
  iter = 1000,
  original = TRUE
)

## S3 method for class 'list'
grid_space_filling(
  x,
  ...,
  size = 5,
  type = "any",
  variogram_range = 0.5,
  iter = 1000,
  original = TRUE
)

## S3 method for class 'param'
grid_space_filling(
  x,
  ...,
  size = 5,
  variogram_range = 0.5,
  iter = 1000,
  type = "any",
  original = TRUE
)
x |
A |
... |
One or more |
size |
A single integer for the maximum number of parameter value combinations returned. If duplicate combinations are generated from this size, the smaller, unique set is returned. |
type |
A character string with possible values: |
original |
A logical: should the parameters be in the original units or in the transformed space (if any)? |
variogram_range |
A numeric value greater than zero. Larger values
reduce the likelihood of empty regions in the parameter space. Only used
for |
iter |
An integer for the maximum number of iterations used to find
a good design. Only used for |
The types of designs supported here are latin hypercube designs of different types. The simple designs produced by grid_latin_hypercube() are space-filling but don't guarantee or optimize any other properties. grid_space_filling() might be able to produce designs that discourage grid points from being close to one another. There are a lot of methods for doing this, such as maximizing the minimum distance between points (see Husslage et al, 2011). grid_max_entropy() attempts to maximize the determinant of the spatial correlation matrix between coordinates.
Latin hypercube and maximum entropy designs use random numbers to make the designs.
By default, grid_space_filling() will try to use a pre-optimized space-filling design from https://www.spacefillingdesigns.nl/ (see Husslage et al, 2011) or a uniform design. If no pre-made design is available, then a maximum entropy design is created.
Also note that there may be a difference in grids depending on how the function is called. If the call uses the parameter objects directly, the possible ranges come from the objects in dials. For example:
mixture()
## Proportion of Lasso Penalty (quantitative)
## Range: [0, 1]
set.seed(283)
mix_grid_1 <- grid_latin_hypercube(mixture(), size = 1000)
range(mix_grid_1$mixture)
## [1] 0.0001530482 0.9999530388
However, in some cases, the parsnip and recipes packages override the default ranges for specific models and preprocessing steps. If the grid function uses a parameters object created from a model or recipe, the ranges may have different defaults (specific to those models). Using the example above, the range of the mixture argument is different for glmnet models:
library(parsnip)
library(tune)

# When used with glmnet, the range is [0.05, 1.00]
glmn_mod <-
  linear_reg(mixture = tune()) %>%
  set_engine("glmnet")

set.seed(283)
mix_grid_2 <-
  glmn_mod %>%
  extract_parameter_set_dials() %>%
  grid_latin_hypercube(size = 1000)
range(mix_grid_2$mixture)
## [1] 0.0501454 0.9999554
Sacks, Jerome, Welch, William, Mitchell, Toby J., and Wynn, Henry. (1989). Design and analysis of computer experiments. With comments and a rejoinder by the authors. Statistical Science, 4. 10.1214/ss/1177012413.
Santner, Thomas, Williams, Brian, and Notz, William. (2003). The Design and Analysis of Computer Experiments. Springer.
Dupuy, D., Helbert, C., and Franco, J. (2015). DiceDesign and DiceEval: Two R packages for design and analysis of computer experiments. Journal of Statistical Software, 65(11).
Husslage, B. G., Rennen, G., Van Dam, E. R., and Den Hertog, D. (2011). Space-filling Latin hypercube designs for computer experiments. Optimization and Engineering, 12, 611-630.
Fang, K. T., Lin, D. K., Winker, P., and Zhang, Y. (2000). Uniform design: Theory and application. Technometrics, 42(3), 237-248.
grid_space_filling(
  hidden_units(),
  penalty(),
  epochs(),
  activation(),
  learn_rate(c(0, 1), trans = scales::transform_log()),
  size = 10,
  original = FALSE
)

# ------------------------------------------------------------------------------
# comparing methods

if (rlang::is_installed("ggplot2")) {
  library(dplyr)
  library(ggplot2)

  set.seed(383)
  parameters(trees(), mixture()) %>%
    grid_space_filling(size = 25, type = "latin_hypercube") %>%
    ggplot(aes(trees, mixture)) +
    geom_point() +
    lims(y = 0:1, x = c(1, 2000)) +
    ggtitle("latin hypercube")

  set.seed(383)
  parameters(trees(), mixture()) %>%
    grid_space_filling(size = 25, type = "max_entropy") %>%
    ggplot(aes(trees, mixture)) +
    geom_point() +
    lims(y = 0:1, x = c(1, 2000)) +
    ggtitle("maximum entropy")

  parameters(trees(), mixture()) %>%
    grid_space_filling(size = 25, type = "audze_eglais") %>%
    ggplot(aes(trees, mixture)) +
    geom_point() +
    lims(y = 0:1, x = c(1, 2000)) +
    ggtitle("Audze-Eglais")

  parameters(trees(), mixture()) %>%
    grid_space_filling(size = 25, type = "uniform") %>%
    ggplot(aes(trees, mixture)) +
    geom_point() +
    lims(y = 0:1, x = c(1, 2000)) +
    ggtitle("uniform")
}
Used in recipes::step_harmonic().
harmonic_frequency(range = c(0.01, 1), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
harmonic_frequency()
This parameter is the type of initialization for the UMAP coordinates. Can be one of "spectral", "normlaplacian", "random", "lvrandom", "laplacian", "pca", "spca", or "agspectral". See uwot::umap() for more details.
initial_umap(values = values_initial_umap)
values_initial_umap
values |
A character string of possible values. See |
An object of class character of length 8.
This parameter is used in recipes via embed::step_umap().
values_initial_umap
initial_umap()
Laplace correction for smoothing low-frequency counts.
Laplace(range = c(0, 3), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
This parameter is often used to correct for zero-count data in tables or proportions.
A function with classes "quant_param" and "param".
Laplace()
The parameter is used in boosting methods (parsnip::boost_tree()) or some types of neural network optimization methods.
learn_rate(range = c(-10, -1), trans = transform_log10())
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
The parameter is used on the log10 scale. The units for the range argument are on this scale.
learn_rate() corresponds to eta in xgboost.
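A short sketch of what the log10 units mean in practice, assuming the dials package is installed. It reads the default bounds of learn_rate() back in both transformed and natural units with range_get(); the object names are chosen only for this illustration.

```r
library(dials)

lr <- learn_rate()

# Bounds in the transformed (log10) units, as supplied to `range`.
lr_log10 <- range_get(lr, original = FALSE)
lr_log10

# The same bounds in natural units, i.e. 10^-10 and 10^-1.
lr_natural <- range_get(lr)
lr_natural

# A narrower custom range is also given in log10 units:
# this one spans learning rates from 0.001 to 0.1.
lr_narrow <- learn_rate(range = c(-3, -1))
range_get(lr_narrow)
```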
learn_rate()
These parameters are auxiliary to random forest models that use the "randomForest" engine. They correspond to tuning parameters that would be specified using set_engine("randomForest", ...).
max_nodes(range = c(100L, 10000L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
max_nodes()
These parameters are auxiliary to models that use the "earth" engine. They correspond to tuning parameters that would be specified using set_engine("earth", ...).
max_num_terms(range = c(20L, 200L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
To use these, check ?earth::earth to see how they are used.
max_num_terms()
Used in textrecipes::step_tokenfilter().
max_times(range = c(1L, as.integer(10^5)), trans = NULL)
min_times(range = c(0L, 1000L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
max_times()
min_times()
Used in textrecipes::step_tokenfilter().
max_tokens(range = c(0L, as.integer(10^3)), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
max_tokens()
Used in embed::step_umap().
min_dist(range = c(-4, 0), trans = transform_log10())
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
min_dist()
Some pre-processing parameters require a minimum number of unique data points to proceed. Used in recipes::step_discretize().
min_unique(range = c(5L, 15L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
min_unique()
A numeric parameter function representing the relative amount of penalties (e.g. L1, L2, etc) in regularized models.
mixture(range = c(0, 1), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
This parameter is used for regularized or penalized models such as parsnip::linear_reg(), parsnip::logistic_reg(), and others. It is formulated as the proportion of L1 regularization (i.e. lasso) in the model. In the glmnet model, mixture = 1 is a pure lasso model while mixture = 0 indicates that ridge regression is being used.
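To make the two endpoints concrete, the sketch below builds a small regular grid over the default range of mixture(). It assumes the dials package is installed; the labels in the comments follow the glmnet interpretation described above.

```r
library(dials)

# Three evenly spaced values across the default range [0, 1].
mix_grid <- grid_regular(mixture(), levels = 3)
mix_grid

# The endpoints are the pure ridge (0) and pure lasso (1) models;
# the midpoint is an equal blend of the two penalties.
mix_grid$mixture
```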
mixture()
A useful parameter for neural network models trained with gradient descent.
momentum(range = c(0, 1), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
momentum()
The number of predictors that will be randomly sampled at each split when creating tree models.
mtry(range = c(1L, unknown()), trans = NULL)
mtry_long(range = c(0L, unknown()), trans = transform_log10())
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
This parameter is used for tree-based models such as
parsnip::rand_forest()
and others. mtry_long()
has the values on the
log10 scale and is helpful when the data contain a large number of predictors.
Since the scale of the parameter depends on the number of columns in the
data set, the upper bound is set to unknown
but can be filled in via the
finalize()
method.
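A minimal sketch of filling in the unknown upper bound is shown here, assuming the dials package is installed. Treating the first column of `mtcars` as the outcome is an arbitrary choice made only for illustration.

```r
library(dials)

# The upper bound of mtry() is unknown until data are supplied.
has_unknowns(mtry())

# Assumption for illustration: the first column of mtcars (mpg) is the
# outcome, so only the remaining columns count as predictors.
predictors <- mtcars[, -1]

# finalize() sets the upper bound from the number of predictor columns.
mtry_final <- finalize(mtry(), predictors)

has_unknowns(mtry_final)
range_get(mtry_final)
```

Passing only the predictor columns matters: if the outcome column were included, the upper bound would be one larger than the number of predictors.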
mtry_prop()
is a variation on mtry()
where the value is
interpreted as the proportion of predictors that will be randomly sampled
at each split rather than the count.
This parameter is not intended for use in accommodating engines that take in
this argument as a proportion; mtry
is often a main model argument
rather than an engine-specific argument, and thus should not have an
engine-specific interface.
When wrapping modeling engines that interpret mtry
in its sense as a
proportion, use the mtry()
parameter in parsnip::set_model_arg()
and
process the passed argument in an internal wrapping function as
mtry / number_of_predictors
. In addition, introduce a logical argument
counts
to the wrapping function, defaulting to TRUE
, that indicates
whether to interpret the supplied argument as a count rather than a proportion.
For an example implementation, see parsnip::xgb_train()
.
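The wrapping pattern described above can be sketched in base R as follows. All names here (`wrap_engine()`, `echo_engine()`, `fit_fun`) are hypothetical stand-ins and do not reproduce the actual code in `parsnip::xgb_train()`.

```r
# Hypothetical wrapper (illustrative only). `fit_fun` stands in for an
# engine whose argument expects a proportion of predictors.
wrap_engine <- function(x, y, mtry = NULL, counts = TRUE, fit_fun) {
  p <- ncol(x)
  if (is.null(mtry)) {
    prop <- 1
  } else if (counts) {
    # Supplied as a count: convert to the proportion the engine expects.
    prop <- mtry / p
  } else {
    # Already a proportion: pass it through unchanged.
    prop <- mtry
  }
  if (prop <= 0 || prop > 1) {
    stop("`mtry` is out of range for ", p, " predictors.")
  }
  fit_fun(x, y, prop)
}

# A stand-in "engine" that simply reports the proportion it received.
echo_engine <- function(x, y, prop) prop

x <- mtcars[, -1]  # 10 predictor columns
from_count <- wrap_engine(x, mtcars$mpg, mtry = 5, counts = TRUE, fit_fun = echo_engine)
from_prop <- wrap_engine(x, mtcars$mpg, mtry = 0.5, counts = FALSE, fit_fun = echo_engine)
```

Both calls hand the same proportion to the engine, so the main `mtry` argument keeps its count interpretation while the engine still receives what it expects.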
mtry_prop
mtry(c(1L, 10L)) # in original units
mtry_long(c(0, 5)) # in log10 units
The proportion of predictors that will be randomly sampled at each split when creating tree models.
mtry_prop(range = c(0.1, 1), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
A dials
object with classes "quant_param" and "param". The
range
element of the object is always converted to a list with elements
"lower" and "upper".
mtry_prop()
is a variation on mtry()
where the value is
interpreted as the proportion of predictors that will be randomly sampled
at each split rather than the count.
This parameter is not intended for use in accommodating engines that take in
this argument as a proportion; mtry
is often a main model argument
rather than an engine-specific argument, and thus should not have an
engine-specific interface.
When wrapping modeling engines that interpret mtry
in its sense as a
proportion, use the mtry()
parameter in parsnip::set_model_arg()
and
process the passed argument in an internal wrapping function as
mtry / number_of_predictors
. In addition, introduce a logical argument
counts
to the wrapping function, defaulting to TRUE
, that indicates
whether to interpret the supplied argument as a count rather than a proportion.
For an example implementation, see parsnip::xgb_train()
.
mtry, mtry_long
mtry_prop()
The number of neighbors is used for models (parsnip::nearest_neighbor()
),
imputation (recipes::step_impute_knn()
), and dimension reduction
(recipes::step_isomap()
).
neighbors(range = c(1L, 10L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
A static range is used but a broader range should be used if the data set is large or more neighbors are required.
neighbors()
These functions are used to construct new parameter objects. Generally,
these functions are called from higher level parameter generating functions
like mtry()
.
new_quant_param(
  type = c("double", "integer"),
  range = NULL,
  inclusive = NULL,
  default = deprecated(),
  trans = NULL,
  values = NULL,
  label = NULL,
  finalize = NULL,
  ...,
  call = caller_env()
)
new_qual_param(
  type = c("character", "logical"),
  values,
  default = deprecated(),
  label = NULL,
  finalize = NULL,
  ...,
  call = caller_env()
)
type |
A single character value. For quantitative parameters, valid
choices are |
range |
A two-element vector with the smallest and largest possible
values, respectively. If these cannot be set when the parameter is defined,
the |
inclusive |
A two-element logical vector for whether the range
values should be inclusive or exclusive. If |
default |
No longer used. If a value is supplied, it will be ignored and a warning will be thrown. |
trans |
A |
values |
A vector of possible values that is required when |
label |
An optional named character string that can be used for
printing and plotting. The name should match the object name (e.g.
|
finalize |
A function that can be used to set the data-specific
values of a parameter (such as the |
... |
These dots are for future extensions and must be empty. |
call |
The call passed on to |
An object of class "param"
with the primary class being either
"quant_param"
or "qual_param"
. The range
element of the object
is always converted to a list with elements "lower"
and "upper"
.
# Create a function that generates a quantitative parameter
# corresponding to the number of subgroups.
num_subgroups <- function(range = c(1L, 20L), trans = NULL) {
  new_quant_param(
    type = "integer",
    range = range,
    inclusive = c(TRUE, TRUE),
    trans = trans,
    label = c(num_subgroups = "# Subgroups"),
    finalize = NULL
  )
}
num_subgroups()
num_subgroups(range = c(3L, 5L))
# Custom parameters instantly have access
# to sequence generating functions
value_seq(num_subgroups(), 5)
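A qualitative parameter can be built the same way with `new_qual_param()`. The sketch below assumes dials is installed; the parameter name `linkage_type` and its values are hypothetical and chosen only for illustration.

```r
library(dials)

# Hypothetical qualitative parameter; the name and values are illustrative.
linkage_type <- function(values = c("single", "average", "complete")) {
  new_qual_param(
    type = "character",
    values = values,
    # The label name matches the name of the generating function.
    label = c(linkage_type = "Linkage Type"),
    finalize = NULL
  )
}

linkage_type()

# For qualitative parameters, value_seq() returns the first n values.
first_two <- value_seq(linkage_type(), 2)
first_two
```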
This parameter controls how many bins are used when discretizing predictors.
Used in recipes::step_discretize()
and embed::step_discretize_xgb()
.
num_breaks(range = c(2L, 10L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
num_breaks()
Used in most tidyclust
models.
num_clusters(range = c(1L, 10L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
num_clusters()
The number of derived predictors from models or feature engineering methods.
num_comp(range = c(1L, unknown()), trans = NULL)
num_terms(range = c(1L, unknown()), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
Since the scale of these parameters often depends on the number of columns
in the data set, the upper bound is set to unknown
. For example, the
number of PCA components is limited by the number of columns and so on.
The difference between num_comp()
and num_terms()
is semantics.
num_terms()
num_terms(c(2L, 10L))
Used in textrecipes::step_texthash()
and textrecipes::step_dummy_hash()
.
num_hash(range = c(8L, 12L), trans = transform_log2())
signed_hash(values = c(TRUE, FALSE))
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
values |
A vector of possible values (TRUE or FALSE). |
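Because `num_hash()` uses a log2 transformation, its default range is given in log2 units. The base-R sketch below shows the conversion in both directions; the second range of 64 to 1024 is a made-up example.

```r
# Default num_hash() range, expressed in log2 units.
log2_range <- c(8L, 12L)

# Back-transform to the number of hash columns in natural units.
natural_range <- 2^log2_range
natural_range

# Going the other way: a made-up range of 64 to 1024 columns,
# converted to the log2 units the `range` argument expects.
custom_log2 <- log2(c(64, 1024))
custom_log2
```

So the defaults correspond to between 256 and 4096 hash columns, and a custom range has to be supplied on the log2 scale rather than as raw counts.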
num_hash()
signed_hash()
The number of knots used for spline model parameters.
num_knots(range = c(0L, 5L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
One context in which this parameter is used is spline basis functions.
num_knots()
These parameters are auxiliary to tree-based models that use the "lightgbm"
engine. They correspond to tuning parameters that would be specified using
set_engine("lightgbm", ...)
.
num_leaves(range = c(5, 100), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
"lightgbm" is an available engine in the parsnip extension package bonsai.
For more information, see the lightgbm webpage.
num_leaves()
Used in recipes::step_nnmf()
.
num_runs(range = c(1L, 10L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
num_runs()
Used in textrecipes::step_ngram()
.
num_tokens(range = c(1, 3), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
num_tokens()
For up- and down-sampling methods, these parameters control how much data are
added or removed from the training set. Used in themis::step_rose()
,
themis::step_smotenc()
, themis::step_bsmote()
, themis::step_upsample()
,
themis::step_downsample()
, and themis::step_nearmiss()
.
over_ratio(range = c(0.8, 1.2), trans = NULL)
under_ratio(range = c(0.8, 1.2), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
under_ratio()
over_ratio()
Information on tuning parameters within an object
parameters(x, ...)
## Default S3 method:
parameters(x, ...)
## S3 method for class 'param'
parameters(x, ...)
## S3 method for class 'list'
parameters(x, ...)
x |
An object, such as a list of |
... |
Only used for the |
A numeric parameter function representing the amount of penalties (e.g. L1, L2, etc) in regularized models.
penalty(range = c(-10, 0), trans = transform_log10())
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
This parameter is used for regularized or penalized models such as
parsnip::linear_reg()
, parsnip::logistic_reg()
, and others.
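Since the default range is on the log10 scale, it can help to look at the same grid in both units. This is a minimal sketch assuming dials is installed; the grid size of 3 is arbitrary.

```r
library(dials)

# The default range is c(-10, 0) in log10 units.
log_vals <- value_seq(penalty(), n = 3, original = FALSE)
log_vals

# The same grid back-transformed to the natural units of the penalty.
nat_vals <- value_seq(penalty(), n = 3, original = TRUE)
nat_vals
```

The values are evenly spaced on the log10 scale, so in natural units they span many orders of magnitude.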
penalty()
This parameter is used in models where a tuning parameter is expressed as a proportion of the predictor variables.
predictor_prop(range = c(0, 1), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
predictor_prop()
is used in step_pls()
.
predictor_prop()
A numeric parameter function representing parameters for the spike-and-slab
prior used by embed::step_pca_sparse_bayes()
.
prior_slab_dispersion(range = c(-1/2, log10(3)), trans = transform_log10())
prior_mixture_threshold(range = c(0, 1), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
prior_slab_dispersion()
is related to the prior for the case where a PCA
loading is selected (i.e. non-zero). Smaller values result in an increase in
zero coefficients.
prior_mixture_threshold()
is used to threshold the prior to determine which
parameters are non-zero or zero. Increasing this parameter increases the
number of zero coefficients.
mixture()
MARS pruning methods
prune_method(values = values_prune_method)
values_prune_method
values |
A character string of possible values. See |
An object of class character
of length 6.
This parameter is used in parsnip::mars()
.
values_prune_method
prune_method()
Range limits truncate model predictions to a specific range of values, typically to avoid extreme or unrealistic predictions.
lower_limit(range = c(-Inf, Inf), trans = NULL)
upper_limit(range = c(-Inf, Inf), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
lower_limit()
upper_limit()
Setters, getters, and validators for parameter ranges.
range_validate(object, range, ukn_ok = TRUE, ..., call = caller_env())
range_get(object, original = TRUE)
range_set(object, range)
object |
An object with class |
range |
A two-element numeric vector or list (including |
ukn_ok |
A single logical for whether |
... |
These dots are for future extensions and must be empty. |
call |
The call passed on to |
original |
A single logical. Should the range values be in the natural
units ( |
range_validate()
returns the new range if it passes the validation
process (and throws an error otherwise).
range_get()
returns the current range of the object.
range_set()
returns an updated version of the parameter object with
a new range.
library(dplyr)
my_lambda <- penalty() %>%
  value_set(-4:-1)
try(
  range_validate(my_lambda, c(-10, NA)),
  silent = TRUE
) %>%
  print()
range_get(my_lambda)
my_lambda %>%
  range_set(c(-10, 2)) %>%
  range_get()
Parameters related to the radial basis or other kernel functions.
rbf_sigma(range = c(-10, 0), trans = transform_log10())
scale_factor(range = c(-10, -1), trans = transform_log10())
kernel_offset(range = c(0, 2), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
degree()
can also be used in kernel functions.
rbf_sigma()
scale_factor()
kernel_offset()
These parameters are auxiliary to random forest models that use the "ranger"
engine. They correspond to tuning parameters that would be specified using
set_engine("ranger", ...)
.
regularization_factor(range = c(0, 1), trans = NULL)
regularize_depth(values = c(TRUE, FALSE))
significance_threshold(range = c(-10, 0), trans = transform_log10())
lower_quantile(range = c(0, 1), trans = NULL)
splitting_rule(values = ranger_split_rules)
ranger_class_rules
ranger_reg_rules
ranger_split_rules
num_random_splits(range = c(1L, 15L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
values |
For |
An object of class character
of length 3.
An object of class character
of length 4.
An object of class character
of length 7.
To use these, check ?ranger::ranger
to see how they are used. Some are
conditional on others. For example, significance_threshold()
,
num_random_splits()
, and others are only used when
splitting_rule = "extratrees"
.
regularization_factor()
regularize_depth()
Estimation methods for regularized models
regularization_method(values = values_regularization_method)
values_regularization_method
values |
A character string of possible values. See |
An object of class character
of length 4.
This parameter is used in parsnip::discrim_linear()
.
values_regularization_method
regularization_method()
These parameters are auxiliary to tree-based models that use the "xgboost"
engine. They correspond to tuning parameters that would be specified using
set_engine("xgboost", ...)
.
scale_pos_weight(range = c(0.8, 1.2), trans = NULL)
penalty_L2(range = c(-10, 1), trans = transform_log10())
penalty_L1(range = c(-10, 1), trans = transform_log10())
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
For more information, see the xgboost webpage.
scale_pos_weight()
penalty_L2()
penalty_L1()
Parameters for neural network learning rate schedulers
These parameters are used for constructing neural network models.
rate_initial(range = c(-3, -1), trans = transform_log10())
rate_largest(range = c(-1, -1/2), trans = transform_log10())
rate_reduction(range = c(1/5, 1), trans = NULL)
rate_steps(range = c(2, 10), trans = NULL)
rate_step_size(range = c(2, 20), trans = NULL)
rate_decay(range = c(0, 2), trans = NULL)
rate_schedule(values = values_scheduler)
values_scheduler
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
values |
A character string of possible values. See |
An object of class character
of length 5.
These parameters are often used with neural networks via
parsnip::mlp(engine = "brulee")
.
The details for how the brulee schedulers change the rates:
schedule_decay_time()
:
schedule_decay_expo()
:
schedule_step()
:
schedule_cyclic()
: ,
, and
Used in parsnip::gen_additive_mod()
.
select_features(values = c(TRUE, FALSE))
values |
A vector of possible values (TRUE or FALSE). |
select_features()
These functions can be used to optimize engine-specific parameters of
sda::sda()
via parsnip::discrim_linear()
.
shrinkage_correlation(range = c(0, 1), trans = NULL)
shrinkage_variance(range = c(0, 1), trans = NULL)
shrinkage_frequencies(range = c(0, 1), trans = NULL)
diagonal_covariance(values = c(TRUE, FALSE))
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
values |
A vector of possible values (TRUE or FALSE). |
These functions map to sda::sda()
arguments via:
shrinkage_correlation()
to lambda
shrinkage_variance()
to lambda.var
shrinkage_frequencies()
to lambda.freqs
diagonal_covariance()
to diagonal
Each of these functions returns an object with classes "param"
and
either "quant_param"
or "qual_param"
.
Used in discrim::naive_Bayes()
.
smoothness(range = c(0.5, 1.5), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
smoothness()
For some models, the effectiveness of the model can decrease as training
iterations continue. stop_iter()
can be used to tune how many iterations
without an improvement in the objective function occur before training should
be halted.
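The stopping rule can be sketched in base R as below. The helper `early_stop_at()` and the loss values are hypothetical, and individual engines may count iterations or define "improvement" differently.

```r
# Hypothetical helper: return the iteration at which training would halt,
# given per-iteration losses (lower is better) and a patience of `stop_iter`
# iterations without improvement.
early_stop_at <- function(losses, stop_iter) {
  best <- Inf
  since_best <- 0L
  for (i in seq_along(losses)) {
    if (losses[i] < best) {
      # New best value: reset the counter.
      best <- losses[i]
      since_best <- 0L
    } else {
      since_best <- since_best + 1L
    }
    if (since_best >= stop_iter) {
      return(i)
    }
  }
  # The rule was never triggered, so training ran to completion.
  length(losses)
}

# Made-up loss trajectory that improves until iteration 6, then stalls.
losses <- c(1.00, 0.80, 0.70, 0.71, 0.72, 0.69, 0.70, 0.70, 0.70, 0.70)

stopped_3 <- early_stop_at(losses, stop_iter = 3L)
stopped_5 <- early_stop_at(losses, stop_iter = 5L)
```

A smaller `stop_iter` halts sooner after the loss stalls, while a larger value tolerates longer plateaus at the cost of extra iterations.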
stop_iter(range = c(3L, 20L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
stop_iter()
This parameter is used in recipes::step_window()
.
summary_stat(values = values_summary_stat)
values_summary_stat
values |
A character string of possible values. See |
An object of class character
of length 8.
values_summary_stat
summary_stat()
Parametric distributions for censored data
surv_dist(values = values_surv_dist)
values_surv_dist
values |
A character string of possible values. See |
An object of class character
of length 6.
This parameter is used in parsnip::survival_reg()
.
values_surv_dist
surv_dist()
Survival Model Link Function
survival_link(values = values_survival_link)
values_survival_link
values |
A character string of possible values.
See |
An object of class character
of length 3.
This parameter is used in parsnip::set_engine('flexsurvspline')
.
values_survival_link
survival_link()
For uwot::umap()
and embed::step_umap()
, this is a weighting factor
between data topology and target topology. A value of 0.0 weights entirely
on data, a value of 1.0 weights entirely on target. The default of 0.5
balances the weighting equally between data and target.
target_weight(range = c(0, 1), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
This parameter is used in recipes
via embed::step_umap()
.
target_weight()
In a number of cases, there are arguments that are threshold values for
data falling between zero and one. For example, recipes::step_other()
and
so on.
threshold(range = c(0, 1), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
threshold()
Token types
token(values = values_token)
values_token
values |
A character string of possible values. See |
An object of class character
of length 12.
This parameter is used in textrecipes::step_tokenize()
.
values_token
token()
These are parameter generating functions that can be used for modeling, especially in conjunction with the parsnip package.
trees(range = c(1L, 2000L), trans = NULL)
min_n(range = c(2L, 40L), trans = NULL)
sample_size(range = c(unknown(), unknown()), trans = NULL)
sample_prop(range = c(1/10, 1), trans = NULL)
loss_reduction(range = c(-10, 1.5), trans = transform_log10())
tree_depth(range = c(1L, 15L), trans = NULL)
prune(values = c(TRUE, FALSE))
cost_complexity(range = c(-10, -1), trans = transform_log10())
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
values |
A vector of possible values ( |
These functions generate parameters that are useful when the model is based on trees or rules.
trees()
: The number of trees contained in a random forest or boosted
ensemble. In the latter case, this is equal to the number of boosting
iterations. (See parsnip::rand_forest()
and parsnip::boost_tree()
).
min_n()
: The minimum number of data points in a node that is required
for the node to be split further. (See parsnip::rand_forest()
and
parsnip::boost_tree()
).
sample_size()
: The size of the data set used for modeling within an
iteration of the modeling algorithm, such as stochastic gradient boosting.
(See parsnip::boost_tree()
).
sample_prop()
: The same as sample_size()
but as a proportion of the
total sample.
loss_reduction()
: The reduction in the loss function required to split
further. (See parsnip::boost_tree()
). This corresponds to gamma
in
xgboost.
tree_depth()
: The maximum depth of the tree (i.e. number of splits).
(See parsnip::boost_tree()
).
prune()
: A logical for whether a tree or set of rules should be pruned.
cost_complexity()
: The cost-complexity parameter in classical CART models.
trees()
min_n()
sample_size()
loss_reduction()
tree_depth()
prune()
cost_complexity()
Used in recipes::step_impute_mean()
.
trim_amount(range = c(0, 0.5), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
trim_amount()
unknown()
creates an expression used to signify that the value will be
specified at a later time.
unknown()
is_unknown(x)
has_unknowns(object)
x |
An object or vector of objects to test for unknown-ness. |
object |
An object of class |
unknown()
returns an expression value for unknown()
.
is_unknown()
returns a vector of logicals as long as x
that are TRUE
if the element of x
is unknown, and FALSE
otherwise.
has_unknowns()
returns a single logical indicating if the range
of a param
object has any unknown values.
# Just returns an expression
unknown()
# Of course, true!
is_unknown(unknown())
# Create a range with a minimum of 1
# and an unknown maximum
range <- c(1, unknown())
range
# The first value is known, the
# second is not
is_unknown(range)
# mtry()'s maximum value is not known at
# creation time
has_unknowns(mtry())
Update a single parameter in a parameter set
## S3 method for class 'parameters'
update(object, ...)
object |
A parameter set. |
... |
One or more unquoted named values separated by commas. The names
should correspond to the |
The modified parameter set.
params <- list(lambda = penalty(), alpha = mixture(), `rand forest` = mtry())
pset <- parameters(params)
pset
update(pset, `rand forest` = finalize(mtry(), mtcars), alpha = mixture(c(.1, .2)))
Used in embed::step_discretize_xgb()
.
validation_set_prop(range = c(0.05, 0.7), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
validation_set_prop()
Setters and validators for parameter values. Additionally, tools for creating sequences of parameter values and for transforming parameter values are provided.
value_validate(object, values, ..., call = caller_env())
value_seq(object, n, original = TRUE)
value_sample(object, n, original = TRUE)
value_transform(object, values)
value_inverse(object, values)
value_set(object, values)
object |
An object with class |
values |
A numeric vector or list (including |
... |
These dots are for future extensions and must be empty. |
call |
The call passed on to |
n |
An integer for the (maximum) number of values to return. In some
cases where a sequence is requested, the result might have less than |
original |
A single logical. Should the range values be in the natural
units ( |
For sequences of integers, the code uses unique(floor(seq(min, max, length.out = n))) and this may generate an uneven set of values shorter than n. This also means that if n is larger than the range of the integers, a smaller set will be generated. For qualitative parameters, the first n values are returned.
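The integer-sequence rule above can be tried directly in base R. This is a minimal sketch rather than the package's internal code path; int_seq() is a hypothetical helper name introduced only for illustration.

```r
# Hypothetical helper mirroring the expression quoted above;
# it is not part of dials.
int_seq <- function(min, max, n) {
  unique(floor(seq(min, max, length.out = n)))
}

# Requesting more values than there are integers in the range
# yields fewer than n results.
short <- int_seq(1, 5, n = 10)
short

# Flooring can also produce unevenly spaced values.
uneven <- int_seq(1, 10, n = 5)
diff(uneven)
```

Both calls only use base R, so the behavior can be checked without loading any package.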
For quantitative parameters, any values contained in the object are sampled with replacement. Otherwise, a sequence of values between the range values is returned. It is possible that fewer than n values are returned.
For qualitative parameters, sampling of the values is conducted with replacement. For quantitative values, a random uniform distribution is used.
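The sampling behavior described above might be explored as follows. This is a small sketch that assumes the dials package is installed; the seed and the sample sizes are arbitrary choices made for illustration.

```r
library(dials)

set.seed(1)  # arbitrary seed, only for reproducibility

# Qualitative parameter: values are drawn with replacement,
# so n may exceed the number of distinct values.
wf <- value_sample(weight_func(), 15)
wf

# Quantitative parameter with no fixed set of values:
# random draws are taken from the parameter's range.
cp <- value_sample(cost_complexity(), 5)
cp
```

Because the qualitative draw is with replacement, asking for 15 values from a set of 10 kernels still returns 15 entries, some of them repeated.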
value_validate() throws an error or silently returns values if they are contained in the values of the object.
value_transform() and value_inverse() return a vector of numeric values.
value_seq() and value_sample() return a vector of values consistent with the type field of object.
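As a rough illustration of the last point, the result of value_seq() can be compared with the kind of parameter it came from. This sketch assumes the dials package is installed; window_size() and weight_func() are simply convenient examples of a numeric and a character parameter.

```r
library(dials)

# A numeric (integer-valued) parameter and a character parameter.
int_vals <- value_seq(window_size(), 3)
chr_vals <- value_seq(weight_func(), 3)

# The returned vectors are expected to follow each parameter's type field.
window_size()$type
class(int_vals)

weight_func()$type
class(chr_vals)
```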
library(dplyr)

penalty() %>% value_set(-4:-1)

# Is a specific value valid?
penalty()
penalty() %>% range_get()
value_validate(penalty(), 17)

# get a sequence of values
cost_complexity()
cost_complexity() %>% value_seq(4)
cost_complexity() %>% value_seq(4, original = FALSE)

on_log_scale <- cost_complexity() %>% value_seq(4, original = FALSE)
nat_units <- value_inverse(cost_complexity(), on_log_scale)
nat_units
value_transform(cost_complexity(), nat_units)

# random values in the range
set.seed(3666)
cost_complexity() %>% value_sample(2)
Used in textrecipes::step_tokenize_sentencepiece() and textrecipes::step_tokenize_bpe().
vocabulary_size(range = c(1000L, 32000L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
vocabulary_size()
"double normalization" when creating token counts
Used in textrecipes::step_tf().
weight(range = c(-10, 0), trans = transform_log10())
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
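Since the default range of weight() is given in transformed units, it may help to see both scales side by side. The following is a sketch that assumes the dials package is installed and uses the defaults shown in the usage above (range c(-10, 0) with a log10 transformation).

```r
library(dials)

w <- weight()

# The stored range, in transformed (log10) units.
range_get(w, original = FALSE)

# Sequences come back in natural units by default ...
nat <- value_seq(w, 3)
nat

# ... or in transformed units when original = FALSE.
log_units <- value_seq(w, 3, original = FALSE)
log_units
```

Under these defaults, a range of c(-10, 0) corresponds to natural-unit values between 1e-10 and 1.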
weight()
Kernel functions for distance weighting
weight_func(values = values_weight_func)
values_weight_func
values |
A character string of possible values. See |
An object of class character of length 10.
This parameter is used in parsnip::nearest_neighbor().
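The values argument can also be used to narrow the set of candidate kernels. This is a minimal sketch that assumes the dials package is installed; picking the first three defaults is an arbitrary choice for illustration.

```r
library(dials)

# Restrict the candidate kernels to the first three defaults.
wf <- weight_func(values = values_weight_func[1:3])
wf

# For qualitative parameters, value_seq() returns the first n values.
first_two <- value_seq(wf, 2)
first_two
```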
values_weight_func
weight_func()
Term frequency weighting methods
weight_scheme(values = values_weight_scheme)
values_weight_scheme
values |
A character string of possible values. See |
An object of class character of length 5.
This parameter is used in textrecipes::step_tf().
values_weight_scheme
weight_scheme()
Used in recipes::step_window() and recipes::step_impute_roll().
window_size(range = c(3L, 11L), trans = NULL)
range |
A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units. |
trans |
A |
window_size()