Title: | Desirability Functions for Multiparameter Optimization |
---|---|
Description: | In-line functions for multivariate optimization via desirability functions (Derringer and Suich, 1980, <doi:10.1080/00224065.1980.11980968>) with easy use within `dplyr` pipelines. |
Authors: | Max Kuhn [aut, cre], Posit Software, PBC [cph, fnd] |
Maintainer: | Max Kuhn |
License: | MIT + file LICENSE |
Version: | 0.0.1.9000 |
Built: | 2024-11-22 06:08:12 UTC |
Source: | https://github.com/tidymodels/desirability2 |
These data are a variation of a case study at tidymodels.org where a penalized regression model was used for a binary classification task. The outcome metrics in classification_results are the areas under the ROC and PR curves, the log-likelihood, and the number of predictors selected for a given amount of penalization. Two tuning parameters, mixture and penalty, were varied across 300 conditions.
classification_results | a tibble |
See the example-data directory in the package for code that is a variation of the analysis shown at https://www.tidymodels.org/start/case-study/.
data(classification_results)
Once desirability columns have been created, determine the overall desirability using a mean (geometric by default).
d_overall(..., geometric = TRUE, tolerance = 0)
... | One or more unquoted expressions separated by commas. To choose multiple columns using selectors, wrap them in `across()` (see the examples below). |
geometric |
A logical for whether the geometric or arithmetic mean should be used to summarize the columns. |
tolerance | A single numeric value; desirability values strictly less than this value are set to this value. Because a single zero makes the geometric mean zero, users who wish to use the geometric mean without completely excluding such settings can use a value greater than zero. |
A numeric vector.
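The summary that d_overall() describes (a row-wise geometric or arithmetic mean, with a floor set by tolerance) can be illustrated in base R. The helper below, overall_sketch(), is hypothetical and written only for illustration; it is not the package's implementation and assumes each desirability input is a numeric vector on [0, 1].

```r
# Hypothetical helper illustrating an overall desirability; not the
# desirability2 implementation of d_overall().
overall_sketch <- function(..., geometric = TRUE, tolerance = 0) {
  # One row per candidate, one column per desirability vector.
  d <- cbind(...)
  # Values strictly below the tolerance are set to the tolerance.
  d[d < tolerance] <- tolerance
  if (geometric) {
    # Row-wise geometric mean: any zero drives the row to zero.
    apply(d, 1, function(row) prod(row)^(1 / length(row)))
  } else {
    # Row-wise arithmetic mean.
    rowMeans(d)
  }
}

d_feat <- c(1.0, 0.5, 0.0)
d_roc  <- c(0.8, 0.5, 0.9)

overall_sketch(d_feat, d_roc)                     # third candidate scores 0
overall_sketch(d_feat, d_roc, tolerance = 0.01)   # third candidate is small but positive
overall_sketch(d_feat, d_roc, geometric = FALSE)  # 0.90 0.50 0.45
```

The geometric mean acts as a veto: one completely undesirable metric removes the candidate, which is why a small positive tolerance is useful when that is too strict.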
library(desirability2)
library(dplyr)

# Choose model tuning parameters that minimize the number of predictors used
# while maximizing the area under the ROC curve.
classification_results %>%
  mutate(
    d_feat = d_min(num_features, 1, 200),
    d_roc  = d_max(roc_auc, 0.5, 0.9),
    d_all  = d_overall(across(starts_with("d_")))
  ) %>%
  arrange(desc(d_all))

# Bias the ranking toward minimizing features by using a larger scale.
classification_results %>%
  mutate(
    d_feat = d_min(num_features, 1, 200, scale = 3),
    d_roc  = d_max(roc_auc, 0.5, 0.9),
    d_all  = d_overall(across(starts_with("d_")))
  ) %>%
  arrange(desc(d_all))
Desirability functions map some input to a [0, 1] scale where zero is unacceptable and one is most desirable. The mapping depends on the situation. For example, d_max() increases desirability with the input while d_min() does the opposite. See the plots in the examples below for more cases.
Currently, only the desirability functions defined by Derringer and Suich (1980) are implemented.
d_max(x, low, high, scale = 1, missing = NA_real_, use_data = FALSE)

d_min(x, low, high, scale = 1, missing = NA_real_, use_data = FALSE)

d_target(
  x,
  low,
  target,
  high,
  scale_low = 1,
  scale_high = 1,
  missing = NA_real_,
  use_data = FALSE
)

d_box(x, low, high, missing = NA_real_, use_data = FALSE)

d_custom(x, x_vals, desirability, missing = NA_real_)

d_category(x, categories, missing = NA_real_)
x | A vector of data for which to compute desirability. |
low, high, target | Single numeric values that define the active ranges of desirability. |
scale, scale_low, scale_high | A single numeric value to rescale the desirability function (each should be greater than 0.0). Values > 1.0 make the desirability more difficult to satisfy while smaller values make it easier (see the examples below). |
missing | A single numeric value on [0, 1] (or NA_real_) that is used for missing values of x. |
use_data | Should the low, middle, and/or high values be derived from the data (x) using the minimum, median, and maximum (respectively)? |
x_vals, desirability | Numeric vectors of the same length that define the desirability results at specific values of x. |
categories | A named vector of desirability values that maps all possible categories to specific desirability values. Data that are not included in categories are given the value in missing. |
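The categories argument is a named vector, so the mapping from categories to desirability amounts to indexing that vector by the data values. The base-R sketch below, category_sketch(), is hypothetical and not the package's implementation; giving unmatched data the missing value is an assumption of this sketch.

```r
# Hypothetical sketch of a category-to-desirability lookup; not the
# desirability2 implementation of d_category().
category_sketch <- function(x, categories, missing = NA_real_) {
  # Index the named vector by the data values; unmatched names give NA.
  d <- unname(categories[as.character(x)])
  # Assumption: unmatched or missing data receive the `missing` value.
  d[is.na(d)] <- missing
  d
}

groups <- c(low = 0.2, medium = 0.6, high = 1.0)
category_sketch(c("high", "low", "unknown"), groups)               # 1.0 0.2  NA
category_sketch(c("high", "low", "unknown"), groups, missing = 0)  # 1.0 0.2 0.0
```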
Each function translates the values to desirability on [0, 1].
For d_max():
data > high: d = 1.0
data < low: d = 0.0
low <= data <= high: $d = \left(\frac{data - low}{high - low}\right)^{scale}$
For d_min():
data > high: d = 0.0
data < low: d = 1.0
low <= data <= high: $d = \left(\frac{data - high}{low - high}\right)^{scale}$
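For d_target():
data > high: d = 0.0
data < low: d = 0.0
low <= data <= target: $d = \left(\frac{data - low}{target - low}\right)^{scale\_low}$
target <= data <= high: $d = \left(\frac{data - high}{target - high}\right)^{scale\_high}$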
For d_box():
data > high: d = 0.0
data < low: d = 0.0
low <= data <= high: d = 1.0
For d_category():
data = level: d = 1.0
data != level: d = 0.0
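The numeric mappings of Derringer and Suich (1980) used by d_max(), d_min(), d_target(), and d_box() are piecewise: constant outside [low, high] and a power curve (or a constant, for the box) inside. The base-R functions below (max_sketch(), min_sketch(), target_sketch(), box_sketch()) are hypothetical illustrations of those rules, not the package's implementations, and they omit the missing and use_data arguments.

```r
# Hypothetical base-R sketches of the Derringer-Suich mappings; illustrations
# only, not the desirability2 implementations.

# Larger is better: 0 below `low`, 1 above `high`, a power curve in between.
max_sketch <- function(x, low, high, scale = 1) {
  d <- ((x - low) / (high - low))^scale
  d[x < low] <- 0
  d[x > high] <- 1
  d
}

# Smaller is better: 1 below `low`, 0 above `high`, a power curve in between.
min_sketch <- function(x, low, high, scale = 1) {
  d <- ((x - high) / (low - high))^scale
  d[x < low] <- 1
  d[x > high] <- 0
  d
}

# Target is best: rises from `low` to `target`, falls from `target` to `high`.
target_sketch <- function(x, low, target, high, scale_low = 1, scale_high = 1) {
  d <- ifelse(
    x <= target,
    ((x - low) / (target - low))^scale_low,
    ((x - high) / (target - high))^scale_high
  )
  d[x < low | x > high] <- 0
  d
}

# Box: 1 inside [low, high], 0 outside.
box_sketch <- function(x, low, high) {
  as.numeric(x >= low & x <= high)
}

x <- c(0, 0.1, 0.425, 0.75, 1)
max_sketch(x, 0.1, 0.75)             # 0.0 0.0 0.5 1.0 1.0
max_sketch(x, 0.1, 0.75, scale = 2)  # middle value drops to 0.25 (harder to satisfy)
min_sketch(x, 0.1, 0.75)             # 1.0 1.0 0.5 0.0 0.0
target_sketch(x, 0.1, 0.5, 0.9)      # 0.0000 0.0000 0.8125 0.3750 0.0000
box_sketch(x, 0.25, 0.75)            # 0 0 1 1 0
```

Values outside [low, high] are overwritten after the power is taken, so any NaN produced by raising a negative base to a fractional scale never reaches the result.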
For the sequence of values given to the function, d_custom() returns the desirability values that correspond to data matching values in x_vals. For data that fall between those values, linear interpolation is used.
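Exact matches plus linear interpolation is what base R's stats::approx() computes, so the idea can be sketched with it. The helper custom_sketch() below is hypothetical and not the package's implementation; in particular, rule = 2 (reuse the nearest endpoint for data outside range(x_vals)) is an assumption of this sketch.

```r
# Hypothetical sketch of a custom desirability curve built from (x_vals,
# desirability) pairs; not the desirability2 implementation of d_custom().
custom_sketch <- function(x, x_vals, desirability) {
  # approx() returns the given desirability at each x_vals point and linearly
  # interpolates between neighboring points. rule = 2 (an assumption of this
  # sketch) reuses the nearest endpoint for data outside range(x_vals).
  stats::approx(x_vals, desirability, xout = x, rule = 2)$y
}

v_x <- c(0, 0.5, 1)
v_d <- c(0, 1, 0.5)
custom_sketch(c(0, 0.25, 0.5, 0.75, 2), v_x, v_d)  # 0.00 0.50 1.00 0.75 0.50
```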
By default, most of the d_*() functions require specific user inputs for arguments such as low, target, and high. When use_data = TRUE, the functions can use the minimum, median, and maximum values of the existing data to estimate those values (respectively), but only when users do not specify them.
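The rule "derive a bound from the data only when the user did not supply it" maps naturally onto R's missing(). The helper resolve_bounds() below is hypothetical, shown only to illustrate the minimum/median/maximum substitution described above; it is not how the package is implemented.

```r
# Hypothetical helper showing how unspecified bounds could be filled from the
# data (minimum, median, maximum); not the desirability2 implementation.
resolve_bounds <- function(x, low, target, high) {
  if (missing(low))    low    <- min(x, na.rm = TRUE)
  if (missing(target)) target <- stats::median(x, na.rm = TRUE)
  if (missing(high))   high   <- max(x, na.rm = TRUE)
  c(low = low, target = target, high = high)
}

z <- c(0, 0.2, 0.4, 1)
resolve_bounds(z)             # low 0, target 0.3, high 1
resolve_bounds(z, low = 0.1)  # the user-supplied low is kept
```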
A numeric vector on [0, 1] where larger values are more desirable.
Derringer, G. and Suich, R. (1980), Simultaneous Optimization of Several Response Variables. Journal of Quality Technology, 12, 214-219.
library(desirability2)
library(dplyr)
library(ggplot2)

set.seed(1)
dat <- tibble(x = sort(runif(30)), y = sort(runif(30)))

d_max(dat$x[1:10], 0.1, 0.75)

dat %>% mutate(d_x = d_max(x, 0.1, 0.75))

set.seed(2)
tibble(z = sort(runif(100))) %>%
  mutate(
    no_scale = d_max(z, 0.1, 0.75),
    easier   = d_max(z, 0.1, 0.75, scale = 1/2)
  ) %>%
  ggplot(aes(x = z)) +
  geom_point(aes(y = no_scale)) +
  geom_line(aes(y = no_scale), alpha = .5) +
  geom_point(aes(y = easier), col = "blue") +
  geom_line(aes(y = easier), col = "blue", alpha = .5) +
  lims(x = 0:1, y = 0:1) +
  coord_fixed() +
  ylab("Desirability")

# ------------------------------------------------------------------------------
# Target example

dat %>%
  mutate(
    triangle = d_target(x, 0.1, 0.5, 0.9, scale_low = 2, scale_high = 1/2)
  ) %>%
  ggplot(aes(x = x, y = triangle)) +
  geom_point() +
  geom_line(alpha = .5) +
  lims(x = 0:1, y = 0:1) +
  coord_fixed() +
  ylab("Desirability")

# ------------------------------------------------------------------------------
# Box constraints

dat %>%
  mutate(box = d_box(x, 1/4, 3/4)) %>%
  ggplot(aes(x = x, y = box)) +
  geom_point() +
  geom_line(alpha = .5) +
  lims(x = 0:1, y = 0:1) +
  coord_fixed() +
  ylab("Desirability")

# ------------------------------------------------------------------------------
# Custom function

v_x <- seq(0, 1, length.out = 20)
v_d <- 1 - exp(-10 * abs(v_x - .5))

dat %>%
  mutate(v = d_custom(x, v_x, v_d)) %>%
  ggplot(aes(x = x, y = v)) +
  geom_point() +
  geom_line(alpha = .5) +
  lims(x = 0:1, y = 0:1) +
  coord_fixed() +
  ylab("Desirability")

# ------------------------------------------------------------------------------
# Qualitative data

set.seed(3)
groups <- sort(runif(10))
names(groups) <- letters[1:10]

tibble(x = letters[1:7]) %>%
  mutate(d = d_category(x, groups)) %>%
  ggplot(aes(x = x, y = d)) +
  geom_bar(stat = "identity") +
  lims(y = 0:1) +
  ylab("Desirability")

# ------------------------------------------------------------------------------
# Apply the same function to many columns at once (dplyr >= 1.0)

dat %>%
  mutate(across(c(everything()), ~ d_min(., .2, .6), .names = "d_{col}"))

# ------------------------------------------------------------------------------
# Using current data

set.seed(9015)
tibble(z = c(0, sort(runif(20)), 1)) %>%
  mutate(
    user_specified = d_max(z, 0.1, 0.75),
    data_driven    = d_max(z, use_data = TRUE)
  ) %>%
  ggplot(aes(x = z)) +
  geom_point(aes(y = user_specified)) +
  geom_line(aes(y = user_specified), alpha = .5) +
  geom_point(aes(y = data_driven), col = "blue") +
  geom_line(aes(y = data_driven), col = "blue", alpha = .5) +
  lims(x = 0:1, y = 0:1) +
  coord_fixed() +
  ylab("Desirability")