Package 'tailor'

Title: Iterative Steps for Postprocessing Model Predictions
Description: Postprocessors refine predictions output by machine learning models to improve predictive performance or better satisfy distributional limitations. This package introduces 'tailor' objects, which compose iterative adjustments to model predictions. A number of pre-written adjustments are provided with the package, like calibration and equivocal zones, as well as utilities to compose new ones. Tailors are tightly integrated with the 'tidymodels' framework.
Authors: Simon Couch [aut], Hannah Frick [aut], Emil Hvitfeldt [aut], Max Kuhn [aut, cre], Posit Software, PBC [cph, fnd]
Maintainer: Max Kuhn <[email protected]>
License: MIT + file LICENSE
Version: 0.0.0.9001
Built: 2024-09-24 16:18:15 UTC
Source: https://github.com/tidymodels/tailor

Help Index


Apply an equivocal zone to a binary classification model

Description

Equivocal zones describe intervals of predicted probabilities that are deemed too uncertain or ambiguous to be assigned a hard class. Rather than predicting a hard class when the probability is very close to a threshold, tailors using this adjustment predict "[EQ]".
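
Concretely, a prediction is marked equivocal when its event probability falls within threshold ± value. A minimal base-R sketch of that logic (illustrative only, not the package's internal implementation):

# probabilities near the 0.5 threshold, with a buffer of 0.1
prob <- c(0.35, 0.48, 0.52, 0.80)
threshold <- 0.5
value <- 0.1
ifelse(
  abs(prob - threshold) <= value,
  "[EQ]",                                       # too close to call
  ifelse(prob >= threshold, "Class1", "Class2") # boundary handling is illustrative
)
#> [1] "Class2" "[EQ]"   "[EQ]"   "Class1"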

Usage

adjust_equivocal_zone(x, value = 0.1, threshold = 1/2)

Arguments

x

A tailor().

value

A numeric value (between zero and 1/2) or hardhat::tune(). The value is the size of the buffer around the threshold.

threshold

A numeric value (between zero and one) or hardhat::tune().

Data Usage

This adjustment doesn't require estimation and, as such, the same data that's used to train it with fit() can be predicted on with predict(); fitting this adjustment just collects metadata on the supplied column names and does not risk data leakage.

Examples

library(dplyr)
library(modeldata)

head(two_class_example)

# `predicted` gives hard class predictions based on probabilities
two_class_example %>% count(predicted)

# when probabilities are within (.25, .75), consider them equivocal
tlr <-
  tailor() %>%
  adjust_equivocal_zone(value = 1 / 4)

tlr

# fit by supplying column names. situate in a modeling workflow
# with `workflows::add_tailor()` to avoid having to do so manually
tlr_fit <- fit(
  tlr,
  two_class_example,
  outcome = c(truth),
  estimate = c(predicted),
  probabilities = c(Class1, Class2)
)

tlr_fit

# adjust hard class predictions
predict(tlr_fit, two_class_example) %>% count(predicted)

Re-calibrate numeric predictions

Description

Calibration for regression models involves adjusting the model's predictions to account for correlated errors, ensuring that predicted values align closely with actual observed values across the entire range of outputs.
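
For intuition, linear calibration amounts to regressing the observed outcome on the raw prediction and using that fit to correct new predictions. A rough sketch of the idea (the package itself delegates estimation to the probably package):

library(tibble)

set.seed(1)
d <- tibble(y = rnorm(100), y_pred = y / 2 + rnorm(100))

# learn a linear correction on calibration data
cal <- lm(y ~ y_pred, data = d)

# corrected predictions
head(predict(cal, newdata = d))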

Usage

adjust_numeric_calibration(x, method = NULL)

Arguments

x

A tailor().

method

Character. One of "linear", "isotonic", or "isotonic_boot", corresponding to probably::cal_estimate_linear(), probably::cal_estimate_isotonic(), or probably::cal_estimate_isotonic_boot() from the probably package, respectively.

Data Usage

This adjustment requires estimation and, as such, different subsets of data should be used to train it and evaluate its predictions. See the section by the same name in ?workflows::add_tailor() for more information on preventing data leakage with postprocessors that require estimation. When situated in a workflow, tailors will automatically be estimated with appropriate subsets of data.

Examples

library(tibble)

# create example data
set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))

d_calibration

# specify calibration
tlr <-
  tailor() %>%
  adjust_numeric_calibration(method = "linear")

# train tailor on a subset of data. situate in a modeling workflow with
# `workflows::add_tailor()` to avoid having to specify column names manually
tlr_fit <- fit(tlr, d_calibration, outcome = y, estimate = y_pred)

# apply to predictions on another subset of data
d_test

predict(tlr_fit, d_test)

Truncate the range of numeric predictions

Description

Truncating ranges involves limiting the output of a model to a specific range of values, typically to avoid extreme or unrealistic predictions. This technique can help improve the practical applicability of a model's outputs by constraining them within reasonable bounds based on domain knowledge or physical limitations.
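
The operation itself is a clamp. A sketch of what the fitted adjustment applies at predict() time (illustrative only):

# clamp predictions to [lower_limit, upper_limit]
y_pred <- c(-3, 0.2, 1.5, 7)
lower_limit <- 0
upper_limit <- 5
pmin(pmax(y_pred, lower_limit), upper_limit)
#> [1] 0.0 0.2 1.5 5.0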

Usage

adjust_numeric_range(x, lower_limit = -Inf, upper_limit = Inf)

Arguments

x

A tailor().

lower_limit, upper_limit

A numeric value, NA (for no truncation), or hardhat::tune().

Data Usage

This adjustment doesn't require estimation and, as such, the same data that's used to train it with fit() can be predicted on with predict(); fitting this adjustment just collects metadata on the supplied column names and does not risk data leakage.

Examples

library(tibble)

# create example data
set.seed(1)
d <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d

# specify a range adjustment
tlr <-
  tailor() %>%
  adjust_numeric_range(lower_limit = 1)

# train tailor by passing column names. situate in a modeling workflow with
# `workflows::add_tailor()` to avoid having to specify column names manually
tlr_fit <- fit(tlr, d, outcome = y, estimate = y_pred)

predict(tlr_fit, d)

Change or add variables

Description

This adjustment allows for arbitrary transformations of model predictions using dplyr::mutate() statements.
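
At predict() time, the stored expressions are evaluated against the data frame of predictions, much like a plain dplyr::mutate() call. A rough equivalent (illustrative only; the column name here is hypothetical):

library(dplyr)
library(tibble)

predictions <- tibble(.pred_Class1 = c(0.2, 0.8))

# what the adjustment effectively does to the prediction columns
predictions %>% mutate(log_odds = qlogis(.pred_Class1))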

Usage

adjust_predictions_custom(x, ..., .pkgs = character(0))

Arguments

x

A tailor().

...

Name-value pairs of expressions. See dplyr::mutate().

.pkgs

A character vector of extra packages that are needed to execute the commands.

Data Usage

This adjustment doesn't require estimation and, as such, the same data that's used to train it with fit() can be predicted on with predict(); fitting this adjustment just collects metadata on the supplied column names and does not risk data leakage.

Examples

library(modeldata)

head(two_class_example)

tlr <-
  tailor() %>%
  adjust_equivocal_zone() %>%
  adjust_predictions_custom(linear_predictor = binomial()$linkfun(Class2))

tlr_fit <- fit(
  tlr,
  two_class_example,
  outcome = c(truth),
  estimate = c(predicted),
  probabilities = c(Class1, Class2)
)

predict(tlr_fit, two_class_example) %>% head()

Re-calibrate classification probability predictions

Description

Calibration is the process of adjusting a model's outputted probabilities to match the observed frequencies of events. This technique aims to ensure that when a model predicts a certain probability for an outcome, that probability accurately reflects the true likelihood of that outcome occurring.
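
For intuition, logistic calibration fits a logistic regression of the observed class on the predicted probability, then uses the fitted curve to remap new probabilities. A rough sketch of the idea (the package delegates estimation to the probably package):

library(modeldata)

# model P(truth == "Class1") as a function of the raw probability estimate
cal <- glm(I(truth == "Class1") ~ Class1, data = two_class_example,
           family = binomial)

# remapped probabilities for the first few rows
head(predict(cal, type = "response"))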

Usage

adjust_probability_calibration(x, method = NULL)

Arguments

x

A tailor().

method

Character. One of "logistic", "multinomial", "beta", "isotonic", or "isotonic_boot", corresponding to probably::cal_estimate_logistic(), probably::cal_estimate_multinomial(), etc. from the probably package.

Data Usage

This adjustment requires estimation and, as such, different subsets of data should be used to train it and evaluate its predictions. See the section by the same name in ?workflows::add_tailor() for more information on preventing data leakage with postprocessors that require estimation. When situated in a workflow, tailors will automatically be estimated with appropriate subsets of data.

Examples

library(modeldata)

# split example data
set.seed(1)
in_rows <- sample(c(TRUE, FALSE), nrow(two_class_example), replace = TRUE)
d_calibration <- two_class_example[in_rows, ]
d_test <- two_class_example[!in_rows, ]

head(d_calibration)

# specify calibration
tlr <-
  tailor() %>%
  adjust_probability_calibration(method = "logistic")

# train tailor on a subset of data. situate in a modeling workflow with
# `workflows::add_tailor()` to avoid having to specify column names manually
tlr_fit <- fit(
  tlr,
  d_calibration,
  outcome = c(truth),
  estimate = c(predicted),
  probabilities = c(Class1, Class2)
)

# apply to predictions on another subset of data
head(d_test)

predict(tlr_fit, d_test)

Change the event threshold

Description

Many machine learning systems determine hard class predictions by first predicting the probability of an event and then predicting that an event will occur if its respective probability is above 0.5. This adjustment allows practitioners to determine hard class predictions using a threshold other than 0.5. By setting appropriate thresholds, one can balance the trade-off between different types of errors (such as false positives and false negatives) to optimize the model's performance for specific use cases.
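
The mechanics reduce to a comparison against the threshold. A minimal sketch for the binary case (illustrative only):

# re-derive hard classes from P(Class1) with a 0.1 threshold
prob_class1 <- c(0.05, 0.15, 0.60)
threshold <- 0.1
ifelse(prob_class1 >= threshold, "Class1", "Class2")
#> [1] "Class2" "Class1" "Class1"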

Usage

adjust_probability_threshold(x, threshold = 0.5)

Arguments

x

A tailor().

threshold

A numeric value (between zero and one) or hardhat::tune().

Data Usage

This adjustment doesn't require estimation and, as such, the same data that's used to train it with fit() can be predicted on with predict(); fitting this adjustment just collects metadata on the supplied column names and does not risk data leakage.

Examples

library(modeldata)

# `predicted` gives hard class predictions based on probability threshold .5
head(two_class_example)

# use a threshold of .1 instead:
tlr <-
  tailor() %>%
  adjust_probability_threshold(.1)

# fit by supplying column names. situate in a modeling workflow
# with `workflows::add_tailor()` to avoid having to do so manually
tlr_fit <- fit(
  tlr,
  two_class_example,
  outcome = c(truth),
  estimate = c(predicted),
  probabilities = c(Class1, Class2)
)

# adjust hard class predictions
predict(tlr_fit, two_class_example) %>% head()

Declare post-processing for model predictions

Description

Tailors compose iterative adjustments to model predictions. After initializing a tailor with this function, add adjustment specifications with adjust_*() functions:

For probability calibration: adjust_probability_calibration()

For transforming probabilities to hard class predictions: adjust_probability_threshold() and adjust_equivocal_zone()

For numeric calibration and truncation: adjust_numeric_calibration() and adjust_numeric_range()

For ad-hoc adjustments: adjust_predictions_custom()

Tailors must be trained with fit() before being applied to new data with predict(). Tailors are tightly integrated with the tidymodels framework; for greatest ease of use, situate tailors in model workflows with ?workflows::add_tailor().

Usage

tailor(outcome = NULL, estimate = NULL, probabilities = NULL)

Arguments

outcome

<tidy-select> Only required when used independently of ?workflows::add_tailor(), and can also be passed at fit() time instead. The column name of the outcome variable.

estimate

<tidy-select> Only required when used independently of ?workflows::add_tailor(), and can also be passed at fit() time instead. The column name of the point estimate (e.g. predicted class). In tidymodels, this corresponds to column names .pred, .pred_class, or .pred_time.

probabilities

<tidy-select> Only required when used independently of ?workflows::add_tailor() for the "binary" or "multiclass" types, and can also be passed at fit() time instead. The column names of class probability estimates. These should be given in the order of the factor levels of the estimate.
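
Because the order must match the factor levels, it can help to check them first; for the example data used below:

library(modeldata)

levels(two_class_example$truth)
#> [1] "Class1" "Class2"
# so supply `probabilities = c(Class1, Class2)` in that order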

Examples

library(dplyr)
library(modeldata)

# `predicted` gives hard class predictions based on probabilities
two_class_example %>% count(predicted)

# change the probability threshold to allot one class vs the other
tlr <-
  tailor() %>%
  adjust_probability_threshold(threshold = .1)

tlr

# fit by supplying column names. situate in a modeling workflow
# with `workflows::add_tailor()` to avoid having to do so manually
tlr_fit <- fit(
  tlr,
  two_class_example,
  outcome = c(truth),
  estimate = c(predicted),
  probabilities = c(Class1, Class2)
)

tlr_fit

# adjust hard class predictions
predict(tlr_fit, two_class_example) %>% count(predicted)