Title: | Run Predictions Inside the Database |
---|---|
Description: | Parses a fitted 'R' model object and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several database back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models. |
Authors: | Edgar Ruiz [aut, cre], Max Kuhn [aut] |
Maintainer: | Edgar Ruiz <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5 |
Built: | 2024-11-18 03:56:00 UTC |
Source: | https://github.com/tidymodels/tidypredict |
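The SQL translation mentioned in the description can be sketched without a live database by using the simulated connections that 'dbplyr' provides. This is an illustrative sketch only: the choice of `simulate_postgres()`, the table contents, and the model are assumptions made for the example.

```r
library(dplyr)
library(dbplyr)
library(tidypredict)

# Fit a small model locally; mtcars ships with R.
model <- lm(mpg ~ wt, data = mtcars)

# A lazy table backed by a simulated PostgreSQL connection (no database needed).
# The table is hypothetical; only the column name `wt` matters here.
remote_tbl <- lazy_frame(wt = 2.5, con = simulate_postgres())

# Unquote the Tidy Eval formula inside mutate() so dbplyr can translate it.
query <- remote_tbl %>%
  mutate(fit = !!tidypredict_fit(model))

# Render the SQL that would be sent to the database.
sql_text <- as.character(sql_render(query))
cat(sql_text)
```

Because the prediction is expressed as an ordinary dplyr expression, the same `mutate()` call should work against a real remote table in place of the simulated one.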
Uses an S3 method to check that a given formula can be parsed based on its class. It currently scans for unsupported contrasts and in-line functions (e.g., lm(wt ~ as.factor(am))). Since this function is meant for function interaction, as opposed to human interaction, a successful check is silent.
acceptable_formula(model)
model |
An R model object |
model <- lm(mpg ~ wt, mtcars)
acceptable_formula(model)
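To see the check reject a formula, the in-line function case from the description can be wrapped in `tryCatch()`. This is a minimal sketch that assumes the failure is signaled as a regular R error; the exact wording of the message is not shown here because it may vary between versions.

```r
library(tidypredict)

# A plain formula passes the check silently.
ok_model <- lm(mpg ~ wt, data = mtcars)
acceptable_formula(ok_model)

# An in-line function such as as.factor() is expected to raise an error.
bad_model <- lm(wt ~ as.factor(am), data = mtcars)
result <- tryCatch(
  {
    acceptable_formula(bad_model)
    "accepted"
  },
  error = function(e) conditionMessage(e)
)

# Prints the error message if the check failed, or "accepted" otherwise.
print(result)
```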
Prepares parsed model object
as_parsed_model(x)
x |
A parsed model object |
Parses a fitted R model's structure and extracts the components needed to create a dplyr formula for prediction. The function also creates a data frame in a specific format so that, in the future, other functions can pass parsed tables to a given formula-creating function.
parse_model(model)
model |
An R model object. |
library(dplyr)
df <- mutate(mtcars, cyl = paste0("cyl", cyl))
model <- lm(mpg ~ wt + cyl * disp, offset = am, data = df)
parse_model(model)
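A parsed model can stand in for the fitted model when building the prediction formula. The sketch below assumes that `as_parsed_model()` restores the classes on a plain list with the same structure that `parse_model()` produces; the simplified model and the variable names are illustrative.

```r
library(dplyr)
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# Parse the fitted model into its list-based representation.
parsed <- parse_model(model)

# Simulate a parsed model that lost its class (for example, after being
# stored as a plain list), then restore it with as_parsed_model().
plain_list <- unclass(parsed)
restored <- as_parsed_model(plain_list)

# Both the fitted model and the restored parsed model can produce a formula.
from_model <- mutate(mtcars, fit = !!tidypredict_fit(model))
from_parsed <- mutate(mtcars, fit = !!tidypredict_fit(restored))

# The two sets of predictions are expected to agree.
all.equal(from_model$fit, from_parsed$fit)
```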
Tidy the parsed model results
## S3 method for class 'pm_regression'
tidy(x, ...)
x |
A parsed_model object |
... |
Reserved for future use |
It parses a model or uses an already parsed model to return a Tidy Eval formula that can then be used inside a dplyr command.
tidypredict_fit(model)
model |
An R model or a list with a parsed model. |
model <- lm(mpg ~ wt + cyl * disp, offset = am, data = mtcars)
tidypredict_fit(model)
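As a rough illustration of using the returned formula inside a dplyr command, the sketch below unquotes it with `!!` in `mutate()` and compares the result with base R `predict()`. The simplified model and the column name `fit` are choices made for this example.

```r
library(dplyr)
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# The formula is a plain R expression built from the model coefficients.
fit_formula <- tidypredict_fit(model)

# Unquote the formula so mutate() evaluates it against the data columns.
scored <- mtcars %>%
  mutate(fit = !!fit_formula)

# Compare against the predictions from base R.
base_r <- as.numeric(predict(model, newdata = mtcars))
max_diff <- max(abs(scored$fit - base_r))
print(max_diff)
```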
It parses a model or uses an already parsed model to return a Tidy Eval formula that calculates the prediction interval and can then be used inside a dplyr command.
tidypredict_interval(model, interval = 0.95)
model |
An R model or a list with a parsed model |
interval |
The prediction interval, defaults to 0.95 |
The result still has to be added to and subtracted from the fit to obtain the upper and lower bound respectively.
model <- lm(mpg ~ wt + cyl * disp, offset = am, data = mtcars)
tidypredict_interval(model)
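Since the interval formula has to be combined with the fit to obtain the bounds, a minimal sketch of that step might look like the following. The column names `margin`, `lower`, and `upper`, as well as the simplified model, are hypothetical choices for this example.

```r
library(dplyr)
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

fit_formula <- tidypredict_fit(model)
interval_formula <- tidypredict_interval(model, interval = 0.95)

bounds <- mtcars %>%
  mutate(
    fit = !!fit_formula,
    margin = !!interval_formula, # amount to add to and subtract from the fit
    lower = fit - margin,
    upper = fit + margin
  ) %>%
  select(mpg, fit, lower, upper)

head(bounds)
```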
Compares the results of predict() and tidypredict_to_column() functions.
tidypredict_test(
  model,
  df = model$model,
  threshold = 1e-12,
  include_intervals = FALSE,
  max_rows = NULL,
  xg_df = NULL
)
model |
An R model or a list with a parsed model. It currently supports lm(), glm() and randomForest() models. |
df |
A data frame that contains all of the needed fields to run the prediction. It defaults to the "model" data frame object inside the model object. |
threshold |
The number that a given result difference, between predict() and tidypredict_to_column() should not exceed. For continuous predictions, the default value is 0.000000000001 (1e-12), and for categorical predictions, the default value is 0. |
include_intervals |
Switch to indicate if the prediction intervals should be included in the test. It defaults to FALSE. |
max_rows |
The maximum number of rows, from the object passed in the df argument, to use in the test. Highly recommended for large data sets.
xg_df |
An xgb.DMatrix object, required only for XGBoost models. It defaults to NULL.
model <- lm(mpg ~ wt + cyl * disp, offset = am, data = mtcars)
tidypredict_test(model)
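The comparison can also be used as a guard in a script. The sketch below rests on an assumption that is not stated in this page: that the returned object is a list with a logical `alert` element that is TRUE when a difference exceeds the threshold. Treat that element name as unverified and inspect the returned object before relying on it.

```r
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# Limit the comparison to the first 10 rows via max_rows.
test_result <- tidypredict_test(model, max_rows = 10)
print(test_result)

# Assumption: `alert` is a logical flag on the returned object.
if (isTRUE(test_result$alert)) {
  stop("tidypredict results differ from predict() beyond the threshold")
}
```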
Adds a new column with the results from tidypredict_fit() to a piped command set. If add_interval is set to TRUE, it adds two additional columns: one for the lower and another for the upper prediction interval bounds.
tidypredict_to_column(
  df,
  model,
  add_interval = FALSE,
  interval = 0.95,
  vars = c("fit", "upper", "lower")
)
df |
A data.frame or tibble |
model |
An R model or a parsed model inside a data frame |
add_interval |
Switch that indicates if the prediction interval columns should be added. Defaults to FALSE |
interval |
The prediction interval, defaults to 0.95. Ignored if add_interval is set to FALSE |
vars |
The names of the variables that this function will produce. Defaults to "fit", "upper", and "lower".
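The arguments above can be combined in a piped command. This is a minimal sketch: the model, the 0.9 interval, and the custom column names passed to `vars` are hypothetical choices, given in the same fit, upper, lower order as the defaults.

```r
library(dplyr)
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

scored <- mtcars %>%
  tidypredict_to_column(
    model,
    add_interval = TRUE,
    interval = 0.9,
    # Hypothetical names, ordered as fit, upper, lower
    vars = c("pred", "pred_upper", "pred_lower")
  )

scored %>%
  select(mpg, pred, pred_lower, pred_upper) %>%
  head()
```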