---
title: "Random Forest"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Random Forest}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r pre, include = FALSE}
if (!rlang::is_installed("randomForest")) {
  knitr::opts_chunk$set(
    eval = FALSE
  )
}
```

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(tidypredict)
library(randomForest)
library(parsnip)
set.seed(100)
```

| Function                                                      |Works|
|---------------------------------------------------------------|-----|
|`tidypredict_fit()`, `tidypredict_sql()`, `parse_model()`      |  ✔  |
|`tidypredict_to_column()`                                      |  ✗  |
|`tidypredict_test()`                                           |  ✗  |
|`tidypredict_interval()`, `tidypredict_sql_interval()`         |  ✗  |
|`parsnip`                                                      |  ✔  |

## How it works

Here is a simple `randomForest()` model using the `iris` dataset:

```{r}
library(dplyr)
library(tidypredict)
library(randomForest)

model <- randomForest(Species ~ ., data = iris, ntree = 100, proximity = TRUE)
```

## Under the hood

The parser is based on the output from the `randomForest::getTree()` function. It returns one decision path for each row that has a non-NA value in the `prediction` field, that is, one path per terminal node.

```{r}
getTree(model, labelVar = TRUE) %>%
  head()
```

The output from `parse_model()` is transformed into a `dplyr`, a.k.a. Tidy Eval, formula. Each decision tree becomes one `dplyr::case_when()` statement, so the fitted forest yields one expression per tree. The first of them is shown here:

```{r}
tidypredict_fit(model)[1]
```

From there, the Tidy Eval formula can be used anywhere a Tidy Eval expression is accepted. `tidypredict` provides three paths:

- Use it directly inside `dplyr`, for example `mutate(iris, !! tidypredict_fit(model)[[1]])` for the first tree
- Use `tidypredict_to_column(model)` in a piped command set (marked as not working for `randomForest` models in the table above)
- Use `tidypredict_sql(model, con)` to retrieve the SQL statement

## parsnip

`tidypredict` also supports `randomForest` model objects fitted via the `parsnip` package.

```{r}
library(parsnip)

parsnip_model <- rand_forest(mode = "classification") %>%
  set_engine("randomForest") %>%
  fit(Species ~ ., data = iris)

tidypredict_fit(parsnip_model)[[1]]
```
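
To make the mechanism described under "Under the hood" concrete, the following is a minimal sketch of what a single parsed tree looks like once it is a Tidy Eval expression. The tree is written by hand: the split values and the names `tree_expr` and `scored` are assumptions for illustration and are not taken from either fitted model.

```{r}
library(dplyr)

# Hypothetical single tree, written by hand. The split points are
# illustrative only and do not come from `model` or `parsnip_model`.
tree_expr <- rlang::expr(
  case_when(
    Petal.Length < 2.5 ~ "setosa",
    Petal.Length >= 2.5 & Petal.Width < 1.75 ~ "versicolor",
    Petal.Length >= 2.5 & Petal.Width >= 1.75 ~ "virginica"
  )
)

# Inject the unevaluated expression into mutate() with `!!`,
# the same way a single-tree expression from tidypredict_fit() is used.
scored <- mutate(iris, tree_1 = !!tree_expr)

# Compare the hand-written tree's output with the true labels.
count(scored, Species, tree_1)
```

Because the expression refers only to column names, the same `case_when()` call can be evaluated on a local data frame or handed to `dbplyr` for translation, which is the idea behind `tidypredict_sql()`.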