---
title: "Random Forest"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Random Forest}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r pre, include = FALSE}
if (!rlang::is_installed("randomForest")) {
  knitr::opts_chunk$set(
    eval = FALSE
  )
}
```
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(tidypredict)
library(randomForest)
library(parsnip)
set.seed(100)
```
| Function |Works|
|---------------------------------------------------------------|-----|
|`tidypredict_fit()`, `tidypredict_sql()`, `parse_model()` | ✔ |
|`tidypredict_to_column()` | ✗ |
|`tidypredict_test()` | ✗ |
|`tidypredict_interval()`, `tidypredict_sql_interval()` | ✗ |
|`parsnip` | ✔ |
## How it works
Here is a simple `randomForest()` model using the `iris` dataset:
```{r}
library(dplyr)
library(tidypredict)
library(randomForest)
model <- randomForest(Species ~ ., data = iris, ntree = 100, proximity = TRUE)
```
## Under the hood
The parser is based on the output from the `randomForest::getTree()` function. It will return as many decision paths as there are non-NA rows in the `prediction` field.
```{r}
getTree(model, labelVar = TRUE) %>%
  head()
```
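As a quick check of that claim, the sketch below counts the non-NA rows of `prediction` for a single tree. It refits a small forest under the hypothetical name `rf_demo` so that the chunk runs on its own; `ntree = 10` and `k = 1` are arbitrary choices for illustration.

```{r}
library(randomForest)

# Hypothetical small forest, refit here only so this chunk is self-contained
set.seed(100)
rf_demo <- randomForest(Species ~ ., data = iris, ntree = 10)

# Extract the first tree; terminal nodes are the rows with a prediction
tree_1 <- getTree(rf_demo, k = 1, labelVar = TRUE)
n_paths <- sum(!is.na(tree_1$prediction))

# Terminal nodes are also flagged with status == -1, so both counts agree
n_paths
sum(tree_1$status == -1)
```

Each of those terminal nodes corresponds to one decision path that the parser has to reproduce.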
The output from `parse_model()` is transformed into a `dplyr`, a.k.a. Tidy Eval, formula. Each decision tree becomes one `dplyr::case_when()` statement; the first one is shown here:
```{r}
tidypredict_fit(model)[1]
```
From there, the Tidy Eval formula can be used anywhere Tidy Eval expressions are accepted. `tidypredict` provides three paths:
- Use it directly inside `dplyr`, for example `mutate(iris, !! tidypredict_fit(model)[[1]])` for the first tree
- Use `tidypredict_to_column(model)` to add the prediction inside a piped command set (not supported for `randomForest` models, as noted in the table above)
- Use `tidypredict_sql(model, con)` to retrieve the SQL statement
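The first and third paths could be sketched as follows. This assumes that `tidypredict_fit()` and `tidypredict_sql()` return one element per tree, as the `[1]` indexing above suggests. The names `rf_demo` and `tree_1` and the `dbplyr::simulate_dbi()` connection are stand-ins chosen for illustration, and `dbplyr` must be installed.

```{r}
library(dplyr)
library(tidypredict)
library(randomForest)

# Hypothetical small forest, refit here only so this chunk is self-contained
set.seed(100)
rf_demo <- randomForest(Species ~ ., data = iris, ntree = 10)

# Path 1: evaluate the first tree's case_when() formula inside mutate()
scored <- iris %>%
  mutate(tree_1 = !!tidypredict_fit(rf_demo)[[1]])
count(scored, Species, tree_1)

# Path 3: translate the same tree to SQL against a simulated connection
tidypredict_sql(rf_demo, dbplyr::simulate_dbi())[[1]]
```

With a real database, the simulated connection would be replaced by the connection object in use, so that the SQL matches that backend's dialect.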
## parsnip
`tidypredict` also supports `randomForest` model objects fitted via the `parsnip` package.
```{r}
library(parsnip)
parsnip_model <- rand_forest(mode = "classification") %>%
  set_engine("randomForest") %>%
  fit(Species ~ ., data = iris)
tidypredict_fit(parsnip_model)[[1]]
```
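For reference, the underlying engine object can be pulled out of a `parsnip` fit and inspected with the same `getTree()` call used earlier. This is a minimal sketch that assumes a `parsnip` version providing `extract_fit_engine()`; `pm_demo` and `engine_fit` are hypothetical names, and `trees = 10` is an arbitrary choice to keep the example small.

```{r}
library(dplyr)
library(parsnip)
library(randomForest)

# Hypothetical small parsnip fit, so this chunk runs on its own
set.seed(100)
pm_demo <- rand_forest(mode = "classification", trees = 10) %>%
  set_engine("randomForest") %>%
  fit(Species ~ ., data = iris)

# The engine fit is a regular randomForest object
engine_fit <- extract_fit_engine(pm_demo)
inherits(engine_fit, "randomForest")

# It can be inspected with getTree() like a directly fitted model
head(getTree(engine_fit, k = 1, labelVar = TRUE))
```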