A model that is trained in any
language are able to integrate with tidypredict
, and thus
with broom
. The requirement is that the model in that
language is exported using the parse model spec. The easiest file format
would be YAML.
A model that was fitted using sklearn
’s
linear_model
. The model is based on diabetes data. Ten
baseline variables, age, sex, body mass index, average blood pressure,
and six blood serum measurements were obtained for each of n = 442
diabetes patients, as well as the response of interest, a quantitative
measure of disease progression one year after baseline. The model’s
results were converted to YAML by the same python script, I copied and
pasted the top part here:
general:
is_glm: 0
model: lm
residual: 0
sigma2: 0
type: regression
version: 2.0
terms:
- coef: 152.76430691633442
fields:
- col: (Intercept)
type: ordinary
is_intercept: 1
label: (Intercept)
The YAML data can be read in R by using the yaml
package. In this example, we have copy-pasted most of the models inside
a variable called sklearn_model
. Because yaml
requires local YAML variables to be split by line, we use
strsplit()
.
library(yaml)
sklearn_model <- strsplit("general:
is_glm: 0
model: lm
residual: 0
sigma2: 0
type: regression
version: 2.0
terms:
- coef: 152.76430691633442
fields:
- col: (Intercept)
type: ordinary
is_intercept: 1
label: (Intercept)
- coef: 0.3034995490660432
fields:
- col: age
type: ordinary
is_intercept: 0
label: age
- coef: -237.63931533353403
fields:
- col: sex
type: ordinary
is_intercept: 0
label: sex
- coef: 510.5306054362253
fields:
- col: bmi
type: ordinary
is_intercept: 0
label: bmi
- coef: 327.7369804093466
fields:
- col: bp
type: ordinary
is_intercept: 0
label: bp
- coef: -814.1317093725387
fields:
- col: s1
type: ordinary
is_intercept: 0
label: s1
", split = "\n")[[1]]
Now the model is converted to an R list
using
yaml.load
.
sklearn_model <- yaml.load(sklearn_model)
str(sklearn_model, 2)
#> List of 2
#> $ general:List of 6
#> ..$ is_glm : int 0
#> ..$ model : chr "lm"
#> ..$ residual: int 0
#> ..$ sigma2 : int 0
#> ..$ type : chr "regression"
#> ..$ version : num 2
#> $ terms :List of 6
#> ..$ :List of 4
#> ..$ :List of 4
#> ..$ :List of 4
#> ..$ :List of 4
#> ..$ :List of 4
#> ..$ :List of 4
tidypredict
The list
object needs to be recognized as a
tidypredict
parsed model. To do that, we use
as_parsed_model()
library(tidypredict)
spm <- as_parsed_model(sklearn_model)
class(spm)
#> [1] "parsed_model" "pm_regression" "list"
The spm
variable now works just as any parsed model
inside R. Use tidypredict_fit()
to view the resulting
formula.
tidypredict_fit(spm)
#> 152.764306916334 + (age * 0.303499549066043) + (sex * -237.639315333534) +
#> (bmi * 510.530605436225) + (bp * 327.736980409347) + (s1 *
#> -814.131709372539)
Now, the model can run inside a database
broom
Now that we have a parsed_model
object, it is possible
to use broom
’s tidy()
function. This means
that we are able to integrate a totally external model, with
broom
.