broom - Convert Statistical Objects into Tidy Tibbles
Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.
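As a rough illustration of the three verbs, the sketch below fits an ordinary linear model to R's built-in `mtcars` data. The dataset and formula are assumptions chosen for the example and are not part of the package description.

```r
library(broom)

# Fit a linear model on the built-in mtcars data (illustrative choice)
model <- lm(mpg ~ wt + hp, data = mtcars)

coefs   <- tidy(model)     # one row per model term (intercept, wt, hp)
summary <- glance(model)   # a single row describing the whole model
obs     <- augment(model)  # one row per observation, with fitted values

coefs
summary[, c("r.squared", "AIC", "BIC")]
head(obs[, c("mpg", ".fitted", ".resid")])
```

Because each result is a tibble, the output of many models can be row-bound and compared with ordinary data-frame tools.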
Last updated 3 months ago
modeling tidy-data
21.50 score 1.5k stars 1.4k dependents 37k scripts 660k downloads

recipes - Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
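The estimate-then-apply idea can be sketched as follows. The split of `mtcars` into `train` and `test` is an arbitrary assumption for illustration; the point is that the centering and scaling statistics come from the training rows only.

```r
library(recipes)

# Hypothetical split of the built-in mtcars data
train <- mtcars[1:20, ]
test  <- mtcars[21:32, ]

# Define the steps; nothing is estimated yet
rec <- recipe(mpg ~ wt + hp, data = train) |>
  step_normalize(all_numeric_predictors())

# Estimate means and standard deviations from the training set
prepped <- prep(rec, training = train)

# Apply the trained steps to another data set
baked <- bake(prepped, new_data = test)
head(baked)
```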
Last updated 14 days ago
18.37 score 578 stars 369 dependents 6.5k scripts 108k downloads

rsample - General Resampling Infrastructure
Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).
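A minimal sketch of the two resampling schemes named above, using the built-in `mtcars` data as an assumed example. The seed and the numbers of resamples are arbitrary.

```r
library(rsample)

set.seed(123)

# Five bootstrap resamples of the data
boots <- bootstraps(mtcars, times = 5)

# Four-fold cross-validation
folds <- vfold_cv(mtcars, v = 4)

# Each split separates rows into an analysis set and an assessment set
first_split <- folds$splits[[1]]
fit_rows    <- analysis(first_split)
holdout     <- assessment(first_split)

dim(fit_rows)
dim(holdout)
```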
Last updated 4 months ago
16.77 score 341 stars 78 dependents 4.8k scripts 73k downloads

parsnip - A Common API to Modeling and Analysis Functions
A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc).
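For instance, a linear regression can be specified independently of the function that fits it. This is only a sketch: the `"lm"` engine and the `mtcars` data are choices made for illustration.

```r
library(parsnip)

# Describe the model type and the computational engine, without fitting
spec <- linear_reg() |>
  set_engine("lm")

# Fit the specification to data
lm_fit <- fit(spec, mpg ~ wt + hp, data = mtcars)

# Predictions come back as a tibble with a .pred column
preds <- predict(lm_fit, new_data = mtcars[1:3, ])
preds
```

Swapping the engine in `set_engine()` leaves the rest of the code unchanged, provided the corresponding engine package is installed.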
Last updated 2 days ago
16.13 score 610 stars 68 dependents 3.4k scripts 25k downloads

tidymodels - Easily Install and Load the 'Tidymodels' Packages
The tidy modeling "verse" is a collection of packages for modeling and statistical analysis that share the underlying design philosophy, grammar, and data structures of the tidyverse.
Last updated 4 months ago
16.11 score 774 stars 14 dependents 65k scripts 25k downloads

infer - Tidy Statistical Inference
The objective of this package is to perform inference using an expressive statistical grammar that coheres with the tidy design framework.
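The grammar reads as a pipeline of verbs. The sketch below tests whether the mean of `mpg` in the built-in `mtcars` data differs from 20; the data, the null value, and the number of replicates are all assumptions made for illustration.

```r
library(infer)

set.seed(1)

# Observed statistic
obs_mean <- mtcars |>
  specify(response = mpg) |>
  calculate(stat = "mean")

# Null distribution of the mean under H0: mu = 20
null_dist <- mtcars |>
  specify(response = mpg) |>
  hypothesize(null = "point", mu = 20) |>
  generate(reps = 200, type = "bootstrap") |>
  calculate(stat = "mean")

# Two-sided p-value for the observed mean
p_val <- get_p_value(null_dist, obs_stat = obs_mean, direction = "two-sided")
p_val
```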
Last updated 5 months ago
15.65 score 731 stars 16 dependents 3.4k scripts 21k downloads

yardstick - Tidy Characterizations of Model Performance
Tidy tools for quantifying how well a model fits a data set, such as confusion matrices, class probability curve summaries, and regression metrics (e.g., RMSE).
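As a small sketch of a regression metric, the example below computes RMSE on a made-up data frame of observed and predicted values. The numbers and the column names `truth` and `estimate` are invented for illustration.

```r
library(yardstick)

# Hypothetical observed values and model predictions
results <- data.frame(
  truth    = c(3, 5, 2.5, 7),
  estimate = c(2.5, 5, 4, 8)
)

# Data-frame interface: returns a one-row tibble
rmse_tbl <- rmse(results, truth = truth, estimate = estimate)
rmse_tbl

# Vector interface: returns a single number
rmse_num <- rmse_vec(results$truth, results$estimate)
```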
Last updated 4 days ago
15.24 score 383 stars 59 dependents 2.2k scripts 36k downloads

hardhat - Construct Modeling Packages
Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.
Last updated 5 days ago
14.65 score 103 stars 427 dependents 175 scripts 108k downloads

tune - Tidy Tuning Tools
The ability to tune models is important. 'tune' contains functions and classes to be used in conjunction with other 'tidymodels' packages for finding reasonable values of hyper-parameters in models, pre-processing methods, and post-processing steps.
Last updated 2 months ago
14.11 score 289 stars 38 dependents 768 scripts 22k downloads

dials - Tools for Creating Tuning Parameter Values
Many models contain tuning parameters (i.e. parameters that cannot be directly estimated from the data). These tools can be used to define objects for creating, simulating, or validating values for such parameters.
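A brief sketch of creating candidate values: the parameter objects `penalty()` and `mixture()` are used here only as examples of tuning parameters, and the number of levels is arbitrary.

```r
library(dials)

# A parameter object records the type, range, and scale of a parameter
penalty()

# A regular grid over two parameters, three levels each
grid <- grid_regular(penalty(), mixture(), levels = 3)
grid
```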
Last updated 5 days ago
14.09 score 114 stars 51 dependents 414 scripts 24k downloads

workflows - Modeling Workflows
Managing both a 'parsnip' model and a preprocessor, such as a model formula or recipe from 'recipes', can often be challenging. The goal of 'workflows' is to streamline this process by bundling the model alongside the preprocessor, all within the same object.
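The bundling can be sketched as follows, assuming a simple normalization recipe and a linear regression on the built-in `mtcars` data (both chosen only for illustration).

```r
library(workflows)
library(parsnip)
library(recipes)

# Preprocessor: a recipe that normalizes the predictors
rec <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())

# Model specification
spec <- linear_reg() |>
  set_engine("lm")

# Bundle both into a single workflow object
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(spec)

# Fitting trains the recipe and then the model
wf_fit <- fit(wf, data = mtcars)

# Prediction applies the trained recipe before calling the model
wf_preds <- predict(wf_fit, new_data = mtcars[1:3, ])
wf_preds
```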
Last updated 19 days ago
13.62 score 207 stars 42 dependents 852 scripts 21k downloads

corrr - Correlations in R
A tool for exploring correlations. It makes it possible to easily perform routine tasks when exploring correlation matrices such as ignoring the diagonal, focusing on the correlations of certain variables against others, or rearranging and visualizing the matrix in terms of the strength of the correlations.
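The routine tasks listed above might look like this on the built-in `mtcars` data, which is an assumed example data set.

```r
library(corrr)

# Correlation data frame; the diagonal is set to NA by default
cors <- correlate(mtcars)

# Correlations of every other variable against mpg
mpg_cors <- focus(cors, mpg)
mpg_cors

# Reorder variables so that strongly correlated ones sit together
rearrange(cors)
```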
Last updated 1 year ago
13.48 score 591 stars 7 dependents 2.8k scripts 6.2k downloads

workflowsets - Create a Collection of 'tidymodels' Workflows
A workflow is a combination of a model and preprocessors (e.g., a formula, recipe, etc.) (Kuhn and Silge (2021) <https://www.tmwr.org/>). In order to try different combinations of these, an object can be created that contains many workflows. There are functions to create workflows en masse as well as to train them and visualize the results.
Last updated 3 months ago
12.07 score 92 stars 18 dependents 300 scripts 15k downloads

probably - Tools for Post-Processing Predicted Values
Models can be improved by post-processing class probabilities, by: recalibration, conversion to hard probabilities, assessment of equivocal zones, and other activities. 'probably' contains tools for conducting these operations as well as calibration tools and conformal inference techniques for regression models.
Last updated 4 months ago
12.01 score 115 stars 1 dependents 21k scripts 1.7k downloads

stacks - Tidy Model Stacking
Model stacking is an ensemble technique that involves training a model to combine the outputs of many diverse statistical models, and has been shown to improve predictive performance in a variety of settings. 'stacks' implements a grammar for 'tidymodels'-aligned model stacking.
Last updated 4 months ago
11.44 score 295 stars 860 scripts 1.4k downloads

butcher - Model Butcher
Provides a set of S3 generics to axe components of fitted model objects and help reduce the size of model objects saved to disk.
Last updated 7 days ago
11.33 score 132 stars 13 dependents 146 scripts 4.0k downloads

tidypredict - Run Predictions Inside the Database
It parses a fitted 'R' model object, and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several databases back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models.
Last updated 2 months ago
dbplyr dplyr purrr rlang
11.19 score 259 stars 2 dependents 251 scripts 1.4k downloads

modeldata - Data Sets Useful for Modeling Examples
Data sets used for demonstrating or testing model-related packages are contained in this package.
Last updated 4 months ago
10.88 score 22 stars 14 dependents 2.1k scripts 27k downloads

textrecipes - Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf), and feature hashing.
Last updated 12 days ago
10.76 score 160 stars 1 dependents 992 scripts 729 downloads

themis - Extra Recipes Steps for Dealing with Unbalanced Data
A dataset with an uneven number of cases in each class is said to be unbalanced. Many models produce subpar performance on unbalanced datasets. A dataset can be balanced by increasing the number of minority cases using SMOTE 2011 <doi:10.48550/arXiv.1106.1813>, BorderlineSMOTE 2005 <doi:10.1007/11538059_91>, and ADASYN 2008 <https://ieeexplore.ieee.org/document/4633969>, or by decreasing the number of majority cases using NearMiss 2003 <https://www.site.uottawa.ca/~nat/Workshop2003/jzhang.pdf> or Tomek link removal 1976 <https://ieeexplore.ieee.org/document/4309452>.
Last updated 13 days ago
9.73 score 143 stars 1 dependents 1.1k scripts 5.0k downloads

rules - Model Wrappers for Rule-Based Models
Bindings for additional models for use with the 'parsnip' package. Models include prediction rule ensembles (Friedman and Popescu, 2008) <doi:10.1214/07-AOAS148>, C5.0 rules (Quinlan, 1992 ISBN: 1558602380), and Cubist (Kuhn and Johnson, 2013) <doi:10.1007/978-1-4614-6849-3>.
Last updated 4 months ago
9.47 score 40 stars 1 dependents 20k scripts 830 downloads

embed - Extra Recipes for Encoding Predictors
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.
Last updated 7 days ago
9.30 score 142 stars 1.1k scripts 1.5k downloads

bonsai - Model Wrappers for Tree-Based Models
Bindings for additional tree-based model engines for use with the 'parsnip' package. Models include gradient boosted decision trees with 'LightGBM' (Ke et al, 2017), conditional inference trees and conditional random forests with 'partykit' (Hothorn and Zeileis, 2015, and Hothorn et al, 2006 <doi:10.1198/106186006X133933>), and accelerated oblique random forests with 'aorsf' (Jaeger et al, 2022 <doi:10.5281/zenodo.7116854>).
Last updated 3 months ago
9.28 score 52 stars 1 dependents 612 scripts 818 downloads

finetune - Additional Functions for Model Tuning
The ability to tune models is important. 'finetune' enhances the 'tune' package by providing more specialized methods for finding reasonable values of model tuning parameters. Two racing methods described by Kuhn (2014) <arXiv:1405.6974> are included. An iterative search method using generalized simulated annealing (Bohachevsky, Johnson and Stein, 1986) <doi:10.1080/00401706.1986.10488128> is also included.
Last updated 6 months ago
8.64 score 62 stars 1 dependents 708 scripts 1.2k downloads

censored - 'parsnip' Engines for Survival Models
Engines for survival models from the 'parsnip' package. These include parametric models (e.g., Jackson (2016) <doi:10.18637/jss.v070.i08>), semi-parametric (e.g., Simon et al (2011) <doi:10.18637/jss.v039.i05>), and tree-based models (e.g., Buehlmann and Hothorn (2007) <doi:10.1214/07-STS242>).
Last updated 6 months ago
parsnip tidymodels
8.55 score 123 stars 1 dependents 246 scripts 1.4k downloads

tidyposterior - Bayesian Analysis to Compare Models using Resampling Statistics
Bayesian analysis is used here to answer the question: "when looking at resampling results, are the differences between models 'real'?" To answer this, a model can be created where the performance statistic is the resampling statistic (e.g. accuracy or RMSE). These values are explained by the model types. In doing this, we can get parameter estimates for each model's effect on performance and make statistical (and practical) comparisons between models. The methods included here are similar to Benavoli et al (2017) <https://jmlr.org/papers/v18/16-305.html>.
Last updated 4 months ago
8.42 score 102 stars 257 scripts 345 downloads

spatialsample - Spatial Resampling Infrastructure
Functions and classes for spatial resampling to use with the 'rsample' package, such as spatial cross-validation (Brenning, 2012) <doi:10.1109/IGARSS.2012.6352393>. The scope of 'rsample' and 'spatialsample' is to provide the basic building blocks for creating and analyzing resamples of a spatial data set, but neither package includes functions for modeling or computing statistics. The resampled spatial data sets created by 'spatialsample' do not contain much overhead in memory.
Last updated 4 months ago
cpp
8.18 score 72 stars 2 dependents 118 scripts 726 downloads

baguette - Efficient Model Functions for Bagging
Tree- and rule-based models can be bagged (<doi:10.1007/BF00058655>) using this package, and their prediction equations are stored in an efficient format to reduce the size of the model objects and improve speed.
Last updated 7 days ago
8.12 score 25 stars 1 dependents 624 scripts 819 downloads

multilevelmod - Model Wrappers for Multi-Level Models
Bindings for hierarchical regression models for use with the 'parsnip' package. Models include longitudinal generalized linear models (Liang and Zeger, 1986) <doi:10.1093/biomet/73.1.13>, and mixed-effect models (Pinheiro and Bates) <doi:10.1007/978-1-4419-0318-1_1>.
Last updated 4 months ago
8.07 score 74 stars 211 scripts 319 downloads

discrim - Model Wrappers for Discriminant Analysis
Bindings for additional classification models for use with the 'parsnip' package. Models include flavors of discriminant analysis, such as linear (Fisher (1936) <doi:10.1111/j.1469-1809.1936.tb02137.x>), regularized (Friedman (1989) <doi:10.1080/01621459.1989.10478752>), and flexible (Hastie, Tibshirani, and Buja (1994) <doi:10.1080/01621459.1994.10476866>), as well as naive Bayes classifiers (Hand and Yu (2007) <doi:10.1111/j.1751-5823.2001.tb00465.x>).
Last updated 4 months ago
8.02 score 28 stars 1 dependents 992 scripts 955 downloads

brulee - High-Level Modeling Functions with 'torch'
Provides high-level modeling functions to define and train models using the 'torch' R package. Models include linear, logistic, and multinomial regression as well as multilayer perceptrons.
Last updated 5 days ago
7.60 score 67 stars 214 scripts 588 downloads

modeldb - Fits Models Inside the Database
Uses 'dplyr' and 'tidyeval' to fit statistical models inside the database. It currently supports KMeans and linear regression models.
Last updated 1 year ago
database dbplyr dplyr ggplot2 modeling rlang sql tidyeval visualization
7.59 score 79 stars 62 scripts 297 downloads

applicable - A Compilation of Applicability Domain Methods
A modeling package compiling applicability domain methods in R. It combines different methods to measure the amount of extrapolation new samples can have from the training set. See Netzeva et al (2005) <doi:10.1177/026119290503300209> for an overview of applicability domains.
Last updated 2 years ago
7.44 score 47 stars 1 dependents 49 scripts 681 downloads

tidyclust - A Common API to Clustering
A common interface for specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.
Last updated 8 days ago
7.36 score 110 stars 139 scripts 861 downloads

poissonreg - Model Wrappers for Poisson Regression
Bindings for Poisson regression models for use with the 'parsnip' package. Models include simple generalized linear models, Bayesian models, and zero-inflated Poisson models (Zeileis, Kleiber, and Jackman (2008) <doi:10.18637/jss.v027.i08>).
Last updated 3 months ago
7.26 score 22 stars 1 dependents 342 scripts 357 downloads

modelenv - Provide Tools to Register Models for Use in 'tidymodels'
A developer-focused, low-dependency package in 'tidymodels' that provides functions to register how models are to be used. Functions to register models are complemented with accessor functions to retrieve registered model information to aid in model fitting and error handling.
Last updated 4 months ago
7.01 score 4 stars 43 dependents 1 scripts 20k downloads

usemodels - Boilerplate Code for 'Tidymodels' Analyses
Code snippets to fit models using the tidymodels framework can be easily created for a given data set.
Last updated 4 months ago
6.90 score 84 stars 134 scripts 243 downloads

agua - 'tidymodels' Integration with 'h2o'
Create and evaluate models using 'tidymodels' and 'h2o' <https://h2o.ai/>. The package enables users to specify 'h2o' as an engine for several modeling methods.
Last updated 8 months ago
6.85 score 22 stars 80 scripts 920 downloads

plsmod - Model Wrappers for Projection Methods
Bindings for additional regression models for use with the 'parsnip' package, including ordinary and sparse partial least squares models for regression and classification (Rohart et al (2017) <doi:10.1371/journal.pcbi.1005752>).
Last updated 4 months ago
mixomics
6.47 score 14 stars 1 dependents 58 scripts 526 downloads

shinymodels - Interactive Assessments of Models
Launch a 'shiny' application for 'tidymodels' results. For classification or regression models, the app can be used to determine if there is lack of fit or poorly predicted points.
Last updated 3 months ago
shiny
6.39 score 47 stars 50 scripts 212 downloads

orbital - Predict with 'tidymodels' Workflows in Databases
Turn 'tidymodels' workflows into objects containing the sufficient sequential equations to perform predictions. These smaller objects allow for low dependency prediction locally or directly in databases.
Last updated 2 months ago
6.22 score 25 stars 11 scripts 353 downloads

modeldatatoo - More Data Sets Useful for Modeling Examples
More data sets used for demonstrating or testing model-related packages are contained in this package. The data sets are downloaded and cached, allowing for more and bigger data sets.
Last updated 10 months ago
4.55 score 7 stars 34 scripts 201 downloads

desirability2 - Desirability Functions for Multiparameter Optimization
In-line functions for multivariate optimization via desirability functions (Derringer and Suich, 1980, <doi:10.1080/00224065.1980.11980968>) with easy use within `dplyr` pipelines.
Last updated 4 months ago
4.53 score 10 stars 17 scripts 207 downloads