Package: textrecipes 1.1.0.9000

Emil Hvitfeldt

textrecipes: Extra 'Recipes' for Text Processing

Converting text to numerical features requires purpose-built procedures, which are implemented as steps that follow the conventions of the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.
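
As an illustration of how such steps chain together, here is a minimal sketch that tokenizes a text column, keeps the most frequent tokens, and converts them to term-frequency counts. The data frame `reviews` and its columns are hypothetical and exist only for this example; the sketch assumes the 'recipes', 'textrecipes' and 'tibble' packages are installed.

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data: the object name and column names are made up
# for illustration and are not part of the package.
reviews <- tibble::tibble(
  text = c(
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang"
  )
)

rec <- recipe(~ text, data = reviews) %>%
  step_tokenize(text) %>%                    # character column -> tokens
  step_tokenfilter(text, max_tokens = 5) %>% # keep the 5 most frequent tokens
  step_tf(text)                              # tokens -> term-frequency columns

# Estimate the recipe on the training data, then return the processed data.
baked <- bake(prep(rec), new_data = NULL)

print(names(baked))
```

Swapping `step_tf()` for `step_tfidf()` or `step_texthash()` would give tf-idf weights or hashed features instead; which one suits a given model is a judgment call rather than something this sketch settles.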

Authors: Emil Hvitfeldt [aut, cre], Michael W. Kearney [cph], Posit Software, PBC [cph, fnd]

textrecipes_1.1.0.9000.tar.gz
textrecipes_1.1.0.9000.zip (r-4.7), textrecipes_1.1.0.9000.zip (r-4.6), textrecipes_1.1.0.9000.zip (r-4.5)
textrecipes_1.1.0.9000.tgz (r-4.6-x86_64), textrecipes_1.1.0.9000.tgz (r-4.6-arm64), textrecipes_1.1.0.9000.tgz (r-4.5-x86_64), textrecipes_1.1.0.9000.tgz (r-4.5-arm64)
textrecipes_1.1.0.9000.tar.gz (r-4.7-arm64), textrecipes_1.1.0.9000.tar.gz (r-4.7-x86_64), textrecipes_1.1.0.9000.tar.gz (r-4.6-arm64), textrecipes_1.1.0.9000.tar.gz (r-4.6-x86_64)
textrecipes_1.1.0.9000.tgz (r-4.6-emscripten)

# Install 'textrecipes' in R:
install.packages('textrecipes', repos = c('https://tidymodels.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/tidymodels/textrecipes/issues

Pkgdown/docs site: https://textrecipes.tidymodels.org

10.06 score, 164 stars, 1 package, 1.0k scripts, 1.4k downloads, 33 exports, 65 dependencies

Last updated from: 8d91e505ff. Checks: 13 OK. Indexed: yes.

Target                 Result  Time
linux-devel-arm64      OK      252
linux-devel-x86_64     OK      223
source / vignettes     OK      424
linux-release-arm64    OK      252
linux-release-x86_64   OK      245
macos-release-arm64    OK      135
macos-release-x86_64   OK      263
macos-oldrel-arm64     OK      144
macos-oldrel-x86_64    OK      436
windows-devel          OK      188
windows-release        OK      180
windows-oldrel         OK      162
wasm-release           OK      127

Exports: %>%, all_tokenized, all_tokenized_predictors, count_functions, ngram, required_pkgs, show_tokens, step_clean_levels, step_clean_names, step_dummy_hash, step_lda, step_lemma, step_ngram, step_pos_filter, step_sequence_onehot, step_stem, step_stopwords, step_text_normalization, step_textfeature, step_texthash, step_tf, step_tfidf, step_tokenfilter, step_tokenize, step_tokenize_bpe, step_tokenize_sentencepiece, step_tokenize_wordpiece, step_tokenmerge, step_untokenize, step_word_embeddings, tidy, tokenlist, tunable

Dependencies: class, cli, clock, codetools, cpp11, data.table, diagram, digest, dplyr, farver, future, future.apply, generics, ggplot2, globals, glue, gower, gtable, hardhat, ipred, isoband, KernSmooth, labeling, lattice, lava, lifecycle, listenv, lubridate, magrittr, MASS, Matrix, nnet, numDeriv, parallelly, pillar, pkgconfig, prodlim, progressr, purrr, R6, RColorBrewer, Rcpp, recipes, rlang, rpart, S7, scales, shape, SnowballC, sparsevctrs, SQUAREM, stringi, stringr, survival, tibble, tidyr, tidyselect, timechange, timeDate, tokenizers, tzdb, utf8, vctrs, viridisLite, withr

Cookbook - Using more complex recipes involving text

Rendered from cookbook---using-more-complex-recipes-involving-text.Rmd using knitr::rmarkdown on May 30 2026.

Last update: 2025-04-24
Started: 2018-11-04

Under the hood - tokenlist

Rendered from tokenlist.Rmd using knitr::rmarkdown on May 30 2026.

Last update: 2025-04-24
Started: 2020-04-08

Working with n-grams

Rendered from Working-with-n-grams.Rmd using knitr::rmarkdown on May 30 2026.

Last update: 2025-04-24
Started: 2020-04-08

Readme and manuals

Help Manual

Help page: Topics
Role Selection: all_tokenized, all_tokenized_predictors
List of all feature counting functions: count_functions
Sample sentences with emojis: emoji_samples
Show token output of recipe: show_tokens
Clean Categorical Levels: step_clean_levels, tidy.step_clean_levels
Clean Variable Names: step_clean_names, tidy.step_clean_names
Indicator Variables via Feature Hashing: step_dummy_hash, tidy.step_dummy_hash
Calculate LDA Dimension Estimates of Tokens: step_lda, tidy.step_lda
Lemmatization of Token Variables: step_lemma, tidy.step_lemma
Generate n-grams From Token Variables: step_ngram, tidy.step_ngram
Part of Speech Filtering of Token Variables: step_pos_filter, tidy.step_pos_filter
Positional One-Hot encoding of Tokens: step_sequence_onehot, tidy.step_sequence_onehot
Stemming of Token Variables: step_stem, tidy.step_stem
Filtering of Stop Words for Tokens Variables: step_stopwords, tidy.step_stopwords
Normalization of Character Variables: step_text_normalization, tidy.step_text_normalization
Calculate Set of Text Features: step_textfeature, tidy.step_textfeature
Feature Hashing of Tokens: step_texthash, tidy.step_texthash
Term frequency of Tokens: step_tf, tidy.step_tf
Term Frequency-Inverse Document Frequency of Tokens: step_tfidf, tidy.step_tfidf
Filter Tokens Based on Term Frequency: step_tokenfilter, tidy.step_tokenfilter
Tokenization of Character Variables: step_tokenize, tidy.step_tokenize
BPE Tokenization of Character Variables: step_tokenize_bpe, tidy.step_tokenize_bpe
Sentencepiece Tokenization of Character Variables: step_tokenize_sentencepiece, tidy.step_tokenize_sentencepiece
Wordpiece Tokenization of Character Variables: step_tokenize_wordpiece, tidy.step_tokenize_wordpiece
Combine Multiple Token Variables Into One: step_tokenmerge, tidy.step_tokenmerge
Untokenization of Token Variables: step_untokenize, tidy.step_untokenize
Pretrained Word Embeddings of Tokens: step_word_embeddings, tidy.step_word_embeddings
Create Token Object: tokenlist