Package: textrecipes 1.0.6.9000

Emil Hvitfeldt

textrecipes: Extra 'Recipes' for Text Processing

Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tf-idf), and feature hashing.
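A minimal sketch of the tokenize, filter, and count workflow described above, assuming the recipes and textrecipes packages are installed. The `reviews` data frame and its `text` column are hypothetical toy data, and `max_tokens = 10` is an arbitrary illustrative value.

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data: three short documents in one character column
reviews <- data.frame(
  text = c("the cat sat on the mat",
           "the dog ate my homework",
           "cats and dogs"),
  stringsAsFactors = FALSE
)

rec <- recipe(~ text, data = reviews) %>%
  step_tokenize(text) %>%                      # split each document into word tokens
  step_tokenfilter(text, max_tokens = 10) %>%  # keep only the 10 most frequent tokens
  step_tfidf(text)                             # one tf-idf column per retained token

# prep() learns the vocabulary and idf weights from the training data;
# bake(new_data = NULL) returns the processed training data
baked <- bake(prep(rec), new_data = NULL)
names(baked)
```

Swapping `step_tfidf()` for `step_tf()` gives plain term counts instead of tf-idf weights; the rest of the recipe stays the same.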

Authors: Emil Hvitfeldt [aut, cre], Michael W. Kearney [cph], Posit Software, PBC [cph, fnd]

Source: textrecipes_1.0.6.9000.tar.gz
Windows: textrecipes_1.0.6.9000.zip (R 4.5, R 4.4, R 4.3)
macOS: textrecipes_1.0.6.9000.tgz (R 4.4 x86_64, R 4.4 arm64, R 4.3 x86_64, R 4.3 arm64)
Linux: textrecipes_1.0.6.9000.tar.gz (R 4.5 noble, R 4.4 noble)
WebAssembly: textrecipes_1.0.6.9000.tgz (R 4.4 emscripten, R 4.3 emscripten)
Manual: textrecipes.pdf | textrecipes.html
textrecipes/json (API)
NEWS

# Install 'textrecipes' in R:
install.packages('textrecipes', repos = c('https://tidymodels.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/tidymodels/textrecipes/issues

Pkgdown: https://textrecipes.tidymodels.org

Uses libs:
  • c++ (GNU Standard C++ Library v3)
Datasets:

On CRAN:

cpp

Score 10.84, 160 stars, 1 package, 1.0k scripts, 1.4k downloads, 32 exports, 57 dependencies

Last updated 1 month ago from: a7700f0d53. Checks: 1 OK, 8 NOTE. Indexed: yes.

Target               Result  Date
Doc / Vignettes      OK      Dec 11 2024
R-4.5-win-x86_64     NOTE    Dec 11 2024
R-4.5-linux-x86_64   NOTE    Dec 11 2024
R-4.4-win-x86_64     NOTE    Dec 11 2024
R-4.4-mac-x86_64     NOTE    Dec 11 2024
R-4.4-mac-aarch64    NOTE    Dec 11 2024
R-4.3-win-x86_64     NOTE    Dec 11 2024
R-4.3-mac-x86_64     NOTE    Dec 11 2024
R-4.3-mac-aarch64    NOTE    Dec 11 2024

Exports: %>%, all_tokenized, all_tokenized_predictors, count_functions, required_pkgs, show_tokens, step_clean_levels, step_clean_names, step_dummy_hash, step_lda, step_lemma, step_ngram, step_pos_filter, step_sequence_onehot, step_stem, step_stopwords, step_text_normalization, step_textfeature, step_texthash, step_tf, step_tfidf, step_tokenfilter, step_tokenize, step_tokenize_bpe, step_tokenize_sentencepiece, step_tokenize_wordpiece, step_tokenmerge, step_untokenize, step_word_embeddings, tidy, tokenlist, tunable

Dependencies: class, cli, clock, codetools, cpp11, data.table, diagram, digest, dplyr, fansi, future, future.apply, generics, globals, glue, gower, hardhat, ipred, KernSmooth, lattice, lava, lifecycle, listenv, lubridate, magrittr, MASS, Matrix, nnet, numDeriv, parallelly, pillar, pkgconfig, prodlim, progressr, purrr, R6, Rcpp, recipes, rlang, rpart, shape, SnowballC, sparsevctrs, SQUAREM, stringi, stringr, survival, tibble, tidyr, tidyselect, timechange, timeDate, tokenizers, tzdb, utf8, vctrs, withr

Cookbook - Using more complex recipes involving text

Rendered from cookbook---using-more-complex-recipes-involving-text.Rmd using knitr::rmarkdown on Dec 11 2024.

Last update: 2024-11-09
Started: 2018-11-04

Under the hood - tokenlist

Rendered from tokenlist.Rmd using knitr::rmarkdown on Dec 11 2024.

Last update: 2024-04-01
Started: 2020-04-08

Working with n-grams

Rendered from Working-with-n-grams.Rmd using knitr::rmarkdown on Dec 11 2024.

Last update: 2024-04-01
Started: 2020-04-08

Readme and manuals

Help Manual

Help page: Topics
Role Selection: all_tokenized, all_tokenized_predictors
List of all feature counting functions: count_functions
Sample sentences with emojis: emoji_samples
Show token output of recipe: show_tokens
Clean Categorical Levels: step_clean_levels, tidy.step_clean_levels
Clean Variable Names: step_clean_names, tidy.step_clean_names
Indicator Variables via Feature Hashing: step_dummy_hash, tidy.step_dummy_hash
Calculate LDA Dimension Estimates of Tokens: step_lda, tidy.step_lda
Lemmatization of Token Variables: step_lemma, tidy.step_lemma
Generate n-grams From Token Variables: step_ngram, tidy.step_ngram
Part of Speech Filtering of Token Variables: step_pos_filter, tidy.step_pos_filter
Positional One-Hot encoding of Tokens: step_sequence_onehot, tidy.step_sequence_onehot
Stemming of Token Variables: step_stem, tidy.step_stem
Filtering of Stop Words for Tokens Variables: step_stopwords, tidy.step_stopwords
Normalization of Character Variables: step_text_normalization, tidy.step_text_normalization
Calculate Set of Text Features: step_textfeature
Feature Hashing of Tokens: step_texthash, tidy.step_texthash
Term frequency of Tokens: step_tf, tidy.step_tf
Term Frequency-Inverse Document Frequency of Tokens: step_tfidf, tidy.step_tfidf
Filter Tokens Based on Term Frequency: step_tokenfilter, tidy.step_tokenfilter
Tokenization of Character Variables: step_tokenize, tidy.step_tokenize
BPE Tokenization of Character Variables: step_tokenize_bpe, tidy.step_tokenize_bpe
Sentencepiece Tokenization of Character Variables: step_tokenize_sentencepiece, tidy.step_tokenize_sentencepiece
Wordpiece Tokenization of Character Variables: step_tokenize_wordpiece, tidy.step_tokenize_wordpiece
Combine Multiple Token Variables Into One: step_tokenmerge, tidy.step_tokenmerge
Untokenization of Token Variables: step_untokenize, tidy.step_untokenize
Pretrained Word Embeddings of Tokens: step_word_embeddings, tidy.step_word_embeddings
Create Token Object: tokenlist
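Two of the topics listed above, `show_tokens()` and `tokenlist()`, are handy for inspecting what the token steps produce. A minimal sketch, assuming recipes and textrecipes are installed; the `docs` data frame is hypothetical. `show_tokens()` preps and bakes the recipe internally, so it is meant as a diagnostic while building a recipe rather than as part of a final recipe.

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data
docs <- data.frame(text = c("Hello world", "Tokens, tokens everywhere!"),
                   stringsAsFactors = FALSE)

# Inspect the tokens produced by step_tokenize()
# (default: lowercased word tokens with punctuation removed)
tok_rec <- recipe(~ text, data = docs) %>%
  step_tokenize(text)
toks <- show_tokens(tok_rec, text)
print(toks)

# tokenlist() builds the token container that token variables hold internally;
# here it is created directly from a list of character vectors
tl <- tokenlist(list(c("hello", "world"), c("tokens", "everywhere")))
length(tl)
```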