Changes in version 1.1.0.9000                      

                 Changes in version 1.1.0 (2025-03-18)                  

Improvements

  - The following steps have gained the argument sparse. When set to
    "yes", they will produce sparse vectors. (#277)
      - step_dummy_hash()
      - step_texthash()
      - step_tf()
      - step_tfidf()

                 Changes in version 1.0.7 (2025-01-23)                  

Improvements

  - Documentation for tidy methods for all steps has been improved to
    describe the return value more accurately. (#262)

  - Calling ?tidy.step_*() now sends you to the documentation for
    step_*() where the outcome is documented. (#261)

  - step_textfeatures() has been made faster and more robust. (#265)

Bug Fixes

  - Fixed bug in step_clean_levels() where it would produce NAs for
    character columns. (#274)

                 Changes in version 1.0.6 (2023-11-15)                  

  - textfeatures has been removed from Suggests. (#255)

  - step_textfeatures() no longer returns a politeness feature. (#254)

                 Changes in version 1.0.5 (2023-10-20)                  

  - step_untokenize() and step_text_normalization() now return factors
    instead of strings. (#247)

                 Changes in version 1.0.4 (2023-08-17)                  

Improvements

  - step_clean_names() now throws an informative error if needed
    non-standard role columns are missing during bake(). (#235)

  - The keep_original_cols argument has been added to
    step_tokenmerge(). This change should mean that every step that
    produces new columns has the keep_original_cols argument. (#242)

  - Many internal changes to improve consistency, along with slight
    speed increases.

Bug Fixes

  - Fixed bug where step_dummy_hash() and step_texthash() would add new
    columns before old columns. (#235)

  - Fixed bug where vocabulary_size wasn't tunable in
    step_tokenize_bpe(). (#239)

                 Changes in version 1.0.3 (2023-04-14)                  

Improvements

  - Steps with tunable arguments now have those arguments listed in the
    documentation.

  - All steps that add new columns now throw an informative error if a
    name collision occurs.

Bug Fixes

  - Fixed bug where the weight argument of step_tf() wasn't tunable.

                 Changes in version 1.0.2 (2022-12-21)                  

  - Setting token = "tweets" in step_tokenize() has been deprecated due
    to tokenizers::tokenize_tweets() being deprecated. (#209)

  - step_sequence_onehot(), step_dummy_hash(), and step_texthash() now
    return integers. step_tf() returns integers when weight_scheme is
    "binary" or "raw count".

  - All steps now have required_pkgs() methods.

                 Changes in version 1.0.1 (2022-10-06)                  

  - Examples no longer include if (require(...)) code.

                 Changes in version 1.0.0 (2022-07-02)                  

  - Indicate which steps support case weights (none), to align
    documentation with other packages.

                 Changes in version 0.5.2 (2022-05-04)                  

  - Remove use of okc_text in vignette.

  - Fix bug in printing of tokenlists.

                 Changes in version 0.5.1 (2022-03-29)                  

  - step_tfidf() now correctly saves the idf values and applies them to
    the testing data set.

  - tidy.step_tfidf() now returns calculated IDF weights.

                 Changes in version 0.5.0 (2022-03-20)                  

New steps

  - step_dummy_hash() generates binary indicators (possibly signed) from
    simple factor or character vectors.

  - step_tokenize() has gained three sibling functions,
    step_tokenize_bpe(), step_tokenize_sentencepiece() and
    step_tokenize_wordpiece(), which wrap {tokenizers.bpe},
    {sentencepiece} and {wordpiece} respectively (#147).

Improvements and Other Changes

  - Added all_tokenized() and all_tokenized_predictors() to more easily
    select tokenized columns (#132).

  - Use show_tokens() to more easily debug a recipe involving
    tokenization.

  - Reorganize documentation for all recipe step tidy methods (#126).

  - Steps now have a dedicated subsection detailing what happens when
    tidy() is applied. (#163)

  - All recipe steps now officially support empty selections to be more
    aligned with dplyr and other packages that use tidyselect (#141).

  - step_ngram() has been given a speed increase to put it in line with
    other packages' performance.

  - step_tokenize() will now try to error if vocabulary size is too low
    when using engine = "tokenizers.bpe" (#119).

  - The warning given by step_tokenfilter() when filtering failed to
    apply now refers to the correct argument name (#137).

  - step_tf() now returns 0 instead of NaN when there aren't any tokens
    present (#118).

  - step_tokenfilter() has a new argument, filter_fun, which takes a
    function that can be used to filter tokens. (#164)

  - tidy.step_stem() now correctly shows if custom stemmer was used.

  - Added keep_original_cols argument to step_lda(), step_texthash(),
    step_tf(), step_tfidf(), step_word_embeddings(), step_dummy_hash(),
    step_sequence_onehot(), and step_textfeatures() (#139).

Breaking Changes

  - Steps with a prefix argument now create names according to the
    pattern prefix_variablename_name/number. (#124)

                 Changes in version 0.4.1 (2021-07-11)                  

Bug fixes

  - Fixed a bug in step_tokenfilter() and step_sequence_onehot() that
    sometimes caused crashes in R 4.1.0.

                 Changes in version 0.4.0 (2020-11-12)                  

Breaking Changes

  - step_lda() now takes a tokenlist instead of a character variable.
    See readme for more detail.

New Features

  - step_sequence_onehot() now takes tokenlists as input.
  - Added {tokenizers.bpe} engine to step_tokenize().
  - Added {udpipe} engine to step_tokenize().
  - Added new steps for cleaning variable names or levels with
    {janitor}, step_clean_names() and step_clean_levels(). (#101)

                 Changes in version 0.3.0 (2020-07-08)                  

  - The stopwords package has been moved from Imports to Suggests.
  - step_ngram() gained an argument min_num_tokens to be able to return
    multiple n-grams together. (#90)
  - Added step_text_normalization() to perform Unicode normalization on
    character vectors. (#86)

                 Changes in version 0.2.3 (2020-05-22)                  

                 Changes in version 0.2.2 (2020-05-10)                  

  - step_word_embeddings() gained an argument aggregation_default to
    specify the value to use when no words match the embedding.

                 Changes in version 0.2.1 (2020-05-04)                  

                 Changes in version 0.2.0 (2020-04-14)                  

  - step_tokenize() gained an engine argument to specify packages other
    than tokenizers to tokenize.
  - spacyr has been added as an engine to step_tokenize().
  - step_lemma() has been added to extract the lemma attribute from
    tokenlists.
  - step_pos_filter() has been added to allow filtering of tokens based
    on their part-of-speech tags.
  - step_ngram() has been added to generate ngrams from tokenlists.
  - step_stem() now correctly uses the options argument. (Thanks to
    @grayskripko for finding the bug, #64)

                 Changes in version 0.1.0 (2020-03-05)                  

  - step_word2vec() has been changed to step_lda() to reflect what is
    actually happening.
  - step_word_embeddings() has been added. Allows for use of pre-trained
    word embeddings to convert token columns to vectors in a
    high-dimensional "meaning" space. (@jonthegeek, #20)
  - text2vec has been changed from Imports to Suggests.
  - textfeatures has been changed from Imports to Suggests.
  - step_tfidf() calculations are slightly changed due to a flaw in the
    original implementation; see
    https://github.com/dselivanov/text2vec/issues/280.

                 Changes in version 0.0.2 (2019-09-07)                  

  - A custom stemming function can now be used in step_stem() using the
    custom_stemmer argument.
  - step_textfeatures() has been added, allows for multiple numerical
    features to be pulled from text.
  - step_sequence_onehot() has been added, allows for one hot encoding
    of sequences of fixed width.
  - step_word2vec() has been added, calculates word2vec dimensions.
  - step_tokenmerge() has been added, combines multiple list columns
    into one list-column.
  - step_texthash() now correctly accepts the signed argument.
  - Documentation has been improved to showcase the importance of
    filtering tokens before applying step_tf() and step_tfidf().

                 Changes in version 0.0.1 (2018-12-17)                  

First CRAN version