Package index • textrecipes

Step Functions - Tokenization

step_tokenize(): Tokenization of Character Variables

step_tokenize_bpe(): BPE Tokenization of Character Variables

step_tokenize_sentencepiece(): Sentencepiece Tokenization of Character Variables

step_tokenize_wordpiece(): Wordpiece Tokenization of Character Variables

Step Functions - Un-Tokenization

step_untokenize(): Untokenization of Token Variables

Step Functions - Token Modification

step_lemma(): Lemmatization of Token Variables

step_ngram(): Generate n-grams From Token Variables

step_pos_filter(): Part of Speech Filtering of Token Variables

step_stem(): Stemming of Token Variables

step_stopwords(): Filtering of Stop Words for Token Variables

step_tokenfilter(): Filter Tokens Based on Term Frequency

step_tokenmerge(): Combine Multiple Token Variables Into One

Step Functions - Numeric Variables From Tokens

step_lda(): Calculate LDA Dimension Estimates of Tokens

step_texthash(): Feature Hashing of Tokens

step_tf(): Term Frequency of Tokens

step_tfidf(): Term Frequency-Inverse Document Frequency of Tokens

step_word_embeddings(): Pretrained Word Embeddings of Tokens

Step Functions - Numeric Variables From Characters

step_dummy_hash(): Indicator Variables via Feature Hashing

step_sequence_onehot(): Positional One-Hot Encoding of Tokens

step_textfeature(): Calculate Set of Text Features

Step Functions - Text Normalization

step_text_normalization(): Normalization of Character Variables

Step Functions - Text Cleaning

step_clean_levels(): Clean Categorical Levels

step_clean_names(): Clean Variable Names

Token Functions

tokenlist(): Create Token Object

show_tokens(): Show Token Output of Recipe

Selectors

all_tokenized(), all_tokenized_predictors(): Role Selection

Count Functions

count_functions: List of all feature counting functions

Data Sets

emoji_samples: Sample sentences with emojis