Step Functions - Tokenization

step_tokenize()
Tokenization of Character Variables
step_tokenize_bpe()
BPE Tokenization of Character Variables
step_tokenize_sentencepiece()
Sentencepiece Tokenization of Character Variables
step_tokenize_wordpiece()
Wordpiece Tokenization of Character Variables

Step Functions - Un-Tokenization

step_untokenize()
Untokenization of Token Variables

Step Functions - Token Modification

step_lemma()
Lemmatization of Token Variables
step_ngram()
Generate n-grams From Token Variables
step_pos_filter()
Part of Speech Filtering of Token Variables
step_stem()
Stemming of Token Variables
step_stopwords()
Filtering of Stop Words for Token Variables
step_tokenfilter()
Filter Tokens Based on Term Frequency
step_tokenmerge()
Combine Multiple Token Variables Into One

Step Functions - Numeric Variables From Tokens

step_lda()
Calculate LDA Dimension Estimates of Tokens
step_texthash()
Feature Hashing of Tokens
step_tf()
Term Frequency of Tokens
step_tfidf()
Term Frequency-Inverse Document Frequency of Tokens
step_word_embeddings()
Pretrained Word Embeddings of Tokens

Step Functions - Numeric Variables From Characters

step_dummy_hash()
Indicator Variables via Feature Hashing
step_sequence_onehot()
Positional One-Hot Encoding of Tokens
step_textfeature()
Calculate Set of Text Features

Step Functions - Text Normalization

step_text_normalization()
Normalization of Character Variables

Step Functions - Text Cleaning

step_clean_levels()
Clean Categorical Levels
step_clean_names()
Clean Variable Names

Token Functions

tokenlist()
Create Token Object
show_tokens()
Show Token Output of Recipe

Selectors

Count Functions

count_functions
List of all feature counting functions

Data Sets

emoji_samples
Sample sentences with emojis