tok: Fast Text Tokenization

Interfaces with the 'Hugging Face' tokenizers library (<https://huggingface.co/docs/tokenizers/index>) to provide implementations of today's most widely used tokenizers, such as the 'Byte-Pair Encoding' (BPE) algorithm. It is extremely fast for both training new vocabularies and tokenizing text.
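As a sketch of typical usage, the snippet below loads a pretrained tokenizer from the Hugging Face Hub and round-trips a string through encoding and decoding. The `tokenizer` R6 class and its `from_pretrained()`, `encode()`, and `decode()` methods follow the package's README; treat the exact method names as assumptions against any given version, and note that `from_pretrained()` requires network access (and the suggested 'hfhub' package) to download the tokenizer definition.

```r
library(tok)

# Download and load the tokenizer definition for a Hub model id
# (here "gpt2"; assumes network access and the 'hfhub' package).
tokenizer <- tokenizer$from_pretrained("gpt2")

# Encode text into token ids, then decode the ids back to text.
enc <- tokenizer$encode("Hello, world!")
enc$ids                    # integer token ids for the input string
tokenizer$decode(enc$ids)  # reconstructs the original text
```

A tokenizer can also be loaded from a local `tokenizer.json` file when one is available, which avoids the network dependency.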

Version: 0.1.4
Depends: R (≥ 4.2.0)
Imports: R6, cli
Suggests: rmarkdown, testthat (≥ 3.0.0), hfhub (≥ 0.1.1), withr
Published: 2024-09-04
DOI: 10.32614/CRAN.package.tok
Author: Daniel Falbel [aut, cre], Posit [cph]
Maintainer: Daniel Falbel <daniel at posit.co>
BugReports: https://github.com/mlverse/tok/issues
License: MIT + file LICENSE
URL: https://github.com/mlverse/tok
NeedsCompilation: yes
SystemRequirements: Rust toolchain with cargo, libclang/llvm-config
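Because the package compiles Rust code at install time, a source install needs the tools above on the `PATH`. A minimal sketch for checking the prerequisites before running `install.packages("tok")` (command names are the usual ones; locations vary by platform):

```shell
# Report whether each build prerequisite for a source install is present.
for tool in cargo rustc llvm-config; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: OK"
  else
    echo "$tool: missing"
  fi
done
```

The pre-built Windows and macOS binaries below do not require the Rust toolchain.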
Materials: README NEWS
CRAN checks: tok results [issues need fixing before 2024-10-11]

Documentation:

Reference manual: tok.pdf

Downloads:

Package source: tok_0.1.4.tar.gz
Windows binaries: r-devel: tok_0.1.4.zip, r-release: tok_0.1.4.zip, r-oldrel: tok_0.1.4.zip
macOS binaries: r-release (arm64): tok_0.1.4.tgz, r-oldrel (arm64): tok_0.1.4.tgz, r-release (x86_64): tok_0.1.4.tgz, r-oldrel (x86_64): tok_0.1.4.tgz
Old sources: tok archive

Reverse dependencies:

Reverse imports: sacRebleu

Linking:

Please use the canonical form https://CRAN.R-project.org/package=tok to link to this page.