How to Install tokenizers in Python
pip install tokenizers
What is tokenizers?
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
These are Python bindings over the Rust implementation. If you are interested in the high-level design, see the documentation of the Rust implementation.
- Train new vocabularies and tokenize using 4 pre-made tokenizers (Bert WordPiece and the 3 most common BPE versions).
- Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
- Easy to use, but also extremely versatile.
- Designed for research and production.
- Normalization comes with alignments tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
- Does all the pre-processing: truncate, pad, and add the special tokens your model needs.
Quick Start
Minimal example to get started with tokenizers:
import tokenizers
print(tokenizers.__version__)
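Beyond the version check, a slightly fuller sketch can show training and alignment tracking in action. The three-sentence corpus, the vocabulary size, and the special tokens below are illustrative assumptions, not recommended settings; a real tokenizer would be trained on far more text.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Tiny in-memory corpus, purely for illustration (assumption).
corpus = [
    "Tokenizers split text into tokens.",
    "Fast tokenizers are written in Rust.",
    "Python bindings wrap the Rust implementation.",
]

# Build an untrained BPE tokenizer that splits on whitespace and punctuation first.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# vocab_size and special_tokens are arbitrary small values for this sketch.
trainer = BpeTrainer(
    vocab_size=200,
    special_tokens=["[UNK]", "[PAD]"],
    show_progress=False,
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

text = "Fast tokenizers"
encoding = tokenizer.encode(text)

# Each token carries (start, end) character offsets into the original text.
for token, (start, end) in zip(encoding.tokens, encoding.offsets):
    print(token, "->", repr(text[start:end]))
```

The offsets are what make it possible to map any token back to the exact span of the input it came from.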
Installation
pip (standard)
pip install tokenizers
Virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install tokenizers
pip3
pip3 install tokenizers
conda
conda install -c conda-forge tokenizers
Poetry
poetry add tokenizers
Dependencies
Installing tokenizers also installs the dependencies it declares. After installation, pip show tokenizers lists them under Requires.
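The declared dependencies of an installed package can also be read from its metadata in Python. A minimal sketch using the standard library's importlib.metadata; the helper name declared_dependencies is hypothetical, and the output depends on the version you have installed.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_dependencies(package: str) -> list[str]:
    """Return the requirement strings a package declares.

    Returns an empty list if the package declares none or is not installed.
    """
    try:
        return list(requires(package) or [])
    except PackageNotFoundError:
        return []


if __name__ == "__main__":
    for requirement in declared_dependencies("tokenizers"):
        print(requirement)
```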
Verify the Installation
After installing, confirm the package is available:
python -c "import tokenizers; print(tokenizers.__version__)"
If this prints a version number, installation succeeded. If you see a ModuleNotFoundError, see the errors section below.
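The same check can be done from a script without raising an exception. This is a sketch using only the standard library; the helper name installed_version is hypothetical, and it assumes the import name and the distribution name are identical, which holds for tokenizers.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    # find_spec returns None when the module cannot be located on sys.path.
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("tokenizers")
    if found is None:
        print("tokenizers is not installed in this environment")
    else:
        print("tokenizers", found)
```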
Installation Errors
Common errors when installing tokenizers with pip.
ModuleNotFoundError: No module named 'tokenizers'
Cause: The package is not installed in the current Python environment.
Fix: Run pip install tokenizers. If using a virtual environment, ensure it is activated first.
ModuleNotFoundError: No module named 'tokenizers' (installed but still failing)
Cause: pip installed the package into a different Python than the one running your script.
Fix: Use python -m pip install tokenizers to install into the interpreter you are running.
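To see which interpreter a script actually runs under, and the matching install command, a small sketch along these lines may help. The helper name pip_install_command is hypothetical; the script only prints the command and does not run it.

```python
import shlex
import sys


def pip_install_command(package: str) -> list[str]:
    """Build a pip command bound to the interpreter running this script."""
    # sys.executable can be empty in unusual embedded setups; fall back to "python".
    interpreter = sys.executable or "python"
    return [interpreter, "-m", "pip", "install", package]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Install with:", shlex.join(pip_install_command("tokenizers")))
```

If the printed interpreter path differs from the one pip reports with pip --version, the two are pointing at different Python installations.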
ImportError: cannot import name 'X' from 'tokenizers'
Cause: The function or class does not exist in the installed version.
Fix: Check the version with pip show tokenizers and upgrade with pip install --upgrade tokenizers.
pip: command not found
Cause: pip is not in PATH or Python was not added to PATH during installation.
Fix: Try python -m pip install tokenizers. On macOS/Linux try pip3.
PermissionError: [Errno 13] Permission denied
Cause: No write access to the system Python package directory.
Fix: Use a virtual environment, or add --user: pip install --user tokenizers
SSL: CERTIFICATE_VERIFY_FAILED
Cause: pip cannot verify PyPI's SSL certificate, which is common behind corporate proxies.
Fix: Try: pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org tokenizers
MemoryError when loading data
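Cause: The dataset is too large to fit in RAM. This is a runtime error, not an installation error.
Fix: Read the data in chunks or stream it line by line instead of loading it all at once. For tabular data, filter columns on load or consider Polars/Dask for out-of-core processing.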
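Reading in chunks can be sketched with a generator that yields fixed-size batches of lines, so only one batch is held in memory at a time. The function name iter_line_batches and the default batch size are assumptions for illustration.

```python
from itertools import islice
from typing import Iterator


def iter_line_batches(path: str, batch_size: int = 1000) -> Iterator[list[str]]:
    """Yield lists of up to batch_size lines from a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        while True:
            # islice pulls at most batch_size lines without reading the rest.
            batch = [line.rstrip("\n") for line in islice(handle, batch_size)]
            if not batch:
                return
            yield batch
```

Batches produced this way can be passed to any consumer that accepts an iterator of string lists, which should include Tokenizer.train_from_iterator when training on a large corpus.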
Recent Releases
| Version | Released |
|---|---|
| 0.22.2 (latest) | 2026-01-05 |
| 0.22.2rc0 | 2025-12-02 |
| 0.22.1 | 2025-09-19 |
| 0.22.1rc0 | 2025-09-19 |
| 0.22.0 | 2025-08-29 |
Manage tokenizers
Upgrade to latest version
pip install --upgrade tokenizers
Install a specific version
pip install tokenizers==0.22.2
Uninstall
pip uninstall tokenizers
Check what is installed
pip show tokenizers