How to Install pyspark in Python
Apache Spark Python API
pip install pyspark
What is pyspark?
Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
You can find the latest Spark documentation, including a programming guide, on the project web page.
This README file only contains basic information related to pip-installed PySpark. This packaging is currently experimental and may change in future versions (although we will do our best to keep compatibility). Using PySpark requires the Spark JARs, and if you are building this from source please see the builder instructions at "Building Spark".
Quick Start
Minimal example to get started with pyspark:
import pyspark
print(pyspark.__version__)
Installation
pip (standard)
pip install pyspark
Virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install pyspark
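If you are unsure whether the virtual environment is actually active, a small standard-library check can tell you (a sketch; no extra packages assumed):

```python
import sys

# Inside a virtual environment, sys.prefix points at the venv while
# sys.base_prefix still points at the base interpreter, so they differ.
def in_virtualenv() -> bool:
    return sys.prefix != sys.base_prefix

print("virtual environment active:", in_virtualenv())
```

Run this before `pip install pyspark` to confirm packages will land in the venv rather than the system Python.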
pip3
pip3 install pyspark
conda
conda install -c conda-forge pyspark
Poetry
poetry add pyspark
Dependencies
Installing pyspark with pip also pulls in its required dependencies automatically; in particular, py4j (the bridge PySpark uses to communicate with the JVM) is installed alongside it.
Verify the Installation
After installing, confirm the package is available:
python -c "import pyspark; print(pyspark.__version__)"
If this prints a version number, installation succeeded. If you see a ModuleNotFoundError, see the errors section below.
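If you prefer to check from within Python, the standard library can test whether a package is importable without actually importing it. This sketch works for any package name, not just pyspark:

```python
import importlib.util

def is_installed(package: str) -> bool:
    # find_spec returns None when the package cannot be located
    # in the current environment.
    return importlib.util.find_spec(package) is not None

print("pyspark installed:", is_installed("pyspark"))
```

Unlike a bare import, this does not execute the package's initialization code, so it is a cheap way to probe the environment.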
Installation Errors
Common errors when installing pyspark with pip.
ModuleNotFoundError: No module named 'pyspark'
Cause: The package is not installed in the current Python environment.
Fix: Run pip install pyspark. If using a virtual environment, ensure it is activated first.
ModuleNotFoundError: No module named 'pyspark' (installed but still failing)
Cause: pip installed the package into a different Python than the one running your script.
Fix: Use python -m pip install pyspark to install into the interpreter you are running.
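To see exactly which interpreter is running your script, and therefore which one pip must target, print sys.executable. The install command shown is illustrative:

```python
import sys

# The absolute path of the interpreter running this script.
# Installing with "<this path> -m pip install pyspark" guarantees
# pip puts the package where this interpreter can import it.
print(sys.executable)
print(f"{sys.executable} -m pip install pyspark")
```

This is the quickest way to diagnose multiple-Python setups (system Python, Homebrew, pyenv, conda) where pip and python resolve to different installations.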
ImportError: cannot import name 'X' from 'pyspark'
Cause: The function or class does not exist in the installed version.
Fix: Check the version with pip show pyspark and upgrade with pip install --upgrade pyspark.
pip: command not found
Cause: pip is not in PATH or Python was not added to PATH during installation.
Fix: Try python -m pip install pyspark. On macOS/Linux try pip3.
PermissionError: [Errno 13] Permission denied
Cause: No write access to the system Python package directory.
Fix: Use a virtual environment, or add --user: pip install --user pyspark
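To see where pip places packages for the current interpreter, both the environment-wide and the per-user (--user) locations, the standard library exposes the relevant paths:

```python
import site
import sysconfig

# "purelib" is the site-packages directory for this interpreter;
# with --user, pip installs into the user-specific location instead.
print("environment site-packages:", sysconfig.get_paths()["purelib"])
print("user site-packages (--user):", site.getusersitepackages())
```

If the environment path is under a system directory you cannot write to, that explains the PermissionError, and --user or a venv is the fix.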
SSL: CERTIFICATE_VERIFY_FAILED
Cause: pip cannot verify PyPI's SSL certificate, which is common behind corporate proxies that intercept TLS.
Fix: Try: pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org pyspark (this bypasses certificate verification, so prefer installing your proxy's CA certificate when possible).
Recent Releases
| Version | Released |
|---|---|
| 4.2.0.dev4 | 2026-04-10 |
| 4.2.0.dev3 | 2026-03-12 |
| 4.2.0.dev2 | 2026-02-08 |
| 4.0.2 | 2026-02-05 |
| 3.5.8 | 2026-01-15 |
Manage pyspark
Upgrade to latest version
pip install --upgrade pyspark
Install a specific version
pip install pyspark==4.1.1
Uninstall
pip uninstall pyspark
Check what is installed
pip show pyspark