Commit 0429b64

DOC/REF: Clarify pip extras dependencies & cleanups (#49852)
* DOC/REF: Clarify pip extras dependencies & cleanups
* quote the install
1 parent 182ba5a commit 0429b64

2 files changed

+71
-65
lines changed

doc/source/getting_started/install.rst

+64-58
Original file line numberDiff line numberDiff line change
@@ -139,6 +139,16 @@ pandas can be installed via pip from
139139

140140
pip install pandas
141141

142+
pandas can also be installed with sets of optional dependencies to enable certain functionality. For example,
143+
to install pandas with the optional dependencies needed to read Excel files:
144+
145+
::
146+
147+
pip install "pandas[excel]"
148+
149+
150+
The full list of extras that can be installed can be found in the :ref:`dependency section <install.optional_dependencies>`.
151+
142152
Installing with ActivePython
143153
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
144154

@@ -232,6 +242,13 @@ This is just an example of what information is shown. You might see a slightly d
232242
Dependencies
233243
------------
234244

245+
.. _install.required_dependencies:
246+
247+
Required dependencies
248+
~~~~~~~~~~~~~~~~~~~~~
249+
250+
pandas requires the following dependencies.
251+
235252
================================================================ ==========================
236253
Package Minimum supported version
237254
================================================================ ==========================
@@ -240,56 +257,48 @@ Package Minimum support
240257
`pytz <https://pypi.org/project/pytz/>`__ 2020.1
241258
================================================================ ==========================
242259

243-
.. _install.recommended_dependencies:
260+
.. _install.optional_dependencies:
244261

245-
Performance dependencies (recommended)
246-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
262+
Optional dependencies
263+
~~~~~~~~~~~~~~~~~~~~~
247264

248-
pandas recommends the following optional dependencies for performance gains. These dependencies can be specifically
249-
installed with ``pandas[performance]`` (i.e. add as optional_extra to the pandas requirement)
265+
pandas has many optional dependencies that are only used for specific methods.
266+
For example, :func:`pandas.read_hdf` requires the ``pytables`` package, while
267+
:meth:`DataFrame.to_markdown` requires the ``tabulate`` package. If the
268+
optional dependency is not installed, pandas will raise an ``ImportError`` when
269+
the method requiring that dependency is called.
250270

251-
* `numexpr <https://github.com/pydata/numexpr>`__: for accelerating certain numerical operations.
252-
``numexpr`` uses multiple cores as well as smart chunking and caching to achieve large speedups.
253-
If installed, must be Version 2.7.3 or higher.
271+
If using pip, optional pandas dependencies can be installed or managed in a file (e.g. requirements.txt or pyproject.toml)
272+
as optional extras (e.g., ``pandas[performance, aws]>=1.5.0``). All optional dependencies can be installed with ``pandas[all]``,
273+
and specific sets of dependencies are listed in the sections below.
254274

255-
* `bottleneck <https://github.com/pydata/bottleneck>`__: for accelerating certain types of ``nan``
256-
evaluations. ``bottleneck`` uses specialized cython routines to achieve large speedups. If installed,
257-
must be Version 1.3.2 or higher.
275+
.. _install.recommended_dependencies:
258276

259-
* `numba <https://github.com/numba/numba>`__: alternative execution engine for operations that accept `engine="numba"
260-
argument (eg. apply). ``numba`` is a JIT compiler that translates Python functions to optimized machine code using
261-
the LLVM compiler library. If installed, must be Version 0.53.1 or higher.
277+
Performance dependencies (recommended)
278+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
262279

263280
.. note::
264281

265282
You are highly encouraged to install these libraries, as they provide speed improvements, especially
266283
when working with large data sets.
267284

285+
Installable with ``pip install "pandas[performance]"``.
268286

269-
.. _install.optional_dependencies:
270-
271-
Optional dependencies
272-
~~~~~~~~~~~~~~~~~~~~~
273-
274-
pandas has many optional dependencies that are only used for specific methods.
275-
For example, :func:`pandas.read_hdf` requires the ``pytables`` package, while
276-
:meth:`DataFrame.to_markdown` requires the ``tabulate`` package. If the
277-
optional dependency is not installed, pandas will raise an ``ImportError`` when
278-
the method requiring that dependency is called.
279-
280-
Optional pandas dependencies can be managed as optional extras (e.g.,``pandas[performance, aws]>=1.5.0``)
281-
in a requirements.txt, setup, or pyproject.toml file.
282-
Available optional dependencies are ``[all, performance, computation, aws,
283-
gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql, sql-other, html, xml,
284-
plot, output_formatting, compression, test]``
287+
===================================================== ================== ================== ===================================================================================================================================================================================
288+
Dependency Minimum Version pip extra Notes
289+
===================================================== ================== ================== ===================================================================================================================================================================================
290+
`numexpr <https://github.com/pydata/numexpr>`__ 2.7.3 performance Accelerates certain numerical operations by using multiple cores as well as smart chunking and caching to achieve large speedups.
291+
`bottleneck <https://github.com/pydata/bottleneck>`__ 1.3.2 performance Accelerates certain types of ``nan`` evaluations by using specialized Cython routines to achieve large speedups.
292+
`numba <https://github.com/numba/numba>`__ 0.53.1 performance Alternative execution engine for operations that accept ``engine="numba"``, using a JIT compiler that translates Python functions to optimized machine code with the LLVM compiler.
293+
===================================================== ================== ================== ===================================================================================================================================================================================
285294

286295
Timezones
287296
^^^^^^^^^
288297

289-
Can be managed as optional_extra with ``pandas[timezone]``.
298+
Installable with ``pip install "pandas[timezone]"``.
290299

291300
========================= ========================= =============== =============================================================
292-
Dependency Minimum Version optional_extra Notes
301+
Dependency Minimum Version pip extra Notes
293302
========================= ========================= =============== =============================================================
294303
tzdata 2022.1(pypi)/ timezone Allows the use of ``zoneinfo`` timezones with pandas.
295304
2022a(for system tzdata) **Note**: You only need to install the pypi package if your
@@ -305,10 +314,10 @@ tzdata 2022.1(pypi)/ timezone Allows the u
305314
Visualization
306315
^^^^^^^^^^^^^
307316

308-
Can be managed as optional_extra with ``pandas[plot, output_formatting]``, depending on the required functionality.
317+
Installable with ``pip install "pandas[plot, output_formatting]"``.
309318

310319
========================= ================== ================== =============================================================
311-
Dependency Minimum Version optional_extra Notes
320+
Dependency Minimum Version pip extra Notes
312321
========================= ================== ================== =============================================================
313322
matplotlib 3.6.1 plot Plotting library
314323
Jinja2 3.0.0 output_formatting Conditional formatting with DataFrame.style
@@ -318,10 +327,10 @@ tabulate 0.8.9 output_formatting Printing in Mark
318327
Computation
319328
^^^^^^^^^^^
320329

321-
Can be managed as optional_extra with ``pandas[computation]``.
330+
Installable with ``pip install "pandas[computation]"``.
322331

323332
========================= ================== =============== =============================================================
324-
Dependency Minimum Version optional_extra Notes
333+
Dependency Minimum Version pip extra Notes
325334
========================= ================== =============== =============================================================
326335
SciPy 1.7.1 computation Miscellaneous statistical functions
327336
xarray 0.19.0 computation pandas-like API for N-dimensional data
@@ -330,10 +339,10 @@ xarray 0.19.0 computation pandas-like API for
330339
Excel files
331340
^^^^^^^^^^^
332341

333-
Can be managed as optional_extra with ``pandas[excel]``.
342+
Installable with ``pip install "pandas[excel]"``.
334343

335344
========================= ================== =============== =============================================================
336-
Dependency Minimum Version optional_extra Notes
345+
Dependency Minimum Version pip extra Notes
337346
========================= ================== =============== =============================================================
338347
xlrd 2.0.1 excel Reading Excel
339348
xlsxwriter 1.4.3 excel Writing Excel
@@ -344,10 +353,10 @@ pyxlsb 1.0.8 excel Reading for xlsb fi
344353
HTML
345354
^^^^
346355

347-
These dependencies can be specifically installed with ``pandas[html]``.
356+
Installable with ``pip install "pandas[html]"``.
348357

349358
========================= ================== =============== =============================================================
350-
Dependency Minimum Version optional_extra Notes
359+
Dependency Minimum Version pip extra Notes
351360
========================= ================== =============== =============================================================
352361
BeautifulSoup4 4.9.3 html HTML parser for read_html
353362
html5lib 1.1 html HTML parser for read_html
@@ -381,22 +390,21 @@ top-level :func:`~pandas.read_html` function:
381390
XML
382391
^^^
383392

384-
Can be managed as optional_extra with ``pandas[xml]``.
393+
Installable with ``pip install "pandas[xml]"``.
385394

386395
========================= ================== =============== =============================================================
387-
Dependency Minimum Version optional_extra Notes
396+
Dependency Minimum Version pip extra Notes
388397
========================= ================== =============== =============================================================
389398
lxml 4.6.3 xml XML parser for read_xml and tree builder for to_xml
390399
========================= ================== =============== =============================================================
391400

392401
SQL databases
393402
^^^^^^^^^^^^^
394403

395-
Can be managed as optional_extra with ``pandas[postgresql, mysql, sql-other]``,
396-
depending on required sql compatibility.
404+
Installable with ``pip install "pandas[postgresql, mysql, sql-other]"``.
397405

398406
========================= ================== =============== =============================================================
399-
Dependency Minimum Version optional_extra Notes
407+
Dependency Minimum Version pip extra Notes
400408
========================= ================== =============== =============================================================
401409
SQLAlchemy 1.4.16 postgresql, SQL support for databases other than sqlite
402410
mysql,
@@ -408,11 +416,10 @@ pymysql 1.0.2 mysql MySQL engine for sq
408416
Other data sources
409417
^^^^^^^^^^^^^^^^^^
410418

411-
Can be managed as optional_extra with ``pandas[hdf5, parquet, feather, spss, excel]``,
412-
depending on required compatibility.
419+
Installable with ``pip install "pandas[hdf5, parquet, feather, spss, excel]"``.
413420

414421
========================= ================== ================ =============================================================
415-
Dependency Minimum Version optional_extra Notes
422+
Dependency Minimum Version pip extra Notes
416423
========================= ================== ================ =============================================================
417424
PyTables 3.6.1 hdf5 HDF5-based reading / writing
418425
blosc 1.21.0 hdf5 Compression for HDF5; only available on ``conda``
@@ -441,10 +448,10 @@ odfpy 1.4.1 excel Open document form
441448
Access data in the cloud
442449
^^^^^^^^^^^^^^^^^^^^^^^^
443450

444-
Can be managed as optional_extra with ``pandas[fss, aws, gcp]``, depending on required compatibility.
451+
Installable with ``pip install "pandas[fss, aws, gcp]"``.
445452

446453
========================= ================== =============== =============================================================
447-
Dependency Minimum Version optional_extra Notes
454+
Dependency Minimum Version pip extra Notes
448455
========================= ================== =============== =============================================================
449456
fsspec 2021.7.0 fss, gcp, aws Handling files aside from simple local and HTTP (required
450457
dependency of s3fs, gcsfs).
@@ -456,29 +463,28 @@ s3fs 2021.08.0 aws Amazon S3 access
456463
Clipboard
457464
^^^^^^^^^
458465

459-
Can be managed as optional_extra with ``pandas[clipboard]``. However, depending on operating system, system-level
460-
packages may need to installed.
466+
Installable with ``pip install "pandas[clipboard]"``.
461467

462468
========================= ================== =============== =============================================================
463-
Dependency Minimum Version optional_extra Notes
469+
Dependency Minimum Version pip extra Notes
464470
========================= ================== =============== =============================================================
465-
PyQt4/PyQt5 5.15.1 Clipboard I/O
466-
qtpy 2.2.0 Clipboard I/O
471+
PyQt4/PyQt5 5.15.1 clipboard Clipboard I/O
472+
qtpy 2.2.0 clipboard Clipboard I/O
467473
========================= ================== =============== =============================================================
468474

469475
.. note::
470476

477+
Depending on the operating system, system-level packages may need to be installed.
471478
For clipboard to operate on Linux one of the CLI tools ``xclip`` or ``xsel`` must be installed on your system.
472479

473480

474481
Compression
475482
^^^^^^^^^^^
476483

477-
Can be managed as optional_extra with ``pandas[compression]``.
478-
If only one specific compression lib is required, please request it as an independent requirement.
484+
Installable with ``pip install "pandas[compression]"``.
479485

480486
========================= ================== =============== =============================================================
481-
Dependency Minimum Version optional_extra Notes
487+
Dependency Minimum Version pip extra Notes
482488
========================= ================== =============== =============================================================
483489
brotli 0.7.0 compression Brotli compression
484490
python-snappy 0.6.0 compression Snappy compression
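
The install.rst text above notes that pandas raises an ``ImportError`` only when a method needing a missing optional dependency is called. A minimal sketch of checking for such dependencies up front, using only the standard library, might look like the following. The helper name ``missing_modules`` is hypothetical and not part of pandas, and the import names ``tabulate`` and ``tables`` (the import name of PyTables) are chosen purely for illustration.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be located for import.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Illustrative import names: ``tabulate`` backs DataFrame.to_markdown and
# ``tables`` (PyTables) backs pandas.read_hdf.
missing = missing_modules(["tabulate", "tables"])
if missing:
    print("Optional modules not installed:", ", ".join(missing))
else:
    print("All checked optional modules are available.")
```

A check like this only reports whether a module can be found; it does not verify the minimum versions listed in the tables.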

doc/source/whatsnew/v2.0.0.rst

+7-7
Original file line numberDiff line numberDiff line change
@@ -14,17 +14,17 @@ including other versions of pandas.
1414
Enhancements
1515
~~~~~~~~~~~~
1616

17-
.. _whatsnew_200.enhancements.optional_dependency_management:
17+
.. _whatsnew_200.enhancements.optional_dependency_management_pip:
1818

19-
Optional dependencies version management
20-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21-
Optional pandas dependencies can be managed as extras in a requirements/setup file, for example:
19+
Installing optional dependencies with pip extras
20+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21+
When installing pandas using pip, sets of optional dependencies can also be installed by specifying extras.
2222

23-
.. code-block:: python
23+
.. code-block:: bash
2424
25-
pandas[performance, aws]>=2.0.0
25+
pip install "pandas[performance, aws]>=2.0.0"
2626
27-
Available optional dependencies (listed in order of appearance at `install guide <https://pandas.pydata.org/docs/getting_started/install>`_) are
27+
The available extras, found in the :ref:`installation guide<install.dependencies>`, are
2828
``[all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql,
2929
sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (:issue:`39164`).
3030

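
To make the extras syntax from the whatsnew entry concrete, here is a rough sketch that builds a requirement string such as ``pandas[performance,aws]>=2.0.0`` and rejects extras not in the list given in the diff. The names ``KNOWN_EXTRAS`` and ``pandas_requirement`` are hypothetical and not part of pandas or pip; the set of extras is copied from the v2.0.0 whatsnew text and may differ in other pandas versions.

```python
# Extras copied from the v2.0.0 whatsnew entry; an assumption for any other version.
KNOWN_EXTRAS = {
    "all", "performance", "computation", "timezone", "fss", "aws", "gcp",
    "excel", "parquet", "feather", "hdf5", "spss", "postgresql", "mysql",
    "sql-other", "html", "xml", "plot", "output_formatting", "clipboard",
    "compression", "test",
}


def pandas_requirement(extras, min_version=None):
    """Build a pip requirement string for pandas with the given extras.

    Hypothetical helper for illustration only.
    """
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"Unknown pandas extras: {', '.join(unknown)}")
    requirement = "pandas"
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    if min_version is not None:
        requirement += ">=" + min_version
    return requirement


# The resulting string can go in a requirements file or, quoted, after `pip install`.
print(pandas_requirement(["performance", "aws"], "2.0.0"))
```

When the string is passed on a command line, it should be quoted as in the diff, since shells may otherwise interpret the square brackets and ``>``.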