
Commit 5854693

Merge remote-tracking branch 'upstream/main' into bump/optional
2 parents 4ae1ed2 + 78acf94 commit 5854693

57 files changed, +880 -357 lines changed

.github/workflows/wheels.yml

+1 -1
@@ -153,7 +153,7 @@ jobs:
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

       - name: Build wheels
-        uses: pypa/[email protected].0
+        uses: pypa/[email protected].1
         with:
           package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:

ci/code_checks.sh

-4
@@ -72,13 +72,9 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.Series.dt PR01" `# Accessors are implemented as classes, but we do not document the Parameters section` \
         -i "pandas.Period.freq GL08" \
         -i "pandas.Period.ordinal GL08" \
-        -i "pandas.Timedelta.max PR02" \
-        -i "pandas.Timedelta.min PR02" \
-        -i "pandas.Timedelta.resolution PR02" \
         -i "pandas.Timestamp.max PR02" \
         -i "pandas.Timestamp.min PR02" \
         -i "pandas.Timestamp.resolution PR02" \
-        -i "pandas.Timestamp.tzinfo GL08" \
         -i "pandas.core.groupby.DataFrameGroupBy.plot PR02" \
         -i "pandas.core.groupby.SeriesGroupBy.plot PR02" \
         -i "pandas.core.resample.Resampler.quantile PR01,PR07" \

doc/source/development/contributing_codebase.rst

+1 -1
@@ -537,7 +537,7 @@ Preferred ``pytest`` idioms
 test and does not check if the test will fail. If this is the behavior you desire, use ``pytest.skip`` instead.

 If a test is known to fail but the manner in which it fails
-is not meant to be captured, use ``pytest.mark.xfail`` It is common to use this method for a test that
+is not meant to be captured, use ``pytest.mark.xfail``. It is common to use this method for a test that
 exhibits buggy behavior or a non-implemented feature. If
 the failing test has flaky behavior, use the argument ``strict=False``. This
 will make it so pytest does not fail if the test happens to pass. Using ``strict=False`` is highly undesirable, please use it only as a last resort.

doc/source/user_guide/basics.rst

+6 -6
@@ -36,7 +36,7 @@ of elements to display is five, but you may pass a custom number.
 Attributes and underlying data
 ------------------------------

-pandas objects have a number of attributes enabling you to access the metadata
+pandas objects have a number of attributes enabling you to access the metadata.

 * **shape**: gives the axis dimensions of the object, consistent with ndarray
 * Axis labels
@@ -59,7 +59,7 @@ NumPy's type system to add support for custom arrays
 (see :ref:`basics.dtypes`).

 To get the actual data inside a :class:`Index` or :class:`Series`, use
-the ``.array`` property
+the ``.array`` property.

 .. ipython:: python

@@ -88,18 +88,18 @@ NumPy doesn't have a dtype to represent timezone-aware datetimes, so there
 are two possibly useful representations:

 1. An object-dtype :class:`numpy.ndarray` with :class:`Timestamp` objects, each
-   with the correct ``tz``
+   with the correct ``tz``.
 2. A ``datetime64[ns]`` -dtype :class:`numpy.ndarray`, where the values have
-   been converted to UTC and the timezone discarded
+   been converted to UTC and the timezone discarded.

-Timezones may be preserved with ``dtype=object``
+Timezones may be preserved with ``dtype=object``:

 .. ipython:: python

    ser = pd.Series(pd.date_range("2000", periods=2, tz="CET"))
    ser.to_numpy(dtype=object)

-Or thrown away with ``dtype='datetime64[ns]'``
+Or thrown away with ``dtype='datetime64[ns]'``:

 .. ipython:: python

doc/source/whatsnew/v2.3.0.rst

+1
@@ -37,6 +37,7 @@ Other enhancements
   updated to work correctly with NumPy >= 2 (:issue:`57739`)
 - :meth:`Series.str.decode` result now has ``StringDtype`` when ``future.infer_string`` is True (:issue:`60709`)
 - :meth:`~Series.to_hdf` and :meth:`~DataFrame.to_hdf` now round-trip with ``StringDtype`` (:issue:`60663`)
+- Improved ``repr`` of :class:`.NumpyExtensionArray` to account for NEP51 (:issue:`61085`)
 - The :meth:`Series.str.decode` has gained the argument ``dtype`` to control the dtype of the result (:issue:`60940`)
 - The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for ``StringDtype`` columns (:issue:`60633`)
 - The :meth:`~Series.sum` reduction is now implemented for ``StringDtype`` columns (:issue:`59853`)

doc/source/whatsnew/v3.0.0.rst

+9
@@ -61,11 +61,13 @@ Other enhancements
 - :meth:`Series.cummin` and :meth:`Series.cummax` now supports :class:`CategoricalDtype` (:issue:`52335`)
 - :meth:`Series.plot` now correctly handle the ``ylabel`` parameter for pie charts, allowing for explicit control over the y-axis label (:issue:`58239`)
 - :meth:`DataFrame.plot.scatter` argument ``c`` now accepts a column of strings, where rows with the same string are colored identically (:issue:`16827` and :issue:`16485`)
+- :class:`ArrowDtype` now supports ``pyarrow.JsonType`` (:issue:`60958`)
 - :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` methods ``sum``, ``mean``, ``median``, ``prod``, ``min``, ``max``, ``std``, ``var`` and ``sem`` now accept ``skipna`` parameter (:issue:`15675`)
 - :class:`Rolling` and :class:`Expanding` now support ``nunique`` (:issue:`26958`)
 - :class:`Rolling` and :class:`Expanding` now support aggregations ``first`` and ``last`` (:issue:`33155`)
 - :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
 - :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)
+- :meth:`DataFrame.apply` supports using third-party execution engines like the Bodo.ai JIT compiler (:issue:`60668`)
 - :meth:`DataFrameGroupBy.transform`, :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.agg`, :meth:`SeriesGroupBy.agg`, :meth:`RollingGroupby.apply`, :meth:`ExpandingGroupby.apply`, :meth:`Rolling.apply`, :meth:`Expanding.apply`, :meth:`DataFrame.apply` with ``engine="numba"`` now supports positional arguments passed as kwargs (:issue:`58995`)
 - :meth:`Rolling.agg`, :meth:`Expanding.agg` and :meth:`ExponentialMovingWindow.agg` now accept :class:`NamedAgg` aggregations through ``**kwargs`` (:issue:`28333`)
 - :meth:`Series.map` can now accept kwargs to pass on to func (:issue:`59814`)
@@ -662,6 +664,7 @@ Bug fixes
 Categorical
 ^^^^^^^^^^^
 - Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
+- Bug in :meth:`Series.convert_dtypes` with ``dtype_backend="pyarrow"`` where empty :class:`CategoricalDtype` :class:`Series` raised an error or got converted to ``null[pyarrow]`` (:issue:`59934`)
 -

 Datetimelike
@@ -729,6 +732,7 @@ Indexing
 - Bug in :meth:`Index.get_indexer` and similar methods when ``NaN`` is located at or after position 128 (:issue:`58924`)
 - Bug in :meth:`MultiIndex.insert` when a new value inserted to a datetime-like level gets cast to ``NaT`` and fails indexing (:issue:`60388`)
 - Bug in printing :attr:`Index.names` and :attr:`MultiIndex.levels` would not escape single quotes (:issue:`60190`)
+- Bug in reindexing of :class:`DataFrame` with :class:`PeriodDtype` columns in case of consolidated block (:issue:`60980`, :issue:`60273`)

 Missing
 ^^^^^^^
@@ -765,6 +769,7 @@ I/O
 - Bug in :meth:`read_csv` where the order of the ``na_values`` makes an inconsistency when ``na_values`` is a list non-string values. (:issue:`59303`)
 - Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)
 - Bug in :meth:`read_html` where ``rowspan`` in header row causes incorrect conversion to ``DataFrame``. (:issue:`60210`)
+- Bug in :meth:`read_json` ignoring the given ``dtype`` when ``engine="pyarrow"`` (:issue:`59516`)
 - Bug in :meth:`read_json` not validating the ``typ`` argument to not be exactly ``"frame"`` or ``"series"`` (:issue:`59124`)
 - Bug in :meth:`read_json` where extreme value integers in string format were incorrectly parsed as a different integer number (:issue:`20608`)
 - Bug in :meth:`read_stata` raising ``KeyError`` when input file is stored in big-endian format and contains strL data. (:issue:`58638`)
@@ -795,6 +800,7 @@ Groupby/resample/rolling
 - Bug in :meth:`.DataFrameGroupBy.quantile` when ``interpolation="nearest"`` is inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)
 - Bug in :meth:`.Resampler.interpolate` on a :class:`DataFrame` with non-uniform sampling and/or indices not aligning with the resulting resampled index would result in wrong interpolation (:issue:`21351`)
 - Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
+- Bug in :meth:`DataFrame.resample` changing index type to :class:`MultiIndex` when the dataframe is empty and using an upsample method (:issue:`55572`)
 - Bug in :meth:`DataFrameGroupBy.agg` that raises ``AttributeError`` when there is dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (:issue:`55041`)
 - Bug in :meth:`DataFrameGroupBy.apply` and :meth:`SeriesGroupBy.apply` for empty data frame with ``group_keys=False`` still creating output index using group keys. (:issue:`60471`)
 - Bug in :meth:`DataFrameGroupBy.apply` that was returning a completely empty DataFrame when all return values of ``func`` were ``None`` instead of returning an empty DataFrame with the original columns and dtypes. (:issue:`57775`)
@@ -810,6 +816,7 @@ Reshaping
 ^^^^^^^^^
 - Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
 - Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
+- Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
 - Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
 - Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
 - Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
@@ -860,9 +867,11 @@ Other
 - Bug in :meth:`DataFrame.where` where using a non-bool type array in the function would return a ``ValueError`` instead of a ``TypeError`` (:issue:`56330`)
 - Bug in :meth:`Index.sort_values` when passing a key function that turns values into tuples, e.g. ``key=natsort.natsort_key``, would raise ``TypeError`` (:issue:`56081`)
 - Bug in :meth:`MultiIndex.fillna` error message was referring to ``isna`` instead of ``fillna`` (:issue:`60974`)
+- Bug in :meth:`Series.describe` where median percentile was always included when the ``percentiles`` argument was passed (:issue:`60550`).
 - Bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
 - Bug in :meth:`Series.dt` methods in :class:`ArrowDtype` that were returning incorrect values. (:issue:`57355`)
 - Bug in :meth:`Series.isin` raising ``TypeError`` when series is large (>10**6) and ``values`` contains NA (:issue:`60678`)
+- Bug in :meth:`Series.mode` where an exception was raised when taking the mode with nullable types with no null values in the series. (:issue:`58926`)
 - Bug in :meth:`Series.rank` that doesn't preserve missing values for nullable integers when ``na_option='keep'``. (:issue:`56976`)
 - Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` inconsistently replacing matching instances when ``regex=True`` and missing values are present. (:issue:`56599`)
 - Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` throwing ``ValueError`` when ``regex=True`` and all NA values. (:issue:`60688`)

pandas/_libs/hashtable_func_helper.pxi.in

+1 -1
@@ -430,7 +430,7 @@ def mode(ndarray[htfunc_t] values, bint dropna, const uint8_t[:] mask=None):

     if na_counter > 0:
         res_mask = np.zeros(j+1, dtype=np.bool_)
-        res_mask[j] = True
+        res_mask[j] = (na_counter == max_count)
         return modes[:j + 1], res_mask

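
The one-line change above makes the NA slot of the result mask conditional on NA actually being among the most frequent entries. The following is a minimal pure-Python sketch of that condition only, not the Cython implementation: `mode_with_mask` and its list-based `values`/`mask` inputs are hypothetical names chosen for illustration, and the real function's handling of `dropna` and of the `modes` buffer is not reproduced.

```python
from collections import Counter


def mode_with_mask(values, mask):
    """Return (modes, res_mask); mask[i] is True where values[i] is missing.

    Hypothetical helper for illustration only.
    """
    # Count the non-missing values and the missing entries separately.
    counts = Counter(v for v, is_na in zip(values, mask) if not is_na)
    na_counter = sum(mask)
    max_count = max(max(counts.values(), default=0), na_counter)

    modes = sorted(v for v, c in counts.items() if c == max_count)
    res_mask = [False] * len(modes)

    # Flag an NA entry only when NA is as frequent as the most frequent value.
    if na_counter > 0 and na_counter == max_count:
        modes.append(None)
        res_mask.append(True)
    return modes, res_mask
```

With one missing entry and a value that appears twice, the sketch reports only that value; the NA entry is flagged only when missing entries tie for, or win, the highest count.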

pandas/_libs/lib.pyx

+2 -2
@@ -1518,7 +1518,7 @@ cdef object _try_infer_map(object dtype):

 def infer_dtype(value: object, skipna: bool = True) -> str:
     """
-    Return a string label of the type of a scalar or list-like of values.
+    Return a string label of the type of the elements in a list-like input.

     This method inspects the elements of the provided input and determines
     classification of its data type. It is particularly useful for
@@ -1527,7 +1527,7 @@ def infer_dtype(value: object, skipna: bool = True) -> str:

     Parameters
     ----------
-    value : scalar, list, ndarray, or pandas type
+    value : list, ndarray, or pandas type
         The input data to infer the dtype.
     skipna : bool, default True
         Ignore NaN values when inferring the type.

pandas/_libs/tslibs/period.pyx

+2 -5
@@ -1752,9 +1752,6 @@ cdef class _Period(PeriodMixin):
     def __cinit__(self, int64_t ordinal, BaseOffset freq):
         self.ordinal = ordinal
         self.freq = freq
-        # Note: this is more performant than PeriodDtype.from_date_offset(freq)
-        # because from_date_offset cannot be made a cdef method (until cython
-        # supported cdef classmethods)
         self._dtype = PeriodDtypeBase(freq._period_dtype_code, freq.n)

     @classmethod
@@ -1913,7 +1910,7 @@ cdef class _Period(PeriodMixin):

         Parameters
         ----------
-        freq : str, BaseOffset
+        freq : str, DateOffset
             The target frequency to convert the Period object to.
             If a string is provided,
             it must be a valid :ref:`period alias <timeseries.period_aliases>`.
@@ -2599,7 +2596,7 @@ cdef class _Period(PeriodMixin):

         Parameters
         ----------
-        freq : str, BaseOffset
+        freq : str, DateOffset
             Frequency to use for the returned period.

         See Also

pandas/_libs/tslibs/timedeltas.pyx

+77 -6
@@ -998,8 +998,9 @@ class MinMaxReso:
     and Timedelta class. On an instance, these depend on the object's _reso.
     On the class, we default to the values we would get with nanosecond _reso.
     """
-    def __init__(self, name):
+    def __init__(self, name, docstring):
         self._name = name
+        self.__doc__ = docstring

     def __get__(self, obj, type=None):
         if self._name == "min":
@@ -1012,9 +1013,13 @@ class MinMaxReso:

         if obj is None:
             # i.e. this is on the class, default to nanos
-            return Timedelta(val)
+            result = Timedelta(val)
         else:
-            return Timedelta._from_value_and_reso(val, obj._creso)
+            result = Timedelta._from_value_and_reso(val, obj._creso)
+
+        result.__doc__ = self.__doc__
+
+        return result

     def __set__(self, obj, value):
         raise AttributeError(f"{self._name} is not settable.")
@@ -1033,9 +1038,75 @@ cdef class _Timedelta(timedelta):

     # higher than np.ndarray and np.matrix
     __array_priority__ = 100
-    min = MinMaxReso("min")
-    max = MinMaxReso("max")
-    resolution = MinMaxReso("resolution")
+
+    _docstring_min = """
+    Returns the minimum bound possible for Timedelta.
+
+    This property provides access to the smallest possible value that
+    can be represented by a Timedelta object.
+
+    Returns
+    -------
+    Timedelta
+
+    See Also
+    --------
+    Timedelta.max: Returns the maximum bound possible for Timedelta.
+    Timedelta.resolution: Returns the smallest possible difference between
+        non-equal Timedelta objects.
+
+    Examples
+    --------
+    >>> pd.Timedelta.min
+    -106752 days +00:12:43.145224193
+    """
+
+    _docstring_max = """
+    Returns the maximum bound possible for Timedelta.
+
+    This property provides access to the largest possible value that
+    can be represented by a Timedelta object.
+
+    Returns
+    -------
+    Timedelta
+
+    See Also
+    --------
+    Timedelta.min: Returns the minimum bound possible for Timedelta.
+    Timedelta.resolution: Returns the smallest possible difference between
+        non-equal Timedelta objects.
+
+    Examples
+    --------
+    >>> pd.Timedelta.max
+    106751 days 23:47:16.854775807
+    """
+
+    _docstring_reso = """
+    Returns the smallest possible difference between non-equal Timedelta objects.
+
+    The resolution value is determined by the underlying representation of time
+    units and is equivalent to Timedelta(nanoseconds=1).
+
+    Returns
+    -------
+    Timedelta
+
+    See Also
+    --------
+    Timedelta.max: Returns the maximum bound possible for Timedelta.
+    Timedelta.min: Returns the minimum bound possible for Timedelta.
+
+    Examples
+    --------
+    >>> pd.Timedelta.resolution
+    0 days 00:00:00.000000001
+    """
+
+    min = MinMaxReso("min", _docstring_min)
+    max = MinMaxReso("max", _docstring_max)
+    resolution = MinMaxReso("resolution", _docstring_reso)

     @property
     def value(self):
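
The `MinMaxReso` change above follows a general pattern: a non-settable descriptor receives a docstring at construction time and copies it onto whatever it returns. Below is a minimal plain-Python sketch of that pattern, assuming nothing about pandas internals. `BoundDescriptor`, `Bound`, and `Duration` are hypothetical stand-ins for `MinMaxReso`, the returned `Timedelta`, and `_Timedelta`.

```python
class Bound:
    """Hypothetical result object; stands in for the returned value."""

    def __init__(self, label):
        self.label = label


class BoundDescriptor:
    """Non-settable attribute that carries a per-attribute docstring."""

    def __init__(self, name, docstring):
        self._name = name
        self.__doc__ = docstring

    def __get__(self, obj, objtype=None):
        # Accessed on the class (obj is None): fall back to a default unit.
        unit = "ns" if obj is None else obj.unit
        result = Bound(f"{self._name}[{unit}]")
        # Copy the docstring onto the returned object, as the diff does.
        result.__doc__ = self.__doc__
        return result

    def __set__(self, obj, value):
        raise AttributeError(f"{self._name} is not settable.")


class Duration:
    min = BoundDescriptor("min", "Smallest representable Duration.")
    max = BoundDescriptor("max", "Largest representable Duration.")

    def __init__(self, unit):
        self.unit = unit
```

Because `__set__` is defined, the descriptor takes precedence over instance attributes, so an assignment such as `Duration("s").min = 1` raises `AttributeError`, while `Duration.min.__doc__` returns the text passed to the descriptor.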

pandas/_libs/tslibs/timestamps.pyx

+28
@@ -2208,6 +2208,34 @@ class Timestamp(_Timestamp):
         """
         return super().tzname()

+    @property
+    def tzinfo(self):
+        """
+        Returns the timezone info of the Timestamp.
+
+        This property returns a `datetime.tzinfo` object if the Timestamp
+        is timezone-aware. If the Timestamp has no timezone, it returns `None`.
+        If the Timestamp is in UTC or a fixed-offset timezone,
+        it returns `datetime.timezone`. If the Timestamp uses an
+        IANA timezone (e.g., "America/New_York"), it returns `zoneinfo.ZoneInfo`.
+
+        See Also
+        --------
+        Timestamp.tz : Alias for `tzinfo`, may return a `zoneinfo.ZoneInfo` object.
+        Timestamp.tz_convert : Convert timezone-aware Timestamp to another time zone.
+        Timestamp.tz_localize : Localize the Timestamp to a specific timezone.
+
+        Examples
+        --------
+        >>> ts = pd.Timestamp("2023-01-01 12:00:00", tz="UTC")
+        >>> ts.tzinfo
+        datetime.timezone.utc
+
+        >>> ts_naive = pd.Timestamp("2023-01-01 12:00:00")
+        >>> ts_naive.tzinfo
+        """
+        return super().tzinfo
+
     def utcoffset(self):
         """
         Return utc offset.
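
The new `tzinfo` property delegates to `super().tzinfo`, so its naive and fixed-offset cases mirror the standard library's `datetime.tzinfo` attribute. The sketch below uses only the standard library to illustrate those two cases; `describe_tzinfo` is a hypothetical helper, and the IANA (`zoneinfo.ZoneInfo`) case from the docstring is left out because it depends on the time zone database available on the system.

```python
from datetime import datetime, timedelta, timezone

# Aware datetimes: UTC and a fixed offset of -05:00.
aware_utc = datetime(2023, 1, 1, 12, tzinfo=timezone.utc)
aware_fixed = datetime(2023, 1, 1, 12, tzinfo=timezone(timedelta(hours=-5)))
# Naive datetime: no timezone attached, so tzinfo is None.
naive = datetime(2023, 1, 1, 12)


def describe_tzinfo(dt):
    """Return 'naive' or the class name of dt.tzinfo (hypothetical helper)."""
    return "naive" if dt.tzinfo is None else type(dt.tzinfo).__name__
```

Both aware values report a `timezone` instance here, and the naive value reports `None`, which is why the last doctest in the docstring above prints nothing.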

pandas/api/__init__.py

+2
@@ -1,6 +1,7 @@
 """public toolkit API"""

 from pandas.api import (
+    executors,
     extensions,
     indexers,
     interchange,
@@ -9,6 +10,7 @@
 )

 __all__ = [
+    "executors",
     "extensions",
     "indexers",
     "interchange",

pandas/api/executors/__init__.py

+7
@@ -0,0 +1,7 @@
+"""
+Public API for function executor engines to be used with ``map`` and ``apply``.
+"""
+
+from pandas.core.apply import BaseExecutionEngine
+
+__all__ = ["BaseExecutionEngine"]

pandas/compat/__init__.py

+2
@@ -35,6 +35,7 @@
     pa_version_under17p0,
     pa_version_under18p0,
     pa_version_under19p0,
+    pa_version_under20p0,
 )

 if TYPE_CHECKING:
@@ -168,4 +169,5 @@ def is_ci_environment() -> bool:
     "pa_version_under17p0",
     "pa_version_under18p0",
     "pa_version_under19p0",
+    "pa_version_under20p0",
 ]
