ENH: pd.DataFrame.from_dict() should support loading columns of varying lengths #61282

nikhilweee · 2025-04-14T03:17:54Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

Creating a dataframe from a dictionary with columns of varying lengths is not supported.

As of pandas 2.2.3, the following snippet raises ValueError: All arrays must be of the same length.

df = pd.DataFrame.from_dict({"col1": [1, 2, 3], "col2": [4, 5]})
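For anyone reproducing this, a minimal sketch that captures the exception instead of letting it propagate could look like the following. The variable names (data, error_message) are illustrative only, and the exact message wording may differ between pandas versions.

```python
import pandas as pd

# Same data as in the report: "col2" is one element shorter than "col1".
data = {"col1": [1, 2, 3], "col2": [4, 5]}

try:
    pd.DataFrame.from_dict(data)  # orient defaults to "columns"
    error_message = None
except ValueError as exc:
    # Keep the message so it can be inspected or logged.
    error_message = str(exc)

print(error_message)
```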

Feature Description

pandas should automatically pad shorter columns so that all columns end up the same length, especially since that is already the behavior when the orient argument is set to index. The following works without raising an error:

df = pd.DataFrame.from_dict({"col1": [1, 2, 3], "col2": [4, 5]}, orient="index")

Alternative Solutions

Since pandas already supports rows of varying lengths when the orient argument is set to index, a workaround for loading a dictionary whose columns differ in length is to set orient to index and transpose the resulting dataframe.

df = pd.DataFrame.from_dict({"col1": [1, 2, 3], "col2": [4, 5]}, orient='index').T
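To make the workaround concrete, here is a small sketch comparing it with a second option that is not mentioned above: wrapping each list in a pd.Series so that the DataFrame constructor aligns the columns on their index and fills the gaps with NaN. The variable names are illustrative, and this is just one way to do it.

```python
import pandas as pd

data = {"col1": [1, 2, 3], "col2": [4, 5]}

# Workaround from the issue: load the keys as rows, then transpose.
transposed = pd.DataFrame.from_dict(data, orient="index").T

# Assumed alternative: a dict of Series is aligned on the union of the
# indexes, so the shorter column is padded with NaN.
via_series = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})

print(transposed)
print(via_series)
print(via_series.dtypes)
```

One difference worth checking in your own environment: the transpose route generally converts every column to a common dtype, whereas the Series route only changes the dtype of columns that actually needed padding.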

Additional Context

Since there is a discrepancy in the way pandas handles loading dictionaries based on the value of the orient argument, it would be great to have parity between the two.


ShauryaDusht · 2025-04-14T17:20:05Z

Hi! I’d like to work on this.
I propose adding an optional autopad parameter to from_dict() that pads shorter columns
with a fill_value (default np.nan). This keeps existing behavior unchanged unless explicitly enabled.

   df = pd.DataFrame.from_dict(
       {"col1": [1, 2, 3], "col2": [4, 5]},
       autopad=True,
       fill_value=np.nan
   )

Let me know if this approach sounds good!
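Neither autopad nor fill_value exists on from_dict() today, so the call above is a proposal only. As a rough sketch of what the padding step might do, here is a standalone helper; the name from_dict_padded is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def from_dict_padded(data, fill_value=np.nan):
    """Hypothetical helper: pad shorter columns, then build the frame.

    This is not a pandas API; it only illustrates the proposed behavior.
    """
    # Copy each column into a list so the caller's data is not mutated.
    columns = {key: list(values) for key, values in data.items()}
    # Length of the longest column (0 for an empty dict).
    target = max((len(values) for values in columns.values()), default=0)
    # Extend every shorter column with fill_value up to the target length.
    padded = {
        key: values + [fill_value] * (target - len(values))
        for key, values in columns.items()
    }
    return pd.DataFrame.from_dict(padded)


df = from_dict_padded({"col1": [1, 2, 3], "col2": [4, 5]})
df_zero = from_dict_padded({"col1": [1, 2, 3], "col2": [4, 5]}, fill_value=0)
print(df)
print(df_zero)
```

Padding with np.nan turns an integer column into a float column, while a fill_value such as 0 keeps it integer, which may be one argument for exposing the fill value.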

ShauryaDusht · 2025-04-14T17:23:52Z

take

nikhilweee · 2025-04-14T18:21:38Z

@ShauryaDusht I wonder if it makes sense to have parity between the two (orient='index' and orient='columns') and avoid introducing the new autopad argument? Especially because this new parameter would only affect orient='columns' and would not have any effect (or essentially be always True) when orient='index'

ShauryaDusht · 2025-04-14T18:52:04Z

@nikhilweee
Thanks for the feedback! To maintain parity between orientations, adding three options — index, columns, and all — could be a more consistent solution.
index: pads rows
columns: pads columns
all: pads both
This would offer flexibility while keeping existing behaviour intact. Let me know if this approach works.

nikhilweee · 2025-04-14T20:11:58Z

@ShauryaDusht Are you suggesting we add a new option to the orient argument? Or are you suggesting that these options would be applicable to the new autopad argument?

Either way I still think it makes sense to just update the behaviour of pd.DataFrame.from_dict() to auto pad when orient is set to the default value of columns.

ShauryaDusht · 2025-04-15T04:39:25Z

@nikhilweee I was referring to your approach: adding a new option to the orient argument itself. That felt cleaner and more sensible than introducing a separate autopad argument.
So yeah, adding columns and all to orient as part of the overall solution sounds good.

nikhilweee · 2025-04-15T05:05:15Z

@ShauryaDusht Sorry if I was unclear but I am not suggesting that we add any arguments at all. My suggestion is to merely update the behavior of pd.DataFrame.from_dict(data, orient='columns') to match pd.DataFrame.from_dict(data, orient='index') such that pd.DataFrame.from_dict(data, orient='columns') does not complain when the dict values are of varying lengths.

ShauryaDusht · 2025-04-15T05:25:58Z

@nikhilweee Got it — looks like the second orient='columns' was meant to be orient='index' (just a typo).
Now I understand everything. The idea is just to make orient='columns' work like orient='index' does.

So should I wait for the maintainers' review before starting (the issue is still in triage), or would it be okay to begin working on it now?

nikhilweee · 2025-04-15T05:56:40Z

@ShauryaDusht Yes, that's exactly what I meant (I fixed the typo). I think it's a good idea to wait for what the maintainers have to say.

nikhilweee added the Enhancement and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Apr 14, 2025

github-actions bot assigned ShauryaDusht Apr 14, 2025
