ENH: pd.DataFrame.from_dict() should support loading columns of varying lengths #61282

Open
1 of 3 tasks
nikhilweee opened this issue Apr 14, 2025 · 9 comments
Labels
Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@nikhilweee

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Creating a dataframe from a dictionary with columns of varying lengths is not supported.

As of pandas 2.2.3, the following snippet raises ValueError: All arrays must be of the same length.

df = pd.DataFrame.from_dict({"col1": [1, 2, 3], "col2": [4, 5]})
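
For anyone who wants to reproduce the report, here is a minimal sketch that captures the error instead of letting it propagate. The variable names data and error_message are only illustrative, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Columns of different lengths, as in the report above.
data = {"col1": [1, 2, 3], "col2": [4, 5]}

try:
    pd.DataFrame.from_dict(data)  # orient defaults to "columns"
    error_message = None
except ValueError as exc:
    # Keep the message so it can be inspected or logged.
    error_message = str(exc)

print(error_message)
```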

Feature Description

Pandas should automatically pad shorter columns so that all columns end up the same length, especially since that is already the behavior when the orient argument is set to index. The following works without raising:

df = pd.DataFrame.from_dict({"col1": [1, 2, 3], "col2": [4, 5]}, orient="index")
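
To make the contrast concrete, the sketch below inspects what orient="index" produces for the same input. Each dictionary key becomes a row label and the shorter list is padded with a missing value. The names by_row and missing_cells are illustrative only.

```python
import pandas as pd

data = {"col1": [1, 2, 3], "col2": [4, 5]}

# With orient="index" every key becomes a row, so the lists are rows of
# different lengths and the shorter one is padded with NaN.
by_row = pd.DataFrame.from_dict(data, orient="index")

# Count how many cells were filled in by the padding.
missing_cells = int(by_row.isna().sum().sum())

print(by_row)
print(by_row.shape, missing_cells)
```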

Alternative Solutions

Since pandas already supports rows of varying lengths when the orient argument is set to index, a workaround for loading a dictionary whose columns differ in length is to set orient to index and transpose the resulting dataframe.

df = pd.DataFrame.from_dict({"col1": [1, 2, 3], "col2": [4, 5]}, orient='index').T
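
One caveat with the transpose workaround is worth checking on your own pandas version: transposing a frame with mixed dtypes can change the dtype of columns that had no missing values. The sketch below prints the dtypes for comparison and also shows a second workaround that is not part of the original report, wrapping each list in a pd.Series so that the DataFrame constructor aligns them on their index. The names transposed and from_series are illustrative.

```python
import pandas as pd

data = {"col1": [1, 2, 3], "col2": [4, 5]}

# Workaround from the issue: load as rows, then transpose.
transposed = pd.DataFrame.from_dict(data, orient="index").T

# Alternative (an addition, not from the report): Series are aligned on
# their index, so the shorter one is padded with NaN.
from_series = pd.DataFrame(
    {name: pd.Series(values) for name, values in data.items()}
)

# Compare dtypes; the transpose may upcast col1 even though it is complete.
print(transposed.dtypes)
print(from_series.dtypes)
```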

Additional Context

Since there is a discrepancy in the way pandas handles loading dictionaries based on the value of the orient argument, it would be great to have parity between the two.

nikhilweee added the Enhancement and Needs Triage labels on Apr 14, 2025
@ShauryaDusht

Hi! I’d like to work on this.
I propose adding an optional autopad parameter to from_dict() that pads shorter columns
with a fill_value (default np.nan). This keeps existing behavior unchanged unless explicitly enabled.

df = pd.DataFrame.from_dict(
    {"col1": [1, 2, 3], "col2": [4, 5]},
    autopad=True,
    fill_value=np.nan,
)

Let me know if this approach sounds good!
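
Until something like this exists in pandas, the padding described in this proposal can be approximated in user code. The sketch below is only an illustration: from_dict_padded is a hypothetical helper, not a pandas API, and it assumes every dictionary value is a sized sequence such as a list.

```python
import numpy as np
import pandas as pd


def from_dict_padded(data, fill_value=np.nan):
    """Hypothetical helper (not part of pandas): pad shorter columns
    with fill_value, then build the frame with the default orient."""
    longest = max((len(values) for values in data.values()), default=0)
    padded = {
        name: list(values) + [fill_value] * (longest - len(values))
        for name, values in data.items()
    }
    return pd.DataFrame.from_dict(padded)


df = from_dict_padded({"col1": [1, 2, 3], "col2": [4, 5]})
df_zero = from_dict_padded({"col1": [1, 2, 3], "col2": [4, 5]}, fill_value=0)
print(df)
print(df_zero)
```

Padding with np.nan turns an integer column into a float column, which is one reason a configurable fill value might be useful.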

@ShauryaDusht

take

@nikhilweee
Author

nikhilweee commented Apr 14, 2025

@ShauryaDusht I wonder if it makes sense to have parity between the two (orient='index' and orient='columns') and avoid introducing the new autopad argument? Especially because this new parameter would only affect orient='columns' and would not have any effect (or essentially be always True) when orient='index'

@ShauryaDusht

@nikhilweee
Thanks for the feedback! To maintain parity between orientations, adding three options (index, columns, and all) could be a more consistent solution:

  • index: pads rows

  • columns: pads columns

  • all: pads both

This would offer flexibility while keeping existing behaviour intact. Let me know if this approach works.

@nikhilweee
Author

@ShauryaDusht Are you suggesting we add a new option to the orient argument? Or are you suggesting that these options would be applicable to the new autopad argument?

Either way I still think it makes sense to just update the behaviour of pd.DataFrame.from_dict() to auto pad when orient is set to the default value of columns.

@ShauryaDusht

@nikhilweee I was referring to your approach of adding a new option to the orient argument itself. That felt cleaner and more sensible than introducing a separate autopad argument.
So yeah, adding columns and all to orient as part of the overall solution sounds good.

@nikhilweee
Author

nikhilweee commented Apr 15, 2025

@ShauryaDusht Sorry if I was unclear but I am not suggesting that we add any arguments at all. My suggestion is to merely update the behavior of pd.DataFrame.from_dict(data, orient='columns') to match pd.DataFrame.from_dict(data, orient='index') such that pd.DataFrame.from_dict(data, orient='columns') does not complain when the dict values are of varying lengths.
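
As background for the parity argument, the sketch below checks that for equal-length input the two orientations already agree up to a transpose, so the requested change would only affect the ragged case. The names equal_data, by_columns and by_index_t are illustrative.

```python
import pandas as pd

# Equal-length input: both orientations accept it today.
equal_data = {"col1": [1, 2, 3], "col2": [4, 5, 6]}

by_columns = pd.DataFrame.from_dict(equal_data, orient="columns")
by_index_t = pd.DataFrame.from_dict(equal_data, orient="index").T

# Compare labels and values; the two results describe the same table.
same_columns = list(by_columns.columns) == list(by_index_t.columns)
same_values = by_columns.to_numpy().tolist() == by_index_t.to_numpy().tolist()

print(same_columns, same_values)
```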

@ShauryaDusht

@nikhilweee Got it — looks like the second orient='columns' was meant to be orient='index' (just a typo).
Now I understand everything. The idea is just to make orient='columns' work like orient='index' does.

So should I wait for the maintainers' review before starting (it is still in triage), or would it be okay to begin working on it now?

@nikhilweee
Author

@ShauryaDusht Yes, that's exactly what I meant (I fixed the typo). I think it's a good idea to wait for what the maintainers have to say.


2 participants