Pandas version checks
- [x] I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
doc/source/user_guide/reshaping.rst
Documentation problem
The table given as an example for pivot() is wrong and can't be used. It would raise a "duplicate index" error, as there are duplicate values in the column given for the "index" parameter.
Suggested fix for documentation
The "foo" column must contain unique values
Comment From: goutam-kul
@mheskett It will not throw "ValueError: Index contains duplicate entries, cannot reshape", because the index (foo) and columns (bar) have unique combinations:
import pandas as pd
data = {"foo": ['one', 'one', 'one', 'two', 'two', 'two'],
"bar": ['A', 'B', 'C', 'A', 'B', 'C'],
"baz": [1, 2, 3, 4, 5, 6],
"zoo": ['x', 'y', 'z', 'q', 'w', 't']
}
df = pd.DataFrame(data=data)
# print(df)
out = df.pivot(index='foo', columns='bar', values='baz')
print(out)
Output:
bar A B C
foo
one 1 2 3
two 4 5 6
What happens if I introduce a non-unique combination? Then it will throw the duplicate index error. E.g.:
data = {"foo": ['one', 'one', 'one', 'two', 'two', 'two'],
"bar": ['A', 'A', 'C', 'A', 'B', 'C'],
"baz": [1, 2, 3, 4, 5, 6],
"zoo": ['x', 'y', 'z', 'q', 'w', 't']
}
Output:
ValueError: Index contains duplicate entries, cannot reshape
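To see which rows trigger this error, one option is to look for repeated (foo, bar) pairs before calling pivot. The snippet below is only a sketch of that check, reusing the example data from this comment; the variable names dup_mask and pivot_failed are illustrative and not part of pandas.

```python
import pandas as pd

# Same data as the failing example above: ('one', 'A') appears twice.
df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "A", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],
})

# keep=False flags every row that belongs to a repeated (foo, bar) pair,
# not just the second and later occurrences.
dup_mask = df.duplicated(subset=["foo", "bar"], keep=False)
print(df[dup_mask])

# pivot() needs each (index, columns) pair to be unique, so this raises.
try:
    df.pivot(index="foo", columns="bar", values="baz")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(exc)
```

Duplicate values in "foo" alone are fine; only a repeated pair of index and column values makes the reshape ambiguous.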
You can instead use the pivot_table method when you have duplicate combinations of index and column values. It aggregates the duplicates, by default with the mean:
out = df.pivot_table(index='foo', columns='bar', values='baz')
print(out)
Output:
bar A B C
foo
one 1.5 NaN 3.0
two 4.0 5.0 6.0
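The 1.5 above is the mean of the two duplicate values 1 and 2. As a small sketch, assuming the same example data, the aggregation can be made explicit through the aggfunc parameter of pivot_table; the names mean_out and sum_out are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "A", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],
})

# aggfunc="mean" is the default; spelling it out makes the behavior visible.
mean_out = df.pivot_table(index="foo", columns="bar", values="baz", aggfunc="mean")

# A different aggregation, e.g. the sum of the duplicate entries.
sum_out = df.pivot_table(index="foo", columns="bar", values="baz", aggfunc="sum")

print(mean_out)
print(sum_out)
```

Which aggregation is appropriate depends on what the duplicate rows mean in your data, so it is worth choosing it deliberately rather than relying on the default.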
Hope this helps!
Comment From: mheskett
Thank you. So in that case, the ValueError message is misleading. I can raise a separate issue about that. It should read "must contain unique combinations of index and column".
Comment From: rhshadrach
> I can raise a separate issue about that.
We can rework this issue instead. Why do you feel the ValueError is misleading?
Comment From: mroeschke
Closing as needing more information to be actionable