Code Sample, a copy-pastable example if possible
There might be a simpler minimal example, but I was already really struggling to identify this problem and to find this example. The problem seems to be related to strings reappearing in different positions of the tuples, tuples of different lengths, and unequal sets of columns.
(By the way, I'm aware of MultiIndex; I would like to convert the Index to a MultiIndex after the concatenation.)
import pandas as pd
items_a = [("b","e","c","a","b"),
("e","e","c","a","c"),
("e","a","c","a","d"),
("b","a","b","e"),
("e","b","a"),
("e","c","c","a")]
items_b = [("b","e","c","a","b"),
("a","a","d","b","d"),
("a","b","d","b","e"),
("c","b","c","a"),
("a","c","b"),
("a","d","d","b")]
df1=pd.DataFrame([range(6)], columns=items_a)
df2=pd.DataFrame([range(6)], columns=items_b)
pd.concat([df1, df2])
Problem description
This yields
AssertionError: invalid dtype determination in get_concat_dtype
Expected Output
Something similar to
df1.columns = [str(c) for c in df1.columns]
df2.columns = [str(c) for c in df2.columns]
pd.concat([df1, df2])
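The str() conversion above only goes one way. A minimal sketch of a reversible variant follows, assuming every column label is a tuple of plain strings; the helper names encode_label and decode_label are made up for illustration and are not pandas API.

```python
import ast

def encode_label(label):
    """Turn a tuple label into a string that can be parsed back later."""
    return repr(label)

def decode_label(text):
    """Parse a string produced by encode_label back into a tuple."""
    return ast.literal_eval(text)

# Two of the column labels from the example, with different lengths.
labels = [("b", "e", "c", "a", "b"), ("e", "b", "a")]
encoded = [encode_label(label) for label in labels]
decoded = [decode_label(text) for text in encoded]
```

The encoded strings could be used as column names for the concatenation, with decode_label mapped over the resulting column labels afterwards.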
Output of pd.show_versions()
(same result with pandas=0.17.1)
Comment From: jreback
You are fighting pandas here. I suppose this could be supported, but it's not efficient in the least, nor very useful in terms of indexing.
You would very likely need a custom index type to have any real support here, which is quite a major effort. If you wanted to contribute this, great.
Comment From: jreback
cc @toobaz
Comment From: Mofef
Oh, so you are aware of the problem? Could you please explain a bit more about why it fails? I can assure you that I'm not fighting pandas on purpose. ;) But I don't really understand what is going on, so I can't find a workaround other than converting the column names to strings and back. I can't even reproduce the error consistently yet. Maybe a more informative error message would already be enough to resolve this issue? I would certainly help as soon as I understand the problem.
In case you were wondering, what I actually do is convert tree structures to a pandas DataFrame, with one row representing one tree. (The trees are very similar but not always identical in structure.) The tuples (columns) give the path through the tree, and the data is given by the leaves.
The problem apparently occurs when a child contains an object similar to its parent. In some cases it also fails with the same error if I use pd.MultiIndex.from_tuples, though not in the example described in the OP.
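The tree-to-row conversion described above can be sketched as follows. This assumes each tree is a nested dict whose non-dict values are the leaves; the thread does not say how the trees are actually represented, and the function name flatten_tree is hypothetical.

```python
def flatten_tree(tree, prefix=()):
    """Map each leaf of a nested dict to the tuple of keys leading to it."""
    paths = {}
    for key, child in tree.items():
        path = prefix + (key,)
        if isinstance(child, dict):
            # Inner node: recurse and collect the leaves below it.
            paths.update(flatten_tree(child, path))
        else:
            # Leaf: the accumulated path becomes the column label.
            paths[path] = child
    return paths

# A small made-up tree with leaves at different depths.
tree = {"b": {"e": {"c": 0}, "a": 1}, "e": {"b": {"a": 2}}}
row = flatten_tree(tree)
```

Each such dict would correspond to one row. Trees of slightly different shape give rows with different sets of keys and tuples of different lengths, which is the situation in the original example.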
Comment From: jreback
why are you not using a MultiIndex?
Comment From: Mofef
Originally I wanted to convert it to a MultiIndex after concatenating, but sure, that would be an acceptable workaround. For my case, though, it also failed with the same error when concatenating (not for the example above).
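One possible way to prepare tuples of unequal length for a MultiIndex is to pad them to a common length first. This is only a sketch, assuming an empty string is an acceptable filler for the data; the helper name pad_labels and the fill value are illustrative choices, not something taken from pandas or from this thread.

```python
def pad_labels(labels, fill=""):
    """Right-pad every tuple with `fill` so that all have the same length."""
    width = max((len(label) for label in labels), default=0)
    return [tuple(label) + (fill,) * (width - len(label)) for label in labels]

# Three labels from the example, with lengths 4, 3 and 5.
labels = [("b", "a", "b", "e"), ("e", "b", "a"), ("b", "e", "c", "a", "b")]
padded = pad_labels(labels)
```

The padded list could then be passed to pd.MultiIndex.from_tuples.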
Comment From: jreback
then show an example using MI that fails
Comment From: Mofef
Weird... my testcase must have been flawed... I can't reproduce it anymore. So thanks a lot for the help.
Still, if you had the patience to explain I would be really interested in what is going wrong in the example above.
Comment From: jreback
this actually breaks in a different place in master. cc @TomAugspurger
In [3]: items_a = [("b","e","c","a","b"),
...: ("e","e","c","a","c"),
...: ("e","a","c","a","d"),
...: ("b","a","b","e"),
...: ("e","b","a"),
...: ("e","c","c","a")]
...: items_b = [("b","e","c","a","b"),
...: ("a","a","d","b","d"),
...: ("a","b","d","b","e"),
...: ("c","b","c","a"),
...: ("a","c","b"),
...: ("a","d","d","b")]
...: df1=pd.DataFrame([range(6)], columns=items_a)
...: df2=pd.DataFrame([range(6)], columns=items_b)
...: pd.concat([df1, df2])
...:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-355de89f5317> in <module>()
13 df1=pd.DataFrame([range(6)], columns=items_a)
14 df2=pd.DataFrame([range(6)], columns=items_b)
---> 15 pd.concat([df1, df2])
16
~/pandas/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
211 verify_integrity=verify_integrity,
212 copy=copy)
--> 213 return op.get_result()
214
215
~/pandas/pandas/core/reshape/concat.py in get_result(self)
406 new_data = concatenate_block_managers(
407 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 408 copy=self.copy)
409 if not self.copy:
410 new_data._consolidate_inplace()
~/pandas/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
5372 values = values.view()
5373 b = b.make_block_same_class(values, placement=placement)
-> 5374 elif is_uniform_join_units(join_units):
5375 b = join_units[0].block.concat_same_type(
5376 [ju.block for ju in join_units], placement=placement)
~/pandas/pandas/core/internals.py in is_uniform_join_units(join_units)
5396 # no blocks that would get missing values (can lead to type upcasts)
5397 # unless we're an extension dtype.
-> 5398 all(not ju.is_na or ju.block.is_extension for ju in join_units) and
5399 # no blocks with indexers (as then the dimensions do not fit)
5400 all(not ju.indexers for ju in join_units) and
~/pandas/pandas/core/internals.py in <genexpr>(.0)
5396 # no blocks that would get missing values (can lead to type upcasts)
5397 # unless we're an extension dtype.
-> 5398 all(not ju.is_na or ju.block.is_extension for ju in join_units) and
5399 # no blocks with indexers (as then the dimensions do not fit)
5400 all(not ju.indexers for ju in join_units) and
AttributeError: 'NoneType' object has no attribute 'is_extension'
> /Users/jreback/pandas/pandas/core/internals.py(5398)<genexpr>()
5396 # no blocks that would get missing values (can lead to type upcasts)
5397 # unless we're an extension dtype.
-> 5398 all(not ju.is_na or ju.block.is_extension for ju in join_units) and
5399 # no blocks with indexers (as then the dimensions do not fit)
5400 all(not ju.indexers for ju in join_units) and
I didn't think a JoinUnit could be None
Comment From: Mofef
#20757 might be what caused my observation that this issue also occurred when using MultiIndex (referring to @jreback's comment here https://github.com/pandas-dev/pandas/issues/20597#issuecomment-378609474 )
Comment From: TomAugspurger
Is this a blocker for 0.23?
Comment From: jreback
no - it’s pretty unusual
Comment From: jbrockmendel
Works on main, closing.