Pandas Applying on empty DataFrame returns different types

Code Sample

import pandas as pd

df = pd.DataFrame(columns = ['a', 'b'])

def foo(row):
    return True

def bar(row):
    row['a']
    return True

t = df.apply(bar, axis=1)
print(type(t))  # DataFrame on the versions reported below

t = df.apply(foo, axis=1)
print(type(t))  # Series on the versions reported below

Problem description

When applying different functions to the same empty DataFrame, apply returns different types:

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>

That's weird, since their only difference is the row['a'] expression, which is never executed.

When df is not empty, both return pd.Series.

Tested on pandas 0.19.1 and the latest version, 0.20.2.

Expected Output

I expected it to always return a pd.Series.

Output of `pd.show_versions()`

2017-06-07 19:52:12 [pip.vcs] DEBUG: Registered VCS backend: git
2017-06-07 19:52:13 [pip.vcs] DEBUG: Registered VCS backend: hg
2017-06-07 19:52:13 [pip.vcs] DEBUG: Registered VCS backend: svn
2017-06-07 19:52:13 [pip.vcs] DEBUG: Registered VCS backend: bzr

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.20.2
pytest: None
pip: 7.1.0
setuptools: 18.0.1
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: None
html5lib: None
sqlalchemy: 0.9.10
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: TomAugspurger

I suspect that bar raises a KeyError, which is caught inside .apply and sends it down a different code path. You're welcome to take a look at what's going on.
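A quick way to check this suspicion outside of .apply is to call bar directly on an empty Series. This is only a sketch: it assumes the placeholder row that .apply passes for an empty frame behaves like a plain empty Series, and the names empty_row and raised are made up for the illustration.

```python
import pandas as pd


def bar(row):
    row['a']
    return True


# Assumption: an empty Series stands in for whatever .apply passes
# to the function when the DataFrame has no rows.
empty_row = pd.Series([], dtype=object)

try:
    bar(empty_row)
    raised = None
except KeyError as exc:
    # The label 'a' does not exist in an empty index.
    raised = exc

print(type(raised).__name__)
```

If the lookup fails like this inside .apply and the exception is swallowed, the two functions would no longer look identical to pandas.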

Comment From: jreback

Agreed with @TomAugspurger; it might be excepting out of the inner loop. @gzcf, if you want to investigate and can make a fix that passes the test suite, that would be fine.

Comment From: gzcf

I'm glad to help. Let me take some time to fix it.

Comment From: gzcf

After reading the related code, I found that this is intended behavior. See issue #2476 and _apply_empty_result in 'pandas/core/frame.py'.

    def _apply_empty_result(self, func, axis, reduce, *args, **kwds):
        if reduce is None:
            reduce = False
            try:
                reduce = not isinstance(func(_EMPTY_SERIES, *args, **kwds),
                                        Series)
            except Exception:
                pass

        if reduce:
            return Series(NA, index=self._get_agg_axis(axis))
        else:
            return self.copy()

As you can see, this code tries to guess the return type by calling func on an empty Series. I don't think this is a good implementation: catching Exception is bad because it swallows every exception, and in many cases calling func with an empty Series will raise a KeyError. But I am new to the pandas source, so I am not sure what to do next.
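To make the effect of the swallowed exception concrete, here is a standalone imitation of the probing step. It is not the pandas implementation: probe_reduces is a hypothetical helper that only mirrors the try/except logic quoted above, and a fresh empty Series stands in for _EMPTY_SERIES.

```python
import pandas as pd


def probe_reduces(func):
    """Mimic the guess: True means 'return a Series', False means 'return a DataFrame copy'."""
    reduce = False
    try:
        # Call func on an empty Series and see whether it returns a Series.
        reduce = not isinstance(func(pd.Series([], dtype=object)), pd.Series)
    except Exception:
        # Any error, including KeyError, silently leaves reduce as False.
        pass
    return reduce


def foo(row):
    return True


def bar(row):
    row['a']
    return True


foo_reduces = probe_reduces(foo)
bar_reduces = probe_reduces(bar)
print(foo_reduces, bar_reduces)
```

Under these assumptions foo is seen as reducing while bar is not, which matches the Series versus DataFrame difference in the report.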

There are some choices:

Default to Series without guessing the type, perhaps with a warning message.
Default to DataFrame...
Don't change this behavior.
Let it fail and raise the exception to the user. This is the original behavior.

Please give me some advice.
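Until one of these choices is made, a user-side guard can give the "always a Series" result for row-wise apply. This is only a sketch of a workaround, not a change to pandas: apply_rows is a hypothetical helper name, and the object dtype of the empty result is an arbitrary choice.

```python
import pandas as pd


def apply_rows(df, func):
    """Row-wise apply that returns a Series even when df has no rows."""
    if len(df) == 0:
        # Skip pandas' type guessing and return an empty Series directly.
        return pd.Series(index=df.index, dtype=object)
    return df.apply(func, axis=1)


def bar(row):
    row['a']
    return True


empty_result = apply_rows(pd.DataFrame(columns=['a', 'b']), bar)
filled_result = apply_rows(pd.DataFrame({'a': [1, 2], 'b': [3, 4]}), bar)
print(type(empty_result), type(filled_result))
```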

Comment From: Mega-Tom

I think the correct resolution is to use a Series with data types from the DataFrame instead of _EMPTY_SERIES. Using simple default values for each data type should make it much less likely to raise an exception.
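A rough sketch of what such a probe row could look like for axis=1 follows. The helper make_probe_row and the default values it picks (False, 0, empty string) are assumptions made for illustration and are not taken from pandas.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def make_probe_row(df):
    """Build a one-row stand-in indexed by the column labels of df."""
    values = []
    for dtype in df.dtypes:
        # Check bool first, because bool also counts as numeric.
        if is_bool_dtype(dtype):
            values.append(False)
        elif is_numeric_dtype(dtype):
            values.append(0)
        else:
            values.append("")
    return pd.Series(values, index=df.columns, dtype=object)


def bar(row):
    row['a']
    return True


df = pd.DataFrame(columns=['a', 'b'])
probe = make_probe_row(df)
probe_result = bar(probe)  # row['a'] now succeeds because the label exists
print(probe_result)
```

With the labels present, bar no longer fails on the lookup, so the guess would treat it the same way as foo. Functions that depend on the actual values could still raise, so this reduces the problem rather than removing it.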
