Code Sample

import pandas as pd

df = pd.DataFrame(columns = ['a', 'b'])

def foo(row):
    return True

def bar(row):
    row['a']
    return True

t = df.apply(bar, axis=1)
print(type(t))

t = df.apply(foo, axis=1)
print(type(t))

Problem description

Applying different functions to the same empty DataFrame returns different types:

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>

That's weird, since the only difference between them is the row['a'] expression, which is never executed.

When df is not empty, both return a pd.Series.

Tested on pandas 0.19.1 and the latest version, 0.20.2.

Expected Output

I expected it to always return a pd.Series.
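Until the behavior is settled, one way to get a consistent type is to skip apply when there are no rows. This is only a minimal sketch of a workaround: apply_rows is a hypothetical helper name, not part of pandas, and the object dtype of the empty result is an arbitrary choice.

```python
import pandas as pd


def apply_rows(df, func):
    """Hypothetical wrapper: row-wise apply that always returns a Series."""
    if len(df.index) == 0:
        # No rows, so func would never run on real data; skip apply and
        # its return-type guess and hand back an empty Series instead.
        return pd.Series([], index=df.index, dtype=object)
    return df.apply(func, axis=1)


def bar(row):
    row['a']
    return True


empty = pd.DataFrame(columns=['a', 'b'])
filled = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

empty_result = apply_rows(empty, bar)    # empty Series
filled_result = apply_rows(filled, bar)  # Series with one value per row
```

The wrapper only covers the axis=1 case from the report; a function that returns a Series per row would still produce a DataFrame on non-empty input.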

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.20.2
pytest: None
pip: 7.1.0
setuptools: 18.0.1
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: None
html5lib: None
sqlalchemy: 0.9.10
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: TomAugspurger

I suspect that bar raises a KeyError, which is caught inside .apply and sends it down a different code path. You're welcome to take a look at what's going on.

Comment From: jreback

Agreed with @TomAugspurger; it might be excepting out of the inner loop. @gzcf, if you want to investigate and can come up with a fix that passes the test suite, that would be fine.

Comment From: gzcf

I'm glad to help. Let me take some time to fix it.

Comment From: gzcf

After reading the related code, I found that this is intended behavior. See issue #2476 and _apply_empty_result in 'pandas/core/frame.py'.

    def _apply_empty_result(self, func, axis, reduce, *args, **kwds):
        if reduce is None:
            reduce = False
            try:
                reduce = not isinstance(func(_EMPTY_SERIES, *args, **kwds),
                                        Series)
            except Exception:
                pass

        if reduce:
            return Series(NA, index=self._get_agg_axis(axis))
        else:
            return self.copy()

As you can see, this code tries to guess the return type by calling func on an empty Series. I don't think this is a good implementation: catching Exception is bad because it swallows every exception, and in many cases calling func with an empty Series will raise a KeyError. But I am new to the pandas source, so I am not sure what to do next.
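To make the effect concrete, here is a small standalone model of that guess applied to foo and bar from the report. It is a sketch, not the pandas implementation: guess_reduce is a hypothetical name, and EMPTY is an assumed stand-in for _EMPTY_SERIES.

```python
import pandas as pd

# Assumed stand-in for pandas' internal _EMPTY_SERIES.
EMPTY = pd.Series([], dtype=float)


def guess_reduce(func):
    """Model of the guess: reduce unless func returns a Series."""
    reduce = False
    try:
        reduce = not isinstance(func(EMPTY), pd.Series)
    except Exception:
        # Any error, including the KeyError from row['a'], is swallowed
        # and the guess silently stays at False.
        pass
    return reduce


def foo(row):
    return True


def bar(row):
    row['a']  # KeyError on the empty Series
    return True


guess_foo = guess_reduce(foo)  # True: the Series branch is taken
guess_bar = guess_reduce(bar)  # False: the self.copy() branch is taken
```

In the quoted method, reduce=True leads to the Series(NA, ...) return and reduce=False to self.copy(), which matches the two types printed in the report.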

There are some choices:

  • Default to Series without guessing the type, perhaps with a warning message
  • Default to DataFrame...
  • Don't change this behavior
  • Let it fail and raise the exception to the user. This was the original behavior.
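As a rough illustration of the first option, the guess could fall back to a Series and warn instead of silently passing. This is only a sketch under assumptions: guess_reduce_with_warning is a hypothetical name, and RuntimeWarning and the message text are arbitrary choices, not anything pandas does.

```python
import warnings

import pandas as pd


def guess_reduce_with_warning(func, probe):
    """Hypothetical variant: default to a Series and warn if func fails."""
    try:
        result = func(probe)
    except Exception as exc:
        name = getattr(func, "__name__", repr(func))
        warnings.warn(
            f"could not infer the result type of {name} on empty input "
            f"({type(exc).__name__}: {exc}); assuming a Series",
            RuntimeWarning,
        )
        return True  # reduce, i.e. return a Series
    return not isinstance(result, pd.Series)


def bar(row):
    row['a']
    return True


# Record warnings so the example can inspect what was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    reduce_bar = guess_reduce_with_warning(bar, pd.Series([], dtype=float))
```

The last option would correspond to dropping the try/except entirely so the KeyError reaches the caller.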

Please give me some advice.

Comment From: Mega-Tom

I think the correct resolution is to use a Series with data types from the DataFrame instead of _EMPTY_SERIES. Using simple default values for each data type should make it much less likely to raise an exception.
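A rough sketch of what such a probe could look like for axis=1, assuming one placeholder per column picked from its dtype. make_probe_row is a hypothetical helper, and the placeholder values (False, 0, 0.0, None) are my own choices; datetime, categorical and other dtypes are not handled here.

```python
import pandas as pd
from pandas.api import types as ptypes


def make_probe_row(df):
    """Hypothetical helper: one placeholder value per column of df."""
    values = {}
    for name, dtype in df.dtypes.items():
        if ptypes.is_bool_dtype(dtype):
            values[name] = False
        elif ptypes.is_integer_dtype(dtype):
            values[name] = 0
        elif ptypes.is_float_dtype(dtype):
            values[name] = 0.0
        else:
            values[name] = None  # fallback for object and other dtypes
    # Index by column label so lookups such as row['a'] succeed.
    return pd.Series(values, index=df.columns, dtype=object)


def bar(row):
    row['a']
    return True


df = pd.DataFrame({
    'a': pd.Series([], dtype='int64'),
    'b': pd.Series([], dtype='float64'),
    'c': pd.Series([], dtype='object'),
})

probe = make_probe_row(df)
reduce = not isinstance(bar(probe), pd.Series)  # bar no longer raises
```

With a probe like this, bar would be classified the same way as foo, although a function that inspects the values themselves could still fail on the placeholders.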