Similar to how we work with the .dt, .str, and .cat accessors, it might be nice to expose an .interval accessor; in particular this might make nice indexing expressions. xref #16316
http://stackoverflow.com/questions/44088460/interval-datatype-in-pandas-find-midpoint-left-center-etc/44088970#44088970
In [13]: df = pd.DataFrame({'month': [1, 1, 2, 2], 'distances': range(4), 'value': range(4)})
In [14]: df
Out[14]:
   distances  month  value
0          0      1      0
1          1      1      1
2          2      2      2
3          3      2      3
In [15]: result = df.groupby(['month', pd.cut(df.distances, 2)]).value.mean()
In [16]: result
Out[16]:
month  distances
1      (-0.003, 1.5]    0.5
2      (1.5, 3.0]       2.5
Name: value, dtype: float64
In [17]: pd.IntervalIndex(result.index.get_level_values('distances')).left
Out[17]: Float64Index([-0.003, 1.5], dtype='float64')
In [18]: pd.IntervalIndex(result.index.get_level_values('distances')).right
Out[18]: Float64Index([1.5, 3.0], dtype='float64')
In [19]: pd.IntervalIndex(result.index.get_level_values('distances')).mid
Out[19]: Float64Index([0.7485, 2.25], dtype='float64')
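To make the idea concrete, here is a rough sketch of how such an accessor could be prototyped with the public `pd.api.extensions.register_series_accessor` hook. This is only an illustration, not a proposed implementation: the accessor name `interval_sketch` and the class `IntervalSketchAccessor` are hypothetical, and it assumes a Series that already has interval dtype (the `pd.cut` result above is categorical, so it would still need wrapping first).

```python
import numpy as np
import pandas as pd


# Hypothetical accessor name, chosen so it cannot be mistaken for pandas API.
@pd.api.extensions.register_series_accessor("interval_sketch")
class IntervalSketchAccessor:
    def __init__(self, series):
        # Raising AttributeError mirrors how .dt/.str/.cat reject other dtypes.
        if not isinstance(series.dtype, pd.IntervalDtype):
            raise AttributeError(
                "Can only use .interval_sketch with interval values"
            )
        self._series = series
        self._array = series.array  # the underlying IntervalArray

    def _wrap(self, values):
        # Return results as a Series aligned to the original index.
        return pd.Series(np.asarray(values), index=self._series.index)

    @property
    def left(self):
        return self._wrap(self._array.left)

    @property
    def right(self):
        return self._wrap(self._array.right)

    @property
    def mid(self):
        return self._wrap(self._array.mid)


# Two right-closed intervals: (0.0, 1.5] and (1.5, 3.0]
s = pd.Series(pd.arrays.IntervalArray.from_breaks([0.0, 1.5, 3.0]))
print(s.interval_sketch.mid)
```

Returning a Series that shares the original index is a design choice made here so the results can be used directly in boolean indexing; the existing IntervalIndex properties shown above return an Index instead.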
Comment From: jreback
cc @shoyer @zfrenchee @buyology @TomAugspurger
Comment From: jreback
e.g. this might make a reasonable syntax for indexing:
df.loc[df.my_interval_column.interval.overlaps(.....)]
df.loc[df.my_interval_column.interval.contains(....)]
similar to what we do now with .dt, for example:
In [20]: df = pd.DataFrame({'A': pd.date_range('20170101', periods=10), 'value': range(10)})
In [21]: df.loc[df.A.dt.weekday]
Out[21]:
           A  value
6 2017-01-07      6
0 2017-01-01      0
1 2017-01-02      1
2 2017-01-03      2
3 2017-01-04      3
4 2017-01-05      4
5 2017-01-06      5
6 2017-01-07      6
0 2017-01-01      0
1 2017-01-02      1
In [22]: df.loc[df.A.dt.weekday==2]
Out[22]:
           A  value
3 2017-01-04      3
In [23]: df.loc[df.A.dt.weekday==1]
Out[23]:
           A  value
2 2017-01-03      2
9 2017-01-10      9
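For comparison, the interval indexing expressions proposed above can be approximated by going through the underlying array. The sketch below assumes a pandas version where `IntervalArray.overlaps` and `IntervalArray.contains` are available; the column name `my_interval_column` is taken from the example above, and the data and query values are made up for illustration.

```python
import pandas as pd

# Made-up data: three right-closed intervals (0, 1], (1, 2], (2, 3]
df = pd.DataFrame({
    "my_interval_column": pd.arrays.IntervalArray.from_breaks([0.0, 1.0, 2.0, 3.0]),
    "value": range(3),
})

# Drop down to the IntervalArray, since Series has no interval accessor.
intervals = df["my_interval_column"].array

# Boolean mask: which rows overlap the query interval (0.5, 1.5]?
overlap_mask = intervals.overlaps(pd.Interval(0.5, 1.5))

# Boolean mask: which rows contain the scalar 1.5?
contains_mask = intervals.contains(1.5)

overlapping = df.loc[overlap_mask]
containing = df.loc[contains_mask]
print(overlapping)
print(containing)
```

An accessor would mostly save the `.array` step and keep the expression in the same shape as the `.dt` examples.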
Comment From: jreback
cc @jschendel