Feature Type

  • [ ] Adding new functionality to pandas

  • [x] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I wish I could pass method and tolerance kwargs to pandas.Index.slice_indexer, as I can to pandas.Index.get_indexer.

Feature Description

Add method and tolerance parameters to pandas.Index.slice_indexer, so that it looks like this:

class Index:
    def slice_indexer(self, start=None, end=None, step=None, method=None, tolerance=None):
        """
        Compute the slice indexer for input labels and step.

        Index needs to be ordered and unique.

        Parameters
        ----------
        start : label, default None
            If None, defaults to the beginning.
        end : label, default None
            If None, defaults to the end.
        step : int, default None
        method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
            - default: exact matches only.
            - pad / ffill: find the PREVIOUS index value if no exact match.
            - backfill / bfill: use NEXT index value if no exact match.
            - nearest: use the NEAREST index value if no exact match. Tied
              distances are broken by preferring the larger index value.
        tolerance : optional
            Maximum distance between original and new labels for inexact
            matches. The values of the index at the matching locations must
            satisfy the equation abs(index[indexer] - target) <= tolerance.

            Tolerance may be a scalar value, which applies the same tolerance
            to all values, or list-like, which applies variable tolerance per
            element. List-like includes list, tuple, array, Series, and must
            be the same size as the index and its dtype must exactly match the
            index's type.

        Returns
        -------
        slice
        """
        ...
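For comparison, pandas.Index.get_indexer already implements these method and tolerance semantics for individual labels. A quick check against current pandas (the index values are made up for illustration) shows the behaviour the proposed docstring describes, including the tie-break towards the larger label:

```python
import pandas as pd

# Illustrative, sorted and unique index.
idx = pd.Index([1.0, 2.0, 3.0, 4.0])

# pad/ffill takes the previous label, backfill/bfill the next one.
print(idx.get_indexer([2.5], method="ffill").tolist())  # [1]
print(idx.get_indexer([2.5], method="bfill").tolist())  # [2]

# 2.5 is equidistant from 2.0 and 3.0; nearest picks the larger (position 2).
print(idx.get_indexer([2.5], method="nearest").tolist())  # [2]

# With a tolerance, matches farther than 0.3 from the target become -1.
print(idx.get_indexer([1.9, 3.5], method="nearest", tolerance=0.3).tolist())  # [1, -1]
```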

Alternative Solutions

Currently I've effectively written a custom version of .slice_indexer that calls .get_indexer on the start and end bounds, but I'm worried that doing this downstream will miss edge cases compared to solving it upstream in pandas.
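A minimal sketch of that kind of downstream workaround, assuming a unique, monotonically increasing index. The helper name fuzzy_slice_indexer is hypothetical (it is neither the xarray code nor a pandas API): each bound is resolved with get_indexer, and the end position is bumped by one because label-based slices include the end label.

```python
import pandas as pd


def fuzzy_slice_indexer(index, start=None, end=None, step=None,
                        method=None, tolerance=None):
    """Hypothetical helper (not a pandas API): like Index.slice_indexer,
    but each bound is resolved with Index.get_indexer(method, tolerance).

    Sketch only: assumes a unique, monotonically increasing index.
    """
    if method is None and tolerance is None:
        # No inexact matching requested: defer to pandas itself.
        return index.slice_indexer(start, end, step)

    if not (index.is_unique and index.is_monotonic_increasing):
        raise ValueError("sketch only handles unique, increasing indexes")

    def locate(label):
        # get_indexer returns -1 when no label satisfies method/tolerance.
        pos = index.get_indexer([label], method=method, tolerance=tolerance)[0]
        if pos == -1:
            raise KeyError(label)
        return int(pos)

    start_pos = 0 if start is None else locate(start)
    # Label-based slices include the end label, so stop one past it.
    stop_pos = len(index) if end is None else locate(end) + 1
    return slice(start_pos, stop_pos, step)


idx = pd.Index([1.0, 2.0, 3.0, 4.0, 5.0])
print(fuzzy_slice_indexer(idx, 1.9, 4.2, method="nearest"))  # slice(1, 4, None)
print(idx[fuzzy_slice_indexer(idx, 2.5, 4.5, method="ffill")].tolist())  # [2.0, 3.0, 4.0]
```

Note that with method="ffill" a start of 2.5 resolves to the label 2.0, so the slice begins below the requested start. Whether that is the right behaviour for a start bound, and what to do for decreasing indexes, are exactly the kinds of edge cases that are easy to get wrong downstream.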

Additional Context

This would be nice for use within the internals of xarray's .sel() methods. See https://github.com/pydata/xarray/issues/10710.

Comment From: rhshadrach

Thanks for the request. I'm confused as to what the desired behavior of the additional arguments is. Are you possibly suggesting that slice_indexer not return a slice? Or that the provided arguments only apply to what is determined to be the start / end?

Closing until further details are provided - happy to reopen!

Comment From: TomNicholas

> Are you possibly suggesting that slice_indexer not return a slice?

No, I still want it to return a slice.

> Or that the provided arguments only apply to what is determined to be the start / end?

Exactly, so that the start and end points of the slice can be specified fuzzily.

In my xarray PR you can see the consequences of these kwargs not being available in pandas. Inside my _query_slice function I am now forced to have two totally different codepaths: one uses slice_indexer but doesn't support method and tolerance, and the other supports method and tolerance, but only through much more complicated logic involving multiple calls to get_indexer. It would be nice to consolidate these codepaths. There is further discussion about the desired behaviour in that PR too.

Comment From: rhshadrach

So let's say the user is using ffill on the index 5, 3, 1, 4 and specifies start=2. What start will be chosen? And similarly with bfill.
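In recent pandas versions there is no existing behaviour to extend for this case: on the non-monotonic index 5, 3, 1, 4, get_indexer refuses to forward-fill a label that is not present (it raises ValueError), and slice_indexer raises KeyError for such a label, even though exact labels still slice fine. A quick check:

```python
import pandas as pd

# The non-monotonic index from the question.
idx = pd.Index([5, 3, 1, 4])

# Exact labels work: positions from label 3 up to and including label 4.
print(idx.slice_indexer(3, 4))  # slice(1, 4, None)

# Forward-filling a missing label needs a monotonic index...
try:
    idx.get_indexer([2], method="ffill")
except ValueError as err:
    print("get_indexer raised ValueError:", err)

# ...and slice_indexer cannot place a missing label either.
try:
    idx.slice_indexer(2, None)
except KeyError as err:
    print("slice_indexer raised KeyError:", err)
```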

Comment From: rhshadrach

Unfortunately the desire here is still not clear to me. Closing until details are added.