Golang x/tools/gopls/internal/analysis: best-effort simplification of custom comparison functions

This is a speculative feature proposal for a new analyzer. I would be sympathetic to an objection that this is too prescriptive, too complex, or too narrow to be included as a hint in gopls. I'd be happy to just implement this downstream for my own use case.

Background

A search of open-source and Google-internal Go code shows that a large majority of custom comparison functions (for slices.SortFunc and friends) fall into one of two buckets:

ordering by a field value (x < y iff x.Field < y.Field)
lexicographic ordering (order by a field, but if they're equal, order by a different field, and so on)

(Really, ordering by a field value is a degenerate case of lexicographic ordering.)

There are a variety of ways that code authors express a lexicographic ordering, too many to capture with a straightforward AST analysis (examples below). But most lex ordering implementations can be grouped into a small handful of control-flow patterns.

Go 1.22 introduced cmp.Or, which enables a simple and uniform way to express lexicographic orderings:

cmp.Or(
  cmp.Compare(x.Field, y.Field),
  cmp.Compare(x.OtherField, y.OtherField)
)

Analyzer idea

There could be an analyzer that identifies a few simple patterns for custom comparisons (as function literals in calls to slices.SortFunc et al.) and suggests simplification using cmp.Or and/or cmp.Compare.

If a simple analysis can't detect a lexicographic ordering, we will not suggest any changes.

Ulterior motive

There are tons of calls to an old version of x/exp/slices in Google code, where orderings are "less"-style (return true iff a < b) instead of "cmp"-style (return an int, so that a comparison a _ b is written as compare(a, b) _ 0).

An analyzer that handles a few common patterns for custom comparison functions would allow for conversion of 90+% of calls to x/exp/slices.SortFunc to the standard slices.SortFunc, saving a substantial amount of manual conversion work.

Open questions

Does Go tooling want to support a migration for users relying on old unstable experimental packages? If not, a more narrowly focused analyzer could still provide value for pre-go1.22 code (and for code where the authors were unaware of cmp.Or or cmp.Compare)
Would we want to handle certain special comparisons, or leave it simple? (e.g. ~~using strings.Compare for strings instead of cmp.Compare~~ [per #71270 we don't want to do this], or handling time.Compare rather than restricting the analyzer to only look at fields with types satisfying cmp.Ordered)
Trying to preserve comments from the original code would be really complex at best and probably infeasible in practice. We don't like when analyzers remove comments. Should we just refuse to suggest a fix if the function contains any comments?

Examples

// simple field comparison
slices.SortFunc(xs, func(a, b X) int {
  if a.Field < b.Field {
    return -1
  }
  if b.Field < a.Field {
    return 1
  }
  return 0
})

// rewrite to:
slices.SortFunc(xs, func(a, b X) int { return cmp.Compare(a.Field, b.Field) })

// lexicographical ordering
slices.SortFunc(xs, func(a, b X) int {
  if a.ID < b.ID {
    return -1
  }
  if a.ID > b.ID {
    return 1
  }
  if a.Name < b.Name {
    return -1
  }
  if a.Name > b.Name {
    return 1
  }
  return 0
}

// rewrite to:
slices.SortFunc(xs, func(a, b X) int {
  return cmp.Or(cmp.Compare(a.ID, b.ID), 
    cmp.Compare(a.Name, b.Name))
})

// another lex ordering (in https://cs.opensource.google/go/x/tools/+/master:internal/modindex/modindex.go)
slices.SortFunc(w.newIndex.Entries, func(l, r Entry) int {
  if n := strings.Compare(l.PkgName, r.PkgName); n != 0 {
    return n
  }
  return strings.Compare(l.ImportPath, r.ImportPath)
})

// rewrite to:
slices.SortFunc(w.newIndex.Entries, func(l, r Entry) int {
  return cmp.Or(strings.Compare(l.PkgName, r.PkgName),
    strings.Compare(l.ImportPath, r.ImportPath))
})

// using the old version of x/exp/slices
slices.SortFunc(xs, func(a, b X) bool { return a.Field1 < b.Field1 }
slices.SortFunc(ys, func(a, b Y) bool {
  if a.Field1 != b.Field1 {
    return a.Field1 < b.Field1
  }
  return a.Field2 < b.Field2
})

// import standard slices package and rewrite to:
slices.SortFunc(xs, func(a, b X) int { return cmp.Compare(a.Field1, b.Field1) }
slices.SortFunc(xs, func(a, b X) int {
  return cmp.Or(cmp.Compare(a.Field1, b.Field1), 
    cmp.Compare(a.Field2, b.Field2))
})

cc: @adonovan

Comment From: gabyhelp

Related Issues

Related Code Changes

strings: intrinsify and optimize Compare

_{(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)}

Comment From: kwjw

There is a performance trade-off I should have mentioned. Using cmp.Or evaluates each comparison every time.

Even though cmp.Or makes lexicographic comparisons clearer and less error-prone, it could be annoying to get a hint if you're spelling out a longer short-circuiting function intentionally.

Comment From: adonovan

There is a performance trade-off I should have mentioned. Using cmp.Or evaluates each comparison every time.

Yeah, the eager evaluation of cmp.Or is really undesirable in case of nontrivial values (e.g. strings or bigger things). Still, the cmp.Or trick is very clever (and even permits negation: -cmp.Compare(...)). If only we had Lisp-style macros...

I suspect there is room for a style-improvement analyzer that simplifies the verbose three-check functions (<, >, ==) into a single three-valued comparison. Also, comparison functions that use weird if-nesting should be replaced by a Iinear sequence of checks of primary, secondary, etc, keys. I feel like I have seen a number of errors in such logic, but we'd need more data to know what the actual problems are.

If short function literals become a thing, a function such as CompareX below could efficiently reduce the redundancy of the field (or whatever) accessor:

func compareFoo(a, b Foo) bool {
    if sign := cmp.CompareX(a, b, (x) -> x.f); sign != 0 {
        return sign
    }
    ...etc...
    return 0
}

Comment From: kwjw

I suspect there is room for a style-improvement analyzer that simplifies the verbose three-check functions (<, >, ==) into a single three-valued comparison.

Agreed, this would be helpful.

Also, comparison functions that use weird if-nesting should be replaced by a Iinear sequence of checks of primary, secondary, etc, keys.

Weird nesting seems to be common in "less"-style comparisons (sort.Slice and the old x/exp/slices.SortFunc). From a quick glance, it looks like "cmp"-style comparisons don't tend to have this problem.

A couple of considerations:

I still can't think of a great way to handle comments if we want to rewrite verbose comparisons.
There are a huge number of uses like if a.GetX() != b.GetX() { return cmp.Compare(a.GetX(), b.GetX()) }. Unfortunately we can't 100% safely rewrite code where methods are used like fields, since methods can have arbitrary side-effects or return arbitrary values. (This is maybe a downside of the new opaque protobuf API.)

On the topic of sort.Slice, I saw there is a TODO for smarter analysis to replace sort.Slice calls with slices.SortFunc. The logic there would also apply nicely to old "less"-style x/exp/slices.SortFunc calls. What do you think?

Comment From: adonovan

There are a huge number of uses like if a.GetX() != b.GetX() { return cmp.Compare(a.GetX(), b.GetX()) }. Unfortunately we can't 100% safely rewrite code where methods are used like fields, since methods can have arbitrary side-effects

While it's true in general that we shouldn't ignore side effects of functions, and we have learned the hard way that we really shouldn't cut corners with modernizers, it's hard to imagine that a GetX method with significant side effects, called from the narrow situation of a lambda passed to CompareFunc, could be anything other than a bug.

On the topic of sort.Slice, I saw there is a TODO for smarter analysis to replace sort.Slice calls with slices.SortFunc. The logic there would also apply nicely to old "less"-style x/exp/slices.SortFunc calls. What do you think?

That seems like a reasonable transformation for a modernizer.