We propose to add a new package golang.org/x/exp/xiter that defines adapters on iterators. Perhaps these would one day be moved to the iter package or perhaps not. There are concerns about how these would affect idiomatic Go code. It seems worth defining them in x/exp to help that discussion along, and then we can decide whether they move anywhere else when we have more experience with them.

The package is called xiter to avoid a collision with the standard library iter (see proposal #61897). An alternative would be to have xiter define wrappers and type aliases for all the functions and types in the standard iter package, but the type aliases would depend on #46477, which is not yet implemented.

This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.

Edit, 2024-05-15: Added some missing 2s in function names, and also changed Reduce to take the function first, instead of between sum and seq.

Edit, 2024-07-17: Updated code to match the final Go 1.23 language change. Corrected various typos.


/* Package xiter implements basic adapters for composing iterator sequences:

  • [Concat] and [Concat2] concatenate sequences.
  • [Equal], [Equal2], [EqualFunc], and [EqualFunc2] check whether two sequences contain equal values.
  • [Filter] and [Filter2] filter a sequence according to a function f.
  • [Limit] and [Limit2] truncate a sequence after n items.
  • [Map] and [Map2] apply a function f to a sequence.
  • [Merge], [Merge2], [MergeFunc], and [MergeFunc2] merge two ordered sequences.
  • [Reduce] and [Reduce2] combine the values in a sequence.
  • [Zip] and [Zip2] iterate over two sequences in parallel.

*/

package xiter

import (
    "cmp"
    "iter"
)

// Concat returns an iterator over the concatenation of the sequences.
func Concat[V any](seqs ...iter.Seq[V]) iter.Seq[V] {
    return func(yield func(V) bool) {
        for _, seq := range seqs {
            for e := range seq {
                if !yield(e) {
                    return
                }
            }
        }
    }
}

// Concat2 returns an iterator over the concatenation of the sequences.
func Concat2[K, V any](seqs ...iter.Seq2[K, V]) iter.Seq2[K, V] {
    return func(yield func(K, V) bool) {
        for _, seq := range seqs {
            for k, v := range seq {
                if !yield(k, v) {
                    return
                }
            }
        }
    }
}

// Equal reports whether the two sequences are equal.
func Equal[V comparable](x, y iter.Seq[V]) bool {
    for z := range Zip(x, y) {
        if z.Ok1 != z.Ok2 || z.V1 != z.V2 {
            return false
        }
    }
    return true
}

// Equal2 reports whether the two sequences are equal.
func Equal2[K, V comparable](x, y iter.Seq2[K, V]) bool {
    for z := range Zip2(x, y) {
        if z.Ok1 != z.Ok2 || z.K1 != z.K2 || z.V1 != z.V2 {
            return false
        }
    }
    return true
}

// EqualFunc reports whether the two sequences are equal according to the function f.
func EqualFunc[V1, V2 any](x iter.Seq[V1], y iter.Seq[V2], f func(V1, V2) bool) bool {
    for z := range Zip(x, y) {
        if z.Ok1 != z.Ok2 || !f(z.V1, z.V2) {
            return false
        }
    }
    return true
}

// EqualFunc2 reports whether the two sequences are equal according to the function f.
func EqualFunc2[K1, V1, K2, V2 any](x iter.Seq2[K1, V1], y iter.Seq2[K2, V2], f func(K1, V1, K2, V2) bool) bool {
    for z := range Zip2(x, y) {
        if z.Ok1 != z.Ok2 || !f(z.K1, z.V1, z.K2, z.V2) {
            return false
        }
    }
    return true
}

// Filter returns an iterator over seq that only includes
// the values v for which f(v) is true.
func Filter[V any](f func(V) bool, seq iter.Seq[V]) iter.Seq[V] {
    return func(yield func(V) bool) {
        for v := range seq {
            if f(v) && !yield(v) {
                return
            }
        }
    }
}

// Filter2 returns an iterator over seq that only includes
// the pairs k, v for which f(k, v) is true.
func Filter2[K, V any](f func(K, V) bool, seq iter.Seq2[K, V]) iter.Seq2[K, V] {
    return func(yield func(K, V) bool) {
        for k, v := range seq {
            if f(k, v) && !yield(k, v) {
                return
            }
        }
    }
}

// Limit returns an iterator over seq that stops after n values.
func Limit[V any](seq iter.Seq[V], n int) iter.Seq[V] {
    return func(yield func(V) bool) {
        if n <= 0 {
            return
        }
        for v := range seq {
            if !yield(v) {
                return
            }
            if n--; n <= 0 {
                break
            }
        }
    }
}

// Limit2 returns an iterator over seq that stops after n key-value pairs.
func Limit2[K, V any](seq iter.Seq2[K, V], n int) iter.Seq2[K, V] {
    return func(yield func(K, V) bool) {
        if n <= 0 {
            return
        }
        for k, v := range seq {
            if !yield(k, v) {
                return
            }
            if n--; n <= 0 {
                break
            }
        }
    }
}

// Map returns an iterator over f applied to seq.
func Map[In, Out any](f func(In) Out, seq iter.Seq[In]) iter.Seq[Out] {
    return func(yield func(Out) bool) {
        for in := range seq {
            if !yield(f(in)) {
                return
            }
        }
    }
}

// Map2 returns an iterator over f applied to seq.
func Map2[KIn, VIn, KOut, VOut any](f func(KIn, VIn) (KOut, VOut), seq iter.Seq2[KIn, VIn]) iter.Seq2[KOut, VOut] {
    return func(yield func(KOut, VOut) bool) {
        for k, v := range seq {
            if !yield(f(k, v)) {
                return
            }
        }
    }
}

// Merge merges two sequences of ordered values.
// Values appear in the output once for each time they appear in x
// and once for each time they appear in y.
// If the two input sequences are not ordered,
// the output sequence will not be ordered,
// but it will still contain every value from x and y exactly once.
//
// Merge is equivalent to calling MergeFunc with cmp.Compare[V]
// as the ordering function.
func Merge[V cmp.Ordered](x, y iter.Seq[V]) iter.Seq[V] {
    return MergeFunc(x, y, cmp.Compare[V])
}

// MergeFunc merges two sequences of values ordered by the function f.
// Values appear in the output once for each time they appear in x
// and once for each time they appear in y.
// When equal values appear in both sequences,
// the output contains the values from x before the values from y.
// If the two input sequences are not ordered by f,
// the output sequence will not be ordered by f,
// but it will still contain every value from x and y exactly once.
func MergeFunc[V any](x, y iter.Seq[V], f func(V, V) int) iter.Seq[V] {
    return func(yield func(V) bool) {
        next, stop := iter.Pull(y)
        defer stop()
        v2, ok2 := next()
        for v1 := range x {
            for ok2 && f(v1, v2) > 0 {
                if !yield(v2) {
                    return
                }
                v2, ok2 = next()
            }
            if !yield(v1) {
                return
            }
        }
        for ok2 {
            if !yield(v2) {
                return
            }
            v2, ok2 = next()
        }
    }
}

// Merge2 merges two sequences of key-value pairs ordered by their keys.
// Pairs appear in the output once for each time they appear in x
// and once for each time they appear in y.
// If the two input sequences are not ordered by their keys,
// the output sequence will not be ordered by its keys,
// but it will still contain every pair from x and y exactly once.
//
// Merge2 is equivalent to calling MergeFunc2 with cmp.Compare[K]
// as the ordering function.
func Merge2[K cmp.Ordered, V any](x, y iter.Seq2[K, V]) iter.Seq2[K, V] {
    return MergeFunc2(x, y, cmp.Compare[K])
}

// MergeFunc2 merges two sequences of key-value pairs ordered by the function f.
// Pairs appear in the output once for each time they appear in x
// and once for each time they appear in y.
// When pairs with equal keys appear in both sequences,
// the output contains the pairs from x before the pairs from y.
// If the two input sequences are not ordered by f,
// the output sequence will not be ordered by f,
// but it will still contain every pair from x and y exactly once.
func MergeFunc2[K, V any](x, y iter.Seq2[K, V], f func(K, K) int) iter.Seq2[K, V] {
    return func(yield func(K, V) bool) {
        next, stop := iter.Pull2(y)
        defer stop()
        k2, v2, ok2 := next()
        for k1, v1 := range x {
            for ok2 && f(k1, k2) > 0 {
                if !yield(k2, v2) {
                    return
                }
                k2, v2, ok2 = next()
            }
            if !yield(k1, v1) {
                return
            }
        }
        for ok2 {
            if !yield(k2, v2) {
                return
            }
            k2, v2, ok2 = next()
        }
    }
}

// Reduce combines the values in seq using f.
// For each value v in seq, it updates sum = f(sum, v)
// and then returns the final sum.
// For example, if iterating over seq yields v1, v2, v3,
// Reduce returns f(f(f(sum, v1), v2), v3).
func Reduce[Sum, V any](f func(Sum, V) Sum, sum Sum, seq iter.Seq[V]) Sum {
    for v := range seq {
        sum = f(sum, v)
    }
    return sum
}

// Reduce2 combines the values in seq using f.
// For each pair k, v in seq, it updates sum = f(sum, k, v)
// and then returns the final sum.
// For example, if iterating over seq yields (k1, v1), (k2, v2), (k3, v3),
// Reduce2 returns f(f(f(sum, k1, v1), k2, v2), k3, v3).
func Reduce2[Sum, K, V any](f func(Sum, K, V) Sum, sum Sum, seq iter.Seq2[K, V]) Sum {
    for k, v := range seq {
        sum = f(sum, k, v)
    }
    return sum
}

// A Zipped is a pair of zipped values, one of which may be missing,
// drawn from two different sequences.
type Zipped[V1, V2 any] struct {
    V1  V1
    Ok1 bool // whether V1 is present (if not, it will be zero)
    V2  V2
    Ok2 bool // whether V2 is present (if not, it will be zero)
}

// Zip returns an iterator that iterates x and y in parallel,
// yielding Zipped values of successive elements of x and y.
// If one sequence ends before the other, the iteration continues
// with Zipped values in which either Ok1 or Ok2 is false,
// depending on which sequence ended first.
//
// Zip is a useful building block for adapters that process
// pairs of sequences. For example, Equal can be defined as:
//
//  func Equal[V comparable](x, y iter.Seq[V]) bool {
//      for z := range Zip(x, y) {
//          if z.Ok1 != z.Ok2 || z.V1 != z.V2 {
//              return false
//          }
//      }
//      return true
//  }
func Zip[V1, V2 any](x iter.Seq[V1], y iter.Seq[V2]) iter.Seq[Zipped[V1, V2]] {
    return func(yield func(z Zipped[V1, V2]) bool) {
        next, stop := iter.Pull(y)
        defer stop()
        v2, ok2 := next()
        for v1 := range x {
            if !yield(Zipped[V1, V2]{v1, true, v2, ok2}) {
                return
            }
            v2, ok2 = next()
        }
        var zv1 V1
        for ok2 {
            if !yield(Zipped[V1, V2]{zv1, false, v2, ok2}) {
                return
            }
            v2, ok2 = next()
        }
    }
}

// A Zipped2 is a pair of zipped key-value pairs,
// one of which may be missing, drawn from two different sequences.
type Zipped2[K1, V1, K2, V2 any] struct {
    K1  K1
    V1  V1
    Ok1 bool // whether K1, V1 are present (if not, they will be zero)
    K2  K2
    V2  V2
    Ok2 bool // whether K2, V2 are present (if not, they will be zero)
}

// Zip2 returns an iterator that iterates x and y in parallel,
// yielding Zipped2 values of successive elements of x and y.
// If one sequence ends before the other, the iteration continues
// with Zipped2 values in which either Ok1 or Ok2 is false,
// depending on which sequence ended first.
//
// Zip2 is a useful building block for adapters that process
// pairs of sequences. For example, Equal2 can be defined as:
//
//  func Equal2[K, V comparable](x, y iter.Seq2[K, V]) bool {
//      for z := range Zip2(x, y) {
//          if z.Ok1 != z.Ok2 || z.K1 != z.K2 || z.V1 != z.V2 {
//              return false
//          }
//      }
//      return true
//  }
func Zip2[K1, V1, K2, V2 any](x iter.Seq2[K1, V1], y iter.Seq2[K2, V2]) iter.Seq[Zipped2[K1, V1, K2, V2]] {
    return func(yield func(z Zipped2[K1, V1, K2, V2]) bool) {
        next, stop := iter.Pull2(y)
        defer stop()
        k2, v2, ok2 := next()
        for k1, v1 := range x {
            if !yield(Zipped2[K1, V1, K2, V2]{k1, v1, true, k2, v2, ok2}) {
                return
            }
            k2, v2, ok2 = next()
        }
        var zk1 K1
        var zv1 V1
        for ok2 {
            if !yield(Zipped2[K1, V1, K2, V2]{zk1, zv1, false, k2, v2, ok2}) {
                return
            }
            k2, v2, ok2 = next()
        }
    }
}
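
For illustration, a small usage sketch composing these adapters (not part of the proposal; it assumes the package exists at golang.org/x/exp/xiter with the signatures above, plus Go 1.23's slices.Values):

package main

import (
    "fmt"
    "slices"

    "golang.org/x/exp/xiter"
)

func main() {
    nums := slices.Values([]int{1, 2, 3, 4, 5, 6})
    evens := xiter.Filter(func(v int) bool { return v%2 == 0 }, nums)               // 2, 4, 6
    doubled := xiter.Map(func(v int) int { return v * 2 }, xiter.Limit(evens, 2))   // 4, 8
    sum := xiter.Reduce(func(sum, v int) int { return sum + v }, 0, doubled)
    fmt.Println(sum) // 12
}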

Comment From: gophun

The duplication of each function is the first thing that catches the eye. Are there thoughts on why this is acceptable?

Comment From: gophun

What about an adapter that converts an iter.Seq[V] to an iter.Seq2[int, V] and an adapter that converts an iter.Seq2[K, V] to an iter.Seq[V]?

Comment From: zephyrtronium

Some typos: EqualFunc2, Map2, Merge2, and MergeFunc2 lack the 2 suffixes on their actual names. They're all correct in the corresponding documentation.

Comment From: earthboundkid

May I humbly suggest that the name "iterutils" is less susceptible to, uh, unfortunate mispronunciation.

Comment From: earthboundkid

For Reduce, the callback should go last: func Reduce[Sum, V any](sum Sum, seq Seq[V], f func(Sum, V) Sum) Sum.

Comment From: DeedleFake

For Reduce, the callback should go last: func Reduce[Sum, V any](sum Sum, seq Seq[V], f func(Sum, V) Sum) Sum.

I'd actually prefer func Reduce[Sum, V any](seq Seq[V], sum Sum, f func(Sum, V) Sum) Sum.

Edit: I just realized that if Reduce() is being used to build an array, putting sum first puts everything in the same order as Append() and other functions that put the destination first. I'm not sure if that's worth it or not.

Comment From: rsc

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

Comment From: DeedleFake

The more I think about it, the more that I think that API design for this should wait until after a decision is made on #49085. Multiple other languages have proven over and over that a left-to-right chained syntax is vastly superior ergonomically to simple top-level functions for iterators. For example, compare

nonNegative := xiter.Filter(
  xiter.Map(
    bufio.Lines(r),
    parseLine,
  ),
  func(v int) bool { return v >= 0 },
)

vs.

nonNegative := bufio.Lines(r).
  Map(parseLine).
  Filter(func(v int) bool { return v >= 0 })

Go's a little weird because of the need to put the . on the previous line, but other than that one oddity, which I could get used to, the second is better in every way. It reads in the order that actions actually happen, it's less repetitive, etc. The only real way to emulate it currently is something like

lines := bufio.Lines(r)
intlines := xiter.Map(lines, parseLine)
nonNegative := xiter.Filter(intlines, func(v int) bool { return v >= 0 })

That works, but it clutters up the local namespace and it's significantly harder to edit. For example, if you decide you need to add a new step in the chain, you have to make sure that all of the variables for each iterator match up in the previous and succeeding calls.

Comment From: ianlancetaylor

What type does bufio.Lines return to make that work in Go? What methods does that type support? What is the type of nonNegative? I mean these as honest questions. Can we write this kind of code in Go today, or would we need new language features?

Comment From: hherman1

You would probably have to wrap the base iterator like:

stream.New(bufio.Lines).
    Filter(…).
    …

Comment From: DeedleFake

@ianlancetaylor

Sorry. I should have stuck a comment in. I was just coming up with some hypothetical function that would give an iter.Seq[string]. In this case, the idea was that it would internally use a bufio.Scanner to yield lines from an io.Reader or something. My original code had an anonymous func(string) int instead of the vague parseLine but I removed it because it was clogging up the example with irrelevant code and I didn't clarify when I did.

@hherman1

Not necessarily. The transformative and sink functions on iterators could just be defined as methods on iter.Seq.

Comment From: hherman1

~But iter.Seq is an interface type no? Are you saying it should be a struct type?~

I was wrong, it’s not an interface.

Comment From: benhoyt

Why do some functions take the f func as the last parameter, but Filter and Map take it as the first, and Reduce in the middle? Most other functions in the stdlib take funcs as the last parameter, such as sort.Slice, slices.*Func, ServeMux.HandleFunc, and so on. This makes code that uses them with inline function literals more readable:

names := xiter.Map(func(p Person) string {
    return p.Name
}, people) // "people" gets lost

// vs

names := xiter.Map(people, func(p Person) string {
    return p.Name
})

Comment From: Merovius

@DeedleFake There won't be a "decision" on #49085 anytime soon. There are good reasons not to do it yet, but we also don't want to say it never happens. The issue exists to reflect that state. What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"?

Comment From: DeedleFake

What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"?

No iterators, definitely. I've done fine without them for over a decade. I can wait a bit longer. If a bad implementation goes in, I'll never get a good version. Plus, I can just write my own implementation of whatever iterator functions I need as long as range-over-func exists while I wait.

Comment From: gophun

Neither chaining nor functional programming has ever been a decisive or recommended technique in Go. Instead, iteration—specifically, procedural 'for' loops—has always been a core technique since the language's inception. The iterator proposals aim to enhance this core approach. While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

Comment From: Merovius

I can wait a bit longer. If a bad implementation goes in, I'll never get a good version.

I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either, we could always release a v2 of the relevant packages, if we ever manage to do #49085 in a decade or so.

Comment From: DeedleFake

While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for Map(), Filter(), and Reduce(), among others? I have no problem with functions like slices.Backwards() and other source function proposals. My only problem is the transformative and sink functions.

I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either, we could always release a v2 of the relevant packages, if we ever manage to do https://github.com/golang/go/issues/49085 in a decade or so.

Edit: The way this proposal is phrased does actually imply that they may be heavily reevaluated enough in x/exp that they may not go into the standard library at all, so maybe my point is moot after all. I still think that this is a valid issue with the API design to bring up, but maybe it's a bit off-topic for this particular proposal and should wait until after they're in x/exp and it can be more easily demonstrated how awkward they are to use. I don't like the idea that existing code will be broken when some variant of them does potentially go into the standard library, but it's less of a problem than I was worried about. Never mind. Please ignore my rambling below.

That issue has only been open for 2 years. I think assuming that it'll take a decade to solve is a bit unfair. Yes, a v2 is an option, especially if #61716 is accepted, but that was created out of necessity to deal with problems in an existing package, while this would essentially be purposefully putting problems into a new package. It's not like I'm saying that iterators are unacceptable to me in this state, just that features have been delayed or cut before because of possible changes coming later and that I think that it's prudent to discuss the possibility here. That just happened in the last few weeks in the maps package because of the possibility of the acceptance of #61405. I think the same should be done with the transformative and sink functions for now, or at the very least those functions should be planned to stay in x/exp until some way to clean up the API is decided on, that's all.

One of my favorite things about Go is how slow and methodical it (usually) is in introducing new features. I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics. One of the purposes of that approach is to try avoid having to fix it later. Adding those functions in the proposed manner will almost definitely necessitate that later fix, and I very much would like to avoid that if at all possible.

Comment From: gophun

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those?

Java Streams and .NET LINQ build on a standardized iterator system, but they are more than that. Both languages had a generic iterator system before. Iterators are useful without chaining or functional programming.

Why else is there a proposal here for Map(), Filter(), and Reduce(), among others?

That would be this very proposal, and it comes with a caveat: "... or perhaps not. There are concerns about how these would affect idiomatic Go code. "

This means that not everyone who has read these proposals in advance believes that this part is a good idea.

Comment From: jba

While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for Map(), Filter(), and Reduce(), among others?

Maybe chaining leads to too much of a good thing. It becomes more tempting to write long, hard-to-read chains of functions. You're less likely to do that if you have to nest calls.

As an analogy, Go has if. Isn't the intention of if to allow conditional execution? Why then shouldn't Go have the ternary operator ?:? Because it often leads to hard-to-read code.

Comment From: rsc

Re #49085, generic methods either require (A) dynamic code generation or (B) terrible speed or (C) hiding those methods from dynamic interface checks or (D) not doing them at all. We have chosen option (D). The issue remains open like so many suggestions people have made, but I don't see a realistic path forward where we choose A, B, or C, nor do I see a fifth option. So it makes sense to assume generic methods are not going to happen and do our work accordingly.

Comment From: Merovius

@DeedleFake The issue is not lack of understanding what a lack of parameterized methods means. It's just that, as @rsc said, wanting them doesn't make them feasible. The issue only being 2 years old is deceptive. The underlying problem is actually as old as Go and one of the main reasons we didn't have generics for most of that. Which you should consider, when you say

I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics.

We got generics by committing to keep implementation strategies open, thus avoiding the generics dilemma. Not having parametric methods is a pretty direct consequence of that decision.

Comment From: DeedleFake

Well, I tried. If that's the decision then that's the decision. I'm disappointed, but I guess I'll just be satisfied with what I do like about the current proposal, even if it has, in my opinion, some fairly major problems. Sorry for dragging this a bit off-topic there.

Comment From: thediveo

Hope that it's not noise: I wondered if naming the parameter sum might imply to the less experienced dev that Reduce does only summation, so I looked at JavaScript's array reduce: that uses accumulator. I don't know if that is much better, I just wanted to point it out. If anything, let's have a good laugh.

Comment From: jimmyfrasche

Those nonstandard Zip definitions look like they would occasionally be useful but I think I'd want the ordinary zip/zipLongest definitions most of the time. Those can be recovered from the proposed ones with some postprocessing but I'd hate to have to always do that.

These should be considered along with Limit:

LimitFunc - stop iterating after a predicate matches (often called TakeWhile in other languages)

Skip, SkipFunc - drop the first n items (or until the predicate matches) before yielding (opposite of Limit/LimitFunc, often called drop/dropWhile)
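
For concreteness, a minimal sketch of what two of those might look like, following the proposal's conventions (the names and semantics are this comment's suggestion, not part of the proposal):

func LimitFunc[V any](seq iter.Seq[V], f func(V) bool) iter.Seq[V] {
    return func(yield func(V) bool) {
        for v := range seq {
            // Stop at the first value for which the predicate fails.
            if !f(v) || !yield(v) {
                return
            }
        }
    }
}

func Skip[V any](seq iter.Seq[V], n int) iter.Seq[V] {
    return func(yield func(V) bool) {
        for v := range seq {
            // Drop the first n values, then yield the rest.
            if n > 0 {
                n--
                continue
            }
            if !yield(v) {
                return
            }
        }
    }
}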

Comment From: jba

Those nonstandard Zip definitions look like they would occasionally be useful but I think I'd want the ordinary zip/zipLongest definitions most of the time.

Can you explain the difference? Is it just that zip typically stops at the end of the shorter sequence? That is definitely less useful as a building block, and easy to write given these functions. What are some examples where stopping at the shortest is better?

Comment From: jimmyfrasche

zip stops after the shorter sequence. zipLongest pads out the missing values of the shorter sequence with a specified value.

The provided ones are more general and can be used to build those but I can't really think of any time I've used zip where I needed to know that. I've always either known the lengths were equal by construction so it didn't matter, or couldn't do anything other than drop the excess so it didn't matter. Maybe that's peculiar to me and the situations in which I reach for zip, but they've been defined like that in every language I can think of that I've used, which has to be some kind of indicator that I'm not alone in this.

I'm not arguing for them to be replaced with the less general more common versions: I want those versions here too so I can use them directly without having to write a shim to the standard definition.

Comment From: jimmyfrasche

Ideally I'd like something like this

func Zip[V1, V2 any](x Seq[V1], y Seq[V2]) Seq2[V1, V2]
func ZipLongest[V1, V2 any](x Seq[V1], y Seq[V2]) Seq2[V1, V2]

func ZipAll[V1, V2 any](x Seq[V1], y Seq[V2]) Seq[Zipped[V1, V2]]
func ZipAll2[K1, V1, K2, V2 any](x Seq2[K1, V1], y Seq2[K2, V2]) Seq[Zipped2[K1, V1, K2, V2]]

Where ZipAll{,2} are the original and Zip, ZipLongest are defined in terms of them and ZipLongest pads with zero.

Since the more general versions exist, these can stick to the nice case where two 1-iterables form a 2-iterable without needing anything extra so if you can get away with using them they're much more ergonomic.
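
For concreteness, a sketch of the short-stopping Zip built on the more general form (written with iter.Seq/iter.Seq2 and the ZipAll spelling suggested above; an illustration, not part of the proposal):

func Zip[V1, V2 any](x iter.Seq[V1], y iter.Seq[V2]) iter.Seq2[V1, V2] {
    return func(yield func(V1, V2) bool) {
        for z := range ZipAll(x, y) {
            // Stop at the end of the shorter sequence.
            if !z.Ok1 || !z.Ok2 {
                return
            }
            if !yield(z.V1, z.V2) {
                return
            }
        }
    }
}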

Comment From: aarzilli

@benhoyt

Why do some functions take the f func as the last parameter, but Filter and Map take it as the first, and Reduce in the middle?

They concatenate better:

Map(func(x int) int { return x*2 },
    Filter(func(x int) bool { return x%2 == 0 },
         seq))

to read this you start from the last line and move up, as opposed to:

Map(
     Filter(seq,
         func(x int) bool { return x%2 == 0 }),
    func(x int) int { return x*2})

where you have to start in the middle and zig-zag outwards. For this reason function first is the most common order of arguments. For example:

  • common lisp (mapcar function list), (remove-if test list)
  • python itertools map(function, iterable), filter(pred, iterable)
  • perl map { function } @array, grep { predicate } @array
  • haskell map :: (a -> b) -> [a] -> [b], filter :: (a -> Bool) -> [a] -> [a]

the only exception to this, that I know of, is a certain javascript library

Comment From: aarzilli

Should the order of type parameters of Map be inverted? Should it be Map[Out, In]? If a proposal for lightweight function syntax (#21498) is ever accepted the type of In can be easily inferred, but the type of Out probably not. Because the generic instantiation syntax lets you omit only types at the end, the types that are easily inferred should come later.

Comment From: b97tsk

Changing Filter, Map, Reduce etc. to return Ops and a built-in operator to chain Ops might be a good idea.

type Op[In, Out any] func(In) Out

func Filter[V any](f func(V) bool) Op[Seq[V], Seq[V]]
func Map[In, Out any](f func(In) Out) Op[Seq[In], Seq[Out]]
func Reduce[Sum, V any](sum Sum, f func(Sum, V) Sum) Op[Seq[V], Sum]

Usage:

for v := range slices.Values(s) | xiter.Filter(...) | xiter.Map(...) {
    fmt.Println(v)
}

sum := slices.Values(s) | xiter.Filter(...) | xiter.Map(...) | xiter.Reduce(...)
fmt.Println(sum)

// equivalent to

for v := range xiter.Map(...)(xiter.Filter(...)(slices.Values(s))) {
    fmt.Println(v)
}

sum := xiter.Reduce(...)(xiter.Map(...)(xiter.Filter(...)(slices.Values(s))))
fmt.Println(sum)

The compiler can do its best to inline every Op call.

Comment From: Merovius

@b97tsk Your Usage code doesn't make a lot of sense to me. ~~You use a boolean or | operator, but those functions do not involve any booleans.~~ [edit] sorry, I misread. It seems you are suggesting adding a function-piping operator. Note that this has been suggested in the past and been rejected [/edit]. I also don't really see the point of the Op type - what's the advantage of Op[A, B] over just writing func(A) B?

Comment From: earthboundkid

They concatenate better:

Map(func(x int) int { return x*2 }, Filter(func(x int) bool { return x%2 == 0 }, seq))

to read this you start from the last line and move up, as opposed to:

Map( Filter(seq, func(x int) bool { return x%2 == 0 }), func(x int) int { return x*2})

where you have to start in the middle and zig-zag outwards.

Both read strangely because the order of transformations is seq → filter → map, not map → filter → seq.

I think this is the best you can do without chaining:

filteredSeq := iter.Filter(initSeq, func(x int) bool {
    return x%2 == 0
})
finalSeq := iter.Map(filteredSeq, func(x int) int {
    return x*2
})

If #21498 were accepted, maybe the function definitions could be shortened somehow.

That said, I think one of the nice things about old Go was that there wasn't a "functional" variation, so you never had to sit down and decide "do I write a regular for-loop here? Or a list comprehension? Or use map?" There was just one way to do it. This code is still pretty appealing to me:

func doubleEven(seq iter.Seq[int]) iter.Seq[int] {
    return func(yield func(int) bool) {
        for x := range seq {
            if x%2 == 0 {
                if !yield(x*2) { return }
            }
        }
    }
}

Comment From: aarzilli

That said, I think one of the nice things about old Go was that there wasn't a "functional" variation, so you never had to sit down and decide "do I write a regular for-loop here?

Agreed. AFAIC it would be better to not have them in the standard library at all.

Comment From: zephyrtronium

Agreed. AFAIC it would be better to not have them in the standard library at all.

Note that this proposal is not for the standard library, precisely because "[t]here are concerns about how these would affect idiomatic Go code." Not that that means you shouldn't dislike the idea, of course.

Comment From: AndrewHarrisSPU

@aarzilli

They concatenate better:

This was interesting!

Map( Filter(seq, func(x int) bool { return x%2 == 0 }), func(x int) int { return x*2})

go fmt seems to preserve this, which I think reads more clearly just because of the spacing:

xs = Map(Filter(xs,
    func(x int) bool { return x%2 == 0 }),
    func(x int) int { return x * x })

With this formatting, the order of reading this reminds me of pipe-forward |> operators I'm seeing emerge in a number of languages, in addition to the more classic LISP style.

Comment From: DeedleFake

That is nice. Right-to-left is a bit odd, but much better than inside-out. Things do get a bit wonky if any of them have more arguments or if you use multi-line anonymous functions, though:

product = Reduce(Filter(Map(numericStrings(),
    func(str string) int {
        n, _ := strconv.ParseInt(str, 10, 0)
        return int(n)
    }),
    func(n int) bool { return n > 0 }),
    0,
    func(acc, cur int) int { return acc * cur })

Only the lines that end in a ) end a function in the chain, which makes for some messy reading with something like Reduce() because of the lack of extra indentation. I think it's probably overall still better than just calling them all manually and the bizarre inside-out mess that makes, especially because I think calls with more arguments are less common and are probably mostly sink functions, like Reduce(). Alternatively, gofmt will preserve putting both of the final arguments to Reduce() on a single line above, i.e. 0, func(acc, cur int) int { ... }, but that might not always be feasible either if combined they're awkwardly long. Another annoyance is the way that you have to count to figure out which function goes with which arguments. Filter() is second from the right so its arguments are second from the bottom.

I like the idea of the |> operator. It's been proposed before multiple times, including in the context of iterators, and if I remember right one of the main arguments against it was that it could encourage people to design their APIs specifically so that they could be used with it, even though that isn't necessarily the best design. I'm not sure of my own opinion on that argument, as you could say the same thing about methods and chained method call APIs, but I do think that it could be a potential problem in a few cases. It would very much not work with multiple returns, which could lead people to try to minimize multiple returns as much as possible, and that also puts it in direct contradiction with the way errors work. With methods that are designed to be chained, they're restricted to only what you define as the one creating the type, but with something like |> the number of possible things it could be useful to leave it open for increases drastically.

Comment From: earthboundkid

The accumulator for product should start with 1, which I feel is an argument against reduce functions. 😊

Comment From: DeedleFake

Ah, whoops. I did just a simple sum originally but changed it for the sake of some variety and forgot to change the initial value.

Comment From: b97tsk

@Merovius

Note that this has been suggested in the past and been rejected

When your kid ~cries~ asks for a toy, you might compromise and purchase one for your kid if you take it seriously. Would you do the same thing for someone else's kid? I mean, this isn't someone else's proposal. Definitely got some weight in it. I think it's worth another go-through in their regular meetings. It's not a bad toy, after all.

I also don't really see the point of the Op type - what's the advantage of Op[A, B] over just writing func(A) B?

I think I would just leave the decision (whether to keep it) or the name choice to someone else to make.

Comment From: Merovius

@b97tsk I don't think the source of the proposal matters a lot. And the exact functions we are talking about on this issue are the primary use case given for any sort of function chaining syntax (note how the proposal I linked to uses Map as a motivation). In other words, nothing has changed since that discussion.

Comment From: AndrewHarrisSPU

The accumulator for product should start with 1, which I feel is an argument against reduce functions. 😊

In all seriousness - other than Equal, is Reduce the only fold here? Reduce seems trivial in the sense that the for-range equivalent is an entirely sane alternative most of the time. The one good reason I can think of to have Reduce is that it might be nice to curry out the iter.Seq input. It makes sense in a library like this, but practically it seems redundant with for-range.

Comment From: jimmyfrasche

Compact{,Func}{,2} would be useful

Comment From: willfaught

func Reduce[Sum, V any](f func(Sum, V) Sum, sum Sum, seq iter.Seq[V]) Sum

It's confusing to call this Sum because it can be for any purpose. Perhaps Accum(ulator) or Result would be clearer?

// Zip returns an iterator that iterates x and y in parallel,
// yielding Zipped values of successive elements of x and y.
// If one sequence ends before the other, the iteration continues
// with Zipped values in which either Ok1 or Ok2 is false,
// depending on which sequence ended first.

Zip typically drops elements that don't have a matching element in the other list because the lengths are different. This seems overly complicated. Perhaps it would be better to have a func that ensured a Seq had a minimum length, using a supplied default value to fill gaps, or a special zip func that took the default value and filled in the gaps itself?

// Limit returns an iterator over seq that stops after n values.

See Take, TakeWhile, Drop, and DropWhile in Haskell. All would be useful.

  • [Concat] and [Concat2] concatenate sequences.
  • [Equal], [Equal2], [EqualFunc], and [EqualFunc2] check whether two sequences contain equal values.
  • [Filter] and [Filter2] filter a sequence according to a function f.
  • [Limit] and [Limit2] truncate a sequence after n items.
  • [Map] and [Map2] apply a function f to a sequence.
  • [Merge], [Merge2], [MergeFunc], and [MergeFunc2] merge two ordered sequences.
  • [Reduce] and [Reduce2] combine the values in a sequence.
  • [Zip] and [Zip2] iterate over two sequences in parallel.

There's too much duplication here, which is the result of having two separate iterator signatures, and needing to serve them both. It would be simpler to have one parameterized signature that could be instantiated with any type, including a 2-tuple-like type if needed. So we would have:

package iter

type Seq[V any] func(yield func(V) bool)

type Pair[A, B any] struct {
    A A
    B B
}

Then we could pass around iter.Seq[iter.Pair[int, error]], and only need one version of functions that take iterators.

I suppose the counterargument would be that this wouldn't enable for k, v = range syntax, but I don't think it's important to support that. Iterating over single values has been sufficient for every other language; I see no reason why it's any different for Go. for k, v = range is special syntax enabled by slice and map types being built into the language, and iter.Seq is not.

If we really wanted to shoehorn pair assignments into for/range, we could have it work for iteration values that implement

interface {
    First() sometype
    Second() sometype
}

so we'd have

package iter

type Pair[A, B any] struct {
    A A
    B B
}

func (Pair[A, B]) First() A
func (Pair[A, B]) Second() B

We got generics by committing to keep implementation strategies open, thus avoiding the generics dilemma.

I don't agree. We could have gotten generics without that commitment, it's just that the Go team insisted on it. We (or at least I) haven't seen evidence that shows we got more value out of that commitment than we would have without it.

Comment From: Merovius

Iterating over single values has been sufficient for every other language; I see no reason why it's any different for Go.

I would say that it has been sufficient for ~every other language, because ~every other language has first-class tuples and destructuring assignments. Admittedly I'm not super familiar with many languages besides Go, but ISTM that at least the ones I am somewhat familiar with do have an equivalent of the two-value range syntax this way.

I don't agree. We could have gotten generics without that commitment, it's just that the Go team insisted on it.

Potato, Tomato. The Go team is making the ultimate decisions about the language, so if we have to do something to convince the Go team that something is a good idea, then we had to do it. That is just as true today as it has been for the last ten years. So whether or not you agree with their decision and whether or not you agree with that form of governance of the Go project - they are the circumstances under which we work.

Comment From: DeedleFake

Most Go range usages can yield two values, but a range over a channel is an existing usage that only yields one at a time. Given the extra complexity due to the doubling of so many basic functions, maybe it would be best to at least start with only single-value range-over-func for now as well? If something needs to yield errors alongside other values, it can always just yield some small custom struct type.
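
For concreteness, a sketch of that single-value pattern (lineResult and readLines are hypothetical names; assumes the usual bufio, io, and iter imports):

type lineResult struct {
    Line string
    Err  error
}

func readLines(r io.Reader) iter.Seq[lineResult] {
    return func(yield func(lineResult) bool) {
        sc := bufio.NewScanner(r)
        for sc.Scan() {
            if !yield(lineResult{Line: sc.Text()}) {
                return
            }
        }
        // Surface any scan error as a final element.
        if err := sc.Err(); err != nil {
            yield(lineResult{Err: err})
        }
    }
}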

Comment From: willfaught

I would say that it has been sufficient for ~every other language, because ~every other language has first-class tuples and destructuring assignments

Haskell's and Python's tuples are normal types (albeit with some syntactic sugar). We can already declare comparable types in Go. I showed how to do so above with Pair.

Destructuring isn't required for iteration of single values. Haskell can do fst p to get the first item of pair "p". Python can do p[0] to get the first item of pair "p". Go can do p.First to get the first item of a pair "p" (or p[0], if you use an array instead of a struct).

Comment From: Merovius

@willfaught Destructuring is necessary to allow you to write x, y = f(), if f returns a tuple. Instead of having to write p = f(); x, y = p[0], p[1]. And destructuring requires the tuple types to be first-class types known to the language.

FTR I don't get your point. Because you say "Haskell is all about destructuring", which is precisely my point. Haskell has this language feature and it is quite essential for how it can manage with only one form of functional mapping primitives.

Comment From: bcmills

I don't know about Haskell, but functional languages in the Hindley-Milner family often have some form of existential types.

In Go, that might look something like:

type Seq interface {
    [V any] func(yield func(V) bool) | [K, V any] func(yield func(K, V) bool)
}

func Concat[S Seq](seqs ...S) S {
    …
}

But then, of course, there is no way to implement the body of the iterator functions without some kind of parametric type-switch. 🤔

Comment From: bcmills

Thinking about how functional adapters will be used, I think this package should also provide transformations between the two sequence arities. Otherwise it will be difficult to compose transformations that add information to a stream (such as lookup functions) or to compose Seq functions with containers that provide only Seq2 iterators.

It could either be two more Map variants:

func Map12[KIn, VIn, VOut any](seq Seq2[KIn, VIn], f func(KIn, VIn) VOut) Seq[VOut] {
    return func(yield func(VOut) bool) {
        for k, v := range seq {
            if !yield(f(k, v)) {
                return
            }
        }
    }
}

func Map21[VIn, KOut, VOut any](seq Seq[VIn], f func(VIn) (KOut, VOut)) Seq2[KOut, VOut] {
    return func(yield func(KOut, VOut) bool) {
        for in := range seq {
            if !yield(f(in)) {
                return
            }
        }
    }
}

Or, three standalone functions:

func Lift[V any](seq Seq[V]) Seq2[V, struct{}] {
    return func(yield func(V, struct{}) bool) {
        for v := range seq {
            if !yield(v, struct{}{}) {
                return
            }
        }
    }
}

func Keys[K, V any](seq Seq2[K, V]) Seq[K] {
    return func(yield func(K) bool) {
        for k := range seq {
            if !yield(k) {
                return
            }
        }
    }
}

func Values[K, V any](seq Seq2[K, V]) Seq[V] {
    return func(yield func(V) bool) {
        for _, v := range seq {
            if !yield(v) {
                return
            }
        }
    }
}

Comment From: jimmyfrasche

@bcmills I say :+1: to the Map variants and the 3 standalone funcs. This is an exp package. I say throw everything in it and adopt things into the stdlib if they get used a lot. Not a general approach, but I think it would work here since these are all things that are plausibly valuable and use a novel (in Go) feature. If nothing else it'd be easier to measure their use if they're centrally contained instead of recreated in dozens of different places.

In that vein, there's another useful 1→2 adapter to consider:

func Enumerate[V any](seq Seq[V]) Seq2[int, V] {
  return func(yield func(int, V) bool) {
    var n int
    for v := range seq {
      if !yield(n, v) {
        return
      }
      n++
    }
  }
}

Comment From: bcmills

Oh, yeah! Maybe just do Enumerate instead of Lift, really. 🤔

(But make it int64, not int, so that it can be used for long streams.)

Comment From: DeedleFake

But make it int64, not int, so that it can be used for long streams.

That won't make any difference on some architectures. Might as well make it uint64 since you can't have negative iterator indices anyways.

Comment From: Merovius

Enumerate[T constraints.Integer] (mostly joking)

Comment From: jimmyfrasche

int is good because int is what lots of things do. There should probably be a separate package for all the various means of generating ordered sequences with a start/stop/step that are type parametric. Then you could use one with stop = "infinity" and Zip 'em (the standard Zip I mention above, not the nonstandard but useful one in the original post). Using int, start = 0, step = 1 is common enough to be worth the special case, imo

edit: the zip comment subthread I refer to is already getting eaten up by github; it starts at https://github.com/golang/go/issues/61898#issuecomment-1673966447 and continues for several more posts

Comment From: bcmills

@DeedleFake

That won't make any difference on some architectures. Might as well make it uint64 since you can't have negative iterator indices anyways.

The point of using int64 is so that it isn't feasible for Enumerate to ever reach that high. (If you can dispatch three iterations of the loop per nanosecond, 2⁶³/3 nanoseconds is still almost a hundred years!)

So the extra bit in a uint64 seems unnecessary — and int64 would also be more consistent with existing APIs like io.Copy.

@jimmyfrasche

int is good because int is what lots of things do.

If you look at how int is actually used in Go, it functions like ssize_t in C: it represents the size of a thing in memory. (It is not generally used for counting things that may exceed the size of the heap, such as the number of bytes copied by io.Copy.)

An iterator may easily exceed the number of bytes in the address space of a 32-bit machine within a relatively short running time — on the order of seconds, not centuries. And, like it or not, Go still supports a number of 32-bit architectures.

Comment From: jimmyfrasche

int is what range over slice and Len() methods etc. use so it makes as good (or as bad) a default as it does there. I do think there should be something like xiter.Zip(seq.Of[uint64].From(0), iter) for when you need something stronger.

Comment From: jimmyfrasche

To clarify, the hypothetical seq.Of[uint64].From(0) would be the sequence 0, 1, 2, …, of uint64 and Zip would terminate at the shorter of the two iterators so it would return (0, v0), (1, v1), (2, v2), … for all the vN in iter. Enumerate would be equivalent to xiter.Zip(seq.Of[int].From(0), iter)

Comment From: bcmills

@jimmyfrasche, the proposed Zip returns an iterator over Zipped structs, not a Seq2, so it is not equivalent to Enumerate.

(That's because it has to handle finite iterators, and continues to zip even after one of the iterators has run out of values.)

Comment From: jimmyfrasche

I state and link to my previous comment arguing against that as the default

Comment From: hherman1

I’m still wondering, if you do

for k, v := range variable.Values() {
    fmt.Println(k, v)
}

what gets printed if Values is not Seq2? Is this a compile error?

Comment From: ianlancetaylor

@hherman1 Yes, that would be a compilation error.

Comment From: willfaught

Destructuring is necessary to allow you to write x, y = f(), if f returns a tuple. Instead of having to write p = f(); x, y = p[0], p[1]. And destructuring requires the tuple types to be first-class types known to the language.

I don't follow your counterargument. My argument was that writing x, y = f() is unnecessary because we can do p = f(); x, y = p[0], p[1]. The enormous cost of having 2+ iterator signatures, and all the duplicate implementations and adaptors and API surface area that entails, isn't worth the slight convenience of x, y = f(). I also demonstrated a way to have a single iterator signature while also enabling x, y = f(). Simply stating that x, y = f() is necessary in response doesn't seem useful. Show us how the benefit outweighs the cost.

Since every language comparable to Go that I can think of doesn't rely on destructuring for iteration, and iterates over single values, I have yet to see an argument for why Go should have multiple iteration values, and incur the costs of doing so.

FTR I don't get your point. Because you say "Haskell is all about destructuring", which is precisely my point. Haskell has this language feature and it is quite essential for how it can manage with only one form of functional mapping primitives.

My point in the footnote was that Haskell only has destructuring, so fst p is not an exact analogy. The larger point that was part of was that Go and languages comparable to Go don't require destructuring, either in the context of iteration or in general, therefore we don't need multiple-value iteration in Go, because we don't need to use destructuring in Go.

Comment From: jimmyfrasche

I do not want to have to write

for pair := range s {
  k, v := pair.K, pair.V

when I have to use a 2-iter. There are other ways of writing that expansion and my dislike extends to them equally.

Comment From: willfaught

I do not want to have to write [...] when I have to use a 2-iter. There are other ways of writing that expansion and my dislike extends to them equally.

@jimmyfrasche You don't have to. You can still write for k, v = range with single iteration values, as I showed above.

Even without multiple assignment, it would be

for pair := range s {
  use(pair.K, pair.V)

not

for pair := range s {
  k, v := pair.K, pair.V
  use(k, v)

Comment From: DeedleFake

The 1 vs. 2 value iterator differentiation, though not exactly the same, seems a lot to me like the support for push and pull iterators back when this was a pre-proposal discussion. Yes, it would make certain things slightly easier, like being compatible with the existing sync.Map Range() method, but is it worth the language-level support for that when you could just stick the following in the iters package and achieve the same thing without the significantly increased complexity?

package iters

type Pair[T1, T2 any] struct {
  First T1
  Second T2
}

func FromTwo[T1, T2 any](seqfunc func(func(v1 T1, v2 T2) bool)) Seq[Pair[T1, T2]] {
  return func(yield func(Pair[T1, T2]) bool) {
    seqfunc(func(v1 T1, v2 T2) bool {
      return yield(Pair[T1, T2]{v1, v2})
    })
  }
}
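
For illustration, a hypothetical use of that adapter over Go 1.23's maps.All (the iters package and Pair type are the sketch above, not a real library):

m := map[string]int{"a": 1, "b": 2}
for p := range iters.FromTwo(maps.All(m)) {
    fmt.Println(p.First, p.Second)
}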

Comment From: Merovius

@willfaught

Simply stating that x, y = f() is necessary in response doesn't seem useful. Show us how the benefit outweighs the cost.

It is not necessary, but it sure is convenient for idiomatic and clear code.

Since every language comparable to Go that I can think of doesn't rely on destructuring for iteration

I repeat: Almost every language I'm aware of does. Here is a Python example:

for x, y in enumerate([1,2,3]):  # note how there are no tuples here
    print(x, y)
type(list(enumerate([1,2,3]))[0])  # <class 'tuple'>

enumerate and other two-variable loop constructs in Python yield tuples, but the loop construct destructures them.

Comment From: willfaught

I repeat: Almost every language I'm aware of does. Here is a Python example:

@Merovius I said rely, not enable. Every language comparable to Go that I can think of does not rely on destructuring for iteration. In other words, it is not necessary to destructure iteration values in these languages. You even acknowledged that just above:

It is not necessary, but it sure is convenient for idiomatic and clear code.

Your Python code can be changed to work like this:

for x in enumerate([1,2,3]):
    print(x[0], x[1])

similar to how it could work in Go:

for x := range enumerate(1,2,3) {
    use(x.A, x.B)

If we want first-class tuples, then we should add them to Go as their own feature. We shouldn't complicate Go iteration just to avoid the need to add tuples to Go, a feature that most of Go's competitors have had for a long time, including Python, Rust, Swift, C++, C#, and Scala (and Haskell, for what it's worth).

Tuples go hand-in-hand with generics. It's inevitable to want to bundle together arbitrary values for a parameterized type, so it's unsurprising to me that this problem is coming up for iteration of generic values. To enable the multiple-assignment syntax, we should add tuples, not add multiple iteration signatures. Or just do it how I showed above.

(Big edit, sorry)

Comment From: earthboundkid

Making multiple return values first class language constructs by adding a tuple type to Go is an interesting idea, but it should probably be its own proposal, not a side argument in an issue about a particular iteration helper library.

Comment From: Merovius

@willfaught Okay. Fair enough. As others have said, I think it is bad not to be able to write for k, v := c.All(). Allowing that, without the Seq2 form, would require some form of first-class tuples (or your IMO strange suggestion of calling methods), which Go doesn't have and is unlikely to get, in my opinion. It's not like they haven't been suggested before.

Comment From: AndrewHarrisSPU

Thinking about pairs, we could write Zip2 to return

Seq[Pair[Pair[Pair[K1, V1], bool], Pair[Pair[K2, V2], bool]]]
Seq2[Pair[Pair[K1, V1], bool], Pair[Pair[K2, V2], bool]]
Seq[Zipped2[K1, V1, K2, V2]]

Go should have porcelain, and some languages do the plumbing so meticulously it is the porcelain, but I don't know that Go has to do it this way. With Zip2 I think a stylistic choice is made (to not use Seq2 or Pair), concerning whether Seq2 or Pair are used as porcelain or plumbing. If we call them the dyadic forms, I think a strictly simple dyadic style is porcelain, and is usually preferred ... something like:

  • a dyadic form is characterized by exactly two type parameters
  • a dyadic form unpacks completely into exactly 2 values
  • a dyadic form isn't nested

In other words:

// ok
for i, v := range seq {...}

// impolite
for term := range seq {
    k, v := unpack(term)

--- but ---
// reasonable, catches even if unpack is ad-hoc
for term := range seq {
    index, key, value, ...stuff..., err := unpack(term)

// not actually easier to work with?
for i, term := range seq {
    k, v := unpack(term)

Comment From: DeedleFake

@AndrewHarrisSPU

If you're interested, I went and implemented the |> operator just to play around with it. There's no CL that can be conveniently installed via gotip since it's a proposal that was rejected, twice, but you can install it manually from my fork. It's not built off of CL 510541, either, as I didn't want to have to deal with Gerrit and git-codereview, but iterator chaining can be used anyways as long as you don't use range.

Comment From: AndrewHarrisSPU

@DeedleFake That's very cool! The change in code is very efficient, I don't know if I should be surprised but I am :)

It seems worth mentioning: with method chaining, h(x).g().f() doesn't work if h can't list type parameters used by g and f. Composing functions with h |> g |> f is better in this regard.

Comment From: DeedleFake

One more: I got tired of rewriting basic iterator functionality every time I wanted to try something with it, so I implemented a bunch of it as a package to make it easier to experiment with both the prototype and my |> fork. The package is compatible with CL 510541, but does not require it as all of its internals are implemented in a way compatible with regular Go 1.20. I'll probably be adding more functions to it over the next few days, so if you see something missing that you'd like, just ask.

Comment From: AndrewHarrisSPU

Exploring chaining compositions of Seq[T]:

A subset of xiter can be expressed as Seq[T] -> Seq[T] with no additional types:

  1. deriving a Seq[T] from a Seq[T] and a func(T) bool (Filter, Limit)
  2. deriving a Seq[T] from a Seq[T] and a func(T) T (some instances of Map)

For this simply parameterized subset, chained operations can be written as compositions on:

  1. sequences func(func(T) bool) (like xiter does)
  2. yield functions func(T) bool
  3. pull functions func() (T, bool)

Sequence chaining operations the way xiter does take something like the form:

func F(seq Seq[T], ...) Seq[T] {
    return func(yield func(T) bool) {
        for t := range seq {
            ... if !yield(t) ...
        }
    }
}

In this style, there's a cascade of for range iterations and loop bodies. Each stage is a low- but not zero-cost abstraction.
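To make form (2) concrete, here is a minimal sketch of composing on yield functions; mapYield, filterYield, apply, and nums are hypothetical names, not part of xiter or the Patch code:

// mapYield returns a yield that applies f before forwarding to the
// original yield; it adds a function call per value, not a range loop.
func mapYield[T any](f func(T) T, yield func(T) bool) func(T) bool {
    return func(t T) bool { return yield(f(t)) }
}

// filterYield returns a yield that drops values failing keep; dropped
// values report true so the producer keeps going.
func filterYield[T any](keep func(T) bool, yield func(T) bool) func(T) bool {
    return func(t T) bool {
        if !keep(t) {
            return true
        }
        return yield(t)
    }
}

// apply runs seq against a transformed yield: however many stages wrap
// composes, there is still exactly one producer loop.
func apply[T any](seq iter.Seq[T], wrap func(func(T) bool) func(T) bool) iter.Seq[T] {
    return func(yield func(T) bool) {
        seq(wrap(yield))
    }
}

With nums some iter.Seq[int], filter evens -> square then becomes a single stage from the producer's point of view:

evensSquared := apply(nums, func(yield func(int) bool) func(int) bool {
    return filterYield(func(v int) bool { return v%2 == 0 },
        mapYield(func(v int) int { return v * v }, yield))
})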

This hacky code implements a type Patch[T any] func(tPrime T, skip bool, stop bool). Methods Patch.Apply, Patch.Skip, and Patch.Stop can be used to implement Map, Filter, and Limit behavior. A Patch.Seq method applies to a sequence, but not like xiter - instead, it composes on the inner yield. A Patch.Pull method applies to pull iterators. Patch.Seq and Patch.Pull result in a single for range/yield stage, rather than a cascade.

In some quick benchmarks (too ugly to share) I've compared different ways of chaining simple, fast arithmetic operations: chaining sequences (1) pays the for range cost for each chained operation, while chaining yield functions (2) or pull functions (3) with Patch does not. For the chain of operations filter evens -> square -> take 10:

Benchmark_PatchPull-10      15901131            74.75 ns/op       80 B/op          5 allocs/op
Benchmark_PatchSeq-10       17266455            68.70 ns/op       88 B/op          5 allocs/op
Benchmark_SeqCascade-10      4023194           297.4 ns/op       320 B/op         15 allocs/op
Benchmark_Direct-10         46363596            25.48 ns/op        0 B/op          0 allocs/op

(PatchPull, PatchSeq use Patch; SeqCascade uses xiter functions; direct is a direct implementation in a loop body)

xiter chaining is definitely more flexible, but the cost of the cascade of for range/yield loops scales with the number of stages. I'm sure this is predictable, but maybe it's interesting that there's a way to do it without paying for the cascade.

Comment From: rsc

Finishing this proposal discussion is blocked on https://github.com/golang/go/issues/61405.

Comment From: bcmills

A minor point: for readability, I think it would be good to consistently put the func arguments at the end of the argument list. The function argument to a function like Map or Reduce may well be a multi-line literal, and there often isn't a good name for it as an intermediate variable, so it seems best to get the other (often shorter or easier-to-name) arguments out of the way first.

Comment From: DeedleFake

And I think it's also a good idea to consistently put the Seq argument first. Filter(), for example, is oddly backwards in the proposal.

Comment From: aarzilli

@bcmills the function argument appearing first is better for composition and the most common order of arguments in similar libraries https://github.com/golang/go/issues/61898#issuecomment-1674336815

Comment From: earthboundkid

Here are function signatures as originally proposed.

func Concat[V any](seqs ...Seq[V]) Seq[V]
func Concat2[K, V any](seqs ...Seq2[K, V]) Seq2[K, V]
func Equal[V comparable](x, y Seq[V]) bool
func Equal2[K, V comparable](x, y Seq2[K, V]) bool
func EqualFunc[V1, V2 any](x Seq[V1], y Seq[V2], f func(V1, V2) bool) bool
func EqualFunc2[K1, V1, K2, V2 any](x Seq2[K1, V1], y Seq2[K2, V2], f func(K1, V1, K2, V2) bool) bool
func Filter[V any](f func(V) bool, seq Seq[V]) Seq[V]
func Filter2[K, V any](f func(K, V) bool, seq Seq2[K, V]) Seq2[K, V]
func Limit[V any](seq Seq[V], n int) Seq[V]
func Limit2[K, V any](seq Seq2[K, V], n int) Seq2[K, V]
func Map[In, Out any](f func(In) Out, seq Seq[In]) Seq[Out]
func Map2[KIn, VIn, KOut, VOut any](f func(KIn, VIn) (KOut, VOut), seq Seq2[KIn, VIn]) Seq2[KOut, VOut]
func Merge[V cmp.Ordered](x, y Seq[V]) Seq[V]
func MergeFunc[V any](x, y Seq[V], f func(V, V) int) Seq[V]
func Merge2[K cmp.Ordered, V any](x, y Seq2[K, V]) Seq2[K, V]
func MergeFunc2[K, V any](x, y Seq2[K, V], f func(K, K) int) Seq2[K, V]
func Reduce[Sum, V any](sum Sum, f func(Sum, V) Sum, seq Seq[V]) Sum
func Reduce2[Sum, K, V any](sum Sum, f func(Sum, K, V) Sum, seq Seq2[K, V]) Sum
func Zip[V1, V2 any](x Seq[V1], y Seq[V2]) Seq[Zipped[V1, V2]]
func Zip2[K1, V1, K2, V2 any](x Seq2[K1, V1], y Seq2[K2, V2]) Seq[Zipped2[K1, V1, K2, V2]]

20 functions. This seems like way too much duplication to me.

By contrast, here are the items in Python's itertools (by coincidence, also 20):

Infinite iterators:
count(start=0, step=1) --> start, start+step, start+2*step, ...
cycle(p) --> p0, p1, ... plast, p0, p1, ...
repeat(elem [,n]) --> elem, elem, elem, ... endlessly or up to n times

Iterators terminating on the shortest input sequence:
accumulate(p[, func]) --> p0, p0+p1, p0+p1+p2
chain(p, q, ...) --> p0, p1, ... plast, q0, q1, ...
chain.from_iterable([p, q, ...]) --> p0, p1, ... plast, q0, q1, ...
compress(data, selectors) --> (d[0] if s[0]), (d[1] if s[1]), ...
dropwhile(pred, seq) --> seq[n], seq[n+1], starting when pred fails
groupby(iterable[, keyfunc]) --> sub-iterators grouped by value of keyfunc(v)
filterfalse(pred, seq) --> elements of seq where pred(elem) is False
islice(seq, [start,] stop [, step]) --> elements from
        seq[start:stop:step]
pairwise(s) --> (s[0],s[1]), (s[1],s[2]), (s[2], s[3]), ...
starmap(fun, seq) --> fun(*seq[0]), fun(*seq[1]), ...
tee(it, n=2) --> (it1, it2 , ... itn) splits one iterator into n
takewhile(pred, seq) --> seq[0], seq[1], until pred fails
zip_longest(p, q, ...) --> (p[0], q[0]), (p[1], q[1]), ...

Combinatoric generators:
product(p, q, ... [repeat=1]) --> cartesian product
permutations(p[, r])
combinations(p, r)
combinations_with_replacement(p, r)

Some of those don't make sense for Go (starmap is just a workaround for Python's weak anonymous function support), but we can probably get a little closer.

Here's what I'd like to see:

// Package iterutils is not pronounced like 🚽
package iterutils

func Keys(Seq2[K, V]) Seq[K]
func Values(Seq2[K, V]) Seq[V]
func Pairs(Seq2[K, V]) Seq[Pair[K, V]]
func Unpair(seq Seq[Pair[K, V]]) Seq2[K, V]
func Result(errp *error, seq Seq2[V, error]) Seq[V]
func Count(seq Seq[V]) Seq2[int64, V]

func Concat[V any](seqs ...Seq[V]) Seq[V]
func Equal[V comparable](a, b Seq[V]) bool
func EqualFunc[V1, V2 any](a Seq[V1], b Seq[V2], f func(V1, V2) bool) bool
func Filter[V any](seq Seq[V], f func(V) bool) Seq[V]
func Limit[V any](seq Seq[V], n int) Seq[V]
func Map[In, Out any](seq Seq[In], f func(In) Out) Seq[Out]
func Merge[V cmp.Ordered](a, b Seq[V]) Seq[V]
func MergeFunc[V any](a, b Seq[V], f func(V, V) int) Seq[V]
func Reduce[Sum, V any](seq Seq[V], init Sum, f func(Sum, V) Sum) Sum
func Zip[V1, V2 any](a Seq[V1], b Seq[V2]) Seq2[V1, V2]
func ZipLongest[V1, V2 any](a Seq[V1], b Seq[V2]) Seq[Zipped[V1, V2]]

Fewer functions, but a little more powerful overall.

Comment From: bcmills

@aarzilli, I disagree that that order is better in Go.

The usual reason to put the function first in functional languages is so that the function can be curried with the argument to be applied. There is no currying in Go, so that reason does not apply.

In the example you gave, I would expect a structure more like:

odd := Filter(seq, func(x int) bool { return x%2 == 1 })
doubled := Map(odd, func(x int) int { return x*2 })

Given that Go does not have lightweight function literals (#21498), I do not expect to see long chains of calls within an expression either way, and I don't find “start from the last line and move up” at all intuitive in a language where evaluation usually flows downward.

Comment From: aarzilli

The usual reason to put the function first in functional languages is so that the function can be curried with the argument to be applied. There is no currying in Go, so that reason does not apply.

Several of my examples are from languages that do not have currying.

Comment From: AndrewHarrisSPU

I think many languages have found themselves looking for inspiration from other languages that are higher on the spectrum of liberation from the von Neumann Style. In these cases, the interesting properties of "liberated" languages can be inspiring, but they don't immediately follow from imitation of syntax. I think it's actually hazardous to over-commit to "liberated" syntax, because it turns out to be incumbent on attentive programmers to maintain invariants that come for free otherwise.

I don't think we need to imitate functions-first, pro forma - other languages may have found that useful, but functions-last matches existing Go idioms.

edit: More than just idioms, Go is very focused on fighting a hundred John von Neumanns the size of gophers rather than a gopher the size of John von Neumann ... over-committing to the liberated style with iterators makes it harder to pick the right fights sometimes (especially versus languages with less uniform concurrency models), so I think that puts some limits on how far it would be wise to impose the liberated style.

Comment From: Merovius

Here is a use-case that is a bit awkward with the APIs we have so far: Say I have a Seq[string] and I want to parse those as integers. I would like to just do xiter.Map(s, strconv.Atoi). But this doesn't work - s is a Seq[string] and strconv.Atoi is a func(string) (int, error). So my options are

  1. Work with Map2:
type None struct{}

func LiftSeq[T any](s iter.Seq[T]) iter.Seq2[None, T] {
    return func(yield func(None, T) bool) {
        for v := range s {
            if !yield(None{}, v) {
                return
            }
        }
    }
}

func LiftFunc[A, B, C any](f func(A) (B, C)) func(None, A) (B, C) {
    return func(_ None, a A) (B, C) {
        return f(a)
    }
}

func ParseIntegers(s iter.Seq[string]) iter.Seq2[int, error] {
    return xiter.Map2(LiftSeq(s), LiftFunc(strconv.Atoi))
}
  2. Work with Map
type Result[T any] struct {
    Value T
    Err error
}

func LiftResultFunc[A, B any](f func(A) (B, error)) func(A) Result[B] {
    return func(a A) Result[B] {
        b, err := f(a)
        return Result[B]{b, err}
    }
}

func UnpackResultIter[A any](seq iter.Seq[Result[A]]) iter.Seq2[A, error] {
    return func(yield func(A, error) bool) {
        for r := range seq {
            if !yield(r.Value, r.Err) {
                return
            }
        }
    }
}

func ParseIntegers(s iter.Seq[string]) iter.Seq2[int, error] {
    return UnpackResultIter(xiter.Map(s, LiftResultFunc(strconv.Atoi)))
}
  3. Roll my own Map1To2:
func Map1To2[A, B, C any](seq iter.Seq[A], f func(A) (B, C)) iter.Seq2[B, C] {
    return func(yield func(B, C) bool) {
        for a := range seq {
            if !yield(f(a)) {
                return
            }
        }
    }
}

func ParseIntegers(s iter.Seq[string]) iter.Seq2[int, error] {
    return Map1To2(s, strconv.Atoi)
}
  4. Roll my own MapErr (like above, but fixing C to be error)
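For concreteness, a minimal sketch of that MapErr (a hypothetical helper, just Map1To2 with C fixed to error):

func MapErr[A, B any](seq iter.Seq[A], f func(A) (B, error)) iter.Seq2[B, error] {
    return func(yield func(B, error) bool) {
        for a := range seq {
            if !yield(f(a)) {
                return
            }
        }
    }
}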

Notably, all of these options require quite a bit of boilerplate. We could provide helpers for any of them as library functions.

All of them have IMO significant drawbacks, even as library functions:

  1. Carries around a pointless struct{}.
  2. Incentivizes writing fallible functions that return a Result[A] instead of (A, error), to avoid the extra lifting/unpacking when interacting with iterators.
  3. Adds another set of xiter-APIs, with another layer of naming scheme
  4. Adds another set of xiter-APIs, which are better named but less general

Personally, I'd probably do 4, if we think error specifically is a common enough use case. Or do nothing. Or do something else entirely.

This thought was prompted by this comment. Making some assumptions about what that example actually wants to do, it could be rewritten as xiter.MapErr(seq, deidFunc).


But, ultimately, I think what I'm really trying to say is that this all prompted me to re-think my priorities about Seq vs. Seq2. Because if we could somehow make multiple returns work as tuples and have destructuring assignments (which is what most languages I'm aware of do), we wouldn't need any of these. We could just do something like

func ParseIntegers(seq iter.Seq[string]) iter.Seq[(int, error)] {
    return xiter.Map(seq, strconv.Atoi)
}

func main() {
    seq := stringSequence()
    for v, err := range ParseIntegers(seq) {
        // loop body
    }
}

Now, I'm not sure we can make this work. And maybe we prefer #33080 to doing tuples. Or perhaps we want to solve this with some sort of variadic type parameters, to address it for arbitrary combinations of numbers of inputs/outputs (though I have no idea how that would look). All of these would ultimately make Go behave more like a functional language, by making it easier to plug together higher-level functions. And all of them would make one or more of the options above (for incorporating error-returning functions) far nicer to write or provide as library helpers.

So the more we discuss these iterator proposals, the more I'm inclined to agree that doing the Seq and Seq2 thing specifically is adding a lot of duplication and boilerplate. And that we should take a step back and find a more holistic solution to plugging together higher-level functions. Adding transformation functions as proposed here will likely make Go code look more like a functional language - perhaps we should make it work more like one, before doing that.

Comment From: ChrisHines

Adding to the point @Merovius makes in the previous comment: some of the same tensions have existed in Go since the beginning when it comes to composing functions and channels. If you think of channels as similar to type Seq[T any], we have always had to declare struct types to pass multiple values in a single channel send. There is a kind of duality between (channel send/function call) and (channel receive/return value) operations, and we have always had to manually structure/de-structure values across that boundary.

So it seems to me that a solution that could reduce that boilerplate for both multi-value sequences and channels would be the most orthogonal and provide the most opportunities for composition within the language.
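To illustrate the duality, a sketch of the channel-side boilerplate (keyValue is a hypothetical one-off struct):

type keyValue struct {
    key string
    val int
}

func produce(ch chan<- keyValue) {
    ch <- keyValue{key: "a", val: 1} // manually structure the values to send...
}

func consume(ch <-chan keyValue) {
    kv := <-ch
    fmt.Println(kv.key, kv.val) // ...and de-structure them on receive
}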

Comment From: DeedleFake

Or perhaps we want to solve this with some sort of variadic type parameters, to address it for arbitrary combinations of numbers of inputs/outputs (though I have no idea how that would look).

I actually have a proposal for that from a while back, but there hasn't been any discussion on it in a while: #56462. I've kind of just been assuming that it died due to lack of interest. I'm not entirely sure if it would help with this particular case as it doesn't have any way to limit the number of types to no more than just two, but I guess the language could just refuse to allow something to be used with range after all the generic stuff is handled for a specific instance if it turns out that it has more than two types involved.

Comment From: seebs

It really does feel like being able to express seq[in, out] where either in, out, or both can be multiple types, would be very helpful.

Hmm.

Hypothesis: I'm not sure we actually need to be able to limit the number of types to "no more than two".

Maybe... variadic types? Just let us write func Map(s iter.Seq[T1], fn func(T1) T2) iter.Seq[T2] { ... }

and then use it with a func(string) (int, error), and then T2 is (int, error).

This isn't any weirder than being able to do return strconv.Atoi(foo) from a function returning (int, error).

Comment From: DeedleFake

The constraint for multiple types can't just be any, though. For example, what happens if you do

func IsZero[T any](v T) bool {
  // If v is (int, error), what type does reflect.ValueOf() see?
  return reflect.ValueOf(v).IsZero()
}

It's mentioned in the proposal I referenced above, but the constraint for variadic generics has to be even more restricted in terms of what you can do with it than any is. It has to be such that the only way to get the values back out of it is to destructure it via a function return, as that's the only place that destructuring happens in Go right now, and it has to be structured into it in the first place with a function call, not a literal expression. Even then it can cause some problems because of exported fields, reflect, etc., though those might be solvable in a couple of different ways.

Comment From: Merovius

FWIW I don't believe this issue is the right place to discuss variadic type parameters or any of the other things I referenced in my last comment. I think the specifics of what we would like to do to make functional composition more flexible are a larger discussion - if we want to do that at all (which is something that would make sense to discuss here).

Comment From: jba

@Merovius, I wouldn't write any of those. I almost always want to fail fast, so I'd want my function to look like

func ParseIntegers(seq iter.Seq[string]) (iter.Seq[int], error)

Then I would write a MapErr that returned on the first error.

I don't think any feature that turns multiple return values into a single type will help with that.

Comment From: Merovius

@jba How would your ParseIntegers look? ISTM it would have to read the entire sequence first, to return the error.

In general, I find your comment surprising. In #61637 you said we would probably make an SQL row iterator Seq2[Row, error]. I don't really see the conceptual difference between an iterator scanning SQL rows and a function parsing a string sequence.

Comment From: earthboundkid

@jba How would your ParseIntegers look? ISTM it would have to read the entire sequence first, to return the error.

I think it would look like this:

func ParseIntegers(seq iter.Seq[string], errp *error) iter.Seq[int] {
  return func(yield func(int) bool) {
    for s := range seq {
      n, err := strconv.Atoi(s)
      if err != nil {
        *errp = err
        return
      }
      if !yield(n) {
        return
      }
    }
  }
}
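The caller then checks errp once the loop finishes, along these lines (a sketch; lines is a hypothetical Seq[string]):

var err error
for n := range ParseIntegers(lines, &err) {
    fmt.Println(n)
}
if err != nil {
    log.Fatal(err)
}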

Comment From: Merovius

Yes, that works. I wouldn't want to write or use that. I also don't think it is wholly unnatural to continue processing, even if a single datum is broken - I think it makes sense to skip/log broken data and continue with the rest. Like, imagine a batch-processing pipeline, taking a sequence of filenames, doing some processing on each, and mapping it to an aggregate and error.

But sure, if the consensus is that we just don't think people should do that kind of stuff and should always just stop iterating if any error is encountered, we wouldn't need MapErr. Personally, I find that a strange assumption.

Comment From: gazerro

@Merovius In addition to your options, I would propose another one.

If we wanted to convert a function that operates on a value:

func(v T) (R, error)

into a function that operates on a sequence:

func(s iter.Seq[T]) iter.Seq2[R, error]

we could define a function as

type Func[T, R any] func(v T) (R, error)
type SeqFunc[T, R any] func(s iter.Seq[T]) iter.Seq2[R, error]

func AsSeq[T, R any](fn Func[T, R]) SeqFunc[T, R] {
    return func(s iter.Seq[T]) iter.Seq2[R, error] {
        return func(yield func(R, error) bool) {
            for v := range s {
                v2, err := fn(v)
                if !yield(v2, err) {
                    return
                }
            }
        }
    }
}

With this function, ParseIntegers could be written as

func ParseIntegers(s iter.Seq[string]) iter.Seq2[int, error] {
    return AsSeq(strconv.Atoi)(s)
}

or even

ParseIntegers := AsSeq(strconv.Atoi)

Comment From: jimmyfrasche

Map12 seems entirely reasonable and generally useful. So does Map21, for that matter (Keys and Values can be implemented in terms of it).
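For concreteness, a Map21 of that shape might look like this (a sketch, using the proposal's function-first argument order), with Keys falling out of it:

func Map21[K, V, Out any](f func(K, V) Out, seq iter.Seq2[K, V]) iter.Seq[Out] {
    return func(yield func(Out) bool) {
        for k, v := range seq {
            if !yield(f(k, v)) {
                return
            }
        }
    }
}

func Keys[K, V any](seq iter.Seq2[K, V]) iter.Seq[K] {
    return Map21(func(k K, _ V) K { return k }, seq)
}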

Add

// First stops iteration and sets err on the first err in s
func First[K any](err *error, s iter.Seq2[K, error]) iter.Seq[K]

And you can do

var err error
for s := range xiter.First(&err, xiter.Map12(seq, strconv.Atoi)) {
    ...
}
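A minimal sketch of that First, under the signature above:

func First[K any](err *error, s iter.Seq2[K, error]) iter.Seq[K] {
    return func(yield func(K) bool) {
        for k, e := range s {
            if e != nil {
                *err = e
                return
            }
            if !yield(k) {
                return
            }
        }
    }
}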

Comment From: ianlancetaylor

I also don't think it is wholly unnatural to continue processing, even if a single datum is broken - I think it makes sense to skip/log broken data and continue with the rest. Like, imagine a batch-processing pipeline, taking a sequence of filenames, doing some processing on each, and mapping it to an aggregate and error.

If we want to log errors, then my first inclination would be to change the function we are passing in. Instead of passing strconv.Atoi pass

    func(s string) int {
        r, err := strconv.Atoi(s)
        if err != nil {
            log.Errorf("bad number %s: %v", s, err)
        }
        return r
    }

If we think that will happen with some frequency then

func LogErrors[T1, T2 any](f func(T1) (T2, error)) func(T1) T2 {
    return func(v T1) T2 {
        r, err := f(v)
        if err != nil {
            log.Printf("failure on %v: %v", v, err)
        }
        return r
    }
}

Or, of course,

func LogAndSkipErrors[T1, T2 any](it iter.Seq[T1], f func(T1) (T2, error)) iter.Seq[T2] {
    return func(yield func(T2) bool) {
        for v := range it {
            r, err := f(v)
            if err != nil {
                log.Errorf("failure on %v: %v", v, err)
                continue
            }
            if !yield(r) {
                return
            }
        }
    }
}

My general point is that we don't have to focus on "a sequence of values with errors". We can focus on "use a function to convert one value to another, while handling errors in some way".

Comment From: Merovius

FWIW none of these other options really changes my point about this being awkward and requiring boilerplate.

Anyways, I just thought I should bring it up. To me, the case of having a mapping-function that returns an error seems fairly common and having that return a Seq2[T, error] seems ultimately the more flexible option - you can choose to use a helper that ignores errors to turn it into a Seq[T], you can choose to use a helper that stops at the first error, you can stuff it into a range and handle the error…

But if the consensus is that this is not needed, okay.

Comment From: ianlancetaylor

I'm not opposed to mapping functions that work with errors, but I think it would be premature to add them today.

Comment From: AndrewHarrisSPU

@Merovius

Thanks for the provocative example. The combination of Seq2[T, error] and xiter-style composition is, to my mind, just underpowered. Another formulation with interesting properties:

Seq[T] -> context.Context -> Seq2[T, error]

Both possible values of the error term, context.Canceled and context.DeadlineExceeded, really suggest killing a pipeline. This contrasts with pipelines that might sensibly skip broken data, and presents a coloring problem at each stage.

I'm optimistic about ways to dress up Seq2[T, error] with a little more state to isolate the coloring problems, without further language extension. I don't know what the useful generalizations (if any) are, or when it's worth the effort. At the very least I think this formulation is a starting point for isolating errors:

Seq2[V, error] -> func(V, error) (V, error) -> Seq[V]

If I can provide a fallible sequence with an error handling function, I can isolate some expected errors and fail otherwise.
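One hedged reading of that formulation (HandleErrs is a hypothetical name, and the exact semantics are a guess):

// HandleErrs yields values from a fallible sequence. handle may absorb
// an expected error by returning a substitute value and nil; any error
// it passes through stops the iteration.
func HandleErrs[V any](seq iter.Seq2[V, error], handle func(V, error) (V, error)) iter.Seq[V] {
    return func(yield func(V) bool) {
        for v, err := range seq {
            v, err = handle(v, err)
            if err != nil {
                return
            }
            if !yield(v) {
                return
            }
        }
    }
}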

Comment From: myaaaaaaaaa

I think it may be worth looking at this from a completely different angle:

jq is a DSL where iterators are first-class citizens, with syntax designed to be ideal for iterator chaining. Go already has several other DSLs in the standard library (regexp, text/template, etc) to deal with certain constructs that are intrinsically awkward to express in more general-purpose languages, and I believe that iterator chaining is one of those constructs.

A Go implementation can be found at https://github.com/itchyny/gojq . Its APIs are focused on map[string]any | []any | string | float64s due to the JSON focus of jq, but I think there would be great value in looking into ways to adapt it to be type-safe so that it would work more nicely with Go's types.

Comment From: earthboundkid

Another reason not to name this package xiter is that since this was proposed, “xitter” has become a common nickname for Elon Musk’s social media site.

Comment From: gophun

May I humbly suggest that the name "iterutils" is less susceptible to, uh, unfortunate mispronunciation.

Another reason not to name this package xiter is that since this was proposed, “xitter” has become a common nickname for Elon Musk’s social media site.

In English 'x' is pronounced /ks/, not /ʃ/. I think there would have to be intent behind mispronouncing it.

Also, 'liter' and 'litter' are two different words, and nobody would shudder at the thought of drinking a liter of water just because it's phonetically similar to 'litter'.

Comment From: earthboundkid

An initial X is not common in English, but when it does occur, it is not pronounced as a KS sound like medial X. Xerox and xylophone are both Z sounds, which leads to the natural pronunciation "zitter". You also see Xi commonly called "Zee" by people who don't know Pinyin.

Comment From: gophun

Okay, but a lot has to go wrong to accidentally change a voiced lenis alveolar fricative like /z/ into a voiceless fortis postalveolar fricative like /ʃ/. The more probable mispronunciation is /s/, leading to the word "sitter," which doesn't carry a negative connotation.

Comment From: earthboundkid

Why deal with any of these problems? Just get a name that can't be mispronounced. iterx is fine, for example.

Comment From: Nieskalany

I have a better idea: let's just not add this "feature". Go promises simplicity, and with new abstractions it breaks this promise. And with all respect, this should be the main point of discussion, not "naming".

Comment From: earthboundkid

Go promises simplicity, and with new abstractions it breaks this promise.

Do you mean functions as iterators or this package? I think functions as iterators are a big change to the language but mostly a positive one. It's much too complex to make or use an iterator now. As for this package, I think it probably goes too far in the direction of library calls for what could just be a series of statements, but people are going to demand it, so might as well have a single semi-canonical version of it in golang.org/x.

Comment From: myaaaaaaaaa

I think it may be worth looking at this from a completely different angle:

jq is a DSL where iterators are first-class citizens, with syntax designed to be ideal for iterator chaining. ...

To add to this, Wikipedia notes that jq is a pure functional language. When considering that jq's intended purpose is to query JSON, one comes to the realization that all query languages are pure functional languages (although obviously only when querying and not when updating: SQL SELECTs, GraphQL query{}s, etc).

It may just be that query languages are the ideal form of the functional programming paradigm: DSLs that are designed around easy composition of iterators, where side-effects don't exist due to being database read operations.

In that case, studying other query languages and designing a custom one tailored for Go's type system should be more fruitful than trying to shoehorn functional programming constructs directly into Go, the many difficulties of which are already discussed above. It would also allow for far more advanced iterator composition techniques than can be accomplished natively in Go.

For prior art on embedded query languages, see C#'s LINQ, which allows writing SQL-like expressions that, among other things, can be used to run SELECT statements on struct arrays. (Although note that LINQ involves keywords and expressions integrated directly into C#, whereas for Go I think it's preferable to have a DSL in a query package, akin to regexp or text/template)

Comment From: forsaken628

@DeedleFake

lines := bufio.Lines(r)
intlines := xiter.Map(lines, parseLine)
nonNegative := xiter.Filter(intlines, func(v int) bool { return v >= 0 })

Maybe a simple bit of compiler magic is all we need.

Usually we write:

var buf *bytes.Buffer
buf.String()

but we can also write:

var buf *bytes.Buffer
(*bytes.Buffer).String(buf)

Why not something similar, with a syntax like the template pipeline?

var buf *bytes.Buffer
buf \ (*bytes.Buffer).String()

Now we have:

nonNegative := bufio.Lines(r) \ xiter.Map(parseLine) \ xiter.Filter(func(v int) bool { return v >= 0 })

Comment From: AndrewHarrisSPU

Would it be useful to do something like:

iter -> iter/itcore
xiter -> iter/itlib

with iter providing implementation from both? The adapter package (currently xiter) is a lowest common denominator but not universal; narrower projects can do things other ways. It might be nice to help them import just itcore.

Comment From: golightlyb

Throwing in a particular shout for some form of Tee, because the simple naïve implementation is not as performant as a trickier implementation. As such it would be valuable to get it right, once, in the stdlib and optimise it there.

(I've mentioned this before somewhere, but can't remember/find which discussion)

Comment From: jba

A while ago, I wrote a nonsensical signature for a function that produced an iterator with possible errors:

func ParseIntegers(seq iter.Seq[string]) (iter.Seq[int], error)

I now know what I meant to write:

func ParseIntegers(seq iter.Seq[string]) (iter.Seq[int], func() error)

See this comment, where I argue that returning an error function along with an iter.Seq[T] is usually better than using a pointer to an error or iter.Seq2[T, error].
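A sketch of how that shape might look for the integer-parsing example (the body here is a guess at the idea, not a quoted implementation):

func ParseIntegers(seq iter.Seq[string]) (iter.Seq[int], func() error) {
    var err error
    parsed := func(yield func(int) bool) {
        for s := range seq {
            var n int
            n, err = strconv.Atoi(s)
            if err != nil {
                return // fail fast; the caller consults the error func
            }
            if !yield(n) {
                return
            }
        }
    }
    return parsed, func() error { return err }
}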

Comment From: rsc

It is probably time to revive this issue, so that we make a decision and (if accepted) have some form of this package when Go 1.23 is released.

Summarizing and responding to the main general points of discussion so far:

  • There was some discussion of changing the package name. For better or worse, xerrors established the convention, and xiter is following it. The standard pronunciation is x-iter. I haven't seen a compelling reason to break that convention here. No one is going to accidentally confuse xiter with Twitter, and no one is going to accidentally pronounce it “shitter”. (I couldn't even imagine what the coy hints at mispronunciation were getting at; thanks to @gophun for spelling it out in IPA.)

  • There was some agonizing over the duplication of methods for Seq and Seq2. This is duplication but not complexity. Since every Foo has a Foo2, you only have to learn the Foo forms and the “add 2 for Seq2” rule. There are only 10 forms to learn, not 20, in the current proposal. That is the best we can do with the current language we have. There are proposals for variadic generics, but none of them seem quite right yet, and again it doesn't make sense to wait for something that may never happen. If variadic generics ever did happen, we could retroactively variadify all the Seq forms to work on Seq2s too, at which point the Seq2 forms stop being necessary but also aren't adding complexity.

  • There was some discussion of trying to eliminate Seq vs Seq2 by use of tuple types. Again, there are proposals but none of them seem quite right yet, so it doesn't make sense to wait for something that may never happen. If tuple types did happen, I imagine we would try to do them in a way that Seq2[K, V] becomes equivalent to something like Seq[«K, V»], and again the Seq2 forms simply stop being necessary.

  • There was some discussion of trying to eliminate Seq2 forms by introducing a generic Pair struct. I have seen so much confusing C++ code caused by overuse of std::pair that I think it would be a serious mistake to introduce in Go. If we're going to do anything, solving tuples would be a better place to put effort, especially done in a way that harmonizes with the existing multiple return values. It is especially wrong for xiter to define Pair, since then tons of people will import xiter just to get Pair. Any definition of Pair will inevitably spread virulently to many other Go APIs. It's just not a path we want to start down.

  • There was some discussion of fluent chaining syntax, which requires generic methods, which we don't have and likely won't ever have. This proposal aims to work with the language we have, not some hypothetical future language that may never come to pass.

  • There was some discussion about additional language support for chaining in the spirit of Java Streams or C# LINQ. That is not a goal right now, nor do I foresee it being a goal in the future.

  • There was some discussion of raising the API to an even higher level of functional abstraction, making these return functions from Seq to Seq instead of being functions from Seq to Seq, perhaps with a new functional composition operator too. Go is not that kind of language, nor does it aim to be. Many people have concerns (myself included) about unreadable code becoming far too common even raising the abstraction to the level in this proposal. On balance I think we should try this much and see, but I still have that concern. I think bumping up another level of functional abstraction would certainly make code too hard to read. Code is written once but read many many times. Readability matters. There was at least one person who said that if we couldn't do the higher-order versions, we shouldn't do anything at all. I thought @gophun's comment https://github.com/golang/go/issues/61898#issuecomment-1672651854 was exactly right: the iterator functionality aims to complement existing for loops, not replace them.

Haskell manages to be readable with functions of this form mainly because of the automatic currying. Yes, the type of map is technically (a -> b) -> [a] -> [b] but the canonical Go translation of that is func(func(a)b, Seq[a]) Seq[b] not func(func(a)b) func(Seq[a]) Seq[b].

  • On the topic of design philosophy, one person suggested “This is an exp package. I say throw everything in it and adopt things into the stdlib if they get used a lot.” I want to push back explicitly on this: that is absolutely not the philosophy of the x repos. The idea is to put things in that we think are plausible candidates for the standard library already, not to throw tons of things against the wall and see what sticks. One important reason is that everything will stick somewhere, and even the things in x repos we end up supporting for a very long time.

Then there were specific discussions about specific APIs:

  • There was discussion about the correct placement of the function arguments in Filter, Map, and Reduce. For Filter and Map, the arguments are f, seq because that suggests f(seq) (really of course f is applied to seq's elements). I also agree with the readability concerns of keeping the function literal next to the operation it applies to. So I think those should remain as is.

For Reduce, I put the function between the sum and the seq to emphasize that sum is the left argument and seq is the right argument, but I agree that that's confusing, especially since Filter and Map take the function first. I've updated Reduce to put the function first.

  • There was a suggestion to add:
    • func Map12(f func(V1) (K2, V2), Seq[V1]) Seq2[K2, V2]
    • func Map21(f func(K1, V1) V2, Seq2[K1, V1]) Seq[V2]
    • func Keys(Seq2[K, V]) Seq[K]
    • func Values(Seq2[K, V]) Seq[V]

These seem basic and worth adding. I am a little bit unsure about Keys and Values as names, but they are better than Firsts and Seconds. Perhaps Map12 and Map21 are enough by themselves. I also see that #67334 was recently filed to add Map12 and Map21 to the standard library as Join and Split. Those might be interesting names too. (It is too late to add those to iter for Go 1.23.)

  • There was a suggestion to add func Count(Seq[V]) Seq2[int, V] but the discussion sidetracked into whether the integer should be int or int64 and what happens on 32-bit systems after 2 billion sequence values. I'm not convinced the function is useful enough to be worth answering those kinds of hypotheticals.

  • There was a suggestion to swap the parameterized types for Map, writing Map[Out, In] instead of Map[In, Out], to prepare for a possible future with lightweight function literals where perhaps In could be inferred but not Out. I am not convinced by this. The natural order of arguments is In, Out: f := xiter.Map[int, string] should be about a map from int to string. Again I think we should do the best we can with the language and libraries we have today, but even if we already had lightweight function literals, I don't think we would want to make unnatural choices to remove types from the page.

  • There was a suggestion to add Compact and CompactFunc, analogous to slices.Compact and slices.CompactFunc. Those seem less important for Seqs than slices, since they are intended for use after slices.Sort, and there is no xiter.Sort, nor can there be. (At that point, code might as well use slices.Collect+slices.Sort.) Filing a separate proposal for these would make sense, to separate that discussion.

  • There was some discussion of adding LimitFunc/Skip/SkipFunc. I am not sure about these. Filing a separate proposal for these would make sense, to separate that discussion.

  • There was a brief suggestion to find a better name than “sum” in the context of Reduce. I don't see a better name; “accumulator” is definitely not better. Sum is nice and short and punchy and accurate; it seems quite Go-like to me. If sum bothers you, you can cope by being thankful I named the argument sum Sum and not zero Sum. 😄

  • There was some discussion of having a less general Zip that doesn't mention the Zipped struct. Those could be written, of course, but I see Zip mainly as a building block inside other iterator adapters, not something that would be used in long adapter chains by itself. In that context, you are always ranging over the result of Zip, and there the general form is sometimes necessary and never harder to use than the less general form. I think we should keep the current Zip form.

Are there any concerns that I missed or misunderstood?

Comment From: ChrisHines

@rsc

There was discussion about the correct placement of the function arguments in Filter, Map, and Reduce.

I assume this refers to comments such as:

  • https://github.com/golang/go/issues/61898#issuecomment-1672494247
  • https://github.com/golang/go/issues/61898#issuecomment-1710579073

It sounds like you disagree with those. I happen to agree with them and I don't understand your reasoning for disagreeing with them.

You say above, "I also agree with the readability concerns of keeping the function literal next to the operation it applies to. So I think those [referring to Filter & Map] should remain as is." I don't understand how putting the function second in a two parameter function is less readable than putting it first. I do find that the second parameter is easy to lose sight of when it follows a multi-line function literal, though.

Please clarify your thinking on this decision.

The rest seem fine to me.

Comment From: aarzilli

@ChrisHines It might refer to my argument here

Comment From: rsc

Hi @ChrisHines. Thanks for asking. If xs is a sequence of x's, then since you'd write strings.ToUpper(x), writing xiter.Map(strings.ToUpper, xs) is the same order.

This is also the order used in similar functions in at least Haskell, Lisp, Perl, Python, and Scheme, for I suspect the same reason. Worth noting that John McCarthy at one point wrote function calls like {x}f, so obviously the other order is possible, but it's not the convention that won.

If you do end up writing nested calls, then I also agree with https://github.com/golang/go/issues/61898#issuecomment-1674336815 that placing the seq last helps readability. (It matches writing f(g(x)) and not {{x}g}f.)
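For example (a sketch; nonEmpty and lines are hypothetical), the nesting then reads outside-in, just like f(g(x)):

upper := xiter.Map(strings.ToUpper, xiter.Filter(nonEmpty, lines))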

You're right that this placement is inconsistent with the "Func-enhanced" methods like EqualFunc and MergeFunc. There, the overriding concern is symmetry with the non-Func versions: EqualFunc's parameters are Equal's parameters plus a func, not a func plus Equal's parameters. Perhaps the distinction is that for Filter, Map, and Reduce, the function is fundamental to the operation, not an incidental generalization.

Comment From: ChrisHines

@rsc Thanks for the response. Your justification is fine when the function is something like strings.ToUpper or short literals as in the comment you referenced. You haven't addressed the case of multi-line function literals specifically mentioned in the two comments I linked to, though. I think that's when putting the function first hurts readability the most and I think multi-line function literals are common enough that we should take them into account.

Comment From: myaaaaaaaaa

@rsc What are your thoughts on the DSL approach mentioned here?

  • There was some discussion about additional language support for chaining in the spirit of Java Streams or C# LINQ. That is not a goal right now, nor do I foresee it being a goal in the future.
  • There was some discussion of raising the API to an even higher level of functional abstraction, making these return functions from Seq to Seq instead of being functions from Seq to Seq, perhaps with a new functional composition operator too. Go is not that kind of language, nor does it aim to be. ... Haskell manages to be readable with functions of this form mainly because of the automatic currying. Yes, the type of map is technically (a -> b) -> [a] -> [b] but the canonical Go translation of that is func(func(a)b, Seq[a]) Seq[b] not func(func(a)b) func(Seq[a]) Seq[b].

A DSL would aim to solve the same problems that these points are trying to address, but without the readability or language complexity drawbacks.

Go already has several other DSLs in the standard library (regexp, text/template, etc) to deal with certain constructs that are intrinsically awkward to express in more general-purpose languages, and I believe that iterator adapters is one of those constructs.

Comment From: jimmyfrasche

@rsc

There was some discussion of having a less general Zip that doesn't mention the Zipped struct. Those could be written, of course, but I see Zip mainly as a building block inside other iterator adapters, not something that would be used in long adapter chains by itself. In that context, you are always ranging over the result of Zip, and there the general form is sometimes necessary and never harder to use than the less general form. I think we should keep the current Zip form.

I do not think the general form should be removed: I think it should not be called Zip.

Zip is a very common thing in many languages that is always defined one way. It will be confusing to have something called Zip that is not Zip yet is very closely related. I don't really know what it should be called, just that, while it should contain Zip in its name, it should not be called just Zip.

Separately, and to a much lesser extent, I do think the less general Zip should be included as it is extremely useful all on its own despite its less general formulation. It's trivial to write in terms of the generalized Zip but it's going to be the common case and the expectation.
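To make that concrete, the classical zip derived from the proposal's general Zip (ZipShort is a hypothetical name; Zipped's Ok and V fields are as proposed):

func ZipShort[V1, V2 any](x iter.Seq[V1], y iter.Seq[V2]) iter.Seq2[V1, V2] {
    return func(yield func(V1, V2) bool) {
        for z := range xiter.Zip(x, y) {
            if !z.Ok1 || !z.Ok2 {
                return // stop at the end of the shorter sequence
            }
            if !yield(z.V1, z.V2) {
                return
            }
        }
    }
}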

I would be fine if only the general Zip were included—as long as the name Zip is still available in case it needs to be added later.

The worst case would be naming general Zip just Zip then finding out the less general version is what's needed the majority of the time and having to add that under a different name so we end up having not-Zip named Zip and Zip named not-Zip.

Comment From: gophun

@myaaaaaaaaa I don't understand what you have in mind. Templates and regular expressions are strings that are parsed and evaluated at runtime. Do you really want that for iterator functions? Something like:

xiter.Do[int]("filter func(x): x%2 != 0 | map func(x): x*x", numbers)

? And if Go functions are to be called, they have to be passed in as a map, like with templates? Like this

xiter.Do[int]("filter odd | map square", numbers, map[string]any{
    "odd":    func(x int) bool { return x%2 != 0 },
    "square": func(x int) int  { return x*x },
})

? I don't think that's a good idea. It seems inefficient and error-prone. Maybe you should implement that as an external library first.

Comment From: jimmyfrasche

What is the use case for Equal{,Func}{,2}? I'm having trouble imagining a situation where you want to fully consume two iterators for only that single bit of information.

Comment From: rsc

@ChrisHines I think the answer is to not do that. Instead do something like

goodName := func( ... ) { ... }
x := xiter.Map(goodName, seq)

I realize that's not what many people want to hear, but I don't think we want to optimize for avoiding naming things. The other languages all have function literals too and yet they all agreed the func should come first nonetheless.

@jimmyfrasche I am not convinced about Zip. You say "The worst case would be naming general Zip just Zip then finding out the less general version is what's needed the majority of the time". What are examples of code people might like to write where the general one does not work well?

@jimmyfrasche I am happy to drop Equal and EqualFunc.

@myaaaaaaaaa Like @gophun, I do not understand exactly what you mean, although I will repeat that we are not planning to add something like C# LINQ to Go.

Comment From: myaaaaaaaaa

@gophun

Yes, although with a separate compilation step, so something like:

var f = xiter.MustCompile[int, int]("filter odd | map square", map[string]any{
    "odd":    func(x int) bool { return x%2 != 0 },
    "square": func(x int) int  { return x*x },
})
...
    f.Do(numbers)

I don't think that's a good idea. It seems inefficient and error-prone.

The separate compilation step should be able to resolve much of the overhead. If the performance is still unsatisfactory, a code generation tool could be made in the future to achieve performance even greater than a library:

//go:generate xiter_compile
var f = xiter.MustCompile[int, int]...


// Code generated by xiter_compile. DO NOT EDIT.
func init() {
    ...
    f = compiledIter
}

Maybe you should implement that as an external library first.

I'm personally content with just using https://github.com/itchyny/gojq , which accomplishes something similar but at the cost of type safety. But since the topic was relevant, I thought I should at least mention it even if it were ultimately to be rejected.

Comment From: rsc

@myaaaaaaaaa The goal here is to write code as efficient as the equivalent for loops, just more readable. (If it's less readable, we should keep using for loops.) Moving to a whole separate language with a separate compiler does not help that cause.

Comment From: magical

What is the use case for Equal{,Func}{,2}? I'm having trouble imagining a situation where you want to fully consume two iterators for only that single bit of information.

I can't speak to their usefulness, but I'll point out that a Seq is not an iterator and using it does not consume it. So the following snippet works and prints true 1 2 3.

    x := slices.Values([]int{1, 2, 3})
    fmt.Println(xiter.Equal(x, x))
    for y := range x {
        fmt.Println(y)
    }

https://go.dev/play/p/riE3oY1pRh-?v=gotip

Comment From: jimmyfrasche

@magical It doesn't necessarily consume it, but that's not an invariant. I suppose you wouldn't be using Equal unless you knew that held, though.

Comment From: magical

@jimmyfrasche Yes, I suppose if you have a Seq backed by an open file, or a chans.Seq, then that could pose a problem. We'll have to see how it plays out. Most of the Seq implementations being added to the standard library so far are restartable.

Comment From: jimmyfrasche

@magical maps.Keys would also be an issue.

Comment From: magical

@jimmyfrasche Potentially. I tried to point that out on the maps.Keys proposal but it didn't seem to get much attention.

Comment From: jimmyfrasche

@rsc consider:

  1. ys is a finite Seq[Y]
  2. xs is an "infinite" Seq[X] (wraps around or is at least always longer than ys)
  3. you need to create a zipped Seq2[X, Y] that is the same length as ys
  4. you need to send this to func Sink(Seq2[X, Y])

With the less general zip this is just

Sink(xiter.Zip(xs, ys))

With the more general zip you need to write a custom iterator or write your own less general zip to use as an adapter.

(If there were a LimitFunc you could use that to stop when the finite sequence is exhausted and then use Map12 to turn the Zipped into the desired Seq2[X, Y], though that is just writing your own zip with other building blocks.)

I presented this a bit abstractly but 99% of my (many!) uses of zip over the years have been either two sequences of the same length or pairing finite and infinite sequences. If the sequences are the same length, Map12 would suffice but it's still extra steps. And it's still not how zip has been defined in other languages for, what, 50 or 60 years now?

Comment From: rsc

I presented this a bit abstractly but ...

That's unfortunately not convincing because it is so abstract. Concrete examples are more compelling. I'm happy to drop Zip from the proposal though.

Comment From: jba

I'm a huge fan of the function currently called Zip in this proposal, because it eliminates many explicit uses of Pull for parallel iteration. But looking at Haskell, Java, C++, and Python, all of them define zip to stop when the shorter sequence ends.

We could rename the function ZipLongest, as in Python's itertools.zip_longest. That would free us to provide the traditional Zip, where Zip(Seq[X], Seq[Y]) returns a Seq2[X, Y]. The problem then becomes, what should Zip2 return? There is no Seq4, there are no tuples, and we don't want to introduce Pair for reasons Russ gave in his summary.

I would still vote to keep the function as ZipLongest or some other name, but I also see the logic in dropping it for now.

Comment From: jimmyfrasche

I would be very happy with

func Zip[V0, V1 any](iter.Seq[V0], iter.Seq[V1]) iter.Seq2[V0, V1]

func ZipLongest[V0, V1 any](iter.Seq[V0], iter.Seq[V1]) iter.Seq[Zipped[V0, V1]]
func ZipLongest2[V0, V1, V2, V3 any](iter.Seq2[V0, V1], iter.Seq2[V2, V3]) iter.Seq[Zipped2[V0, V1, V2, V3]]

I would be happy with just ZipLongest{,2}, as it leaves the possibility of adding the classical Zip with the standard name later.

I'm not too worried about a Zip2 since it can't be represented currently.

Two possible additions that could be represented are the mixes of seq and seq2, ZipLongest{12,21}, but I don't know if either of those would come up enough to be justified, especially since you could just Zip the arity 1 seq with a seq that infinitely yields struct{}, then use ZipLongest2.

Comment From: leaxoy

  • Since every Foo has a Foo2, you only have to learn the Foo forms and the “add 2 for Seq2” rule. There are only 10 forms to learn, not 20, in the current proposal.

@rsc I'm the author of #67334, but I still think keeping both Seq and Seq2 is not a suitable choice, for several reasons:

  1. In other languages with iterators, there are far more than 10 iterator-related methods. And I think that as Go evolves, there will be far more than ten iterator methods here too. In extreme cases, if the methods of other languages are combined, there will be close to 100 methods.

So you can’t extrapolate the future from your current perspective.

There was some discussion of adding LimitFunc/Skip/SkipFunc. I am not sure about these. Filing a separate proposal for these would make sense, to separate that discussion.

If this happens, there will be many more methods, far more than ten.

  2. Now that we have Seq2, why does Zip still return Seq? This seems very inconsistent.

Therefore, in my opinion, providing Seq2->Seq conversion and deleting the functions with the 2 suffix would reduce the complexity of this package, and at the same time it would not lead to abandoning a large number of functions if we introduce tuples or variadic generics in the future.

Comment From: perj

There was some discussion of trying to eliminate Seq vs Seq2 by use of tuple types. Again, there are proposals but none of them seem quite right yet, so it doesn't make sense to wait for something that may never happen. If tuple types did happen, I imagine we would try to do them in a way that Seq2[K, V] becomes equivalent to something like Seq[«K, V»], and again the Seq2 forms simply stop being necessary.

@rsc, a valid argument. I'm somewhat skeptical towards tuples, for much the same reasons you mentioned about std::pair. But I do think variadic type parameters should be introduced, if possible. It would make sense to have a plan for how the functions in this package generalize, if variadic type parameters are introduced.

To me, it looks like several functions are straightforward, but Map, Reduce and Zip will have trouble being generalized in this way, because they already take two type parameters. One plan could be to name them Map1, Reduce1, Zip1 instead, if that's not considered too intrusive. If renaming them is not an option, at least add a comment that their signatures might change if variadic type parameters are introduced. This is an experimental package, after all.

Comment From: earthboundkid

To me, it looks like several functions are straightforward, but Map, Reduce and Zip will have trouble being generalized in this way, because they already take two type parameters. One plan could be to name them Map1, Reduce1, Zip1 instead, if that's not considered too intrusive. If renaming them is not an option, at least add a comment that their signatures might change if variadic type parameters are introduced. This is an experimental package, after all.

Changing the names will break anyone who imports library A that has an old version of xiter and library B that has the new one. You'd need to do v2 to keep from breaking stuff.

Comment From: golightlyb

@rsc

Are there any concerns that I missed or misunderstood?

Could we get a decision on adding a Tee function? See Python itertools.tee, Rust iter_tee

It's valuable to get right, once, in the stdlib because the simple naïve implementation is not as performant or memory efficient as a trickier implementation.

Given that there's no way to clone an iterator, I think it's an especially important addition.

Comment From: rsc

@leaxoy, I think your proposal has merit on its own, but I don't think it displaces the 2 forms of the existing functions. I also highly doubt we will ever have 100*2 iterator methods. If we do, something has gone terribly wrong.

@perj, It depends exactly how tuples or variadic parameters end up working. Of course we may not find a way worth doing at all, so we shouldn't design around the hypothetical.

@golightlyb, I don't think tee is a particularly good idea. It fundamentally has to introduce unbounded buffering. It's also tied to the idea of iterators being consumable things, whereas iter.Seq is in general something that can be invoked multiple times. There is no Clone because you just range over it twice.

Comment From: szabba

@earthboundkid :

Changing the names will break anyone who imports library A that has an old version of xiter and library B that has the new one. You'd need to do v2 to keep from breaking stuff.

I'd find a library importing an exp/* package concerning. My mental model is that that's OK to do in an application, not a library others might be using.

Comment From: fzipp

There was some discussion of changing the package name. For better or worse, xerrors established the convention, and xiter is following it. The standard pronunciation is x-iter. I haven't seen a compelling reason to break that convention here.

What is the future outlook if this experiment proves to be successful? Would it then be included in the standard library as 'xiter', or integrated into 'iter' in the same way 'x/xerrors' was integrated into 'errors'?

Comment From: jimmyfrasche

@rsc

There was a suggestion to add Compact and CompactFunc, analogous to slices.Compact and slices.CompactFunc. Those seem less important for Seqs than slices, since they are intended for use after slices.Sort, and there is no xiter.Sort, nor can there be. (At that point, code might as well use slices.Collect+slices.Sort.) Filing a separate proposal for these would make sense, to separate that discussion.

There was some discussion of adding LimitFunc/Skip/SkipFunc. I am not sure about these. Filing a separate proposal for these would make sense, to separate that discussion.

Filed: #67441 and #67442

Comment From: golightlyb

@rsc

@golightlyb, I don't think tee is a particularly good idea. It fundamentally has to introduce unbounded buffering. It's also tied to the idea of iterators being consumble things, whereas iter.Seq is in general something that can be invoked multiple times. There is no Clone because you just range over it twice.

Good point. I think io.TeeReader covers the use case anyway.

Comment From: leaxoy

I think your proposal has merit on its own, but I don't think it displaces the 2 forms of the existing functions. I also highly doubt we will ever have 100*2 iterator methods. If we do, something has gone terribly wrong.

@rsc, in my opinion, the iterator library should provide as many methods as possible to meet the needs of different users. If the iterator library is introduced but a large number of users still implement their own, then the benefit of introducing an iterator library in the standard library will not be so great, because users always need to implement their own libraries.

At the same time, introducing sufficiently rich methods will not bring too much additional complexity. The cost lies in the initial introduction; the implementations of these methods are already very fixed and standard, and there will basically be no frequent changes afterward.

Rust iterator: https://doc.rust-lang.org/std/iter/trait.Iterator.html
Scala seq: https://www.scala-lang.org/api/current/scala/collection/immutable/SetOps.html
Kotlin seq: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequences/-sequence/
Swift seq: https://developer.apple.com/documentation/swift/sequence
Elixir stream: https://hexdocs.pm/elixir/Stream.html
C# linq: https://learn.microsoft.com/en-us/dotnet/csharp/linq/standard-query-operators/
Cpp ranges: https://en.cppreference.com/w/cpp/header/ranges

If we want users to uniformly use the version provided by the standard library, then we will most likely end up with a method set of similar scale, no doubt.

Comment From: Merovius

@leaxoy

because users always need to implement their own libraries.

I think "need" is the word we disagree on here. If we think there is a real need, then sure, putting stuff into the stdlib makes sense. But if we don't, then we shouldn't. If people choosing to still add extra abstraction, that's their choice.

Personally, I don't think people will use a lot of extra abstraction. I tried using these higher-level iteration functions and, in my opinion, they don't really work super well in Go, because of the heavy function literal syntax. Also, I think people will heavily weigh whether or not to do their own packages - if there is a stdlib package with some, they are more likely to restrict themselves to those functions than to add their own packages, would be my prediction. I don't often see "string utility packages" that add more kitchen-sink functions beyond what strings provides, for example.

At the same time, introducing sufficiently rich methods will not bring too much additional complexity. The cost lies in the first introduction, and the implementation of these methods is already very fixed and standard, and there will basically be no frequent iterations.

I'll note that this is taking the diametrically opposite position of what you argued above, which is that there is significant additional complexity in adding new functions (not methods, for the record) because of the Seq/Seq2 split. You can't have it both ways, saying that we shouldn't have that split because all the extra functions add too much complexity, and that we should put lots of functions in because it doesn't add extra complexity.

Comment From: leaxoy

I'll note that this is taking the diametrically opposite position of what you argued above, which is that there is significant additional complexity in adding new functions (not methods, for the record) because of the Seq/Seq2 split. You can't have it both ways, saying that we shouldn't have that split because all the extra functions add too much complexity, and that we should put lots of functions in because it doesn't add extra complexity.

@Merovius No, the complexity of Seq/Seq2 is unnecessary; it exists only because of current language limits. I gave the example of introducing many methods just to illustrate that Seq/Seq2 is quite unreasonable from the perspective of long-term evolution.

What can be seen is that the introduction of more methods is inevitable in any language, and I believe Go cannot escape that, so it should not implement a large number of similar methods for Seq and Seq2 separately.

And the complexity described here is only for seq or seq2, not seq and seq2.

Comment From: leaxoy

because of the heavy function literal syntax.

@Merovius This is just the current situation, but good features eventually get recognized; otherwise simplified lambda expressions would not have appeared in so many other languages. In Go, I think this feature will definitely be available sooner or later, just like generics were. So the lack of simplified lambdas is not an important reason that prevents us from introducing more methods in the iterator library.

And we should not hesitate to introduce new methods just because simplified lambdas are missing, just as we did not have variadic generics but still introduced Seq2 to abstract iterators for associative containers.

If the lack of some features leads to the rejection of other features, then Seq2 should not exist either. What do you think?

I don't often see "string utility packages" that add more kitchen-sink functions beyond what strings provides, for example.

At least for me, iteration of strings is far less frequent than iteration of containers. Can it be considered a different scenario, one where the analogy is not appropriate?

In addition, strings provides dozens of functions. Over the past few years of iteration, the vast majority of user needs have been covered by the standard library.

Comment From: Merovius

At least for me, iteration of strings is far less frequent than iteration of containers.

I didn't say "iteration of strings". I was making an analogy to another case where you could imagine a lot of extra helpers for a stdlib package, yet people don't write many kitchen-sink packages to provide those helpers beyond what the stdlib has.

Comment From: leaxoy

I didn't say "iteration of strings".

I misunderstood.

Comment From: leaxoy

So, my personal opinion is clear:

1. Provide rich enough iterator methods.
2. At this stage, convert seq2 to seq and use the methods on seq, instead of defining duplicate implementations on seq2, and then split seq back to seq2.

In the long run, if variadic generics or tuples are introduced, they can be used to gradually abandon seq2.

Comment From: bGVia3VjaGVu

The duplication of functions leads to a bloated package and a worse developer experience, i.e. multiple functions showing up in the IDE. It also means that only a few functions are worth adding, since the duplication adds up and the package would become unusable if more functions were added. Even though it is an experimental package, there is still the expectation that the API makes sense and that the package will likely be added to std.

Comment From: jimmyfrasche

I was starting to lose track so I went through and tried to find all the official positions and related proposals so far. Let me know if I got anything wrong.

Uncontroversial:

- Concat{,2}
- Filter{,2}
- Limit{,2} (possible rename in separate proposals section below)
- Map{,2}
- Merge{,Func}{,2}
- Reduce{,2}

Maybe drop:

- Equal{,Func}{,2}
- Zip{,2} (or better yet rename them to ZipLongest{,2})

Add:

- Map12
- Map21

(#67334 has the alternate names Join for Map21 and Split for Map12)

Maybe add:

- Keys
- Values

Separate proposals: (please keep discussion of the below in their respective threads)

- Compact{,Func}{,2} #67441
- {Take,Drop}{,While}{,2} (note: renames Limit{,2} to Drop{,2}) #67442
- {Fold,Any,All,None}{,2} #67453
- Compare{,Func}{,2} #67455
- {Min,Max,MinMax}{,Func}{,2} #67456
- IsSorted{,Func} #67457
- #67458
  - Find{,Map}
  - Contains{,Func}

edit: left out the MergeFunc variants

Comment From: jimmyfrasche

If Keys/Values aren't added, they should at least be in an example for Map21 so it's easy to copy-paste.
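To make that concrete, here is one possible shape, assuming a Map21 that takes the function first like the other adapters in this proposal (these signatures are my guess, not settled API):

func Map21[K, V, Out any](f func(K, V) Out, seq iter.Seq2[K, V]) iter.Seq[Out] {
    return func(yield func(Out) bool) {
        for k, v := range seq {
            if !yield(f(k, v)) {
                return
            }
        }
    }
}

func Keys[K, V any](seq iter.Seq2[K, V]) iter.Seq[K] {
    return Map21(func(k K, _ V) K { return k }, seq)
}

func Values[K, V any](seq iter.Seq2[K, V]) iter.Seq[V] {
    return Map21(func(_ K, v V) V { return v }, seq)
}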

Comment From: AndrewHarrisSPU

I thought there might be two distinct reasons to think about dropping Equal, but I think it rates highly for inclusion - it's a simple concept but not a trivial implementation (it requires a parallel Pull, and belabored structuring/destructuring around type parameters).

First, it's been discussed alongside Zip. Equal is currently implemented with Zip, but using an unexported zip to implement Equal doesn't seem harmful.

Second, a reason to think about dropping it was finding a use case. I think Equal can show up in graph (often tree) isomorphism problems, where walking Equally can establish isomorphism. In my experience there are practical applications, and it seems like a great fit for iterators.

Comment From: jimmyfrasche

@AndrewHarrisSPU The discussions for both happened near each other but are unrelated.

The zip discussion is about naming/semantics/convention. I would think that even if it gets dropped from this proposal there would be a follow up proposal to work out the zip-specific details without holding anything else up. They're a valuable family with many uses.

Equal was a question of use cases and whether there should in general be anything other than Reduce{,2} that goes from sequence to value (see the last 5 proposals in the roundup I posted).

Comment From: earthboundkid

Equal is Reduce plus a TakeWhile because it doesn't compare after the first unequal sequence item.

Comment From: jimmyfrasche

You would still need ZipLongest to get the two input sequences into a single paired sequence that records whether one ended before the other.

Comment From: josharian

Much of the example code in the OP does not compile.

I would like to suggest that:

(a) this code be updated to work with the final rangefunc design
(b) this code be put somewhere copy/paste-able, if not go-gettable, so people can gain experience actually using it

I (and others) want to start using these helpers, and I'm starting to copy/paste them a bunch of places.

I am very tempted to make github.com/josharian/xiter, simply mirroring this proposal. A variety of other xiter modules are starting to sprout up around the internet. But it would be better to have a single home for this to prevent sprawl and avoid having minor API variations across modules.

The time is probably ripe for at least the uncontroversial APIs here to make it into a CL, if not x/exp/xiter.

Comment From: gophun

I would like to suggest that:

(b) this code be put somewhere copy/paste-able, if not go-gettable, so people can gain experience actually using it

That's the proposal, to make it available for experimentation.

Comment From: josharian

That's the proposal, to make it available for experimentation.

Putting the code into a CL, where it is all in a single place and compiles and has tests, is different than submitting that CL.

Comment From: ianlancetaylor

@josharian

(a) this code be updated to work with the final rangefunc design

Done. But I just checked that it compiles, I didn't write any tests.

(b) this code be put somewhere copy/paste-able, if not go-gettable, so people can gain experience actually using it

Well, you can at least copy-and-paste from this issue....

Comment From: David-Mao

In my own practice, I find that I need to write the following 2 functions frequently. I wonder if they deserve to be put into this category?

func Empty[V any]() iter.Seq[V] { 
    return func(func(V) bool) {}
}

func Single[V any](e V) iter.Seq[V] {
    return func(yield func(V) bool) { yield(e) }
}

(and the corresponding Empty2 and Single2.)

What they do is obvious: they return iter.Seq that yields no element and one given element respectively. They are common edge cases, so having them handy is very convenient.

Admittedly they can also be implemented as slices.Values([]V{}) and slices.Values([]V{e}) respectively, but that seems a little bit indirect.

Comment From: thediveo

@David-Mao am I mistaken in that your Single acts more like Endless because it never returns false?

Comment From: David-Mao

@David-Mao am I mistaken in that your Single acts more like Endless because it never returns false?

@thediveo The Seq callback doesn't return false. Its signature is func(yield func(V) bool), not func(yield func(V) bool) bool. The yield does return false, but since we have only one element, it doesn't matter, as it will stop after calling yield once anyway.

To make it clearer, it should be like this exactly:

func Single[V any](e V) iter.Seq[V] {
    return func(yield func(V) bool) { 
        if !yield(e) {
            return
        }
    }
}

which is practically identical to return func(yield func(V) bool) { yield(e) }

Comment From: Merovius

One advantage of the explicit functions over the slices.Values versions is that slices.Values(s) has to put s on the heap in general (unless the iter.Seq machinery is optimized out by inlining). So, both of those have extra allocations compared to the explicit functions.
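For concreteness, the two forms being compared side by side (the helper names are mine):

// Explicit closure: captures only e; no slice allocation.
func SingleExplicit[V any](e V) iter.Seq[V] {
    return func(yield func(V) bool) { yield(e) }
}

// Via slices.Values: must first build a one-element slice,
// which in general escapes to the heap.
func SingleViaSlice[V any](e V) iter.Seq[V] {
    return slices.Values([]V{e})
}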

Comment From: bobg

For the functions that take a callback:

  • Agree that the callback parameter should come last;
  • Does the callback itself require a type parameter? E.g.:
func Filter[V any, F ~func(V) bool](seq iter.Seq[V], f F) iter.Seq[V]

Comment From: Merovius

@bobg Do you have a concrete use case that would benefit from it being a type parameter? Note that you can assign type F func(V) bool to func(V) bool - so the extra type parameter would really only matter if we think it is fairly common to 1. define such a function type yourself and 2. then want xiter.Filter to be usable as a func(iter.Seq[V], YourF) iter.Seq[V]. I find it hard to imagine a use case for either of those, much less both.

Comment From: bobg

Do you have a concrete use case that would benefit from it being a type parameter?

Not offhand, no, though I'll observe that named types defined in terms of function signatures are rare but not unheard-of; e.g., WalkDirFunc.

(Tangent.)

Still trying to develop some intuition for when a type parameter is needed vs. when it's OK to require a specific type in terms of other type parameters. Example, from the slices package:

func Equal[S ~[]E, E comparable](s1, s2 S) bool

Why is this not:

func Equal[E comparable](s1, s2 []E) bool

?

For that matter: is there some convention that dictates the order of the type parameters? For something like slices.Equal I would naively expect the type parameters to be reversed, so that it can be partially instantiated with a simple slices.Equal[int].

Comment From: bobg

I'll observe that named types defined in terms of function signatures are rare but not unheard-of; e.g., WalkDirFunc.

I should have added: there is also the example of WalkDir, a function defined in terms of that function type.

Comment From: Merovius

@bobg Yes, slices.Equal is exactly a case where there is an argument in favor of using type-parameters, because you might use that as a higher-level function with a custom slice type. E.g. type Ints []int; var s []Ints; slices.CompactFunc(s, slices.Equal) (compare this snippet and that snippet). Many of the other cases in the slices package are purely for consistency, though.

WalkDirFunc exists mainly for documentation purposes and it doesn't get in the way, because while it is indeed used in WalkDir, you are unlikely to try to assign WalkDir to a func(fs.FS, string, func(string, fs.DirEntry, error) error) error.

That's why I mentioned two conditions that have to happen at the same time: 1. using a type defined based on that signature and 2. trying to use that function type as an argument in a higher-order function. 2 is where type identity matters, not just assignability.
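To illustrate the assignability-vs-identity distinction with a toy example (all names here are hypothetical):

type Pred func(int) bool

func takesPlain(f func(int) bool) { _ = f }

func main() {
    var p Pred = func(n int) bool { return n > 0 }
    takesPlain(p) // fine: Pred is assignable to the unnamed func(int) bool

    h := func(f Pred) bool { return f(1) }
    _ = h
    // var g func(func(int) bool) bool = h // does not compile:
    // func(Pred) bool and func(func(int) bool) bool are distinct types
}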

Comment From: DeedleFake

Since 1.23 came out, I've been going through a number of my projects and inserting iterators where it makes sense to. I've been able to eliminate quite a few intermediary slice allocations, which is quite nice, and I've also come to realize a few things about the iterator adapters. I have my own iterator adapter package that I built back when the first prototype came out to play around with, and now that I'm using them more in real code I've found that they're significantly less useful than I expected them to be simply because the syntax is too clunky.

For certain small things, they can be useful, such as something like xiter.Filter(times, time.Time.IsDST), but anything more complicated gets very awkward. For example, chaining even a single map and filter together with custom anonymous functions for each results in about half the code being type signatures for the functions, all types simply repeated from elsewhere, as well as being written in a kind of awkward, inside-out order:

newtitles := xiter.Map(
  func(data Data) string { return data.Title },
  xiter.Filter(
    func(data Data) bool { return data.Time.After(threshold) },
    dataSource,
  ),
)

Instead, what I've found is that the far better approach in most situations is to take advantage of the fact that iterators are just functions and write an intermediary one as a closure. That reduces the type signature overhead significantly:

newtitles := func(yield func(string) bool) {
  for data := range dataSource {
    if data.Time.Before(threshold) {
      continue
    }
    if !yield(data.Title) {
      return
    }
  }
}

It's a few extra lines, but it's a lot more flexible and a lot more readable.

I think improving the ergonomics of the adapter functions would require at the very least #21498, and possibly some variant of #49085 or some other way to write the adapter pipeline top-to-bottom instead of inside out.

Comment From: bobg

@DeedleFake It makes a big difference when you use intermediate variables instead of nesting expressions the way you have. It also improves things when any callback is the last argument to the function that needs it.

I also have a version of the xiter proposal implemented, here, but with callbacks last (plus other useful additions). Using it, your first example could be written as:

var (
  newItems  = seqs.Filter(dataSource, func(data Data) bool { return data.Time.After(threshold) })
newTitles = seqs.Map(newItems, func(data Data) string { return data.Title })
)

which seems plenty readable to me.

Comment From: DeedleFake

My implementation actually does have callbacks last, as that's the way that I tend to think of the flow going: Thing you're operating on and then config for the action. It's like a receiver, in a way. But the upside to putting them first is that when nesting them, the function is next to the iterator operation. For example, if you try to nest the calls in your example, it'll wind up being

newtitles := xiter.Map(
  xiter.Filter(
    dataSource,
    func(data Data) bool { return data.Time.After(threshold) }, // Filter
  ),
  func(data Data) string { return data.Title }, // Map
)

You wind up having to read them in a spiral. By putting them the other way around, you can just read it backwards. It's weird, but it keeps things in the right order.

However, all of that being said, even if you store each step in an intermediate variable, there's still a lot of clutter from the types in the function signatures, as well as needing to come up with names for the intermediate variables. For that particular example it's not a big deal, but with a more complicated chain you could have four or five steps. Having to name all of those is a lot of cognitive overhead and local namespace pollution that's completely avoided by using the closure approach.

Comment From: bobg

You wind up having to read them in a spiral.

Haha, that's really well-put.

I see your point about putting the callback first, and about the overhead of naming intermediate steps. But that overhead is mainly on the program author, not the reader, and Go favors the reader at the expense of a little more work for the author. As a reader I'd certainly rather encounter something like this:

var (
  ints             = seqs.Ints(1, 1)
  primes           = seqs.Filter(ints, isPrime)
  enumeratedPrimes = seqs.Enumerate(primes)
  superPrimePairs  = seqs.Filter2(enumeratedPrimes, func(pos, _ int) bool { return isPrime(pos+1) })
  superPrimes      = seqs.Right(superPrimePairs)
)

than whatever the nested version of that would be (which I concede might not be quite so horrifying if we had lightweight anonymous functions, as you pointed out above).

Comment From: adonovan

The iter package could use an efficient way to answer questions about the length of a sequence. A Len operator is an obvious solution, but it must consume the entire (possibly infinite!) sequence even though the most common questions are typically "is it empty?" and "does it have more than one element?".

I propose:

package iter

// Empty reports whether the sequence is empty.
func Empty[T any](seq iter.Seq[T]) bool {
    return !LongerThan(seq, 0)
}

// LongerThan reports whether the sequence seq has more than n elements.
func LongerThan[T any](seq iter.Seq[T], n int) bool {
    i := 0
    for range seq {
        i++
        if i > n {
            return true
        }
    }
    return false
}

Comment From: DeedleFake

What I've been doing when someone wants a size is to just pass it alongside:

func Example[T any](values iter.Seq[T], numvalues int)

For emptiness checking, it might make sense to do func Empty[T any](seq iter.Seq[T]) (iter.Seq[T], bool) where the boolean indicates if it was empty or not and the returned iterator yields the one value that it had to get and then continues the underlying iterator.

Comment From: Merovius

@adonovan An O(n) Len operator seems pretty problematic to me. I believe it would frequently lead to accidentally quadratic behavior. It also assumes that the iter.Seq is restartable, which is not always the case. And I honestly can't imagine a lot of places where I really need to ask questions about the length of a sequence that wouldn't be better served by ranging over it and counting as I go. So, at the very least, I think a few examples for LongerThan would be helpful.

Comment From: josharian

Specifically for empty versus non-empty checks, the function could cache the single value it saw, if any, and return a new iterator that then yields the original sequence.

Comment From: jimmyfrasche

while playing around with iterators I found the following helpful for constructing higher-order iterators which often needed to prime the pump:

func Head[T any](seq iter.Seq[T]) (head T, ok bool, tail iter.Seq[T])

To get that to work you need to Pull the seq but then undo that to convert it back to a push iter. You'd need the same for replaying the sequence like @DeedleFake and @josharian mentioned. So maybe in addition to Pull there needs to be a:

func Push[T any](next func() (T, bool), stop func()) iter.Seq[T]
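For illustration, a minimal sketch of what such a Push could look like, assuming the next/stop pair came from iter.Pull:

func Push[T any](next func() (T, bool), stop func()) iter.Seq[T] {
    return func(yield func(T) bool) {
        defer stop() // release the pull iterator when the consumer is done
        for {
            v, ok := next()
            if !ok || !yield(v) {
                return
            }
        }
    }
}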

Comment From: earthboundkid

I hadn't written Head, but I love the API design and will steal it. I have written and frequently use First(iter.Seq[T]) T, which just returns a zero value for empty sequences.
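A minimal sketch of that First, assuming the zero-value-on-empty behavior described:

func First[T any](seq iter.Seq[T]) T {
    for v := range seq {
        return v // stops the sequence after one element
    }
    var zero T
    return zero
}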

Comment From: DeedleFake

@jimmyfrasche

The issue with something like Head() or Empty() is that the coroutine leaks if the user doesn't actually use the returned iterator, unless you've got some other way to implement it. It looks something like the following in my head:

func Empty[T any](seq iter.Seq[T]) (iter.Seq[T], bool) {
  next, stop := iter.Pull(seq)
  first, ok := next()
  if !ok {
    return func(func(T) bool) {}, false
  }

  return func(yield func(T) bool) {
    defer stop() // This won't happen unless this returned iterator is used.

    if !yield(first) {
      return
    }
    for {
      v, ok := next()
      if !ok || !yield(v) {
        return
      }
    }
  }, true
}

I'm also not sure that a Push() converter is necessary. It's very easy to do manually, as I did above. Pull() is necessary because doing it without coroutines has a lot more overhead, though coroutines themselves have quite a bit, too.

Comment From: jimmyfrasche

It would leak if unused and the basic implementation is simple. The leaking issue might be ameliorated with a finalizer/cleanup, though that makes the implementation more complex. It might also be possible for the runtime and/or compiler to do some tricks to avoid the coroutine when possible, though that may require properties of the iterator being pull/push'd that may not be easy to know.

Comment From: earthboundkid

I wonder if there isn't some way to thread through the yield functions so you don't need to use iter.Pull, but I will admit I haven't figured it out yet. The simplest solution is just to return the stop function:

func Head[T any](seq iter.Seq[T]) (iter.Seq[T], func(), T, bool) {
  next, stop := iter.Pull(seq)
  first, ok := next()
  if !ok {
    return func(func(T) bool) {}, stop, first, false
  }

  return func(yield func(T) bool) {
    for {
      v, ok := next()
      if !ok || !yield(v) {
        return
      }
    }
  }, stop, first, true
}

Comment From: isgj

IMHO the adaptors shouldn't return all those values; they should:

- wrap sequences, meaning their code will run when the seq is being consumed (Limit, Map, ...)
- or consume the seq and return a result (Equal, Reduce, ...)

Otherwise we should return the sequences also for Equal and Reduce, but maybe (and this is my opinion) it's easier to just iterate over again, or use an if if we want to check the emptiness while looping (much more performant than iter.Pull), or maybe the sequences are multi-use as mentioned in the iter documentation (they can continue where they left off, or start from the beginning again). It depends on the use case.

Still some adaptors I think might be useful are:

// Or All, but in Go All is used to create a sequence over all elements
func Every[V any](iter.Seq[V], func(V) bool) bool

// it covers more cases than `First` or `Head`
// the use as `First`: Find(seq, func(_ V) bool { return true })
func Find[V any](iter.Seq[V], func(V) bool) (V, bool)

// Or Any, but in Go any is an interface
// use as `Empty`: !Some(seq, func(_ V) bool { return true })
func Some[V any](iter.Seq[V], func(V) bool) bool


// the following I have used rarely (or never), but might be useful

func Count[V any](iter.Seq[V]) int     // better than Len I think

// handy if you consume the seq and want to keep track where you are
func Enumerate[V any](iter.Seq[V]) iter.Seq2[int, V] 

Comment From: gophun

IMHO the adaptors shouldn't return all those values, they should :

wrap sequences, meaning their code will run when the seq is being consumed (Limit, Map, ...)

@isgj I don't understand what you mean. Limit and Map don't return values, they return sequences. Can you demonstrate the changes you have in mind using the Map function as a simple example?

Comment From: gophun

@isgj Sorry, I guess your sentence was a response to @earthboundkid's comment, not to the proposed design.

Comment From: jub0bs

@jimmyfrasche

func Head[T any](seq iter.Seq[T]) (head T, ok bool, tail iter.Seq[T])

I agree that something like this would be useful, but naming it Head even though it also returns the tail isn't great. In other languages like Haskell, such a function is typically named "uncons"; not very Go-like, I'll give you that. Also, wouldn't a bool result coming last be more idiomatic?

~~Instead, I would suggest the following:~~

func Head[E any](seq iter.Seq[E]) (E, bool) {
    for e := range seq {
        return e, true
    }
    var zero E
    return zero, false
}

func Tail[E any](seq iter.Seq[E]) (iter.Seq[E], bool) {
    next, stop := iter.Pull(seq)
    if _, ok := next(); !ok {
        return nil, false
    }
    f := func(yield func(E) bool) {
        defer stop()
        for {
            e, ok := next()
            if !ok {
                return
            }
            if !yield(e) {
                return
            }
        }
    }
    return f, true
}

~~and perhaps also something like~~

func Uncons[E any](seq iter.Seq[E]) (E, iter.Seq[E], bool)

Comment From: jimmyfrasche

@jub0bs That makes assumptions about iterators that don't hold in general. They do not need to return the same sequence each time. Consider an iterator that reads from a file and returns a single record. If you used your Head and Tail back to back you'd skip the second record.

Comment From: bobg

Two problems with your suggestions, @jub0bs: Head unrecoverably discards the rest of the iterator (which isn't resumable), and Tail requires the caller to use the returned iterator, otherwise the defer stop() never gets called and resources can leak from the input iterator.

wouldn't a bool result coming last be more idiomatic?

It's bikeshedding, but my $.02 is that I'd rather have the bool paired with the value whose ok-ness it's annotating; i.e. Uncons(...) (E, bool, iter.Seq[E]).

Comment From: DeedleFake

The more I think about this, the less I think that an operation that splits the iterator in this way is a good idea. It's very easy to do manually already with iter.Pull(), and every implementation in here, including my own, is error-prone because of the strange side effect of requiring that the returned tail iterator be used in order to ensure that the underlying pull iterator gets stopped. On top of that, if someone uses it repeatedly to get elements one at a time, something that they should definitely use iter.Pull() for instead, it'll wind up stacking pull iterators on top of each other and destroying performance.

And almost every problem that this tries to solve can be done just as easily with an iter.Push() convenience function to convert an iterator back from the return of an iter.Pull() call.

Comment From: jub0bs

@jimmyfrasche ~~Maybe I'm missing something, but you're describing is a single-use iterator, which doesn't seem specific to the functions I suggest. Can you link to a Playground that illustrates your point?~~ Excellent point!

@bobg

Head unrecoverably discards the rest of the iterator

~~I don't perceive this as problematic. For a single-use iterator, restarting the loop would yield the second element and so on; for a multiple-use iterator, the loop would yield the first one again.~~ That wouldn't be too bad if the caller only cared about the head and if my implementation didn't leak the pull-based iterator.

Tail requires the caller to use the returned iterator, otherwise the defer stop() never gets called and resources can leak from the input iterator.

Very good point, which I had missed. I'll have to think about this a bit more, it seems.

Comment From: Merovius

My take (playground):


func Cut[E any](s iter.Seq[E]) (head E, tail iter.Seq[E], ok bool) {
    for v := range s {
        head, ok = v, true
        break
    }
    tail = func(yield func(E) bool) {
        if !ok {
            return
        }
        first := true
        for v := range s {
            if first {
                first = false
                continue
            }
            if !yield(v) {
                return
            }
        }
    }
    return head, tail, ok
}

Though, for the record, I'm against including something like this, for reasons already mentioned by others.

Comment From: jub0bs

@DeedleFake @Merovius After thinking about this a bit more, I have to agree: I can't think of a way for functions like Head, Tail, Uncons to work with single-use iterators. In that case, leaving them out is preferable. The desired "logic" (treating the first element, if any, in a special way) is easy enough to implement in the loop of a push-based iterator.

Comment From: jimmyfrasche

Every case I've had for Head has been to simplify a pattern I kept coming across in higher-order iterators:

first, once := true, false
for v := range seq {
  if first {
    first = false
    // prime the pump
  } else {
    once = true
    // actual loop code
  }
}
if !first && !once {
  // special case for one value seq
}

In terms of this thread, though, I only mentioned Head as another thing that could be implemented with Push.

Comment From: jub0bs

The more I ponder the design of a library that would complement iter, the more I'm wary of sink functions that may never terminate, such as the first four functions suggested in https://github.com/golang/go/issues/61898#issuecomment-2333767460. After all, just like a caller cannot know a priori what values a context.Context contains, the caller cannot know a priori whether a given iter.Seq[E] or iter.Seq2[K,V] is finite.

For example, the program below never terminates:

package main

import (
    "fmt"
    "iter"
)

func main() {
    fmt.Println(Count(someFunction()))
}

func Count[E any](seq iter.Seq[E]) int {
    var n int
    for range seq {
        n++
    }
    return n
}

func someFunction() iter.Seq[string] {
    return Repeat("foo")
}

func Repeat[E any](e E) iter.Seq[E] {
    return func(yield func(E) bool) {
        for yield(e) {
            // deliberately empty body
        }
    }
}

(Playground)

IMO, sink functions that may not terminate are simply too easy to misuse. As such, they're better left out. Or alternatives guaranteed to terminate, like the one suggested in https://github.com/golang/go/issues/61898#issuecomment-2325109019, should be provided instead.
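For illustration, one hypothetical shape for such a bounded sink (the name and signature are mine):

// CountUpTo counts at most limit elements, so it terminates even on an
// infinite sequence. exhausted reports whether seq ended within limit.
func CountUpTo[E any](seq iter.Seq[E], limit int) (n int, exhausted bool) {
    for range seq {
        if n == limit {
            return n, false // at least one more element exists
        }
        n++
    }
    return n, true
}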

Comment From: Merovius

@jub0bs I'll note that you don't know a priori whether an io.Reader ever returns io.EOF, yet we are fine with io.Copy etc. existing.

Comment From: jub0bs

@Merovius Good point, but I suspect that infinite iterators will be more common than io.Readers that never return io.EOF. For example, I don't think my Repeat function or the following function are particularly farfetched:

func Iterate[E any](e E, f func(E) E) iter.Seq[E] {
    return func(yield func(E) bool) {
        for yield(e) {
            e = f(e)
        }
    }
}

Comment From: jub0bs

Or, if such sinks are included in x/exp/xiter, their documentation should clearly state that the caller is responsible for passing them only finite iterators.

Comment From: adonovan

I'm not sure I see the danger with Repeat or Iterate; both seem like elegant solutions to some problems. As @Merovius said, just as with io.Readers, some iterators are finite, some infinite. Just because the Sum or Count of an infinite sequence is bottom doesn't mean there is a problem with Sum or Count, or with your infinite sequence.

Comment From: jub0bs

@adonovan I do like all those functions. I just think that people not coming from an FP background should be warned in the documentation that some sinks expect a finite iterator.

Comment From: DeedleFake

Another function proposal:

func Drain[T any](seq iter.Seq[T]) (last T, ok bool) {
  for v := range seq {
    last = v
    ok = true
  }
  return last, ok
}

Last() could also work as a name.

This may have been proposed further up, and if so sorry for repeating it. I scanned through but didn't see anything obvious.

Comment From: eihigh

In my opinion, a Push converter that pairs with iter.Pull would be useful. Push would allow values to be pushed gradually to functions that take an iterator as an argument.

func Push[V any](recv func(iter.Seq[V])) (push func(V) bool) {
    var in V

    coro := func(yieldCoro func(struct{}) bool) {
        seq := func(yieldSeq func(V) bool) {
            for yieldSeq(in) {
                yieldCoro(struct{}{})
            }
        }
        recv(seq)
    }

    next, stop := iter.Pull(coro)
    return func(v V) bool {
        in = v
        _, more := next()
        if !more {
            stop()
            return false
        }
        return true
    }
}
func main() {
    sum := func(src iter.Seq[int]) {
        sum := 0
        for v := range src {
            sum += v
            fmt.Printf("- sum: %d\n", sum)
        }
    }
    pushSum := Push(sum)

    for {
        var n int
        fmt.Scanln(&n)
        if !pushSum(n) {
            break
        }
    }
}
$ go run .
1
- sum: 1
2
- sum: 3
3
- sum: 6

This is necessary for handling endless data streams, such as real-time metrics or event sequences, as iterators. Additionally, while iter.Pull makes it possible to integrate multiple iterators, as in Zip and Merge, distributing from a single iterator to multiple iterators is difficult without Push.

func main() {
    fizz := func(src iter.Seq[int]) {
        for v := range src {
            if v%3 == 0 {
                fmt.Println("- fizz")
            }
        }
    }
    pushFizz := Push(fizz)

    buzz := func(src iter.Seq[int]) {
        for v := range src {
            if v%5 == 0 {
                fmt.Println("- buzz")
            }
        }
    }
    pushBuzz := Push(buzz)

    sum := func(src iter.Seq[int]) {
        sum := 0
        for v := range src {
            sum += v
            fmt.Printf("- sum: %d\n", sum)
            if !pushFizz(sum) || !pushBuzz(sum) {
                return
            }
        }
    }
    pushSum := Push(sum)

    for {
        var n int
        fmt.Scanln(&n)
        if !pushSum(n) {
            break
        }
    }
}
$ go run .
1
- sum: 1
2
- sum: 3
- fizz
3
- sum: 6
- fizz
4
- sum: 10
- buzz
5
- sum: 15
- fizz
- buzz
6
- sum: 21
- fizz

Comment From: DmitriyMV

Another function proposal:

func Drain[T any](seq iter.Seq[T]) (last T, ok bool) {
  for v := range seq {
    last = v
    ok = true
  }
  return last, ok
}

Last() could also work as a name.

This may have been proposed further up, and if so sorry for repeating it. I scanned through but didn't see anything obvious.

This will not work.

Comment From: adonovan

This will not work.

That depends how you define it: it correctly returns the last element, which means it must have iterated over the whole sequence. If the Seq is a one-shot, then it will have been drained. "Last" seems like a clearer name for this operation because that's what it guarantees; "Drain" only makes sense for one-shot sequences (which, one hopes, are far from the norm).

In any case I'm not convinced that this is something we should put in the library. Users might easily think that Last on a Seq whose representation supports random access, such as a slice, would be O(1), but in fact it is not, and indeed cannot be, efficient, because there is no way to ask a Seq if it supports a narrower random-access interface.

Comment From: dfreese

I was experimenting with this API and one thing I ran into was that there is no way to turn an iter.Seq[V] into an iter.Seq2[K, V] and vice versa. There are Zip and Zip2; however, those don't compose well with other iterator functions, as often an iterator is described once as the result of other functions.

The use case I ran into was building up a map[string]myStruct, where the key was an identifier pulled from the struct that would be used for lookup.

func Split[In, KOut, VOut any](f func(In) (KOut, VOut), seq iter.Seq[In]) iter.Seq2[KOut, VOut] {
    return func(yield func(KOut, VOut) bool) {
        for in := range seq {
            if !yield(f(in)) {
                return
            }
        }
    }
}

func Combine[KIn, VIn, Out any](f func(KIn, VIn) Out, seq iter.Seq2[KIn, VIn]) iter.Seq[Out] {
    return func(yield func(Out) bool) {
        for k, v := range seq {
            if !yield(f(k, v)) {
                return
            }
        }
    }
}

That would allow things such as:

type data struct {
    id string
}

func foo(datas []*data) map[string]*data {
    return maps.Collect(xiter.Split(func(d *data) (string, *data) {
        return d.id, d
    }, slices.Values(datas)))
}

Edit: I see now that this has been discussed at length, sorry for the noise.

Comment From: DeedleFake

More and more I see examples like that that make me want both some variant of #21498 and a pipe operator:

return slices.Values(datas) |>
  xiter.Split(func { d -> d.id, d }) |>
  maps.Collect()

I feel like most of the iterator adapters will have very limited usefulness without especially #21498.

Comment From: DmitriyMV

@DeedleFake I doubt this is ever going to happen though, since inferring the number of arguments would now depend on what comes before the call. It hurts readability for those unfamiliar with languages like Elm, Haskell, and so on.

Comment From: jba

#70140 erroneously duplicates Merge but adds Intersect, Union and Subtract, which are only meaningful on sorted sequences. It's unclear whether they should be in a separate package.

Comment From: stampflit

I'm wondering whether this discussion has derailed too much from the original purpose. Some of the functions in this proposal see a lot of agreement while there are others that are under much more scrutiny.

https://github.com/golang/go/issues/61898#issuecomment-2123357427 by @jimmyfrasche already compiled an overview of different functions being proposed.

The uncontroversial functions arguably provide the most value for the ecosystem, and the set of them is pretty close to what was initially suggested. During the discussion more functions were added, but there is little value in adding all of these functions at once, and this discussion looks like it can easily go on for a long time.

I suggest that the work here should focus on landing the uncontroversial functions (Concat{,2}, Filter{,2}, Map{,2}, Merge{,Func}{,2}, Reduce{,2}) and discussing the other functions in separate proposals (@jimmyfrasche opened many of them and linked them in the mentioned comment).

Comment From: earthboundkid

FWIW, I consider Map and Reduce very controversial because they will shift Go from using for-loops everywhere to using callback functions everywhere!

Comment From: earthboundkid

Iterators have been out for a little while now. Instead of adding x/exp/xiter, I would be more interested in hearing what people have in their personal iterator utility libraries: what's useful, what's not, etc.

Comment From: DeedleFake

My personal utility library is deedles.dev/xiter. Overall, I find that I don't use it that much, but not because it isn't useful. I have used a few things in there, especially a xiter.Enumerate(), xiter.V2(), and whatnot for dealing with the discrepancy between iter.Seq and iter.Seq2, as well as several things related to xiter.Pair.

However, I have found myself not wanting to use it for things like xiter.Map() and xiter.Filter() not because they can't be useful, but rather because the syntax is too awkward without #21498 and without some way to write chains of calls in a left-to-right manner such as either generic methods or a pipe operator. Instead I usually just write an anonymous wrapper iterator. Something like

values := func(yield func(v string) bool) {
  for v := range original {
    if check(v) {
      if !yield(modify(v)) {
        return
      }
    }
  }
}

This is a bit overkill sometimes, though, as the pyramid can get a bit out of hand. I also find myself replicating slices.Collect() sometimes using a normal for loop simply because, without a useful, short Map() function with a short anonymous function syntax, it actually winds up being less code to do so.

All that being said, I find myself importing deedles.dev/xiter in pretty much every piece of code that I've worked on since 1.23 came out.

Comment From: jub0bs

I have produced github.com/jub0bs/iterutil, more for practice and as a showcase than anything else. I don't really use it much yet.

I do like https://pkg.go.dev/github.com/jub0bs/iterutil#SortedFromMap, though; it's more efficient than simply chaining maps.Keys with slices.Sorted.

Comment From: Merovius

@DeedleFake Not to nitpick, or anything, but

Overall, I find that I don't use it that much, but not because it isn't useful.

I would argue that "it is used often" is pretty much the definition of "useful".

Comment From: DmitriyMV

@jub0bs

it's more efficient than simply chaining maps.Keys with slices.Sorted.

It can't be more efficient since it ranges over slices.Sorted(maps.Keys(m)) right after you did ks := keys(m); slices.Sort(ks). I think it's a typo.

Comment From: jub0bs

@DmitriyMV Indeed! That is a typo. Thanks. This is what I had in mind:

func SortedFromMap[M ~map[K]V, K cmp.Ordered, V any](m M) iter.Seq2[K, V] {
    return func(yield func(K, V) bool) {
        ks := keys(m)
        slices.Sort(ks)
        for _, k := range ks {
            if !yield(k, m[k]) {
                return
            }
        }
    }
}

func keys[K comparable, V any](m map[K]V) []K {
    ks := make([]K, 0, len(m))
    for k := range m {
        ks = append(ks, k)
    }
    return ks
}

That should be more efficient than simply ranging over slices.Sorted(maps.Keys(m)) because it uses the knowledge of the number of pairs in the map. I really need to add benchmarks to this library.

Comment From: earthboundkid

Another way to do SortedFromMap would be to use a heap, so you don't have to pay the full cost of sorting if you end up breaking the loop before going through all items.

Comment From: josharian

I looked through my code and the main themes I saw lined up almost precisely with @DeedleFake's https://github.com/golang/go/issues/61898#issuecomment-2476922128.

  1. Using these adapters individually is noisy because of the verbosity of the inline function definitions. In some places, using a function or method works and reads well, but not always. https://github.com/golang/go/issues/21498 would help some, but even with that in place, the argument order to Map and friends is wrong--the function should come last.

  2. Composing these adapters creates an "inside out" structure that is hard to read and write. (Ironically similar to C's typedefs that Go's design intentionally avoided.)

Consider counting the number of elements in a slice that have value x. You can do this with something like:

xiter.Len(xiter.Filter(func(t T) bool { return t == x }, slices.Values(s)))

This is much harder to read than left-to-right (pseudo-code): s.Values().Filter(func(t T) bool { return t == x }).Len()

The difficulty of composition has led in practice to a proliferation of helpers that simply combine two or more steps.

  3. A large proportion of uses apply to slices. The awkwardness of the inside out structure plus slices.Values and slices.Collect has led to slices-only copies of the most popular functions such as (note the corrected argument order):
func Mapped[Slice ~[]E, E, T any](s Slice, f func(E) T) []T {
    return slices.Collect(Map(f, slices.Values(s)))
}

Comment From: gophun

This is much harder to read than left-to-right (pseudo-code): s.Values().Filter(func (t T) { return t == x })).Len()

To be honest, it should read like this:

cnt := 0
for _, t := range s {
    if t == x {
        cnt++
    }
}

I thought this is what we as the Go community have agreed on in the past 15 years.

Comment From: earthboundkid

I thought this is what we as the Go community have agreed on in the past 15 years.

I wouldn't say that this is settled, but this is what I was talking about when I said,

FWIW, I consider Map and Reduce very controversial because they will shift Go from using for-loops everywhere to using callback functions everywhere!

I do see a lot of advantage in the for-loop heavy style because everyone needs to know how to use for-loops anyway, and typically the clarity from using declarative names like "map" and "filter" ends up getting eaten up in other ways, like the length of callback signatures. It's not clear which way we want Go to evolve.

Comment From: DeedleFake

@gophun

The main benefit of iterators over a manual for loop over some data structure is that they are essentially an abstraction of a loop as data, allowing it to, in pieces, be passed, modified, and dealt with across function calls. In my own projects, being able to pass something an iter.Seq has improved the code significantly, both in terms of cleanliness and in terms of necessary allocations.

Before this, in order to modify some data source and pass it to something, either that thing would need to know about the data source, breaking the abstraction, or I would need to allocate a slice to store the modified data in. Now I can apply some adapters and pass the resulting iter.Seq to the function, effectively allowing me to pass the function a for loop. This also reduces the number of loops: instead of needing to loop to create the slice and then loop again over the slice, the function the iter.Seq is passed to can be the only one looping.
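As a toy illustration of that point (the names here are made up): render never learns where the lines come from, and no intermediate slice is allocated between the transformation and the consumer.

func upper(seq iter.Seq[string]) iter.Seq[string] {
    return func(yield func(string) bool) {
        for s := range seq {
            if !yield(strings.ToUpper(s)) {
                return
            }
        }
    }
}

func render(lines iter.Seq[string]) {
    for line := range lines {
        fmt.Println(line)
    }
}

// render(upper(source)) loops exactly once, wherever source came from.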

Comment From: josharian

For loops can be clear and concise. Functional programming style can be clear and concise. I would like to be able to reach for either.

The discussion here highlights how far we still are from the latter in Go. Iterators are wonderful and I'm glad we have them. But without lambdas and pipes (or their moral equivalents) we are not able to use them comfortably and composably.

Comment From: jub0bs

@earthboundkid

Another way to do SortedFromMap would be to use a heap [...]

That's a good idea! I'll see if I can get that to work.

Comment From: josharian

Spitballing, thinking about what we can do with the tools we have...

One way to move from inside-out toward RTL (not as good as LTR!) and slightly finesse the argument order problem is to return functions that transform iterators.

For example:

// Map converts f into an iterator mapper.
func Map[In, Out any](f func(In) Out) func(seq iter.Seq[In]) iter.Seq[Out] {
    return func(seq iter.Seq[In]) iter.Seq[Out] {
        return func(yield func(Out) bool) {
            for in := range seq {
                if !yield(f(in)) {
                    return
                }
            }
        }
    }
}

// Limit returns a function that wraps seq to stop after n values.
func Limit[V any](n int) func(seq iter.Seq[V]) iter.Seq[V] {
    return func(seq iter.Seq[V]) iter.Seq[V] {
        return func(yield func(V) bool) {
            if n <= 0 {
                return
            }
            for v := range seq {
                if !yield(v) {
                    return
                }
                if n--; n <= 0 {
                    break
                }
            }
        }
    }
}

This adds a layer of indirection and leads to a LISP problem (lots of irritating silly parentheses), but does slightly improve composability.
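For example, a hypothetical use of the transformers above (assuming an infinite naturals sequence): the pipeline reads right to left rather than inside out.

naturals := func(yield func(int) bool) {
    for i := 1; ; i++ {
        if !yield(i) {
            return
        }
    }
}
squares := Map(func(n int) int { return n * n })
firstThree := Limit[int](3) // V must be given explicitly; n alone doesn't infer it
for v := range firstThree(squares(naturals)) {
    fmt.Println(v) // 1, 4, 9
}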

I wouldn't advocate for this as-is, but throwing it out there in case it sparks new ideas...

Comment From: josharian

And switching from forest to trees...

Limit[2] would be slightly nicer to use with flags (my de facto primary use case for it) if it treated n < 0 as "unbounded" rather than as 0.

Comment From: jub0bs

@earthboundkid

Another way to do SortedFromMap would be to use a heap, so you don't have to pay the full cost of sorting if you end up breaking the loop before going through all items.

I'm coming back to this suggestion of yours, and I'm no longer sure I understand it. Regardless of how much of the resulting iterator is consumed, you would need to collect all of the map's keys and sort them all anyway, wouldn't you? And heapsort applied to n elements is O(n log(n)) worst-case time complexity and O(n) worst-case space complexity, which is the same complexity as collecting all keys in a properly dimensioned slice and then calling slices.Sort on it. What am I missing?

Comment From: DeedleFake

@jub0bs

If you collect it into a heap, it's the same cost as sorting in advance but it defers part of that cost into the iteration, which means that if you exit the loop early then that extra cost is not pointlessly expended.

Comment From: jub0bs

@DeedleFake I don't want to derail the conversation, but can you be more specific? What "part of the cost" does it defer into the iteration, exactly?

Comment From: DeedleFake

Sorting the whole slice in advance is O(n * log(n)). Iteration is then O(n).

Building the heap is O(n). Iteration, which includes popping from the heap, is O(n * log(n)).
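To make the trade-off concrete, here is a sketch of the heap-based variant of SortedFromMap using container/heap (the helper names are mine, not settled API; imports of cmp, container/heap, and iter are assumed). heap.Init is O(n) and each popped key costs O(log n), so breaking out of the loop early skips the remaining sort work.

type keyHeap[K cmp.Ordered] []K

func (h keyHeap[K]) Len() int           { return len(h) }
func (h keyHeap[K]) Less(i, j int) bool { return h[i] < h[j] }
func (h keyHeap[K]) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *keyHeap[K]) Push(x any)        { *h = append(*h, x.(K)) }
func (h *keyHeap[K]) Pop() any {
    old := *h
    x := old[len(old)-1]
    *h = old[:len(old)-1]
    return x
}

func SortedFromMapHeap[M ~map[K]V, K cmp.Ordered, V any](m M) iter.Seq2[K, V] {
    return func(yield func(K, V) bool) {
        h := make(keyHeap[K], 0, len(m))
        for k := range m {
            h = append(h, k)
        }
        heap.Init(&h) // O(n)
        for h.Len() > 0 {
            k := heap.Pop(&h).(K) // O(log n) per element
            if !yield(k, m[k]) {
                return // early break skips sorting the rest
            }
        }
    }
}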

Comment From: jub0bs

@DeedleFake Ah yes. I had overlooked the fact that you can build the heap in O(n). Thanks!

Comment From: AlekSi

In https://github.com/golang/go/issues/53987#issuecomment-1671925841, @JeremyLoy proposed an iterator-based addition to (what became) slices.Chunk. I already needed it in multiple places, so I propose adding it. It would allow passing []E to functions that could process the whole slice at once instead of one E at a time. For example, given a MongoDB database and a generator of documents, one could use

for d := range xiter.Chunk(docs, 1000) {
    db.InsertMany(d) // sends a single insert command with up to 1000 documents
}

instead of the less efficient

for d := range docs {
    db.InsertOne(d) // sends a single insert command with a single document
}

In those examples, InsertMany and InsertOne are MongoDB driver's methods.

Just like slices.Chunk allows sub-slices to be used as iterators, xiter.Chunk would allow iterators to be used as (sub)slices.

// Chunk returns an iterator over consecutive slices of up to n elements of seq.
// All but the last slice will have size n.
// All slices are clipped to have no capacity beyond the length.
// If seq is empty, the sequence is empty: there is no empty slice in the sequence.
// Chunk panics if n is less than 1.
func Chunk[E any](seq iter.Seq[E], n int) iter.Seq[[]E] {
    if n < 1 {
        panic("cannot be less than 1")
    }

    return func(yield func([]E) bool) {
        var batch []E

        for e := range seq {
            if batch == nil {
                batch = make([]E, 0, n)
            }

            batch = append(batch, e)

            if len(batch) == n {
                if !yield(batch) {
                    return
                }

                batch = nil
            }
        }

        if l := len(batch); l > 0 {
            yield(batch[:l:l])
        }
    }
}

Comment From: DeedleFake

@AlekSi

I've got an implementation of that in my iterator support package. Mine reuses the slice for each chunk, leaving it up to the caller to copy a chunk if they want to keep it past the next iteration. I think that's the better approach, as the user can make the decision that way. Another alternative is to allow the user to pass a slice to be used, with the capacity of the slice serving as the size of the chunks.

Comment From: anatoly-kussul

@AlekSi

imo it would be more convenient if Chunk returned iter.Seq[iter.Seq[E]] instead of iter.Seq[[]E], something like:

// Chunk returns an iterator over consecutive iterators of up to n elements of seq.
// Chunk panics if n is less than 1.
func Chunk[E any](seq iter.Seq[E], n int) iter.Seq[iter.Seq[E]] {
    if n < 1 {
        panic("cannot be less than 1")
    }

    return func(yield func(iter.Seq[E]) bool) {
        var stopped bool
        take := func(next func() (E, bool), n int) iter.Seq[E] {
            return func(yield func(E) bool) {
                for range n {
                    item, ok := next()
                    if !ok || !yield(item) {
                        stopped = true
                        return
                    }
                }
            }
        }

        next, stop := iter.Pull(seq)
        defer stop()
        for {
            if stopped || !yield(take(next, n)) {
                return
            }
        }
    }
}

you can then aggregate them to slices if needed like:

func ChunkSliced[E any](seq iter.Seq[E], n int) iter.Seq[[]E] {
    return func(yield func([]E) bool) {
        for chunkSeq := range Chunk(seq, n) {
            chunk := slices.Collect(chunkSeq)
            if len(chunk) == 0 || !yield(chunk) {
                return
            }
        }
    }
}

or

func ChunkSlicedInplace[E any](seq iter.Seq[E], chunk []E) iter.Seq[[]E] {
    return func(yield func([]E) bool) {
        for chunkSeq := range Chunk(seq, cap(chunk)) {
            clear(chunk)
            chunk = chunk[:0]
            chunk = slices.AppendSeq(chunk, chunkSeq)
            if len(chunk) == 0 || !yield(chunk) {
                return
            }
        }
    }
}

Comment From: AlekSi

imo it would be more convenient if Chunk returned iter.Seq[iter.Seq[E]] instead of iter.Seq[[]E]

@anatoly-kussul I updated my example to show that your proposal would not work for existing functions that accept slices. And converting returned (sub)slices to iterators is already possible with slices.Values.

Comment From: anatoly-kussul

@AlekSi

And converting returned (sub)slices to iterators is already possible with slices.Values.

This wouldn't really work for cases where both the iterator and your logic are I/O heavy.

For example: the input Seq is some event-broker consumer; for each message you want to make a network request, but after each batch you want to sleep for some time.

with Chunk returning iter.Seq[iter.Seq[E]] it would process each msg inside chunk as soon as it's ready and be as simple as: ```go for chunk := range Chunk(stream, n) { for msg := range chunk { // do something } time.Sleep(cooldown) }

while `iter.Seq[[]E]` approach would require to receive full batch first, which would be less efficient.

You can always convert `iter.Seq[E]` to `[]E` without impacting flow logic, but not the other way around.

Comment From: apparentlymart

Another situation where "sequences of sequences" can be useful is situations where each element of an input sequence can produce zero or more elements in a result but the boundaries between those sub-sequences are not important to the overall problem.

In situations like that (in other languages) I've found benefit in a "flatten" function, which I think would be spelled something like this in Go:

func Flatten[V any](seq iter.Seq[iter.Seq[V]]) iter.Seq[V]
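
A minimal sketch of an implementation matching that signature:

func Flatten[V any](seq iter.Seq[iter.Seq[V]]) iter.Seq[V] {
    return func(yield func(V) bool) {
        for inner := range seq {
            for v := range inner {
                if !yield(v) {
                    return
                }
            }
        }
    }
}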

Although I expect other uses are possible, the main use I have in mind is first using Map to project from some input sequence to an iter.Seq[iter.Seq[V]] and then using Flatten to transform that into an easier-to-consume iter.Seq[V] to return.

Using Flatten in conjunction with Chunk would of course be pretty self-defeating :grinning: so the relevance here is only that sequences of sequences do tend to arise once you start playing the iterator-combinator games this library seems to promote, and what I've described here is another, separate situation where one can end up holding nested sequences.

(I don't mean this comment to suggest that Flatten necessarily must be part of a first version of this package. I expect instead that we can wait to see how commonly it arises in practice, since it's not clear yet that Go programmers will make such extensive use of iterator "map" as is idiomatic in some other languages.)

Comment From: earthboundkid

Python does the equivalent of iter.Seq[iter.Seq[T]] and I have found it extremely inconvenient.

You could also implement the cooldown with:

for i, item := range enumerate(stream) {
   if i != 0 && i%chunkSize == 0 {
      time.Sleep(cooldown)
   }
   doSomething(item)
}

That seems clearer to me.
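
The enumerate helper above is not in the standard library; a minimal sketch of it (here named Enumerate) could be:

// Enumerate pairs each element of seq with its zero-based index.
func Enumerate[V any](seq iter.Seq[V]) iter.Seq2[int, V] {
    return func(yield func(int, V) bool) {
        i := 0
        for v := range seq {
            if !yield(i, v) {
                return
            }
            i++
        }
    }
}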

Comment From: jub0bs

I don't want to derail the discussion, but I want to report some benchmark results for my SortedFromMap function. @earthboundkid's suggestion to rely on a binary heap was sound! ~My implementation of her idea seems to outperform the naive approach (consisting in sorting all the map's keys upfront) in most cases:~


Edit: Unfortunately, I had a bug in my reference "upfront_sort" implementation. I've fixed it and re-run my benchmarks. The new results indicate that the binary-heap implementation is only advantageous as the total number of pairs grows and when the number of consumed pairs is somewhat smaller than the total number of pairs.

goos: darwin
goarch: amd64
pkg: github.com/jub0bs/iterutil
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                                               │ upfront_sort │             binary_heap              │
                                               │    sec/op    │    sec/op     vs base                │
SortedFromMap/pairs=16/consumed=at_most_16-8      583.0n ± 1%    726.1n ± 1%  +24.55% (p=0.000 n=10)
SortedFromMap/pairs=16/consumed=half-8            499.2n ± 0%    559.3n ± 1%  +12.04% (p=0.000 n=10)
SortedFromMap/pairs=16/consumed=all-8             583.1n ± 1%    723.0n ± 1%  +24.00% (p=0.000 n=10)
SortedFromMap/pairs=64/consumed=at_most_16-8      3.067µ ± 1%    2.460µ ± 1%  -19.80% (p=0.000 n=10)
SortedFromMap/pairs=64/consumed=half-8            3.328µ ± 1%    3.540µ ± 1%   +6.37% (p=0.000 n=10)
SortedFromMap/pairs=64/consumed=all-8             3.664µ ± 2%    5.095µ ± 1%  +39.06% (p=0.000 n=10)
SortedFromMap/pairs=256/consumed=at_most_16-8    14.663µ ± 1%    7.565µ ± 0%  -48.41% (p=0.000 n=10)
SortedFromMap/pairs=256/consumed=half-8           16.02µ ± 1%    15.35µ ± 1%   -4.16% (p=0.000 n=10)
SortedFromMap/pairs=256/consumed=all-8            18.18µ ± 1%    23.69µ ± 2%  +30.32% (p=0.000 n=10)
SortedFromMap/pairs=1024/consumed=at_most_16-8    66.41µ ± 1%    26.95µ ± 0%  -59.42% (p=0.000 n=10)
SortedFromMap/pairs=1024/consumed=half-8          79.15µ ± 2%    70.85µ ± 3%  -10.50% (p=0.000 n=10)
SortedFromMap/pairs=1024/consumed=all-8           90.21µ ± 1%   109.31µ ± 1%  +21.18% (p=0.000 n=10)
SortedFromMap/pairs=4096/consumed=at_most_16-8    300.5µ ± 2%    100.6µ ± 1%  -66.51% (p=0.000 n=10)
SortedFromMap/pairs=4096/consumed=half-8          348.0µ ± 2%    303.1µ ± 1%  -12.90% (p=0.000 n=10)
SortedFromMap/pairs=4096/consumed=all-8           398.1µ ± 1%    494.7µ ± 2%  +24.26% (p=0.000 n=10)
geomean                                           15.20µ         13.70µ        -9.85%

(no differences in terms of allocations)

I still believe that a binary heap is valuable. If x/exp/xiter ever sees the light of day and provides something like SortedFromMap, relying on a binary heap indeed seems like a good idea.
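
For reference, a minimal sketch of a heap-based SortedFromMap using container/heap (an illustration, not necessarily the benchmarked implementation): heapify all keys once in O(n), then pay O(log n) per consumed pair, so early termination skips most of the sorting work.

type keyHeap[K cmp.Ordered] []K

func (h keyHeap[K]) Len() int           { return len(h) }
func (h keyHeap[K]) Less(i, j int) bool { return h[i] < h[j] }
func (h keyHeap[K]) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *keyHeap[K]) Push(x any)        { *h = append(*h, x.(K)) }
func (h *keyHeap[K]) Pop() any {
    old := *h
    x := old[len(old)-1]
    *h = old[:len(old)-1]
    return x
}

// SortedFromMap yields the map's pairs in ascending key order,
// sorting lazily via a binary heap.
func SortedFromMap[K cmp.Ordered, V any](m map[K]V) iter.Seq2[K, V] {
    return func(yield func(K, V) bool) {
        h := make(keyHeap[K], 0, len(m))
        for k := range m {
            h = append(h, k)
        }
        heap.Init(&h) // O(n) heapify
        for h.Len() > 0 {
            k := heap.Pop(&h).(K) // O(log n) per consumed pair
            if !yield(k, m[k]) {
                return
            }
        }
    }
}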

Comment From: morlay

A small idea to make xiter adapters composable: add pipe helpers.

package xiter

type Operator[I any, O any] func(in I) O

// Pipe helpers.
// Add more PipeN variants to support more operators as needed.
func Pipe[I, O any](
    in I,
    op1 Operator[I, O],
) O {
    return op1(in)
}
func Pipe2[I, A, O any](
    in I,
    op1 Operator[I, A],
    op2 Operator[A, O],
) O {
    return op2(op1(in))
}
func Pipe3[I, A, B, O any](
    in I,
    op1 Operator[I, A],
    op2 Operator[A, B],
    op3 Operator[B, O],
) O {
    return op3(op2(op1(in)))
}
func Pipe4[I, A, B, C, O any](
    in I,
    op1 Operator[I, A],
    op2 Operator[A, B],
    op3 Operator[B, C],
    op4 Operator[C, O],
) O {
    return op4(op3(op2(op1(in))))
}

// Do converts a two-argument function into an Operator by binding its trailing argument.
func Do[I any, Arg1 any, O any](
    fn func(src I, arg1 Arg1) O,
    arg1 Arg1,
) Operator[I, O] {
    return func(src I) O {
        return fn(src, arg1)
    }
}

func Do2[I any, Arg1 any, Arg2 any, O any](
    fn func(src I, arg1 Arg1, arg2 Arg2) O,
    arg1 Arg1, arg2 Arg2,
) Operator[I, O] {
    return func(src I) O {
        return fn(src, arg1, arg2)
    }
}

Then we could

package main

func main() {
    // The iter.Seq[int] type must be spelled out so that Pipe* can infer the correct types.
    var seq iter.Seq[int] = func(yield func(i int) bool) {
        for i := range 10 {
            if !yield(i) {
                return
            }
        }
    }

    // process and collect
    _ = xiter.Pipe3(
        seq,
        xiter.Do(xiter.Filter, func(x int) bool { return x%2 == 0 }),
        xiter.Do(xiter.Map, func(x int) string { return fmt.Sprintf("%d", x*x) }),
        slices.Collect,
    )

    // compose to process each sub-chunk,
    // then collect
    _ = xiter.Pipe3(
        seq,
        xiter.Do(xiter.Chunk[int], 3),
        xiter.Do(xiter.Map, func(values []int) int {
            return xiter.Pipe(
                slices.Values(values),
                xiter.Do2(xiter.Reduce, 0, func(ret int, v int) int { return ret + v }),
            )
        }),
        slices.Collect,
    )
}
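
Note that the examples above assume seq-first parameter order (e.g. Filter(seq, f)), whereas the adapters in this proposal take the function first (e.g. Filter(f, seq)). Binding those with Do would need a hypothetical variant that binds the leading argument instead:

// DoFirst converts a two-argument function into an Operator by binding
// its leading argument, for function-first adapters such as Filter(f, seq).
func DoFirst[Arg1 any, I any, O any](
    fn func(arg1 Arg1, src I) O,
    arg1 Arg1,
) Operator[I, O] {
    return func(src I) O {
        return fn(arg1, src)
    }
}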

Comment From: Jackuczys

I was starting to lose track so I went through and tried to find all the official positions and related proposals so far. Let me know if I got anything wrong. [...] Separate proposals: (please keep discussion of the below in their respective threads)

@jimmyfrasche would it be possible to link the proposals you listed in that comment in the issue description as well (or at least add a link to your comment, buried behind "Load more")? This is an enormous thread so most people probably won't see your quite helpful summary. The remarks about potentially dropping/renaming Zip and adding Map12/Map21 also seem useful to mention. This issue is quite discoverable through Google searches, not as much as other proposals :)


I'm not sure there's a point in piling yet another function onto this proposal, but it does support the case for adding a "classical" Zip (rather than just ZipLongest) and Zip12, so I figure what I say below is at least partially relevant to this issue. One thing that I think could be useful in combination with a "classical" Zip (not so much the one from the issue description, which I'd call ZipLongest), i.e. this definition of it:

func Zip[V0, V1 any](iter.Seq[V0], iter.Seq[V1]) iter.Seq2[V0, V1]

would be something that allows creating an iterator that repeats a given item (or maybe items, like slices.Repeat does for slices; Python has something similar in itertools.cycle()) a defined number of times, or even infinitely (since Zip would just stop once the shorter iterator stops yielding items):

// count == -1 could mean an infinite iterator
func Repeat[T any](x T, count int) iter.Seq[T]
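
A minimal sketch of this hypothetical Repeat:

func Repeat[T any](x T, count int) iter.Seq[T] {
    return func(yield func(T) bool) {
        // count < 0 means repeat forever.
        for i := 0; count < 0 || i < count; i++ {
            if !yield(x) {
                return
            }
        }
    }
}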

This would allow something like:

list1 := []string{"apple", "banana", "orange"}
list2 := []string{"grapefruit", "banana", "pineapple"}
// merge into a deduplicated set
set := map[string]struct{}{}
// potentially could split this into xiter.Repeat and xiter.RepeatN like in `strings` package
valueIter := xiter.Repeat(struct{}{}, -1)

maps.Insert(set, xiter.Zip(slices.Values(list1), valueIter))
maps.Insert(set, xiter.Zip(slices.Values(list2), valueIter))
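
For reference, a minimal sketch of the classical Zip assumed above, using iter.Pull to advance the second sequence in lockstep and stopping at the shorter one:

func Zip[V0, V1 any](s0 iter.Seq[V0], s1 iter.Seq[V1]) iter.Seq2[V0, V1] {
    return func(yield func(V0, V1) bool) {
        next, stop := iter.Pull(s1)
        defer stop()
        for v0 := range s0 {
            v1, ok := next()
            if !ok || !yield(v0, v1) {
                return
            }
        }
    }
}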

This specific case is also solved by a less universal Lift proposed earlier in this issue:

func Lift[V any](seq Seq[V]) Seq2[V, struct{}]

but it's a lot more restrictive than Repeat, which can achieve the same while still sparing the user from defining their own anonymous function. Note also that, if you are willing to define an anonymous function, all of this can technically be achieved with Map12 without using Zip at all:

func Map12[V1, K2, V2 any](f func(V1) (K2, V2), seq Seq[V1]) Seq2[K2, V2]

in the following way:

valueFunc := func(s string) (string, struct{}) { return s, struct{}{} }
maps.Insert(set, xiter.Map12(valueFunc, slices.Values(list1)))
maps.Insert(set, xiter.Map12(valueFunc, slices.Values(list2)))

though it starts to get tricky when trying to replicate itertools.cycle rather than just a single-item repeat.
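
For completeness, a sketch of what such a cycle helper could look like (hypothetical name Cycle; it assumes seq is replayable, i.e. ranging over it again yields the same items):

func Cycle[V any](seq iter.Seq[V]) iter.Seq[V] {
    return func(yield func(V) bool) {
        for {
            yielded := false
            for v := range seq {
                yielded = true
                if !yield(v) {
                    return
                }
            }
            if !yielded {
                return // don't spin forever on an empty sequence
            }
        }
    }
}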

Comment From: jub0bs

@Jackuczys See

  • https://pkg.go.dev/github.com/jub0bs/iterutil#Zip
  • https://pkg.go.dev/github.com/jub0bs/iterutil#Repeat
  • https://pkg.go.dev/github.com/jub0bs/iterutil#Cycle

Comment From: dzherb

By the way, we can easily chain iterators like this:

type Stream[V any] iter.Seq[V]

func FromSeq[V any](seq iter.Seq[V]) Stream[V] {
    return Stream[V](seq)
}

func (s Stream[V]) Iterator() iter.Seq[V] {
    return iter.Seq[V](s)
}

func (s Stream[V]) Filter(predicate func(V) bool) Stream[V] {
    return FromSeq(xiter.Filter(predicate, s.Iterator()))
}

func (s Stream[V]) Map(mapper func(V) V) Stream[V] {
    return FromSeq(xiter.Map(mapper, s.Iterator()))
}

st := stream.FromSeq(seq).
    Filter(func(i int) bool { return i != 2 }).
    Map(func(i int) int { return i * 2 })

for s := range st {
    fmt.Println(s)
}

I created a draft of a package to explore this more, if you are interested :)

Comment From: Merovius

@dzherb Note that Map is usually expected to be able to have different input and output type arguments. That's not possible with a method. Hence the preference for functions, for consistency.
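
To illustrate: a method cannot introduce a new type parameter, so a type-changing map over the Stream type above has to be a free function (hypothetical MapTo, assuming the proposal's Map(f, seq) signature):

// This does not compile: methods may not declare type parameters.
// func (s Stream[V]) MapTo[R any](f func(V) R) Stream[R]

// A free function works instead:
func MapTo[V, R any](s Stream[V], f func(V) R) Stream[R] {
    return FromSeq(xiter.Map(f, s.Iterator()))
}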

Comment From: rsc

I think it is probably time to wind down this proposal. Well past time, actually. It was a good discussion, but I don't think we have consensus on really adding any of these to the standard library and encouraging what in many cases is overabstraction (even though in some cases it is not).

Most importantly, if we're not going to do xiter, we should stop talking about it and create more space for third-party packages, instead of standing in their way. We licked the cookie, but the cookie is now quite stale. Better to toss it and let other people bake some new ones. Or something like that.

Comment From: bGVia3VjaGVu

Without a heterogeneous collection, xiter is going nowhere. I am not sure about third-party packages either, because it does not seem worth importing one just for an iterator adapter.

and encouraging what in many cases is overabstraction (even though in some cases it is not).

Isn't that already solved by iterator adapters being verbose to write? Coroutines and iterators enable reusable algorithms; both are better options than repeating for loops.

Comment From: DeedleFake

@rsc, the problem with this proposal is not the idea itself. Most people in here seem to like the idea of having iterator adapters in the standard library. The problem is that without generic methods and/or a pipe operator and shorter, easier-to-write anonymous functions, they are ergonomically unsound. I have an iterator adapter library of my own and while I have used it, I sometimes don't purely because the syntax is so awkward to write, not because it wouldn't simplify logic flow. These problems make evaluation of this proposal skewed because the biggest issues are with syntax and ergonomics, not utility or functionality.

Comment From: Merovius

@DeedleFake We have to evaluate library proposals in the context of the language that we have. That's not skewed. If the language ever changes, it seems easy enough to revisit this. But given that those changes are not really on the horizon, it makes no sense to add things to the standard library that aren't useful without them.

Comment From: earthboundkid

I am basically in favor of withdrawing this proposal because in practice while I have used iterators for real code, I haven't needed a swiss army set of iterator utilities. Instead I generally just write the ones I need at the time. That said, it does feel a little incomplete not to have map/filter/reduce in the standard library.

Comment From: adonovan

I am basically in favor of withdrawing this proposal because in practice while I have used iterators for real code, I haven't needed a swiss army set of iterator utilities. Instead I generally just write the ones I need at the time.

That's been my experience too. Perhaps at some point we can scan the corpus for the most commonly implemented helpers and consider adding them to iter based on the data.

Comment From: bGVia3VjaGVu

the problem with this proposal is not the idea itself. Most people in here seem to like the idea of having iterator adapters in the standard library. The problem is that without generic methods and/or a pipe operator and shorter, easier-to-write anonymous functions, they are ergonomically unsound. I have an iterator adapter library of my own and while I have used it, I sometimes don't purely because the syntax is so awkward to write, not because it wouldn't simplify logic flow. These problems make evaluation of this proposal skewed because the biggest issues are with syntax and ergonomics, not utility or functionality.

You're asking for type inference and the omission of the return keyword in closures, both syntactic sugar, since they don't add new language features. The verbosity of closures would be tied to the frequency of iterator adapters. This makes future language design harder, since Go may one day decide to become less verbose without sacrificing readability.

I consider it bad practice to chain more than 2 iterator adapters, especially those that take functions as parameters. There are multiple problems with this approach:

  • Optimizing iterator adapters requires aggressive inlining; however, Go does not have a sophisticated inliner.
  • Inlining is performed before optimizations are applied.
  • The Go compiler generally performs less optimization to speed up compile times.
  • Closures and generics are not statically dispatched, so there is even less opportunity for optimization.
  • If applied in the standard library, this would lead to a consistent performance penalty.
  • Finally, it makes the code harder to read and understand, for both humans and the Go compiler.

Heavy use of iterator adapters is something LLVM-based languages can afford, but unfortunately Go cannot without sacrificing performance.

That's been my experience too. Perhaps at some point we can scan the corpus for the most commonly implemented helpers and consider adding them to iter based on the data.

I don't think this requires research; either iterator adapters are added or they are not.

Comment From: dfinkel

I think that even if we don't include the rest, Concat and Concat2 are useful enough (and needed commonly enough) that they should be considered for inclusion in either an xiter package or iter proper. (I've written both of those multiple times, because they avoid a bunch of sequential loops.)

FWIW, I'd be fine with a "this is less performant than an equivalent loop" warning on any iterator adapter.

Comment From: apparentlymart

I think it's probably true that I've not been using iter.Seq and iter.Seq2 with as much gusto as others in this discussion, so my experience might not be representative. But as I review the accumulated uses of both in some of the codebases I maintain, I think the most notable gaps for me are represented by the separate proposal https://github.com/golang/go/issues/68947.

In those cases, I have a function that normally returns a more interesting sequence but has some situations where it returns a fixed result of either an empty sequence or a sequence containing just one placeholder value, primarily just to satisfy the function signature in a way that avoids the caller needing to make any special accommodation, similar to returning nil slices or maps so that a caller will just naturally take no action on the result.

The only example from this proposal that I've seen arise more than once is "filter", but even that is marginal since we've typically been able to do the filtering either at the source or the destination, rather than as an intermediate step. Our main focus has been to avoid materializing temporary slices just to conveniently iterate over them, and not to significantly change approach otherwise.

Of course I can't and don't intend to speak for anyone else's experience here, but I'd generally agree that it's not clear yet how extensive the standard library needs to be in this area, or whether future language changes would call for these to be designed differently. So I might suggest that this larger proposal be closed in favor of smaller proposals like https://github.com/golang/go/issues/68947 that each solve a specific set of well-understood problems. (While hopefully keeping in mind all of the general API design ideas already worked through above, so that the individual proposals can have consistent design elements.)

Comment From: aclements

As Russ said, it's time to wind this down.

Iterators are still young in Go and there's a lot of opportunity to get more experience with this. If specific iterator adapters are getting used a lot, please file targeted proposals.

Thanks to everyone for all of the input and discussion on this!

Comment From: aclements

This proposal has been declined as retracted. — aclements for the proposal review group

Comment From: mariomac

Without any aim for self-promotion, I've been working a bit on adapting an old Java-like streams library to work out of the box with the iter package. It internally uses an older pull-style iterator API, but it was easily adaptable to iter.Seq and iter.Seq2: https://github.com/mariomac/gostream

If you are still interested in this feature, maybe this library is useful for you. Here are some examples: https://macias.info/entry/202508160000_gostream_meets_goiter.md
