Golang proposal: strings: SplitAny and CountAny

Proposal Details

The strings package contains the function Split, which splits a string wherever the separator string occurs. Only one separator string can be specified.

There are use cases where one wants to split on any of a collection of characters. Often FieldsFunc is recommended for this. However, FieldsFunc has a bug in that it skips leading and trailing separators. This behaviour cannot be fixed, just documented.

In order to make it possible to split a string on any of several characters there should be functions analogous to IndexAny, namely SplitAny and CountAny.

SplitAny would have the signature func SplitAny(s, chars string) []string, while CountAny would be func CountAny(s, chars string) int.

SplitAny splits the string on any character in chars, while CountAny returns how many times any of the characters in chars appears in the supplied string.

There could be another function SplitAnyN with the signature func SplitAnyN(s, chars string, n int) []string that limits the splitting to a maximum of n strings.

I attach file split_any.zip where SplitAny and CountAny have been implemented as an example.

Comment From: ianlancetaylor

You mention FieldFunc in the description but I assume you mean FieldsFunc (with an "s").

Comment From: ianlancetaylor

Can you add a comment with an example or two showing the exact behavior difference? Thanks.

Comment From: xformerfhs

Yes, you are right. I meant FieldsFunc. Sorry for the typo.

Here are some examples:

   source := ":something,to:split-"

   parts := strings.Split(source, ":")
   // parts is [ "" "something,to" "split-" ]

   separators := ":,-.;"
   parts = strings.SplitAny(source, separators)
   // parts is [ "" "something" "to" "split" "" ]

   count := strings.CountAny(source, separators)
   // count is 4 (for the 4 found characters ':', ',', ':' and '-')

   separators = "o,t.;"
   parts = strings.SplitAny(source, separators)
   // parts is [ ":s" "me" "hing" "" "" ":spli" "-" ]

   parts = strings.SplitAnyN(source, separators, 2)
   // parts is [ ":s" "mething,to:split-" ]

I hope this helps to clarify the proposal. I will be glad to provide any more information that is deemed necessary.

Comment From: jub0bs

@xformerfhs I'm wary of adding more Split* functions that return a slice (as opposed to an iterator) in the standard library. In my experience, such functions tend to be misused (e.g. for splitting untrusted data); see https://nvd.nist.gov/vuln/detail/CVE-2025-22868, for instance.

Comment From: xformerfhs

Hi, @jub0bs, thanks for your comment.

I see that you have reported a security vulnerability that was caused by using strings.Split without checking, limiting or cleaning the string to be split. It was fixed by calling strings.Count first and only splitting when that returns the expected number of fields.

I agree that using Split and the like is dangerous when the programmer does not check the string to split. Splitting definitely has security implications.

However, a strings.SplitAny function is missing. There ought to be a way of splitting a string on multiple different characters, not only on one separator.

What are the possible alternatives?

Function	Impact
`SplitAny`	Programmers have to be warned that they ought to check whether the string to split has the correct format, count the fields with `CountAny`, or remove unwanted characters. This should be documented.
`SplitAnyN`	This is much safer. If handled correctly, there is no vulnerability. However, setting `n` to a negative number will effectively turn `SplitAnyN` into `SplitAny`.
`SplitAnySeq`	This is the safest form, but can make the program more cumbersome and less readable.

I think of my use case: The user specifies two encodings, one for the input file and one for the output file, as a flag such as -encodings win1252:utf8. The separator may be : or ,. With SplitAnyN this would look like this:

   ...
   if len(encodingsFlagValue) < minEncodingLen || len(encodingsFlagValue) > maxEncodingLen {
      return errors.New("invalid length of encodings")
   }

   encodings := strings.SplitAnyN(encodingsFlagValue, ":,", 3)
   if len(encodings) > 2 {
      return errors.New("invalid number of encodings")
   }

   var inputEncoding string
   var outputEncoding string

   inputEncoding = encodings[0]

   if len(encodings) == 1 {
      outputEncoding = inputEncoding
   } else {
      outputEncoding = encodings[1]
   }
   ...

This is simple and straightforward.

Now the same with an iterator:

   ...
   var inputEncoding string
   var outputEncoding string

   var haveInputEncoding bool
   for encoding := range strings.SplitAnySeq(encodingsFlagValue, ":,") {
      if !haveInputEncoding {
         inputEncoding = encoding
         haveInputEncoding = true
      } else {
         outputEncoding = encoding
         break
      }
   }
   if len(outputEncoding) == 0 {
      outputEncoding = inputEncoding
   }
   ...

This is much less readable and understandable.

So, I think SplitAnyN is a sensible way to go, with the warning that one must not use an n that is less than 1 and that one should check the input for an appropriate length.

Even SplitAny would be a way to go, with the clear warning that it may cause a security vulnerability if the source is not checked and that SplitAnyN and SplitAnySeq are better alternatives.

Comment From: as

The alternative is to normalize the separators into one separator and then call the split function.

   source := ":something,to:split-"
   source = strings.ReplaceAll(source, ":", ",")
   source = strings.ReplaceAll(source, "-", ",")
   fmt.Printf("%q\n", strings.Split(source, ","))

Comment From: xformerfhs

The alternative is to normalize the separators into one separator and then call the split function.

While this yields the correct result, it has three disadvantages:

It allocates memory for two additional strings, and allocations are comparatively slow.
It copies the string twice, resulting in additional CPU overhead.
It is cumbersome, hard to read and does not convey what is meant. Someone reading this would have to figure out why all this replacing takes place. This makes it harder to understand the meaning of the code.

Using strings.SplitAny(source, ":,-") is short, simple and understandable at first glance. No unnecessary memory allocations, no unnecessary copying.