What version of Go are you using (go version)?

$ go version
go version go1.20.2 darwin/amd64

Does this issue reproduce with the latest release?

yes and with 1.19 on the Go Playground

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/andrew/Library/Caches/go-build"
GOENV="/Users/andrew/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/andrew/Go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/andrew/Go:/Users/andrew/Dropbox/suneido_tests"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/andrew/Dropbox/gsuneido/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/dc/0y3gfqnn56g6pd8d3_b5wz300000gn/T/go-build1311023722=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version go1.20.2 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.20.2
uname -v: Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64
ProductName:        macOS
ProductVersion:     13.2.1
BuildVersion:       22D68
lldb --version: lldb-1400.0.38.17
Apple Swift version 5.7.2 (swiftlang-5.7.2.135.5 clang-1400.0.29.51)

What did you do?

pat := regexp.MustCompile(`0A|0[aA]`)
fmt.Println(pat.MatchString("0a"))

(https://go.dev/play/p/jCLPp8AXnJB)

What did you expect to see?

true

What did you see instead?

false

I tested on regex101 to confirm that this should match. (Strangely, it shows golang matching) It appears to compile incorrectly. Simplify doesn't seem to matter.

syntax.Parse("0A|0[aA]", 0)
fmt.Println(syntax.Compile(re))
  0 fail
  1*    rune1 "0" -> 2
  2 rune1 "A" -> 3
  3 nop -> 4
  4 match

syntax.Parse("0a|0[aA]", 0)
fmt.Println(syntax.Compile(re))
  0 fail
  1*    rune1 "0" -> 2
  2 rune "AAaa" -> 3
  3 match

Comment From: cherrymui

cc @rsc

Comment From: ianlancetaylor

The regexp 0A|0[aA] means 0(A|0)[aA]. You seem to be looking for (0A)|(0[aA]).

Comment From: randall77

From https://github.com/google/re2/wiki/Syntax , it says

The operator precedence, from weakest to strongest binding, is first alternation, then concatenation, and finally the repetition operators... Some examples: ab|cd is equivalent to (ab)|(cd)

That makes me think the OP's regexp is (0A)|(0[aA]), not 0(A|0)[aA].

Comment From: crisman

The regexp 0A|0[aA] means 0(A|0)[aA]. You seem to be looking for (0A)|(0[aA]).

No, re2 flavor regex don't bind that way, see https://github.com/google/re2/wiki/Syntax

Some examples: ab|cd is equivalent to (ab)|(cd);

https://go.dev/play/p/EQ6nkEEHJCT

For re `aB|a[cd]` parsed as `a[Bc-d]`
For re `0A|0[aA]` parsed as `0A(?:)`
For re `0a|0[aA]` parsed as `0[Aa]`

Notice the second one parsed as the string 0A followed by an empty non-capturing group, which is just wrong. It should be similar to the third item where the alternation collapsed and the second character is casefolded a (either [aA] or (?i:a)).

An prefix followed by a capital letter with a character class with capital and lower case seems broken, larger prefixes are still broken (see the play link above)

Comment From: crisman

I think that syntax.Regexp Equal() is broken between literals and caseFolded literals.

justA, _ := syntax.Parse(`A`, syntax.Perl)
foldA, _ := syntax.Parse(`(?i:A)`, syntax.Perl)
fmt.Println(justA.Equal(foldA)) // should not be true

Given the odd pattern as above (0A|0[aA]).

In src/regexp/syntax/parse.go once factor() pulls out the 0 prefix it then tries to factor() between A and (?i:A) (what [aA] is parsed as) and is able to find a prefix of A as Equal() says true. Once the code goes wrong it then does some sillyness with alternation between two empty non-capturing groups, which makes no sense as one would expect down a wrong branch.

In src/regexp/syntax/regexp.go Equal() under the case OpLiteral does not have any logic for the FoldCase flag, which I think is the error.

At first glance the logic in re2/regexp.cc TopEqual() for Literal does care about FoldCase which lets me be more sure this is what needs fixing.

Comment From: apmckinlay

The regexp 0A|0[aA] means 0(A|0)[aA]. You seem to be looking for (0A)|(0[aA]).

Even if that was the case, there would still be a problem because 0A|0[aA] is compiling and working very differently from 0a|0[aA] (as I showed)

Comment From: ianlancetaylor

Apologies for being confused.

Comment From: dolmen

$ perl -E 'say ("0a" =~ /0A|0[Aa]/)'
1
$ go version
go version go1.21.5 darwin/arm64
$ go run github.com/dolmen-go/goeval@latest -i=regexp -i=fmt 'fmt.Println(regexp.MustCompile("0A|0[Aa]").MatchString("0a"))' 
false

Comment From: dolmen

@apmckinlay You must use the regexp/syntax.Perl flag with regexp/syntax.Parse to match regexp.Compile behaviour.

Here is your example fixed: https://go.dev/play/p/OrLoZxhMwMN

But the issue is still there.

Comment From: junyer

RE2 has a similar bug, FWIW, but it's caused by this assumption. I haven't looked into what the Go regexp package is doing, but I suspect that @crisman is correct or, at the very least, on the right track.

Comment From: gopherbot

Change https://go.dev/cl/569735 mentions this issue: regexp: fix compiling alternate patterns of different fold case literals