What version of Go are you using (go version)?
$ go version
go version go1.20.2 darwin/amd64
Does this issue reproduce with the latest release?
yes and with 1.19 on the Go Playground
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/andrew/Library/Caches/go-build"
GOENV="/Users/andrew/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/andrew/Go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/andrew/Go:/Users/andrew/Dropbox/suneido_tests"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/andrew/Dropbox/gsuneido/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/dc/0y3gfqnn56g6pd8d3_b5wz300000gn/T/go-build1311023722=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version go1.20.2 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.20.2
uname -v: Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64
ProductName: macOS
ProductVersion: 13.2.1
BuildVersion: 22D68
lldb --version: lldb-1400.0.38.17 Apple Swift version 5.7.2 (swiftlang-5.7.2.135.5 clang-1400.0.29.51)
What did you do?
pat := regexp.MustCompile(`0A|0[aA]`)
fmt.Println(pat.MatchString("0a"))
(https://go.dev/play/p/jCLPp8AXnJB)
What did you expect to see?
true
What did you see instead?
false
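For anyone who wants to run the report locally, the snippet above can be wrapped in a complete program. This is only a sketch: the helper name matches is made up for illustration, and the two extra patterns are variants that should accept the same strings as the reported one. The result printed for the first pattern depends on the Go release in use.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is a hypothetical helper: it reports whether pattern, compiled
// with the standard regexp package, matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// The first pattern is the one from the report. The other two are
	// variants that should accept exactly the same strings ("0A" and "0a").
	for _, pattern := range []string{`0A|0[aA]`, `0a|0[aA]`, `0[aA]`} {
		fmt.Printf("%q matches %q: %v\n", pattern, "0a", matches(pattern, "0a"))
	}
}
```

If the three lines disagree, the release being used still has the problem described here.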
I tested on regex101 to confirm that this should match. (Strangely, it shows golang matching.) It appears to compile incorrectly. Simplify doesn't seem to matter.
re, _ := syntax.Parse("0A|0[aA]", 0)
fmt.Println(syntax.Compile(re))
0 fail
1* rune1 "0" -> 2
2 rune1 "A" -> 3
3 nop -> 4
4 match
re, _ := syntax.Parse("0a|0[aA]", 0)
fmt.Println(syntax.Compile(re))
0 fail
1* rune1 "0" -> 2
2 rune "AAaa" -> 3
3 match
Comment From: cherrymui
cc @rsc
Comment From: ianlancetaylor
The regexp 0A|0[aA] means 0(A|0)[aA]. You seem to be looking for (0A)|(0[aA]).
Comment From: randall77
From https://github.com/google/re2/wiki/Syntax , it says
The operator precedence, from weakest to strongest binding, is first alternation, then concatenation, and finally the repetition operators... Some examples: ab|cd is equivalent to (ab)|(cd)
That makes me think the OP's regexp is (0A)|(0[aA]), not 0(A|0)[aA].
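The precedence rule quoted above can be checked directly. The following is a small sketch, with fullMatch as a made-up helper name; it anchors each pattern so that only whole-string matches count, then compares ab|cd against the two candidate readings.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch is a hypothetical helper: it reports whether pattern matches
// the whole of s. The pattern is wrapped in a non-capturing group so the
// anchors apply to the entire alternation, not just its outer branches.
func fullMatch(pattern, s string) bool {
	return regexp.MustCompile(`^(?:` + pattern + `)$`).MatchString(s)
}

func main() {
	for _, s := range []string{"ab", "cd", "abd", "acd"} {
		fmt.Printf("%-3s  ab|cd: %-5v  (ab)|(cd): %-5v  a(b|c)d: %v\n",
			s, fullMatch(`ab|cd`, s), fullMatch(`(ab)|(cd)`, s), fullMatch(`a(b|c)d`, s))
	}
}
```

"ab" and "cd" are accepted by ab|cd and (ab)|(cd) but not by a(b|c)d, while "abd" and "acd" are accepted only by a(b|c)d, so alternation binds more weakly than concatenation.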
Comment From: crisman
The regexp 0A|0[aA] means 0(A|0)[aA]. You seem to be looking for (0A)|(0[aA]).
No, RE2-flavor regexes don't bind that way; see https://github.com/google/re2/wiki/Syntax
Some examples: ab|cd is equivalent to (ab)|(cd);
https://go.dev/play/p/EQ6nkEEHJCT
For re `aB|a[cd]` parsed as `a[Bc-d]`
For re `0A|0[aA]` parsed as `0A(?:)`
For re `0a|0[aA]` parsed as `0[Aa]`
Notice the second one parsed as the string 0A followed by an empty non-capturing group, which is just wrong. It should be similar to the third item, where the alternation collapsed and the second character is a case-folded a (either [aA] or (?i:a)).
A prefix followed by a capital letter, alternated with the same prefix followed by a character class containing that letter in both upper and lower case, seems broken. Larger prefixes are still broken (see the play link above).
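The code behind the play link is not reproduced in this thread. A program along the following lines would print output in the same shape; this is a guess at the approach, and parsedForm is a made-up helper name. It parses each pattern with the syntax.Perl flags and prints the parser's result back as a pattern.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// parsedForm is a hypothetical helper: it parses pattern with the Perl
// flags and returns the parser's result printed back as a pattern string.
func parsedForm(pattern string) (string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", err
	}
	return re.String(), nil
}

func main() {
	for _, pattern := range []string{`aB|a[cd]`, `0A|0[aA]`, `0a|0[aA]`} {
		s, err := parsedForm(pattern)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("For re `%s` parsed as `%s`\n", pattern, s)
	}
}
```

The exact strings printed may differ between Go releases, so compare the second and third lines with each other instead of against a fixed expectation.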
Comment From: crisman
I think that syntax.Regexp Equal() is broken between literals and case-folded literals.
justA, _ := syntax.Parse(`A`, syntax.Perl)
foldA, _ := syntax.Parse(`(?i:A)`, syntax.Perl)
fmt.Println(justA.Equal(foldA)) // should not be true
Consider the odd pattern above (0A|0[aA]). In src/regexp/syntax/parse.go, once factor() pulls out the 0 prefix, it then tries to factor() between A and (?i:A) (what [aA] is parsed as) and is able to find a prefix of A because Equal() says true. Once the code goes wrong, it then does some silliness with alternation between two empty non-capturing groups, which makes no sense, as one would expect down a wrong branch.
In src/regexp/syntax/regexp.go, Equal() under the case OpLiteral does not have any logic for the FoldCase flag, which I think is the error.
At first glance, the logic in re2/regexp.cc TopEqual() for Literal does care about FoldCase, which makes me more sure this is what needs fixing.
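To make the suggestion concrete, here is a minimal sketch of a literal comparison that also looks at the FoldCase flag. It is not the standard library's code and not the eventual patch. literalsEqual, foldsCase and mustParse are hypothetical helpers written only for this illustration, built on the exported fields of syntax.Regexp.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// foldsCase is a hypothetical helper: it reports whether re carries the
// FoldCase flag.
func foldsCase(re *syntax.Regexp) bool {
	return re.Flags&syntax.FoldCase != 0
}

// literalsEqual is a hypothetical helper, not the standard library's
// Equal. It treats two OpLiteral nodes as equal only when their runes
// match and they agree on the FoldCase flag.
func literalsEqual(x, y *syntax.Regexp) bool {
	if x.Op != syntax.OpLiteral || y.Op != syntax.OpLiteral {
		return false
	}
	if foldsCase(x) != foldsCase(y) || len(x.Rune) != len(y.Rune) {
		return false
	}
	for i, r := range x.Rune {
		if r != y.Rune[i] {
			return false
		}
	}
	return true
}

// mustParse parses pattern with the Perl flags and panics on error.
func mustParse(pattern string) *syntax.Regexp {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	return re
}

func main() {
	justA := mustParse(`A`)
	foldA := mustParse(`(?i:A)`)
	fmt.Println("justA folds case:", foldsCase(justA))
	fmt.Println("foldA folds case:", foldsCase(foldA))
	fmt.Println("literalsEqual(justA, foldA):", literalsEqual(justA, foldA))
	fmt.Println("literalsEqual(justA, justA):", literalsEqual(justA, justA))
}
```

Both nodes hold the single rune A, so a comparison that looks only at the runes cannot tell them apart; the flag check is what separates the plain literal from the case-folded one.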
Comment From: apmckinlay
The regexp 0A|0[aA] means 0(A|0)[aA]. You seem to be looking for (0A)|(0[aA]).
Even if that was the case, there would still be a problem because 0A|0[aA] is compiling and working very differently from 0a|0[aA] (as I showed).
Comment From: ianlancetaylor
Apologies for being confused.
Comment From: dolmen
$ perl -E 'say ("0a" =~ /0A|0[Aa]/)'
1
$ go version
go version go1.21.5 darwin/arm64
$ go run github.com/dolmen-go/goeval@latest -i=regexp -i=fmt 'fmt.Println(regexp.MustCompile("0A|0[Aa]").MatchString("0a"))'
false
Comment From: dolmen
@apmckinlay You must use the regexp/syntax.Perl flag with regexp/syntax.Parse to match regexp.Compile behaviour.
Here is your example fixed: https://go.dev/play/p/OrLoZxhMwMN
But the issue is still there.
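For readers who cannot open the play link, the following sketch shows one way to dump the compiled program with the Perl flags. It is not the code from the link, and compileProg is a made-up helper name. It follows the same parse, simplify, compile sequence that the regexp package is understood to use.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// compileProg is a hypothetical helper: it parses pattern with the Perl
// flags, simplifies the result, compiles it, and returns the program as
// text.
func compileProg(pattern string) (string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", err
	}
	prog, err := syntax.Compile(re.Simplify())
	if err != nil {
		return "", err
	}
	return prog.String(), nil
}

func main() {
	for _, pattern := range []string{`0A|0[aA]`, `0a|0[aA]`} {
		out, err := compileProg(pattern)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s\n%s\n", pattern, out)
	}
}
```

The two patterns accept the same strings, so on a release without the bug neither program should reduce the second character to a single rune1 "A" instruction.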
Comment From: junyer
RE2 has a similar bug, FWIW, but it's caused by this assumption. I haven't looked into what the Go regexp package is doing, but I suspect that @crisman is correct or, at the very least, on the right track.
Comment From: gopherbot
Change https://go.dev/cl/569735 mentions this issue: regexp: fix compiling alternate patterns of different fold case literals