Proposal Details
Currently there is no easy and reliable way to access the end position of a token through `go/scanner`.

It can be done to some extent with:
```go
pos, tok, lit := s.Scan()
tokLength := len(lit)
if !tok.IsLiteral() && tok != token.COMMENT {
    tokLength = len(tok.String())
}
tokEnd := pos + token.Pos(tokLength)
```
It looks correct, but actually it is not. There are a few issues:

- `len(lit)` (line 2) is wrong for comments and raw string literals, since carriage returns (`'\r'`) are not included in the literal, thus for such cases `tokEnd` is already wrong (see the sketch after this list).
- When a file ends (just before an EOF token) and `impliedSemi == true`, an artificial `SEMICOLON` token is emitted. Say you want to inspect the whitespace between tokens:

```go
func TestScanner(t *testing.T) {
    const src = "package a; var a int"

    file := token.NewFileSet().AddFile("", -1, len(src))
    var s scanner.Scanner
    s.Init(file, []byte(src), func(pos token.Position, msg string) {
        panic("unreachable: " + msg)
    }, scanner.ScanComments)

    prevEndOff := 0
    for {
        pos, tok, lit := s.Scan()
        t.Logf("%v %v %v", pos, tok, lit)
        off := file.Offset(pos)
        white := src[prevEndOff:off] // panics when tok == EOF
        for _, c := range white {
            switch c {
            case ' ', '\t', '\n', '\r', '\ufeff':
            default:
                panic("unreachable: " + strconv.QuoteRune(c))
            }
        }
        t.Logf("%q", white)
        tokLength := len(lit)
        if !tok.IsLiteral() && tok != token.COMMENT {
            tokLength = len(tok.String())
        }
        prevEndOff = off + tokLength
        if tok == token.EOF {
            break
        }
    }
}
```

This code panics because of the artificial `SEMICOLON` token.
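To make the first point concrete, here is a minimal sketch (a hypothetical standalone program; the printed offsets assume the exact `src` shown) demonstrating that `pos + len(lit)` falls one byte short for a raw string literal containing a carriage return:

```go
package main

import (
    "fmt"
    "go/scanner"
    "go/token"
    "strings"
)

func main() {
    // A raw string literal that contains a carriage return.
    src := "package a\nvar s = `x\r\ny`\n"

    fset := token.NewFileSet()
    file := fset.AddFile("", -1, len(src))

    var s scanner.Scanner
    s.Init(file, []byte(src), nil, 0)

    for {
        pos, tok, lit := s.Scan()
        if tok == token.EOF {
            break
        }
        if tok == token.STRING {
            // The scanner strips '\r' from raw string literals, so the
            // workaround's end computation falls one byte short.
            computedEnd := file.Offset(pos) + len(lit)
            realEnd := strings.Index(src, "y`") + len("y`")
            fmt.Println(computedEnd, realEnd) // prints: 23 24
        }
    }
}
```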
To solve such problems, and to simplify the logic, I propose adding the following new API to `go/scanner`:
```go
package scanner // go/scanner

// Pos returns the current position in the source where the next Scan call
// will begin tokenizing.
// It also represents the end position of the previous token.
func (s *Scanner) Pos() token.Pos {
    return s.file.Pos(s.offset)
}
```
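As an illustration, the whitespace-inspection test above could then be written without guessing token lengths. This is only a hypothetical sketch: it relies on the proposed `Pos` accessor and therefore does not compile against the current `go/scanner`:

```go
// Hypothetical rewrite of the test above, relying on the proposed
// (*scanner.Scanner).Pos accessor.
func TestScannerPos(t *testing.T) {
    const src = "package a; var a int"

    file := token.NewFileSet().AddFile("", -1, len(src))
    var s scanner.Scanner
    s.Init(file, []byte(src), nil, scanner.ScanComments)

    prevEndOff := 0
    for {
        pos, tok, _ := s.Scan()
        white := src[prevEndOff:file.Offset(pos)] // whitespace preceding the token
        t.Logf("%q", white)
        prevEndOff = file.Offset(s.Pos()) // end offset of the token just scanned
        if tok == token.EOF {
            break
        }
    }
}
```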
CC @adonovan @findleyr
Comment From: gopherbot
Change https://go.dev/cl/694615 mentions this issue: go/parser: properly calculate the end position of comments
Comment From: mateusz834
And just to note that the same issue with `\r` exists in the `go/parser` positions returned by `End()`:
https://github.com/golang/go/blob/fbac94a79998d4730a58592f0634fa8a39d8b9fb/src/go/ast/ast.go#L64-L67
https://github.com/golang/go/blob/fbac94a79998d4730a58592f0634fa8a39d8b9fb/src/go/ast/ast.go#L313-L317
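For example (a hypothetical standalone program; the printed offsets assume the exact `src` shown), the `End()` of a raw string literal containing a carriage return lands one byte before the literal's real end, because the stored `Value` has the `\r` stripped:

```go
package main

import (
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"
    "strings"
)

func main() {
    // A raw string literal that contains a carriage return.
    src := "package a\nvar s = `x\r\ny`\n"

    fset := token.NewFileSet()
    f, err := parser.ParseFile(fset, "a.go", src, 0)
    if err != nil {
        panic(err)
    }

    lit := f.Decls[0].(*ast.GenDecl).Specs[0].(*ast.ValueSpec).Values[0]
    reportedEnd := fset.Position(lit.End()).Offset
    realEnd := strings.Index(src, "y`") + len("y`")
    fmt.Println(reportedEnd, realEnd) // prints: 23 24
}
```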
This does not try to solve that, but it makes the situation better, since with such an API you could pipe the source starting at `node.Pos()` into `go/scanner`, then scan a single token and look at `Pos()`. This is still a workaround, but at least it would then be possible to get the correct end position (if needed).
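A hypothetical sketch of that workaround (the helper name `trueEndOffset` is made up, and it relies on the proposed `Pos` accessor, so it does not compile today); it is only meaningful for single-token nodes such as `*ast.BasicLit` or `*ast.Comment`:

```go
package endpos

import (
    "go/ast"
    "go/scanner"
    "go/token"
)

// trueEndOffset (hypothetical) recomputes the real end offset of a
// single-token node by re-scanning one token starting at node.Pos() and
// reading the proposed Scanner.Pos.
func trueEndOffset(fset *token.FileSet, src []byte, node ast.Node) int {
    start := fset.File(node.Pos()).Offset(node.Pos())

    // Scan the sub-slice in a throwaway FileSet, so positions are
    // relative to node.Pos().
    sub := token.NewFileSet().AddFile("", -1, len(src)-start)
    var s scanner.Scanner
    s.Init(sub, src[start:], nil, scanner.ScanComments)
    s.Scan() // scan exactly one token

    return start + sub.Offset(s.Pos()) // proposed: end of the previous token
}
```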
Comment From: mateusz834
Uhhh, actually it would not 100% solve the second issue/point; it would still require some workarounds.
https://github.com/golang/go/blob/fbac94a79998d4730a58592f0634fa8a39d8b9fb/src/go/scanner/scanner.go#L803-L804
I am curious why the scanner does such a thing. EDIT: #54941
EDIT2:
Actually this behaviour kind of conflicts with the proposed API, since:

```go
// Pos returns the current position in the source where the next Scan call
// will begin tokenizing.
```

is not 100% true then.