Proposal Details

Currently there is no easy and reliable way to access the End position of a token through go/scanner.

It can be done to some extent with:

```go
pos, tok, lit := s.Scan()
tokLength := len(lit)
if !tok.IsLiteral() && tok != token.COMMENT {
	tokLength = len(tok.String())
}
tokEnd := pos + token.Pos(tokLength)
```

It looks correct, but actually it is not. There are a few issues:

  • len(lit) (line 2) is wrong for comments and raw string literals, since carriage returns ('\r') are not included in the literal, so in those cases the tokEnd is already wrong.
  • When a file ends (just before an EOF token) and impliedSemi==true, an artificial SEMICOLON token is emitted. Say you want to inspect the whitespace between tokens:

```go
func TestScanner(t *testing.T) {
	const src = "package a; var a int"
	file := token.NewFileSet().AddFile("", -1, len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), func(pos token.Position, msg string) {
		panic("unreachable: " + msg)
	}, scanner.ScanComments)

	prevEndOff := 0
	for {
		pos, tok, lit := s.Scan()
		t.Logf("%v %v %v", pos, tok, lit)
		off := file.Offset(pos)

		white := src[prevEndOff:off] // panics when tok == EOF
		for _, c := range white {
			switch c {
			case ' ', '\t', '\n', '\r', '\ufeff':
			default:
				panic("unreachable: " + strconv.QuoteRune(c))
			}
		}
		t.Logf("%q", white)

		tokLength := len(lit)
		if !tok.IsLiteral() && tok != token.COMMENT {
			tokLength = len(tok.String())
		}
		prevEndOff = off + tokLength

		if tok == token.EOF {
			break
		}
	}
}
```

This code panics because of the artificial SEMICOLON token.

To solve such problems, and to simplify the logic, I propose adding the following new API to go/scanner:

```go
package scanner // go/scanner

// Pos returns the current position in the source where the next Scan call
// will begin tokenizing.
// It also represents the end position of the previous token.
func (s *Scanner) Pos() token.Pos {
	return s.file.Pos(s.offset)
}
```
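For illustration, the whitespace-inspection loop from the example above could then look like the following sketch (hypothetical: it will not compile until the proposed Scanner.Pos method actually exists):

```go
prevEndOff := 0
for {
	pos, tok, _ := s.Scan()
	off := file.Offset(pos)
	white := src[prevEndOff:off] // whitespace between the previous and the current token
	_ = white

	// s.Pos() would be the end of the token just returned by Scan, so no
	// per-token-kind length computation is needed, and the artificial
	// SEMICOLON before EOF no longer pushes prevEndOff past the file end.
	prevEndOff = file.Offset(s.Pos())

	if tok == token.EOF {
		break
	}
}
```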

CC @adonovan @findleyr

Comment From: gopherbot

Change https://go.dev/cl/694615 mentions this issue: go/parser: properly calculate the end position of comments

Comment From: mateusz834

Just to note that the same issue with \r exists in the go/parser positions returned by End():

https://github.com/golang/go/blob/fbac94a79998d4730a58592f0634fa8a39d8b9fb/src/go/ast/ast.go#L64-L67

https://github.com/golang/go/blob/fbac94a79998d4730a58592f0634fa8a39d8b9fb/src/go/ast/ast.go#L313-L317

This does not try to solve that, but it makes the situation better: with such an API you could feed the source starting at node.Pos() to go/scanner, scan a single token, and look at Pos(). This is still a workaround, but at least it would then be possible to get the correct end position (if needed).

Comment From: mateusz834

Uhhh, actually it would not solve the second issue/point 100%; it would still require some workarounds.

https://github.com/golang/go/blob/fbac94a79998d4730a58592f0634fa8a39d8b9fb/src/go/scanner/scanner.go#L803-L804

I am curious why the scanner does such a thing. EDIT: #54941

EDIT2:

Actually this behaviour conflicts somewhat with the proposed API, since the documented claim:

// Pos returns the current position in the source where the next Scan call
// will begin tokenizing.

would then not be 100% true.
