Currently, import paths have the following lexical restrictions (see module.CheckImportPath):
- Must consist of valid path elements, separated by slashes. Must not begin or end with a slash.
- A valid path element is a non-empty string that consists of ASCII letters, ASCII digits, and the punctuation characters - . _ ~ (dash, dot, underscore, tilde). Must not end with a dot or contain two dots in a row.
- A path element prefix up to the first dot must not be a reserved name on Windows, regardless of case (CON, com1, ...). An element must not have a suffix of a tilde followed by ASCII digits (like a Windows short name).
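As a rough illustration of the import path rules listed above, here is a minimal standard-library sketch. It is not the code of module.CheckImportPath; the names checkImportElem, checkImportPathSketch, and reservedPrefixes are hypothetical, and the reserved-name list contains only the two names quoted in the text because the rest of that list is elided there.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedPrefixes holds only the two names quoted in the text ("CON, com1, ...").
// The rest of the list is elided there and is deliberately not filled in here.
var reservedPrefixes = []string{"CON", "COM1"}

// isASCIIDigits reports whether s is non-empty and made only of ASCII digits.
func isASCIIDigits(s string) bool {
	return s != "" && strings.Trim(s, "0123456789") == ""
}

// checkImportElem applies the per-element rules from the list above.
func checkImportElem(elem string) error {
	if elem == "" {
		return fmt.Errorf("empty path element")
	}
	for _, r := range elem {
		letter := ('a' <= r && r <= 'z') || ('A' <= r && r <= 'Z')
		digit := '0' <= r && r <= '9'
		punct := r == '-' || r == '.' || r == '_' || r == '~'
		if !letter && !digit && !punct {
			return fmt.Errorf("invalid char %q in %q", r, elem)
		}
	}
	if strings.HasSuffix(elem, ".") || strings.Contains(elem, "..") {
		return fmt.Errorf("%q ends with a dot or contains two dots in a row", elem)
	}
	// The prefix up to the first dot must not be a reserved Windows name.
	prefix := elem
	if i := strings.Index(elem, "."); i >= 0 {
		prefix = elem[:i]
	}
	for _, name := range reservedPrefixes {
		if strings.EqualFold(prefix, name) {
			return fmt.Errorf("%q is a reserved name on Windows", prefix)
		}
	}
	// A tilde followed only by ASCII digits looks like a Windows short name.
	if i := strings.LastIndex(elem, "~"); i >= 0 && isASCIIDigits(elem[i+1:]) {
		return fmt.Errorf("%q looks like a Windows short name", elem)
	}
	return nil
}

// checkImportPathSketch splits the path on slashes and checks each element.
func checkImportPathSketch(path string) error {
	if strings.HasPrefix(path, "/") || strings.HasSuffix(path, "/") {
		return fmt.Errorf("%q begins or ends with a slash", path)
	}
	for _, elem := range strings.Split(path, "/") {
		if err := checkImportElem(elem); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"example.com/foo/bar", "example.com/con.txt", "example.com/foo~1", "example.com/题目"} {
		fmt.Printf("%q: %v\n", p, checkImportPathSketch(p))
	}
}
```

Under these rules a non-ASCII element such as 题目 is rejected at the character check, which is the behavior several comments in this thread ask to relax.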
Module paths have the same restrictions as import paths, with additional constraints (see module.CheckPath):
- The first path element (by convention, a domain name) must contain only lower-case ASCII letters, ASCII digits, dots, and dashes. It must contain at least one dot and must not start with a dash.
- If the path ends with /vN where N consists of ASCII digits and dots, N must not begin with 0, must not be 1, and must not contain any dots (there's a separate special case for gopkg.in/... module paths).
- No path element may begin with a dot.
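The additional module path constraints can be sketched the same way. This is an illustrative approximation using only the standard library, not the implementation of module.CheckPath. The function names are hypothetical, the import path checks are assumed to have been applied already, and the gopkg.in special case mentioned above is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// checkFirstElem applies the rules for the leading (domain-like) element.
func checkFirstElem(first string) error {
	if !strings.Contains(first, ".") {
		return fmt.Errorf("%q: missing dot in first path element", first)
	}
	if first[0] == '-' {
		return fmt.Errorf("%q: leading dash in first path element", first)
	}
	for _, r := range first {
		ok := ('a' <= r && r <= 'z') || ('0' <= r && r <= '9') || r == '.' || r == '-'
		if !ok {
			return fmt.Errorf("%q: invalid char %q in first path element", first, r)
		}
	}
	return nil
}

// checkMajorSuffix validates a trailing /vN where N is made of digits and dots.
// Paths without such a suffix pass unchanged.
func checkMajorSuffix(path string) error {
	i := strings.LastIndex(path, "/v")
	if i < 0 {
		return nil
	}
	n := path[i+2:]
	if n == "" || strings.Trim(n, "0123456789.") != "" {
		return nil // the last element is not of the form vN
	}
	switch {
	case strings.Contains(n, "."):
		return fmt.Errorf("%q: major version suffix must not contain dots", path)
	case n[0] == '0':
		return fmt.Errorf("%q: major version suffix must not begin with 0", path)
	case n == "1":
		return fmt.Errorf("%q: major version suffix must not be v1", path)
	}
	return nil
}

// checkModulePathExtras combines the three module-specific rules.
func checkModulePathExtras(path string) error {
	elems := strings.Split(path, "/")
	if err := checkFirstElem(elems[0]); err != nil {
		return err
	}
	for _, e := range elems {
		if strings.HasPrefix(e, ".") {
			return fmt.Errorf("%q: path element begins with a dot", e)
		}
	}
	return checkMajorSuffix(path)
}

func main() {
	for _, p := range []string{"example.com/mod/v2", "example.com/mod/v1", "Example.com/mod", "example.com/.hidden"} {
		fmt.Printf("%q: %v\n", p, checkModulePathExtras(p))
	}
}
```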
File paths have the same restrictions as import paths, but the set of allowed characters is larger (see module.CheckFilePath):
- Path elements may consist of Unicode letters, ASCII digits, ASCII spaces, and the ASCII punctuation characters ! # $ % & ( ) + , - . = @ [ ] ^ _ { } ~. The remaining ASCII punctuation characters " * < > ? ` ' | / \ : are excluded.
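The wider file path character set could be approximated as follows. This is a sketch of the character rule only, assuming unicode.IsLetter is a reasonable stand-in for "Unicode letters"; fileRuneOK and firstBadRune are hypothetical helper names, not part of the module package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// allowedPunct is the ASCII punctuation permitted in file path elements,
// copied from the list above.
const allowedPunct = "!#$%&()+,-.=@[]^_{}~"

// fileRuneOK reports whether r may appear in a file path element.
func fileRuneOK(r rune) bool {
	if r < utf8.RuneSelf {
		return ('a' <= r && r <= 'z') || ('A' <= r && r <= 'Z') ||
			('0' <= r && r <= '9') || r == ' ' ||
			strings.ContainsRune(allowedPunct, r)
	}
	// Outside ASCII, only letters are accepted in this sketch.
	return unicode.IsLetter(r)
}

// firstBadRune returns the first disallowed rune in elem, if any.
func firstBadRune(elem string) (rune, bool) {
	for _, r := range elem {
		if !fileRuneOK(r) {
			return r, true
		}
	}
	return 0, false
}

func main() {
	for _, name := range []string{"题目.txt", "a b (1).txt", "node:alpine", "😻.txt"} {
		if r, bad := firstBadRune(name); bad {
			fmt.Printf("%q: invalid char %q\n", name, r)
		} else {
			fmt.Printf("%q: ok\n", name)
		}
	}
}
```

With this rule, file names made of CJK letters pass, while a colon or an emoji (a symbol, not a letter) is rejected.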
These restrictions are generally in place for good reasons (see Unicode restrictions):
- Module paths are frequently written and encoded into URLs, and we don't want to allow strings that interfere with that (for example, non-ASCII domain names).
- Module contents are extracted into directories on a variety of systems. We don't want to allow strings that aren't valid file names or might collide with a different string (on case-insensitive or Unicode normalizing systems). We don't want to allow strings that are reserved, might be interpreted by the shell, might be interpreted as a flag (starting with -), or might be interpreted as a repository (.git).
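To make the collision concern concrete, here is a small sketch that finds file paths which would collide on a case-insensitive file system. It uses strings.ToLower as a simple approximation of case folding and does not handle Unicode normalization, which would need a package outside the standard library. The function name caseCollisions is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// caseCollisions returns pairs of distinct paths that become equal once
// letter case is ignored. strings.ToLower approximates case folding here.
func caseCollisions(paths []string) [][2]string {
	seen := make(map[string]string) // lower-cased path -> first spelling seen
	var out [][2]string
	for _, p := range paths {
		key := strings.ToLower(p)
		if prev, ok := seen[key]; ok {
			if prev != p {
				out = append(out, [2]string{prev, p})
			}
			continue
		}
		seen[key] = p
	}
	return out
}

func main() {
	files := []string{"README.md", "readme.md", "cmd/main.go"}
	for _, pair := range caseCollisions(files) {
		fmt.Printf("%q collides with %q\n", pair[0], pair[1])
	}
}
```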
That being said, these restrictions are more English-centric than necessary (#45507). They're also more restrictive than GOPATH (#29101).
We should come up with a wider set of characters that may be allowed without causing compatibility problems, particularly for import and file paths.
cc @bcmills @matloob
Comment From: duolabmeng6
Please support Chinese characters
Comment From: ddbxyrj
For cultural diversity, maybe we should take more Unicode character types into consideration.
Comment From: FiloSottile
Related: the handling of punycode domains. #20210
Comment From: FiloSottile
Also related, the conclusion that it's up to review tooling to keep homoglyph or LTR/RTL attacks at bay. https://research.swtch.com/trojan
Comment From: sxin0
Please support Chinese characters
Comment From: FiloSottile
Also related, #44970 discusses spec interactions.
Comment From: CodeNightOwl
Please support Chinese characters
go1.15.15 works normally; later versions report errors.
Comment From: cx-shahar-septon
Proposal: skip checking resource file names. For example, the package "github.com/google/wuffs" contains a file named 😻.txt. The file is not part of the module, but a resource used for tests. Its path is within Unicode standards. I would like to think the rules can be more flexible here ;)
Comment From: yangyile1990
When I use Go 1.15 without go.mod, my Go package can be named "ACM题目小马过河".
After I switch to go.mod with go1.20 or go1.21, it reports that the name is not supported.
For me, "ACM题目小马过河" is easier to understand than "ACM topic Pony Crossing the River".
So I think it's important to support native languages.
If you think this could cause mistakes, you could add a flag such as "support_native_language". When I turn it on, my package might not become popular, but it is only for fun.
Comment From: CodeNightOwl
For now I am simply using version 1.15; let's talk again when there is a solution.
Comment From: SgtCoDFish
Since #66243 was closed as a dupe of this issue, it's worth pointing out here that this issue seems to break the Go Sum DB. As an example, https://sum.golang.org/lookup/github.com/!doppler!h!q/cli@v0.5.9 currently has the following output:
not found: create zip: docker/node:alpine: malformed file path "docker/node:alpine": invalid char ':'
docker/python:alpine: malformed file path "docker/python:alpine": invalid char ':'
docker/ruby:alpine: malformed file path "docker/ruby:alpine": invalid char ':'
This seems to be because there are files in the repo whose names contain colons.
(It may be a separate bug that the Go sum DB prints errors like that as output.)
Comment From: matloob
Closing this issue since #67562 has been opened as a proposal to do something similar.