What version of Go are you using (go version)?

$ go version
go version go1.12.6 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?

$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/root/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build056785781=/tmp/go-build -gno-record-gcc-switches"
What did you do?
Consider this situation: in my program, at the lowest level of the call stack, I need to convert a Record to a CSV string. So I create a csv.NewWriter around a bytes.Buffer, Write my record, Flush, and finally read the CSV string out of the bytes.Buffer. Everything works fine. But after benchmarking, I found my program running very slowly, and the bufio used inside csv.Writer is the performance killer: for every record, a slice of bufio.defaultBufSize bytes is allocated, and this allocation is not easy to avoid.
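The per-record pattern described above looks roughly like this (encodeRecord is a hypothetical helper for illustration, not the actual service code). Every call pays for the internal bufio.Writer buffer that csv.NewWriter allocates:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// encodeRecord converts a single record to a CSV string. Each call creates a
// fresh csv.Writer, and with it a fresh internal bufio.Writer whose buffer
// (bufio.defaultBufSize, 4096 bytes) is allocated anew every time.
func encodeRecord(record []string) string {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Write(record)
	w.Flush() // the record reaches buf only after Flush
	return buf.String()
}

func main() {
	fmt.Print(encodeRecord([]string{"Hello", "World"})) // prints "Hello,World\n"
}
```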
What did you expect to see?
Leave the choice of whether to use bufio to the user. In the stdlib, write to the raw io.Writer; a user who wants buffering can wrap the destination themselves, like this:
w := csv.NewWriter(bufio.NewWriter(os.Stdout))
Comment From: dsnet
Can you post a small reproduction? The allocation of the buffer in bufio.Writer should only occur once per csv.NewWriter. Are you creating lots of small CSV files?
Comment From: yaozongyou
No, we are not creating lots of CSV files. The situation is that we provide a RESTful service for running SQL on CSV files, so that users can query only the selected records/columns. For example, given the following CSV:
Year,Make,Model
1997,Ford,E350
2000,Mercury,Cougar
running select Make from s3object
will return
Ford
Mercury
running select * from s3object where Year = '2000'
will return
2000,Mercury,Cougar
The steps are roughly as follows: 1. iterate over each record in the CSV file using csv.Reader; 2. test whether the record matches the SQL; 3. if it matches, encode the record to CSV using csv.Writer; 4. wrap the CSV in a custom message format and send the message to the user.
csv.Writer is used in the third step: for each record, we create a new csv.Writer to encode the record to a CSV string, encode that string into a custom message, and write the message to the user.
Why not use one csv.Writer per CSV file instead of per record? 1. The protocol this service speaks is a custom binary message format on top of HTTP chunked encoding; we need to send matching records as soon as possible, report progress periodically, and sometimes send keepalive messages. 2. csv.Writer is used in the deepest call stack, and it is not easy to refactor.
Currently, we use global sync.Pools of bytes.Buffer and bufio.Writer to prevent csv.Writer from allocating a new buffer, something like this:
var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

var bufioWriterPool = sync.Pool{
	New: func() interface{} {
		// ioutil.Discard is just used to create the writer. The actual
		// destination writer is set later by Reset() before use.
		return bufio.NewWriter(ioutil.Discard)
	},
}

bufioWriter := bufioWriterPool.Get().(*bufio.Writer)
buf := bufPool.Get().(*bytes.Buffer)
buf.Reset() // pooled buffers keep their previous contents
bufioWriter.Reset(buf)
w := csv.NewWriter(bufioWriter)
w.Write(record)
w.Flush()
str := buf.String()
// encode str into a custom message and send it to the user
bufioWriterPool.Put(bufioWriter)
bufPool.Put(buf)
Using this approach, we avoid the performance problem, but we still think the Go stdlib should write to the raw io.Writer and leave the choice of buffering to the user.
Comment From: andybons
@dsnet if this is a change that needs fixing, can you add the NeedsFix label? Thanks.
Comment From: dsnet
It's a usage pattern that is certainly not what the encoding/csv package had in mind when it was designed, and thus this pattern hits some pathological behavior. It's worth improving the performance of this use case, but it's not clear to me that the solution is simply to "not use bufio.Writer in csv.Writer".
Comment From: eliben
@yaozongyou could you please provide a minimal benchmark that demonstrates the slowness here?
Comment From: yaozongyou
Here is a minimal benchmark that demonstrates the slowness:
package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"io/ioutil"
	"sync"
	"testing"
)

func BenchmarkCSV_1(b *testing.B) {
	b.ResetTimer()
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			var buf bytes.Buffer
			w := csv.NewWriter(&buf)
			w.Write([]string{"Hello", "World"})
			w.Flush()
			// buf.String() will be "Hello,World\n"
		}
	})
}

var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

var bufioWriterPool = sync.Pool{
	New: func() interface{} {
		// ioutil.Discard is just used to create the writer. The actual
		// destination writer is set later by Reset() before use.
		return bufio.NewWriter(ioutil.Discard)
	},
}

func BenchmarkCSV_2(b *testing.B) {
	b.ResetTimer()
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			buf := bufPool.Get().(*bytes.Buffer)
			buf.Reset()
			// Use a pooled bufio.Writer to prevent csv.Writer from
			// allocating a new buffer.
			bufioWriter := bufioWriterPool.Get().(*bufio.Writer)
			bufioWriter.Reset(buf)
			w := csv.NewWriter(bufioWriter)
			w.Write([]string{"Hello", "World"})
			w.Flush()
			// buf.String() will be "Hello,World\n"
			bufPool.Put(buf)
			bufioWriterPool.Put(bufioWriter)
		}
	})
}
The output in my test environment:
$ go test -cpu 10 -benchtime 100000x -bench .
goos: linux
goarch: amd64
BenchmarkCSV_1-10 100000 1311 ns/op 4272 B/op 4 allocs/op
BenchmarkCSV_2-10 100000 90.9 ns/op 0 B/op 0 allocs/op
PASS
ok _/home/richardyao/xxx 0.146s
From the result, BenchmarkCSV_1 makes 4 allocations per op, while BenchmarkCSV_2 makes none.
Comment From: gopherbot
Change https://go.dev/cl/671596 mentions this issue: encoding/csv: optionally skip creating bufio.Writer