CsvProcessor is a simple, fast Go library for transforming and splitting CSV files.
The file is streamed as it is processed, so the library can handle large files that are several gigabytes in size.
Install:
Use go get to install the latest version of the library.
go get -u github.com/sivaramasubramanian/csvprocessor
Import:
import "github.com/sivaramasubramanian/csvprocessor"
// To split a file into multiple files
inputFile := "/path/to/input.csv"
rowsPerFile := 100_000
// %03d will be replaced by split id - 001, 002, etc.
outputFilenameFormat := "/path/to/output_%03d.csv"
// no-op transformer does not transform the rows
transformer := csvprocessor.NoOpTransformer()
c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, transformer)
if err != nil {
log.Printf("error while creating csvprocessor %v ", err)
return
}
// processes and splits the file
err = c.Process()
if err != nil {
log.Printf("error while splitting csv %v ",err)
}
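The output filename format is an ordinary printf-style pattern. The sketch below uses only the standard library to show how `%03d` expands for successive split ids; `splitFilename` is a hypothetical helper for illustration, not part of csvprocessor, and it assumes ids start at 1 as the comment above suggests.

```go
package main

import "fmt"

// splitFilename expands an output filename format for one split id.
// Hypothetical helper for illustration only; not a csvprocessor function.
func splitFilename(format string, splitID int) string {
	return fmt.Sprintf(format, splitID)
}

func main() {
	format := "/path/to/output_%03d.csv"
	// Assumption: split ids start at 1.
	for id := 1; id <= 3; id++ {
		fmt.Println(splitFilename(format, id))
	}
}
```

Zero-padding to three digits keeps the output files in order when sorted by name, up to 999 splits.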
For example, to convert all the values in the 3rd column to uppercase:
inputFile := "/path/to/input.csv"
rowsPerFile := 100_000
outputFilenameFormat := "/path/to/output_%03d.csv"
upperCaseTransformer := func(ctx context.Context, row []string) []string {
isHeader, _ := (ctx.Value(csvprocessor.CtxIsHeader)).(bool)
if isHeader {
// ignoring header rows
return row
}
if len(row) > 2 {
// convert the 3rd column value to Upper case
row[2] = strings.ToUpper(row[2])
}
// return the modified row.
return row
}
c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, upperCaseTransformer)
if err != nil {
log.Printf("error while creating csvprocessor %v ", err)
return
}
err = c.Process()
if err != nil {
log.Printf("error while splitting csv %v ",err)
}
For some commonly used cases, there are pre-defined transformer functions. For example, to add a row number column to a CSV:
inputFile := "/path/to/input.csv"
rowsPerFile := 100_000
outputFilenameFormat := "/path/to/output_%03d.csv"
addRowNumberTransformer := csvprocessor.AddRowNoTransformer()
c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, addRowNumberTransformer)
if err != nil {
log.Printf("error while creating csvprocessor %v ", err)
return
}
err = c.Process()
if err != nil {
log.Printf("error while splitting csv %v ",err)
}
See transformers.go for more pre-defined transformer functions.
To transform a CSV without splitting it into multiple parts, set rowsPerFile to a value equal to or greater than the total number of rows in the file.
inputFile := "/path/to/input-with-1500-rows.csv"
rowsPerFile := 1500 // the entire input file has only 1500 rows, so only one output file will be generated
outputFilenameFormat := "/path/to/output.csv" // we can omit the %d format as there will be only one output file
addRowNumberTransformer := csvprocessor.AddRowNoTransformer("S.no") // pass the column name for the row number column
c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, addRowNumberTransformer)
if err != nil {
log.Printf("error while creating csvprocessor %v ", err)
return
}
err = c.Process()
if err != nil {
log.Printf("error while splitting csv %v ",err)
}
To apply a series of transformations to each row, we can chain the transformers:
inputFile := "/path/to/input.csv"
rowsPerFile := 100_000
outputFilenameFormat := "/path/to/output_%03d.csv"
// add row number within current chunk
addChunkRowNumber := csvprocessor.AddRowNoTransformer("Chunk Row no.")
// add overall row number
addRowNumber := csvprocessor.AddRowNoTransformer("Row no.")
// add column 'User' with value 'siva' at index 4; 0-based indexing.
addAConstantColumn := csvprocessor.AddConstantColumnTransformer("User", "siva", 4)
// Replace all values 'Madras' with 'Chennai'
replacements := make(map[string]string)
replacements["Madras"]="Chennai"
replaceValues := csvprocessor.ReplaceValuesTransformer(replacements)
// chain all these transformations
combinedTransformer := csvprocessor.ChainTransformers(
addChunkRowNumber,
addRowNumber,
addAConstantColumn,
replaceValues,
)
// performs all the transformations
c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, combinedTransformer)
if err != nil {
log.Printf("error while creating csvprocessor %v ", err)
return
}
err = c.Process()
if err != nil {
log.Printf("error while splitting csv %v ",err)
}
To provide your own logger:
// any custom logger that implements `func(format string, args ...any)`
logger := logrus.New().Debugf
c, err := csvprocessor.New(
csvprocessor.WithFileReader("input.csv"),
csvprocessor.WithOutputFileFormat("output.csv"),
csvprocessor.WithTransformer(csvprocessor.NoOpTransformer()),
csvprocessor.WithChunkSize(100),
csvprocessor.WithLogger(logger),
)
if err != nil {
log.Printf("error while creating csvprocessor %v ",err)
return
}
err = c.Process()
if err != nil {
log.Printf("error while splitting csv %v ",err)
}
For example, to wrap the transformer with debug statements:
logFunc := logrus.WithField("user", "1234").Infof
inputFile := "/path/to/input.csv"
rowsPerFile := 5
outputFilenameFormat := "/path/to/input_%03d.csv"
// add row number within current chunk
addChunkRowNumber := csvprocessor.AddRowNoTransformer("Chunk Row no.")
// print debug statements for each row
debug := csvprocessor.DebugWrapper
// recover from panic in the wrapped transformer function
panicSafe := csvprocessor.PanicSafe
// chain all these transformations
wrappedTransformer := panicSafe(debug(addChunkRowNumber, logFunc), logFunc)
// performs all the transformations
c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, wrappedTransformer)
if err != nil {
log.Printf("error while creating csvprocessor %v ", err)
return
}
err = c.Process()
if err != nil {
log.Printf("error while splitting csv %v ", err)
}
- csvprocessor
- Transformer
- Wrapper
- Helper Functions including merger
- [-] Unit tests
- [-] Benchmarking
Contributions are always welcome!
See contributing.md for ways to get started.
Please adhere to this project's code of conduct.
See LICENSE