sivaramasubramanian/csvprocessor


CsvProcessor

CsvProcessor is a simple, fast Go library for transforming and splitting CSV files.
The input is streamed rather than loaded into memory all at once, so it can handle files that are several gigabytes in size.

Installation

Install:

Use go get to install the latest version of the library.

go get -u github.com/sivaramasubramanian/csvprocessor

Import:

import "github.com/sivaramasubramanian/csvprocessor"

Usage

Simple Usage

Splitting a single CSV file into multiple files

// To split a file into multiple files

inputFile := "/path/to/input.csv"
rowsPerFile := 100_000
// %03d will be replaced by split id - 001, 002, etc.
outputFilenameFormat := "/path/to/output_%03d.csv"
// no-op transformer does not transform the rows
transformer := csvprocessor.NoOpTransformer()

c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, transformer)
if err != nil {
	log.Printf("error while creating csvprocessor: %v", err)
	return
}

// processes and splits the file
err = c.Process()
if err != nil {
	log.Printf("error while splitting csv: %v", err)
}
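The streaming split described above can be sketched with only the standard library. This is not csvprocessor's implementation: splitCSV and splitString are hypothetical helpers, and repeating the header row in every output file is an assumption. Records are read one at a time with encoding/csv, and a new output is opened every rowsPerFile data rows, so roughly one record is held in memory at a time.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// splitCSV is a hypothetical helper, not part of csvprocessor. It streams
// records from r and starts a new output every rowsPerFile data rows.
// Assumption: the first record is a header that is repeated in every output.
func splitCSV(r io.Reader, rowsPerFile int, nameFormat string, open func(name string) io.Writer) error {
	if rowsPerFile < 1 {
		return fmt.Errorf("rowsPerFile must be positive, got %d", rowsPerFile)
	}
	reader := csv.NewReader(r)
	header, err := reader.Read()
	if err == io.EOF {
		return nil // empty input: nothing to write
	}
	if err != nil {
		return err
	}

	var w *csv.Writer
	fileID, rowsInFile := 0, 0
	for {
		record, err := reader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		// Open the next output when none is open yet or the current one is full.
		if w == nil || rowsInFile == rowsPerFile {
			if w != nil {
				w.Flush()
				if err := w.Error(); err != nil {
					return err
				}
			}
			fileID++
			// e.g. "output_%03d.csv" with fileID 1 becomes "output_001.csv"
			w = csv.NewWriter(open(fmt.Sprintf(nameFormat, fileID)))
			if err := w.Write(header); err != nil {
				return err
			}
			rowsInFile = 0
		}
		if err := w.Write(record); err != nil {
			return err
		}
		rowsInFile++
	}
	if w == nil {
		return nil // header only: no data rows to write
	}
	w.Flush()
	return w.Error()
}

// splitString runs splitCSV over an in-memory string and returns each
// output's contents keyed by file name, which is handy for demos and tests.
func splitString(input string, rowsPerFile int, nameFormat string) (map[string]string, error) {
	buffers := map[string]*bytes.Buffer{}
	open := func(name string) io.Writer {
		buf := &bytes.Buffer{}
		buffers[name] = buf
		return buf
	}
	if err := splitCSV(strings.NewReader(input), rowsPerFile, nameFormat, open); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(buffers))
	for name, buf := range buffers {
		out[name] = buf.String()
	}
	return out, nil
}

func main() {
	out, err := splitString("id,city\n1,Madras\n2,Delhi\n3,Pune\n", 2, "output_%03d.csv")
	if err != nil {
		panic(err)
	}
	fmt.Print(out["output_001.csv"]) // id,city / 1,Madras / 2,Delhi
	fmt.Print(out["output_002.csv"]) // id,city / 3,Pune
}
```

In real use, open would wrap os.Create and the caller would be responsible for closing each file.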

Transforming the contents of the CSV

For example, to convert every value in the 3rd column to upper case:

inputFile := "/path/to/input.csv"
rowsPerFile := 100_000
outputFilenameFormat := "/path/to/output_%03d.csv" 
upperCaseTransformer := func(ctx context.Context, row []string) []string {
    isHeader, _ := ctx.Value(csvprocessor.CtxIsHeader).(bool)
    if isHeader {
        // ignoring header rows
        return row
    }

    if len(row) > 2 {
        // convert the 3rd column value to Upper case
        row[2] = strings.ToUpper(row[2])
    }

    // return the modified row.
    return row
}

c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, upperCaseTransformer)
if err != nil {
	log.Printf("error while creating csvprocessor: %v", err)
	return
}

err = c.Process()
if err != nil {
	log.Printf("error while splitting csv: %v", err)
}
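To see how the header flag reaches a transformer, here is a self-contained sketch that runs the same upper-case logic without the library. The Transformer type mirrors the function shape in the example; isHeaderKey is a hypothetical stand-in for csvprocessor.CtxIsHeader, and applyAll is a hypothetical driver that marks only the first row as the header.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Transformer mirrors the transformer function shape used in the README.
type Transformer func(ctx context.Context, row []string) []string

// A private key type avoids collisions with other context values.
// isHeaderKey is a hypothetical stand-in for csvprocessor.CtxIsHeader.
type ctxKey string

const isHeaderKey ctxKey = "isHeader"

// upperCaseThirdColumn upper-cases index 2 of every data row and leaves
// header rows and rows with fewer than three columns unchanged.
func upperCaseThirdColumn(ctx context.Context, row []string) []string {
	isHeader, _ := ctx.Value(isHeaderKey).(bool)
	if isHeader || len(row) <= 2 {
		return row
	}
	row[2] = strings.ToUpper(row[2])
	return row
}

// applyAll is a hypothetical driver: it flags only the first row as the
// header via the context and applies t to every row in order.
func applyAll(rows [][]string, t Transformer) [][]string {
	out := make([][]string, 0, len(rows))
	for i, row := range rows {
		ctx := context.WithValue(context.Background(), isHeaderKey, i == 0)
		out = append(out, t(ctx, row))
	}
	return out
}

func main() {
	rows := [][]string{
		{"id", "name", "city"},
		{"1", "siva", "chennai"},
	}
	fmt.Println(applyAll(rows, upperCaseThirdColumn)) // [[id name city] [1 siva CHENNAI]]
}
```

The two-value type assertion matters: a missing key yields nil, and the `, _` form turns that into false instead of panicking.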

Predefined Transformer Functions

For common cases, the library provides predefined transformer functions. For example, to add a row-number column to a CSV:

inputFile := "/path/to/input.csv"
rowsPerFile := 100_000
outputFilenameFormat := "/path/to/output_%03d.csv" 
addRowNumberTransformer := csvprocessor.AddRowNoTransformer("Row no.") // pass the name of the row-number column

c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, addRowNumberTransformer)
if err != nil {
	log.Printf("error while creating csvprocessor: %v", err)
	return
}

err = c.Process()
if err != nil {
	log.Printf("error while splitting csv: %v", err)
}

See transformers.go for more predefined transformer functions.

Transform without splitting

To transform a file without splitting it into multiple parts, set rowsPerFile to a value equal to or greater than the total number of rows in the file.

inputFile := "/path/to/input-with-1500-rows.csv"
rowsPerFile := 1500 // the input has only 1500 rows, so only one output file is generated
outputFilenameFormat := "/path/to/output.csv" // the %d verb can be omitted because there is only one output file
addRowNumberTransformer := csvprocessor.AddRowNoTransformer("S.no") // pass the name of the row-number column

c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, addRowNumberTransformer)
if err != nil {
	log.Printf("error while creating csvprocessor: %v", err)
	return
}

err = c.Process()
if err != nil {
	log.Printf("error while splitting csv: %v", err)
}
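How many output files a given rowsPerFile produces is a ceiling division. The helper below is hypothetical and assumes rowsPerFile counts data rows only; whether csvprocessor counts the header toward the limit is not stated here, so choosing a value comfortably above the row count avoids an off-by-one surprise. The last line shows general Go fmt behavior worth knowing if you build file names yourself with a printf-style pattern: a pattern with no verb still receives the argument and flags it as EXTRA.

```go
package main

import "fmt"

// outputFileCount is a hypothetical helper: the number of chunks produced when
// dataRows rows are split into chunks of at most rowsPerFile rows.
// Assumption: rowsPerFile counts data rows only, not the header.
func outputFileCount(dataRows, rowsPerFile int) int {
	if dataRows <= 0 || rowsPerFile <= 0 {
		return 0
	}
	return (dataRows + rowsPerFile - 1) / rowsPerFile // integer ceiling division
}

// fileName formats an output name from a printf-style pattern and a split id.
func fileName(format string, id int) string {
	return fmt.Sprintf(format, id)
}

func main() {
	fmt.Println(outputFileCount(1500, 1500))    // 1
	fmt.Println(outputFileCount(1500, 100_000)) // 1
	fmt.Println(outputFileCount(1500, 1000))    // 2
	fmt.Println(fileName("output_%03d.csv", 1)) // output_001.csv
	fmt.Println(fileName("output.csv", 1))      // output.csv%!(EXTRA int=1)
}
```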

Advanced Usage

Combining multiple Transformers

To apply a series of transformations to each row, chain the transformers with ChainTransformers:

inputFile := "/path/to/input.csv"
rowsPerFile := 100_000
outputFilenameFormat := "/path/to/output_%03d.csv"
// add row number within current chunk
addChunkRowNumber := csvprocessor.AddRowNoTransformer("Chunk Row no.")
// add overall row number
addRowNumber := csvprocessor.AddRowNoTransformer("Row no.")
// add column 'User' with value 'siva' at index 4; 0-based indexing.
addAConstantColumn := csvprocessor.AddConstantColumnTransformer("User", "siva", 4) 
// Replace all values 'Madras' with 'Chennai'
replacements := map[string]string{"Madras": "Chennai"}
replaceValues := csvprocessor.ReplaceValuesTransformer(replacements)

// chain all these transformations
combinedTransformer := csvprocessor.ChainTransformers(
    addChunkRowNumber,
    addRowNumber,
    addAConstantColumn,
    replaceValues, // Go requires a trailing comma when the closing parenthesis is on its own line
)

// performs all the transformations
c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, combinedTransformer)
if err != nil {
	log.Printf("error while creating csvprocessor: %v", err)
	return
}

err = c.Process()
if err != nil {
	log.Printf("error while splitting csv: %v", err)
}
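Chaining is function composition: each transformer receives the row returned by the previous one, so order matters. The sketch below is a hypothetical re-creation, not csvprocessor's source: chain, replaceValues and insertConstant only approximate ChainTransformers, ReplaceValuesTransformer and AddConstantColumnTransformer, they ignore header rows for brevity, and it assumes transformers run in the order they are passed and that replacement matches whole cell values exactly.

```go
package main

import (
	"context"
	"fmt"
)

// Transformer mirrors the transformer function shape used in the README.
type Transformer func(ctx context.Context, row []string) []string

// chain is a hypothetical equivalent of ChainTransformers: each transformer
// receives the row returned by the previous one, in argument order.
func chain(ts ...Transformer) Transformer {
	return func(ctx context.Context, row []string) []string {
		for _, t := range ts {
			row = t(ctx, row)
		}
		return row
	}
}

// replaceValues replaces cells whose whole value matches a key in m.
func replaceValues(m map[string]string) Transformer {
	return func(_ context.Context, row []string) []string {
		for i, v := range row {
			if r, ok := m[v]; ok {
				row[i] = r
			}
		}
		return row
	}
}

// insertConstant inserts value at the 0-based index, clamped to the row bounds.
// Unlike AddConstantColumnTransformer, it does not add a header name.
func insertConstant(value string, index int) Transformer {
	return func(_ context.Context, row []string) []string {
		i := index
		if i < 0 {
			i = 0
		}
		if i > len(row) {
			i = len(row)
		}
		out := make([]string, 0, len(row)+1)
		out = append(out, row[:i]...)
		out = append(out, value)
		return append(out, row[i:]...)
	}
}

// run applies t to row with a background context.
func run(t Transformer, row []string) []string {
	return t(context.Background(), row)
}

func main() {
	replace := replaceValues(map[string]string{"Madras": "Chennai"})
	fmt.Println(run(chain(replace, insertConstant("siva", 1)), []string{"1", "Madras"}))
	// [1 siva Chennai]
	fmt.Println(run(chain(insertConstant("Madras", 0), replace), []string{"x"}))
	// [Chennai x]: the inserted constant was itself replaced
}
```

The second call shows the ordering effect: a value added by an earlier transformer is visible to, and can be rewritten by, a later one.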

Using custom logger

To provide your own logger, pass it with WithLogger to the option-based New constructor:

// any custom logger function with the signature `func(format string, args ...any)`
l := logrus.New()
l.SetLevel(logrus.DebugLevel) // Debugf output is hidden at logrus's default Info level
logger := l.Debugf

c, err := csvprocessor.New(
	csvprocessor.WithFileReader("input.csv"),
	csvprocessor.WithOutputFileFormat("output.csv"),
	csvprocessor.WithTransformer(csvprocessor.NoOpTransformer()),
	csvprocessor.WithChunkSize(100),
	csvprocessor.WithLogger(logger),
)
if err != nil {
	log.Printf("error while creating csvprocessor: %v", err)
	return
}

// err is already declared above, so assign with = rather than :=
err = c.Process()
if err != nil {
	log.Printf("error while splitting csv: %v", err)
}
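Any function value with the signature `func(format string, args ...any)` fits, including method values such as logrus's Debugf or the standard library's (*log.Logger).Printf. A minimal sketch using only the standard library; LogFunc and captureLog are hypothetical names introduced here, and `any` requires Go 1.18 or later.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// LogFunc is the logger shape the README describes. The name is
// hypothetical; csvprocessor may use a different type name.
type LogFunc func(format string, args ...any)

// The standard library's log.Printf already has this shape.
var _ LogFunc = log.Printf

// captureLog builds a LogFunc from a *log.Logger that writes to an
// in-memory buffer (prefix "csvprocessor: ", no timestamp), passes it to use,
// and returns everything that was logged.
func captureLog(use func(logf LogFunc)) string {
	var buf bytes.Buffer
	logger := log.New(&buf, "csvprocessor: ", 0)
	use(logger.Printf) // method value with signature func(string, ...any)
	return buf.String()
}

func main() {
	out := captureLog(func(logf LogFunc) {
		logf("processed %d rows", 3)
	})
	fmt.Print(out) // csvprocessor: processed 3 rows
}
```

log.Logger appends a trailing newline when the message lacks one, which is why each call produces exactly one line.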

Wrapping transformer executions

For example, to wrap a transformer with debug statements and recover from panics:

logFunc := logrus.WithField("user", "1234").Infof

inputFile := "/path/to/input.csv"
rowsPerFile := 5
outputFilenameFormat := "/path/to/output_%03d.csv"
// add row number within current chunk
addChunkRowNumber := csvprocessor.AddRowNoTransformer("Chunk Row no.")
// print debug statements for each row
debug := csvprocessor.DebugWrapper
// recover from panic in the wrapped transformer function
panicSafe := csvprocessor.PanicSafe

// wrap the transformer: debug logging inside, panic recovery outermost
wrappedTransformer := panicSafe(debug(addChunkRowNumber, logFunc), logFunc)

// performs all the transformations
c, err := csvprocessor.NewFileReader(inputFile, rowsPerFile, outputFilenameFormat, wrappedTransformer)
if err != nil {
	log.Printf("error while creating csvprocessor: %v", err)
	return
}

err = c.Process()
if err != nil {
	log.Printf("error while splitting csv: %v", err)
}
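The wrappers are higher-order functions: each takes a transformer and returns a new one. A minimal sketch of what DebugWrapper and PanicSafe might do, written without the library; debugWrap, panicSafe, mutateThenPanic and run are hypothetical names, and returning the original row after a recovered panic is an assumption about PanicSafe's behavior, not something documented here.

```go
package main

import (
	"context"
	"fmt"
)

// Transformer mirrors the transformer function shape used in the README.
type Transformer func(ctx context.Context, row []string) []string

// LogFunc mirrors the logger shape func(format string, args ...any).
type LogFunc func(format string, args ...any)

// debugWrap is a hypothetical sketch in the spirit of DebugWrapper: it logs
// the row before and after the wrapped transformer runs.
func debugWrap(t Transformer, logf LogFunc) Transformer {
	return func(ctx context.Context, row []string) []string {
		logf("before: %v", row)
		out := t(ctx, row)
		logf("after: %v", out)
		return out
	}
}

// panicSafe is a hypothetical sketch in the spirit of PanicSafe.
// Assumption: after a recovered panic it logs the value and returns a copy
// of the row as it was before the transformer ran.
func panicSafe(t Transformer, logf LogFunc) Transformer {
	return func(ctx context.Context, row []string) (out []string) {
		original := append([]string(nil), row...) // t may mutate row before panicking
		defer func() {
			if r := recover(); r != nil {
				logf("recovered from panic: %v", r)
				out = original
			}
		}()
		return t(ctx, row)
	}
}

// mutateThenPanic is a deliberately faulty transformer used for the demo.
func mutateThenPanic(_ context.Context, row []string) []string {
	row[0] = "partially modified"
	panic("bad row")
}

// run applies t to row with a background context.
func run(t Transformer, row []string) []string {
	return t(context.Background(), row)
}

func main() {
	logf := LogFunc(func(format string, args ...any) {
		fmt.Printf(format+"\n", args...)
	})
	safe := panicSafe(debugWrap(mutateThenPanic, logf), logf)
	fmt.Println(run(safe, []string{"a", "b"}))
	// before: [a b]
	// recovered from panic: bad row
	// [a b]
}
```

Putting the panic guard outermost, as in the example above, means it also catches a panic raised inside the debug wrapper itself. Copying the row costs one allocation per row; a sketch that simply returned the possibly half-modified row would avoid it.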

Roadmap

  • csvprocessor
  • Transformer
  • Wrapper
  • Helper Functions including merger
  • [-] Unit tests
  • [-] Benchmarking

Contributing

Contributions are always welcome!

See contributing.md for ways to get started.

Please adhere to this project's code of conduct.

License

MIT License. See LICENSE.
