Writing a MultiFileReader in Go
Today I learned how to implement a #golang Reader that automatically reads multiple files in sequence, so the resulting bytes can be processed as one continuous stream.
Motivation
I recently had to combine multiple yara rule files from different sources. The nasty thing with yara is that it doesn't allow duplicated rules, which leads to errors when compiling the rule files. As a solution, I decided to parse the different rule files, remove the duplicates and export a single large .yar file as a result. To open and parse multiple input files, I implemented the following MultiFileReader.
Note: This is a quite special case and only works because the yara compiler accepts multiple rules in one file. Here the input files contain complete yara objects which can be concatenated, and their order doesn't matter. This will not work with JSON files, for example, since they cannot be concatenated without nesting them, e.g. into a JSON array. It should also be noted that this is likely not the most performant solution to such a problem, but I found the implementation a good learning experience.
Workflow
1. Find a list of files of a certain type
2. Open the next file and read its content until EOF (end of file)
3. If an unprocessed file remains, close the current file and jump back to step 2
4. If no unprocessed file remains, return
Implementation
To implement the outlined workflow, I started by creating a new type containing a slice of filenames filenames, a pointer to the currently opened file curFile and the index of the current filename curIdx.
type MultiFileReader struct {
	filenames []string
	curFile   *os.File
	curIdx    int
}
Now I implemented the Reader interface by implementing the Read function as a receiver function on the newly created type.
func (m *MultiFileReader) Read(p []byte) (int, error) {
	readBytes := 0
	var err error
	// if one of the files is already opened
	if m.curFile != nil {
		// call the Read function of that file
		readBytes, err = m.curFile.Read(p)
		// if the end of the file is reached
		if readBytes == 0 && err == io.EOF {
			// close the current file and advance to the next file in the slice
			m.curFile.Close()
			m.curFile = nil
			m.curIdx++
			// if the end of the slice is reached, return the EOF indicator
			if m.curIdx == len(m.filenames) {
				return 0, io.EOF
			}
		} else {
			// return the read content (and any non-EOF error)
			return readBytes, err
		}
	}
	// if no file is open
	if m.curFile == nil {
		// open the next file
		m.curFile, err = os.Open(m.filenames[m.curIdx])
		if err != nil {
			return 0, err
		}
		// call the Read function of that file
		return m.curFile.Read(p)
	}
	return 0, io.EOF
}
Next, I implemented a factory function to make the initialization easier:
func NewMultiFileReader(root string, pattern string) *MultiFileReader {
	m := MultiFileReader{}
	var err error
	// find all files in `root` directory with the given file `pattern`
	m.filenames, err = WalkMatch(root, pattern)
	if err != nil {
		log.Fatalf("Error looking for files in directory `%s` using pattern `%s`\n",
			root, pattern)
	}
	// open the first file
	m.curFile, err = os.Open(m.filenames[m.curIdx])
	if err != nil {
		log.Fatalf("Could not open file %s", m.filenames[m.curIdx])
	}
	return &m
}
Usage example
In this example I'm reading multiple txt files using ioutil and the new MultiFileReader. The resulting bytes are simply printed to the console.
package main

import (
	"fmt"
	"io/ioutil"
	"log"
)

func main() {
	multiFileReader := NewMultiFileReader("/tmp/txtFiles/", "*.txt")
	b, err := ioutil.ReadAll(multiFileReader)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(b))
}
While this example is maybe not the most useful for regular use cases, it was a good exercise to explain how the Reader interface works and how to use it.