Writing a MultiFileReader in Go

Posted on Jan 5, 2022

Today I learned how to implement a #golang Reader that automatically reads through multiple files in sequence, so the resulting bytes can be processed further.

Motivation

I recently had to combine multiple yara rule files from different sources. The nasty thing with yara is that it doesn't like duplicated rules, which leads to errors when compiling the rule files. As a solution, I decided to parse the different rule files, remove the duplicates and export one large single .yar file as the result. To open and parse multiple input files, I implemented the following MultiFileReader.

Note: This is a quite special case and only works because the yara compiler accepts multiple rules in one file. Here the files contain complete yara objects which can be concatenated, and the order doesn't matter. For example, this will not work with json files, since they cannot be concatenated without nesting them, e.g. into a json array. It should also be noted that this is likely not the most performant solution to such a problem, but I found the implementation a good learning experience.

Workflow

  1. Find a list of files of a certain type
  2. Open each file and read the content until EOF (end of file)
  3. If an unprocessed file is left, close the current one and jump back to 2.
  4. If no unprocessed file is left, return

Implementation

To implement the outlined workflow, I started by creating a new type containing a slice of file names (filenames), a pointer to the currently opened file (curFile) and the index of the current file name (curIdx).

type MultiFileReader struct {  
   filenames []string  
   curFile *os.File  
   curIdx int  
}

Now I implemented the Reader interface by implementing the Read function as a receiver function on the newly created type.

func (m *MultiFileReader) Read(p []byte) (int, error) {  
   readBytes := 0  
   var err error  

   // if one of the files is already opened
   if m.curFile != nil {  
      // call the Read function of that file
      readBytes, err = m.curFile.Read(p)  
      // if the end of the file is reached
      if readBytes == 0 && err == io.EOF {  
         // close the current file and advance to the next file in the slice
         m.curFile.Close()  
         m.curFile = nil  
         m.curIdx++  
         // if the end of the slice is reached, return the EOF indicator
         if m.curIdx == len(m.filenames) {  
            return 0, io.EOF  
         }  
      } else {  
         // return the read content and any non-EOF error
         return readBytes, err  
      }  
   }  
   // if no file is open
   if m.curFile == nil {
      // all files have been processed already
      if m.curIdx >= len(m.filenames) {  
         return 0, io.EOF  
      }  
      // open the next file
      m.curFile, err = os.Open(m.filenames[m.curIdx])  
      if err != nil {  
         return 0, err  
      }  
      // call the Read function of that file
      return m.curFile.Read(p)  
   }  
  
   return 0, io.EOF  
}

Next I implemented a factory function to make the initialization easier.

func NewMultiFileReader(root string, pattern string) *MultiFileReader {  
   m := MultiFileReader{}  
   var err error  
   // find all files in `root` directory with the given file `pattern`
   m.filenames, err = WalkMatch(root, pattern)  
   if err != nil {  
      log.Fatalf("Error looking for files in directory `%s` using pattern `%s`: %v\n", 
				 root, pattern, err)  
   }  
   if len(m.filenames) == 0 {  
      log.Fatalf("No files matching `%s` found in `%s`", pattern, root)  
   }  

   // open the first file
   m.curFile, err = os.Open(m.filenames[m.curIdx])  
   if err != nil {  
      log.Fatalf("Could not open file %s: %v", m.filenames[m.curIdx], err)  
   } 

   return &m  
}

Usage example

In this example I'm reading multiple txt files using ioutil and the new MultiFileReader. The resulting bytes are just printed to the console.

package main

import (  
   "fmt"  
   "io/ioutil"
   "log"
)

func main() {

	multiFileReader := NewMultiFileReader("/tmp/txtFiles/", "*.txt")
	
	b, err := ioutil.ReadAll(multiFileReader)
	if err != nil {
		log.Fatalf("Error reading files: %v", err)
	}

	fmt.Println(string(b))
}

While this example is maybe not the most useful for regular use cases, it was a good exercise to explain how the Reader interface works and how to use it.