
Using Blocks for Robustness

2009-04-09 14:25, written by Robert Klemme

Ruby’s blocks can be used for many purposes — in fact, they might well be the most used feature of the language. Today we will start looking at a frequently-used idiom, analyze its implications for robustness and demonstrate how blocks can greatly improve it.

Robustness

When talking about robustness of software [1] we mean the degree to which it is able to function properly under changed circumstances. Those changes could be internal (i.e. introduced by a code change) or external (e.g. caused by changed input data). Achieving robustness can be a hard task, especially for complex software like operating systems. In this article we’ll look at robustness at a smaller scale.

A common file handling idiom

When you read through ruby-talk you will find code that looks like this (I only made some minimal adjustments to formatting and corrected obvious errors):

  # read file into an array
  f = File.open("mylogfile.log")
  saved = File.open("archive.log", "a")
  results = f.readlines
  # Is there a way to remove everything that's been read here?
  f.close

  results.each do |sending|
    conn_to_db.print("data #{sending}")
    # Make sure we keep a backup
    saved.puts("#{sending}")
  end

  # Close the archive file
  saved.close

The logic is straightforward: a file is opened, all its lines are read and then it is closed again. Similarly, the second file is opened for appending, then written to and finally closed. Sure, the second file could be opened later, but that is not what I want to draw your attention to.

Rather, please notice that File.open is used together with an explicit close. Closing the file IO object ensures that all pending output is written to the file and that operating system resources associated with the underlying file handle are freed. While not closing a read-only file usually does not have dramatic consequences, not closing a file opened for writing likely does. You can see it by running this bit of code through Ruby 1.8.*:

  NAME = "foo"

  f1 = File.open NAME, "w"
  f1.puts "text 1"
  # forgot to close f1
  f2 = File.open NAME, "a"
  f2.puts "text 2"
  # forgot to close f2

On my system (CentOS 5.3, Ruby 1.8.5) only the first line ends up in the file; the second puts has no effect. Even if you add an f2.close at the end of the script you won’t see any difference. You need to add an f1.close before you open f2 to actually see two lines in the output file.
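The fix described above can be sketched as follows. This is only a minimal illustration: the helper name write_two_lines and the use of a temporary directory are assumptions made for this example and are not part of the original script.

```ruby
require "tmpdir"

# Hypothetical helper: writes through two separate File objects and closes
# the first one before the second one is opened.
def write_two_lines(path)
  f1 = File.open(path, "w")
  f1.puts "text 1"
  f1.close              # pending output of f1 is written before f2 is opened

  f2 = File.open(path, "a")
  f2.puts "text 2"
  f2.close              # pending output of f2 is appended after the first line

  File.readlines(path)  # return what actually ended up in the file
end

# Use a throwaway directory so the sketch does not touch real files.
Dir.mktmpdir do |dir|
  p write_two_lines(File.join(dir, "foo"))
end
```

With both close calls in place, each line is written out before the next step runs, so the result no longer depends on when buffers happen to be flushed.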

File#close is the solution, or is it?

So we just have to remember to always add an f.close whenever we have opened a file, and we are done. Everything written to the IO object will reach the file and we do not need to worry about anything any more. Or do we? Unfortunately, it is not that simple. Especially in situations like the one shown above, where exceptions can happen (we do not know what conn_to_db.print does, but it looks like another IO operation which could fail), we need to take additional measures to ensure the file is really closed.

We can use begin … ensure … end to make sure file descriptors are closed properly even when an exception is raised. But this tends to get verbose. Fortunately there is another option: File.open accepts a block and ensures that the file descriptor is closed regardless of how the block is left.
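To make the comparison concrete, here is a small sketch of both variants. The method names append_with_ensure and append_with_block are hypothetical, and the block passed to them merely stands in for work that may raise (such as the conn_to_db.print call above).

```ruby
require "tmpdir"

# Explicit variant: the ensure clause runs whether or not the body raises.
def append_with_ensure(path, text)
  f = File.open(path, "a")
  begin
    f.puts(text)
    yield f if block_given?   # stands in for work that may raise
  ensure
    f.close
  end
end

# Block variant: File.open closes the file when the block is left,
# including when it is left by an exception.
def append_with_block(path, text)
  File.open(path, "a") do |f|
    f.puts(text)
    yield f if block_given?
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "archive.log")
  handle = nil
  begin
    append_with_block(path, "line 1") do |f|
      handle = f                        # keep a reference to inspect later
      raise IOError, "simulated failure"
    end
  rescue IOError
    # the exception still reaches the caller
  end
  puts handle.closed?   # the file was closed although the block raised
  puts File.read(path)  # the line written before the failure is in the file
end
```

Both methods behave the same way from the caller's point of view; the block variant simply moves the begin/ensure bookkeeping into File.open.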

Large files

When dealing with files as shown in the code example above there is another potential source of trouble: if files can grow arbitrarily large they will burn a lot of memory when read as a whole or even terminate the program because there is not enough memory available.

For reading files there is an even better alternative: File.foreach will read a file line by line (or using a different separator given as second argument) and hand each string to the block that is provided. This saves an additional io.each {|line| …} inside the File.open block, which means less typing and one less level of indentation. As an added benefit, we can efficiently process large files because an iterative solution does not have to hold the complete file in memory.
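A short sketch of File.foreach in both forms follows. The helper names count_matching_lines and records, the file names and the sample data are made up for illustration.

```ruby
require "tmpdir"

# Counts the lines matching a pattern; only one line is held in memory at a time.
def count_matching_lines(path, pattern)
  count = 0
  File.foreach(path) do |line|
    count += 1 if line =~ pattern
  end
  count
end

# Reads records split on a custom separator given as the second argument.
def records(path, separator)
  result = []
  File.foreach(path, separator) do |record|
    result << record.chomp(separator)   # strip the trailing separator, if any
  end
  result
end

Dir.mktmpdir do |dir|
  log = File.join(dir, "mylogfile.log")
  File.write(log, "ok 1\nerror 2\nok 3\n")
  puts count_matching_lines(log, /error/)

  data = File.join(dir, "records.txt")
  File.write(data, "a;b;c")
  p records(data, ";")
end
```

Note that File.foreach opens and closes the file itself, so there is no file object left for the caller to forget about.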

Example rewritten

Equipped with these tools we are now ready to rewrite the original bit into a more robust version:

  # read file and write to archive
  File.open("archive.log", "a") do |saved|
    File.foreach("mylogfile.log") do |sending|
      conn_to_db.print("data #{sending}")
      # Make sure we keep a backup
      saved.puts(sending) # removed superfluous string interpolation
    end
  end

Note that if the timing of the original script is critical we might want to use File.readlines in order to read in all lines of “mylogfile.log” before starting to write to “archive.log”, as the original code does.
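A File.readlines variant along those lines might look like the sketch below. The method name forward_and_archive is hypothetical, and since we do not know what conn_to_db really is, a StringIO serves as a stand-in for it here; any object responding to print would do.

```ruby
require "stringio"
require "tmpdir"

# conn is a hypothetical stand-in for conn_to_db.
def forward_and_archive(source, archive, conn)
  # The whole source file is read (and closed again) before anything is written.
  lines = File.readlines(source)

  File.open(archive, "a") do |saved|
    lines.each do |sending|
      conn.print("data #{sending}")
      # Make sure we keep a backup
      saved.puts(sending)
    end
  end
end

Dir.mktmpdir do |dir|
  source  = File.join(dir, "mylogfile.log")
  archive = File.join(dir, "archive.log")
  File.write(source, "first\nsecond\n")

  conn = StringIO.new
  forward_and_archive(source, archive, conn)
  puts conn.string
  puts File.read(archive)
end
```

This keeps the original ordering of reads and writes, at the price of holding the complete file in memory again.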

Conclusion

Using Ruby’s blocks goes a long way to making programs and scripts that deal with file IO more robust with regard to

  • exceptions occurring somewhere along the way,
  • large files.

So, whenever you do file IO in a Ruby program, remember you can use

  • the block form of File.open or
  • File.foreach

to make your code more robust without paying a significant price.

You can even resort to File.readlines if you have to read the whole file into memory, and at least save a bit of typing. If you make it a habit to always use these idioms, chances are that you’ll save yourself a lot of file IO related bug hunting in the future.

Next time I will look into how to create a method like File.open which uses a block to ensure automatic resource deallocation.

[1] Here is a definition of robust and a Wikipedia article about the Robustness Principle.
