<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Ruby Best Practices</title><link>http://blog.rubybestpractices.com/</link><description>Increase your productivity -- Write Better Code</description><language>en-us</language><item><title>Complexity</title><description>&lt;p&gt;If I tell you a program has 1,000,000 lines of code you&amp;#8217;ll immediately understand that this must be a complex piece of software.  Why is that?  You implicitly know that nobody copies the same 5 lines 200,000 times to create a long program which is full of redundancy.  This is another term we will have to consider while trying to improve our understanding of complexity.  As a first approximation we can say that a complex piece of software is free of redundancy.  This is of course an oversimplification and for various reasons that we will have to talk about in a minute we will not usually reach that mark in practice.&lt;/p&gt;
&lt;p&gt;This oversimplification however helps us to understand one important &amp;#8211; if not the most important &amp;#8211; task of software engineering: reduction of redundancy.  Assume that somewhere in an application you are writing you need to output all elements contained in an &lt;code&gt;Array&lt;/code&gt; in a special way (say, as an &lt;span class="caps"&gt;HTML&lt;/span&gt; unordered list).  You would probably write something like this:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  puts "&amp;lt;ul&amp;gt;"
  
  arr.each do |x|
    puts "  &amp;lt;li&amp;gt;#{x}&amp;lt;/li&amp;gt;"
  end
  
  puts "&amp;lt;/ul&amp;gt;"
&lt;/pre&gt;
&lt;p&gt;I know this is not the best of examples since you normally would be doing this with &lt;span class="caps"&gt;ERB&lt;/span&gt; or your favorite web framework.  Picking the right tools for the job is of course another important task in software engineering but we want to focus on something different which can be nicely demonstrated with this simple toy example.  (Also, you can easily follow along by copying and pasting the code into &lt;span class="caps"&gt;IRB&lt;/span&gt; and playing with it.)&lt;/p&gt;
&lt;p&gt;What do you do once you discover that you also have to apply the same formatting to a &lt;code&gt;Set&lt;/code&gt; in a different location of your program?  Right, you make it a function and write something like:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
def format_list(enum)
  puts "&amp;lt;ul&amp;gt;"
  
  enum.each do |x|
    puts "  &amp;lt;li&amp;gt;#{x}&amp;lt;/li&amp;gt;"
  end
  
  puts "&amp;lt;/ul&amp;gt;"
end
&lt;/pre&gt;
&lt;p&gt;In other words, instead of increasing redundancy of your program you refactor a part of it so you can use this functionality in different locations.  Now you discover that you also need to be able to write to a file.  Instead of writing a second function that takes the file as an additional argument, you add an optional argument to the existing function, potentially making it look like this:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
def format_list(enum, out = $stdout)
  out.puts "&amp;lt;ul&amp;gt;"
  
  enum.each do |x|
    out.puts "  &amp;lt;li&amp;gt;#{x}&amp;lt;/li&amp;gt;"
  end
  
  out.puts "&amp;lt;/ul&amp;gt;"
end
&lt;/pre&gt;
&lt;p&gt;Again you avoided increasing redundancy of the application by adding complexity to this function which now is capable of applying the formatting in an even broader range of situations.  Let&amp;#8217;s push this one step further: now we also want to be able to do custom conversions of elements in the &lt;code&gt;Array&lt;/code&gt; instead of just always using &lt;code&gt;#to_s&lt;/code&gt;.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
TO_S = lambda {|o| o.to_s}

def format_list(enum, out = $stdout, &amp;amp;convert)
  convert ||= TO_S
  
  out.puts "&amp;lt;ul&amp;gt;"
  
  enum.each do |x|
    out.puts "  &amp;lt;li&amp;gt;#{convert[x]}&amp;lt;/li&amp;gt;"
  end
  
  out.puts "&amp;lt;/ul&amp;gt;"
end
&lt;/pre&gt;
&lt;p&gt;Now we can call it for a list of &lt;code&gt;Floats&lt;/code&gt; like this:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  floats = [1.2345, 4.567, 64.3]
  
  format_list floats do |f|
    "%010.3fcm" % f
  end
&lt;/pre&gt;
&lt;p&gt;Again, complexity of this function increased but we gained versatility.  In this case we are particularly lucky because all calls of the initial version of the function still work and produce the exact same result.  We might have to take a bit of runtime overhead because the original string interpolation is likely faster for the general case where only &lt;code&gt;#to_s&lt;/code&gt; is needed.  (Btw, if this program runs on some form of virtual runtime environment (such as the &lt;span class="caps"&gt;JVM&lt;/span&gt;) the situation might be different because the JVM&amp;#8217;s runtime optimization might produce better results if there is just a single function which is called more often than multiple methods that each are called infrequently.)&lt;/p&gt;
&lt;p&gt;I&amp;#8217;d say what we have seen above is a fairly typical evolution of a piece of software.  Instead of increasing redundancy of the program we increased complexity of this function.  It may seem that having separate functions for the different variants would be better; for example, such functions are certainly easier to document.  So why did we do this and strive to stick with a single function?&lt;/p&gt;
&lt;p&gt;Humans write software and while a piece of software might be bug free humans are not.  More importantly the world keeps &lt;a href="http://en.wikipedia.org/wiki/Heraclitus#Panta_rhei.2C_.22everything_flows.22"&gt;changing all the time&lt;/a&gt; and so do requirements for software.  Either we did not read the spec properly or someone changed his mind and now all of a sudden we need to output ordered lists.  That&amp;#8217;s an easy change if we just have a single implementation:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
def format_list(enum, out = $stdout, &amp;amp;convert)
  convert ||= TO_S
  
  out.puts "&amp;lt;ol&amp;gt;"
  
  enum.each do |x|
    out.puts "  &amp;lt;li&amp;gt;#{convert[x]}&amp;lt;/li&amp;gt;"
  end
  
  out.puts "&amp;lt;/ol&amp;gt;"
end
&lt;/pre&gt;
&lt;p&gt;The change is so minimal one barely sees it.  Yet, if we have to apply that change to multiple versions of this function the potential for new bugs is much higher.  We might forget one of them (remember that they are not necessarily all located in the same file) or we might misspell something in some of them etc.  Also the effort is higher: for this particular change it might not be dramatic but just think about having to check in multiple files into your favourite source control system, having to update documentation for multiple functions, having to adjust unit tests or other test suites etc.&lt;/p&gt;
&lt;p&gt;All in all I think it is fair to say that we gained more than we lost.  We gained developer productivity and paid for it with only a small runtime penalty and some added complexity in this function.  Up to this point we can say that increased complexity has brought about a better piece of software.  However you might have a premonition of the degradation which will start if we add even more features to this function.  Here we have the first reason why we do not have redundancy free applications in software: humans can only handle a certain level of complexity efficiently and so we must avoid overly complex systems if we want to be able to maintain the code.&lt;/p&gt;
&lt;p&gt;Another reason why we may end up with different versions of &lt;code&gt;format_list&lt;/code&gt; is the sheer size of an application.  Large applications need multiple authors and it is not too unrealistic that people independently invent similar functions in different parts.  While this would increase redundancy it might actually be desirable to do it: if there is just the one implementation of the function all components that need it must depend on the component that contains it.  Depending on the application and programming language used the price of an additional component dependency may actually be higher than the benefit.&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;Today we looked at software complexity caused by adding features to a piece of software to avoid redundancy.  In our daily work we continuously have to make decisions that affect this dimension of software complexity.  We have also seen how we might want to retain some level of redundancy to keep software maintainable.  Next we will look at other dimensions of software complexity.  Hopefully we will eventually come up with a classification of factors that lead to software complexity &amp;#8211; and how to tame them.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Sun, 06 Jun 2010 20:37:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/021-Complexity.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/021-Complexity.html</guid></item><item><title>Code Massage</title><description>&lt;p&gt;This article started out as a mental experiment and led to a surprising result.  I post this mostly for the fun of it.  But of course you can take something away from it.  By that I do not only mean technical solutions.  I believe firmly that a certain level of playfulness actually helps finding better solutions.  The other ingredient you need is a certain eagerness for improvement which means to not be content too early.  OK, let&amp;#8217;s start.&lt;/p&gt;
&lt;p&gt;The scenario I started out with was this: suppose you want to anonymize email addresses because you want to publish an email but not expose addresses to spam harvesting.  Yet, you want to make sure that every address is always represented with the same replacement address in order to not change the meaning.  You might immediately answer &amp;#8220;We&amp;#8217;ll need a Hash so we can efficiently find addresses that have been replaced already&amp;#8221; &amp;#8211; and so did I.&lt;/p&gt;
&lt;h3&gt;Java Style&lt;/h3&gt;
&lt;p&gt;If you are familiar with the Java standard library you know that &lt;code&gt;java.util.Map&lt;/code&gt; has methods to check whether a key is present, to set values and to retrieve values.  So after switching to Ruby you might be tempted to do it like this:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
subst = {}

puts email.gsub(ADDR) {|match|
  if subst.has_key? match
    subst[match]
  else
    subst[match] = "&amp;lt;&amp;lt;MAIL #{subst.size}&amp;gt;&amp;gt;"
  end
}
&lt;/pre&gt;
&lt;p&gt;Whenever we encounter an email address we must first check whether we have generated a replacement string for this address already.  If not, we create a new one.  No rocket science.&lt;/p&gt;
&lt;h3&gt;A little more sophisticated&lt;/h3&gt;
&lt;p&gt;When reading the library documentation you might come across method &lt;a href="http://ruby-doc.org/ruby-1.9/classes/Hash.html#M000380"&gt;Hash#fetch&lt;/a&gt;, which comes in handy because its block is invoked if the key is not present in the &lt;code&gt;Hash&lt;/code&gt;.  The code now looks a little shorter already:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
subst = {}

puts email.gsub(ADDR) {|match|
  subst.fetch(match) {|k| subst[k] = "&amp;lt;&amp;lt;MAIL #{subst.size}&amp;gt;&amp;gt;" }
}
&lt;/pre&gt;
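&lt;p&gt;As a small sketch of how the block form of &lt;code&gt;#fetch&lt;/code&gt; behaves (the addresses and replacement strings are made up for illustration): the block only runs for a missing key, and its result is not stored unless the block assigns it.  That is why the block above writes to &lt;code&gt;subst[k]&lt;/code&gt; itself.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical sample data, only for illustration.
seen = {}
seen["alice@example.com"] = "MAIL-0"

# Key present: the block is not invoked.
hit = seen.fetch("alice@example.com") { |k| "unused" }

# Key missing: the block result is returned but not stored.
miss   = seen.fetch("bob@example.com") { |k| "new for #{k}" }
stored = seen.key?("bob@example.com")

# Without a block a missing key raises
# (KeyError on Ruby 1.9 and later, a subclass of IndexError).
raised = begin
  seen.fetch("bob@example.com")
  false
rescue IndexError
  true
end

p hit, miss, stored, raised
&lt;/pre&gt;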
&lt;h3&gt;O||=erator&lt;/h3&gt;
&lt;p&gt;A similar thing can be achieved with the ubiquitous operator &lt;code&gt;||=&lt;/code&gt; which allows for conditional execution.  In case you are not yet familiar with it you&amp;#8217;ll find plenty of discussions in ruby-talk that revolve around this.  The short summary is that &lt;code&gt;a ||= b&lt;/code&gt; is equivalent to &lt;code&gt;a || a = b&lt;/code&gt; and &lt;em&gt;not&lt;/em&gt; &lt;code&gt;a = a || b&lt;/code&gt; as you might be tempted to believe.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
subst = {}

puts email.gsub(ADDR) {|match|
  subst[match] ||= "&amp;lt;&amp;lt;MAIL #{subst.size}&amp;gt;&amp;gt;"
}
&lt;/pre&gt;
&lt;p&gt;The code has become even shorter.  But we are not finished yet!&lt;/p&gt;
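&lt;p&gt;As an aside, the difference between the two readings of &lt;code&gt;||=&lt;/code&gt; summarized above can be made visible with a &lt;code&gt;Hash&lt;/code&gt; that has a truthy default value.  This is only a sketch; the names &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; are arbitrary:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hash.new(0) returns 0 (a truthy value) for missing keys without storing them.
a = Hash.new(0)
a[:x] ||= 1             # behaves like: a[:x] || (a[:x] = 1)
a_stored = a.key?(:x)   # false, the assignment never ran

b = Hash.new(0)
b[:x] = b[:x] || 1      # always assigns
b_stored = b.key?(:x)   # true, the default 0 was written back

p a_stored, b_stored
&lt;/pre&gt;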
&lt;h3&gt;Outsourcing&lt;/h3&gt;
&lt;p&gt;If you need that replacement in multiple places of your code you&amp;#8217;ll likely put it into a method.  However, if you want to replace different things (i.e. you need different regular expressions) which can match the same string you might want to outsource the generation of the replacement string so you can use it with different calls of &lt;code&gt;gsub&lt;/code&gt;.  You can of course do it with an additional method but there is a more elegant way to do it:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
subst = Hash.new {|h,k| h[k] = "&amp;lt;&amp;lt;MAIL #{h.size}&amp;gt;&amp;gt;"}

puts email.gsub(ADDR) {|match| subst[match]}
&lt;/pre&gt;
&lt;p&gt;We simply use &lt;code&gt;Hash's&lt;/code&gt; default proc functionality for this.  This is basically the same the fetch block does but now the code is attached to the &lt;code&gt;Hash&lt;/code&gt; instance and not to the &lt;code&gt;#fetch&lt;/code&gt; call.&lt;/p&gt;
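&lt;p&gt;To see the default proc at work, here is a minimal sketch.  Note the assumptions: &lt;code&gt;ADDR_SKETCH&lt;/code&gt; is a deliberately naive stand-in for the real &lt;code&gt;ADDR&lt;/code&gt; pattern, the sample text is invented, and the replacement format is simplified:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Naive, hypothetical address pattern; not the ADDR used in the article.
ADDR_SKETCH = /[\w.]+@[\w.]+/

# The block runs once per unknown key and stores the generated replacement.
subst = Hash.new { |h, k| h[k] = "MAIL-#{h.size}" }

email  = "From alice@example.com to bob@example.com, cc alice@example.com"
result = email.gsub(ADDR_SKETCH) { |match| subst[match] }

puts result   # From MAIL-0 to MAIL-1, cc MAIL-0
&lt;/pre&gt;
&lt;p&gt;The repeated address maps to the same replacement because the second lookup finds the stored value and never reaches the default proc.&lt;/p&gt;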
&lt;p&gt;You might wonder, how much further can we get?  And indeed, this solution is probably the most idiomatic one and the one you see most frequently in seasoned Ruby developers&amp;#8217; code.  It turns out though, that we can drive this further if we are prepared to use some newer Ruby features.&lt;/p&gt;
&lt;h3&gt;Getting tricky&lt;/h3&gt;
&lt;p&gt;Since Ruby 1.8.7 you can use &lt;em&gt;anything&lt;/em&gt; as a block parameter to a method provided it implements a method &lt;code&gt;to_proc&lt;/code&gt; which returns a &lt;code&gt;Proc&lt;/code&gt;.  Namely class &lt;code&gt;Symbol&lt;/code&gt; implements this method in the following way: it returns a proc which needs at least one argument when called and invokes the given method with the remaining arguments on that instance.  This allows for convenient operations like mapping data:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):001:0&amp;gt; (1..3).map &amp;amp;:to_s
=&amp;gt; ["1", "2", "3"]
&lt;/pre&gt;
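&lt;p&gt;Calling such a proc by hand shows the rule described above: the first argument becomes the receiver and any remaining arguments are passed on to the method.  A short sketch with arbitrary variable names:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
upcase = :upcase.to_proc
plus   = :+.to_proc

shouted = upcase.call("ruby")   # same as "ruby".upcase
sum     = plus.call(1, 2)       # same as 1.+(2)

p shouted, sum
&lt;/pre&gt;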
&lt;p&gt;One thing that bugged me was that the block handed to &lt;code&gt;gsub&lt;/code&gt; above does nothing more than forward the &lt;code&gt;Hash&lt;/code&gt; lookup.  With the new feature it should be possible to make the code a bit more concise.  Luckily there are some core classes that implement &lt;code&gt;to_proc&lt;/code&gt; already:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
$ ruby -e 'ObjectSpace.each_object(Module) {|m| p m if m.instance_methods.include? "to_proc"}'
Method
Proc
Symbol

$ ruby19 -e 'ObjectSpace.each_object(Module) {|m| p m if m.instance_methods.include? :to_proc}'
Method
Proc
Symbol
&lt;/pre&gt;
&lt;p&gt;We can exploit this fact and now we can write the code like this:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
subst = Hash.new {|h,k| h[k] = "&amp;lt;&amp;lt;MAIL #{h.size}&amp;gt;&amp;gt;"}

puts email.gsub(ADDR, &amp;amp;subst.method(:[]))
&lt;/pre&gt;
&lt;p&gt;&lt;i&gt;Note: code changed after arthurschreiber&amp;#8217;s comment.&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;Now, that looks ugly, doesn&amp;#8217;t it?  We should be able to do something about that, because after all we love Ruby for its elegance and clear syntax.  Yes, we can!&lt;/p&gt;
&lt;h3&gt;Even shorter with a general solution&lt;/h3&gt;
&lt;p&gt;Since we can use &lt;em&gt;any&lt;/em&gt; object why not provide a general mechanism for this case?  Not only &lt;code&gt;Hash&lt;/code&gt; but also &lt;code&gt;Array&lt;/code&gt; and a lot more classes provide method &lt;code&gt;[]&lt;/code&gt; as a general hook for lookup or execution:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
$ ruby19 -e 'ObjectSpace.each_object(Module) {|m| p m if m.instance_methods.include? :[]}'
Thread
Method
Proc
Struct::Tms
MatchData
Struct
Hash
Array
Bignum
Fixnum
Symbol
String
&lt;/pre&gt;
&lt;p&gt;Now, let&amp;#8217;s allow all these to be simply used as block parameters!&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class Object
  def to_proc(m = :[])
    method(m).to_proc
  end
end

subst = Hash.new {|h,k| h[k] = "&amp;lt;&amp;lt;MAIL #{h.size}&amp;gt;&amp;gt;"}

puts email.gsub(ADDR, &amp;amp;subst)
&lt;/pre&gt;
&lt;p&gt;Now we can just pass any &lt;code&gt;Hash&lt;/code&gt; instance to &lt;code&gt;gsub&lt;/code&gt;.  The working logic for calculating our replacement string is now completely restricted to the &lt;code&gt;Hash&lt;/code&gt; creation.  This is a really elegant solution!&lt;/p&gt;
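&lt;p&gt;If reopening &lt;code&gt;Object&lt;/code&gt; feels too invasive, a similar call shape should be available on Ruby 1.9 and later, because &lt;code&gt;gsub&lt;/code&gt; also accepts a &lt;code&gt;Hash&lt;/code&gt; as its second argument and looks up each match in it.  The following is a sketch under the assumption that this lookup goes through the regular &lt;code&gt;Hash&lt;/code&gt; access, so that the default proc is triggered; the pattern and the sample text are hypothetical:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Naive, hypothetical address pattern; not the ADDR used in the article.
ADDR_NAIVE = /[\w.]+@[\w.]+/

lookup = Hash.new { |h, k| h[k] = "MAIL-#{h.size}" }

# No block and no to_proc: the Hash itself is the replacement argument.
replaced = "a@x.org wrote to b@y.org and a@x.org".gsub(ADDR_NAIVE, lookup)

puts replaced
&lt;/pre&gt;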
&lt;h3&gt;Golf&lt;/h3&gt;
&lt;p&gt;We can reduce the number of characters to type a bit more by throwing out the variable declaration and effectively turn this into a one liner:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
puts email.gsub(ADDR, &amp;amp;Hash.new {|h,k| h[k] = "&amp;lt;&amp;lt;MAIL #{h.size}&amp;gt;&amp;gt;"})
&lt;/pre&gt;
&lt;p&gt;I don&amp;#8217;t think this is an improvement over the last variant but sometimes it helps driving things as far as possible to find out where in the process we reached the optimum.&lt;/p&gt;
&lt;h3&gt;The fun begins&lt;/h3&gt;
&lt;p&gt;Some classes do also implement method &lt;code&gt;[]&lt;/code&gt; &amp;#8211; we should be able to make good use of that as well.  We might be tempted to create a lot of &lt;code&gt;Struct&lt;/code&gt; instances via this method.  It can be done but we have to do some tweaking because &lt;code&gt;Struct.[]&lt;/code&gt; does not splat a single &lt;code&gt;Array&lt;/code&gt; argument so we have to redefine it a bit:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
Name = Struct.new :forename, :surname

# unfortunately this does not work with the default Struct.[]
def Name.[](a)
  new(*a)
end

p [
  ["John", "Doe"],
  ["John", "Cleese"],
  ["Mickey", "Mouse"],
].map(&amp;amp;Name)

# maybe a bit better:
def Name.create(a)
  new(*a)
end

p [
  ["John", "Doe"],
  ["John", "Cleese"],
  ["Mickey", "Mouse"],
].map(&amp;amp;Name.to_proc(:create))
&lt;/pre&gt;
&lt;p&gt;I hope you had some fun reading this and more importantly playing around yourself.  Trying out all things will certainly help you discover new ways and improve your skills.&lt;/p&gt;
&lt;p&gt;As usual I have placed the &lt;a href="http://gist.github.com/317580"&gt;code at github&lt;/a&gt;.  If you look at it, please don&amp;#8217;t get yourself hung up on the regular expression for matching email addresses.  This is a whole topic of its own and I just hacked something together to make the code work.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Sun, 28 Feb 2010 13:32:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/020-Code_Massage.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/020-Code_Massage.html</guid></item><item><title>The Complete Numeric Class</title><description>&lt;p&gt;As announced in the &lt;a href="018-Complete_Class.html"&gt;previous article&lt;/a&gt; we will look at a complete number class today.  I will use the example of an integer number which, when printed, will show up as hex number (as opposed to the decimal presentation of &lt;code&gt;Fixnum&lt;/code&gt; and relatives).  As before the main point is not sophisticated logic or usefulness of the class.  Instead I will keep the logic simple so we can focus on the aspects I try to convey with today&amp;#8217;s article:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;conversions into &lt;code&gt;HexNum&lt;/code&gt;&lt;/li&gt;
	&lt;li&gt;conversions of &lt;code&gt;HexNum&lt;/code&gt; to other types&lt;/li&gt;
	&lt;li&gt;equivalence vs. comparability&lt;/li&gt;
	&lt;li&gt;type coercion&lt;/li&gt;
	&lt;li&gt;math and operator overloading&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For completeness reasons other aspects mentioned in the previous article will be implemented as well but I won&amp;#8217;t discuss them in detail here.&lt;/p&gt;
&lt;h3&gt;Conversions into &lt;code&gt;HexNum&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The first step into the world of &lt;code&gt;HexNum&lt;/code&gt; is creation of a new object of course.  This part is not that interesting and so I will only gloss over the implementation.  I have provided a few ways:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;constructor&lt;/li&gt;
	&lt;li&gt;conversion methods (&lt;code&gt;to_hex&lt;/code&gt;),&lt;/li&gt;
	&lt;li&gt;explicit method (similar to &lt;code&gt;Integer()&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are shown in the code snippet below:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class HexNum &amp;lt; Numeric

  # Create a new instance from an int or String.
  def initialize(val)
    case val
    when String
      @i = parse_string(val)
      @s = val.frozen? ? val : val.dup.freeze
    when Numeric
      @i = val.to_i
    else
      raise ArgumentError, 'Cannot convert %p' % val
    end
  end

end

# more conversions

def HexNum(i)
  HexNum.new(Integer(i))
end

class Object
  def to_hex
    HexNum.new(to_i)
  end
end
&lt;/pre&gt;
&lt;p&gt;As you can see, a &lt;code&gt;HexNum&lt;/code&gt; consists of an integer value and optionally a &lt;code&gt;String&lt;/code&gt;.  The integer value is the mandatory part which will be used in most methods while the &lt;code&gt;String&lt;/code&gt; is just a helper intended to make conversions to &lt;code&gt;String&lt;/code&gt; more efficient (I did not make any measurements though &amp;#8211; I mainly wanted to make the class a tad more interesting).&lt;/p&gt;
&lt;p&gt;Method &lt;code&gt;Object#to_hex&lt;/code&gt; is based on the presence of method &lt;code&gt;to_i&lt;/code&gt;.  This is debatable.  Basing this conversion on method &lt;code&gt;to_int&lt;/code&gt; is as reasonable &lt;span class="caps"&gt;IMHO&lt;/span&gt;.  As you can see, the set of classes which implement &lt;code&gt;to_i&lt;/code&gt; differs from those which implement &lt;code&gt;to_int&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):002:0&amp;gt; s1 = []; s2 = []
=&amp;gt; []
irb(main):003:0&amp;gt; ObjectSpace.each_object(Module) do |m|
irb(main):004:1* s1 &amp;lt;&amp;lt; m if m.instance_methods.include? :to_i
irb(main):005:1&amp;gt; s2 &amp;lt;&amp;lt; m if m.instance_methods.include? :to_int
irb(main):006:1&amp;gt; end
=&amp;gt; 407
irb(main):007:0&amp;gt; s1
=&amp;gt; [Complex, Rational, Process::Status, Time, File, ARGF.class, IO, Bignum, Float, Fixnum, Integer, String, NilClass]
irb(main):008:0&amp;gt; s2
=&amp;gt; [Complex, Rational, Bignum, Float, Fixnum, Integer, Numeric]
&lt;/pre&gt;
&lt;p&gt;The rationale behind this is that only types which can be used as integers should implement &lt;code&gt;to_int&lt;/code&gt;.  Method &lt;code&gt;to_i&lt;/code&gt; is merely a conversion method which turns &amp;#8220;something&amp;#8221; into an int.  I chose to base &lt;code&gt;to_hex&lt;/code&gt; on &lt;code&gt;to_i&lt;/code&gt; because this increases the number of cases where you can immediately use a &lt;code&gt;HexNum&lt;/code&gt;.  If you need more strict argument checks, you can use &lt;code&gt;HexNum()&lt;/code&gt; (the method) which is modeled similar to &lt;code&gt;Integer()&lt;/code&gt; and in fact uses it internally in order to benefit from its argument checking.&lt;/p&gt;
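&lt;p&gt;The difference between the lenient &lt;code&gt;to_i&lt;/code&gt; and the stricter &lt;code&gt;Integer()&lt;/code&gt; and &lt;code&gt;to_int&lt;/code&gt; can be sketched with core classes alone.  The class &lt;code&gt;Two&lt;/code&gt; below is a hypothetical helper that only exists to show an implicit conversion:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# to_i converts whatever it can and never complains.
lenient = "12abc".to_i
nothing = nil.to_i

# Integer() checks its argument and rejects trailing garbage.
strict_failed = begin
  Integer("12abc")
  false
rescue ArgumentError
  true
end

# A String can be converted, but it is not usable as an integer.
string_like_int = "12".respond_to?(:to_int)

# Hypothetical class: implementing to_int makes it usable as an index.
class Two
  def to_int
    2
  end
end

element = [10, 20, 30][Two.new]

p lenient, nothing, strict_failed, string_like_int, element
&lt;/pre&gt;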
&lt;p&gt;You might have expected &lt;code&gt;HexNum&lt;/code&gt; to inherit &lt;code&gt;Integer&lt;/code&gt;.  Actually, that&amp;#8217;s what I would have done, too.  But, it turns out, if you make a class inherit &lt;code&gt;Integer&lt;/code&gt; you cannot create instances of it any more:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):007:0&amp;gt; class X &amp;lt; Integer
irb(main):008:1&amp;gt; end
=&amp;gt; nil
irb(main):009:0&amp;gt; X.new
NoMethodError: undefined method `new' for X:Class
        from (irb):9
        from /usr/local/bin/irb19:12:in `&amp;lt;main&amp;gt;'
irb(main):010:0&amp;gt; class X;end
=&amp;gt; nil
irb(main):011:0&amp;gt; def X.new; allocate; end
=&amp;gt; nil
irb(main):012:0&amp;gt; X.new
TypeError: allocator undefined for X
        from (irb):11:in `allocate'
        from (irb):11:in `new'
        from (irb):12
        from /usr/local/bin/irb19:12:in `&amp;lt;main&amp;gt;'
irb(main):013:0&amp;gt; Integer.class
=&amp;gt; Class
irb(main):014:0&amp;gt;
&lt;/pre&gt;
&lt;p&gt;This is a bit unfortunate since &lt;code&gt;Integer&lt;/code&gt; would be the proper base class for our &lt;code&gt;HexNum&lt;/code&gt;.  Inheriting &lt;code&gt;Numeric&lt;/code&gt; is the second best we can do.&lt;/p&gt;
&lt;h3&gt;Conversions to other types&lt;/h3&gt;
&lt;p&gt;These conversions are done by the typical set of &lt;code&gt;to_xyz&lt;/code&gt; methods:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  # conversions

  def to_s
    @s ||= (@i &amp;lt; 0 ? '-0x%x' % -@i : '0x%x' % @i).freeze
  end

  def to_i
    @i
  end

  alias to_int to_i

  def to_hex
    self
  end
&lt;/pre&gt;
&lt;p&gt;Please note that for efficiency reasons class &lt;code&gt;HexNum&lt;/code&gt; also contains a method &lt;code&gt;to_hex&lt;/code&gt;.  Other than that there are really no surprises here.  Conversion to string may look a bit complicated but it&amp;#8217;s really just the different treatment of negative and non-negative values plus caching of the converted &lt;code&gt;String&lt;/code&gt;.  The &lt;code&gt;#freeze&lt;/code&gt; just ensures that changes of the cached value outside the class cannot backfire.  If we weren&amp;#8217;t using &lt;code&gt;#freeze&lt;/code&gt; here, we would have to create a new &lt;code&gt;String&lt;/code&gt; instance for every &lt;code&gt;to_s&lt;/code&gt; invocation which would defeat the whole point of caching the value.&lt;/p&gt;
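&lt;p&gt;The effect of caching a frozen &lt;code&gt;String&lt;/code&gt; can be sketched in isolation.  &lt;code&gt;CachedLabel&lt;/code&gt; is a hypothetical, stripped-down stand-in for &lt;code&gt;HexNum&lt;/code&gt; that handles only non-negative values:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical stand-in for HexNum, reduced to the caching logic.
class CachedLabel
  def initialize(i)
    @i = i
  end

  def to_s
    # Built once, frozen, then handed out on every call.
    @s ||= ('0x%x' % @i).freeze
  end
end

label  = CachedLabel.new(31)
first  = label.to_s
second = label.to_s

same_object = first.equal?(second)

# Callers cannot corrupt the cached value.
# (FrozenError on Ruby 2.5 and later, which is a RuntimeError.)
mutation_blocked = begin
  first.upcase!
  false
rescue RuntimeError
  true
end

p first, same_object, mutation_blocked
&lt;/pre&gt;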
&lt;h3&gt;Mutability&lt;/h3&gt;
&lt;p&gt;An important point to note is that the class is immutable.  While this is not mandatory creating a mutable class does not blend well with how Ruby handles arithmetic operators.  If you implement operator + as an in-place modification you will get all the issues typically caused by aliasing, i.e. using the same instance from different places in code.  As they say, &amp;#8220;When in Rome do as the Romans do&amp;#8221; &amp;#8211; so we will stick with the convention used throughout Ruby&amp;#8217;s core library and make our &lt;code&gt;HexNum&lt;/code&gt; immutable too.  That way we can use instances of &lt;code&gt;HexNum&lt;/code&gt; where we would otherwise have used &lt;code&gt;Fixnums&lt;/code&gt; or &lt;code&gt;Bignums&lt;/code&gt;.  And after all this was the aim of the exercise: to demonstrate how to create a class that seamlessly blends with all the other numeric types of Ruby&amp;#8217;s core and standard library.&lt;/p&gt;
&lt;h3&gt;Equivalence vs. Comparability&lt;/h3&gt;
&lt;p&gt;Now we slowly get to the more interesting topics.  Every class has comparison for equivalence through method &lt;code&gt;#eql?&lt;/code&gt; and operator &lt;code&gt;==&lt;/code&gt; (see also the discussion of the topic in the &lt;a href="018-Complete_Class.html"&gt;previous article&lt;/a&gt;).  For seamless blending of &lt;code&gt;HexNum&lt;/code&gt; with other numeric types comparison with &lt;code&gt;==&lt;/code&gt; should only look at the numerical value so that &lt;code&gt;HexNum(1)&lt;/code&gt; and &lt;code&gt;1.0&lt;/code&gt; compare &lt;code&gt;true&lt;/code&gt;.  However since we do not have influence on &lt;code&gt;Fixnum's&lt;/code&gt; and &lt;code&gt;Float's&lt;/code&gt; implementation of &lt;code&gt;==&lt;/code&gt; and since equivalence is a symmetric relation (i.e. &lt;code&gt;a == b&lt;/code&gt; must return the same as &lt;code&gt;b == a&lt;/code&gt;) we can only establish equivalence with other &lt;code&gt;HexNum&lt;/code&gt; instances.  I think this is unfortunate.  I can only speculate about Matz&amp;#8217;s reasons to not use &lt;code&gt;#coerce&lt;/code&gt; in this situation: I assume he did it this way for performance reasons since these types of comparisons are very frequent in a program.&lt;/p&gt;
&lt;p&gt;So, this is what the equivalence checking code looks like:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  # equivalence

  def eql?(num)
    self.class.equal?(num.class) &amp;amp;&amp;amp; @i == num.to_i
  end

  alias == eql?
&lt;/pre&gt;
&lt;p&gt;No big surprises here.  You&amp;#8217;ll note the type check which is necessary because of the aforementioned implementation details of core classes.&lt;/p&gt;
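&lt;p&gt;For contrast, the core numeric types keep &lt;code&gt;==&lt;/code&gt; and &lt;code&gt;#eql?&lt;/code&gt; apart: the former compares values across types while the latter also requires the same class, and &lt;code&gt;Hash&lt;/code&gt; keys follow the stricter rule.  A short sketch with arbitrary variable names:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
loose  = (1 == 1.0)    # value comparison across types
strict = 1.eql?(1.0)   # same value but different classes

# Hash lookup uses eql? (and hash), so 1 and 1.0 are different keys.
h = {}
h[1] = :int
int_lookup   = h[1]
float_lookup = h[1.0]

p loose, strict, int_lookup, float_lookup
&lt;/pre&gt;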
&lt;p&gt;Things are dramatically different with respect to comparison operator &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt;.  The definition of this operator&amp;#8217;s semantics was nicely presented by Marc in &lt;a href="http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html#comment-21155640"&gt;his comments&lt;/a&gt; to the last article.  Basically, &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; must return &lt;code&gt;nil&lt;/code&gt; if classes of compared instances are different.  But what is this?&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):001:0&amp;gt; 1 &amp;lt;=&amp;gt; 1.0
=&amp;gt; 0
irb(main):002:0&amp;gt; 1 &amp;lt;=&amp;gt; 1.5
=&amp;gt; -1
irb(main):003:0&amp;gt; 0.5 &amp;lt;=&amp;gt; 1
=&amp;gt; -1
irb(main):004:0&amp;gt; 0.5 &amp;lt;=&amp;gt; 1 &amp;lt;&amp;lt; 40
=&amp;gt; -1
irb(main):005:0&amp;gt;
&lt;/pre&gt;
&lt;p&gt;Numeric classes allow for comparison across a wide range of types.  Can we make &lt;code&gt;HexNum&lt;/code&gt; blend in here?  It turns out that we can.  This is the time to introduce a functionality that took me a while to understand initially.  But this is at the heart of Ruby&amp;#8217;s operator overloading and probably the single most important point to consider when implementing numeric types.&lt;/p&gt;
&lt;h3&gt;What&amp;#8217;s this &lt;code&gt;#coerce&lt;/code&gt; thingy?&lt;/h3&gt;
&lt;p&gt;When talking about equivalence it might have occurred to you that the behavior of &lt;code&gt;==&lt;/code&gt; should depend not on a single class but actually on the type of &lt;em&gt;both&lt;/em&gt; arguments.  The rule whether two objects are equivalent involves the classes of both objects as well as statements about the state of both instances (in case of &lt;code&gt;HexNum&lt;/code&gt; for example, both must have the same integer value).  Now, Ruby &amp;#8211; like many object oriented languages &amp;#8211; has &amp;#8216;single &lt;a href="http://en.wikipedia.org/wiki/Dynamic_dispatch"&gt;dispatch&lt;/a&gt;&amp;#8217; which means only the type of the receiver (the object to the left of the dot in a method call or &lt;code&gt;self&lt;/code&gt;) determines which method is actually used.  This works remarkably well most of the time but there are cases where you rather want to make the dispatch (and thus the decision which code is executed) depend on the receiver and one or more of the method&amp;#8217;s arguments.  Binary operators are a typical case of this.&lt;/p&gt;
&lt;p&gt;The usual solution in other languages is to overload a method based on argument types and in Ruby you would likely employ some form of type checking similar to what I have done in &lt;code&gt;HexNum's&lt;/code&gt; method &lt;code&gt;#initialize&lt;/code&gt;.  However, this approach has a fundamental drawback: when writing class A the author must know all classes B, C and D that are used as argument types.  This hurts extensibility seriously since for every new numeric type the code of older types needs adjustment.  While this would be possible due to Ruby&amp;#8217;s dynamic nature this is on the one hand tedious and it would have performance implications on the other hand because you would likely be implementing those &amp;#8220;enhancements&amp;#8221; in Ruby (and not C) and methods would have to check for more and more types.&lt;/p&gt;
&lt;p&gt;Ruby&amp;#8217;s solution to this is very elegant (as many other aspects of the language): whenever a method receives an argument of a type different from its own type it asks that type to do a conversion so the operation can finally be handled.  It does that via method &lt;code&gt;#coerce&lt;/code&gt; which receives the &lt;em&gt;caller&lt;/em&gt; as argument so the other (presumably newer) type can find an adequate conversion.  That way, only types written later need to know about older types (e.g. everything in the core and standard library).  Let&amp;#8217;s look at how &lt;code&gt;#coerce&lt;/code&gt; works by using it on some standard types:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):001:0&amp;gt; 1.coerce 2
=&amp;gt; [2, 1]
irb(main):002:0&amp;gt; 2.0.coerce 1
=&amp;gt; [1.0, 2.0]
irb(main):003:0&amp;gt; 1.coerce 2.0
=&amp;gt; [2.0, 1.0]
&lt;/pre&gt;
&lt;p&gt;The result of invoking &lt;code&gt;#coerce&lt;/code&gt; has two interesting properties:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;The types of returned values are identical.&lt;/li&gt;
	&lt;li&gt;The return value is really an &lt;code&gt;Array&lt;/code&gt; with the order reversed compared to the invocation (the receiver is the &lt;em&gt;second&lt;/em&gt; object in the array).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;While we immediately see the usefulness of 1 (every class should know how to handle instances of itself) the second property irritated me at first.  But it does make sense if you consider that the argument to &lt;code&gt;#coerce&lt;/code&gt; is really the original receiver of the method call.&lt;/p&gt;
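&lt;p&gt;The ordering matters most for operators that are not commutative.  Here is a minimal sketch, assuming a stand-in object (the name &lt;code&gt;four&lt;/code&gt; is made up for illustration) that always behaves like the integer 4:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical stand-in that always coerces to the integer 4.
four = Object.new

def four.coerce(other)
  # other is the original receiver (10 below), so it goes first
  [other, 4]
end

difference = 10 - four
p difference   # 6
&lt;/pre&gt;
&lt;p&gt;Had &lt;code&gt;#coerce&lt;/code&gt; returned the pair in the other order, the subtraction would have been evaluated as 4 - 10 instead.&lt;/p&gt;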
&lt;p&gt;I don&amp;#8217;t want to hide some inconsistencies of the standard functionality from you:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):004:0&amp;gt; 1.coerce 1&amp;lt;&amp;lt;40
=&amp;gt; [1099511627776.0, 1.0]
irb(main):005:0&amp;gt; (1&amp;lt;&amp;lt;40).coerce 2
=&amp;gt; [2, 1099511627776]
irb(main):006:0&amp;gt; (1&amp;lt;&amp;lt;40).coerce 3.0
TypeError: can't coerce Float to Bignum
        from (irb):6:in `coerce'
        from (irb):6
        from /usr/local/bin/irb19:12:in `&amp;lt;main&amp;gt;'
irb(main):007:0&amp;gt; 3.0.coerce 1&amp;lt;&amp;lt;40
=&amp;gt; [1099511627776.0, 3.0]
irb(main):008:0&amp;gt; 1&amp;lt;&amp;lt;40
=&amp;gt; 1099511627776
irb(main):009:0&amp;gt; (1&amp;lt;&amp;lt;40) + 3.0
=&amp;gt; 1099511627779.0
&lt;/pre&gt;
&lt;p&gt;You can add &lt;code&gt;Bignum&lt;/code&gt; and &lt;code&gt;Float&lt;/code&gt; but &lt;code&gt;#coerce&lt;/code&gt; fails in one direction.  If anybody can provide a sensible explanation of this please let us know.  For the moment I am inclined to believe that it&amp;#8217;s due to the fact that implementations of core types are done in C and there are probably some corners cut.&lt;/p&gt;
&lt;p&gt;Now look at a simple example which demonstrates the mechanics of &lt;code&gt;#coerce&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
$ ruby19 &amp;lt;&amp;lt;CODE
&amp;gt; a = 1
&amp;gt; b = Object.new
&amp;gt; def b.coerce(x) [x,2] end
&amp;gt; p b
&amp;gt; set_trace_func lambda {|*a| p a}
&amp;gt; puts a + b
&amp;gt; CODE
#&amp;lt;Object:0x10028284&amp;gt;
["c-return", "-", 5, :set_trace_func, #&amp;lt;Binding:0x10027dd0&amp;gt;, Kernel]
["line", "-", 6, nil, #&amp;lt;Binding:0x10027adc&amp;gt;, nil]
["c-call", "-", 6, :+, #&amp;lt;Binding:0x10027724&amp;gt;, Fixnum]
["call", "-", 3, :coerce, #&amp;lt;Binding:0x1002721c&amp;gt;, #&amp;lt;Object:0x10028284&amp;gt;]
["line", "-", 3, :coerce, #&amp;lt;Binding:0x10026e48&amp;gt;, #&amp;lt;Object:0x10028284&amp;gt;]
["return", "-", 3, :coerce, #&amp;lt;Binding:0x10026924&amp;gt;, #&amp;lt;Object:0x10028284&amp;gt;]
["c-call", "-", 6, :+, #&amp;lt;Binding:0x100263e4&amp;gt;, Fixnum]
["c-return", "-", 6, :+, #&amp;lt;Binding:0x10025fbc&amp;gt;, Fixnum]
["c-return", "-", 6, :+, #&amp;lt;Binding:0x10025a28&amp;gt;, Fixnum]
["c-call", "-", 6, :puts, #&amp;lt;Binding:0x10025424&amp;gt;, Kernel]
["c-call", "-", 6, :puts, #&amp;lt;Binding:0x10024f54&amp;gt;, IO]
["c-call", "-", 6, :to_s, #&amp;lt;Binding:0x10024854&amp;gt;, Fixnum]
["c-return", "-", 6, :to_s, #&amp;lt;Binding:0x10023db8&amp;gt;, Fixnum]
["c-call", "-", 6, :write, #&amp;lt;Binding:0x10023878&amp;gt;, IO]
3["c-return", "-", 6, :write, #&amp;lt;Binding:0x10022f9c&amp;gt;, IO]
["c-call", "-", 6, :write, #&amp;lt;Binding:0x10022ae8&amp;gt;, IO]

["c-return", "-", 6, :write, #&amp;lt;Binding:0x10022768&amp;gt;, IO]
["c-return", "-", 6, :puts, #&amp;lt;Binding:0x1002227c&amp;gt;, IO]
["c-return", "-", 6, :puts, #&amp;lt;Binding:0x10021b0c&amp;gt;, Kernel]
&lt;/pre&gt;
&lt;p&gt;As you can see in line 6 &lt;code&gt;Fixnum's&lt;/code&gt; operator &lt;code&gt;+&lt;/code&gt; is invoked.  Since &lt;code&gt;Fixnum&lt;/code&gt; does not know how to add an &lt;code&gt;Object&lt;/code&gt; it invokes its &lt;code&gt;#coerce&lt;/code&gt;.  For simplicity reasons we always return a fixed value of 2 for the &lt;code&gt;Object&lt;/code&gt; instance and as you can see &lt;code&gt;Fixnum's&lt;/code&gt; operator &lt;code&gt;+&lt;/code&gt; is invoked again and now returns immediately.  The final result is properly calculated as 3.&lt;/p&gt;
&lt;p&gt;Now let&amp;#8217;s look at &lt;code&gt;HexNum's&lt;/code&gt; method &lt;code&gt;#coerce&lt;/code&gt;.  Given the long introduction it looks surprisingly simple:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  # coercion

  def coerce(o)
    [HexNum.new(o.to_int), self]
  end

  # comparability

  include Comparable

  def &amp;lt;=&amp;gt;(o)
    case o
    when HexNum
      @i &amp;lt;=&amp;gt; o.to_i
    when Numeric
      @i &amp;lt;=&amp;gt; o
    else
      a, b = o.coerce(self)
      a &amp;lt;=&amp;gt; b
    end rescue nil
  end
&lt;/pre&gt;
&lt;p&gt;We just try to create a new &lt;code&gt;HexNum&lt;/code&gt; if the other type can be converted to an integer and go from there.  In other cases you might have to check argument types to apply specific conversions but for our &lt;code&gt;HexNum&lt;/code&gt; this is enough to ensure functionality.&lt;/p&gt;
&lt;p&gt;Now you can also see how comparison works for &lt;code&gt;HexNum&lt;/code&gt;: if the type is known we rely on the comparison functionality of the standard library.  And in order to make &lt;code&gt;HexNum&lt;/code&gt; also work with classes written later, &lt;code&gt;#coerce&lt;/code&gt; is invoked on all other arguments.  A last detail: if &lt;code&gt;#coerce&lt;/code&gt; fails it is supposed to raise an exception.  Since the contract of &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; prohibits that, we add a &lt;code&gt;rescue nil&lt;/code&gt; so we always end up with a proper comparison result.&lt;/p&gt;
&lt;h3&gt;Math and Operator Overloading&lt;/h3&gt;
&lt;p&gt;Now that we understand which role &lt;code&gt;#coerce&lt;/code&gt; plays in Ruby&amp;#8217;s operator land we can implement all the operators available in a way as to ensure we always get proper results.  In our case this means, we always want to have a &lt;code&gt;HexNum&lt;/code&gt; result with proper int math.&lt;/p&gt;
&lt;p&gt;There is one thing that we need to ensure for all operators: they must handle their own type directly.  If we fail to do that, we may end up in infinite recursion:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):001:0&amp;gt; class X
irb(main):002:1&amp;gt;   def +(o)
irb(main):003:2&amp;gt;     case o
irb(main):004:3&amp;gt;     when Integer
irb(main):005:3&amp;gt;       o + 1
irb(main):006:3&amp;gt;     else
irb(main):007:3*       a, b = o.coerce(self)
irb(main):008:3&amp;gt;       a + b
irb(main):009:3&amp;gt;     end
irb(main):010:2&amp;gt;   end
irb(main):011:1&amp;gt;
irb(main):012:1*   def coerce(o)
irb(main):013:2&amp;gt;     [X.new, self]
irb(main):014:2&amp;gt;   end
irb(main):015:1&amp;gt; end
=&amp;gt; nil
irb(main):016:0&amp;gt; x = X.new
=&amp;gt; #&amp;lt;X:0x10145c50&amp;gt;
irb(main):017:0&amp;gt; x + 1
=&amp;gt; 2
irb(main):018:0&amp;gt; 1 + x
SystemStackError: stack level too deep
        from (irb):7:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
... 7229 levels...
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):18:in `+'
        from (irb):18
        from /usr/local/bin/irb19:12:in `&amp;lt;main&amp;gt;'
irb(main):019:0&amp;gt; x + x
SystemStackError: stack level too deep
        from (irb):7:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
... 7229 levels...
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):8:in `+'
        from (irb):19
        from /usr/local/bin/irb19:12:in `&amp;lt;main&amp;gt;'
&lt;/pre&gt;
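&lt;p&gt;The recursion disappears as soon as the operator handles its own type before falling back to &lt;code&gt;#coerce&lt;/code&gt;.  A minimal sketch, assuming a made-up class &lt;code&gt;Y&lt;/code&gt; that wraps an integer in a hypothetical field &lt;code&gt;@v&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical variant of X; the name Y and the field @v are
# invented for this illustration.
class Y
  attr_reader :v

  def initialize(v = 0)
    @v = v
  end

  def +(o)
    case o
    when Y
      # own type is handled directly, so coercion ends here
      Y.new(@v + o.v)
    when Integer
      Y.new(@v + o)
    else
      a, b = o.coerce(self)
      a + b
    end
  end

  def coerce(o)
    [Y.new(o), self]
  end
end

y = Y.new(1)
p((y + 1).v)   # 2
p((1 + y).v)   # 2
p((y + y).v)   # 2
&lt;/pre&gt;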
&lt;p&gt;So let&amp;#8217;s look at operator &lt;code&gt;+&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def +(o)
    case o
    when Integer
      HexNum.new(@i + o)
    when Numeric
      HexNum.new(@i + o.to_i)
    else
      a, b = o.coerce(self)
      a + b
    end
  end
&lt;/pre&gt;
&lt;p&gt;We know that all &lt;code&gt;Integer&lt;/code&gt; instances do return an &lt;code&gt;Integer&lt;/code&gt; instance from their implementation of operator &lt;code&gt;+&lt;/code&gt; so we can simply use that on our instance variable containing the numerical value.  For all other &lt;code&gt;Numeric&lt;/code&gt; instances (including &lt;code&gt;HexNum&lt;/code&gt; itself) we can work with the instance value converted to int.  All others are responsible for returning appropriate values for themselves and the &lt;code&gt;HexNum&lt;/code&gt; instance so the math can work properly.&lt;/p&gt;
&lt;p&gt;Operator &lt;code&gt;*&lt;/code&gt; works a bit differently because for proper results of multiplication we need to use the numeric value as is and convert later:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def *(o)
    case o
    when HexNum
      HexNum.new(@i * o.to_i)
    when Numeric
      HexNum.new(@i * o)
    else
      a, b = o.coerce(self)
      a * b
    end
  end
&lt;/pre&gt;
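&lt;p&gt;Why the order of conversion matters for multiplication can be seen with plain numbers.  This is only a sketch with made-up values, independent of &lt;code&gt;HexNum&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
i = 10
factor = 0.5

# Converting the argument first throws the fraction away too early.
convert_first = i * factor.to_i

# Multiplying first and converting the product keeps the information.
convert_later = (i * factor).to_i

p convert_first   # 0
p convert_later   # 5
&lt;/pre&gt;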
&lt;p&gt;Of course we could refactor common parts into an operator implementing method but I did not want to get too fancy here and rather demonstrate how particular math operations might need different treatment.  For other operators we rely on the argument&amp;#8217;s compatibility with an integer since other types do not really make sense here:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  # asymmetric binary operators

  def &amp;lt;&amp;lt;(o)
    HexNum.new(@i &amp;lt;&amp;lt; o.to_int)
  end

  # bit operators

  def &amp;amp;(o)
    HexNum.new(@i &amp;amp; o.to_int)
  end
&lt;/pre&gt;
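&lt;p&gt;The choice of &lt;code&gt;to_int&lt;/code&gt; over &lt;code&gt;to_i&lt;/code&gt; is what restricts these operators to integer-compatible arguments.  A short sketch of the difference, using only core classes and made-up values:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# to_i is the lenient, explicit conversion: many classes offer it.
lenient = "3 apples".to_i

# to_int is the strict, implicit conversion: only integer-like
# classes such as Float implement it.
float_as_int = 2.9.to_int
string_is_int_like = "3 apples".respond_to?(:to_int)

p lenient              # 3
p float_as_int         # 2
p string_is_int_like   # false
&lt;/pre&gt;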
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;We have seen how a numeric class needs to be implemented differently from &lt;a href="018-Complete_Class.html"&gt;other classes&lt;/a&gt; that try to be complete.  The major point is to provide &lt;code&gt;#coerce&lt;/code&gt; with meaningful semantics and implement all operators.  As always you can find the complete code &lt;a href="http://gist.github.com/309694"&gt;at github&lt;/a&gt;.  Have fun playing around with it!&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Sat, 20 Feb 2010 14:25:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html</guid></item><item><title>The Complete Class</title><description>&lt;p&gt;&lt;i&gt;A remark: we enabled comment moderation because the blog was recently target of spam.  You probably have not seen much of it because we were pretty quick in removing it manually.  So if your comment does not show up please be patient.&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;There are some basic concepts (often called &amp;#8220;aspects&amp;#8221;) that need to be implemented for many classes although not all classes need all (or even any) of them:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;initialization&lt;/li&gt;
	&lt;li&gt;conversion to a printable string&lt;/li&gt;
	&lt;li&gt;equivalence&lt;/li&gt;
	&lt;li&gt;hash code calculation&lt;/li&gt;
	&lt;li&gt;comparability&lt;/li&gt;
	&lt;li&gt;cloning (&lt;code&gt;clone&lt;/code&gt; and &lt;code&gt;dup&lt;/code&gt;)&lt;/li&gt;
	&lt;li&gt;freezing&lt;/li&gt;
	&lt;li&gt;customized persistence (&lt;code&gt;Marshal&lt;/code&gt; and &lt;code&gt;YAML&lt;/code&gt;)&lt;/li&gt;
	&lt;li&gt;matching&lt;/li&gt;
	&lt;li&gt;math and operator overloading&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Which of these is needed for a particular class depends of course completely on the circumstances.  Classes which are never written to disk or used in a DRb context will not need customized persistence handling.  For other classes there might not be a reasonable ordering.&lt;/p&gt;
&lt;p&gt;We will look at these concepts individually in subsequent sections.  For the sake of this presentation I will create a slightly artificial class which will have particular properties in order to be able to show all the concepts:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;mutable fields&lt;/li&gt;
	&lt;li&gt;redundant fields, i.e. fields which carry cached values that can be derived from other fields&lt;/li&gt;
	&lt;li&gt;at least two fields for ordering priorities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Class &lt;code&gt;Album&lt;/code&gt; implements a music album with a title, interpret (the performing artist), sequence of tracks and a fixed pause duration between tracks.  I picked a slightly different approach than Eric in &lt;a href="http://blog.segment7.net/articles/2008/12/17/friendly-ruby-objects"&gt;his article&lt;/a&gt; in two ways:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Eric has a stronger focus on collaboration with standard library classes while my guiding question was &amp;#8220;what does a class need to be complete and consistent?&amp;#8221;,&lt;/li&gt;
	&lt;li&gt;I present all the features in a single class to show how aspects play together.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You will likely never implement all these aspects in a single class.  Even for those aspects you do use, you might not use the same implementations I will present.  That&amp;#8217;s OK.  The implementations presented in this article are intended to cover aspects thoroughly even though you will not always have to do that in practice.  For example, certain code is there in order to make the class work properly as part of an inheritance hierarchy.  If you write a one off class for a script which is never intended for inheritance you can simplify many of the presented methods.&lt;/p&gt;
&lt;p&gt;I left out the topic of math and operator overloading since that does not mix well with the concept of music album.  Instead I will cover that in the next article in the blog which will present a class that shows how to override operators in Ruby and that plays well with Ruby&amp;#8217;s built in numeric classes.&lt;/p&gt;
&lt;h3&gt;Initialization&lt;/h3&gt;
&lt;p&gt;Implementing method &lt;code&gt;initialize&lt;/code&gt; is typically one of the first things I do when implementing a new class unless I can use the default implementation of &lt;code&gt;Struct&lt;/code&gt;.  There are a few things though that are worth considering.&lt;/p&gt;
&lt;p&gt;First of all, who owns arguments to &lt;code&gt;initialize&lt;/code&gt;?  It is important to make clear what happens to arguments that are provided to the call to &lt;code&gt;new&lt;/code&gt;.  There are three cases:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;the value is immutable (like &lt;code&gt;Fixnum&lt;/code&gt;) or&lt;/li&gt;
	&lt;li&gt;the caller retains ownership or&lt;/li&gt;
	&lt;li&gt;ownership is transferred to the new instance.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Case 1 is the simple one: basically you do not need to think about who owns the object as there are no bad effects which can be caused by &lt;a href="http://en.wikipedia.org/wiki/Aliasing_(computing)"&gt;aliasing&lt;/a&gt;.  If the instance is mutable these effects can show up.  If you want to make your code as robust as possible you must ensure you have your own copy of the instance (typically by copying it, for example with &lt;code&gt;dup&lt;/code&gt;).  The downside of this is of course that it costs performance as you&amp;#8217;ll likely copy too many objects.  In practice you will probably do nothing special most of the time and keep the code as simple as an assignment.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class Album

  def initialize(title, interpret, pause = 0)
    super()

    # main properties
    self.title = title
    self.interpret = interpret
    @tracks = []

    # additional properties
    self.pause = pause

    # redundant properties
    # left out
  end

  def title=(t)
    @title = t.dup.freeze
  end

  def interpret=(i)
    @interpret = i.dup.freeze
  end

  def pause=(time)
    @pause = time
    @duration = nil
  end

end
&lt;/pre&gt;
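&lt;p&gt;The aliasing effect that &lt;code&gt;dup.freeze&lt;/code&gt; in the setters guards against can be reproduced with a plain &lt;code&gt;String&lt;/code&gt;.  A minimal sketch with made-up variable names:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
original = String.new("Kind of Blue")   # a mutable String
aliased  = original                     # plain assignment shares the object
copied   = original.dup.freeze          # private, immutable copy

# The caller changes its string after handing it over.
original.concat(" (Remastered)")

p aliased   # "Kind of Blue (Remastered)"
p copied    # "Kind of Blue"
&lt;/pre&gt;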
&lt;p&gt;The other important aspect is inheritance.  Most classes probably do not have a &lt;code&gt;super&lt;/code&gt; in their &lt;code&gt;initialize&lt;/code&gt; method.  If there is a chance that you will reopen the class and add mixin modules, you should include &lt;code&gt;super()&lt;/code&gt; right from the start: even if you only implicitly inherit from &lt;code&gt;Object&lt;/code&gt;, whose &lt;code&gt;initialize&lt;/code&gt; does nothing, a mixin module added later may have an &lt;code&gt;initialize&lt;/code&gt; method of its own.  If you inherit from a class other than &lt;code&gt;Object&lt;/code&gt; you should explicitly pass arguments to &lt;code&gt;super&lt;/code&gt; and not rely on having the same argument list as the super class initializer.  That way you make the call more explicit and are robust against changes in your &lt;code&gt;initialize&lt;/code&gt; method.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class CdAlbum &amp;lt; Album
  attr_accessor :bar_code

  def initialize(t, i, code)
    super(t, i, 2)

    self.bar_code = code
  end
end
&lt;/pre&gt;
&lt;p&gt;While we&amp;#8217;re at it: if you write a module which is intended as mixin and needs initialization itself the initializer should simply pass on all arguments in order to be compatible with arbitrary inheritance chains:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
module AnotherMixin
  # Prepare internal state and pass on all
  # arguments to the super class
  def initialize(*a, &amp;amp;b)
    super
    @list = []
  end
end
&lt;/pre&gt;
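&lt;p&gt;How the two pieces cooperate can be tried with a small sketch.  The names &lt;code&gt;Record&lt;/code&gt; and &lt;code&gt;Countable&lt;/code&gt; are hypothetical; the class is reopened after the fact to include the mixin:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class Record
  attr_reader :name

  def initialize(name)
    super()   # reaches Object today, Countable once it is mixed in
    @name = name
  end
end

module Countable
  attr_reader :count

  # Pass on whatever arrives, then prepare own state.
  def initialize(*args)
    super
    @count = 0
  end
end

# Later: reopen the class and add the mixin.
class Record
  include Countable
end

r = Record.new("x")
p r.name    # "x"
p r.count   # 0
&lt;/pre&gt;
&lt;p&gt;Without the &lt;code&gt;super()&lt;/code&gt; in &lt;code&gt;Record#initialize&lt;/code&gt; the mixin initializer would never run and &lt;code&gt;count&lt;/code&gt; would be &lt;code&gt;nil&lt;/code&gt;.&lt;/p&gt;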
&lt;h3&gt;Conversion to a printable String&lt;/h3&gt;
&lt;p&gt;Often it is desirable to be able to convert an instance to a human readable string.  Which representation is most appropriate depends on the uses of the class.  One policy is to create a string representation so the instance can be later reconstructed given this string as is the case for all the numeric types from the standard library.  In the case of our sample class we will provide all interesting information:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def to_s
    "Album '#{title}' by '#{interpret}' (#{tracks.size} tracks)"
  end
&lt;/pre&gt;
&lt;p&gt;If you want to reuse a string field as external representation you could copy it in order to avoid bad effects from aliasing.  However, since typically &lt;code&gt;to_s&lt;/code&gt; is invoked for printing and no references are held I would say most of the time it is safe to not explicitly copy the field in these cases.&lt;/p&gt;
&lt;h3&gt;Equivalence&lt;/h3&gt;
&lt;p&gt;There are two methods that deal with object equivalence: &lt;code&gt;eql?&lt;/code&gt; and &lt;code&gt;==&lt;/code&gt;.  (Method &lt;code&gt;equal?&lt;/code&gt; tests for object &lt;em&gt;identity&lt;/em&gt; and should &lt;strong&gt;not&lt;/strong&gt; be overridden.)  Some core classes do have different equivalence relations implemented:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):003:0&amp;gt; 2 == 2.0
=&amp;gt; true
irb(main):004:0&amp;gt; 2.eql? 2.0
=&amp;gt; false
&lt;/pre&gt;
&lt;p&gt;But most of the time both methods will implement the same equivalence relation.  This also helps avoid confusion.  Note that &lt;code&gt;eql?&lt;/code&gt; is special because it is used by &lt;code&gt;Hash&lt;/code&gt; and &lt;code&gt;Set&lt;/code&gt; to test for instance equivalence.  We will come to that in a minute when we look at hash code calculation.&lt;/p&gt;
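&lt;p&gt;The effect of &lt;code&gt;Hash&lt;/code&gt; relying on &lt;code&gt;eql?&lt;/code&gt; can be observed with the same two numbers.  A small sketch (the variable name is made up):&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
lookup = {}
lookup[2] = :integer_key

p lookup[2]     # :integer_key
p lookup[2.0]   # nil, although 2 == 2.0 is true
&lt;/pre&gt;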
&lt;p&gt;Equivalence of instances must be tested against significant fields and should ignore redundant fields.  Looking at derived field values does not add anything to equivalence and makes the process slower at best.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def eql?(album)
    self.class.equal?(album.class) &amp;amp;&amp;amp;
      title == album.title &amp;amp;&amp;amp;
      interpret == album.interpret &amp;amp;&amp;amp;
      tracks == album.tracks
  end

  alias == eql?
&lt;/pre&gt;
&lt;p&gt;Now here you see a test for class identity which you likely have not seen in other classes.  Why is it there?  &lt;a href="http://en.wikipedia.org/wiki/Equivalence_relation"&gt;Mathematical equivalence&lt;/a&gt; is a symmetric relation which means that if &lt;code&gt;a.eql? b&lt;/code&gt; returns &lt;code&gt;true&lt;/code&gt; so should &lt;code&gt;b.eql? a&lt;/code&gt;.  Implementing &lt;code&gt;eql?&lt;/code&gt; and &lt;code&gt;==&lt;/code&gt; that way also helps prevent strange effects when working with &lt;code&gt;Hash&lt;/code&gt; instances.  Without the class check, if you compare two instances of a class and a subclass you might get &lt;code&gt;true&lt;/code&gt; when the method is called on the super class instance and &lt;code&gt;false&lt;/code&gt; when it is called on the subclass instance.  The same happens for two completely unrelated classes where the set of fields of one instance is a subset of those of the other instance.  You may actually be tricked into thinking they are equivalent because the common fields are equivalent while the two instances represent completely different concepts.  The only way to remedy this is to check for identity of the class.&lt;/p&gt;
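&lt;p&gt;The asymmetry is easy to provoke with two unrelated classes whose fields overlap.  A sketch with deliberately naive implementations of &lt;code&gt;==&lt;/code&gt;; the classes &lt;code&gt;Song&lt;/code&gt; and &lt;code&gt;Book&lt;/code&gt; are invented for this purpose:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class Song
  attr_reader :title

  def initialize(title)
    @title = title
  end

  # Deliberately naive: no class check.
  def ==(o)
    title == o.title
  end
end

class Book
  attr_reader :title, :author

  def initialize(title, author)
    @title = title
    @author = author
  end

  # Deliberately naive as well: only looks at fields.
  def ==(o)
    return false unless o.respond_to?(:author)
    [title, author] == [o.title, o.author]
  end
end

song = Song.new("Dune")
book = Book.new("Dune", "Frank Herbert")

p song == book   # true
p book == song   # false
&lt;/pre&gt;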
&lt;p&gt;The reason that omitting the class identity check does not cause issues most of the time is simple: usually you stuff uniform instances into a &lt;code&gt;Hash&lt;/code&gt; or &lt;code&gt;Set&lt;/code&gt; and even if you mix different classes most of the time they will have different fields and different field values.  Still it is good to remember the point in case you experience unexpected effects with &lt;code&gt;Hash&lt;/code&gt; keys.&lt;/p&gt;
&lt;p&gt;Note that I omitted the test for self identity which you might be used to from Java classes.  I believe this is a premature optimization because most of the time you are going to test different instances for equivalence so most of the time you pay the penalty of the failing identity check and win only in rare circumstances.&lt;/p&gt;
&lt;h3&gt;Hash Code Calculation&lt;/h3&gt;
&lt;p&gt;This topic is closely related to instance equivalence: classes &lt;code&gt;Hash&lt;/code&gt;, &lt;code&gt;Array&lt;/code&gt;, &lt;code&gt;Set&lt;/code&gt; and other core classes rely on the fact that equivalent instances also return the same hash code.  Note that the converse does not hold: instances which have the same hash code may actually not be equivalent.  But if the hash codes differ the instances must not be equivalent.&lt;/p&gt;
&lt;p&gt;An instance&amp;#8217;s hash code should be based on the same fields that are used for determining equivalence.  Our class &lt;code&gt;Album&lt;/code&gt; has more fields and it is advisable to do some bit operations (often involving &lt;span class="caps"&gt;XOR&lt;/span&gt;) to combine all member hash codes into a single value in order to ensure better diversity of these values.  If you base the hash code only on a single field you increase the likelihood that non equivalent instances fall into the same bucket of the &lt;code&gt;Hash&lt;/code&gt; which makes additional equivalence checks via &lt;code&gt;eql?&lt;/code&gt; necessary.  You can find more about how hash tables work on &lt;a href="http://en.wikipedia.org/wiki/Hash_table"&gt;Wikipedia&lt;/a&gt;.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def hash
    title.hash ^
      interpret.hash ^
      tracks.hash
  end
&lt;/pre&gt;
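&lt;p&gt;To see &lt;code&gt;eql?&lt;/code&gt; and &lt;code&gt;hash&lt;/code&gt; working together, here is a sketch with a hypothetical two-field class &lt;code&gt;Key&lt;/code&gt; that follows the same pattern as &lt;code&gt;Album&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class Key
  attr_reader :a, :b

  def initialize(a, b)
    @a = a
    @b = b
  end

  def eql?(o)
    return false unless self.class.equal?(o.class)
    [a, b] == [o.a, o.b]
  end

  alias == eql?

  # Same fields as in eql?, combined with XOR.
  def hash
    a.hash ^ b.hash
  end
end

table = {}
table[Key.new(1, "x")] = :found

# A distinct but equivalent instance finds the entry.
p table[Key.new(1, "x")]   # :found

# XOR is commutative, so swapped fields collide without being equivalent.
p Key.new(1, 2).hash == Key.new(2, 1).hash   # true
p Key.new(1, 2).eql?(Key.new(2, 1))          # false
&lt;/pre&gt;
&lt;p&gt;The collision in the last lines is legal, it merely costs an additional &lt;code&gt;eql?&lt;/code&gt; check during lookup.&lt;/p&gt;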
&lt;h3&gt;Comparability&lt;/h3&gt;
&lt;p&gt;Many classes have a natural ordering such as integers which are ordered by their numeric value.  If your class does also have a natural order you can implement operator &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; and include module &lt;code&gt;Comparable&lt;/code&gt; to get implementations of &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;lt;=&lt;/code&gt; etc. for free.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  include Comparable

  def &amp;lt;=&amp;gt;(o)
    self.class == o.class ?
      (interpret &amp;lt;=&amp;gt; o.interpret).nonzero? ||
      (title &amp;lt;=&amp;gt; o.title).nonzero? ||
      (tracks &amp;lt;=&amp;gt; o.tracks).nonzero? ||
      (pause &amp;lt;=&amp;gt; o.pause) || 
      0 :
      nil
  end
&lt;/pre&gt;
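&lt;p&gt;The chain above hinges on &lt;code&gt;nonzero?&lt;/code&gt; returning &lt;code&gt;nil&lt;/code&gt; for 0 and the number itself otherwise, so the first field that differs decides the order.  A sketch with made-up comparison results in place of the real field comparisons:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Pretend results of comparing individual fields:
# 0 means "equal so far", -1 and 1 mean "differs".
by_interpret = 0
by_title     = -1
by_pause     = 1

p by_interpret.nonzero?   # nil
p by_title.nonzero?       # -1

# The first non-zero result wins; later fields are not consulted.
order = by_interpret.nonzero? || by_title.nonzero? || by_pause
p order                   # -1
&lt;/pre&gt;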
&lt;h3&gt;Cloning&lt;/h3&gt;
&lt;p&gt;When cloning or duping an instance the default mechanism sets all fields of the new instance to refer to the same objects as the cloned instance.  While this is not an issue with immutable instances (e.g. &lt;code&gt;Fixnum&lt;/code&gt; or instances which are frozen) bad things can happen if you have a mutable instance (for example &lt;code&gt;Array&lt;/code&gt; or &lt;code&gt;String&lt;/code&gt;) which is suddenly referenced by two instances which both believe they are the sole owner of it.  All kinds of bad effects can happen, including violation of class invariants, and the instance state likely becomes inconsistent.  To deal with such cases even for shallow copies such as &lt;code&gt;#clone&lt;/code&gt; and &lt;code&gt;#dup&lt;/code&gt; you need to copy a bit more.  Fortunately Ruby provides a hook (&lt;code&gt;#initialize_copy&lt;/code&gt;) which is invoked after the instance has been copied and which can make appropriate adjustments.  In our case we only need to copy the &lt;code&gt;Array&lt;/code&gt; of &lt;code&gt;Track&lt;/code&gt; instances because all other fields are immutable:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def initialize_copy(source)
    super
    @tracks = @tracks.dup
  end
&lt;/pre&gt;
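&lt;p&gt;The difference the hook makes shows up when comparing two otherwise identical classes.  The names &lt;code&gt;SharedList&lt;/code&gt; and &lt;code&gt;OwnList&lt;/code&gt; are hypothetical:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class SharedList
  attr_reader :items

  def initialize
    @items = []
  end
end

class OwnList
  attr_reader :items

  def initialize
    @items = []
  end

  # Give every copy its own Array.
  def initialize_copy(source)
    super
    @items = @items.dup
  end
end

shared = SharedList.new
shared.dup.items.push(:track)
p shared.items   # [:track], the original was changed through the copy

own = OwnList.new
own.dup.items.push(:track)
p own.items      # []
&lt;/pre&gt;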
&lt;p&gt;There is a &lt;a href="http://groups.google.com/group/comp.lang.ruby/browse_frm/thread/8a24226a54e79d24/9667453664f2633d#9667453664f2633d"&gt;recent discussion on comp.lang.ruby&lt;/a&gt; that started out with the subject of deep cloning but uncovered some general aspects of cloning.&lt;/p&gt;
&lt;h3&gt;Freezing&lt;/h3&gt;
&lt;p&gt;For freezing similar reasoning applies as for cloning: immutable fields need no additional attention as you cannot change those objects anyway.  For others you need to decide how deep you want the freeze to go.  In case of class &lt;code&gt;Album&lt;/code&gt; we certainly want to prevent addition of more tracks after an album has been frozen.  We also explicitly trigger calculation of &lt;code&gt;duration&lt;/code&gt; so as to avoid errors when accessing the duration on the frozen instance; the value cannot change any more anyway.  This also makes it important to invoke &lt;code&gt;super&lt;/code&gt; as the last step.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def freeze
    # unless frozen?
    @tracks.freeze 
    duration
    # end

    super
  end
&lt;/pre&gt;
&lt;p&gt;Note that the code is robust against multiple invocations of &lt;code&gt;#freeze&lt;/code&gt; because duration is calculated only once (on first transition from unfrozen to frozen).  If you have more complex calculations going on that involve state changes of the instance you should place &lt;code&gt;unless frozen?&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; around the custom freeze code.&lt;/p&gt;
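&lt;p&gt;A self-contained sketch of the same idea, assuming a made-up class &lt;code&gt;Playlist&lt;/code&gt; with a lazily cached &lt;code&gt;duration&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class Playlist
  def initialize(lengths)
    @lengths = lengths.dup
    @duration = nil
  end

  # Lazily calculated and cached in @duration.
  def duration
    @duration ||= @lengths.sum
  end

  def freeze
    @lengths.freeze
    duration   # fill the cache while assignment is still allowed
    super
  end
end

list = Playlist.new([10, 20]).freeze
p list.frozen?    # true
p list.duration   # 30
&lt;/pre&gt;
&lt;p&gt;If &lt;code&gt;freeze&lt;/code&gt; did not call &lt;code&gt;duration&lt;/code&gt; first, the lazy assignment would be attempted on a frozen instance and raise an error.&lt;/p&gt;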
&lt;h3&gt;Custom Persistence&lt;/h3&gt;
&lt;p&gt;If you want to serialize the complete state of your instance there is nothing more to do: you can simply use &lt;code&gt;Marshal&lt;/code&gt; and &lt;code&gt;YAML&lt;/code&gt; from scratch.  However, if you have redundant data (such as field &lt;code&gt;duration&lt;/code&gt; in our case) which you want to omit from the serialization you have to adjust the process yourself.&lt;/p&gt;
&lt;p&gt;For &lt;code&gt;Marshal&lt;/code&gt; the proper approach is the newer one, which involves writing methods &lt;code&gt;marshal_dump&lt;/code&gt; and &lt;code&gt;marshal_load&lt;/code&gt;.  The former is supposed to return something which is serialized instead of the current instance and the latter is invoked on a new empty object and handed the deserialized object so fields can be initialized properly.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def marshal_dump
    a = (super rescue [])
    a.push(@title, @interpret, @pause, @tracks)
  end

  def marshal_load(dumped)
    super rescue nil
    @title, @interpret, @pause, @tracks = *dumped.shift(4)
  end
&lt;/pre&gt;
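&lt;p&gt;A round trip shows that the redundant value really stays out of the dump.  This is a reduced sketch without the inheritance handling, using a hypothetical class &lt;code&gt;Readings&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class Readings
  def initialize(values)
    @values = values
    @total = nil
  end

  # Redundant field: derived from @values and cached.
  def total
    @total ||= @values.sum
  end

  def cached?
    !@total.nil?
  end

  # Serialize the significant field only.
  def marshal_dump
    [@values]
  end

  def marshal_load(dumped)
    @values = dumped.first
    @total = nil
  end
end

readings = Readings.new([1, 2, 3])
readings.total   # fills the cache

copy = Marshal.load(Marshal.dump(readings))
p copy.cached?   # false
p copy.total     # 6
&lt;/pre&gt;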
&lt;p&gt;For &lt;span class="caps"&gt;YAML&lt;/span&gt; it is even simpler: you basically just need to ensure &lt;code&gt;to_yaml_properties&lt;/code&gt; returns a list of symbols containing only those members that you want serialized.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def to_yaml_properties
    a = super
    a.delete :@duration
    a
  end
&lt;/pre&gt;
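&lt;p&gt;Note the assumption behind the following sketch: it targets Rubies whose &lt;span class="caps"&gt;YAML&lt;/span&gt; library is Psych, which offers &lt;code&gt;encode_with&lt;/code&gt; as its customization hook and, in newer versions, apparently no longer consults &lt;code&gt;to_yaml_properties&lt;/code&gt;.  The class &lt;code&gt;Readings&lt;/code&gt; and its fields are made up:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
require "yaml"

class Readings
  def initialize(values)
    @values = values
    @cached_sum = values.sum   # redundant, should not be serialized
  end

  # Psych hook: write exactly the fields we want.
  def encode_with(coder)
    coder["values"] = @values
  end
end

yaml = YAML.dump(Readings.new([1, 2, 3]))
puts yaml
&lt;/pre&gt;
&lt;p&gt;The output should mention &lt;code&gt;values&lt;/code&gt; but not &lt;code&gt;cached_sum&lt;/code&gt;.&lt;/p&gt;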
&lt;p&gt;Both approaches do have their own strengths: with &lt;code&gt;Marshal&lt;/code&gt; it is easier to completely replace an object with something else.  While that can be done with &lt;code&gt;YAML&lt;/code&gt; as well it is not as simple and elegant as in the case of &lt;code&gt;Marshal&lt;/code&gt;.  &lt;code&gt;YAML&lt;/code&gt; on the other hand shines because you need to override only a single method.&lt;/p&gt;
&lt;p&gt;One word about the implementations: while these methods could look simpler I picked an approach which also works when inheritance comes into play.  You can repeat the pattern of &lt;code&gt;marshal_dump&lt;/code&gt;, &lt;code&gt;marshal_load&lt;/code&gt; and &lt;code&gt;to_yaml_properties&lt;/code&gt; throughout a class hierarchy and still have each method only deal with fields of the class in which it is defined.  That makes it easier to deal with later additions or removals of fields in some class in the hierarchy.  This is even more important when dealing with mixin modules, which can be added on a per instance basis (via &lt;code&gt;#extend&lt;/code&gt;).&lt;/p&gt;
&lt;h3&gt;Matching&lt;/h3&gt;
&lt;p&gt;I use the term &amp;#8220;matching&amp;#8221; for the functionality of the three equals operator (&lt;code&gt;===&lt;/code&gt;) and for the matching operator (&lt;code&gt;=~&lt;/code&gt;).  The first is used in &lt;code&gt;case&lt;/code&gt; statements and with &lt;code&gt;Array#grep&lt;/code&gt; while the latter is usually used only explicitly.&lt;/p&gt;
&lt;p&gt;The semantics are completely up to the class at hand and no general guidelines can be given.  Implement it when it is reasonable for your class.  If you want to elegantly use instances of your class in &lt;code&gt;case&lt;/code&gt; expressions you have to implement &lt;code&gt;===&lt;/code&gt; doing something meaningful.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  # Check whether we have that track or track title by title.
  def ===(track_or_title)
    t = (track_or_title.title rescue track_or_title)
    tracks.find {|tr| t == tr.title}
  end

  # Match title against regular expression.
  def =~(rx)
    rx =~ title
  end
&lt;/pre&gt;
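&lt;p&gt;To see where &lt;code&gt;===&lt;/code&gt; comes into play, here is a minimal sketch.  The class &lt;code&gt;Playlist&lt;/code&gt;, the constants and the method &lt;code&gt;genre_of&lt;/code&gt; are made up for illustration and are not part of the class discussed in this article.  The point is only that &lt;code&gt;case&lt;/code&gt; and &lt;code&gt;grep&lt;/code&gt; both call &lt;code&gt;===&lt;/code&gt; on the pattern, not on the value being tested:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical class, only for demonstrating ===.
class Playlist
  def initialize(*titles)
    @titles = titles
  end

  # "when playlist" in a case expression calls playlist === value.
  def ===(title)
    @titles.include?(title)
  end
end

ROCK = Playlist.new("Paranoid", "Iron Man")
JAZZ = Playlist.new("So What", "Blue in Green")

def genre_of(title)
  case title
  when ROCK then :rock
  when JAZZ then :jazz
  else :unknown
  end
end

p genre_of("Iron Man")    # prints :rock
p genre_of("Yesterday")   # prints :unknown

# grep uses === as well, so a Playlist works as a pattern.
p ["So What", "Iron Man", "Yesterday"].grep(JAZZ)   # prints ["So What"]
&lt;/pre&gt;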
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;Today we looked at a class which implements various common aspects that you will meet over and over again when coding Ruby.  Many of these are in fact present in other programming languages as well: Java has serialization, &lt;code&gt;equals&lt;/code&gt; and hash code calculation.  C++ also has an equivalence operator and other operators that can be overloaded to implement ordering etc.  You can find the &lt;a href="http://gist.github.com/217641"&gt;full code of the class&lt;/a&gt; on GitHub.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Sat, 24 Oct 2009 18:05:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html</guid></item><item><title>Structs inside out</title><description>&lt;p&gt;Today we&amp;#8217;re back to normal blog mode, where each article stands on its own.  Muppet Labs are closed and we will be continuing our journey across the Ruby universe starting with an in-depth look at Ruby&amp;#8217;s &lt;code&gt;Struct&lt;/code&gt; class &amp;#8212; Ruby&amp;#8217;s Swiss army knife for structured data.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Struct&lt;/code&gt; can be used without any additional &lt;code&gt;require&lt;/code&gt; statement &amp;#8212; it&amp;#8217;s just there.  This means it comes with zero additional overhead during initial interpreter startup &amp;#8212; one of the many advantages of using &lt;code&gt;Struct&lt;/code&gt;.  But first let&amp;#8217;s look at the basics.&lt;/p&gt;
&lt;h3&gt;Data Container&lt;/h3&gt;
&lt;p&gt;Class &lt;code&gt;Struct&lt;/code&gt; is a strange beast if you are confronted with it for the first time: it has a method &lt;code&gt;new&lt;/code&gt; like other classes.  But that method does not return an instance of &lt;code&gt;Struct&lt;/code&gt; but rather a class which is a subclass of class &lt;code&gt;Struct&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):001:0&amp;gt; pnt = Struct.new :x, :y
=&amp;gt; #&amp;lt;Class:0x101c3d68&amp;gt;
irb(main):002:0&amp;gt; pnt.class
=&amp;gt; Class
irb(main):003:0&amp;gt; pnt.ancestors
=&amp;gt; [#&amp;lt;Class:0x101c3d68&amp;gt;, Struct, Enumerable, Object, Kernel, BasicObject]
&lt;/pre&gt;
&lt;p&gt;Arguments to &lt;code&gt;Struct.new&lt;/code&gt; are names of fields that the generated class will have.  These names must be provided as &lt;code&gt;Symbols&lt;/code&gt; which makes sense, because these field names are basically identifiers you hardcode into the program.  (I will explain later why this is an accurate assessment when I look at &lt;code&gt;Struct&lt;/code&gt; vs. &lt;code&gt;OpenStruct&lt;/code&gt; vs. &lt;code&gt;Hash&lt;/code&gt;.)  You must provide them as &lt;code&gt;Symbols&lt;/code&gt; because of another peculiarity of &lt;code&gt;Struct&lt;/code&gt;: a &lt;code&gt;String&lt;/code&gt; passed as first argument is taken as the name of the generated class, which is then stored as a constant in &lt;code&gt;Struct&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):001:0&amp;gt; Struct.constants
=&amp;gt; [:Tms]
irb(main):002:0&amp;gt; s = Struct.new "Point", :x, :y
=&amp;gt; Struct::Point
irb(main):003:0&amp;gt; Struct.constants
=&amp;gt; [:Tms, :Point]
irb(main):004:0&amp;gt; t = Struct.new "doesNotWork", :x, :y
NameError: identifier doesNotWork needs to be constant
        from (irb):5:in `new'
        from (irb):5
        from /usr/local/bin/irb19:12:in `&amp;lt;main&amp;gt;'
&lt;/pre&gt;
&lt;p&gt;I have to say I have never used this feature of &lt;code&gt;Struct&lt;/code&gt; because&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;I don&amp;#8217;t want all my custom classes in a single namespace because that leads to potential issues,&lt;/li&gt;
	&lt;li&gt;I prefer explicit assignment to constants which makes clearer what&amp;#8217;s happening.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If someone knows advantages of this auto constantification which I overlooked please let us know via the &lt;a href="#disqus_thread"&gt;comment function&lt;/a&gt;.  Otherwise I suggest relying on standard constant magic if you want to name your &lt;code&gt;Struct&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):053:0&amp;gt; pnt.name
=&amp;gt; nil
irb(main):054:0&amp;gt; Point = pnt
=&amp;gt; Point
irb(main):055:0&amp;gt; pnt.name
=&amp;gt; "Point"
&lt;/pre&gt;
&lt;h3&gt;Behavior&lt;/h3&gt;
&lt;p&gt;A &lt;code&gt;Struct&lt;/code&gt; generated Ruby class is more similar to a C++ struct than to a C struct because it does not only carry data but also functionality.  Some of that is already predefined when &lt;code&gt;Struct.new&lt;/code&gt; returns:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):001:0&amp;gt; pnt = Struct.new :x, :y
=&amp;gt; #&amp;lt;Class:0x101c2fa0&amp;gt;
irb(main):002:0&amp;gt; puts pnt.instance_methods(false)
x
x=
y
y=
=&amp;gt; nil
irb(main):003:0&amp;gt; puts Struct.instance_methods(false)
==
eql?
hash
to_s
inspect
to_a
values
size
length
each
each_pair
[]
[]=
select
values_at
members
=&amp;gt; nil
&lt;/pre&gt;
&lt;p&gt;These can be divided into two categories:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;methods which primarily deal with struct members.&lt;/li&gt;
	&lt;li&gt;methods that allow an instance to mimic behavior of other classes.&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;Struct Member Behavior&lt;/h4&gt;
&lt;p&gt;First of all there are attribute accessors (those defined in the &lt;code&gt;Struct&lt;/code&gt; generated class, irb listing position 2).  Then there are those methods that make a &lt;code&gt;Struct&lt;/code&gt; instance usable as &lt;code&gt;Hash&lt;/code&gt; key:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;eql?&lt;/li&gt;
	&lt;li&gt;hash&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are defined in such a way that two &lt;code&gt;Struct&lt;/code&gt; instances are equivalent if all of their fields are:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):014:0&amp;gt; a = pnt.new 1,2
=&amp;gt; #&amp;lt;struct x=1, y=2&amp;gt;
irb(main):015:0&amp;gt; b = pnt.new 1,2
=&amp;gt; #&amp;lt;struct x=1, y=2&amp;gt;
irb(main):016:0&amp;gt; a.eql? b
=&amp;gt; true
irb(main):017:0&amp;gt; a.hash
=&amp;gt; -1066353017
irb(main):018:0&amp;gt; b.hash
=&amp;gt; -1066353017
&lt;/pre&gt;
&lt;p&gt;Method &lt;code&gt;==&lt;/code&gt; also implements a similar equivalence relation but with a twist:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):019:0&amp;gt; b = pnt.new 1.0,2
=&amp;gt; #&amp;lt;struct x=1.0, y=2&amp;gt;
irb(main):020:0&amp;gt; a.eql? b
=&amp;gt; false
irb(main):021:0&amp;gt; a == b
=&amp;gt; true
&lt;/pre&gt;
&lt;p&gt;It uses method &lt;code&gt;==&lt;/code&gt; of field values internally in the same way as &lt;code&gt;eql?&lt;/code&gt; uses &lt;code&gt;eql?&lt;/code&gt; of field values.  By providing these methods &lt;code&gt;Struct&lt;/code&gt; can save you a lot of tedious typing of pretty dull code that you would otherwise have to write by hand.  As we all know, the less code we have to write the fewer errors we can make&amp;#8230;&lt;/p&gt;
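&lt;p&gt;A small sketch of what this buys you in practice (the names &lt;code&gt;Coord&lt;/code&gt; and &lt;code&gt;VISITS&lt;/code&gt; are invented for this example): because &lt;code&gt;eql?&lt;/code&gt; and &lt;code&gt;hash&lt;/code&gt; are derived from the field values, two separately created instances with the same content address the same &lt;code&gt;Hash&lt;/code&gt; entry.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
Coord = Struct.new(:x, :y)

# Count visits per coordinate; unknown keys start at 0.
VISITS = Hash.new(0)
VISITS[Coord.new(1, 2)] += 1
VISITS[Coord.new(1, 2)] += 1   # a different instance, but the same key
VISITS[Coord.new(3, 4)] += 1

p VISITS.size                 # prints 2
p VISITS[Coord.new(1, 2)]     # prints 2

# Hash lookup relies on eql?, so 1.0 does not stand in for 1 here.
p VISITS[Coord.new(1.0, 2)]   # prints 0
&lt;/pre&gt;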
&lt;h4&gt;Mimicry&lt;/h4&gt;
&lt;p&gt;A &lt;code&gt;Struct&lt;/code&gt; can &amp;#8212; within certain limits &amp;#8212; mimic an &lt;code&gt;Array&lt;/code&gt; as well as a &lt;code&gt;Hash&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Array alike
irb(main):027:0&amp;gt; a.to_a
=&amp;gt; [1, 2]
irb(main):028:0&amp;gt; a[0]
=&amp;gt; 1
irb(main):029:0&amp;gt; a[1]
=&amp;gt; 2
irb(main):030:0&amp;gt; a.values_at 1,0
=&amp;gt; [2, 1]
irb(main):031:0&amp;gt; a.length
=&amp;gt; 2
irb(main):032:0&amp;gt; a.size
=&amp;gt; 2
irb(main):033:0&amp;gt; a.each {|e| p e}
1
2
=&amp;gt; #&amp;lt;struct x=1, y=2&amp;gt;
irb(main):034:0&amp;gt; a.select {|e| e % 2 == 0}
=&amp;gt; [2]
irb(main):035:0&amp;gt; a &amp;lt;&amp;lt; "more" # of course not
NoMethodError: undefined method `&amp;lt;&amp;lt;' for #&amp;lt;struct x=1, y=2&amp;gt;
        from (irb):41
        from /usr/local/bin/irb19:12:in `&amp;lt;main&amp;gt;'

# Hash alike
irb(main):036:0&amp;gt; a[:x]
=&amp;gt; 1
irb(main):037:0&amp;gt; a["x"]
=&amp;gt; 1
irb(main):038:0&amp;gt; a.each_pair {|k,v| printf "%p =&amp;gt; %p\n",k,v}
:x =&amp;gt; 1
:y =&amp;gt; 2
=&amp;gt; #&amp;lt;struct x=1, y=2&amp;gt;
irb(main):039:0&amp;gt; a.keys # not quite
NoMethodError: undefined method `keys' for #&amp;lt;struct x=1, y=2&amp;gt;
        from (irb):38
        from /usr/local/bin/irb19:12:in `&amp;lt;main&amp;gt;'
irb(main):040:0&amp;gt; a.members
=&amp;gt; [:x, :y]
irb(main):041:0&amp;gt; a[:y]=123
=&amp;gt; 123
irb(main):042:0&amp;gt; a
=&amp;gt; #&amp;lt;struct x=1, y=123&amp;gt;
&lt;/pre&gt;
&lt;p&gt;Btw. the generated class also exposes some interesting methods, which do not really need additional explanation:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):049:0&amp;gt; pnt.members
=&amp;gt; [:x, :y]
irb(main):050:0&amp;gt; pnt[10,20] # for lazy typers
=&amp;gt; #&amp;lt;struct x=10, y=20&amp;gt;
&lt;/pre&gt;
&lt;h4&gt;Custom Behavior&lt;/h4&gt;
&lt;p&gt;From time to time you will want additional methods in your &lt;code&gt;Struct&lt;/code&gt; class.  Well, there is a &amp;#8220;hidden&amp;#8221; feature &amp;#8212; the documentation does not mention it but you can pass a block to &lt;code&gt;Struct.new&lt;/code&gt; which will be used as class body:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):001:0&amp;gt; Point = Struct.new :x, :y do
irb(main):002:1*   def distance(point)
irb(main):003:2&amp;gt;     Math.sqrt((point.x - self.x) ** 2 +
irb(main):004:3*       (point.y - self.y) ** 2)
irb(main):005:2&amp;gt;   end
irb(main):006:1&amp;gt; end
=&amp;gt; Point
irb(main):007:0&amp;gt; Point[3,4].distance Point[0,0]
=&amp;gt; 5.0
&lt;/pre&gt;
&lt;p&gt;This obsoletes the frequently seen idiom which uses inheritance from a &lt;code&gt;Struct&lt;/code&gt; generated class to add more methods:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):001:0&amp;gt; class Point &amp;lt; Struct.new(:x, :y)
irb(main):002:1&amp;gt;   def distance(point)
irb(main):003:2&amp;gt;     Math.sqrt((point.x - self.x) ** 2 +
irb(main):004:3*       (point.y - self.y) ** 2)
irb(main):005:2&amp;gt;   end
irb(main):006:1&amp;gt; end
=&amp;gt; nil
irb(main):007:0&amp;gt; Point[3,4].distance Point[0,0]
=&amp;gt; 5.0
&lt;/pre&gt;
&lt;p&gt;When using inheritance like this you are wasting resources (a class in this case) and so far I see only one advantage: changing the constructor is a bit easier because the old implementation is available via &lt;code&gt;super&lt;/code&gt; without having to resort to &lt;code&gt;alias&lt;/code&gt; or reimplementing member initialization.  If you want to have multiple subclasses of a &lt;code&gt;Struct&lt;/code&gt; class then inheritance is OK of course.  But in that case I would rather make the super class a named class by assigning it to a constant and use that later when subclassing.&lt;/p&gt;
&lt;h3&gt;Struct vs. OpenStruct vs. Hash&lt;/h3&gt;
&lt;p&gt;All three classes offer functionality with some similarities and overlap.  Sometimes you can use them interchangeably.  Still there are some guidelines you can use to decide when to use which of these.  As always &amp;#8212; take these recommendations with a grain of salt.  If you have your own different rules of thumb let us know.&lt;/p&gt;
&lt;p&gt;Use &lt;code&gt;Struct&lt;/code&gt; if&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;you need a data container and fields are fixed and known beforehand (i.e. at time of writing of the code),&lt;/li&gt;
	&lt;li&gt;you need a structured &lt;code&gt;Hash&lt;/code&gt; key,&lt;/li&gt;
	&lt;li&gt;you want to quickly define a class with a few fields,&lt;/li&gt;
	&lt;li&gt;you need to detect errors caused by misspelled field names.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use &lt;code&gt;OpenStruct&lt;/code&gt; if&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;the number of fields is fixed at runtime but varies frequently during development time (this is often the case for objects that should hold the result of command line option parsing),&lt;/li&gt;
	&lt;li&gt;you need a mock or want to quickly have objects at hand which can be filled via the usual attribute setters and getters.  You might want to replace it with a proper class (or &lt;code&gt;Struct&lt;/code&gt;) later, with explicit attribute declarations via &lt;code&gt;attr_accessor&lt;/code&gt; and relatives.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use &lt;code&gt;Hash&lt;/code&gt; if&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;the number of fields is unknown at coding time,&lt;/li&gt;
	&lt;li&gt;there is a potentially unlimited number of fields (e.g. when reading key values from a file as is often the case for script based text processing).&lt;/li&gt;
&lt;/ul&gt;
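&lt;p&gt;The point about misspelled field names in the rules of thumb above can be tried out quickly.  This is only a sketch with a made up option container called &lt;code&gt;Settings&lt;/code&gt;; it shows that &lt;code&gt;OpenStruct&lt;/code&gt; and &lt;code&gt;Hash&lt;/code&gt; silently answer &lt;code&gt;nil&lt;/code&gt; for an unknown name while a &lt;code&gt;Struct&lt;/code&gt; raises an exception:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
require 'ostruct'

Settings = Struct.new(:verbose)   # hypothetical option container

from_struct = Settings.new(true)
from_ostruct = OpenStruct.new(verbose: true)
from_hash = { verbose: true }

# The field name is deliberately misspelled as "verbos" below.
p from_ostruct.verbos    # prints nil, the typo goes unnoticed
p from_hash[:verbos]     # prints nil, the typo goes unnoticed

begin
  from_struct.verbos
rescue NoMethodError
  puts "Struct accessor: NoMethodError"
end

begin
  from_struct[:verbos]
rescue NameError
  puts "Struct#[]: NameError"
end
&lt;/pre&gt;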
&lt;p&gt;Now it also becomes apparent why I indicated that &lt;code&gt;Symbols&lt;/code&gt; are appropriate for naming &lt;code&gt;Struct&lt;/code&gt; fields: when defining a &lt;code&gt;Struct&lt;/code&gt; you are actually declaring attributes much the same way as with an ordinary class.  Still, &lt;code&gt;Struct&lt;/code&gt; is a bit inconsistent here in that it allows both &lt;code&gt;Symbols&lt;/code&gt; and &lt;code&gt;Strings&lt;/code&gt; as keys for the &lt;code&gt;Hash&lt;/code&gt; like access via &lt;code&gt;[]&lt;/code&gt; and &lt;code&gt;[]=&lt;/code&gt;.  The convenience of being able to use both probably outweighs this inconsistency.  Overall that &lt;code&gt;Hash&lt;/code&gt; likeness is probably one of the less important features of &lt;code&gt;Struct&lt;/code&gt;.  You are likely going to use it when refactoring from / to real &lt;code&gt;Hashes&lt;/code&gt; or if you have an application that has to deal with several &lt;code&gt;Structs&lt;/code&gt; in a uniform way and needs to use metadata (obtained via &lt;code&gt;#members&lt;/code&gt;) to access fields.&lt;/p&gt;
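&lt;p&gt;As an illustration of such uniform handling, the following sketch formats instances of two unrelated &lt;code&gt;Struct&lt;/code&gt; classes with the same code.  The classes &lt;code&gt;Point2D&lt;/code&gt; and &lt;code&gt;Person&lt;/code&gt; and the method &lt;code&gt;describe&lt;/code&gt; are assumptions made for this example only:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
Point2D = Struct.new(:x, :y)
Person = Struct.new(:name, :age)

# Works for any Struct instance: field names come from #members,
# field values are fetched via the Hash like #[].
def describe(record)
  record.members.map { |m| "#{m}=#{record[m].inspect}" }.join(", ")
end

puts describe(Point2D.new(1, 2))       # prints x=1, y=2
puts describe(Person.new("Ann", 30))   # prints name="Ann", age=30

# Symbols and Strings are both accepted as keys.
p Point2D.new(1, 2)[:x]    # prints 1
p Point2D.new(1, 2)["x"]   # prints 1
&lt;/pre&gt;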
&lt;h3&gt;Deficiencies, any?&lt;/h3&gt;
&lt;p&gt;My wishlist for &lt;code&gt;Struct&lt;/code&gt; is pretty short.  The only thing that I am missing is an equally quick and elegant way to add &lt;code&gt;Comparable&lt;/code&gt; functionality to a generated &lt;code&gt;Struct&lt;/code&gt; class.  That could be achieved by having something like this:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class Struct
  def self.comparable
    define_method :&amp;lt;=&amp;gt; do |o|
      members.each do |m|
        c = self[m] &amp;lt;=&amp;gt; o[m]
        return c unless c == 0
      end
      0
    end
    include Comparable
  end
end
&lt;/pre&gt;
&lt;p&gt;Now we can do&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):014:0&amp;gt; Point = Struct.new(:x, :y).comparable
irb(main):015:0&amp;gt; puts points = (1..5).map {Point[rand(4), rand(4)]}
#&amp;lt;struct Point x=3, y=3&amp;gt;
#&amp;lt;struct Point x=1, y=3&amp;gt;
#&amp;lt;struct Point x=2, y=3&amp;gt;
#&amp;lt;struct Point x=1, y=0&amp;gt;
#&amp;lt;struct Point x=0, y=0&amp;gt;
=&amp;gt; nil
irb(main):016:0&amp;gt; puts points.sort
#&amp;lt;struct Point x=0, y=0&amp;gt;
#&amp;lt;struct Point x=1, y=0&amp;gt;
#&amp;lt;struct Point x=1, y=3&amp;gt;
#&amp;lt;struct Point x=2, y=3&amp;gt;
#&amp;lt;struct Point x=3, y=3&amp;gt;
=&amp;gt; nil
&lt;/pre&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;Today we looked at various aspects of Ruby&amp;#8217;s built in class &lt;code&gt;Struct&lt;/code&gt;.  &lt;code&gt;Struct&lt;/code&gt; allows you to create classes quickly (typically in a one liner) while providing a lot of useful functionality out of the box &amp;#8212; most notably it is the fastest way to get a class suitable as a &lt;code&gt;Hash&lt;/code&gt; key (apart from using one of the other built in classes like &lt;code&gt;String&lt;/code&gt;, &lt;code&gt;Symbol&lt;/code&gt; or even &lt;code&gt;Array&lt;/code&gt;).  If you haven&amp;#8217;t been using &lt;code&gt;Struct&lt;/code&gt; so far you might want to try it out in your next hack.  Enjoy less typing with more &lt;code&gt;Structs&lt;/code&gt;!&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Mon, 21 Sep 2009 21:48:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/017-Struct.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/017-Struct.html</guid></item><item><title>Muppet Labs closing for now</title><description>&lt;h3&gt;Looking back&lt;/h3&gt;
&lt;p&gt;It is time to look back at the Muppet Laboratory series.  Here&amp;#8217;s my list of noteworthy items &amp;#8212; I will keep it rather short as interest in the series seems to have dwindled anyway:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;The first version with &lt;span class="caps"&gt;LRU&lt;/span&gt; caching integrated was way too complex and had ridiculous performance.  Continuous checking of interactions against the search criteria made the code complex and slow.  In this case the &lt;a href="http://en.wikipedia.org/w/index.php?title=Premature_optimization"&gt;premature optimization&lt;/a&gt; was worse than ineffective.&lt;/li&gt;
	&lt;li&gt;I am test lazy (I confess) but apparently this works remarkably well.  Note that this does not mean I don&amp;#8217;t test.  But for my scripts I usually do not write &lt;a href="http://www.ruby-doc.org/stdlib/libdoc/test/unit/rdoc/classes/Test/Unit.html"&gt;unit tests&lt;/a&gt; or &lt;a href="http://rspec.info/"&gt;RSpecs&lt;/a&gt;.  (file under: Ruby Bad Practices &amp;#8212; fortunately the acronym is the same)&lt;/li&gt;
	&lt;li&gt;Overall the &lt;span class="caps"&gt;LRU&lt;/span&gt; approach works remarkably well at limiting memory usage while processing huge amounts of input data so I would say the plan succeeded mostly.&lt;/li&gt;
	&lt;li&gt;I found the idea of using a block passed to a method as a class body especially interesting.  I will probably recycle this for other cases where I want to provide a framework (including command line parsing) which needs to be filled with user code.&lt;/li&gt;
	&lt;li&gt;Summertime is a bad time for a blog series: author and readers are occupied with other things, mainly enjoying the sunny outdoors.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What did you consider most valuable for you?  What would you have done differently (software as well as blog series wise)?  I would be curious to learn what your thoughts are.&lt;/p&gt;
&lt;h3&gt;What&amp;#8217;s ahead?&lt;/h3&gt;
&lt;p&gt;The next article will be about the versatility of &lt;code&gt;Struct&lt;/code&gt;.  This is another &lt;a href="http://blog.rubybestpractices.com/posts/gregory/009-beautiful-blocks.html"&gt;swiss army knife&lt;/a&gt; of the lazy Ruby coder.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Mon, 07 Sep 2009 19:03:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/016-Muppet-Lab_close.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/016-Muppet-Lab_close.html</guid></item><item><title>Completing the Animal</title><description>&lt;h3&gt;Leftovers&lt;/h3&gt;
&lt;p&gt;Summertime was a bit tough &amp;#8212; but the end is near!  Today I will cover two major parts that were needed to complete the &amp;#8220;animal&amp;#8221;.  After that I will present lessons that I learned in this public blog software project.  Muppet Laboratories will then close their doors but &lt;a href="http://github.com/rklemme/muppet-laboratories/tree/master"&gt;the code&lt;/a&gt; will still be there.&lt;/p&gt;
&lt;p&gt;For a fully functioning Animal mainly two things were missing:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;command line parsing,&lt;/li&gt;
	&lt;li&gt;creation of a proper filter mechanism based on command line arguments.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We will first look at command line parsing.&lt;/p&gt;
&lt;h3&gt;Command Line Parsing&lt;/h3&gt;
&lt;p&gt;There is actually no rocket science involved whatsoever.  The most noteworthy piece of information is probably that you need to require &amp;#8216;optparse/time&amp;#8217; in order to allow &lt;code&gt;OptionParser&lt;/code&gt; to parse time stamps passed as option arguments.  (Funny thing is, &lt;code&gt;OptionParser&lt;/code&gt; will also happily accept strings like &amp;#8220;now&amp;#8221; and parse them as timestamps.)&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
require 'ostruct'
require 'optparse'
require 'optparse/time'
require 'set' # Set is used below for the interaction ids

  def self.parse_command_line(argv = ::ARGV) 
    o = OpenStruct.new(:output_dir =&amp;gt; '.')

    # parse
    OptionParser.new do |opts|
      opts.on '-d', '--dir=DIRECTORY', 'Output directory ' do |v|
        o.output_dir = v
      end

      opts.on '-r', '--rx=REGEXP', ::Regexp, 'Regular expression matched',
      'against log line text' do |v|
        o.rx = v
      end

      opts.on '-t', '--time=TIME', ::Time, 'timestamp' do |v|
        o.ts = v
      end

      opts.on '-s', '--start=TIME', ::Time, 'start timestamp' do |v|
        o.start_ts = v
      end

      opts.on '-e', '--end=TIME', ::Time, 'end timestamp' do |v|
        o.end_ts = v
      end

      opts.on '--ids=ID_LIST', ::Array, 'Comma separated list of interaction ids' do |v|
        (o.ids ||= Set.new).merge v
      end

      opts.on '--id-file=FILE', 'File with ids one per line',
       '(empty lines are ignored)' do |v|
        s = o.ids ||= Set.new

        File.foreach v do |line|
          line.strip!
          s &amp;lt;&amp;lt; line unless line == ''
        end
      end

      opts.on '--buffer=INTERACTIONS', ::Integer,
        'Max no. of interactions to keep in memory' do |v|
        o.max_size = v
      end

      opts.on_tail '-h', '--help', 'Print this help' do
        puts opts
        exit 0
      end
    end.parse! argv

    raise 'Only one of time or (start, end) allowed' if o.ts &amp;amp;&amp;amp; (o.start_ts || o.end_ts)
    raise 'Missing end timestamp' if o.start_ts &amp;amp;&amp;amp; !o.end_ts
    raise 'Missing start timestamp' if !o.start_ts &amp;amp;&amp;amp; o.end_ts

    o
  end
&lt;/pre&gt;
&lt;p&gt;I personally find &lt;code&gt;OptionParser&lt;/code&gt; very elegant and complete but others seem to prefer other command line parsing packages, like &lt;a href="http://ruby-doc.org/stdlib/libdoc/getoptlong/rdoc/index.html"&gt;GetoptLong&lt;/a&gt; and there are &lt;a href="http://totalrecall.wordpress.com/2008/09/05/command-line-parsing-choosing-a-ruby-library/"&gt;others around&lt;/a&gt; as well.  Main advantages of &lt;code&gt;OptionParser&lt;/code&gt; in my opinion:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Options, their documentation and their processing are all placed close to each other.&lt;/li&gt;
	&lt;li&gt;Built in support for conversion of argument strings to most basic types and even lists of values.&lt;/li&gt;
	&lt;li&gt;It is a part of the standard Ruby distribution, i.e. you can safely assume that it is available.&lt;/li&gt;
&lt;/ul&gt;
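&lt;p&gt;The built in type conversion can be tried in isolation.  The following is a reduced sketch, not the code of the project: the helper &lt;code&gt;parse_demo&lt;/code&gt; and the keys of its result &lt;code&gt;Hash&lt;/code&gt; are invented for this example, and it uses &lt;code&gt;parse&lt;/code&gt; instead of &lt;code&gt;parse!&lt;/code&gt; so the argument list is not modified.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
require 'optparse'
require 'optparse/time'

# Hypothetical helper which handles a small subset of the options shown above.
def parse_demo(argv)
  result = {}

  parser = OptionParser.new do |opts|
    # Time conversion needs optparse/time
    opts.on('-t', '--time=TIME', Time, 'timestamp') do |v|
      result[:ts] = v
    end

    # Array splits the argument at commas
    opts.on('--ids=ID_LIST', Array, 'Comma separated list of ids') do |v|
      result[:ids] = v
    end

    opts.on('--buffer=INTERACTIONS', Integer, 'Max no. of interactions') do |v|
      result[:max_size] = v
    end
  end

  # parse (without the bang) returns the arguments that are not options
  result[:files] = parser.parse(argv)
  result
end

r = parse_demo(['-t', '2009-08-24 21:09:00', '--ids=a,b,c',
                '--buffer=100', 'sample.log'])
p r[:ts].class   # prints Time
p r[:ids]        # prints ["a", "b", "c"]
p r[:max_size]   # prints 100
p r[:files]      # prints ["sample.log"]
&lt;/pre&gt;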
&lt;h3&gt;Filter Creation&lt;/h3&gt;
&lt;p&gt;There are several approaches that can be taken on creating a bit of filter code from options taken from the command line &amp;#8212; all have different strengths and weaknesses:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Interpreting the option set on each filter invocation (slow but easy to implement).&lt;/li&gt;
	&lt;li&gt;Combination of written filter code with filter criteria values stuffed in a closure (faster than the first option but equally easy to implement).&lt;/li&gt;
	&lt;li&gt;Generation of filter code (usually fastest, can be a bit complex to implement).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I picked option 2 because of the speed advantage and the avoidance of &lt;a href="http://www.rubycentral.com/pickaxe/taint.html"&gt;safety issues&lt;/a&gt; that come with using &lt;code&gt;eval&lt;/code&gt;.  I also find it a bit inelegant to generate code but that&amp;#8217;s a purely aesthetic argument which I don&amp;#8217;t claim any substance for.  (But since we&amp;#8217;re into Ruby at least partly for the fun, that argument is not too far off the mark.)&lt;/p&gt;
&lt;p&gt;As interface I chose the square brackets of &lt;code&gt;lambda&lt;/code&gt; (i.e. &lt;code&gt;Proc#[]&lt;/code&gt;).  So, here is the filter creation code:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
    def create_filter(opts)
      # store in local variables to ensure
      # values do not get changed
      # and for a small speed improvement
      o_ids = opts.ids
      o_ts = opts.ts
      o_start_ts = opts.start_ts
      o_end_ts = opts.end_ts
      o_rx = opts.rx

      f = case
          when o_ids
            warn 'WARNING: Ignoring time filters with ids given' if
            o_ts || o_start_ts || o_end_ts
            lambda {|ip| o_ids.include? ip.id}
          when o_ts
            lambda {|ip|
              ip.entries.first.time_stamp &amp;lt;= o_ts &amp;amp;&amp;amp;
                ip.entries.last.time_stamp &amp;gt;= o_ts
            }
          when o_start_ts &amp;amp;&amp;amp; o_end_ts
            lambda { |ip|
              ip.entries.last.time_stamp &amp;gt;= o_start_ts &amp;amp;&amp;amp; 
                ip.entries.first.time_stamp &amp;lt;= o_end_ts
            }
          end

      case
      when f &amp;amp;&amp;amp; o_rx
        lambda {|ip| f[ip] &amp;amp;&amp;amp; ip.entries.any? {|e| o_rx =~ e.line}}
      when o_rx
        lambda {|ip| ip.entries.any? {|e| o_rx =~ e.line}}
      when f
        f
      else
        YES
      end
    end
&lt;/pre&gt;
&lt;p&gt;As you can see there is a bit of prioritization going on: I chose to ignore time range options if interaction ids are also used as filter criterion.  The reason is that interactions are usually short and additional time based filters would either have to add more interactions to the result or remove interactions whose id was selected (depending on whether &amp;#8220;and&amp;#8221; or &amp;#8220;or&amp;#8221; combination was chosen).&lt;/p&gt;
&lt;p&gt;The text filter (regular expression really) on the other hand is &amp;#8220;and&amp;#8221; connected with the other filter, i.e. if the list of interactions selected by time or ids is large then it will be narrowed down through the text filter.&lt;/p&gt;
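&lt;p&gt;The closure idea behind option 2 can also be shown in a much smaller form.  Note that this is a variant and not the project code: it simply &amp;#8220;and&amp;#8221; connects every criterion that was given and does no prioritization.  &lt;code&gt;Record&lt;/code&gt;, &lt;code&gt;Criteria&lt;/code&gt; and &lt;code&gt;build_filter&lt;/code&gt; are hypothetical stand-ins for the real classes and methods.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical record and option types, standing in for the real classes.
Record = Struct.new(:id, :line)
Criteria = Struct.new(:ids, :rx)

def build_filter(criteria)
  # Copy the values into locals so the lambdas close over them directly.
  ids = criteria.ids
  rx = criteria.rx

  checks = []
  checks.push(lambda { |rec| ids.include?(rec.id) }) if ids
  checks.push(lambda { |rec| rx =~ rec.line }) if rx

  # Without any criteria the list is empty and all? answers true,
  # i.e. every record passes.
  lambda { |rec| checks.all? { |check| check[rec] } }
end

filter = build_filter(Criteria.new(["a1", "b2"], /error/))
p filter[Record.new("a1", "disk error")]   # prints true
p filter[Record.new("a1", "all fine")]     # prints false
p filter[Record.new("zz", "disk error")]   # prints false
&lt;/pre&gt;
&lt;p&gt;The option values are inspected only once, when the filter is built.  Each call of the filter then just runs the lambdas that are actually needed.&lt;/p&gt;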
&lt;p&gt;One word about time filters: when given a single point in time, the test is pretty straightforward: the first timestamp must not be later and the last timestamp must not be earlier than the test time, i.e. the interaction was active at the given time.  The test for a given time interval (start and end time given separately) may look a bit odd at first because it compares the last timestamp with the start time and the first timestamp with the end time.  However, there is no typo involved.  The test ensures that the test interval and the interaction&amp;#8217;s time interval overlap.  Or put differently: the test ensures that all interactions are included that were active at some point during the test interval.&lt;/p&gt;
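&lt;p&gt;If the interval test still looks suspicious, it may help to check a few cases by hand.  The sketch below uses a different but equivalent formulation with &lt;code&gt;Range#cover?&lt;/code&gt;: two closed intervals overlap exactly if the start of one of them lies within the other.  The helpers &lt;code&gt;overlap?&lt;/code&gt; and &lt;code&gt;at&lt;/code&gt; and the sample times are made up for this example.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical helper; both arguments are inclusive Ranges of Time.
def overlap?(a, b)
  a.cover?(b.begin) || b.cover?(a.begin)
end

# Shortcut for a time on one sample day.
def at(hour, min)
  Time.utc(2009, 8, 24, hour, min, 0)
end

# The test interval given on the command line
window = at(21, 0)..at(21, 30)

p overlap?(at(20, 50)..at(21, 5), window)    # began earlier, still active: true
p overlap?(at(21, 10)..at(21, 20), window)   # completely inside: true
p overlap?(at(21, 25)..at(22, 0), window)    # ends later: true
p overlap?(at(20, 0)..at(20, 59), window)    # finished before: false
p overlap?(at(21, 31)..at(22, 0), window)    # began afterwards: false
&lt;/pre&gt;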
&lt;h3&gt;Things left to do&lt;/h3&gt;
&lt;p&gt;For now I am pretty satisfied with the results so I do not feel a lot of pressure to continue working on it.  However, there are a few things that could be done:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Cleanup of &lt;code&gt;require&lt;/code&gt; and deletion of source files that were not used.&lt;/li&gt;
	&lt;li&gt;Adding of statistics output like lines read, lines written, interactions found etc.&lt;/li&gt;
	&lt;li&gt;Profiling and optimization.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Anything else?&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Mon, 24 Aug 2009 21:09:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/015-Completing_the_Animal.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/015-Completing_the_Animal.html</guid></item><item><title>A bit of Optimization</title><description>&lt;p&gt;As suggested &lt;a href="http://blog.rubybestpractices.com/posts/rklemme/013-LRU-Explanations.html"&gt;earlier&lt;/a&gt; I did a bit of benchmarking and found these timings of the first version which kept everything in memory and the next version with two &lt;span class="caps"&gt;LRU&lt;/span&gt; storages (for InteractionProcessor and IO objects):&lt;/p&gt;
&lt;pre&gt;
v0.1

robert@fussel ~/Eigene Dateien/Projects/muppet-laboratories
$ time ruby19 bin/sample-animal.rb sample.log

real    11m53.226s
user    6m34.561s
sys     4m33.921s


v0.2.1

robert@fussel ~/Eigene Dateien/Projects/muppet-laboratories
$ time ruby19 bin/sample-animal.rb sample.log

real    25m40.842s
user    14m40.655s
sys     8m29.701s
&lt;/pre&gt;
&lt;p&gt;I then went on and refactored the code in these ways:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;removed &lt;span class="caps"&gt;LRU&lt;/span&gt; storage of &lt;code&gt;IO&lt;/code&gt; objects&lt;/li&gt;
	&lt;li&gt;removed state pattern and replaced it with a single filter test again&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The solution now resembles the first version quite a bit with the difference that the storage of &lt;code&gt;InteractionProcessor&lt;/code&gt; instances is now an &lt;code&gt;LRUHash&lt;/code&gt;.  The &lt;code&gt;InteractionProcessor&lt;/code&gt; now looks very simple &amp;#8212; again:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  class InteractionProcessor
    
    # mode for writing files
    OPEN_MODE = IO::WRONLY | IO::CREAT | IO::TRUNC

    attr_reader :id, :coord, :entries

    def initialize(id, coordinator)
      @id = id
      @coord = coordinator
      @entries = []
    end

    # Process the first line
    def process_initial(time_stamp, line)
      process(time_stamp, line)
    end

    # Process an initial line
    def process(time_stamp, line)
      @entries &amp;lt;&amp;lt; Entry.new(time_stamp, line)
    end

    # Append a continuation line to the last entry
    def append_line(line)
      @entries.last.line &amp;lt;&amp;lt; line
    end

    def finish
      if ! @entries.empty? &amp;amp;&amp;amp; @coord.filter[self]
        fn = file_name
        FileUtils.mkdir_p(File.dirname(fn))

        File.open(fn, OPEN_MODE) do |io|
          @entries.each {|e| io.puts(e.line)}
        end
      end
    end

    private

    # calculate the file name, this fails if
    # there are no entries!
    def file_name
      ts = @entries.first.time_stamp
      File.join(@coord.options.output_dir,
                ts.strftime('%Y-%m-%d'),
                ts.strftime('%H-%M'),
                ts.strftime('%S.%3N-') + id)
    end
  end
&lt;/pre&gt;
&lt;p&gt;And now this is the timing we get:&lt;/p&gt;
&lt;pre&gt;
v0.3

robert@fussel ~/Eigene Dateien/Projects/muppet-laboratories
$ time ruby19 bin/sample-animal.rb sample.log

real    7m29.546s
user    2m16.702s
sys     4m8.296s
&lt;/pre&gt;
&lt;p&gt;Now, this is even faster than the first version!  The difference seems to mainly be caused by a reduction in user time.  I would have expected at most a small change in system time for less paging since we limit the memory usage.  Maybe the difference is caused by the fact that the main &lt;code&gt;Hash's&lt;/code&gt; size does not increase beyond a fixed limit.&lt;/p&gt;
&lt;p&gt;I guess the &lt;span class="caps"&gt;LRU&lt;/span&gt; based storage of file handles qualified as premature optimization: apparently it wasn&amp;#8217;t necessary to get rid of individual log lines as fast as possible.  Instead, it makes more sense to treat an interaction as entity and wait with the processing until the interaction is complete (well, not really: until the &lt;span class="caps"&gt;LRU&lt;/span&gt; storage decides, that the interaction must be purged from memory).  That&amp;#8217;s a nice example how granularity of processing directly influences performance and application design.  In fact, treating an interaction as entity is probably also the &lt;a href="http://en.wikipedia.org/wiki/KISS_principle"&gt;most straightforward&lt;/a&gt; approach.  And it turns out to be more efficient than the much more complex state pattern logic.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Sun, 09 Aug 2009 20:47:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/014-First_Optimization.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/014-First_Optimization.html</guid></item><item><title>LRU Integration explained</title><description>&lt;p&gt;Today I will present my reasoning which lead me to the implementation of the &lt;code&gt;LRUHash&lt;/code&gt; as well as how I integrated it into the main code.  We will look at how &lt;code&gt;LRUHash&lt;/code&gt; works and how it is integrated into the project.  Then I will answer some questions that were actually asked &amp;#8212; or only thought.&lt;/p&gt;
&lt;h3&gt;How &lt;code&gt;LRUHash&lt;/code&gt; works&lt;/h3&gt;
&lt;p&gt;The interface should resemble a &lt;code&gt;Hash&lt;/code&gt; as much as possible so that &lt;code&gt;LRUHash&lt;/code&gt; can be easily substituted for a &lt;code&gt;Hash&lt;/code&gt; instance without alteration of the code that uses it.  As long as the instance remains below the maximum size it behaves much the same as a regular &lt;code&gt;Hash&lt;/code&gt;.  Only once the size limit defined via &lt;code&gt;max_size&lt;/code&gt; is reached will a subsequent store operation for a new key remove the least recently used entry.&lt;/p&gt;
&lt;p&gt;Internally a &lt;code&gt;LRUHash&lt;/code&gt; maintains a linked list of &lt;code&gt;LRUHash::Node&lt;/code&gt; instances and a &lt;code&gt;Hash&lt;/code&gt;.  Nodes contain key and value of a hash entry as well as links to their predecessor and successor.  The &lt;code&gt;Hash&lt;/code&gt; is used for fast access of individual nodes while the linked list is used to maintain information about access order: every node which is accessed is moved to the head of the list so the least recently used element is always at the tail.&lt;/p&gt;
&lt;p&gt;There are two nodes referenced as &lt;code&gt;head&lt;/code&gt; and &lt;code&gt;tail&lt;/code&gt; which are never replaced.  This makes it easier to move nodes around in the list: extraction and insertion always operate on inner nodes and never need to treat the first or last element of the list specially, which would be tedious because the first and last pointers would have to be updated as well.&lt;/p&gt;
&lt;p&gt;In addition to the &lt;code&gt;default_proc&lt;/code&gt;, which works exactly the same way as in &lt;code&gt;Hash&lt;/code&gt;, there is a &lt;code&gt;release_proc&lt;/code&gt; which is invoked whenever an item is removed from the &lt;code&gt;LRUHash&lt;/code&gt;, either via explicit delete operations like &lt;code&gt;delete&lt;/code&gt;, &lt;code&gt;delete_if&lt;/code&gt; and &lt;code&gt;clear&lt;/code&gt; or via the automated expiry which kicks in as soon as the &lt;code&gt;LRUHash&lt;/code&gt; reaches its maximum size.&lt;/p&gt;
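&lt;p&gt;The effect of the two fixed nodes can be sketched in a few lines.  This is a simplified illustration and not the code from the repository: the &lt;code&gt;Node&lt;/code&gt; struct only mirrors the shape described above, and the two lambdas &lt;code&gt;insert_front&lt;/code&gt; and &lt;code&gt;unlink&lt;/code&gt; are hypothetical names chosen for this example.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical miniature list; Node mirrors the shape of LRUHash::Node
Node = Struct.new(:key, :value, :pred, :succ)

# The two sentinel nodes stay in place for the life of the list
head = Node.new(:head)
tail = Node.new(:tail)
head.succ = tail
tail.pred = head

# Insert a node directly after the head sentinel.
# No nil checks needed: the node always ends up between two nodes.
insert_front = lambda do |node|
  node.pred = head
  node.succ = head.succ
  head.succ.pred = node
  head.succ = node
end

# Unlink an inner node; pred and succ always exist.
unlink = lambda do |node|
  node.pred.succ = node.succ
  node.succ.pred = node.pred
  node.pred = node.succ = nil
  node
end

a = Node.new(:a, 1)
b = Node.new(:b, 2)
insert_front.call(a)
insert_front.call(b)   # list: head, b, a, tail
oldest = tail.pred     # least recently used element: a
unlink.call(oldest)    # list: head, b, tail
&lt;/pre&gt;
&lt;p&gt;Neither lambda contains a branch for an empty list or for the first or last element, which is the simplification the sentinels buy.&lt;/p&gt;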
&lt;h3&gt;How &lt;code&gt;LRUHash&lt;/code&gt; is used in the project&lt;/h3&gt;
&lt;p&gt;There are two &lt;code&gt;LRUHash&lt;/code&gt; instances in class &lt;code&gt;Coordinator&lt;/code&gt;:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;A storage for interaction processors,&lt;/li&gt;
	&lt;li&gt;A storage for &lt;code&gt;File&lt;/code&gt; objects.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first instance is used to limit the overall number of interactions which are kept in memory.  Once the &lt;code&gt;LRUHash&lt;/code&gt; is filled and a new interaction needs to be stored the least recently used one is expired and removed.&lt;/p&gt;
&lt;p&gt;The second one is needed because &lt;code&gt;InteractionProcessor&lt;/code&gt; instances keep their &lt;code&gt;File&lt;/code&gt; object open and the limit for open file handles per process is usually much lower than the reasonable limit for interactions kept in memory.  So &lt;code&gt;File&lt;/code&gt; objects are stored in the &lt;code&gt;LRUHash&lt;/code&gt;, closed when the limit is reached and reopened when they are needed again.&lt;/p&gt;
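&lt;p&gt;The close and reopen cycle for file handles might look roughly like the following sketch.  It is not the project code: &lt;code&gt;HandleCache&lt;/code&gt; is a hypothetical stand-in for the &lt;code&gt;LRUHash&lt;/code&gt; of &lt;code&gt;File&lt;/code&gt; objects which relies on the insertion order of a plain &lt;code&gt;Hash&lt;/code&gt; instead of a linked list, and it assumes a limit of at least 1.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
require "tmpdir"

# Hypothetical stand-in for the LRUHash of File objects.  It relies on
# Ruby's Hash keeping insertion order instead of a linked list.
class HandleCache
  def initialize(max_size)
    @max_size = max_size
    @files = {}
  end

  # Return an open File for path, reopening it in append mode if it
  # was closed by an earlier eviction.
  def fetch(path)
    file = @files.delete(path) || File.open(path, "a")
    evict_oldest if @files.size == @max_size
    @files[path] = file   # re-insert: most recently used goes last
  end

  def close_all
    @files.each_value { |file| file.close }
    @files.clear
  end

  private

  def evict_oldest
    _path, file = @files.shift   # first entry is the least recently used
    file.close
  end
end

lines = nil
Dir.mktmpdir do |dir|
  cache = HandleCache.new(2)
  paths = %w[a b c].map { |name| File.join(dir, "#{name}.log") }
  paths.each { |path| cache.fetch(path).puts("first") }
  # a.log was closed when c.log was opened; fetching it reopens it
  cache.fetch(paths[0]).puts("second")
  cache.close_all
  lines = File.readlines(paths[0], chomp: true)
end
&lt;/pre&gt;
&lt;p&gt;Opening in append mode is what makes the eviction harmless: a file that was closed early simply continues where it left off.&lt;/p&gt;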
&lt;h3&gt;Why do you use &lt;code&gt;equal?&lt;/code&gt;?&lt;/h3&gt;
&lt;p&gt;There are no &amp;#8220;equivalent&amp;#8221; nodes in a single &lt;code&gt;LRUHash&lt;/code&gt; instance because they all have different keys.  Actually the concept of equivalence is not needed here; rather I just needed to check for &lt;em&gt;identity&lt;/em&gt; which is exactly what &lt;code&gt;equal?&lt;/code&gt; does.&lt;/p&gt;
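&lt;p&gt;The difference between equivalence and identity is easy to see with a small struct.  &lt;code&gt;Pair&lt;/code&gt; is a hypothetical name used only for this illustration:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
Pair = Struct.new(:key, :value)

a = Pair.new("id-1", 42)
b = Pair.new("id-1", 42)

same_content = (a == b)       # true: Struct compares member values
same_object  = a.equal?(b)    # false: two distinct objects
identical    = a.equal?(a)    # true: identity
&lt;/pre&gt;
&lt;p&gt;Since &lt;code&gt;equal?&lt;/code&gt; never looks at the members it is also the cheaper of the two checks.&lt;/p&gt;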
&lt;h3&gt;Why did you not use PUPA&amp;#8217;s Ruby/Cache?&lt;/h3&gt;
&lt;p&gt;ged suggested in &lt;a href="http://blog.rubybestpractices.com/posts/rklemme/012-LRU-Integration.html#comment-12710034"&gt;his comment&lt;/a&gt; the use of &lt;a href="http://www.nongnu.org/pupa/ruby-cache.html"&gt;PUPA&amp;#8217;s Ruby/Cache&lt;/a&gt;.  I had looked at it before cooking &lt;a href="http://github.com/rklemme/muppet-laboratories/blob/739b688ebe27284b8239d486ac067d2b305a6b87/lib/lruhash.rb"&gt;my own little version&lt;/a&gt; and decided against using it for several reasons:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;It did not include the &lt;a href="http://ruby-doc.org/core/classes/Hash.html#M002854"&gt;default_proc mechanism of Ruby&amp;#8217;s Hash&lt;/a&gt; although you can use &lt;a href="http://www.nongnu.org/pupa/ruby-cache-MANUAL.html#label:43"&gt;&lt;code&gt;fetch&lt;/code&gt;&lt;/a&gt; with a block the same way as in &lt;a href="http://ruby-doc.org/core/classes/Hash.html#M002849"&gt;Hash&lt;/a&gt;.&lt;/li&gt;
	&lt;li&gt;It uses the notion of &amp;#8220;object size&amp;#8221; (explained &lt;a href="http://www.nongnu.org/pupa/ruby-cache-MANUAL.html#label:7"&gt;here&lt;/a&gt;) as one of the limits of cache size.  Besides the problems with defining &amp;#8220;object size&amp;#8221; which I pointed to &lt;a href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/242080"&gt;here&lt;/a&gt; and &lt;a href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/329527"&gt;here&lt;/a&gt; the more crucial reason was the problem with updating the cache&amp;#8217;s idea of the current size of objects (either the overhead is significant, because every time the overall size limit is checked all objects need to be looked at or the cache cannot be aware of size changes if it caches object sizes).&lt;/li&gt;
	&lt;li&gt;It uses wall clock expiration time for cached objects &amp;#8212; something I did not have use for in my implementation.&lt;/li&gt;
	&lt;li&gt;The fun of cooking my own.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;While items 2 and 3 could be remedied by using arbitrarily large limits the overhead of calculating values would remain (at least the &lt;a href="http://www.nongnu.org/pupa/ruby-cache-MANUAL.html"&gt;documentation&lt;/a&gt; does not state that the overhead is saved when any of these limits are left out).&lt;/p&gt;
&lt;h3&gt;Performance Observations&lt;/h3&gt;
&lt;p&gt;While I did not actually measure timings the current version is significantly slower than the first draft version which kept everything in memory before writing out interactions to individual files.  I suspect that this may be caused by these factors:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Criterion evaluation is done multiple times per interaction (although it&amp;#8217;s still a dummy currently).&lt;/li&gt;
	&lt;li&gt;Frequent opening and closing of files caused by &lt;span class="caps"&gt;LRU&lt;/span&gt; handling of IO objects which is necessary because of the much lower limit of open file handles compared with interaction instances.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both were introduced to get rid of log file lines as soon as possible.  This also led to a complex design (especially in the area of evaluation of the criterion).  The effect may change though if real criteria are used.  This is certainly something that I need to analyze.&lt;/p&gt;
&lt;p&gt;An improved version would probably drop the immediate writing of lines and defer it to the point in time when the interaction is removed from the &lt;span class="caps"&gt;LRU&lt;/span&gt; cache of interactions.  That way the criterion needs to be executed only once and can be made much simpler, especially for complex criteria which need to look at multiple entries such as the time range criterion.  Also, we do not need to keep multiple files open.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Note: This shows how important it is to keep an open mind and to think again about the code you have written from time to time.  I have discovered quite a few quirks in my code this way.  Of course you can say that if you do it right the first time, this is never necessary, but who would claim to write optimal code on the first attempt?&lt;/i&gt;&lt;/p&gt;
&lt;h3&gt;Finally, a Best Practice&lt;/h3&gt;
&lt;p&gt;During my vacation I read &lt;a href="http://en.wikipedia.org/wiki/Zen_and_the_Art_of_Motorcycle_Maintenance"&gt;Zen and the Art of Motorcycle Maintenance&lt;/a&gt; (you can &lt;a href="http://www.virtualschool.edu/mon/Quality/PirsigZen/"&gt;browse the text&lt;/a&gt; but I&amp;#8217;d rather read a real book).  I had read it some 15 years ago and wanted to go back to this fascinating text which provides so many perspectives and crystallization points for thought.&lt;/p&gt;
&lt;p&gt;Now, what does this have to do with Ruby?  The best practice here is: once in a while do something completely different.  It helps keep your mind flexible and will open it up to new approaches to old matters as well as inspire your creativity.  You will also notice, that while you&amp;#8217;re away from your everyday business your mind will actually continue to work on it which typically shows by suddenly having an idea that turns up when you least expect it.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Fri, 31 Jul 2009 16:33:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/013-LRU-Explanations.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/013-LRU-Explanations.html</guid></item><item><title>LRU Integration</title><description>&lt;p&gt;This will be just a short article since I am a bit tight on time right now.&lt;/p&gt;
&lt;p&gt;For those of you who might not have heard the term: &amp;#8220;&lt;span class="caps"&gt;LRU&lt;/span&gt;&amp;#8221; means &amp;#8220;least recently used&amp;#8221;.  It refers to a &lt;a href="http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used"&gt;replacement strategy&lt;/a&gt; used in caches.  For every cache with an upper limit on the number of elements cached it must be decided which element to remove from the cache once the cache gets full and another element should be put in the cache.  The decision can have a dramatic impact on the efficiency of the cache (its hit ratio).&lt;/p&gt;
&lt;p&gt;&lt;span class="caps"&gt;LRU&lt;/span&gt; is easy to implement and works pretty well in many cases so I picked that one.  &amp;#8220;&lt;span class="caps"&gt;LRU&lt;/span&gt;&amp;#8221; means, remove the element which has not been used for the longest time.  This is typically implemented using a doubly linked list because with that elements can be moved to the head very quickly (O(1)).  The algorithm works by moving every accessed element to the front of the list and deleting the last one when space is needed.&lt;/p&gt;
&lt;p&gt;I created a class with a &lt;code&gt;Hash&lt;/code&gt; like interface and an additional feature, a &lt;code&gt;release_proc&lt;/code&gt; which gets called with the removed entry&amp;#8217;s key and value.  That way we can automate the cleanup process easily.&lt;/p&gt;
&lt;p&gt;Internally the class has a &lt;code&gt;Hash&lt;/code&gt; for fast access which has list nodes as values.  These are the main methods for the &lt;span class="caps"&gt;LRU&lt;/span&gt; mechanism during read access:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def fetch(key, &amp;amp;b)
    n = @h[key]

    if n
      # hit -&amp;gt; move to front
      front(n).value
    else
      (b || FETCH)[key]
    end
  end

  # move node to front
  def front(node)
    node.insert_after(@head)
  end
&lt;/pre&gt;
&lt;p&gt;Individual entries are of class &lt;code&gt;Node&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  # A single node in the doubly linked LRU list of nodes
  Node = Struct.new :key, :value, :pred, :succ do
    def unlink
      pred.succ = succ if pred
      succ.pred = pred if succ
      self.succ = self.pred = nil
      self
    end

    def insert_after(node)
      raise 'Cannot insert after self' if equal? node
      return self if node.succ.equal? self

      unlink

      self.succ = node.succ
      self.pred = node

      node.succ.pred = self if node.succ
      node.succ = self

      self
    end
  end
&lt;/pre&gt;
&lt;p&gt;And here&amp;#8217;s the code for the removal of old entries:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  def store(key, value)
    # same optimization as in Hash
    key = key.dup.freeze if String === key &amp;amp;&amp;amp; !key.frozen?

    n = @h[key]

    unless n
      if size == max_size
        # reuse node to optimize memory usage
        n = delete_oldest
        n.key = key
        n.value = value
      else
        n = Node.new key, value
      end

      @h[key] = n
    end

    front(n).value = value
  end

  # remove the node and invoke the cleanup proc
  # if set
  def remove_node(node)
    n = @h.delete(node.key)
    n.unlink
    release_proc and release_proc[n.key, n.value]
    n
  end

  # remove the oldest node returning the node
  def delete_oldest
    n = @tail.pred
    raise "Cannot delete from empty hash" if @head.equal? n
    remove_node n
  end
&lt;/pre&gt;
&lt;p&gt;You can see the whole story in the &lt;a href="http://github.com/rklemme/muppet-laboratories/tree/master"&gt;git repo&lt;/a&gt; where the &lt;code&gt;LRUHash&lt;/code&gt; is also integrated into the main animal code.  Class &lt;code&gt;InteractionProcessor&lt;/code&gt; has changed a bit as well as &lt;code&gt;Coordinator&lt;/code&gt;.  Some places are still a bit inelegant but that will have to wait until I have a bit more time.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Fri, 10 Jul 2009 21:07:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/012-LRU-Integration.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/012-LRU-Integration.html</guid></item><item><title>Animal Interaction Processing</title><description>&lt;p&gt;As stated I intend to &lt;a href="http://blog.rubybestpractices.com/posts/rklemme/008-First_Design_Considerations.html#comment-10936141"&gt;exploit locality of interaction log messages&lt;/a&gt; by &lt;a href="http://blog.rubybestpractices.com/posts/rklemme/009-Shadow_of_the_Animal.html"&gt;using &lt;span class="caps"&gt;LRU&lt;/span&gt;&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used"&gt;cache algorithm&lt;/a&gt; and handle log lines as efficiently as possible.  The general concept looks like this:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Whenever a new interaction is seen in the log file, create a new &lt;code&gt;InteractionProcessor&lt;/code&gt; for it and put it in some storage which allows for fast access.&lt;/li&gt;
	&lt;li&gt;Any further processing of log lines read for a particular interaction is dealt with by that &lt;code&gt;InteractionProcessor&lt;/code&gt; instance.&lt;/li&gt;
	&lt;li&gt;The storage should have an upper limit on the number of entries and expire entries via a &lt;a href="http://en.wikipedia.org/wiki/Cache_algorithms"&gt;cache algorithm&lt;/a&gt; in order to avoid overusing memory.&lt;/li&gt;
	&lt;li&gt;To further reduce memory usage, write log lines of an interaction to the corresponding output file as soon as possible.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It should be immediately clear that this confronts us with a number of challenges we will now have to look at.&lt;/p&gt;
&lt;h3&gt;Challenge: Early Expiry&lt;/h3&gt;
&lt;p&gt;Considering that we do not have a means to detect the end of an interaction we can only rely on the expiry algorithm of our &lt;code&gt;InteractionProcessor&lt;/code&gt; storage.  This means there is always the chance that an &lt;code&gt;InteractionProcessor&lt;/code&gt; is evicted from the storage although there are more log records to come for this interaction.  This situation becomes more likely if&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;the size limit of the storage is reduced,&lt;/li&gt;
	&lt;li&gt;the number of interactions which are active at one point in time increases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The latter can be caused by high activity on the original system (many interactions are started in a time interval) or interactions having huge gaps (making them live longer).&lt;/p&gt;
&lt;p&gt;Now, what does it mean for our processing?  Since we can only detect the initial line of an interaction through the fact that there is not yet an interaction processor for this interaction, we may be in a situation that we believe this is a new interaction while in fact we have seen it already.  This may lead to wrong filtering results (for example, if the beginning timestamp of this interaction needs to be evaluated for filtering and the difference between the real first timestamp and the second &amp;#8220;first&amp;#8221; timestamp makes a difference).&lt;/p&gt;
&lt;p&gt;There are a few things that we can do:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Check the filesystem for this interaction and read previous log records into memory before continuing processing this interaction.&lt;/li&gt;
	&lt;li&gt;Remember all interaction ids along with their initial timestamps or with the file name to know whether an interaction has been seen already and to find it efficiently in the filesystem to read previous records (see 1).&lt;/li&gt;
	&lt;li&gt;Nothing.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Option 1 is very inefficient, because we do not know the initial timestamp and so must potentially search multiple directories.  Option 2 is dangerous because it does not impose any limits on the storage needed for interaction ids with their timestamps.  That leaves us with option 3 &amp;#8211; a seemingly bad choice at first sight.&lt;/p&gt;
&lt;p&gt;To comfort you let&amp;#8217;s try to find out &lt;em&gt;how&lt;/em&gt; dangerous it actually is to do nothing about it.  Let us assume for the moment that &lt;code&gt;test-gen.rb&lt;/code&gt; is a realistic model of the log files we will see in practice.  Now let&amp;#8217;s also hope I get the math right&amp;#8230;  Looking into the source code, you will see that the average time interval between two lines of a single interaction is 5 seconds:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
Timer = Struct.new :t do
  def tic
    self.t += rand(10_000) / 1000.0
  end
end
&lt;/pre&gt;
&lt;p&gt;Furthermore we see that there are 1000 new interactions per minute (variable &lt;code&gt;commands&lt;/code&gt;) or, put differently, a new interaction starts every 60ms.  Furthermore there are on average 5.5 (2 + 3.5) log records per interaction (note, I fixed the time distribution of start times which wasn&amp;#8217;t uniform in the first version of the generator):&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
while t &amp;lt; end_time
  # create new entries
  commands.times do
    id = generate_id
    tx.t = t + rand(60_000) / 1000.0

    entries.add(Entry.create(tx.t, id, 'START'))

    (2 + rand(8)).times do
      pick = rand(10)
      msg = pick == 0 ? MESSAGES.last : MESSAGES[pick % 2]
      entries.add(Entry.create(tx.tic, id, msg))
    end

    entries.add(Entry.create(tx.tic, id, 'END'))
  end

  # write entries of this minute
  entries.print t
  t += 60
end
&lt;/pre&gt;
&lt;p&gt;This means the average life span of an interaction is 22.5 seconds (5 sec * (5.5 - 1)).  Continuing with our average calculation, roughly 375 new interactions start in a 22.5 second interval (22.5 sec / 60ms).  So, for the average case keeping 400 &lt;code&gt;InteractionProcessors&lt;/code&gt; in memory is sufficient.  Of course we need to take into account that this is just an average calculation which does not cover temporary load surges, but if we apply a factor of 100 (i.e. a storage limit of 40,000) this still doesn&amp;#8217;t look like it will blast system memory and we should be safe for most situations (unfortunately my stochastic skills are totally rusty today, so if someone cares to calculate the likelihood of failure that would be interesting to see).  Assuming that we need 0.5 KB per &lt;code&gt;InteractionProcessor&lt;/code&gt; because of log lines that need to be kept in memory we&amp;#8217;re at about 20 MB which isn&amp;#8217;t a lot of memory these days.&lt;/p&gt;
&lt;p&gt;To sum it up: with the formula &lt;code&gt;(avg entries - 1) * avg interval * new IA per minute / 60s&lt;/code&gt; we get the average number of interactions active at a time.  It turns out that for our sample log generator this is 375 which leaves a lot of headroom for the interaction processor storage.  Without diving into stochastics too much it is obvious that we are pretty safe if we do not implement a solution for dealing with interactions that were expired too early.&lt;/p&gt;
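&lt;p&gt;The formula is simple enough to put into a few lines of Ruby, which makes it easy to play with other inputs.  The method name &lt;code&gt;avg_active_interactions&lt;/code&gt; is made up for this sketch; the input values are the ones used in the text above.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Average number of interactions that are active at the same time.
#   avg_entries:    average log records per interaction
#   avg_interval:   average seconds between two records of one interaction
#   new_per_minute: interactions started per minute
def avg_active_interactions(avg_entries, avg_interval, new_per_minute)
  (avg_entries - 1) * avg_interval * new_per_minute / 60.0
end

active    = avg_active_interactions(5.5, 5.0, 1000)   # figures from the text
limit     = 40_000                                     # storage limit from the text
memory_kb = limit * 0.5                                # assumes 0.5 KB per processor
&lt;/pre&gt;
&lt;p&gt;With these inputs &lt;code&gt;active&lt;/code&gt; is 375.0 and &lt;code&gt;memory_kb&lt;/code&gt; is 20000.0, i.e. about 20 MB.  If the START and END records written by the generator loop shown above are counted as well, the average would be 7.5 records per interaction and the estimate rises to roughly 542 active interactions, which is still far below the limit of 40,000.&lt;/p&gt;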
&lt;h3&gt;Challenge: How to keep as few log lines in memory as possible?&lt;/h3&gt;
&lt;p&gt;Since our concept includes getting rid of log lines from memory as soon as possible, we will have to look at how this might be achieved.  Remember, we want to only output interactions which are included in our filter criteria, which can be quite different and need to look at&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;interaction id,&lt;/li&gt;
	&lt;li&gt;initial timestamp,&lt;/li&gt;
	&lt;li&gt;last timestamp,&lt;/li&gt;
	&lt;li&gt;a string matched somewhere in any of the log lines.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It should be immediately clear that not all of these criteria can be evaluated when the initial line of an interaction is seen.  A simple matches / does not match logic won&amp;#8217;t help here.  We need a filter that can tell us &amp;#8220;matches&amp;#8221;, &amp;#8220;does not match&amp;#8221; and &amp;#8220;maybe matches&amp;#8221; where the first two answers must only be given if there is no new information (log lines) which can make it invalid.  So we will use a filter that returns any of &lt;code&gt;:yes&lt;/code&gt;, &lt;code&gt;:no&lt;/code&gt; or &lt;code&gt;:maybe&lt;/code&gt;.  This is our dummy filter, which also has a changed interface:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
    YES = Class.new do
      def first(iap, line, ts) :yes end
      def initial(iap, line, ts) :yes end
      def followup(iap, line) :yes end
    end.new
&lt;/pre&gt;
&lt;p&gt;We do not use the complete &lt;code&gt;InteractionProcessor&lt;/code&gt; as argument but add the current line and also the timestamp if needed.  This also leads to three processing states in our &lt;code&gt;InteractionProcessor&lt;/code&gt;: undecided, including, excluding.  We handle this using the &lt;a href="http://c2.com/cgi/wiki?StatePattern"&gt;state pattern&lt;/a&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  class InteractionProcessor

    UNDECIDED = Class.new do
      def process_initial(iap, line, time_stamp)
        case iap.coord.filter.first(iap, line, time_stamp)
        when :yes
          # we'll improve this once LRU is there
          iap.entries &amp;lt;&amp;lt; Entry.new(time_stamp, line)
          INCLUDE
        when :no
          iap.entries.clear
          EXCLUDE
        when :maybe
          iap.entries &amp;lt;&amp;lt; Entry.new(time_stamp, line)
          self
        else
          raise 'Illegal return'
        end
      end

      def process(iap, line, time_stamp)
        case iap.coord.filter.initial(iap, line, time_stamp)
        when :yes
          # we'll improve this once LRU is there
          iap.entries &amp;lt;&amp;lt; Entry.new(time_stamp, line)
          INCLUDE
        when :no
          iap.entries.clear
          EXCLUDE
        when :maybe
          iap.entries &amp;lt;&amp;lt; Entry.new(time_stamp, line)
          self
        else
          raise 'Illegal return'
        end
      end

      def append_line(iap, line)
        case iap.coord.filter.followup(iap, line)
        when :yes
          # we'll improve this once LRU is there
          l = iap.entries.last and l.line &amp;lt;&amp;lt; line
          INCLUDE
        when :no
          iap.entries.clear
          EXCLUDE
        when :maybe
          l = iap.entries.last and l.line &amp;lt;&amp;lt; line
          self
        else
          raise 'Illegal return'
        end
      end
    end.new

    INCLUDE = Class.new do
      def process_initial(iap, line, time_stamp)
        iap.entries &amp;lt;&amp;lt; Entry.new(time_stamp, line)
        self
      end

      def process(iap, line, time_stamp)
        iap.entries &amp;lt;&amp;lt; Entry.new(time_stamp, line)
        self
      end

      def append_line(iap, line)
        l = iap.entries.last and l.line &amp;lt;&amp;lt; line
        self
      end
    end.new

    EXCLUDE = Class.new do
      def process_initial(iap, line, ts)
        self
      end

      def process(iap, line, ts)
        self
      end

      def append_line(iap, line)
        self
      end
    end.new

  # ...

    # Process the first line
    def process_initial(time_stamp, line)
      @state = @state.process_initial(self, line, time_stamp)
    end

    # Process an initial line
    def process(time_stamp, line)
      @state = @state.process(self, line, time_stamp)
    end

    # Append a continuation line to the last entry
    def append_line(line)
      @state = @state.append_line(self, line)
    end

  # ...

  end
&lt;/pre&gt;
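&lt;p&gt;To illustrate the three answers, here is a sketch of a filter with a little more logic than the dummy.  &lt;code&gt;TextFilter&lt;/code&gt; is a hypothetical class and not part of the project; it only assumes the three method interface of the dummy filter shown above.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical filter: include an interaction as soon as any of its
# lines contains the given text.  It can never answer :no early,
# because a later line might still match.
class TextFilter
  def initialize(text)
    @text = text
  end

  def first(iap, line, ts)
    check(line)
  end

  def initial(iap, line, ts)
    check(line)
  end

  def followup(iap, line)
    check(line)
  end

  private

  def check(line)
    line.include?(@text) ? :yes : :maybe
  end
end

filter = TextFilter.new("ERROR")
answers = [
  filter.first(nil, "12:00:00 id1 START", nil),
  filter.initial(nil, "12:00:05 id1 working", nil),
  filter.followup(nil, "  ERROR: disk full")
]
# answers is [:maybe, :maybe, :yes]
&lt;/pre&gt;
&lt;p&gt;A filter on the initial timestamp would behave the other way round: it can answer &lt;code&gt;:yes&lt;/code&gt; or &lt;code&gt;:no&lt;/code&gt; on the very first line and never needs &lt;code&gt;:maybe&lt;/code&gt;.&lt;/p&gt;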
&lt;h3&gt;Challenge: Number of Open File Descriptors&lt;/h3&gt;
&lt;p&gt;If we store 40,000 &lt;code&gt;InteractionProcessors&lt;/code&gt; in memory then the worst case with regard to file descriptors is that all have one open.  That&amp;#8217;s more than usually allowed for user processes.  We&amp;#8217;ll remedy this by using an &lt;span class="caps"&gt;LRU&lt;/span&gt; strategy here as well: we will have a set of open files and close the least recently used if we hit a fixed limit.  I will cover this in one of the next versions, when we&amp;#8217;ll have &lt;span class="caps"&gt;LRU&lt;/span&gt; handling integrated.  (Maybe there is even a gem we can reuse.)&lt;/p&gt;
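&lt;p&gt;The actual limit can be queried from within Ruby, so a fixed limit for open files could be derived from it.  This is only a sketch: &lt;code&gt;Process.getrlimit&lt;/code&gt; is not available on every platform (a Unix-like system is assumed), and the safety margin &lt;code&gt;reserved&lt;/code&gt; is an arbitrary value picked for the example.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Soft and hard limit of open file descriptors for this process.
# Assumption: a Unix-like system where Process.getrlimit is implemented.
soft_limit, hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)

planned_processors = 40_000   # storage limit discussed in the text
# Leave some descriptors for stdin, stdout, stderr and the input log.
reserved = 16                 # hypothetical safety margin

# Never keep more files open than the process is allowed to,
# and never more than there are interaction processors.
handle_cache_size =
  [[soft_limit - reserved, planned_processors].min, 1].max
&lt;/pre&gt;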
&lt;h3&gt;Summary and Outlook&lt;/h3&gt;
&lt;p&gt;This has grown into a rather largish article but I wanted to cover &lt;code&gt;InteractionProcessor&lt;/code&gt; with reasonable completion.  Note, that we still do not have the &lt;span class="caps"&gt;LRU&lt;/span&gt; mechanics and a couple of other things so this class will change again.  Next articles will have to deal with &lt;span class="caps"&gt;LRU&lt;/span&gt; implementation if I do not find anything suitable and integration of that into the rest of the application.  Furthermore, I expect the implementation of the filtering to be one more complex and thus interesting piece.  Stay tuned.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Tue, 30 Jun 2009 16:30:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/011-Animal_Interaction.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/011-Animal_Interaction.html</guid></item><item><title>The Animal raises its head</title><description>&lt;p&gt;You can find the first &lt;a href="http://github.com/rklemme/muppet-laboratories/blob/16dc8851554bf29cee37a0dd75a7869c99b10c7d/bin/sample-animal.rb"&gt;sample-animal.rb&lt;/a&gt; over there at &lt;a href="http://github.com/rklemme/muppet-laboratories/tree/master"&gt;github&lt;/a&gt;.  Deficits of this version:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;No command line parsing,&lt;/li&gt;
	&lt;li&gt;No filtering,&lt;/li&gt;
	&lt;li&gt;Works only for moderately sized files,&lt;/li&gt;
	&lt;li&gt;No particular optimizations yet (well, apart from freezing of id Strings).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One remark about method &lt;code&gt;main&lt;/code&gt;: this is the result of asking myself &amp;#8220;What is the minimal interface I can provide for users of the Animal who basically only need to define the parser class?&amp;#8221;  At first I found it a bit odd, but in terms of lines of code this is likely one of the leanest solutions.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;ll have to call it a day now, but I invite you to have a look and comment.  Do you consider method &lt;code&gt;main&lt;/code&gt; a good idea?  I&amp;#8217;ll chime in with some more explanations later.&lt;/p&gt;
&lt;p&gt;&lt;ins&gt;[Update:]&lt;/ins&gt; If you are looking for the entry point of the implementation for the sample log you need to look into file &lt;code&gt;sample-animal.rb&lt;/code&gt;:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
#!/usr/local/bin/ruby19 -w

# Implementation of the parser for the sample
# generator script test-gen.rb.
$: &amp;lt;&amp;lt; File.join(File.dirname(File.dirname($0)), "lib")

require 'time'
require 'animal'

# main defines the custom parser class!
Animal.main do

  TIME_FORMAT = '%Y-%m-%d %H:%M:%S.%N'.freeze

  attr_accessor :year
  attr_reader :interaction_id, :time_stamp

  def parse(line)
    if %r{
       ^
       ( \d{4}-\d{2}-\d{2} \s \d{2}:\d{2}:\d{2}(?:\.\d+)? )
       \s+
       (\S+) # interaction_id
       \s+
       }x =~ line
      @time_stamp = Time.strptime $1, TIME_FORMAT
      @interaction_id = $2
    else
      @time_stamp = nil
      @interaction_id = nil
    end
  end

  def initial_line?
    time_stamp
  end
end

# EOF
&lt;/pre&gt;
&lt;p&gt;The interesting thing is: this is complete already!  As long as the file format does not change we won&amp;#8217;t have to touch this file any more.  Basically we only provide information about the log file format through the parser whose class is defined with the code block passed to &lt;code&gt;main&lt;/code&gt;.  The rest &amp;#8211; command line argument parsing, creation of all necessary objects etc. &amp;#8211; is done in &lt;code&gt;main&lt;/code&gt; which is not much longer either:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
require 'ostruct'

# Namespace for all the Animal related classes
# Animal is the project on Ruby Best Practices
# blog which demonstrates the thought process
# of writing an application
module Animal

  # Parse the given command line and return an option instance.
  # Options are removed from the argument so what is left in
  # there must be file names.
  def self.parse_command_line(argv = ::ARGV) 
    o = OpenStruct.new(:output_dir =&amp;gt; ".")
    # parse
    o
  end

  # This method allows writing extremely short applications
  # because it accepts a block which is used to define the
  # parser class.  Alternatively users can provide a parser
  # instance.  The default command line is added implicitly
  # and will be option parsed and the whole processing will
  # start automatically.
  def self.main(parser = nil, argv = ::ARGV, &amp;amp;class_body)
    $stderr.puts 'WARNING: ignoring class body' if parser &amp;amp;&amp;amp; class_body
    parser ||= Class.new(&amp;amp;class_body).new
    options = parse_command_line(argv)
    coord = Coordinator.new
    coord.parser = parser
    coord.options = options
    coord.process_files argv
  end

  # autoload init
  %w{
    Coordinator
    ProcessingStorage
    FileStatistics
    InteractionProcessor
  }.each do |cl|
    autoload cl, "animal/#{cl.gsub(/([a-z])([A-Z])/, '\\1_\\2').downcase}"
  end

end
&lt;/pre&gt;
&lt;p&gt;There is really not much magic in &lt;code&gt;main&lt;/code&gt; apart from (ab)using the method block as class body.  (Btw, this is something you can do with &lt;code&gt;Struct&lt;/code&gt; as well.  I&amp;#8217;ll write about that in a future post.)  I can&amp;#8217;t think of a leaner interface to get the job done but maybe someone out there has another idea.&lt;/p&gt;
&lt;p&gt;I had thought of defining a single regular expression with capturing group indexes for timestamp and key.  But then we might also need a way to specify timestamp conversion.  And then there are also file formats which have timestamps separated in time and date and even binary formats&amp;#8230;  This looks like an area we should revisit after v1.0 &amp;#8211; maybe there is some room for improvement.  For now we&amp;#8217;ll stick with the line oriented formats which should cover a pretty wide range &amp;#8211; including the requirements.&lt;/p&gt;
&lt;p&gt;I am also not convinced that the &lt;code&gt;autoload&lt;/code&gt; part is really that great.  Initially it looked like a good idea but now I am not sure any more: writing out class and file names is probably more readable and just as easy as the automated class name to file name conversion hack.&lt;/p&gt;
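&lt;p&gt;For comparison, the written out variant could look like the following sketch.  The file names are the ones which the conversion in &lt;code&gt;main&lt;/code&gt;&amp;#8217;s module produces for the four class names; nothing is loaded until one of the constants is actually referenced.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
module Animal
  # explicit variant: one line per class, no name conversion needed
  autoload :Coordinator,          "animal/coordinator"
  autoload :ProcessingStorage,    "animal/processing_storage"
  autoload :FileStatistics,       "animal/file_statistics"
  autoload :InteractionProcessor, "animal/interaction_processor"
end

# The conversion hack yields the same file name:
derived = "InteractionProcessor".gsub(/([a-z])([A-Z])/, '\1_\2').downcase
# derived is "interaction_processor"
&lt;/pre&gt;
&lt;p&gt;The explicit list is four lines instead of eight and can be searched for with a plain text search, which is probably worth more than the saved typing.&lt;/p&gt;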
&lt;p&gt;Now, what do you guys say?&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Mon, 22 Jun 2009 22:36:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/010-The_Animal_raises_its_Head.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/010-The_Animal_raises_its_Head.html</guid></item><item><title>Shadow of the Animal</title><description>&lt;p&gt;Let&amp;#8217;s recapitulate where we have been for readers who are new to the blog and for the convenience of others.  I &lt;a href="005_Enter_the_Muppet_Laboratories.html"&gt;started this&lt;/a&gt; a while ago as an experiment trying to give some insights into my way of reasoning.  I choose processing of large log files as subject which should help me analyze problems on production systems.  I do believe that this might also help others as log file analysis is a fairly common task.  The key point here is to be able to efficiently sift through many large logfiles and give access to the relevant information (see also the &lt;a href="007-Requirements.html"&gt;list of requirements&lt;/a&gt; and the &lt;a href="006-The_Muppet_Project.html"&gt;initial description&lt;/a&gt; of the scope).&lt;/p&gt;
&lt;p&gt;I picked &amp;#8220;Animal&amp;#8221; as the name of the logfile analyzer because it must sift through tons of log data in much the same way as the &lt;a href="http://muppet.wikia.com/wiki/Animal"&gt;like-named character&lt;/a&gt; of a &lt;a href="http://muppet.wikia.com/wiki/The_Muppet_Show"&gt;TV show&lt;/a&gt; treats his instrument &amp;#8211; ferociously.&lt;/p&gt;
&lt;p&gt;In today&amp;#8217;s article I will present some major design decisions.  Some core components of the architecture have been mentioned in the &lt;a href="008-First_Design_Considerations.html"&gt;previous article&lt;/a&gt; already so I won&amp;#8217;t repeat them here.&lt;/p&gt;
&lt;h3&gt;Design Decisions&lt;/h3&gt;
&lt;p&gt;As I want to give insights into my way of reasoning about software and design in particular I will list some of the major design decisions along with my rationale.&lt;/p&gt;
&lt;h4&gt;No class for individual log entries&lt;/h4&gt;
&lt;p&gt;The reasoning has been presented in a &lt;a href="008-First_Design_Considerations.html"&gt;previous article&lt;/a&gt;: I want to avoid the overhead of allocating short lived objects.  You can play around with a &lt;a href="http://github.com/rklemme/muppet-laboratories/tree/master"&gt;toy Java project&lt;/a&gt; (Eclipse as well as ant) to see the effect I was talking about.  I had up to 9% time difference between the straight version and the version which puts data in a special record class.&lt;/p&gt;
&lt;h4&gt;No meta data&lt;/h4&gt;
&lt;p&gt;I had pondered the option of a two step approach: an initial analysis step generates meta data which is then used by another program to efficiently extract relevant information.  The advantage clearly would have been that we could apply a lot of different filtering and selection criteria without having to go through the initial analysis phase again.  I have decided against it for these reasons:&lt;/p&gt;
&lt;dl&gt;
	&lt;dt&gt;Easier propagation of relevant data&lt;/dt&gt;
	&lt;dd&gt;Others do not need the extraction program and the original log files to look at interactions,&lt;/dd&gt;
	&lt;dt&gt;Simpler software structure&lt;/dt&gt;
	&lt;dd&gt;Only one program is needed, no writing and reading of meta data needed,&lt;/dd&gt;
	&lt;dt&gt;More robust&lt;/dt&gt;
	&lt;dd&gt;A change in the location of the original logs does not affect analysis.&lt;/dd&gt;
&lt;/dl&gt;
&lt;h4&gt;Efficient Processing&lt;/h4&gt;
&lt;p&gt;This is the part I personally find most interesting: how do we manage to stream process log data while still limiting memory usage?  We cannot simply aggregate all interactions in a huge Hash and dump the interesting bits at the end because this will burn too much memory (which costs speed through paging) or even make the whole process fail completely.  That would be especially bad since when running out of memory we would lose all the work that has been done up to that point.&lt;/p&gt;
&lt;p&gt;My idea for solving this involves two components:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;A hash based storage with &lt;span class="caps"&gt;LRU&lt;/span&gt; semantics,&lt;/li&gt;
	&lt;li&gt;An interaction processor.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Basically the &lt;span class="caps"&gt;LRU&lt;/span&gt; storage is responsible for keeping only a limited number of interaction processors in memory while the interaction processor is responsible for dealing as efficiently as possible with a single interaction.  This means, the interaction processor will have to efficiently decide whether an interaction is included in the output and write out log lines as soon as possible (!) to a file in a given location.&lt;/p&gt;
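&lt;p&gt;To make the first component more tangible, here is a minimal sketch of such a storage.  It is an illustration under assumptions, not the final implementation: the class name &lt;code&gt;LruStorage&lt;/code&gt;, its methods and the eviction callback are made up for this example, and it relies on &lt;code&gt;Hash&lt;/code&gt; preserving insertion order (Ruby 1.9 and later).&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical sketch: a size-limited store with LRU eviction.
# Assumes Ruby 1.9+ where Hash keeps insertion order.
class LruStorage
  def initialize(max_size, on_evict)
    raise ArgumentError, "max_size must be positive" unless max_size.positive?
    @max_size = max_size
    @on_evict = on_evict   # callable invoked with (key, value) on eviction
    @entries = {}
  end

  # Return the entry for key, creating it via the block if it is missing,
  # and mark it as most recently used.
  def fetch(key)
    if @entries.key?(key)
      value = @entries.delete(key)   # re-inserting below moves it to the end
    else
      value = yield(key)
      evict_oldest if @entries.size == @max_size
    end
    @entries[key] = value
  end

  def size
    @entries.size
  end

  private

  # The first entry of the Hash is the least recently used one.
  def evict_oldest
    key, value = @entries.first
    @entries.delete(key)
    @on_evict.call(key, value)
  end
end

evicted = []
store = LruStorage.new(2, lambda { |key, _value| evicted.push(key) })
store.fetch("a") { |id| "processor for #{id}" }
store.fetch("b") { |id| "processor for #{id}" }
store.fetch("a") { |id| "processor for #{id}" }   # touches "a"
store.fetch("c") { |id| "processor for #{id}" }   # evicts "b"
&lt;/pre&gt;
&lt;p&gt;The eviction callback is the place where an interaction processor would get the chance to flush and close its output before it is dropped from memory.&lt;/p&gt;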
&lt;h4&gt;Output Storage&lt;/h4&gt;
&lt;p&gt;Every interaction will be stored in its own file.  To avoid potential issues with file systems becoming inefficient with thousands of files we will use an output directory tree with these levels:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;date, formatted &lt;span class="caps"&gt;YYYY&lt;/span&gt;-MM-DD (example &amp;#8216;2006-06-21&amp;#8217;),&lt;/li&gt;
	&lt;li&gt;hour and minute, two digit 24 hour clock and minute (example &amp;#8216;19-11&amp;#8217;).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Interaction file names will include the timestamp of the first record.  The naming will be like this: &amp;#8220;SS.&lt;span class="caps"&gt;SSS&lt;/span&gt;-&amp;lt;interaction id&amp;gt;&amp;#8221; where &amp;#8220;SS&amp;#8221; are the seconds of the timestamp and &amp;#8220;&lt;span class="caps"&gt;SSS&lt;/span&gt;&amp;#8221; are the milliseconds.  A valid name including the directory levels might then look like this: &lt;code&gt;2009-06-21/19-11/00.003-sdmyrsmmlbxodfd&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The advantage is that we will not face any filesystem issues because there will be only a few thousand entries per minute.  Plus, because of the hierarchical naming of directories and files we can easily use textual sorting of names and get chronological output.  If we discover that there are too many entries per minute we can still extend the schema pretty easily to include a directory level for seconds.&lt;/p&gt;
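&lt;p&gt;The naming scheme maps nicely onto &lt;code&gt;Time#strftime&lt;/code&gt;.  The following is only a sketch; the method name &lt;code&gt;interaction_path&lt;/code&gt; is an assumption made for this example:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical helper: build the relative output path of an interaction
# from the timestamp of its first record and its interaction id.
def interaction_path(first_time_stamp, interaction_id)
  # %L yields the milliseconds as three digits.  The id is appended
  # afterwards so it is never interpreted as a format string.
  first_time_stamp.strftime("%Y-%m-%d/%H-%M/%S.%L-") + interaction_id
end

first = Time.utc(2009, 6, 21, 19, 11, 0, 3000)   # 3000 microseconds = 3 ms
puts interaction_path(first, "sdmyrsmmlbxodfd")
# prints 2009-06-21/19-11/00.003-sdmyrsmmlbxodfd
&lt;/pre&gt;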
&lt;h3&gt;Structure&lt;/h3&gt;
&lt;p&gt;The structure of the application will look like this: there is an option parsing method which produces an options object combining all processing options.  These options will be handed off to the Coordinator which is the main driver of the processing.  The Coordinator also needs a Parser instance which is capable of parsing log lines and providing the interaction_id as well as the time_stamp of the last parsed entry.  The Coordinator will create InteractionProcessors as needed and hold them in a structure keyed by interaction_id.  Each InteractionProcessor will handle a single interaction only.  It knows the Coordinator, through which it gains access to the options and to a filter evaluator which determines whether an interaction is included; this filter works on the InteractionProcessor.&lt;/p&gt;
&lt;h3&gt;Thought Process&lt;/h3&gt;
&lt;p&gt;Now you might wonder how I arrived at these decisions.  Well, I wanted the resulting bit of software to get the job done simply and efficiently.  Generating a whole lot of meta data which helps in finding interactions efficiently would have created additional complexity as I have pointed out.  So it was logical to rather choose a grepish approach, i.e. read the input once and spit out everything that we are interested in.&lt;/p&gt;
&lt;p&gt;Since the volume of input data is large we cannot expect to hold all the input in memory at any point in time.  So we have to pick a streaming approach.  This means we can only hold a limited number of entities in memory at a time.  Luckily, log entries of a single interaction have high locality, i.e. whenever the first occurs, follow-up lines are not too far away and the interaction end is usually near as well.&lt;/p&gt;
&lt;p&gt;Now, I considered two approaches for aging out interactions from the processor&amp;#8217;s storage: timestamp based and access based.  I picked access based because in reality there are some interactions which take a bit longer than the usual interaction; with the timestamp based approach this would make picking a high latency necessary, which in turn would keep short interactions in memory longer than necessary.  Also, with the timestamp based approach you cannot put a hard numeric limit on the number of interactions in memory, which could lead to failures for some logs while other logs are processed fine.  With an &lt;span class="caps"&gt;LRU&lt;/span&gt; based storage we can set a size limit and avoid this problem.  Of course, long interactions which also have many log entries could still cause memory issues but since these are comparatively rare the likelihood of issues is small.&lt;/p&gt;
&lt;p&gt;I will now wander off hacking together a rough application which does not yet do much but should give you an idea of how all the pieces are supposed to work together.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Sun, 21 Jun 2009 19:36:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/009-Shadow_of_the_Animal.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/009-Shadow_of_the_Animal.html</guid></item><item><title>First Design Considerations</title><description>&lt;p&gt;Now the interesting part begins!  In an early phase like this I like to look at the problem I want to solve from different angles to get a feeling for implementation options.  The output will look like an unsorted collection of notes &amp;#8211; and that&amp;#8217;s what it is!  I don&amp;#8217;t claim that this is the best approach around but this helps me to get a better grip of the problem.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Side note&lt;/em&gt;: I believe that despite all the methodologies around the way you approach a new problem is highly influenced by your personal way of thinking and working.  There is no one size fits all approach and &amp;#8220;chaos&amp;#8221; at the beginning is nothing to be afraid of.  In fact it may well be that &amp;#8220;chaos&amp;#8221; gives birth to the best ideas.&lt;/p&gt;
&lt;h3&gt;Obtaining Information about individual Records&lt;/h3&gt;
&lt;p&gt;Requirement M3 makes it necessary that in some place we have a plug where we can insert something that does log file format specific things.  Basically for parsing a single line a block would be sufficient.  If the format parses ok, the interaction identifier is returned, if not, we have a continuation line.  However, it seems this solution is too simplistic: for time filtering (M6) and time gap detection (S3) we certainly also need the timestamp of the record.  M7, which I had forgotten initially, also makes it necessary that the year is somehow synthesized &amp;#8211; or inserted into the line parser from surrounding code.  So, we will likely end up with something that would be an interface in other programming languages, i.e. we need a parser class which speaks a certain protocol / has a certain &lt;span class="caps"&gt;API&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;Here&amp;#8217;s an alternative approach: we create a record class and write our parser in a way that it spits out instances of this record class all the time:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
Record = Struct.new :time_stamp, :interaction_id, :message

parser = ...

parser.each do |rec|
  interaction = find_interaction(rec.interaction_id)
  time_gap = rec.time_stamp - interaction.last_time_stamp
  ...
end
&lt;/pre&gt;
&lt;p&gt;From an object oriented point of view there&amp;#8217;s much which speaks in favour of this:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;A record is an entity in our description of the problem domain and as such it is natural to have it represented in the application,&lt;/li&gt;
	&lt;li&gt;We can add record specific functionality to this class (e.g. get all potential keywords used for searching / filtering),&lt;/li&gt;
	&lt;li&gt;We do not have to care about multiline detection, this is completely encapsulated in the parser,&lt;/li&gt;
	&lt;li&gt;We can enforce class invariants (e.g. all fields must be non nil and &lt;code&gt;time_stamp&lt;/code&gt; must be a &lt;code&gt;Time&lt;/code&gt; instance).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, there&amp;#8217;s a price to pay in object allocations and collections.  Remember, we are talking about files with tons of records; if all these records are short lived (i.e. not used for storing them somewhere) we&amp;#8217;re generating a lot of overhead which the garbage collector has to get rid of again.  So I rather lean towards the &amp;#8220;parser interface&amp;#8221; approach mentioned above:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
parser = ...

parser.line = line

if parser.continuation_line?
  last_record &amp;lt;&amp;lt; "\n" &amp;lt;&amp;lt; line
else
  ...
  last_record = line
  diff = parser.time_stamp - last_time
  ...
end
&lt;/pre&gt;
&lt;p&gt;Now, it&amp;#8217;s debatable whether this falls into the category of &lt;a href="http://en.wikipedia.org/wiki/Premature_optimization#When_to_optimize"&gt;premature optimisation&lt;/a&gt;, but since we know that we are going to crunch a lot of data we should probably avoid this kind of overhead.&lt;/p&gt;
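&lt;p&gt;To illustrate what the &amp;#8220;parser interface&amp;#8221; could look like, here is a minimal sketch.  Everything specific in it is an assumption made for the example: the class name &lt;code&gt;LineParser&lt;/code&gt;, the regular expression and the line format (timestamp with milliseconds, then the interaction id, then the message).  The real log format may well differ.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical parser for lines which look like this (format is assumed):
#   2009-06-12 20:36:00,123 sdmyrsmmlbxodfd some message
class LineParser
  LINE_FORMAT = /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d),(\d{3}) (\S+) /

  attr_reader :interaction_id

  # Feed the next input line; no record object is created.
  def line=(line)
    @match = LINE_FORMAT.match(line)
    @interaction_id = @match ? @match[8] : nil
    @time_stamp = nil
  end

  # Lines which do not start with timestamp and id continue the last record.
  def continuation_line?
    @match.nil?
  end

  # Converted lazily, so lines nobody asks about cost no Time object.
  def time_stamp
    return nil if continuation_line?
    @time_stamp ||= begin
      year, month, day, hour, min, sec, msec =
        @match.captures.first(7).map { |s| s.to_i }
      Time.utc(year, month, day, hour, min, sec, msec * 1000)
    end
  end
end

parser = LineParser.new
parser.line = "2009-06-12 20:36:00,123 sdmyrsmmlbxodfd START"
first_id = parser.interaction_id
first_time = parser.time_stamp
parser.line = "  at com.example.Foo.bar(Foo.java:42)"
continued = parser.continuation_line?
&lt;/pre&gt;
&lt;p&gt;The state lives in one long lived parser object which is updated per line, which is exactly the trade against a short lived record instance per line.&lt;/p&gt;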
&lt;h3&gt;Filtering&lt;/h3&gt;
&lt;p&gt;From the requirements it is clear that we would like to have quite flexible filtering (M6, S3, S4).  We have two options:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;We apply the filtering during processing.&lt;/li&gt;
	&lt;li&gt;Processing gathers meta data about the data and we can efficiently query later with whatever filters we like.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Hybrid approaches are also possible, e.g. time filtering during processing but searching for keywords later.  From the user&amp;#8217;s point of view option 2 is certainly more desirable because it gives greater flexibility.  On the other hand this must be balanced with processing overhead and disk space.  As usual, if we want great flexibility for queries after analysis we need more disk space for meta data (indexes).  In another hybrid approach the types of desired filters are provided as input into the processor so that indexes are created only for criteria which are actually queried later.  The &amp;#8220;flexible query&amp;#8221; approach certainly reminds me of a relational database.&lt;/p&gt;
&lt;p&gt;When we do online filtering (option 1 above) we need to keep some aspects in mind.  For example, if we want to filter by text found in an interaction log message (the part after the interaction id in the sample generator output), that text might not appear in the first record of a particular interaction yet we would want to see all preceding records as well (M1).  The same holds for time range filtering: an interaction might have started before the range start but end after it.  It should be clear that if we choose this approach we need a way to keep some history of past records during processing.  Since we can&amp;#8217;t store everything we&amp;#8217;ve seen so far we need to decide what we need to keep and what we can discard.&lt;/p&gt;
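&lt;p&gt;The history problem can be sketched like this.  The class below is purely hypothetical (name, methods and the single regexp filter are assumptions for the example): it holds back the lines of one interaction until the filter matches and from then on writes straight through.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
require "stringio"

# Hypothetical per-interaction buffer for online text filtering.
class BufferedInteraction
  def initialize(pattern, output)
    @pattern = pattern
    @output = output    # anything which responds to puts, e.g. a File
    @pending = []       # lines seen before the filter matched
    @selected = false
  end

  def add(line)
    @selected = true if @pattern =~ line
    if @selected
      # Flush the history first so the original order is retained.
      @pending.each { |old| @output.puts(old) }
      @pending.clear
      @output.puts(line)
    else
      @pending.push(line)
    end
  end
end

out = StringIO.new
interaction = BufferedInteraction.new(/timeout/, out)
interaction.add("START")
interaction.add("calling backend")
interaction.add("backend timeout")   # match: history is flushed
interaction.add("END")
puts out.string
&lt;/pre&gt;
&lt;p&gt;Note that an interaction which never matches keeps all its lines in memory until it is discarded, so this only works together with a limit on the number of interactions held at a time.&lt;/p&gt;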
&lt;p&gt;I need to ponder this a bit more but at the moment I would rather choose the online filtering approach for these reasons:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Less disk space used on output,&lt;/li&gt;
	&lt;li&gt;Potentially less write IO during processing, which would help performance since we are almost certainly IO bound,&lt;/li&gt;
	&lt;li&gt;Easier distribution to other staff &amp;#8211; if we need meta data to get at information efficiently this also means that someone needs to have the software to extract the information he wants (plus the index data and the original data set which is large).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Statistics&lt;/h3&gt;
&lt;p&gt;Collecting statistics like the number of records and the number of lines per file isn&amp;#8217;t too expensive so we will probably create a statistics class which records just that.&lt;/p&gt;
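&lt;p&gt;Such a class can be very small.  A sketch, with field and method names which are assumptions for this example only:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical statistics holder for one input file.
FileStatistics = Struct.new(:file_name, :lines, :records) do
  # Count one input line; a continuation line does not start a new record.
  def count_line(continuation)
    self.lines += 1
    self.records += 1 unless continuation
  end
end

stats = FileStatistics.new("app.log", 0, 0)
stats.count_line(false)   # first line of a record
stats.count_line(true)    # continuation line of the same record
stats.count_line(false)   # next record
puts "#{stats.file_name}: #{stats.lines} lines, #{stats.records} records"
# prints app.log: 3 lines, 2 records
&lt;/pre&gt;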
&lt;h2&gt;Application Structure&lt;/h2&gt;
&lt;p&gt;If you do not know &lt;a href="http://en.wikipedia.org/wiki/Class-Responsibility-Collaboration_card"&gt;&lt;span class="caps"&gt;CRC&lt;/span&gt; Cards&lt;/a&gt; yet this is a good opportunity to look into this simple concept.  It does not have the power of a &lt;span class="caps"&gt;UML&lt;/span&gt; collaboration diagram but it helps identify classes and how they might play together.&lt;/p&gt;
&lt;p&gt;We can roughly identify some pieces already:&lt;/p&gt;
&lt;dl&gt;
	&lt;dt&gt;Parser&lt;/dt&gt;
	&lt;dd&gt;parse individual lines of the input, identify relevant information (timestamp, interaction id).&lt;/dd&gt;
	&lt;dt&gt;FileStatistics&lt;/dt&gt;
	&lt;dd&gt;collect various data points about a processed file (lines, time taken&amp;#8230;).&lt;/dd&gt;
	&lt;dt&gt;OptionProcessor&lt;/dt&gt;
	&lt;dd&gt;process command line options.&lt;/dd&gt;
	&lt;dt&gt;Coordinator&lt;/dt&gt;
	&lt;dd&gt;This is the &lt;a href="http://en.wikipedia.org/wiki/Master_Control_Program_(Tron)#Master_Control_Program"&gt;&lt;span class="caps"&gt;MCP&lt;/span&gt;&lt;/a&gt; which binds everything together and coordinates the processing.&lt;/dd&gt;
	&lt;dt&gt;ProcessingStorage&lt;/dt&gt;
	&lt;dd&gt;this is the foggiest entity so far; it will be responsible for storing information about processed interactions.  Since I haven&amp;#8217;t decided on the strategy yet, I&amp;#8217;ll leave it at this for now.  We&amp;#8217;ll have to make this more concrete later.  One thing is for sure: this will likely be the most complex class &amp;#8211; or rather set of classes.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;So much for initial design considerations.  I&amp;#8217;ll stop here, to give us a break and allow for some discussion, which probably will turn up new aspects and new ideas.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Fri, 12 Jun 2009 20:36:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/008-First_Design_Considerations.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/008-First_Design_Considerations.html</guid></item><item><title>Requirements Summary of the Laboratory Project</title><description>&lt;p&gt;I will now take on the role of project secretary and put all requirements into a form suitable for easier reference.  There will be the following groups:&lt;/p&gt;
&lt;dl&gt;
	&lt;dt&gt;must (M)&lt;/dt&gt;
	&lt;dd&gt;requirement must be fulfilled&lt;/dd&gt;
	&lt;dt&gt;should (S)&lt;/dt&gt;
	&lt;dd&gt;requirement should be fulfilled but is not a show stopper&lt;/dd&gt;
	&lt;dt&gt;nice to have (N)&lt;/dt&gt;
	&lt;dd&gt;this one is purely optional&lt;/dd&gt;
	&lt;dt&gt;future (F)&lt;/dt&gt;
	&lt;dd&gt;this might be a good idea for a later release but should not influence the current version&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;I will later reference requirements via letter and number, e.g. &amp;#8220;S4&amp;#8221; refers to &amp;#8220;filter by text&amp;#8221; requirement (and not a fast car from a German manufacturer).&lt;/p&gt;
&lt;p&gt;Until now there was not much reasoning about software although there were some ideas (and even prototypes) mentioned in the discussion thread of the &lt;a href="006-The_Muppet_Project.html"&gt;last article&lt;/a&gt;.  With the next article I will start to flesh out the architecture (or even some alternatives) along with the reasoning that led to different decisions.&lt;/p&gt;
&lt;h3&gt;Must have&lt;/h3&gt;
&lt;table&gt;
	&lt;tr&gt;
		&lt;th&gt;number&lt;/th&gt;
		&lt;th&gt;description&lt;/th&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 1&lt;/td&gt;
		&lt;td&gt;Provide efficient access to all entries of selected interactions&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 2&lt;/td&gt;
		&lt;td&gt;Retain ordering of input lines per interaction&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 3&lt;/td&gt;
		&lt;td&gt;Parsing needs to be able to identify multi line entries&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 4&lt;/td&gt;
		&lt;td&gt;Parsing of lines needs to be easily adjustable to different line formats&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 5&lt;/td&gt;
		&lt;td&gt;Analysis must not rely on any particular text to identify first and last lines of an interaction&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 6&lt;/td&gt;
		&lt;td&gt;Filter interactions by time range (between a and b, before a, after a)&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 7&lt;/td&gt;
		&lt;td&gt;If timestamps are parsed, formats without a year must be handled properly&lt;/td&gt;
	&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;Should have&lt;/h3&gt;
&lt;table&gt;
	&lt;tr&gt;
		&lt;th&gt;number&lt;/th&gt;
		&lt;th&gt;description&lt;/th&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 1&lt;/td&gt;
		&lt;td&gt;Input file names should be provided via command line arguments&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 2&lt;/td&gt;
		&lt;td&gt;Analysis should deal gracefully with partial interactions (at the beginning and end of a logfile)&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 3&lt;/td&gt;
		&lt;td&gt;Identify pauses in interactions with a configurable length and report them&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 4&lt;/td&gt;
		&lt;td&gt;Filter interactions by some text contained in messages, ideally via regexp matching&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 5&lt;/td&gt;
		&lt;td&gt;Make interactions available in their original formatting&lt;/td&gt;
	&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;Nice to have&lt;/h3&gt;
&lt;table&gt;
	&lt;tr&gt;
		&lt;th&gt;number&lt;/th&gt;
		&lt;th&gt;description&lt;/th&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 1&lt;/td&gt;
		&lt;td&gt;Identification of the proper chronological order of files&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 2&lt;/td&gt;
		&lt;td&gt;Read from stdin&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 3&lt;/td&gt;
		&lt;td&gt;&lt;span class="caps"&gt;HTML&lt;/span&gt; output&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 4&lt;/td&gt;
		&lt;td&gt;Do not use more than 20% on top of a &lt;code&gt;cat &amp;gt;/dev/null&lt;/code&gt;&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 5&lt;/td&gt;
		&lt;td&gt;Statistics about the analysis such as time range, files read, lines read, interactions found etc.&lt;/td&gt;
	&lt;/tr&gt;
&lt;/table&gt;
&lt;h3&gt;Future&lt;/h3&gt;
&lt;table&gt;
	&lt;tr&gt;
		&lt;th&gt;number&lt;/th&gt;
		&lt;th&gt;description&lt;/th&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 1&lt;/td&gt;
		&lt;td&gt;Analysis data stored in a relational database for later analysis with &lt;span class="caps"&gt;SQL&lt;/span&gt; queries&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt; 2&lt;/td&gt;
		&lt;td&gt;Correlate data from different components with different log file naming schemes and formats&lt;/td&gt;
	&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;Please let me know if I missed something or there are other mistakes (such as contradictory requirements).&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Mon, 08 Jun 2009 21:59:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/007-Requirements.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/007-Requirements.html</guid></item><item><title>The Laboratory Project</title><description>&lt;p&gt;The company I am with creates billing systems for telecom companies.  More precisely we call them &amp;#8220;convergent billing systems&amp;#8221; because billing is not limited to phone calls or &lt;span class="caps"&gt;SMS&lt;/span&gt;.  You can also bill multimedia content, internet traffic and whatnot.  Basically everything that runs through your telephone or IT infrastructure.  In our office we even have a solution where the billing system is used to charge money via our corporate id cards at the local vending machine.&lt;/p&gt;
&lt;p&gt;These billing systems have a tremendous amount of interesting features.  Listing them all would need an article of its own.  For the sake of our project it is important to remember that these systems are quite complex and have a high throughput.  We have a central component implemented as a &lt;a href="http://java.sun.com/javaee/"&gt;&lt;span class="caps"&gt;JEE&lt;/span&gt;&lt;/a&gt; application which front ends to customer care and other systems that may be present at a customer site.  (The term &amp;#8220;customer&amp;#8221; is really ambiguous: there are our customers, which are telco companies and their customers, i.e. everybody using their phone lines.  I will talk about &lt;em&gt;our&lt;/em&gt; customers only.)&lt;/p&gt;
&lt;p&gt;Now, this central application writes out logfiles using the usual mechanisms (you can look at &lt;a href="http://logging.apache.org/log4j/"&gt;log4j&lt;/a&gt; in case you do not know Java logging and are interested in more details).  As you can imagine these log files grow from large to huge because they contain several lines per business interaction (basically this is a remote call from a client system).  Sometimes these interactions fail and our support staff need to look at these log files to find out what&amp;#8217;s wrong.  When it&amp;#8217;s a tricky issue we people from the development department need to do this as well.  Simplifying this task is the goal of this toy project.&lt;/p&gt;
&lt;p&gt;Some things you need to know about these log files:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;all log entries belonging to the same interaction share a common identifier,&lt;/li&gt;
	&lt;li&gt;they are huge,&lt;/li&gt;
	&lt;li&gt;multiple interactions are interspersed,&lt;/li&gt;
	&lt;li&gt;some log entries span multiple consecutive lines,&lt;/li&gt;
	&lt;li&gt;log files are rotated, i.e. there are some incomplete interactions at the beginning and end.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I have set up a &lt;a href="http://github.com/rklemme/muppet-laboratories/tree/master"&gt;git repository&lt;/a&gt; where I created a &lt;a href="http://github.com/rklemme/muppet-laboratories/blob/ef7a28d935ada03ff141ffee6f3b19e8dfc71875/bin/test-gen.rb"&gt;generator script&lt;/a&gt; which you can use to generate sample logs and get a rough idea of what they look like.  Of course, there is no real data but the basic properties from above are present.  Try to run the script with &lt;code&gt;test-gen.rb -h&lt;/code&gt; to see the supported parameters.  If you don&amp;#8217;t, the script will throw a lot of text at you.  I included &amp;#8220;&lt;span class="caps"&gt;START&lt;/span&gt;&amp;#8221; and &amp;#8220;&lt;span class="caps"&gt;END&lt;/span&gt;&amp;#8221; messages per interaction only to simplify testing &amp;#8211; in practice we cannot rely on the presence of lines which unambiguously identify the first and last line of an interaction.&lt;/p&gt;
&lt;p&gt;What I would like to have at the end of the project is a Ruby application which efficiently separates individual interactions so that they can easily be analyzed individually.  In order to get there, we will start collecting requirements.  Let&amp;#8217;s assume for the moment that I am your customer and you try to get a list of requirements.  I would like to invite you to query me via the comment system.  I am sure, together we will come up with a better list than if I wrote it down alone.  In the next article of the series I will then present the summary of our efforts.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Wed, 03 Jun 2009 21:56:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/006-The_Muppet_Project.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/006-The_Muppet_Project.html</guid></item><item><title>Enter the Muppet Laboratories</title><description>&lt;p&gt;I have decided I will start an experiment.  I haven&amp;#8217;t done this before and I have no idea whether it will work as intended but I am sure we can pull this off together.  Here&amp;#8217;s the deal: I will try to let you participate in my thought process which hopefully will turn up some useful if not best practices.  I don&amp;#8217;t claim to be Don Knuth but hopefully you&amp;#8217;ll find it interesting and worthwhile nevertheless.  The tricky part will be to actually catch all the reasoning and record it in a format suitable for blogging.&lt;/p&gt;
&lt;p&gt;In this article I will give a brief overview of how I intend to proceed.  Of course a thing like this needs some form of project.  We will come to that in a minute.  The whole series of articles will roughly look like this:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Initial presentation (this article)&lt;/li&gt;
	&lt;li&gt;Description of the project goal&lt;/li&gt;
	&lt;li&gt;Summary of collected requirements&lt;/li&gt;
	&lt;li&gt;Several postings about actual code and decisions&lt;/li&gt;
	&lt;li&gt;Presentation of the final result&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After I present the goal you are invited to help distill requirements via the blog&amp;#8217;s &lt;a href="http://disqus.com/"&gt;comment system&lt;/a&gt;.  I will be the customer for the moment and I am sure with your active support we will get a list together with requirements that the project can meet.&lt;/p&gt;
&lt;p&gt;The whole thing is loosely related to my job but I think you can extract at least parts of it for your own purposes.  Of course we&amp;#8217;ll be doing everything nicely modularized so reuse won&amp;#8217;t be an issue.  :-) In any case the main benefit for us all should be the thought process and insight into decision making during software development.&lt;/p&gt;
&lt;p&gt;The project will be about analysis of large log files which contain high numbers of small business transactions.  In parallel with coming up with a more detailed description I will try to come up with a generator which we can use to create files for testing purposes.  This should help you get a better idea of what I am talking about.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://muppet.wikia.com/wiki/Veterinarian%27s_Hospital"&gt;Tune in next week when you&amp;#8217;ll hear Dr. Bob say&amp;#8230;&lt;/a&gt;&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Fri, 29 May 2009 17:54:00 +0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/005_Enter_the_Muppet_Laboratories.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/005_Enter_the_Muppet_Laboratories.html</guid></item><item><title>Control flow features and readability</title><description>&lt;p&gt;First of all I would like to thank our readers who participate in discussions so actively!  These discussions provide interesting food for thought as well as inspirations for new blog entries.  Today&amp;#8217;s article was partly inspired by the question that surfaced recently: &amp;#8220;Why is &lt;code&gt;catch .. throw&lt;/code&gt; seen so infrequently?&amp;#8221;&lt;/p&gt;
&lt;p&gt;In this article I will explore how the choice of &lt;a href="http://en.wikipedia.org/wiki/Control_flow"&gt;control flow&lt;/a&gt; constructs and their usage affects readability and maintainability of code.  For the purpose of this investigation I will provide a definition, partly because the Wikipedia article seems a bit inconsistent and partly because it may change.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A control flow construct is a language feature which disrupts the normal progression to the next statement and conditionally or unconditionally branches to another location in source code.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This definition includes the usual &lt;code&gt;if ... then&lt;/code&gt; but also method invocation and &lt;code&gt;return&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;Classes of control flow statements&lt;/h3&gt;
&lt;p&gt;Following a sequence of regular (i.e. non control flow) statements is easy.  Things get only slightly more complicated for the reader in case of &lt;code&gt;if ... then ... else ... end&lt;/code&gt; as long as proper indentation is used and we do not have to scroll multiple pages to see the other branch(es).&lt;/p&gt;
&lt;p&gt;But it is a different story altogether if the stack frame changes and we as readers have to &lt;strong&gt;jump&lt;/strong&gt; to a completely different location &amp;#8212; maybe even in a different file!  Hence I will classify control flow constructs depending on the distance measured in stack frames:&lt;/p&gt;
&lt;dl&gt;
	&lt;dt&gt;level-0&lt;/dt&gt;
	&lt;dd&gt;There is no change in stack frame at all.&lt;/dd&gt;
	&lt;dt&gt;level-1&lt;/dt&gt;
	&lt;dd&gt;Code moves one stack frame up or down.&lt;/dd&gt;
	&lt;dt&gt;level-n&lt;/dt&gt;
	&lt;dd&gt;An arbitrary amount of stack frames can be added or removed.&lt;/dd&gt;
	&lt;dt&gt;level-x&lt;/dt&gt;
	&lt;dd&gt;The stack is completely exchanged.&lt;/dd&gt;
	&lt;dt&gt;level-1n&lt;/dt&gt;
	&lt;dd&gt;This is a hybrid where &amp;#8220;technical&amp;#8221; and &amp;#8220;visual&amp;#8221; jumps differ.&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;I am sure those first three categories do not bear any surprises for you.  The last two probably need a bit of explanation.  Category &lt;em&gt;level-x&lt;/em&gt; contains Ruby&amp;#8217;s continuations.  I won&amp;#8217;t cover them here because I am not too familiar with them, they are rarely used in &amp;#8220;ordinary&amp;#8221; code and they are so complex that they probably deserve an article of their own.  It should be obvious that this complexity does not really help understanding code.&lt;/p&gt;
&lt;h3&gt;level-0&lt;/h3&gt;
&lt;p&gt;This category includes&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;code&gt;if ... then ... elsif ... else ... end&lt;/code&gt; as well as &lt;code&gt;unless ...&lt;/code&gt;,&lt;/li&gt;
	&lt;li&gt;ternary operator &lt;code&gt;... ? ... : ...&lt;/code&gt;,&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;for&lt;/code&gt;, &lt;code&gt;while&lt;/code&gt; and &lt;code&gt;until&lt;/code&gt; loops,&lt;/li&gt;
	&lt;li&gt;both forms of &lt;code&gt;case&lt;/code&gt; statements,&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;and&lt;/code&gt; and &lt;code&gt;or&lt;/code&gt;,&lt;/li&gt;
	&lt;li&gt;all statement modifiers (&lt;code&gt;if&lt;/code&gt; etc. at the end of a statement).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of these make code easy to follow and can, when chosen wisely, even yield Ruby code which almost reads like English.  There are really only two things that can make them hard to follow:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Not indenting code properly.&lt;/li&gt;
	&lt;li&gt;Putting too many lines of code between different keywords belonging to the same construct.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As long as you follow these basic rules, the reader of your code will be able to follow the flow easily.&lt;/p&gt;
&lt;h3&gt;level-1&lt;/h3&gt;
&lt;p&gt;In this category we have&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;method invocation,&lt;/li&gt;
	&lt;li&gt;method return.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that I did not place the explicit &lt;code&gt;return&lt;/code&gt; statement in this category; we will see it in &lt;em&gt;level-1n&lt;/em&gt; below.&lt;/p&gt;
&lt;p&gt;The readability of a method invocation depends mostly on how well its name conveys what the method does.  A Ruby programmer who reads &lt;code&gt;x.to_s&lt;/code&gt; immediately knows that this expression returns a string representation of &lt;code&gt;x&lt;/code&gt;.  (Strictly speaking there is of course no guarantee that this method actually returns a String instance but for all practical purposes we can safely assume this.)&lt;/p&gt;
&lt;p&gt;For unknown methods the name is crucial for our understanding (or at least a rough idea) of what happens during the method call.  This is important because only with this understanding can we continue reading and understand what the current method does.  This shows how maintainability depends not only on proper modularization but also on well-chosen method names.  In fact it might be more important to get a method&amp;#8217;s name right than to have documentation which covers all aspects of the method&amp;#8217;s semantics.  Don&amp;#8217;t get me wrong, I do not want you to neglect your documentation!  But a properly chosen name goes a long way in telling the reader of the code that &lt;em&gt;uses&lt;/em&gt; this method what it does.&lt;/p&gt;
&lt;h3&gt;level-n&lt;/h3&gt;
&lt;p&gt;Again, there are two contenders:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;code&gt;raise ... rescue&lt;/code&gt;,&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;catch ... throw&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Although these two may seem to differ only in syntax at first sight, there are fundamental differences.  I believe that these are ultimately the reason why we see exceptions quite frequently while we rarely discover a &lt;code&gt;catch&lt;/code&gt; statement in code.  Let&amp;#8217;s first look at exceptions:&lt;/p&gt;
&lt;p&gt;The power of exceptions comes from decoupling the signalling of an error from its handling.  The meaning of the error condition is encoded in the exception class (via inheritance and documentation).  We know that &lt;code&gt;Errno::ENOENT&lt;/code&gt; denotes a nonexistent file and we can write an exception handler for this.  When we &lt;code&gt;raise&lt;/code&gt; this exception we do not know how many stack frames upwards there will be a handler for it, and we do not need to.  If there is none, the interpreter will ultimately exit with an error message and a non-zero exit code.&lt;/p&gt;
&lt;p&gt;Contrast this with &lt;code&gt;catch ... throw&lt;/code&gt; &amp;#8212; everything seems to be the opposite here:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;This combination is for &lt;em&gt;regular&lt;/em&gt; control flow and not for dealing with &lt;em&gt;error&lt;/em&gt; situations.&lt;/li&gt;
	&lt;li&gt;You&amp;#8217;ll first see the &lt;code&gt;catch&lt;/code&gt; and then &lt;code&gt;throw&lt;/code&gt;.&lt;/li&gt;
	&lt;li&gt;There is strong coupling between the &lt;code&gt;catch&lt;/code&gt; location and the &lt;code&gt;throw&lt;/code&gt; location in code via the symbol used; and both statements can be in different methods altogether.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now, if you indeed place &lt;code&gt;catch&lt;/code&gt; and &lt;code&gt;throw&lt;/code&gt; in different methods you have established a strong link between the two: in order for the program to work properly both need to use the same symbol and both must be aware of the returned object (if any) so the result of &lt;code&gt;catch&lt;/code&gt; can be processed in a meaningful way.  You also need to take care with the last statement in the block attached to the &lt;code&gt;catch&lt;/code&gt;: if nothing is thrown, its value becomes the result of &lt;code&gt;catch&lt;/code&gt; and may accidentally be interpreted as a thrown value.&lt;/p&gt;
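&lt;p&gt;A minimal sketch of this coupling, assuming two hypothetical methods &lt;code&gt;scan&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt; which are not part of any library: they only work together because both agree on the symbol &lt;code&gt;:found&lt;/code&gt; and on what the thrown object means.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical search split over two methods that are coupled through
# the symbol :found and the meaning of the thrown value.
def scan(items, wanted)
  items.each { |item| throw :found, item if item == wanted }
end

def search(items, wanted)
  catch(:found) do
    scan(items, wanted)
    nil # without this line the block would return the result of scan
  end
end

p search([1, 2, 3], 2) # prints 2
p search([1, 2, 3], 9) # prints nil
&lt;/pre&gt;
&lt;p&gt;Without the explicit &lt;code&gt;nil&lt;/code&gt; an unsuccessful search would return the whole array, because that is what &lt;code&gt;each&lt;/code&gt; evaluates to, and the caller could mistake it for a hit.&lt;/p&gt;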
&lt;p&gt;It&amp;#8217;s fairly easy to complicate things even more by placing &lt;code&gt;catch&lt;/code&gt; and &lt;code&gt;throw&lt;/code&gt; methods in different classes or by nesting two or more &lt;code&gt;catch&lt;/code&gt; constructs.  It&amp;#8217;s fairly safe to say that these obfuscating effects are best avoided by using &lt;code&gt;catch ... throw&lt;/code&gt; in a single method only &amp;#8212; and in that case there are other control flow constructs that we can usually use.  In fact I am still searching for more convincing examples of &lt;code&gt;catch ... throw&lt;/code&gt; usage; so far the best contender has been a jump out of multiple nested loops.  Although, to me the &amp;#8220;multiple nested loops&amp;#8221; item has a slight &lt;a href="http://c2.com/xp/CodeSmell.html"&gt;code smell&lt;/a&gt; of its own.  But, read on&amp;#8230;&lt;/p&gt;
&lt;p&gt;&lt;ins&gt;&lt;em&gt;Addition:&lt;/em&gt;&lt;/ins&gt; Actually, you can &lt;a href="http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/branches/ruby_1_9_1/lib/find.rb?view=markup"&gt;find&lt;/a&gt; a good application of &lt;code&gt;catch ... throw&lt;/code&gt; in Ruby&amp;#8217;s standard library.  In this case it is elegant, and &lt;code&gt;catch ... throw&lt;/code&gt; has the advantage of not interfering with exception handling.  Assume the block passed to &lt;code&gt;Find.find&lt;/code&gt; contains more complex code with a &lt;code&gt;rescue&lt;/code&gt; of its own: if the library used exceptions behind the scenes, that &lt;code&gt;rescue&lt;/code&gt; might catch them, with surprising effects.&lt;/p&gt;
&lt;h3&gt;level-1n&lt;/h3&gt;
&lt;p&gt;You might be curious why I came up with this category.  Let&amp;#8217;s first look at an example: assume there is a method that yields all Ruby files which are found below any number of directories and we use that to write another method which returns the first of those files that we own:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
def find_my_ruby_file(*dirs)
  find_ruby_files *dirs do |f|
    return f if File.stat(f).owned?
  end
  nil
end
&lt;/pre&gt;
&lt;p&gt;Now, if &lt;code&gt;return&lt;/code&gt; is executed, &lt;code&gt;find_my_ruby_file&lt;/code&gt; exits and returns &lt;code&gt;f&lt;/code&gt;: we go up one stack frame.  Visually this is true, but in reality, when &lt;code&gt;return&lt;/code&gt; is executed, the stack is several levels deeper than it appears to be.  You can easily check that by inserting something like &lt;code&gt;puts caller(0)&lt;/code&gt; before the &lt;code&gt;return&lt;/code&gt;.  Here&amp;#8217;s a complete version that you can use to experiment:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
require 'find'

def find_ruby_files(*dirs)
  Find.find *dirs do |file|
    yield file if test ?f, file and /\.rb\z/i =~ file
  end
ensure
  puts 'exit find_ruby_files'
end

def find_my_ruby_file(*dirs)
  find_ruby_files *dirs do |f|
    if File.stat(f).owned?
      puts 'found!', 'stack &amp;lt;&amp;lt;&amp;lt;', caller(0), '&amp;gt;&amp;gt;&amp;gt; stack'
      return f
    end
  end
  nil
ensure
  puts 'exit find_my_ruby_file'
end

p find_my_ruby_file(*ARGV)
&lt;/pre&gt;
&lt;p&gt;You&amp;#8217;ll find some surprising entries there and as a side effect I now know one more example where &lt;code&gt;catch ... throw&lt;/code&gt; does seem like the best tool for the job&amp;#8230;&lt;/p&gt;
&lt;p&gt;This property of being able to exit more stack frames than visible at first sight is shared by &lt;code&gt;break&lt;/code&gt;, which can be used to exit multiple stack levels in a similar way.  It won&amp;#8217;t leave the current method (i.e. the method in which it is &lt;a href="http://en.wikipedia.org/wiki/Lexical_scope#Static_scoping_.28also_known_as_lexical_scoping.29"&gt;lexically scoped&lt;/a&gt;) but via invoked methods with blocks the unwinding can include any number of stack frames.  This reinforces what we have said &lt;a href="002_Writing_Block_Methods.html"&gt;before&lt;/a&gt; since we now clearly see that there are more ways that a block of code can be left &amp;#8220;not normally&amp;#8221; besides &lt;code&gt;raise&lt;/code&gt; and &lt;code&gt;throw&lt;/code&gt;.&lt;/p&gt;
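&lt;p&gt;To make the behavior of &lt;code&gt;break&lt;/code&gt; visible, here is a small sketch.  The methods &lt;code&gt;each_number&lt;/code&gt; and &lt;code&gt;first_even&lt;/code&gt; are made up for illustration; the &lt;code&gt;ensure&lt;/code&gt; section only reports when the frame of the yielding method is left.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical block method: yields the numbers 1 to 3 and reports
# when its own stack frame is left.
def each_number
  [1, 2, 3].each { |n| yield n }
ensure
  puts "leaving each_number"
end

def first_even
  # break ends the each_number call early and supplies its result
  result = each_number { |n| break n if n.even? }
  puts "still inside first_even"
  result
end

p first_even
&lt;/pre&gt;
&lt;p&gt;This prints &amp;#8220;leaving each_number&amp;#8221;, then &amp;#8220;still inside first_even&amp;#8221; and finally 2: the &lt;code&gt;break&lt;/code&gt; unwinds the frames of the inner &lt;code&gt;each&lt;/code&gt; and of &lt;code&gt;each_number&lt;/code&gt; but execution continues in the method where the block was written.&lt;/p&gt;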
&lt;h3&gt;What do we learn?&lt;/h3&gt;
&lt;p&gt;Looking at control flow constructs from an execution stack perspective has provided some interesting insights (at least to me).  While general rules for readability (proper indentation, limit the amount someone has to read etc.) do apply here as well, some of those constructs can have surprising effects on program behavior and should be used carefully.  This is especially true for the &amp;#8220;far reaching&amp;#8221; constructs which can actually make the stack unwind multiple levels &amp;#8212; or even change completely as in the case of &lt;code&gt;callcc&lt;/code&gt;.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Tue, 19 May 2009 20:46:00 -0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/004-Control_Flow.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/004-Control_Flow.html</guid></item><item><title>The Universe between begin and end</title><description>&lt;p&gt;This time we&amp;#8217;ll explore the space between &lt;code&gt;begin&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt;.  Today&amp;#8217;s article won&amp;#8217;t be as much about individual best practices but rather I will try to explore various aspects of &lt;code&gt;begin ... end&lt;/code&gt; blocks so you can decide how to make best use of this tool.  In fact, there are so many aspects that this construct sometimes reminds me of a swiss army knife &amp;#8211; not so much because you can do everything with it but rather because it has so many features that you can use.  When it comes to control structuring elements &lt;code&gt;begin ... end&lt;/code&gt; is probably the most complex thing found in Ruby.&lt;/p&gt;
&lt;p&gt;One final introductory remark: although I am pretty sure that Ruby hasn&amp;#8217;t changed between 1.8 and 1.9 with regard to &lt;code&gt;begin ... end&lt;/code&gt; I did my tests with 1.9.1 only.  So if you find something to be wrong, please comment!&lt;/p&gt;
&lt;h3&gt;Starting the Investigation&lt;/h3&gt;
&lt;p&gt;For easier reference here&amp;#8217;s a block which contains all the options:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
begin
  # do our work
rescue
  # standard oops!
rescue SomeException =&amp;gt; e
  # oops!
rescue Exception =&amp;gt; e
  # deal with other errors
else
  # good, no exception surfaced!
ensure
  # good or bad, this needs to be done
end
&lt;/pre&gt;
&lt;p&gt;Note that you can replace &lt;code&gt;begin&lt;/code&gt; with &lt;code&gt;def meth...&lt;/code&gt; to define a method with &amp;#8220;integrated&amp;#8221; exception handling.  That way you can avoid one level of nesting.&lt;/p&gt;
&lt;p&gt;There are various interesting aspects to each section:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;control flow (most particularly, how is the section left?),&lt;/li&gt;
	&lt;li&gt;result (if any),&lt;/li&gt;
	&lt;li&gt;documentation (what does it tell me that code is in a particular section?).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We&amp;#8217;ll keep these in the back of our heads when exploring section after section.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;begin&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This is the main section where you place the code that does the work you want to get done.  If evaluation reaches the end of this section normally the result of the last expression evaluated is also the result of this section.  In the absence of &lt;code&gt;rescue&lt;/code&gt; or &lt;code&gt;else&lt;/code&gt; that value is propagated to the surrounding context.&lt;/p&gt;
&lt;p&gt;If we look at control flow things start to get a bit more involved.  First, you should note that there are these additional ways this section (actually &lt;em&gt;any&lt;/em&gt; section of code) can be terminated:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;code&gt;return&lt;/code&gt; is executed,&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;break&lt;/code&gt; is executed inside a loop or a &lt;code&gt;do&lt;/code&gt; block,&lt;/li&gt;
	&lt;li&gt;an exception is triggered via &lt;code&gt;raise&lt;/code&gt;,&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;throw&lt;/code&gt; is invoked.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;How many of those did you think of?  (Did I miss another one?)  All these have one thing in common: the end of the section is not reached normally and the code produces a different result.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;rescue&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This can have an optional &lt;code&gt;ExceptionType =&amp;gt; variable&lt;/code&gt; in which case exceptions of this class and its subclasses are caught (unless a preceding &lt;code&gt;rescue&lt;/code&gt; clause already matches them).  If that optional part is missing only &lt;code&gt;StandardError&lt;/code&gt; and its subclasses are caught.&lt;/p&gt;
&lt;p&gt;If there are multiple &lt;code&gt;rescue&lt;/code&gt; clauses, order matters.  You must rescue the most specific errors first and less specific errors later because otherwise a superclass &lt;code&gt;rescue&lt;/code&gt; clause will shadow a subclass clause which comes later.  Only one rescue clause of a group is ever executed.&lt;/p&gt;
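&lt;p&gt;The shadowing effect can be sketched with two hypothetical methods which raise the same error but order their &lt;code&gt;rescue&lt;/code&gt; clauses differently:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical demo: the same error handled by two differently
# ordered groups of rescue clauses.
def superclass_first
  raise ArgumentError, "bad argument"
rescue StandardError
  "generic handler"
rescue ArgumentError # never reached: ArgumentError is a StandardError
  "specific handler"
end

def subclass_first
  raise ArgumentError, "bad argument"
rescue ArgumentError
  "specific handler"
rescue StandardError
  "generic handler"
end

puts superclass_first # prints generic handler
puts subclass_first   # prints specific handler
&lt;/pre&gt;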
&lt;p&gt;In case a rescue clause is executed the result of the whole block is that of the code in the rescue clause.  In other words: when catching exceptions the exception code replaces the block&amp;#8217;s result.  Unless, that is, you invoke &lt;code&gt;raise&lt;/code&gt; or &lt;code&gt;retry&lt;/code&gt;.&lt;/p&gt;
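&lt;p&gt;Here is a sketch of both outcomes, assuming a made-up method &lt;code&gt;flaky&lt;/code&gt; which fails on its first two attempts.  With &lt;code&gt;retry&lt;/code&gt; the main section runs again; without it the value of the &lt;code&gt;rescue&lt;/code&gt; clause becomes the result.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical operation that fails on its first two attempts.
def flaky(attempt)
  raise IOError, "temporary failure" if [1, 2].include?(attempt)
  "ok after #{attempt} attempts"
end

def fetch_with_retry(max_attempts)
  attempt = 0
  begin
    attempt += 1
    flaky(attempt)
  rescue IOError
    retry unless attempt == max_attempts # runs the begin section again
    "gave up after #{attempt} attempts"  # becomes the result of the block
  end
end

p fetch_with_retry(5) # prints "ok after 3 attempts"
p fetch_with_retry(2) # prints "gave up after 2 attempts"
&lt;/pre&gt;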
&lt;p&gt;The tricky part about &lt;code&gt;rescue&lt;/code&gt; is which exceptions to catch.  If you cannot handle an exception, you should not catch it because otherwise you will prevent other code that is capable of handling it from doing so.  Actually, as long as you do not &lt;code&gt;raise&lt;/code&gt; again, nobody outside will notice that there was an error in the first place.  A special case is &lt;code&gt;raise&lt;/code&gt; without arguments, which re-raises the exception currently being handled: sometimes it is reasonable to catch all exceptions, log the event and rethrow:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
def f
  do_work
rescue Exception =&amp;gt; e
  log.error "There was an error: #{e.message}"
  raise
end
&lt;/pre&gt;
&lt;h3&gt;&lt;code&gt;else&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This section makes sense only if there is a &lt;code&gt;rescue&lt;/code&gt; as well.  (You&amp;#8217;ll get a warning if you use it without one.)  If the main section completes regularly and an &lt;code&gt;else&lt;/code&gt; section is present it is executed.&lt;/p&gt;
&lt;p&gt;Now, some of the &lt;a href="http://blog.rubybestpractices.com/posts/rklemme/002_Writing_Block_Methods.html#dsq-cite-8500023"&gt;discussion&lt;/a&gt; revolved around the utility of this section and whether code placed here is equivalent to code placed in other places.  While at first sight it may seem that you can just place it at the end of the main section, a closer look reveals some subtleties, some of which have been mentioned already in the discussion.  I&amp;#8217;ll list them here anyway for completeness:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;If the code raises an exception, it might be rescued when placed in the main section but it won&amp;#8217;t be rescued when placed in &lt;code&gt;else&lt;/code&gt; section.  This might also lead to unnecessary retries if there is a &lt;code&gt;retry&lt;/code&gt; in the &lt;code&gt;rescue&lt;/code&gt; clause.&lt;/li&gt;
	&lt;li&gt;Placing the code after &lt;code&gt;end&lt;/code&gt; generally has a similar effect as putting it in an &lt;code&gt;else&lt;/code&gt; section but if there is an &lt;code&gt;ensure&lt;/code&gt; section order of execution between the two code bits is reversed.  This can make a serious difference if the &lt;code&gt;else&lt;/code&gt; code uses a resource which is cleaned up in &lt;code&gt;ensure&lt;/code&gt;.&lt;/li&gt;
	&lt;li&gt;When using the version of &lt;code&gt;begin ... end&lt;/code&gt; in a method definition there is no place &amp;#8220;after end&amp;#8221; which is part of the method invocation:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre name="code" class="ruby"&gt;
def do_the_work
  some_code_which_may_throw
rescue ArgumentError
  $stderr.puts "Ooops! Passed the wrong argument."
else
  puts "Job done."
end
&lt;/pre&gt;
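&lt;p&gt;The difference in ordering between &lt;code&gt;else&lt;/code&gt; and code placed after &lt;code&gt;end&lt;/code&gt; can be checked with a small sketch.  The method name &lt;code&gt;order_of_execution&lt;/code&gt; is made up; it merely records which section runs when no exception occurs.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical demo: records the order in which the sections run.
def order_of_execution
  log = []
  begin
    log.push "main"
  rescue StandardError
    log.push "rescue"
  else
    log.push "else"
  ensure
    log.push "ensure"
  end
  log.push "after end"
  log
end

p order_of_execution # prints ["main", "else", "ensure", "after end"]
&lt;/pre&gt;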
&lt;h3&gt;&lt;code&gt;ensure&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This section is the last one before the final &lt;code&gt;end&lt;/code&gt;.  Code in this section will be executed regardless of how the main section is left, i.e. even in case of &lt;code&gt;raise&lt;/code&gt;, &lt;code&gt;throw&lt;/code&gt;, &lt;code&gt;return&lt;/code&gt; and &lt;code&gt;break&lt;/code&gt;!  The result of the section is ignored unless you choose to explicitly &lt;code&gt;return&lt;/code&gt; it or raise an exception.  While raising exceptions might be reasonable in some cases, it should generally be avoided because those exceptions will shadow errors coming from the main section or from &lt;code&gt;rescue&lt;/code&gt; clauses:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):007:0&amp;gt; def f
irb(main):008:1&amp;gt; raise "Foo"
irb(main):009:1&amp;gt; ensure
irb(main):010:1* raise "Bar"
irb(main):011:1&amp;gt; end
=&amp;gt; nil
irb(main):012:0&amp;gt; f
RuntimeError: Bar
        from (irb):10:in `ensure in f'
        from (irb):10:in `f'
        from (irb):12
        from /usr/local/bin/irb19:12:in `&amp;lt;main&amp;gt;'
irb(main):013:0&amp;gt;
&lt;/pre&gt;
&lt;p&gt;Another thing you should definitely avoid is a &lt;code&gt;return&lt;/code&gt; statement in &lt;code&gt;ensure&lt;/code&gt; because this will shadow the result from the &amp;#8220;business logic&amp;#8221; code in the main section:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
irb(main):013:0&amp;gt; def f
irb(main):014:1&amp;gt; 1
irb(main):015:1&amp;gt; ensure
irb(main):016:1* 2
irb(main):017:1&amp;gt; end
=&amp;gt; nil
irb(main):018:0&amp;gt; f
=&amp;gt; 1
irb(main):019:0&amp;gt; def g
irb(main):020:1&amp;gt; 1
irb(main):021:1&amp;gt; ensure
irb(main):022:1* return 2
irb(main):023:1&amp;gt; end
=&amp;gt; nil
irb(main):024:0&amp;gt; g
=&amp;gt; 2
irb(main):025:0&amp;gt;
&lt;/pre&gt;
&lt;p&gt;There&amp;#8217;s a reason why the result of executing the &lt;code&gt;ensure&lt;/code&gt; section is ignored: this code has nothing to do with the &amp;#8220;business logic&amp;#8221; which should go into the main section but is solely for cleaning up.  Assume you open a file with &lt;code&gt;io = File.open&lt;/code&gt; (which I hope you won&amp;#8217;t do after reading a &lt;a href="001-Using_blocks_for_Robustness.html"&gt;previous post&lt;/a&gt;), have search code in the main section and place &lt;code&gt;io.close&lt;/code&gt; in an &lt;code&gt;ensure&lt;/code&gt; section.  Then you would not want to shadow the result of the search by the result from the close operation.&lt;/p&gt;
&lt;p&gt;Although ehsanul &lt;a href="http://blog.rubybestpractices.com/posts/rklemme/002_Writing_Block_Methods.html#dsq-cite-8562863"&gt;questioned the utility&lt;/a&gt; of &lt;code&gt;ensure&lt;/code&gt; I certainly use it more often than &lt;code&gt;else&lt;/code&gt;.  While &lt;code&gt;else&lt;/code&gt; section code can be put somewhere else in some cases, there is no other place of code which can easily replace an &lt;code&gt;ensure&lt;/code&gt; clause and guarantee the same robustness of the code at the same time.  This is nicely demonstrated by his suggested alternative which uses only &lt;code&gt;rescue&lt;/code&gt; without &lt;code&gt;Exception&lt;/code&gt; &amp;#8212; this does not catch all exceptions!  &amp;#8220;Robustness&amp;#8221; in this case refers not only to runtime robustness but also robustness of the code against maintenance (i.e. changes).&lt;/p&gt;
&lt;p&gt;Let&amp;#8217;s assume you have caught all exceptions via &lt;code&gt;rescue&lt;/code&gt; clauses which do not raise and placed your cleanup code after &lt;code&gt;end&lt;/code&gt; as it was suggested.  Code works as expected and everything is fine.  You might have to use a local variable to make sure the proper value is returned from your method but this is just a nuisance.  Now each of these changes will put execution of your cleanup code at risk:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;A &lt;code&gt;rescue&lt;/code&gt; clause is removed.&lt;/li&gt;
	&lt;li&gt;A previously uncaught exception type is thrown from the main section (the change need not be in your piece of code).&lt;/li&gt;
	&lt;li&gt;A &lt;code&gt;rescue&lt;/code&gt; clause is changed to raise an exception.&lt;/li&gt;
	&lt;li&gt;An &lt;code&gt;else&lt;/code&gt; section is introduced and code might throw.&lt;/li&gt;
	&lt;li&gt;Code in &lt;code&gt;else&lt;/code&gt; section is changed to raise an exception.&lt;/li&gt;
&lt;/ul&gt;
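&lt;p&gt;The second item of this list can be sketched as follows.  &lt;code&gt;Resource&lt;/code&gt; and both helper methods are hypothetical; the block raises an &lt;code&gt;IOError&lt;/code&gt; that none of the &lt;code&gt;rescue&lt;/code&gt; clauses anticipated.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical resource that only remembers whether it was closed.
class Resource
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

# Cleanup placed after "end": skipped when an unexpected error escapes.
def cleanup_after_end(res)
  begin
    yield
  rescue ArgumentError
    nil
  end
  res.close
end

# Cleanup placed in "ensure": runs no matter how the block is left.
def cleanup_in_ensure(res)
  yield
rescue ArgumentError
  nil
ensure
  res.close
end

a = Resource.new
b = Resource.new
begin
  cleanup_after_end(a) { raise IOError, "disk full" }
rescue IOError
  # the IOError passed straight through cleanup_after_end
end
begin
  cleanup_in_ensure(b) { raise IOError, "disk full" }
rescue IOError
  # same error, but b was closed on the way out
end
p a.closed # prints false
p b.closed # prints true
&lt;/pre&gt;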
&lt;p&gt;And lastly, do not underestimate readability!  Placing code in an &lt;code&gt;ensure&lt;/code&gt; section tells the reader immediately that he can ignore it when trying to understand what the main purpose of the code is.  This is &amp;#8220;only&amp;#8221; cleanup which makes sure some resources that were used for the calculation are not kept longer than needed.  Whereas, if you place that code after &lt;code&gt;end&lt;/code&gt; it could belong to the normal flow of the core business logic.&lt;/p&gt;
&lt;h3&gt;Guidelines&lt;/h3&gt;
&lt;p&gt;So after all the detail here are some rules which I hope will guide you in making the most appropriate use of &lt;code&gt;begin ... end&lt;/code&gt; blocks.&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;If you have cleanup code that must be executed under all circumstances, put it in &lt;code&gt;ensure&lt;/code&gt;.&lt;/li&gt;
	&lt;li&gt;Do not place &lt;code&gt;return&lt;/code&gt; or &lt;code&gt;break&lt;/code&gt; in &lt;code&gt;ensure&lt;/code&gt; sections and try to avoid throwing exceptions from them.&lt;/li&gt;
	&lt;li&gt;Place &lt;code&gt;rescue&lt;/code&gt; clauses for more specific exceptions (sub classes) before those for less specific ones (super classes).&lt;/li&gt;
	&lt;li&gt;&lt;code&gt;rescue&lt;/code&gt; the most specific exception you can handle.&lt;/li&gt;
	&lt;li&gt;Do not &lt;code&gt;rescue&lt;/code&gt; exceptions that you cannot or do not want to handle.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Keep in mind that these are just guidelines and you have to check on a case by case basis which is the most appropriate solution.  As Buddha says: verify the teaching with your own mind.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Fri, 01 May 2009 10:05:00 -0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/003-The_Universe_between_begin_and_end.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/003-The_Universe_between_begin_and_end.html</guid></item><item><title>Writing Block Methods with automatic Resource Cleanup</title><description>&lt;p&gt;After we have seen how &lt;tt&gt;File.open&lt;/tt&gt; with a block is safer than without we will look into how such methods are created today.&lt;/p&gt;
&lt;h3&gt;Ingredients&lt;/h3&gt;
&lt;p&gt;We need two ingredients:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;&lt;tt&gt;yield&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;A &lt;tt&gt;begin &amp;#8230; end&lt;/tt&gt; block with &lt;tt&gt;ensure&lt;/tt&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;tt&gt;yield&lt;/tt&gt; is a keyword and invokes the block which is passed to a method. If the caller did not provide a block he&amp;#8217;ll earn a &lt;tt&gt;LocalJumpError&lt;/tt&gt;. Result of evaluating &lt;tt&gt;yield&lt;/tt&gt; is the value of the block (remember, we also called them &amp;#8220;anonymous functions&amp;#8221;).&lt;/p&gt;
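&lt;p&gt;A minimal sketch of both properties, using a made-up method &lt;tt&gt;doubled&lt;/tt&gt;: the value of &lt;tt&gt;yield&lt;/tt&gt; is whatever the block returns, and calling the method without a block raises &lt;tt&gt;LocalJumpError&lt;/tt&gt;.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical method: passes 21 to the block and doubles whatever
# the block returns.
def doubled
  yield(21) * 2
end

p(doubled { |n| n + 1 }) # prints 44

begin
  doubled # no block given
rescue LocalJumpError
  puts "no block given"
end
&lt;/pre&gt;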
&lt;p&gt;As you will probably know &lt;tt&gt;begin &amp;#8230; end&lt;/tt&gt; blocks can be used to catch exceptions and handle them properly. But this construct has two more features apart from &lt;tt&gt;rescue&lt;/tt&gt;; a full blown block might look like this:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
begin
  # do our work
rescue SomeException =&amp;gt; e
  # oops!
rescue Exception =&amp;gt; e
  # deal with other errors
else
  # good, no exception surfaced!
ensure
  # good or bad, this needs to be done
end
&lt;/pre&gt;
&lt;p&gt;Code in an &lt;tt&gt;else&lt;/tt&gt; clause after &lt;tt&gt;rescue&lt;/tt&gt; is executed when the block is left normally, i.e. without an exception being thrown. Note that in case of an exception it is irrelevant whether it is caught in this &lt;tt&gt;begin &amp;#8230; end&lt;/tt&gt; block or not: &lt;tt&gt;else&lt;/tt&gt; will not be executed. Code after &lt;tt&gt;ensure&lt;/tt&gt; is executed under all circumstances, regardless of whether an exception is thrown or not. This is the feature that we&amp;#8217;ll exploit for our cleanup.&lt;/p&gt;
&lt;p&gt;An important thing to know is that the result of the &amp;#8220;ensure&amp;#8221; code does not affect the block&amp;#8217;s result which is normally the value of the last expression evaluated between &lt;tt&gt;begin&lt;/tt&gt; and the first &lt;tt&gt;rescue&lt;/tt&gt;. So anything the cleanup code returns is invisible to the caller (which makes perfect sense if you think about it). Results of &lt;tt&gt;rescue&lt;/tt&gt; and &lt;tt&gt;else&lt;/tt&gt; clauses &lt;em&gt;are&lt;/em&gt; retained when they are executed.&lt;/p&gt;
&lt;h3&gt;Cooking&lt;/h3&gt;
&lt;p&gt;Assume we have a class that distributes stdin to a set of files much the same as the &lt;a href="http://en.wikipedia.org/wiki/Tee_(command)"&gt;&amp;#8216;tee&amp;#8217; command line utility&lt;/a&gt; does. This class features a method &lt;tt&gt;open&lt;/tt&gt; which opens files and another method &lt;tt&gt;close&lt;/tt&gt; which closes all file descriptors. Then we can create a class method similar to &lt;tt&gt;File.open()&lt;/tt&gt; which does the automatic cleanup via an &lt;tt&gt;ensure&lt;/tt&gt; section:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
class Tee
  def self.open(file_names, mode = "w")
    tee = new(file_names, mode)
    tee.open

    if block_given?
      begin
        yield tee
      ensure
        tee.close
      end
    else
      tee
    end
  end
end
&lt;/pre&gt;
&lt;p&gt;This code works similarly to &lt;tt&gt;File.open&lt;/tt&gt; because it acts differently depending on the presence of a block:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;If there is no block, then the &amp;#8220;tee&amp;#8221; is created and opened. It is the return value of the method.&lt;/li&gt;
	&lt;li&gt;If there is a block, then that is called with the opened &amp;#8220;tee&amp;#8221; instance and after termination this is closed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&amp;#8217;s really all there is to it. A final remark: in order to ensure that this pattern works properly you need to make sure the initialization is done &lt;em&gt;before&lt;/em&gt; the &lt;tt&gt;begin&lt;/tt&gt;. Otherwise any exceptions thrown during initialization will trigger the cleanup code which then will act on an incomplete or completely different object altogether (chances are that you will get &lt;tt&gt;NoMethodError&lt;/tt&gt; from &lt;tt&gt;nil&lt;/tt&gt; in this case).&lt;/p&gt;
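&lt;p&gt;The following sketch illustrates that remark.  &lt;tt&gt;Connection&lt;/tt&gt; and both wrapper methods are hypothetical stand-ins; the constructor fails when no host is given.&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
# Hypothetical resource whose constructor can fail.
class Connection
  def initialize(host)
    raise ArgumentError, "host missing" if host.nil?
    @host = host
  end

  def close
    puts "closing #{@host}"
  end
end

# Wrong: the constructor runs inside begin, so ensure also runs when
# it fails, and conn is still nil at that point.
def with_connection_wrong(host)
  begin
    conn = Connection.new(host)
    yield conn
  ensure
    conn.close
  end
end

# Right: only a fully initialized object is guarded by ensure.
def with_connection(host)
  conn = Connection.new(host)
  begin
    yield conn
  ensure
    conn.close
  end
end

begin
  with_connection_wrong(nil) { |c| c }
rescue NoMethodError
  puts "NoMethodError from nil replaced the ArgumentError"
end

begin
  with_connection(nil) { |c| c }
rescue ArgumentError
  puts "ArgumentError reached the caller"
end
&lt;/pre&gt;
&lt;p&gt;In the first variant the real cause of the failure is lost because the error raised in the &lt;tt&gt;ensure&lt;/tt&gt; section takes its place.&lt;/p&gt;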
&lt;p&gt;You can also look at the &lt;a href="http://gist.github.com/98693"&gt;full code&lt;/a&gt; if you want to see the complete story.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Mon, 20 Apr 2009 19:47:00 -0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/002_Writing_Block_Methods.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/002_Writing_Block_Methods.html</guid></item><item><title>Using Blocks for Robustness</title><description>&lt;p&gt;Ruby&amp;#8217;s blocks can be used for many purposes &amp;#8212; in fact, they might well be the most used feature of the language. Today we will start looking at a frequently-used idiom, analyze its implications for robustness and demonstrate how blocks can greatly improve it.&lt;/p&gt;
&lt;h3&gt;Robustness&lt;/h3&gt;
&lt;p&gt;When talking about robustness of software&lt;sup class="footnote"&gt;&lt;a href="#fn1"&gt;1&lt;/a&gt;&lt;/sup&gt; we mean the degree to which it is able to function properly under changed circumstances.  Those changes could be internal (i.e. introduced by a code change) or external (e.g. by changed input data).  Sometimes achieving robustness is a hard task especially for complex software like operating systems.  In this article we&amp;#8217;ll look at robustness at a smaller scale.&lt;/p&gt;
&lt;h3&gt;A common file handling idiom&lt;/h3&gt;
&lt;p&gt;When you read through ruby-talk you will find code that looks like &lt;a href="http://groups.google.com/group/comp.lang.ruby/msg/168977b073d02ad7"&gt;this&lt;/a&gt; (I just did some minimal adjustments to formatting and corrected obvious errors):&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  # read file into an array
  f = File.open("mylogfile.log")
  saved = File.open("archive.log", "a")
  results = f.readlines
  # Is there a way to remove everything that's been read here?
  f.close

  results.each do |sending|
    conn_to_db.print("data #{sending}")
    # Make sure we keep a backup
    saved.puts("#{sending}")
  end

  # Close the archive file
  saved.close
&lt;/pre&gt;
&lt;p&gt;The logic is straightforward: a file is opened, all its lines are read and then it is closed again. Similarly, the second file is opened for appending, then written to and finally closed. Sure, the second file could be opened later, but this is not what I want to draw your attention to.&lt;/p&gt;
&lt;p&gt;Rather, please notice that &lt;tt&gt;File.open&lt;/tt&gt; is used together with an explicit &lt;tt&gt;close&lt;/tt&gt;. Closing the file IO object ensures that all pending output is written to the file and operating system resources associated with the underlying file handle are freed. While not closing a read only file &lt;em&gt;usually&lt;/em&gt; does not have dramatic consequences, not closing a file opened for writing likely &lt;em&gt;has&lt;/em&gt; dramatic consequences. You can see it by running this bit of code through Ruby 1.8.*:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  NAME = "foo"

  f1 = File.open NAME, "w"
  f1.puts "text 1"
  # forgot to close f1
  f2 = File.open NAME, "a"
  f2.puts "text 2"
  # forgot to close f2
&lt;/pre&gt;
&lt;p&gt;On my system (CentOS 5.3, Ruby 1.8.5) I see only the first line in the file; the second &lt;tt&gt;puts&lt;/tt&gt; has no effect.  Even if you add an &lt;tt&gt;f2.close&lt;/tt&gt; at the end of the script you won&amp;#8217;t see any difference.  You need to add an &lt;tt&gt;f1.close&lt;/tt&gt; before you open &lt;tt&gt;f2&lt;/tt&gt; to actually see two lines in the output file.&lt;/p&gt;
&lt;h3&gt;File.close is the solution, or is it?&lt;/h3&gt;
&lt;p&gt;So we could just remember to always add an &lt;tt&gt;f.close&lt;/tt&gt; whenever we have opened a file and be done with it.  Everything written to the IO object will reach the file and we do not need to worry about anything any more.  Or do we?  Unfortunately, this is not the case.  Especially in situations like the one shown above, where exceptions can happen (we do not know what &lt;tt&gt;conn_to_db.print&lt;/tt&gt; does, but it looks like another IO operation which could fail), we need to take additional measures to &lt;em&gt;ensure&lt;/em&gt; the file is really closed.&lt;/p&gt;
&lt;p&gt;We can use &lt;tt&gt;begin &amp;#8230; ensure &amp;#8230; end&lt;/tt&gt; to make sure file descriptors are closed properly even when an exception is raised.  But this tends to get verbose.  Fortunately there is another option: &lt;tt&gt;File.open&lt;/tt&gt; accepts a block and ensures that the file descriptor is closed regardless of how the block is left.&lt;/p&gt;
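&lt;p&gt;To make the comparison concrete, here is a small sketch that first closes a file by hand inside an &lt;tt&gt;ensure&lt;/tt&gt; clause and then relies on the block form of &lt;tt&gt;File.open&lt;/tt&gt;.  The file name, the &lt;tt&gt;leaked&lt;/tt&gt; variable and the simulated failure are assumptions made up for this illustration.&lt;/p&gt;

```ruby
require "tmpdir"

# Hypothetical scratch file, used only for this illustration.
path = File.join(Dir.mktmpdir, "example.log")

# Verbose variant: the ensure clause runs whether or not the body raises.
f = File.open(path, "w")
begin
  f.puts "first entry"
ensure
  f.close
end

# Block variant: File.open closes the file when the block is left,
# even if it is left via an exception.
leaked = nil
begin
  File.open(path, "a") do |io|
    leaked = io # keep a reference so we can inspect it afterwards
    io.puts "second entry"
    raise IOError, "simulated failure"
  end
rescue IOError
  # The simulated error is swallowed here so the sketch keeps running.
end

puts leaked.closed?
puts File.read(path)
```

&lt;p&gt;The block variant needs no explicit &lt;tt&gt;close&lt;/tt&gt; at all, which is why it is usually the more convenient choice.&lt;/p&gt;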
&lt;h3&gt;Large files&lt;/h3&gt;
&lt;p&gt;When dealing with files as shown in the code example above, there is another potential source of trouble: if files can grow arbitrarily large, reading them as a whole burns a lot of memory and may even terminate the program because not enough memory is available.&lt;/p&gt;
&lt;p&gt;For reading files there is an even better alternative: &lt;tt&gt;File.foreach&lt;/tt&gt; will read a file line by line (or using a different separator given as second argument) and hand each string to the block that is provided.  This saves an additional &lt;tt&gt;io.each {|line| &amp;#8230;}&lt;/tt&gt; inside the &lt;tt&gt;File.open&lt;/tt&gt; block: less typing and one less level of indentation.  As an added benefit, we can process large files efficiently because an iterative solution does not have to hold the complete file in memory.&lt;/p&gt;
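&lt;p&gt;A minimal sketch of both call styles follows.  The sample file and its contents are invented for the illustration; only &lt;tt&gt;File.foreach&lt;/tt&gt; itself comes from the text above.&lt;/p&gt;

```ruby
require "tmpdir"

# Hypothetical sample file, created only for this illustration.
path = File.join(Dir.mktmpdir, "sample.txt")
File.open(path, "w") { |io| io.print "a,b\nc,d\n" }

# Default: one string per line; only the current line is held in memory.
line_count = 0
File.foreach(path) { |_line| line_count += 1 }

# A second argument replaces the newline as the record separator.
fields = []
File.foreach(path, ",") { |field| fields.push(field) }

puts line_count
p fields
```

&lt;p&gt;Note that each string handed to the block still carries its trailing separator.&lt;/p&gt;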
&lt;h3&gt;Example rewritten&lt;/h3&gt;
&lt;p&gt;Equipped with these tools, we are now ready to rewrite the original code into a more robust version:&lt;/p&gt;
&lt;pre name="code" class="ruby"&gt;
  # read file and write to archive
  File.open("archive.log", "a") do |saved|
    File.foreach("mylogfile.log") do |sending|
      conn_to_db.print("data #{sending}")
      # Make sure we keep a backup
      saved.puts(sending) # removed superfluous string interpolation
    end
  end
&lt;/pre&gt;
&lt;p&gt;Note that if the timing of the original script is critical, we might want to use &lt;tt&gt;File.readlines&lt;/tt&gt; in order to read in all lines of &amp;#8220;mylogfile.log&amp;#8221; before starting to write to &amp;#8220;archive.log&amp;#8221;, as the original code does.&lt;/p&gt;
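&lt;p&gt;Such a variant might look roughly like the sketch below.  Since the real &lt;tt&gt;conn_to_db&lt;/tt&gt; is not shown here, a &lt;tt&gt;StringIO&lt;/tt&gt; stands in for it, and the two files live in a scratch directory; both are assumptions made only to keep the example self-contained.&lt;/p&gt;

```ruby
require "stringio"
require "tmpdir"

# Stand-ins for illustration only: the real conn_to_db is unknown,
# and the files are created in a scratch directory.
conn_to_db = StringIO.new
dir = Dir.mktmpdir
log = File.join(dir, "mylogfile.log")
archive = File.join(dir, "archive.log")
File.open(log, "w") { |io| io.puts "line 1", "line 2" }

# Read everything first, as the original code does ...
lines = File.readlines(log)

# ... and only then open the archive and process the lines.
File.open(archive, "a") do |saved|
  lines.each do |sending|
    conn_to_db.print("data #{sending}")
    # Make sure we keep a backup
    saved.puts(sending)
  end
end

puts File.read(archive)
```

&lt;p&gt;This trades the memory savings of &lt;tt&gt;File.foreach&lt;/tt&gt; for the original ordering of reads and writes.&lt;/p&gt;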
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Using Ruby&amp;#8217;s blocks goes a long way toward making programs and scripts that deal with file IO more robust with regard to&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;exceptions occurring somewhere along the way,&lt;/li&gt;
	&lt;li&gt;large files.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, whenever you do file IO in a Ruby program, remember you can use&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;the block form of &lt;tt&gt;File.open&lt;/tt&gt; or&lt;/li&gt;
	&lt;li&gt;&lt;tt&gt;File.foreach&lt;/tt&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;to make your code more robust without paying a significant price.&lt;/p&gt;
&lt;p&gt;You can even resort to &lt;tt&gt;File.readlines&lt;/tt&gt; if you have to read the whole file into memory and at least save a bit of typing.  If you make it a habit to always use these idioms, chances are that you&amp;#8217;ll save yourself a lot of bug hunting related to file IO in the future.&lt;/p&gt;
&lt;p&gt;Next time I will look into how to create a method like &lt;tt&gt;File.open&lt;/tt&gt; which uses a block to ensure automatic resource deallocation.&lt;/p&gt;
&lt;p class="footnote" id="fn1"&gt;&lt;sup&gt;1&lt;/sup&gt; Here is a &lt;a href="http://www.linfo.org/robust.html"&gt;definition of robust&lt;/a&gt; and a &lt;a href="http://en.wikipedia.org/wiki/Robustness_Principle"&gt;Wikipedia article about the Robustness Principle&lt;/a&gt;.&lt;/p&gt;</description><author>shortcutter@googlemail.com (Robert Klemme)</author><pubDate>Thu, 09 Apr 2009 14:25:00 -0000</pubDate><link>http://blog.rubybestpractices.com/posts/rklemme/001-Using_blocks_for_Robustness.html</link><guid>http://blog.rubybestpractices.com/posts/rklemme/001-Using_blocks_for_Robustness.html</guid></item></channel></rss>