USP: IO#dup and the dup(2) system call

2011-10-19 04:59, written by Eric Wong

IO#dup vs. Object#dup

IO#dup is Object#dup in Ruby: it creates a shallow copy of an existing object. To create a shallow copy, the IO#initialize_copy callback method performs the dup(2) syscall on the underlying file descriptor the IO object wraps.

Like Object#dup in Ruby, dup(2) is a shallow clone that does not copy the underlying open file object in the kernel, but creates a new reference to an existing kernel object.

Thus, two (or more) file descriptors in the same process can refer to the same open file in the kernel.

Before calling IO#dup, we have a 1:1:1 relationship:

  • one Ruby IO object
  • one file descriptor
  • one open file object in the kernel

    [Ruby]    user space   |  kernel space
    ------------------------------------------------
                           |
    io_orig ----------- fd[orig] ----> file object
                           |
    ------------------------------------------------
    (file descriptors (fd) are the bridge between kernel and user space)

After we call IO#dup, we have a 2:2:1 relationship:

  • two Ruby IO objects
  • two file descriptors
  • one file object in the kernel

    [Ruby]    user space   |  kernel space
    ------------------------------------------------
                           |
    io_orig ----------- fd[orig] -\
                           |       >---> file object
    io_copy ----------- fd[copy] -/
                           |
    ------------------------------------------------
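
The 2:2:1 relationship can be observed from Ruby. The following is a minimal sketch, assuming a scratch file created with Tempfile (the "dup-demo" name is arbitrary); the variable names mirror the diagram. Note that matching inode numbers are consistent with a shared open file but do not prove it by themselves, since two independent opens of the same path would match too.

```ruby
require "tempfile"

tmp = Tempfile.new("dup-demo")        # scratch file, name is arbitrary
io_orig = File.open(tmp.path, "r")
io_copy = io_orig.dup                 # dup(2) happens in IO#initialize_copy

# Two Ruby objects wrapping two different file descriptors...
two_objects = !io_orig.equal?(io_copy)
two_fds     = io_orig.fileno != io_copy.fileno

# ...that refer to the same file on disk.
same_inode  = io_orig.stat.ino == io_copy.stat.ino

puts "distinct IO objects: #{two_objects}"
puts "distinct descriptors: #{two_fds}"
puts "same inode: #{same_inode}"

# Closing the copy only drops its own descriptor; the original stays open.
io_copy.close
orig_still_open = !io_orig.closed?
puts "original still open: #{orig_still_open}"

io_orig.close
tmp.close!
```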

IO#dup can be called on the same IO object any number of times, so there may be an N:N:1 relationship as long as the process (and system) resource limits are not exceeded.

Most kernel-level (but not user space) changes to one IO object are immediately visible in the IO object(s) it was copied from (or copied to).
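
As an illustration of that distinction, here is a small sketch using the file offset as the kernel-level state and the IO#sync flag as the user space state. It assumes a regular file (created here with Tempfile) and uses the unbuffered sysseek/sysread calls so that Ruby's own read buffer does not get in the way.

```ruby
require "tempfile"

tmp = Tempfile.new("dup-offset")      # scratch file, name is arbitrary
tmp.write("hello world")
tmp.flush

io_orig = File.open(tmp.path, "r")
io_copy = io_orig.dup

# Kernel-level state: the file offset lives in the shared open file
# object, so a seek through one descriptor moves the other as well.
io_orig.sysseek(6)
offset_seen_by_copy = io_copy.sysseek(0, IO::SEEK_CUR)
data_from_copy = io_copy.sysread(5)

# User space state: the sync flag belongs to each Ruby IO object,
# so changing it on the original does not affect the copy.
io_orig.sync = true
copy_sync = io_copy.sync

puts "offset seen by copy: #{offset_seen_by_copy}"
puts "data read via copy: #{data_from_copy.inspect}"
puts "copy sync flag: #{copy_sync}"

io_orig.close
io_copy.close
tmp.close!
```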

Effect on IPC

IO#dup means IO#close / close(2) will only remove a reference to the file object in the kernel. Only when the last file descriptor for a given file object is closed is the actual file object closed and released in the kernel.

For applications relying on receiving an end-of-file condition (from a socket or pipe), IO#dup [1] can (sometimes inadvertently) prevent the end-of-file condition from being reached in the reader.

1 – and similar functions, like fork()
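
The effect on end-of-file is easy to reproduce with a pipe. This is a sketch only: the at_eof helper is a hypothetical name introduced for the example, and it uses read_nonblock so the demonstration cannot hang while a writer is still open.

```ruby
r, w = IO.pipe
w_copy = w.dup            # second descriptor for the same write end

w.write("ping")
w.close                   # drops one reference; the write end stays open

first_read = r.read_nonblock(16)

# Hypothetical helper: true only when the reader actually sees EOF.
at_eof = lambda do |io|
  begin
    io.read_nonblock(1)
    false
  rescue IO::WaitReadable
    false                 # no data yet, but a writer still exists
  rescue EOFError
    true                  # every descriptor for the write end is closed
  end
end

eof_while_copy_open = at_eof.call(r)

w_copy.close              # last reference to the write end goes away
eof_after_last_close = at_eof.call(r)

puts "first read: #{first_read.inspect}"
puts "EOF while copy open: #{eof_while_copy_open}"
puts "EOF after last close: #{eof_after_last_close}"

r.close
```

A blocking r.read in place of the helper would wait indefinitely while w_copy is open, which is the usual way this problem shows up in practice.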

License: GPLv3 (or later, at the discretion of Eric Wong)