Wednesday, October 21, 2009

Answer to: A Symbolic Puzzler

This blog post contains the results and answer to the previous post, A Symbolic Puzzler.

The answer will be covered here, and I'll follow this up with a fourth post that covers my thoughts on this issue.

What were your guesses?



Clearly, the most popular answer was that the test would fail.

What would have been my guess?

Look, nobody codes in puzzler fashion, so without details I'll explain what I expected to occur from my own production code, but in terms of this test. I wouldn't be confident that TESTDIR_SYMLINK.getCanonicalPath() returned the non-canonicalized location, but excepting that, I certainly would assume that once the symbolic link was created at the end, the second call to TESTDIR_SYMLINK.getCanonicalPath()would return the symlinked directory. So my guess would have been a. It passes.

The Answer

The correct answer is: d. It depends. More specifically, it depends on the VM's arguments.

The Explanation

If you ran this test straight up without any special VM arguments, the test would fail (which might lead you to think the answer is b. It fails.)
junit.framework.ComparisonFailure: expected:</testdir/[file]> but was:</testdir/[symlink]>
    at junit.framework.Assert.assertEquals(Assert.java:81)
    at junit.framework.Assert.assertEquals(Assert.java:87)
    at ATest.testSymlink(ATest.java:26)
    ...
Why wouldn't the canonicalization return the updated value? It's because the return value from getCanonicalPath was cached from the previous call. Yes, calls to getCanonicalPath are cached.

Let's look at the code underneath getCanonicalPath. The magic lies in some package-private classes in the java.io package, specfically Filesystem and UnixFilesystem. The key operation occurs in UnixFilesystem.canonicalize:

class UnixFileSystem extends FileSystem {
  public String canonicalize(String path) throws IOException {
     if (!useCanonCaches) {
       return canonicalize0(path);
     } else {
       String res = cache.get(path);
       ... 
     }
  }
  private native String canonicalize0(String path) throws IOException;
}

In other words, the path canonicalization computations are cached when useCanonCaches is true. So just when is useCanonCaches true? For that let's look at the static initialization block for Filesystem, the subclass of UnixFilesystem:

// Flags for enabling/disabling performance optimizations for file
// name canonicalization
static boolean useCanonCaches      = true;
static boolean useCanonPrefixCache = true;
... 

static {
    useCanonCaches      = getBooleanProperty("sun.io.useCanonCaches",
                                             useCanonCaches);
    useCanonPrefixCache = getBooleanProperty("sun.io.useCanonPrefixCache",
                                             useCanonPrefixCache);
} 

So by default, the cache canonicalization is on, but when you specify the VM arg -Dsun.io.useCanonCaches=false, the cache is never used.

Getting back to the puzzler, the first call to TESTDIR_SYMLINK.getCanonicalPath() always returns the path to the symlink, while the second call returns either the cached value (the path to the symlink) or the up-to-date resolved symlink, but only when -Dsun.io.useCanonCaches=false is specified.

If you don't beleive me now, go try running the test twice, once without specifying VM arguments, and once while disabling the canonicalization cache, and you'll see that it fails once, and passes another time. Hence, d. It depends.

    Replies to some of the comments

    Here are some of the comments that accompanied the survey:
    Guess: It fails.
    Comment:The target of the symlink doesn't exist so I suspect exists() will return false
    In fact, exists() will return true since the symlink exists. And in fact, the next call to getCanonicalPath returns itself, just as the comment suggested.
    Guess: It depends.
    Comment: Depending on where the root filesystem is mounted, the canonical path might be something else. For example, /etc on Mac is a symlink to /private/etc. In addition, the mountpoint for / might be a networked drive (e.g. netboot) which might have different semantics.

    The bottom line is that you can't necessarily assume that the file you use to access a file is the canonical path for that file, without knowing the filesystem.
    This is an interesting comment. I had hoped this was cleared up by this statement in the original post: "The path /testdir is not a symbolic link to another directory, and the user running the test also owns /testdir." If my explanation was insufficient, then so be it.

    There were other comments to that effect: "Does /bin/rm do what /bin/rm should do?" "Does the user have write permissions?" These are good points, and would be better suited for a UNIX puzzler. I hope people who worried about those cases looked past them to focus on Java's behavior.
    Guess: It throws an exception
    Comment: I want be the first one to check "it throws an exception"!
    Sorry, not the first.

    Meta: Thoughts on writing a puzzler

    There are at least two places where the puzzler's code could have been simplified without sacrificing its quality:
    1. Replace explicit calls that delete the files /testdir/file and /testdir/symlink, with a communicated precondition that /testdir had no files.
    2. This test has an interim call that computes the canonical path of a symlink that points to nothing, leading people to spend too much time worrying about that case. A better snippet would:
      1. point the symlink to a file that exists.
      2. compute the symlink's canonical location thereby (optionally) populating the internal cache.
      3. point the symlink to a second file that also exists.

    No comments: