Tuesday, July 9, 2013

Immutable Java: Lists and Other Collections

Update 2017-09-14

While the techniques in this article still work, I prefer to use Paguro. Here's how:

List

public static final List<String> UNSAFE_ZONES =
        vec("Africa/Cairo",
            "Africa/Johannesburg",
            "America/Anchorage");

Set

private static final Set<String> validModes =
        set(MODE_PREADD, MODE_ADD,
            MODE_PREUPDATE, MODE_UPDATE)));

Map

private static final Map<String,String> shortNameHash =
        map(tup(SERVER_NAME_DEMO, "demo"),
            tup(SERVER_NAME_UAT, "uat"),
            tup(SERVER_NAME_INTEGRATION, "integration"),
            tup(SERVER_NAME_DEV, "dev"));

Paguro

The above have many advantages over the original suggestions in this article:

  • Brevity and Clarity - no extra junk, just declare your collection.
  • No static initializer blocks for maps. If you've ever created a dependency loop inside these blocks you'll appreciate this.
  • Immutability - These Paguro collections are just as safe as Collections.unmodifiables. You can't modify them in place, but you can modify them, producing an entirely new collection with items added, removed, or changed. Java's unmodifiable collections require copying the entire collection in order to change it - an O(n) operation. Paguro collections are designed to copy-on-write with maximum sharing between versions of a collection, so that modifications are O(log n) (usually with a high base, often approaching O(1)). Paguro also provides mutable builders to create immutable collections even faster (if you care about that).

Original Post

Lists

What happens to the UNSAFE_ZONES list in the following code when it is passed to doSomething()?

public static final List<String> UNSAFE_ZONES = Arrays.asList(
        "Africa/Cairo",
        "Africa/Johannesburg",
        "America/Anchorage",
        ...);

displayZones(UNSAFE_ZONES);

A method named displayZones(List zones) sounds pretty safe and it probably won't change the contents of the list you pass it. But unless you dig into that method very carefully, you can't know that for sure. Even if displayZones() is very simple and safe today, someone could use it to wrap third-party logic tomorrow, or make it pass your UNSAFE_ZONES to another method that changes it.

Maybe everything works perfectly, but you don't sleep well at night because someone could, at any time, intentionally or accidentally change your list. Maybe it's already happening somewhere, but it's caused by such a rare data condition that you just haven't noticed yet. Maybe you sleep well, but keeping track of every place that everything could possibly be modified in your application is using valuable thought-space, distracting you from more interesting problems.

Time zones don't change very often. This list probably only needs to be updated once or twice a year. If you make releases more often than that, there is no reason that a list like this needs to change *ever* except at compile time, or maybe when it is read from a file on application start-up. Any attempt by your program to change it on the fly is an error. So make it immutable.

Here is an immutable List in Java 5 or later.

Immutable List

public static final List<String> TIME_ZONES =
        Collections.unmodifiableList(Arrays.asList(
                "Africa/Cairo",
                "Africa/Johannesburg",
                "America/Anchorage",
                ...));

People always bring up concurrency with relation to immutable data structures, and immutability is a godsend with regard to concurrency. You can pass immutable objects and collections around freely between any number of threads without any locking, synchronization, defensive copies, or contention which is a big win both in terms of performance and simplicity. But on a day-to-day level, even with a single thread, the huge benefit to this style of coding is that you aren't left wondering if something changed your collection, because that is just not possible.

Imagine now that you are the author of displayZones() and that you want to communicate to people who use your API that displayZones() will never modify the list that it is sent. One way to do this is to write a comment in the JavaDoc like, "I promise never to modify your list inside displayZones()." For this to be effective, people have to 1. read the JavaDoc and 2. believe you. If I wrote that, would you believe me? Heck, I wouldn't believe myself if I wrote it last month!

It would be fantastic if the compiler would throw a nice fat warning if someone ever updated displayZones() to modify the list it was sent? Unfortunately (thank you alexandroid) even iterator has a remove() method and even in Java 8, streams have a toIterator() method, so there is no safe interface to pass to a function.

// Unsafe - iterator has the remove() method
public void displayZones(Iterable zones) { ...

If Iterable provided only read-only methods for traversing it in order, the caller would could pass a modifiable list without worrying about you changing it, because you could't. No defensive copies, no worries. They could pass any Collection, either mutable or immutable and it wouldn't matter. Alas, this it not the case.

There is still one benefit to passing an immutable collection. If you ever change displayZones to take a mutable List and the caller is already passing a mutable List, they won't get a compiler warning. Won't they be surprised when their list is modified! Immutability here future-proofs the caller's code. If the caller is passing a mutable list though, maybe it means they don't care if it is modified? In any case, unnecessary use of mutable data is a dangerous trap.

Other Collections

List is far from the only type of collection. Let's take a moment to look at what it's like to create an immutable Set or Map in Java (I think it works in Java 5 and 6 if you just fill in the empty <>s). If anyone knows a briefer/better way to do this, please leave a comment:

Immutable Set

private static final Set<String> validModes = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(MODE_PREADD, MODE_ADD,
                                    MODE_PREUPDATE, MODE_UPDATE)));

EnumSet is preferred for enums because it executes faster and has a briefer syntax. EnumSets are ordered according to their "natural ordering" which is the order in which the enum constants are declared.

private static final Set<Mode> validModes = Collections.unmodifiableSet(
        EnumSet.of(Mode.PREADD, Mode.ADD, Mode.PREUPDATE, Mode.UPDATE));

If you are using all values of an enum in order:

private static final Set<Mode> validModes = Collections.unmodifiableSet(
        EnumSet.allOf(Mode.class));

Immutable Map

private static final Map<String,String> shortNameHash;
static {
    Map<String,String> m = new HashMap<>();
    m.put(SERVER_NAME_DEMO, "demo");
    m.put(SERVER_NAME_UAT, "uat");
    m.put(SERVER_NAME_INTEGRATION, "integration");
    m.put(SERVER_NAME_DEV, "dev");
    shortNameHash = Collections.unmodifiableMap(m);
}

It is critical in the Map example that the temporary Map m be scoped inside a dedicated block so that it passes out of scope and can never be accessed by anything after the immutable version of it has been created. Map m remains mutable forever, and a lexical closure (block) is the simplest way to keep any user accessible code from maintaining a direct reference to it.

Type Casting

As I've said before, it can be a pain to cast a collection in Java, especially compared to Scala. Unlike (invariant) generic collections, you can painlessly cast an array to its super-type. No need to suppress any unchecked or rawtypes warnings because this is just how arrays "work" (they are covariant). Fortunately, we happen to start with an array in most of the above examples. The example below shows an enum that implements an interface and provides an immutable list of its members (with the type of the interface):

public enum TimeFrame implements DropDownItemInterface {
    ...
    public static final List<DropDownItemInterface> ddiVals =
            Collections.unmodifiableList(Arrays.asList(
                (DropDownItemInterface[]) values()));

If you try casting the resulting list instead, you will see why I bothered to point this out (and why so many people prefer dynamic languages).

Effectiveness

Up to this point, Java is a little wordy, but effective. Where it really breaks down is that List is the only collection which extends an immutable interface. If you had a getShortName(Map m) method, there is no effective way to tell the caller that this method cannot modify the map you pass it. Google Guava falls short here too because its ImmutableMap data structure inherits from Map instead of the other way around. This needs to be fixed at the Java API level, or else, people need to start importing from some new collections API instead of java.util.

Scala and Clojure both make all their collections immutable by default. The mutable version of each type of collection is a sub-class of the immutable one. In either language, you could say getShortName(ImmutableMap m) (or similar) and have the benefits I outlined above. Java could do this too, and I feel very strongly that they should.

The reason why Scala and Clojure collections can be immutable by default is that they are implemented to allow very lightweight copies to be made very quickly. The immutable collections in these languages still have add() and put() methods on them (or equivalent). They just return an entirely new collection which includes the modification of the old one. In a hash-table based collection, only the hash bucket which is changed even needs to be copied over to the new collection. The other buckets can be shared because the collection is immutable!

Java could allow the same kind of fast, shallow copies of collections, but it has the ball and chain of 15-year-old add() and put() methods that return a boolean value instead of the underlying collection. Because this pollutes the namespace for those methods, new method names (like append() and prepend()) would have to be made for all modification operations so that they could return an immutable modified copy of the original collection. The old methods could be deprecated over time. This could lead to some short-term confusion, with people wondering why their immutable collection wasn't changed by an append() call, but I personally believe that it would be worth it in the long run.

Sometimes you need mutable data structures and by all means, use them. But when you don't, prefer immutability and you'll sleep better, think clearer, and write more robust code.

3 comments:

alexandroid said...

Since Iterable provides only read-only methods for traversing it in order, the caller knows that even if they pass you a modifiable list, that they don't need to worry about you changing it, because you can't. No defensive copies, no worries. They can pass any Collection, either mutable or immutable and it won't matter.

What about Iterator.remove()?

Unknown said...

Doh! You are right! Thank you for reading and sharing your thoughts. I'll update appropriately.

Anonymous said...

The problem with unmodifiable list of the jdk is, that it is not immutable, because it does not make a copy of the original list on construction. Further it breaks the api of List by eventually throwing UnsupportedOperationExceptions.