Monday, October 28, 2013

Expression Problem: Discussion

logaan was kind enough to leave some comments on twitter regarding my recent post on Refactoring in Java, Scala, and Clojure. I needed more than 140 characters so I am responding here.

Discussion

Glen replied: (Untyped) Clojure wraps the function with converters instead of wrapping a user-defined data type. Functional view of same issue?

logaan replied: My understanding is that the expression problem is linked to polymorphism. Clojure's solution is protocols. Protocols allow you to add more types to existing behaviour and more behaviour to existing types. In Scala it needs implicits.

Reply

Thank you Logaan for your insightful comments. I really appreciate you taking the time to read my article and offer constructive criticism. Forming this response was very educational for me.

Chris Houser's Solution to the Expression Problem on InfoQ was a very interesting talk and, I assume, the basis for your criticism. It turns out that there was a debate on Lambda the Ultimate about the nature of The Expression Problem in response to Chouser's presentation.

Since Clojure rejects type safety as a design goal, applying Clojure to the Expression problem seemed somewhat of a stretch to me. On the other hand, Martin Odersky's paper points out:

The term expression problem was originally coined by Phil Wadler in a post on the Java-Genericity mailing list, in which he also proposed a solution written in an extended version of Generic Java. Only later it appeared that Wadler’s solution could not be typed.

If Philip Wadler's solution to his 1998 problem was not type-safe, that made me think the door was open to applying this problem to a dynamic language like Clojure.

In terms of needing Scala's implicits to solve the Expression problem, there may be some aspect to this that I was not understanding. It was actually Odersky's paper that made me think traits were the solution to this problem. I am impressed with how well they solve it. Especially compared to all the typing Java requires when I have 8 implementing classes of an interface that needs to change!

Ultimately, the exact definition of Wadler's original problem is much less interesting to me than solving or side-stepping this general type of problem. In response to your insightful criticism, I have renamed my post, "Refactoring in 3 Languages" and provided an alternate Clojure solution below. Thank you again for taking the time to respond.

Alternate Solution

The Clojure page on protocols actually mentions the Expression Problem. But protocols look like more complication than my little example requires. I can imagine how more complicated "Expression-Like Problems" are well served by records and protocols, yet I feel that Clojure's sweetest spot involves side-stepping these issues when practical by ignoring data specification as much as possible (by leveraging maps).

Here is an alternate "solution" to my earlier post using records and protocols:

(defrecord YearMonth [year month])

(defprotocol MONTH_ADDIBLE 
  (addMonths [MONTH_ADDIBLE mos]))

(extend-type YearMonth
  MONTH_ADDIBLE
  (addMonths [ym, addedMonths]
      (let [newMonth (+ (:month ym) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :year (+ (:year ym) (quot m 12)),
                                   :month (+ (rem m 12) 1)))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ (:year ym) (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :year y :month m))
                 :else (assoc ym :month newMonth)))))
                 
;; Tests
(addMonths (YearMonth. 2013, 7) 2)
;; #user.YearMonth{:year 2013, :month 9}
(addMonths (YearMonth. 2012, 12) 1)
;; #user.YearMonth{:year 2013, :month 1}
(addMonths (YearMonth. 2013, 1) -1)
;; #user.YearMonth{:year 2012, :month 12}

;; With an additional field
(addMonths (assoc (YearMonth. 2013, 7) :otherField1 "One") 2)
;; #user.YearMonth{:year 2013, :month 9, :otherField1 "One"}
(addMonths (assoc (YearMonth. 2012, 12) :otherField1 "One") 1)
;; #user.YearMonth{:year 2013, :month 1, :otherField1 "One"}
(addMonths (assoc (YearMonth. 2013, 1) :otherField1 "One") -1)
;; #user.YearMonth{:year 2012, :month 12, :otherField1 "One"}

(defrecord YearMonth2 [yyyyMm])

(defn yearAndMonthToYm [year month] (+ (* year 100) month))

(extend-type YearMonth2
  MONTH_ADDIBLE
  (addMonths [ym, addedMonths]
      (let [year (quot (:yyyyMm ym) 100),
            newMonth (+ (rem (:yyyyMm ym) 100) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :yyyyMm (yearAndMonthToYm (+ year (quot m 12)),
                                                             (+ (rem m 12) 1))))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ year (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :yyyyMm (yearAndMonthToYm y, m)))
                 :else (assoc ym :yyyyMm (yearAndMonthToYm year, newMonth))))))

(addMonths (YearMonth2. 201307) 2)
;; #user.YearMonth2{:yyyyMm 201309}

;; With an additional field
(addMonths (assoc (YearMonth2. 201307) :otherField2 1.1) 2)
;; #user.YearMonth2{:yyyyMm 201309, :otherField2 1.1}

Sunday, September 29, 2013

Refactoring in Java, Scala, and Clojure

Update: Attend my presentation on this post at DevNexus February 25th 2014

Motivation

I don't use the words, Strongly-typed or Dynamic much in this post, but thinking about the relative costs and benefits of type safety was the primary inspiration behind it. You may want to keep your own mental tally of how type-safety helps and how it hurts in the following examples. Because type-safety means so many different things in different languages, I can only assess it's practical merits (or detriments) in the context of real-world examples - hence this post. But maybe after looking at these examples, you can infer general principals from them?

I personally find type safety to be useful at work because of the person-years of development that went into the system that I work on most. But I am not trying to convert people to type-safety, or away from it. My goal is to make the issues more visible so that we can all write better code more easily in the future. When I gave this talk to the Asheville Coders League, I felt some measure of satisfaction that one person told me afterwards that they were going to look into Clojure and another that they would look into Scala.

Problem

When I change an interface in Java, I then have to update all its implementations which can be difficult and time consuming. Making a wrapper class is often even more time consuming. This pain in the neck is sometimes called, "The Expression Problem" and it makes a good example for comparing these three languages.

For this comparison, I will use a class that models a Year/Month combination and a function, "addMonths" that takes the number of months to add (positive or negative) and returns a new YearMonth. Originally we used a data structure with 2 fields (year and month), but it complicated the database queries for ranges of months, so we changed it to a single int of the format YyyyMm. Now we can use > and < to compare YyyyMm fields in our SQL queries - and the raw data is still human readable.

Despite wild claims that Object Oriented Programming is all about mutation, I'm going to use immutable classes in all three languages (a few mutable local variables are used, but never exposed outside of the function that declares them).

Full source examples from this article are available on Github both before (xxxx1) and after (xxxx2) refactoring.

Here's the original Java interface, translated into Scala and Clojure:

Original Interface

Java Interface

public interface YearMonthInterface {
    public int getYear();
    public int getMonth();
}

Scala Trait

Scala has traits instead of interfaces. A Scala trait can include implementations (but not constructors) as we'll see later.

trait YearMonthTrait {
  def year:Int
  def month:Int
}

Clojure

No interface is necessary, but the compiler won't tell you if you fail to match your data to your functions. Protocols could be used, but that's probably not typical and it's certainly not needed for this simple example.

Base YearMonth Implementation

A few simple tests should provide the best overview:

Tests of Base Implementation

// Java
YearMonth.addMonths(YearMonth.of(2013, 7), 2);
// 2013-9
YearMonth.addMonths(YearMonth.of(2012, 12), 1);
// 2013-1
YearMonth.addMonths(YearMonth.of(2013, 1), -1);
// 2012-12

// Scala
YearMonth.addMonths(YearMonth(2013, 7), 2)
// YearMonth(2013,9)
YearMonth.addMonths(YearMonth(2012, 12), 1)
// YearMonth(2013,1)
YearMonth.addMonths(YearMonth(2013, 1), -1)
// YearMonth(2012,12)

;; Clojure
(addMonths {:year 2013, :month 7} 2)
;; {:year 2013, :month 9}
(addMonths {:year 2012, :month 12} 1)
;; {:year 2013, :month 1}
(addMonths {:year 2013, :month 1} -1)
;; {:year 2012, :month 12}

Java Class

public final class YearMonth implements YearMonthInterface {
    private final int year;
    private final int month;
    
    private YearMonth(int y, int m) { year = y; month = m; }

    public static YearMonth of(int y, int m) {
      if (m > 12) {
          // convert to zero-based months for math
          m--;
          // Carry any extra months over to the year
          y = y + (m / 12);
          // Adjust month to be within one year
          m = m % 12;
          // convert back to one-based months
          m++;
      } else if (m < 1) {
          // Carry any extra months over to the year, but the first year
          // in this case is still year-1
          y = y + (m / 12) - 1;
          // Adjust negative month to be within one year.
          // To get the positive month, subtract it from 12
          m = 12 + (m % 12);
      }
      return new YearMonth(y, m);
    }
    
    @Override
    public int getYear() { return year; }
    
    @Override
    public int getMonth() { return month; }
    
    public static YearMonth addMonths(YearMonthInterface ym,
                                      int addedMonths) {
        return of(ym.getYear(), ym.getMonth() + addedMonths);
    }
    
    @Override
    public String toString() {
        return new StringBuilder().append(year).append("-")
                                  .append(month).toString();
    }
}

Scala Case Class and Companion Object

A case class in Scala automates writing the similar Java code. Scala does not have static methods. Instead, everything Java would call a "static" method goes in the companion object in Scala. A companion object is a singleton instance with the same name and in the same file as the class it belongs to.

case class YearMonth(override val year:Int,
                     override val month:Int) extends YearMonthTrait

object YearMonth {
  def addMonths(ym:YearMonthTrait, addedMonths:Int):YearMonth = {
    val newMonth = ym.month + addedMonths
    if (newMonth > 12) {
      // convert to zero-based months for math
      val m = newMonth - 1
      // Carry any extra months over to the year
      new YearMonth(ym.year + (m / 12), (m % 12) + 1)
    } else if (newMonth < 1) {
      // Carry any extra months over to the year, but the
      // first year in this case is still year-1
      val y = ym.year + (newMonth / 12) - 1
      // Adjust negative month to be within one year.
      // To get the positive month, subtract it from 12
      val m = 12 + (newMonth % 12)
      new YearMonth(y, m)
    } else {
      new YearMonth(ym.year, newMonth)
    }
  }
}

Clojure Function

Instead of declaring data types as classes, Clojure prefers to use immutable maps (hash maps). So we skip all data definition steps above and write a function that assumes a map with certain keys. These keys are analogous to the fields in Java and Scala.

(defn addMonths [ym, addedMonths]
      (let [newMonth (+ (:month ym) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :year (+ (:year ym) (quot m 12)),
                                   :month (+ (rem m 12) 1)))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ (:year ym) (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :year y :month m))
                 :else (assoc ym :month newMonth))))

Add an Implementing Class

Here we add a second implementing class that contains an additional field.

Test Implementing Class

// Java
YearMonth.addMonths(MonthlyA.of("One", 2013, 7), 2)
// 2013-9

// Scala
YearMonth.addMonths(MonthlyA("One", 2013, 7), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:otherField1 "One", :year 2013, :month 7} 2)
;; {:otherField1 "One", :year 2013, :month 9}

Java

public class MonthlyA implements YearMonthInterface {
    private final  String otherField1;
    private final int year;
    private final int month;

    private MonthlyA(String s, int y, int m) {
        otherField1 = s; year = y; month = m;
    }

    public static MonthlyA of(String s, int y, int m) {
        return new MonthlyA(s, y, m);
    }

    public String getOtherField1() { return otherField1; }

    @Override
    public int getYear() { return year; }

    @Override
    public int getMonth() { return month; }
}

Scala

case class MonthlyA(otherField1:String,
                    override val year:Int,
                    override val month:Int) extends YearMonthTrait

Clojure

No custom data structure is needed because Clojure leverages a Map.


Change Data Representation From Year & Month to YyyyMm

Test

// Java
YearMonth.addMonths(YearMonth.of(201307), 2)
// 2013-9

// Scala
YearMonth.addMonths(YearMonth(201307), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:yyyyMm 201307} 2)
;; {:yyyyMm 201309}

Java

Java requires that you manually update all the old code to be compatible with new data format. While I'm at it, I'm going to add a convenience static factory method to the base implementation that takes the new data format.

public interface YearMonthInterface {

    ... old methods unchanged ...

    // New! @return yyyyMm or YearMonth.of(year, month).getYyyyMm()
    public int getYyyyMm();
}

public class YearMonth implements YearMonthInterface {

    ... old methods unchanged ...

    // New!
    public static YearMonth of(int YyyyMm) {
        return new YearMonth(YyyyMm / 100, YyyyMm % 100);
    }

    // New!
    @Override
    public int getYyyyMm() {
        return (year * 100) + month;
    }
}

public class MonthlyA implements YearMonthInterface {

    ... old methods unchanged ...

    // New!
    @Override
    public int getYyyyMm() {
        return YearMonth.of(year, month).getYyyyMm();
    }
}

Scala

Scala lets you add your implementation logic right in the trait instead of touching any implementing classes, but I'm going to add a new factory method to the base class that accepts the new data format the same way I did in Java.

Additional constructors in Scala are implemented as factory methods in the companion object. apply() is the default name for a method, so you don't need to specify it in your client code. You can use it just like a normal factory/constructor (except for constructor pattern matching) as shown in the "Scala Test" example below.

trait YearMonthTrait {

    ... old methods unchanged ...

  // New!
  def yyyyMm:Int = (year * 100) + month
}

object YearMonth {
  // Add yyyyMm factory method to the YearMonth companion object
  // This is like the extra "of" method we just added to the Java version.
  def apply(yyyyMm:Int) = new YearMonth((yyyyMm / 100), (yyyyMm % 100))

    ... old methods unchanged ...

}

Clojure

I found it easiest/clearest to add two conversion methods to make the Clojure function handle both the new and old data formats.

(defn ymToOld [ym] (dissoc (assoc ym :year (quot (:yyyyMm ym) 100)
                                     :month (rem (:yyyyMm ym) 100))
                           :yyyyMm))

(defn ymToNew [ym] (dissoc (assoc ym :yyyyMm (+ (* (:year ym) 100)
                                                (:month ym)))
                           :year :month))

(defn addMonths [ym, addedMonths]
      (if (contains? ym :yyyyMm)
                     (ymToNew (addMonths (ymToOld ym), addedMonths))
          (let [newMonth (+ (:month ym) addedMonths)]

               ... same code from before ...

Add a New Implementing Class Using the New Data Format

Now that all the old code is working with the new data format, we can add a new class, MonthlyB that makes use of the new format internally.

Test New Class

All the old tests pass, even with the new code. I am only showing a few new tests for brevity.

// Java
YearMonth.addMonths(MonthlyB.of(1.1, 201307), 2);
// 2013-9

// Scala
YearMonth.addMonths(MonthlyB(1.1, 201307), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:otherField2 1.1 :yyyyMm 201307} 2)
;; {:otherField2 1.1, :yyyyMm 201309}

Java

In Java, we have to manually add support for the old data format to the new class.

public class MonthlyB implements YearMonthInterface {
    private final double otherField2;
    private final int yyyyMm;

    private MonthlyB(double d, int yyM) {
        otherField2 = d; yyyyMm = yyM;
    }

    public static MonthlyB of(double d, int yyM) {
        return new MonthlyB(d, yyM);
    }

    public double getOtherField2() { return otherField2; }

    @Override
    public int getYear() { return yyyyMm / 100; }

    @Override
    public int getMonth() { return (yyyyMm % 100); }

    @Override
    public int getYyyyMm() { return yyyyMm; }
}

Scala

In Scala, an adapter trait makes any new classes play nicely with the old code. It only needs to be specified once and can be "mixed in" to as many new classes as necessary.

trait YearMonthNew extends YearMonthTrait {
  def yyyyMm:Int
  def year:Int = yyyyMm / 100
  def month:Int = (yyyyMm % 100)
}

case class MonthlyB(otherField1:Double,
                    override val yyyyMm:Int) extends YearMonthNew

Clojure

Clojure does not specify data types.

Additional Considerations

  • Compiling Scala is slow, taking 2-3x as long as Java. Sbt, on the other hand, is extremely clever, maximizing processor usage and deciding not to compile everything unless it needs to, so compiling Scala with SBT may effectively be faster than compiling Java with Ant.
  • Compiled Clojure code executes about 50% slower than comparable Scala/Java code
  • Both Scala and Clojure require a few small jar files to compile which hold their specific APIs.

Conclusions

All three languages got the job done. All three required specifying the logic for data transformation - the addMonths function. Data structures (user-defined types) showed the biggest difference between the three languages.

In Java, more work is spent defining and updating the types than defining functions that work on them. In Scala, the up-front work of defining types is small and elegant; a minor distraction from the "real work" of transforming that data. In Clojure, all attention is placed on the functions while data structures almost disappear altogether. This is a very beautiful and fast way to code that yields a very simple system, but it lacks the safety guarantee that type-safety gives the other languages. Comprehensive unit test coverage can mitigate this risk, but that is another form of complexity with its own maintenance cost.

In the beginning, Java solved virtually every major issue with C++ and created the JVM which these other two languages are built on. But it's showing its age. I really have trouble finding a situation where Java would win. I suppose if a small jar file size was critical... Really, the biggest advantage Java has over Scala is faster compile times. Maybe if you write in Clojure, you could use small amounts of Java for performance in critical areas? Still, I'd rather use Scala for that than Java.

Both Scala and Clojure seem to eliminate a lot of work that is required in Java, though they take fundamentally different approaches to doing so.

Wednesday, September 18, 2013

Comparing Objects is Relative

This post is part of a series on Comparing Objects.

Equality

For those of us who are implementation minded, that means that boolean equals(Object other) is a flawed API because there is no one definition of equality that will work in every context. In a previous post on Using Java Collections Effectively by Implementing equals() and hashCode(), I quoted Josh Bloch and said that "The behavior of equals(), hashCode(), and compareTo() must be consistent." Java subtly encourages us to define a single definition of equality and comparison per class.

Java programmers have been "ensuring symmetry by controlling only one side of the equation" for years. There's nothing wrong with defining a *default* context for comparison and equality.  Just don't mistake default equality and compareTo() as the only context for comparing objects.

A small potato and a slice of bread might have an equal number of calories (they are equivalent in one context), but the potato takes much more energy to heat up (they are different in another context). So Ideally, we'd like Heating equality and Caloric equality to be different for small potatoes vs. bread.

A context-relative definition of equality might look like this

boolean equals(Object left, Object right)

Hmm... equality and sorting have a good deal of overlap. The Comparator interface already defines something like equality - it returns 0 when things are sorted the same ("equally"), something else otherwise. Better still, any implementation of Comparator provides a context for that comparison.

interface Sugars {
    int gramsSugar();
    public static final Comparator<Sugars> COMPARATOR = (left, right) ->
            left.gramsSugar() - right.gramsSugar();
}

interface Heating {
    int cookingEnergy();
    public static final Comparator<Heating> COMPARATOR = (left, right) ->
            left.cookingEnergy() - right.cookingEnergy();
}

Could we use this for context-relative equality?

The ubiquitous ArrayList, Set, and Map use equals(), hashCode(), and compareTo() are all fixed in their one-sided ways. But the classes implementing SortedSet and SortedMap behave very differently from the other collections when passed a separate Comparator.

Using Context-Relative Equality (in Java)

When you pass a Comparator to TreeSet or TreeMap, you are using context-relative equality. All comparisons for get(), put(), and contains() are made with the Comparator NOT with the methods that the other standard collections use! With a little care you can use this to your advantage:

static class Food implements Heating, Sugars {
    private final int gramsSugar;
    private final int cookingEnergy;
    private Food(int g, int c) { gramsSugar = g; cookingEnergy = c; }
    @Override public int gramsSugar() { return gramsSugar; }
    @Override public int cookingEnergy() { return cookingEnergy; }
    @Override public String toString() {
        return "Food(" + gramsSugar + "," + cookingEnergy + ")";
    }
}

public static void main(String[] args) {
    List<Food> foods = Arrays.asList(new Food(5,3), new Food(4,4));
    SortedSet<Food> foodsByHeat = new TreeSet<>(Heating.COMPARATOR);
    foodsByHeat.addAll(foods);
    System.out.println("Foods by heat:");
    for (Food f : foodsByHeat) { System.out.println(f); }

    SortedSet<Food> foodsBySugar = new TreeSet<>(Sugars.COMPARATOR);
    foodsBySugar.addAll(foods);
    System.out.println("Foods by sugar:");
    for (Food f : foodsBySugar) { System.out.println(f); }
}

Output

Foods by heat:
Food(5,3)
Food(4,4)
Foods by sugar:
Food(4,4)
Food(5,3)

More

Here is the Source for the above.

Here is my earlier SetInterface test.

Tuesday, July 9, 2013

Immutable Java: Lists and Other Collections

Update 2017-09-14

While the techniques in this article still work, I prefer to use Paguro. Here's how:

List

public static final List<String> UNSAFE_ZONES =
        vec("Africa/Cairo",
            "Africa/Johannesburg",
            "America/Anchorage");

Set

private static final Set<String> validModes =
        set(MODE_PREADD, MODE_ADD,
            MODE_PREUPDATE, MODE_UPDATE)));

Map

private static final Map<String,String> shortNameHash =
        map(tup(SERVER_NAME_DEMO, "demo"),
            tup(SERVER_NAME_UAT, "uat"),
            tup(SERVER_NAME_INTEGRATION, "integration"),
            tup(SERVER_NAME_DEV, "dev"));

Paguro

The above have many advantages over the original suggestions in this article:

  • Brevity and Clarity - no extra junk, just declare your collection.
  • No static initializer blocks for maps. If you've ever created a dependency loop inside these blocks you'll appreciate this.
  • Immutability - These Paguro collections are just as safe as Collections.unmodifiables. You can't modify them in place, but you can modify them, producing an entirely new collection with items added, removed, or changed. Java's unmodifiable collections require copying the entire collection in order to change it - an O(n) operation. Paguro collections are designed to copy-on-write with maximum sharing between versions of a collection, so that modifications are O(log n) (usually with a high base, often approaching O(1)). Paguro also provides mutable builders to create immutable collections even faster (if you care about that).

Original Post

Lists

What happens to the UNSAFE_ZONES list in the following code when it is passed to doSomething()?

public static final List<String> UNSAFE_ZONES = Arrays.asList(
        "Africa/Cairo",
        "Africa/Johannesburg",
        "America/Anchorage",
        ...);

displayZones(UNSAFE_ZONES);

A method named displayZones(List zones) sounds pretty safe and it probably won't change the contents of the list you pass it. But unless you dig into that method very carefully, you can't know that for sure. Even if displayZones() is very simple and safe today, someone could use it to wrap third-party logic tomorrow, or make it pass your UNSAFE_ZONES to another method that changes it.

Maybe everything works perfectly, but you don't sleep well at night because someone could, at any time, intentionally or accidentally change your list. Maybe it's already happening somewhere, but it's caused by such a rare data condition that you just haven't noticed yet. Maybe you sleep well, but keeping track of every place that everything could possibly be modified in your application is using valuable thought-space, distracting you from more interesting problems.

Time zones don't change very often. This list probably only needs to be updated once or twice a year. If you make releases more often than that, there is no reason that a list like this needs to change *ever* except at compile time, or maybe when it is read from a file on application start-up. Any attempt by your program to change it on the fly is an error. So make it immutable.

Here is an immutable List in Java 5 or later.

Immutable List

public static final List<String> TIME_ZONES =
        Collections.unmodifiableList(Arrays.asList(
                "Africa/Cairo",
                "Africa/Johannesburg",
                "America/Anchorage",
                ...));

People always bring up concurrency with relation to immutable data structures, and immutability is a godsend with regard to concurrency. You can pass immutable objects and collections around freely between any number of threads without any locking, synchronization, defensive copies, or contention which is a big win both in terms of performance and simplicity. But on a day-to-day level, even with a single thread, the huge benefit to this style of coding is that you aren't left wondering if something changed your collection, because that is just not possible.

Imagine now that you are the author of displayZones() and that you want to communicate to people who use your API that displayZones() will never modify the list that it is sent. One way to do this is to write a comment in the JavaDoc like, "I promise never to modify your list inside displayZones()." For this to be effective, people have to 1. read the JavaDoc and 2. believe you. If I wrote that, would you believe me? Heck, I wouldn't believe myself if I wrote it last month!

It would be fantastic if the compiler would throw a nice fat warning if someone ever updated displayZones() to modify the list it was sent? Unfortunately (thank you alexandroid) even iterator has a remove() method and even in Java 8, streams have a toIterator() method, so there is no safe interface to pass to a function.

// Unsafe - iterator has the remove() method
public void displayZones(Iterable zones) { ...

If Iterable provided only read-only methods for traversing it in order, the caller would could pass a modifiable list without worrying about you changing it, because you could't. No defensive copies, no worries. They could pass any Collection, either mutable or immutable and it wouldn't matter. Alas, this it not the case.

There is still one benefit to passing an immutable collection. If you ever change displayZones to take a mutable List and the caller is already passing a mutable List, they won't get a compiler warning. Won't they be surprised when their list is modified! Immutability here future-proofs the caller's code. If the caller is passing a mutable list though, maybe it means they don't care if it is modified? In any case, unnecessary use of mutable data is a dangerous trap.

Other Collections

List is far from the only type of collection. Let's take a moment to look at what it's like to create an immutable Set or Map in Java (I think it works in Java 5 and 6 if you just fill in the empty <>s). If anyone knows a briefer/better way to do this, please leave a comment:

Immutable Set

private static final Set<String> validModes = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(MODE_PREADD, MODE_ADD,
                                    MODE_PREUPDATE, MODE_UPDATE)));

EnumSet is preferred for enums because it executes faster and has a briefer syntax. EnumSets are ordered according to their "natural ordering" which is the order in which the enum constants are declared.

private static final Set<Mode> validModes = Collections.unmodifiableSet(
        EnumSet.of(Mode.PREADD, Mode.ADD, Mode.PREUPDATE, Mode.UPDATE));

If you are using all values of an enum in order:

private static final Set<Mode> validModes = Collections.unmodifiableSet(
        EnumSet.allOf(Mode.class));

Immutable Map

private static final Map<String,String> shortNameHash;
static {
    Map<String,String> m = new HashMap<>();
    m.put(SERVER_NAME_DEMO, "demo");
    m.put(SERVER_NAME_UAT, "uat");
    m.put(SERVER_NAME_INTEGRATION, "integration");
    m.put(SERVER_NAME_DEV, "dev");
    shortNameHash = Collections.unmodifiableMap(m);
}

It is critical in the Map example that the temporary Map m be scoped inside a dedicated block so that it passes out of scope and can never be accessed by anything after the immutable version of it has been created. Map m remains mutable forever, and a lexical closure (block) is the simplest way to keep any user accessible code from maintaining a direct reference to it.

Type Casting

As I've said before, it can be a pain to cast a collection in Java, especially compared to Scala. Unlike (invariant) generic collections, you can painlessly cast an array to its super-type. No need to suppress any unchecked or rawtypes warnings because this is just how arrays "work" (they are covariant). Fortunately, we happen to start with an array in most of the above examples. The example below shows an enum that implements an interface and provides an immutable list of its members (with the type of the interface):

public enum TimeFrame implements DropDownItemInterface {
    ...
    public static final List<DropDownItemInterface> ddiVals =
            Collections.unmodifiableList(Arrays.asList(
                (DropDownItemInterface[]) values()));

If you try casting the resulting list instead, you will see why I bothered to point this out (and why so many people prefer dynamic languages).

Effectiveness

Up to this point, Java is a little wordy, but effective. Where it really breaks down is that List is the only collection which extends an immutable interface. If you had a getShortName(Map m) method, there is no effective way to tell the caller that this method cannot modify the map you pass it. Google Guava falls short here too because its ImmutableMap data structure inherits from Map instead of the other way around. This needs to be fixed at the Java API level, or else, people need to start importing from some new collections API instead of java.util.

Scala and Clojure both make all their collections immutable by default. The mutable version of each type of collection is a sub-class of the immutable one. In either language, you could say getShortName(ImmutableMap m) (or similar) and have the benefits I outlined above. Java could do this too, and I feel very strongly that they should.

The reason why Scala and Clojure collections can be immutable by default is that they are implemented to allow very lightweight copies to be made very quickly. The immutable collections in these languages still have add() and put() methods on them (or equivalent). They just return an entirely new collection which includes the modification of the old one. In a hash-table based collection, only the hash bucket which is changed even needs to be copied over to the new collection. The other buckets can be shared because the collection is immutable!

Java could allow the same kind of fast, shallow copies of collections, but it has the ball and chain of 15-year-old add() and put() methods that return a boolean value instead of the underlying collection. Because this pollutes the namespace for those methods, new method names (like append() and prepend()) would have to be made for all modification operations so that they could return an immutable modified copy of the original collection. The old methods could be deprecated over time. This could lead to some short-term confusion, with people wondering why their immutable collection wasn't changed by an append() call, but I personally believe that it would be worth it in the long run.

Sometimes you need mutable data structures and by all means, use them. But when you don't, prefer immutability and you'll sleep better, think clearer, and write more robust code.

Monday, July 1, 2013

Using Scala Collections (or Guava) from Java

I've been finding more and more reasons to use Scala instead of Java, but migrating a large existing project (e.g. at work) is hard and takes time. So I've been looking into ways to use more and more Scala-type-thinking in Java. Specifically preferring immutability for simplicity sake (it's not just for multithreading any more).

Here's a quick test-run of creating an immutable map the way I would in Scala, but doing it in Java. I had to use 6 initial elements to make this fair because the Guava Map has a 5-element constructor (Scala has a 4-element constructor). It amuses me that the plain Java solution, the one this is most deeply rooted in imperative programming, is the only one to explicitly rely on a lexical closure to ensure immutability of the underlying elements in the map.

// Scala
val OP_MAP = Map(("+", "Plus"),
                 ("-", "Minus"),
                 ("*", "Times"),
                 ("/", "Divided By"),
                 ("%", "Modulo"),
                 ("^", "To the power of"))

// Google Guava
private static final ImmutableMap<String,String> OP_MAP =
        new ImmutableMap.Builder<String,String>()
                .put("+", "Plus"),
                .put("-", "Minus"),
                .put("*", "Times"),
                .put("/", "Divided By"),
                .put("%", "Modulo"),
                .put("^", "To the power of")
                .build();

// Java
private static final Map<String,String> OP_MAP;
static {
    Map<String,String> m = new HashMap<>();
    m.put("+", "Plus");
    m.put("-", "Minus");
    m.put("*", "Times");
    m.put("/", "Divided By");
    m.put("%", "Modulo");
    m.put("^", "To the power of");
    OP_MAP = Collections.unmodifiableMap(m);
}

// Scala collections from Java (based on decompiling the above Scala example)
@SuppressWarnings({"unchecked", "rawtypes"})
private static final WrappedArray<Tuple2<String,String>> wa =
        Predef.wrapRefArray(
                (Tuple2<String,String>[]) new Tuple2[] {
                        new Tuple2<>("+", "Plus"),
                        new Tuple2<>("-", "Minus"),
                        new Tuple2<>("*", "Times"),
                        new Tuple2<>("/", "Divided By"),
                        new Tuple2<>("%", "Modulo"),
                        new Tuple2<>("^", "To the power of") });

@SuppressWarnings("unchecked")
private static final scala.collection.immutable.Map<String,String> OP_MAP =
        (scala.collection.immutable.Map<String,String>)
                scala.Predef$.MODULE$.Map().apply(wa);

Conclusions

Wow. Using Scala from Java can get ugly! Of course, if I was willing to use a mutable Map, the Scala conversion would be simple and painless. But immutability was my goal.

Guava's Builder makes it slightly more elegant than the plain Java solution. But I often found it useful to have a collection that is the same as another with just one additional element. Unfortunately, there is no put() method on Map that returns a new immutable Map. No union(), intersect(), or difference() set operations. There is no forEach(), flatmap(), foldL(), foldR(), map() or other iterator.

While Guava is the winner at solving this particular problem, it's only a baby-step forward from the straight Java solution. I see that a lot of thought was put into the copyOf() method of Guava collections, but I'd much prefer internal iterators etc. that work together as an immutable API as described above. Let me know what I'm missing in the comments.

Sadly, Java 8 seems to only reinforce the assumption of mutability in their new interfaces. All the new iterators modify the underlying collection instead of returning a shallowly modified copy the way the Scala collections do. I'll probably be sticking with the straight-forward Java solution until I make the move to Scala. I might be tempted to wrap some Scala collections in Java-friendly classes that do all of the above that I mentioned, but of course, that's work...

Thanks to the folks on Stack Overflow for their invaluable help decompiling the Scala code!

Saturday, June 1, 2013

Shortcut for creating a typesfe List or Set in Java

Given a method:
weekPlanner(List<DayOfWeek> dowl)
I've been writing the following code to pass a typesafe list to it:
List<DayOfWeek> favDays = new ArrayList<>();
favDays.add(DayOfWeek.FRIDAY);
favDays.add(DayOfWeek.SATURDAY);
favDays.add(DayOfWeek.SUNDAY);
weekPlanner(favDays);
Today I finally realized that I can do this:
weekPlanner(Arrays.asList(DayOfWeek.FRIDAY,
                          DayOfWeek.SATURDAY,
                          DayOfWeek.SUNDAY));
It even says so in the JavaDoc for Arrays.asList(), but who knew? You can also wrap your new list in Collections.unmodifiableList()! I was very excited about using this to make an ImmutableList class so that I could make a call like the following:
weekPlanner(ImList.of(DayOfWeek.FRIDAY,
                      DayOfWeek.SATURDAY,
                      DayOfWeek.SUNDAY));
But then I realized that the first thing I'd want to do is to declare my weekPlanner method so that the caller would know that the method didn't modify the list or have side-effects. If weekPlanner() only uses sequential access to the list, I could declare it as an Iterable:
weekPlanner(Iterable<DayOfWeek> favDays)
Probably though I should really be using an immutable Set:
weekPlanner(ImSet<DayOfWeek> favDays)

But now I can't pass a mutable Set to it anymore because the java.util.Set interface would have to extend an immutable set to make that possible. This mostly explains why Scala collections do not extend the Java Collections API. Another reason is that the List.add() method in Java (and other similar methods) returns a boolean instead of a new or modified List. Having to avoid these bogus Java methods in Scala would be error prone. But that doesn't explain why the Java API doesn't tack on methods like ImmutableList<E> append(E... args). I guess the remove() method would be difficult to retrofit, but I'd get a lot of mileage from just the append method...

I have loved Java for years and am tremendously appreciative of all the effort that went into making such a great language and making it basically freely available. But smarter people than I must have found the mutable skeletons in Java's closet years ago. Why did their voices go unheard by Sun and later Oracle? What is the down-side of adding a few Immutabile parent interfaces to the Java Collections API interfaces (Collection, List, Map, Set, etc.)?

I'd like to think that Martin Odersky would have argued for immutability before he left Sun and focused instead on making Scala. But Scala was conceived in 2001 and released in 2003. I don't know if immutability was a key feature from the beginning, but that's a long time for Java to ignore it. Maybe someone who knows more about this will enlighten me with a comment...

At least for me, the default assumption of public mutability of objects in Java is my primary motivation for moving to a language like Scala. The newer JVM languages have other great perks, but that's probably the one that will make me take action.

Tuesday, April 23, 2013

Casting a Collection: Java vs. Scala

I'm always stubbing my toes against this in Java - I have a collection of one type that I want to cast to a collection of a super-type (or interface) instead so that I can add more varied things to it.

Java 7:

public class CollectionCastTest {
    class B implements Runnable {
        @Override
        public void run() { ; }
    }

    public void main(String... args) {

        // First we'll try with a list that looks like it's allowed
        // to contain anything between B and Object.
        List<B> bList = new ArrayList<>();
        bList.add(new B());

        // Java error: incompatible types
        // List<Runnable> rList1 = bList;

        // Java error: unchecked cast
        // List<Runnable> rList2 = (List<Runnable>) bList;

        // Works, but has a time cost
        List<Runnable> rList3 = new ArrayList<>();
        rList3.addAll(bList);

        // Just works!
        List<? extends Runnable> rList4 = bList;

        // Interestingly, this doesn't work.
        // Java error: addAll in List cannot be applied to...
        // List<? extends Runnable> rList5 = new ArrayList<>();
        // rList5.addAll(bList);

        // Primitive arrays can cast most things, but using them takes time.
        List<Runnable> rList6 = Arrays.asList(
                bList.toArray(new Runnable[bList.size()]));
        List<? extends Runnable> rList7 = Arrays.asList(
                bList.toArray(new Runnable[bList.size()]));


    }
}

Scala

Scalas collections are covariant, meaning that you can substitute a sub-type and get away with it. It's certainly powerful.

object CollectionCastTest {
  trait A {
    def hello:String
  }

  class B extends A {
    override def hello = "Hello World!"
  }

  def main(args: Array[String]) {
    val bList = List[B](new B())

    // Works
    val aList1:List[A] = bList

    // Works
    val aList2:List[A] = bList match {
      case l:List[A] => l
      case _ => throw new ClassCastException
    }
    
    val aList3:List[A] = List[A]() ++ bList
  }
}

But does Scala know that these lists are the same?

scala> val bList = List(new B())
bList: List[B] = List(B@555301c4)

scala> val aList:List[A] = List(new B())
aList: List[A] = List(B@23014460)

scala> aList == bList
res6: Boolean = false

Whoops - no. This is because it is using referential equality for B. Why? Because I didn't define an equals() method on A or B so it uses the default equals(). Let's use the same B() in both lists and see what happens:

scala> val b = new B()
b: B = B@4ffe5c31

scala> val bList = List(b)
bList: List[B] = List(B@4ffe5c31)

scala> val aList:List[A] = List[A](b)
aList: List[A] = List(B@4ffe5c31)

scala> aList == bList
res0: Boolean = true

Using different objects for B() are possible too, if we make Scala use an equality test based on the fields of the object instead of it's address. We could define equals() methods that inspect the object's fields, but this is the default implementation of equals() for case classes, so we can make this simple change:

-   class B extends A {
+   case class B extends A {

then

scala> val bList = List(new B())
bList: List[B] = List(B@555301c4)

scala> val aList:List[A] = List(new B())
aList: List[A] = List(B@23014460)

scala> aList == bList
res6: Boolean = true

Wednesday, January 30, 2013

The Language After Java: Reflections on my first JavaOne

My thoughts around my first JavaOne conference centered around, "What is the next big thing going to be?" in the sense of "what do I need to learn to stay current?"  I definitely came back with a short list of very promising, but currently underdog technologies:

  • JUnit (or unit testing in general) is easy and important enough to be part of every project. I even wonder if good unit tests are easier and more beneficial than type-safety in your programming language!
  • Scala (successor to Java)
  • Git (successor to Subversion)
  • Maven (successor to Ant)
  • Functional Programming (it's finally ready for prime-time)
  • Algorithms (more than Data Structures)

These thoughts motivated me to make some tool changes and to take Martin Odersky's "Functional Programming Principals in Scala" (which was mind-blowing).

I've seen the rise of Data Structures, then Object Oriented Programming, and now Functional Programming and Algorithms.  Functional Programming was really a dark horse from my perspective because Lisp is the second-oldest programming language, yet FP failed to become mainstream until very recently.

My functional experience before Scala was confined to e-lisp and XSLT, neither of which make a particularly good general-purpose programming language. The recent death of Moore's Law and the triumph of concurrent processing in cloud services like AWS have given Functional programming better performance and code-comprehension characteristics than imperative programming.  I definitely didn't see that coming.

Sometimes the Don Quixote principal makes bad popular ideas successful - if enough people believe in something it becomes true.  I wondered about that and Functional Programming. "Computer Science" has lived in the Math department of most universities.  If you get enough math geeks doing programming, of course they will start to favor a style that looks and behaves like functions from the field of math.  So what, I thought, if Functional Programming was based on Lambda Calculus - is it practical for solving business problems?

But this was not the right question.  The people asking, "What can a program possibly do" and "What is the best way to do X?" are the ones creating the technological innovations that ultimately drive the rest of the industry.  The limitations of what programs can do (and how efficiently those tasks can be accomplished) constitute a good part of the study of algorithms and the inspiration for new languages.  Sure, you can do most things in most general-purpose languages, but as James Roper said and Havoc Pennington quoted in his recent talk (slides available here), just because you can do something in a language doesn't mean that you will. Some things are just too difficult in some languages. This is why knowing both Imperative and Functional styles is very helpful in addition to studying Algorithms.

The easy problems may continue to be solved in imperative languages without much thinking about algorithms at all: transform one kind of text to another, deliver the right data for a given request, or generate a report from a database.  But the hard problems are increasingly going to require concurrent processing with the most effective algorithms.  Sure, it's easier to do some things in a functional language, but a great algorithm in a limited language beats a crummy algorithm in a concise and beautiful language any day.

Java is a great language, but it is showing its age:

  • Primitives and API classes (like List and Map) are all designed to be mutable.  Concurrent programming with this inherent unsafety is a real bear. Even without concurrency, it's just easier to reason about things that don't change. Just using Java collections properly requires immutability!
  • Collections are managed through iteration which puts the burden of concurrent programming on the user of those collections.  This will be fixed by Lambdas in Java 8.

Scala is the language most similar to Java that overcomes these limitations, so you may find more of that and less Java going forward. In any case, my search for "The Language After Java" has led me instead to a short list of really promising tools, and a rapidly growing interest in Functional Programming and Algorithms.  I only wish I had hit on these ideas earlier in my career.  Thank you JavaOne - and thank you John for getting me to go there!

Friday, January 4, 2013

Ternary Operator (?:) in Java

Overview

Like the if/else statement, the ternary operator creates a logical branching in the code where it is used. Unlike if/else statements, the ternary operator is an expression, meaning that it produces (returns?) a value. Expressions like (a * b) can produce a numeric (int, long, float, double) value and (a || b) can produce a boolean value. But only functions and the ternary operator can yield any type of object or primitive.

Here are two simple examples. Each creates a String s and sets its value depending on the value of x:

// Set salutation using 'if'
String s = null;
if (isFriend) { s = "Hey, " + fName; } else { s = "Hello, " + fName; }
// Set salutation using '? :'
String s = (isFriend ? "Hey, " : "Hello ") + fName;

It's called the "The Ternary Operator" because it is the only operator (in the languages that use it) that takes three operands. Perhaps it would be better to call it "Evaluative-If" or "Question-mark Colon" but I'm not campaigning to rename it today.

Functional programmers like the ternary operator because it creates a block of code that produces a value - much like a closure.

Known Good and Bad Uses

There are countless ways to write bad code, and some language features have more potential for abuse than others. Because of its brevity, the ternary operator can be a particular nightmare, especially when used with statements that rely on operator precedence:

// DO NOT DO THIS - USE PARENTHESIS!
a == b || c & d ? e ^ b : d | b

Also, the ? : is so tiny on the page that it becomes very hard to see when used with lengthy (multi-line) operands. It's designed for short things that generally fit on one line and produce a value.

I have yet to find a significant timing difference between ?: and if/else statements, so we can rule out performance as a reason to choose one syntax over the other.

There was a discussion about the ternary operator on StackExchange recently and one person provided a great example of how NOT to use the ternary operator:

// Nesting the ternary operator is EVIL - DO NOT DO THIS!
int median1(int a, int b, int c) {
    return
        (a < b)
        ?
            (b < c)
            ? b
            :
                (a < c)
                ? c
                : a
        :
            (a < c)
            ? a
            :
                (b < c)
                ? c
                : b;
}

Nesting the ternary operator (? ? : :) is always evil because it's not clear which question-mark goes with which colon. But chaining (? : ? : ? :) with proper indentation and short blocks of code can be very clear. Note: the examples below take advantage of the short-circuiting nature of return statements to remove the else clause for brevity:

// With one 'if' to remove the nesting
int median2(int a, int b, int c) {
  if (a < b) {
    return (b < c) ? b :
           (a < c) ? c : a
  }
  return (a < c) ? a :
         (b < c) ? c : b;
}

I find that easier to read than using if statements:

// All 'if' statements
int median3(int a, int b, int c) {
  if (a < b) {
    if (b < c) { return b; }
    if (a < c) { return c; }
    return a;
  }
  if (a < c) { return a; }
  if (b < c) { return c; }
  return b;
}

In Java, you can call a super-class constructor from a sub-class constructor only in the first line of that sub-class constructor, before any other methods are called. The ternary operator is the only way to change your input data before passing it to the super-class constructor.

public class MyClass extends MySuperclass {
  public MyClass(int a) {
    super(a > 0 ? true : false);
    ...
  }

Anywhere that a value is needed from a very short calculation, the ternary operator can be useful. Especially for preventing NullPointerExceptions:

out.print(name == null ? "" : name);

Conclusion

While the ternary operator has potential for misuse (particularly when nested, poorly indented, covering large blocks of code, or where operator precedence is critical), it also has potential for some good as shown above. Just be careful to use it only for good, ideally in situations that take advantage of it's evaluative nature. Many thanks to Programmers.stackexchange.com for inspiration for this post.