Sunday, September 29, 2013

Refactoring in Java, Scala, and Clojure

Update: Attend my presentation on this post at DevNexus on February 25th, 2014

Motivation

I don't use the words "strongly typed" or "dynamic" much in this post, but thinking about the relative costs and benefits of type safety was the primary inspiration behind it. You may want to keep your own mental tally of how type safety helps and how it hurts in the following examples. Because type safety means so many different things in different languages, I can only assess its practical merits (or detriments) in the context of real-world examples, hence this post. But maybe after looking at these examples, you can infer general principles from them?

I personally find type safety to be useful at work because of the person-years of development that went into the system that I work on most. But I am not trying to convert people to type-safety, or away from it. My goal is to make the issues more visible so that we can all write better code more easily in the future. When I gave this talk to the Asheville Coders League, I felt some measure of satisfaction that one person told me afterwards that they were going to look into Clojure and another that they would look into Scala.

Problem

When I change an interface in Java, I then have to update all its implementations, which can be difficult and time-consuming. Making a wrapper class is often even more time-consuming. This pain in the neck is sometimes called "The Expression Problem," and it makes a good example for comparing these three languages.

For this comparison, I will use a class that models a Year/Month combination and a function, addMonths, that takes the number of months to add (positive or negative) and returns a new YearMonth. Originally we used a data structure with two fields (year and month), but it complicated the database queries for ranges of months, so we changed it to a single int of the format YyyyMm. Now we can use > and < to compare YyyyMm fields in our SQL queries, and the raw data is still human readable.
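
To make the ordering claim concrete, here is a minimal sketch of the encoding in Java. The class YyyyMmDemo and its method names are hypothetical and are not taken from the project source. It assumes positive years and months in the range 1-12.

```java
// Hypothetical helper, for illustration only.
final class YyyyMmDemo {
    private YyyyMmDemo() {}

    // Pack a year and a one-based month (1-12) into one int, e.g. 201307.
    static int toYyyyMm(int year, int month) {
        return (year * 100) + month;
    }

    // Unpack the year, e.g. 201307 -> 2013.
    static int yearOf(int yyyyMm) { return yyyyMm / 100; }

    // Unpack the month, e.g. 201307 -> 7.
    static int monthOf(int yyyyMm) { return yyyyMm % 100; }

    // True when yyyyMm lies within [from, to], using plain int comparison.
    static boolean inRange(int yyyyMm, int from, int to) {
        return yyyyMm >= from && yyyyMm <= to;
    }
}
```

Because the year occupies the higher digits, December 2012 (201212) compares as less than January 2013 (201301), which is what lets a range query use ordinary comparison operators.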

Despite wild claims that Object Oriented Programming is all about mutation, I'm going to use immutable classes in all three languages (a few mutable local variables are used, but never exposed outside of the function that declares them).

Full source examples from this article are available on GitHub both before (xxxx1) and after (xxxx2) refactoring.

Here's the original Java interface, translated into Scala and Clojure:

Original Interface

Java Interface

public interface YearMonthInterface {
    public int getYear();
    public int getMonth();
}

Scala Trait

Scala has traits instead of interfaces. A Scala trait can include implementations (but not constructors) as we'll see later.

trait YearMonthTrait {
  def year:Int
  def month:Int
}

Clojure

No interface is necessary, but the compiler won't tell you if you fail to match your data to your functions. Protocols could be used, but that's probably not typical and it's certainly not needed for this simple example.

Base YearMonth Implementation

A few simple tests should provide the best overview:

Tests of Base Implementation

// Java
YearMonth.addMonths(YearMonth.of(2013, 7), 2);
// 2013-9
YearMonth.addMonths(YearMonth.of(2012, 12), 1);
// 2013-1
YearMonth.addMonths(YearMonth.of(2013, 1), -1);
// 2012-12

// Scala
YearMonth.addMonths(YearMonth(2013, 7), 2)
// YearMonth(2013,9)
YearMonth.addMonths(YearMonth(2012, 12), 1)
// YearMonth(2013,1)
YearMonth.addMonths(YearMonth(2013, 1), -1)
// YearMonth(2012,12)

;; Clojure
(addMonths {:year 2013, :month 7} 2)
;; {:year 2013, :month 9}
(addMonths {:year 2012, :month 12} 1)
;; {:year 2013, :month 1}
(addMonths {:year 2013, :month 1} -1)
;; {:year 2012, :month 12}

Java Class

public final class YearMonth implements YearMonthInterface {
    private final int year;
    private final int month;
    
    private YearMonth(int y, int m) { year = y; month = m; }

    public static YearMonth of(int y, int m) {
      if (m > 12) {
          // convert to zero-based months for math
          m--;
          // Carry any extra months over to the year
          y = y + (m / 12);
          // Adjust month to be within one year
          m = m % 12;
          // convert back to one-based months
          m++;
      } else if (m < 1) {
          // Carry any extra months over to the year, but the first year
          // in this case is still year-1
          y = y + (m / 12) - 1;
          // Adjust negative month to be within one year.
          // To get the positive month, subtract it from 12
          m = 12 + (m % 12);
      }
      return new YearMonth(y, m);
    }
    
    @Override
    public int getYear() { return year; }
    
    @Override
    public int getMonth() { return month; }
    
    public static YearMonth addMonths(YearMonthInterface ym,
                                      int addedMonths) {
        return of(ym.getYear(), ym.getMonth() + addedMonths);
    }
    
    @Override
    public String toString() {
        return new StringBuilder().append(year).append("-")
                                  .append(month).toString();
    }
}
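
As an aside, the two branches in of() can also be expressed without branching. The sketch below is an alternative I am swapping in for illustration, not the code used in this article. It assumes Java 8 or later for Math.floorDiv and Math.floorMod, and the class name MonthMath is hypothetical.

```java
// Hypothetical alternative to the branching logic in YearMonth.of().
final class MonthMath {
    private MonthMath() {}

    // Returns {year, month} with the month normalized into 1-12.
    static int[] normalize(int year, int month) {
        // Count months from year 0 using zero-based months.
        int total = (year * 12) + (month - 1);
        // floorDiv and floorMod round toward negative infinity, so
        // zero and negative months need no separate branch.
        int y = Math.floorDiv(total, 12);
        int m = Math.floorMod(total, 12) + 1;
        return new int[] { y, m };
    }
}
```

For the inputs used in the tests above, this gives the same answers as the original: normalize(2012, 13) yields year 2013, month 1, and normalize(2013, 0) yields year 2012, month 12.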

Scala Case Class and Companion Object

A case class in Scala automatically generates the constructor, accessors, and toString that the Java version spells out by hand, plus equals() and hashCode(). Scala does not have static methods. Instead, everything Java would call a "static" method goes in the companion object. A companion object is a singleton with the same name as the class it belongs to, defined in the same file.

case class YearMonth(override val year:Int,
                     override val month:Int) extends YearMonthTrait

object YearMonth {
  def addMonths(ym:YearMonthTrait, addedMonths:Int):YearMonth = {
    val newMonth = ym.month + addedMonths
    if (newMonth > 12) {
      // convert to zero-based months for math
      val m = newMonth - 1
      // Carry any extra months over to the year
      new YearMonth(ym.year + (m / 12), (m % 12) + 1)
    } else if (newMonth < 1) {
      // Carry any extra months over to the year, but the
      // first year in this case is still year-1
      val y = ym.year + (newMonth / 12) - 1
      // Adjust negative month to be within one year.
      // To get the positive month, subtract it from 12
      val m = 12 + (newMonth % 12)
      new YearMonth(y, m)
    } else {
      new YearMonth(ym.year, newMonth)
    }
  }
}

Clojure Function

Instead of declaring data types as classes, Clojure prefers to use immutable maps (hash maps). So we skip all data definition steps above and write a function that assumes a map with certain keys. These keys are analogous to the fields in Java and Scala.

(defn addMonths [ym, addedMonths]
      (let [newMonth (+ (:month ym) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :year (+ (:year ym) (quot m 12)),
                                   :month (+ (rem m 12) 1)))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ (:year ym) (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :year y :month m))
                 :else (assoc ym :month newMonth))))

Add an Implementing Class

Here we add a second implementing class that contains an additional field.

Test Implementing Class

// Java
YearMonth.addMonths(MonthlyA.of("One", 2013, 7), 2);
// 2013-9

// Scala
YearMonth.addMonths(MonthlyA("One", 2013, 7), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:otherField1 "One", :year 2013, :month 7} 2)
;; {:otherField1 "One", :year 2013, :month 9}

Java

public class MonthlyA implements YearMonthInterface {
    private final String otherField1;
    private final int year;
    private final int month;

    private MonthlyA(String s, int y, int m) {
        otherField1 = s; year = y; month = m;
    }

    public static MonthlyA of(String s, int y, int m) {
        return new MonthlyA(s, y, m);
    }

    public String getOtherField1() { return otherField1; }

    @Override
    public int getYear() { return year; }

    @Override
    public int getMonth() { return month; }
}

Scala

case class MonthlyA(otherField1:String,
                    override val year:Int,
                    override val month:Int) extends YearMonthTrait

Clojure

No custom data structure is needed because the extra field is just another key in the map.


Change Data Representation From Year & Month to YyyyMm

Test

// Java
YearMonth.addMonths(YearMonth.of(201307), 2);
// 2013-9

// Scala
YearMonth.addMonths(YearMonth(201307), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:yyyyMm 201307} 2)
;; {:yyyyMm 201309}

Java

Java requires that you manually update all the old code to be compatible with the new data format. While I'm at it, I'm going to add a convenience static factory method to the base implementation that takes the new data format.

public interface YearMonthInterface {

    ... old methods unchanged ...

    // New! @return yyyyMm or YearMonth.of(year, month).getYyyyMm()
    public int getYyyyMm();
}

public final class YearMonth implements YearMonthInterface {

    ... old methods unchanged ...

    // New!
    public static YearMonth of(int yyyyMm) {
        return new YearMonth(yyyyMm / 100, yyyyMm % 100);
    }

    // New!
    @Override
    public int getYyyyMm() {
        return (year * 100) + month;
    }
}

public class MonthlyA implements YearMonthInterface {

    ... old methods unchanged ...

    // New!
    @Override
    public int getYyyyMm() {
        return YearMonth.of(year, month).getYyyyMm();
    }
}
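
For readers on Java 8 or later (an assumption; nothing in the code above relies on it), a default method might spare you from editing each implementing class. This is only a minimal sketch with the hypothetical names YearMonthSketch and MonthlySketch, not the article's own types.

```java
// Sketch only: the "Sketch" suffix marks these types as hypothetical.
interface YearMonthSketch {
    int getYear();
    int getMonth();

    // Default implementation (Java 8+): implementers inherit this
    // unless they choose to override it.
    default int getYyyyMm() {
        return (getYear() * 100) + getMonth();
    }
}

// An "old" class, written before getYyyyMm() existed. It compiles
// unchanged after the default method is added to the interface.
final class MonthlySketch implements YearMonthSketch {
    private final int year;
    private final int month;

    MonthlySketch(int y, int m) { year = y; month = m; }

    @Override public int getYear() { return year; }
    @Override public int getMonth() { return month; }
}
```

With this in place, new MonthlySketch(2013, 7).getYyyyMm() returns 201307 without MonthlySketch mentioning the new method at all.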

Scala

Scala lets you put the implementation logic right in the trait, so no implementing class needs to change. I'm also going to add a new factory method to the companion object that accepts the new data format, the same way I did in Java.

Additional constructors in Scala are implemented as factory methods in the companion object. apply() is the method Scala calls when you write the object name followed by arguments, so you don't need to spell it out in your client code. You can use it just like a normal factory/constructor (except for constructor pattern matching), as shown in the Scala test above.

trait YearMonthTrait {

    ... old methods unchanged ...

  // New!
  def yyyyMm:Int = (year * 100) + month
}

object YearMonth {
  // Add yyyyMm factory method to the YearMonth companion object
  // This is like the extra "of" method we just added to the Java version.
  def apply(yyyyMm:Int) = new YearMonth((yyyyMm / 100), (yyyyMm % 100))

    ... old methods unchanged ...

}

Clojure

I found it easiest and clearest to add two conversion functions so that the Clojure function handles both the new and old data formats.

(defn ymToOld [ym] (dissoc (assoc ym :year (quot (:yyyyMm ym) 100)
                                     :month (rem (:yyyyMm ym) 100))
                           :yyyyMm))

(defn ymToNew [ym] (dissoc (assoc ym :yyyyMm (+ (* (:year ym) 100)
                                                (:month ym)))
                           :year :month))

(defn addMonths [ym, addedMonths]
      (if (contains? ym :yyyyMm)
                     (ymToNew (addMonths (ymToOld ym), addedMonths))
          (let [newMonth (+ (:month ym) addedMonths)]

               ... same code from before ...

Add a New Implementing Class Using the New Data Format

Now that all the old code is working with the new data format, we can add a new class, MonthlyB, that makes use of the new format internally.

Test New Class

All the old tests pass, even with the new code. I am only showing a few new tests for brevity.

// Java
YearMonth.addMonths(MonthlyB.of(1.1, 201307), 2);
// 2013-9

// Scala
YearMonth.addMonths(MonthlyB(1.1, 201307), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:otherField2 1.1 :yyyyMm 201307} 2)
;; {:otherField2 1.1, :yyyyMm 201309}

Java

In Java, we have to manually add support for the old data format to the new class.

public class MonthlyB implements YearMonthInterface {
    private final double otherField2;
    private final int yyyyMm;

    private MonthlyB(double d, int yyM) {
        otherField2 = d; yyyyMm = yyM;
    }

    public static MonthlyB of(double d, int yyM) {
        return new MonthlyB(d, yyM);
    }

    public double getOtherField2() { return otherField2; }

    @Override
    public int getYear() { return yyyyMm / 100; }

    @Override
    public int getMonth() { return (yyyyMm % 100); }

    @Override
    public int getYyyyMm() { return yyyyMm; }
}

Scala

In Scala, an adapter trait makes any new classes play nicely with the old code. It only needs to be specified once and can be "mixed in" to as many new classes as necessary.

trait YearMonthNew extends YearMonthTrait {
  def yyyyMm:Int
  def year:Int = yyyyMm / 100
  def month:Int = (yyyyMm % 100)
}

case class MonthlyB(otherField2:Double,
                    override val yyyyMm:Int) extends YearMonthNew
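
A roughly comparable adapter can be sketched in Java 8 or later using default methods. This is only an illustration under that assumption; YmBase, YmFromYyyyMm, and MonthlyBSketch are hypothetical names and are not part of the source on GitHub.

```java
// Hypothetical stand-in for the article's YearMonthInterface.
interface YmBase {
    int getYear();
    int getMonth();
    int getYyyyMm();
}

// Adapter: derives the old accessors from the new format.
interface YmFromYyyyMm extends YmBase {
    @Override default int getYear() { return getYyyyMm() / 100; }
    @Override default int getMonth() { return getYyyyMm() % 100; }
}

// A new class only has to supply getYyyyMm().
final class MonthlyBSketch implements YmFromYyyyMm {
    private final double otherField2;
    private final int yyyyMm;

    MonthlyBSketch(double d, int yyM) { otherField2 = d; yyyyMm = yyM; }

    double getOtherField2() { return otherField2; }

    @Override public int getYyyyMm() { return yyyyMm; }
}
```

Like the Scala trait, the adapter interface is written once and can be implemented by as many new classes as necessary.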

Clojure

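Clojure does not specify data types, so no new code is needed here. The existing addMonths function already handles any map that has a :yyyyMm key.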

Additional Considerations

  • Compiling Scala is slow, taking 2-3x as long as Java. sbt, on the other hand, is extremely clever, maximizing processor usage and recompiling only what it needs to, so compiling Scala with sbt may effectively be faster than compiling Java with Ant.
  • Compiled Clojure code executes about 50% slower than comparable Scala/Java code.
  • Both Scala and Clojure require a few small jar files, which hold their language-specific APIs, in order to compile.

Conclusions

All three languages got the job done. All three required specifying the logic for data transformation - the addMonths function. Data structures (user-defined types) showed the biggest difference between the three languages.

In Java, more work is spent defining and updating the types than defining functions that work on them. In Scala, the up-front work of defining types is small and elegant; a minor distraction from the "real work" of transforming that data. In Clojure, all attention is placed on the functions while data structures almost disappear altogether. This is a very beautiful and fast way to code that yields a very simple system, but it lacks the safety guarantee that type-safety gives the other languages. Comprehensive unit test coverage can mitigate this risk, but that is another form of complexity with its own maintenance cost.

In the beginning, Java solved virtually every major issue with C++ and created the JVM which these other two languages are built on. But it's showing its age. I really have trouble finding a situation where Java would win. I suppose if a small jar file size was critical... Really, the biggest advantage Java has over Scala is faster compile times. Maybe if you write in Clojure, you could use small amounts of Java for performance in critical areas? Still, I'd rather use Scala for that than Java.

Both Scala and Clojure seem to eliminate a lot of work that is required in Java, though they take fundamentally different approaches to doing so.

Wednesday, September 18, 2013

Comparing Objects is Relative

This post is part of a series on Comparing Objects.

Equality

For those of us who are implementation minded, that means that boolean equals(Object other) is a flawed API because there is no one definition of equality that will work in every context. In a previous post on Using Java Collections Effectively by Implementing equals() and hashCode(), I quoted Josh Bloch and said that "The behavior of equals(), hashCode(), and compareTo() must be consistent." Java subtly encourages us to define a single definition of equality and comparison per class.

Java programmers have been "ensuring symmetry by controlling only one side of the equation" for years. There's nothing wrong with defining a *default* context for comparison and equality. Just don't mistake default equality and compareTo() for the only context for comparing objects.

A small potato and a slice of bread might have an equal number of calories (they are equivalent in one context), but the potato takes much more energy to heat up (they are different in another context). So ideally, we'd like Heating equality and Caloric equality to be different for small potatoes vs. bread.

A context-relative definition of equality might look like this:

boolean equals(Object left, Object right)

Hmm... equality and sorting have a good deal of overlap. The Comparator interface already defines something like equality - it returns 0 when things are sorted the same ("equally"), something else otherwise. Better still, any implementation of Comparator provides a context for that comparison.

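interface Sugars {
    int gramsSugar();
    // Integer.compare avoids the overflow that plain subtraction can cause
    public static final Comparator<Sugars> COMPARATOR = (left, right) ->
            Integer.compare(left.gramsSugar(), right.gramsSugar());
}

interface Heating {
    int cookingEnergy();
    // Integer.compare avoids the overflow that plain subtraction can cause
    public static final Comparator<Heating> COMPARATOR = (left, right) ->
            Integer.compare(left.cookingEnergy(), right.cookingEnergy());
}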

Could we use this for context-relative equality?
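
One way to read that question as code is a small helper that treats "compares as 0" as equality within a given context. This is a minimal sketch; the class ContextEquality and the method equalIn are hypothetical names.

```java
import java.util.Comparator;

// Hypothetical helper: "equal" means the context's Comparator returns 0.
final class ContextEquality {
    private ContextEquality() {}

    static <T> boolean equalIn(Comparator<? super T> context,
                               T left, T right) {
        return context.compare(left, right) == 0;
    }
}
```

For example, given a Comparator<String> that compares by length, equalIn reports "bread" and "spuds" as equal, while String.CASE_INSENSITIVE_ORDER reports them as different. Same objects, different context, different answer.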

The ubiquitous ArrayList, Set, and Map rely on equals(), hashCode(), and compareTo(), all of which are fixed in their one-sided ways. But the classes implementing SortedSet and SortedMap behave very differently from the other collections when passed a separate Comparator.

Using Context-Relative Equality (in Java)

When you pass a Comparator to TreeSet or TreeMap, you are using context-relative equality. All comparisons for get(), put(), and contains() are made with the Comparator, NOT with the methods that the other standard collections use! With a little care you can use this to your advantage:

static class Food implements Heating, Sugars {
    private final int gramsSugar;
    private final int cookingEnergy;
    private Food(int g, int c) { gramsSugar = g; cookingEnergy = c; }
    @Override public int gramsSugar() { return gramsSugar; }
    @Override public int cookingEnergy() { return cookingEnergy; }
    @Override public String toString() {
        return "Food(" + gramsSugar + "," + cookingEnergy + ")";
    }
}

public static void main(String[] args) {
    List<Food> foods = Arrays.asList(new Food(5,3), new Food(4,4));
    SortedSet<Food> foodsByHeat = new TreeSet<>(Heating.COMPARATOR);
    foodsByHeat.addAll(foods);
    System.out.println("Foods by heat:");
    for (Food f : foodsByHeat) { System.out.println(f); }

    SortedSet<Food> foodsBySugar = new TreeSet<>(Sugars.COMPARATOR);
    foodsBySugar.addAll(foods);
    System.out.println("Foods by sugar:");
    for (Food f : foodsBySugar) { System.out.println(f); }
}

Output

Foods by heat:
Food(5,3)
Food(4,4)
Foods by sugar:
Food(4,4)
Food(5,3)
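
The example above only shows ordering. The sketch below illustrates the equality side: a TreeSet built with a Comparator treats two distinct objects as the same element when the Comparator returns 0. Snack is a hypothetical stand-in for Food, and ComparatorSetDemo is a hypothetical class name.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical demo class, for illustration only.
final class ComparatorSetDemo {
    // Snack does not override equals(), so two instances are never
    // equal() unless they are the same object.
    static final class Snack {
        final String name;
        final int cookingEnergy;
        Snack(String n, int c) { name = n; cookingEnergy = c; }
    }

    static final Comparator<Snack> BY_HEAT =
            Comparator.comparingInt(s -> s.cookingEnergy);

    // Builds a set whose notion of "same element" is cooking energy alone.
    static SortedSet<Snack> byHeat(Snack... snacks) {
        SortedSet<Snack> set = new TreeSet<>(BY_HEAT);
        for (Snack s : snacks) { set.add(s); }
        return set;
    }
}
```

If the set holds a potato with cooking energy 4, then contains() returns true for a yam with cooking energy 4, and add() of that yam returns false, even though potato.equals(yam) is false. That is the "little care" mentioned earlier: within this set, the Comparator alone decides what counts as a duplicate.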

More

Here is the Source for the above.

Here is my earlier SetInterface test.