Thursday, June 12, 2014

Benefits of a static compare() method on Java 8 interfaces

This post is part of a series on Comparing Objects.

If you are modeling alumnae at a college, you might want them sorted sometimes by name, other times by year of graduation.  You may also ask questions about them like whether two alumnae attended at the same time.  In an object-oriented language, you would probably model a Semester object to help solve these problems.

In Java a Semester might have 2 components: the Semester interface, and a default implementation of that interface.  So, ReportCard and Tuition might implement Semester (since there is one of each per semester) and you probably also want a SemesterImpl (default implementation with no other data attached) for showing lists of Semesters in drop-downs and such.

In Java 7 and earlier, you'd place all your Semester-specific functions on the SemesterImpl class.  In Java 8, most of them can go on the Semester interface either as static methods or default instance implementations.  How cool is that?
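For instance, here is a trimmed-down sketch of what that interface might look like (the year(), semesterNum(), and yyyyS() accessor names are placeholders of mine, using the single-int encoding discussed later in this post):

public interface Semester {
    int year();          // e.g. 2014
    int semesterNum();   // 1-9 within the year

    // A default instance implementation - no SemesterImpl needed for this logic.
    default int yyyyS() { return (year() * 10) + semesterNum(); }

    // Static methods can live here too; the compare() shown later in this post is one of them.
}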

Ducks-in-order

I've been moving most methods from my default implementations to my interfaces with generally great results.  But you have to be careful with generics and inheritance.  At first, I tried having my Semester interface implement Comparator<Semester>.  Makes sense, no?  I mean, semesters go in order.  Define a comparator and the Java collections are your oyster.  Well, yes it works, but you probably want to have SemesterImpl do that instead...

Think about ReportCard.  It implements Comparator<ReportCard>, but Comparator<ReportCard> cannot extend Comparator<Semester> because that would require covariant inheritance and I've declared an invariant type.  See my post on Casting a Collection: Java vs. Scala for more details on this kind of problem.  I couldn't figure out how to make Semester implement Comparator<? extends Semester> in a way that worked around the problem, but maybe that's not such a worthy goal anyway.

Do NOT Compare!

Sometimes you want to sort ReportCards by Semester (order received), sometimes by Student (personal GPA), sometimes by Grade (for class standing), or often by all three.  This means you will sort ReportCards differently in different contexts.

This may be a little bit profound for seasoned Java programmers because every class comes with equals() and hashCode() methods.  If you implement Comparable, then compareTo() must be compatible with equals() and hashCode() in very specific, mathematical ways.  This means that Java subtly encourages us to define a single definition of equality and comparison per class.

There's nothing wrong with defining a *default* context for comparison and equality.  No, wait. I take that back:


Still, Java programmers have been "ensuring symmetry by controlling only one side of the equation" for years. It's not ideal, and it's the cause of numerous hard-to-find errors, but with great care you can get by.  In any case, don't mistake default equality and compareTo() for the only context for comparing objects.

Fortunately, classes like TreeSet accept a comparator.  So you end up passing them a ComparatorBySemesterAvgGradeStudent, ComparatorByStudentSemester, etc. depending on your sorting context.

Implementation note: compare(left, right) and compareTo(other) are both instance methods, so you need to create an instance of the appropriate Comparator in order to use compare().  Most comparators are stateless and never change.  This means they should be implemented as enums with a single immutable instance shared and reused by all clients.
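For example, here is a minimal sketch of such an enum-backed comparator (it assumes the Semester interface and the yyyyS() method used later in this post):

import java.util.Comparator;

// One stateless, shareable instance defining the default Semester ordering.
public enum SemesterComparator implements Comparator<Semester> {
    INSTANCE;

    @Override
    public int compare(Semester left, Semester right) {
        if ((left == null) || (right == null)) {
            throw new IllegalArgumentException("Can't compare nulls");
        }
        return Integer.compare(left.yyyyS(), right.yyyyS());
    }
}

Clients then share the single instance, for example new TreeSet<>(SemesterComparator.INSTANCE).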


So, we know we need to sort things differently in different contexts.  We need multiple different comparators to do that.  Doesn't that lead to a bunch of code that's effectively duplicated between these comparators?  ReportCard, SemesterImpl, and Student all need to take semesters into account for comparison.

The simple solution is to define a static compare(left, right) method on an interface to define the default sort order for that interface.  Voilà - we have a context (represented by the namespace of the interface this static method is defined on).  It takes two potentially different classes and compares them relative to that context only. Now we can mix and match contexts with objects as necessary. Methods like this become the building blocks for more complicated sorting and keep your comparator enums, compareTo(), and equals() methods clear and simple.

The compare() method outlined below takes a page from Objects.equals(left, right) in that it performs a referential equality check for speed and to ensure that each object is equal to itself.  This has the side effect that if someone passes compare(null, null) it returns 0 (equal), which makes some sense: undefined is undefined.  But should we ever compare anything to null? Even itself? For a while I thought, "What harm could come from just sorting nulls last?"

Sort order

What harm indeed!  How do you use compare() to compose an isGreaterThan(other) method if null is greater than any valid value?  It makes no sense.  Really, there is no meaningful way to compare a value to null because it represents "undefined." Well, isGreaterThan(other) can meaningfully return false if other is null, but the general-purpose compare(left, right) method cannot.  It must return less-than (any negative int), greater-than (any positive int), or equal (0).

The only rational thing to do if either parameter to the compare() method is null is to throw an exception.  With nulls handled, you can just do the comparison.  If your class can be represented as an integer, this could be extremely simple.  For instance, if our hypothetical college had no more than 9 semesters per year, it could number them 2014-1, 2014-2, 2014-3, etc., and store these numbers as a single int: 20141, 20142, 20143.  Thus, the compare() method on Semester would be:
static int compare(Semester left, Semester right) {
    if ((left == null) || (right == null)) {
        throw new IllegalArgumentException("Can't compare nulls");
    }
    if (left == right) { return 0; } // Referential equality shortcut
    return left.yyyyS() - right.yyyyS();
}

Now, SemesterImpl.equals() and hashCode() almost write themselves:

@Override
public int hashCode() {
    return yyyyS();
}

@Override
public boolean equals(Object other) {
    // Cheapest operation first...
    if (this == other) { return true; }

    if ( (other == null) ||
         !(other instanceof SemesterImpl) ||
         (this.hashCode() != other.hashCode()) ) {
        return false;
    }
    // Details...
    return Semester.compare(this, (SemesterImpl) other) == 0;
}

If ReportCards are typically sorted by semester/Student, then that equals() method all but writes itself as well:
@Override
public boolean equals(Object other) {
    // Cheapest operation first...
    if (this == other) { return true; }

    if ( (other == null) ||
         !(other instanceof ReportCard) ||
         (this.hashCode() != other.hashCode()) ) {
        return false;
    }
    // Details...
    ReportCard that = (ReportCard) other;
    if (Semester.compare(this, that) != 0) { return false; }
    return Student.compare(this.getStudent(), that.getStudent()) == 0;
}

You can chain as many comparisons like this as you want.  In a compareTo() method or a Comparator, the repeated pattern looks like:
int ret = Xxxxx.compare(this, that);
if (ret != 0) { return ret; }
In equals(), each comparison that returns non-zero simply returns false instead.
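Put together, a compareTo() method on ReportCard might chain the two contexts like this (a sketch only; it assumes ReportCard implements Comparable<ReportCard> and has a getStudent() accessor, while the static compare() methods are the ones described above):

@Override
public int compareTo(ReportCard that) {
    int ret = Semester.compare(this, that);
    if (ret != 0) { return ret; }
    return Student.compare(this.getStudent(), that.getStudent());
}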

Using the same code in your equals() method and your compare() method helps ensure consistency between the two.

To ensure that your equals() method returns false for different surrogate-keyed database objects that sort the same (which you probably want), check out the last example in my post, Using Java Collections Effectively by Implementing equals(), hashCode(), and compareTo().

Thursday, February 6, 2014

Degrees of Lazy Evaluation

The topic of lazy evaluation often comes up when thinking about transforming collections of data. Lazy evaluation is particularly useful for values that are expensive to compute, but it is also useful as a way of chaining functions to apply in a single pass through the underlying data. Laziness can be coded manually, or baked into an API such as a collection or transformation. Implementing some kinds of laziness is more expensive than others. Sometimes laziness may be inappropriate, such as when a database connection is open and waiting to be read from. I couldn't find a definition of degrees of laziness with a quick web search, so I'm defining some here.

Levels of Evaluation Laziness

0: Eager
The entire data source is evaluated immediately. Most statements in most languages work this way most of the time, because that's the way the underlying bytecodes and hardware are designed to work. Eager evaluation of a collection can be done concurrently from inside the collection. The reduce function is implemented this way in Clojure.
1: Delayed
The entire source data is evaluated on the first usage of the result. Examples: builder patterns, isTrueForAll(), forEach(), Java 8 collection transformations, and default Hibernate joined table loading. Delayed evaluation of a collection can be done concurrently from inside the collection.
2: Chunked
The source data is evaluated in the minimum-sized chunks each call requires: filter(), contains(). This level is bounded by levels 1 and 3, which represent its minimum and maximum laziness.
3: Incremental
A maximum of one item is evaluated per call: map(). Incremental evaluation cannot be done concurrently from inside the collection because that would require evaluating more than one item at a time.
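To make the endpoints concrete, here is a small Java sketch (the names and shape are mine, not from this post) contrasting a level 0 transformation with a level 3 one:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class LazyLevels {
    // Level 0 (Eager): the whole source is transformed before this method returns.
    static <A, B> List<B> eagerMap(List<A> src, Function<A, B> f) {
        List<B> out = new ArrayList<>();
        for (A a : src) { out.add(f.apply(a)); }
        return out;
    }

    // Level 3 (Incremental): at most one element is transformed per next() call.
    static <A, B> Iterator<B> incrementalMap(Iterator<A> src, Function<A, B> f) {
        return new Iterator<B>() {
            @Override public boolean hasNext() { return src.hasNext(); }
            @Override public B next() { return f.apply(src.next()); }
        };
    }
}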

Level 2 is weird. A filter that matches nothing, or only the last element is effectively level 1. A filter that matches every element is effectively level 3. I call it level 2 in both cases because of its range. I think it's important to acknowledge level 2 as a unique level because it's chunked. A balanced hashtable of a certain size and load could evaluate one bucket at a time for contains() and be very reliably level 2 without ever being level 1 or level 3. Concurrent processing of Level 2 may be possible inside a collection in some circumstances, but its practicality may be limited when the chunk sizes are not known in advance.

For a single element value, Lazy levels 1, 2, and 3 are indistinguishable - it's either Eager or it's Lazy - period. Two elements can be either Lazy 1 or Lazy 3. With 3 or more elements, Lazy 2 becomes a meaningful description of the laziness.

The results of lazy evaluation can be cached, particularly when that evaluation is expensive and the values will be reused. Clojure Sequences are based on a linked-list data structure which lends itself to (Lazy level 3) incremental evaluation. They are also cached/memoized by default, making them very safe, but somewhat resource intensive.

Using Clojure Sequences requires care with some data sources in order to process a finite subset of the data that fits in memory, or to iterate without holding onto the head of the sequence (thus making the head eligible for garbage collection before the end of the sequence is processed)1. Clojure evaluates (some?) sequences in chunks for efficiency, but incremental evaluation can be forced with a little cleverness.1

Implementation Concerns

Lazy levels 0-3 can be represented as a lightweight, one-pass disposable view, or in a heavier, memoized persistent data structure like a linked list. Since views and linked lists traverse the underlying data incrementally, neither is suitable for internal concurrency. Either of these can be processed concurrently externally if necessary, but doing so seems like a design failure at some level - why eagerly process something you went to the trouble of making lazy?

What's a View?

The Iterator interface in Java and some other languages is not suitable for concurrent use. It has two methods, hasNext() and next(), and lets you enumerate an underlying data source once. Some data sources are ordered and some are not - the iterator respects that.

The problem is that even if you synchronize both of these methods, two threads can both call hasNext() when there is only one element left. At any time, a broken client can call next() without calling hasNext(). In each of these cases, a thread that calls hasNext() and gets "true" can still call next() and get NoSuchElementException. Yuck!

A better interface is what I'm calling a View (possibly because I misunderstood Paul Phillips using that word). It can be lazily evaluated or iterate over a previously realized collection, but it has a single method: next(), which returns either the next element (whether or not it's null) or a sentinel value: USED_UP. This is 100% thread safe. An arbitrary number of threads can safely and quickly call:

T item = next();
while (item != USED_UP) {
    // do something with item
    item = next();
}
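Here is one way the interface itself might be sketched in Java (the over() adapter and the untyped next() return are my assumptions, not from this post); making next() synchronized in the implementation keeps the check and the take in one atomic step:

import java.util.Iterator;

public interface View {
    Object USED_UP = new Object();   // sentinel: compared by reference when the view is exhausted

    // The single method: returns the next element (which may be null) or USED_UP.
    Object next();

    // Adapt any Iterator; synchronizing next() means no thread can act on a stale hasNext() answer.
    static View over(Iterator<?> source) {
        return new View() {
            @Override
            public synchronized Object next() {
                return source.hasNext() ? source.next() : USED_UP;
            }
        };
    }
}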

Footnotes

1 Many thanks to my friend Jeff Dik for pointing this out and correcting my earlier comments.

Sunday, January 26, 2014

Upgrading From Windows XP Before April 8th 2014

The Threat

Security experts have been predicting that malware creators all over the world are finding exploits [in Windows XP] and holding on to them. They know if they unleash an exploit now, it will be fixed. But if they are patient and wait, and hope Microsoft doesn't find the vulnerability, then they can use it for maximum gain come April 9.
The same holds true for Office 2003. Support for it ends on April 15, one week later.

Source: http://www.networkworld.com/community/blog/why-april-9th-might-be-its-worst-day-2014

Upgrade Options: Windows, Mac, Android, or Linux

What kind of computer or operating system you use is determined by what software you need to run.

Office

The first question you should ask yourself is exactly how compatible you need to be with the latest version of Microsoft Office. This is not a yes-or-no question. We are all somewhere on a sliding scale of MS Office compatibility. Very few people require full compatibility with the most obscure features of Microsoft Office. MS Office isn't even compatible with different versions of itself!

100% compatibility with the very latest version of MS Office requires Windows 7 or 8 which probably won't run on your old XP machine. You can purchase the necessary hardware and software from any reputable computer store except a Mac store.

You can run the latest MS Office on Mac, Android, or Linux, but only by installing a virtual machine, then installing Windows, then installing MS Office. This is a pain in the neck (both the install and the ongoing maintenance), but it can be done and is an expensive but effective way to meet an occasional need for the latest MS Office. If bleeding-edge MS Office is the primary reason for having the computer, it's easier and cheaper to buy a Windows computer and be done with it.

Many home users meet their basic office needs with Google Docs which comes free with your home Gmail account. It is compatible only with very basic MS Office documents - no fancy templates, embedded objects, or macros, but it's also much safer from virus threats as a result. It may not be advanced enough for sharing documents with customers and prospects, but for home use it works great with letters, posters, simple spreadsheets, and for sharing them with friends.

LibreOffice is free, runs on any operating system (comes pre-installed on popular Linux flavors), and is roughly equivalent to being one version behind MS Office which is the best you can do natively on the Mac anyway. It's easy to use, powerful, and has Visio-like tools and PDF conversion built-in. This is what I use almost exclusively, even though I have several versions of MS Office installed in virtual Windows machines. Try it out to see if you can use it instead of paying for Microsoft Office:

  • Install LibreOffice
  • Tools -> Options -> Load/Save -> General
    • Uncheck "Warn when not saving in ODF"
    • Document Type: Text, Always Save As: Word 97...2003 (NOT Template)
    • Document Type: Spreadsheet, Always Save As: Excel 97...2003 (NOT Template)
  • Tools -> Options -> Load/Save -> Microsoft Office
    • Check all the boxes.

Other Software

GoToMeeting/GotoWebinar works on Windows, Mac, Android, and now Linux. I haven't tried the Linux version yet, but it only allows you to attend meetings - not share your screen or use a web-cam. Screen sharing on Linux also works well using Skype (or TeamViewer).

Photoshop is Windows and Mac only, but I have found that a combination of Darktable and GIMP meets 100% of my needs (though Photoshop is more convenient and user-friendly if you can afford it).

Linux

Given the above limitations, Linux is a great way to turbocharge an old computer. I switched to Linux about 5 years ago because of the reliability, ease of use, security, and availability of free software. I hope I never have to go back. The only thing I have used Windows for in the past 6 months is GoToMeeting. I've been using Ubuntu Linux 13.10 which is a more Mac-like experience and a little easier to upgrade. Mint Cinnamon Linux is more like Windows 7.

Try either one out by burning a Live CD and booting from it. That will show you if you need to purchase an nVidia graphics card (on a desktop) or a new wireless card (on a laptop) for compatibility reasons, but these can be acquired very cheaply. If you are buying hardware, a Solid State Drive (SSD) can be a miracle for an aging computer. My 7-year-old Ubuntu laptop with an SSD boots in less than 8 seconds and shuts down in less than 3.

I recommend Mint/Cinnamon Linux or Ubuntu over Xubuntu and Lubuntu. The former are just as lightweight but have more usability features than the latter. Otherwise, this article is pretty good and fills in more details.

Monday, October 28, 2013

Expression Problem: Discussion

logaan was kind enough to leave some comments on twitter regarding my recent post on Refactoring in Java, Scala, and Clojure. I needed more than 140 characters so I am responding here.

Discussion

Glen replied: (Untyped) Clojure wraps the function with converters instead of wrapping a user-defined data type. Functional view of same issue?

logaan replied: My understanding is that the expression problem is linked to polymorphism. Clojure's solution is protocols. Protocols allow you to add more types to existing behaviour and more behaviour to existing types. In Scala it needs implicits.

Reply

Thank you Logaan for your insightful comments. I really appreciate you taking the time to read my article and offer constructive criticism. Forming this response was very educational for me.

Chris Houser's Solution to the Expression Problem on InfoQ was a very interesting talk and, I assume, the basis for your criticism. It turns out that there was a debate on Lambda the Ultimate about the nature of The Expression Problem in response to Chouser's presentation.

Since Clojure rejects type safety as a design goal, applying Clojure to the Expression problem seemed somewhat of a stretch to me. On the other hand, Martin Odersky's paper points out:

The term expression problem was originally coined by Phil Wadler in a post on the Java-Genericity mailing list, in which he also proposed a solution written in an extended version of Generic Java. Only later it appeared that Wadler’s solution could not be typed.

If Philip Wadler's solution to his 1998 problem was not type-safe, that made me think the door was open to applying this problem to a dynamic language like Clojure.

In terms of needing Scala's implicits to solve the Expression problem, there may be some aspect to this that I was not understanding. It was actually Odersky's paper that made me think traits were the solution to this problem. I am impressed with how well they solve it. Especially compared to all the typing Java requires when I have 8 implementing classes of an interface that needs to change!

Ultimately, the exact definition of Wadler's original problem is much less interesting to me than solving or side-stepping this general type of problem. In response to your insightful criticism, I have renamed my post, "Refactoring in 3 Languages" and provided an alternate Clojure solution below. Thank you again for taking the time to respond.

Alternate Solution

The Clojure page on protocols actually mentions the Expression Problem. But protocols look like more complication than my little example requires. I can imagine how more complicated "Expression-Like Problems" are well served by records and protocols, yet I feel that Clojure's sweetest spot involves side-stepping these issues when practical by ignoring data specification as much as possible (by leveraging maps).

Here is an alternate "solution" to my earlier post using records and protocols:

(defrecord YearMonth [year month])

(defprotocol MONTH_ADDIBLE 
  (addMonths [MONTH_ADDIBLE mos]))

(extend-type YearMonth
  MONTH_ADDIBLE
  (addMonths [ym, addedMonths]
      (let [newMonth (+ (:month ym) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :year (+ (:year ym) (quot m 12)),
                                   :month (+ (rem m 12) 1)))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ (:year ym) (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :year y :month m))
                 :else (assoc ym :month newMonth)))))
                 
;; Tests
(addMonths (YearMonth. 2013, 7) 2)
;; #user.YearMonth{:year 2013, :month 9}
(addMonths (YearMonth. 2012, 12) 1)
;; #user.YearMonth{:year 2013, :month 1}
(addMonths (YearMonth. 2013, 1) -1)
;; #user.YearMonth{:year 2012, :month 12}

;; With an additional field
(addMonths (assoc (YearMonth. 2013, 7) :otherField1 "One") 2)
;; #user.YearMonth{:year 2013, :month 9, :otherField1 "One"}
(addMonths (assoc (YearMonth. 2012, 12) :otherField1 "One") 1)
;; #user.YearMonth{:year 2013, :month 1, :otherField1 "One"}
(addMonths (assoc (YearMonth. 2013, 1) :otherField1 "One") -1)
;; #user.YearMonth{:year 2012, :month 12, :otherField1 "One"}

(defrecord YearMonth2 [yyyyMm])

(defn yearAndMonthToYm [year month] (+ (* year 100) month))

(extend-type YearMonth2
  MONTH_ADDIBLE
  (addMonths [ym, addedMonths]
      (let [year (quot (:yyyyMm ym) 100),
            newMonth (+ (rem (:yyyyMm ym) 100) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :yyyyMm (yearAndMonthToYm (+ year (quot m 12)),
                                                             (+ (rem m 12) 1))))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ year (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :yyyyMm (yearAndMonthToYm y, m)))
                 :else (assoc ym :yyyyMm (yearAndMonthToYm year, newMonth))))))

(addMonths (YearMonth2. 201307) 2)
;; #user.YearMonth2{:yyyyMm 201309}

;; With an additional field
(addMonths (assoc (YearMonth2. 201307) :otherField2 1.1) 2)
;; #user.YearMonth2{:yyyyMm 201309, :otherField2 1.1}

Sunday, September 29, 2013

Refactoring in Java, Scala, and Clojure

Update: Attend my presentation on this post at DevNexus February 25th 2014

Motivation

I don't use the words "strongly-typed" or "dynamic" much in this post, but thinking about the relative costs and benefits of type safety was the primary inspiration behind it. You may want to keep your own mental tally of how type-safety helps and how it hurts in the following examples. Because type-safety means so many different things in different languages, I can only assess its practical merits (or detriments) in the context of real-world examples - hence this post. But maybe after looking at these examples, you can infer general principles from them?

I personally find type safety to be useful at work because of the person-years of development that went into the system that I work on most. But I am not trying to convert people to type-safety, or away from it. My goal is to make the issues more visible so that we can all write better code more easily in the future. When I gave this talk to the Asheville Coders League, I felt some measure of satisfaction that one person told me afterwards that they were going to look into Clojure and another that they would look into Scala.

Problem

When I change an interface in Java, I then have to update all its implementations, which can be difficult and time consuming. Making a wrapper class is often even more time consuming. This pain in the neck is sometimes called "The Expression Problem" and it makes a good example for comparing these three languages.

For this comparison, I will use a class that models a Year/Month combination and a function, "addMonths" that takes the number of months to add (positive or negative) and returns a new YearMonth. Originally we used a data structure with 2 fields (year and month), but it complicated the database queries for ranges of months, so we changed it to a single int of the format YyyyMm. Now we can use > and < to compare YyyyMm fields in our SQL queries - and the raw data is still human readable.

Despite wild claims that Object Oriented Programming is all about mutation, I'm going to use immutable classes in all three languages (a few mutable local variables are used, but never exposed outside of the function that declares them).

Full source examples from this article are available on Github both before (xxxx1) and after (xxxx2) refactoring.

Here's the original Java interface, translated into Scala and Clojure:

Original Interface

Java Interface

public interface YearMonthInterface {
    public int getYear();
    public int getMonth();
}

Scala Trait

Scala has traits instead of interfaces. A Scala trait can include implementations (but not constructors) as we'll see later.

trait YearMonthTrait {
  def year:Int
  def month:Int
}

Clojure

No interface is necessary, but the compiler won't tell you if you fail to match your data to your functions. Protocols could be used, but that's probably not typical and it's certainly not needed for this simple example.

Base YearMonth Implementation

A few simple tests should provide the best overview:

Tests of Base Implementation

// Java
YearMonth.addMonths(YearMonth.of(2013, 7), 2);
// 2013-9
YearMonth.addMonths(YearMonth.of(2012, 12), 1);
// 2013-1
YearMonth.addMonths(YearMonth.of(2013, 1), -1);
// 2012-12

// Scala
YearMonth.addMonths(YearMonth(2013, 7), 2)
// YearMonth(2013,9)
YearMonth.addMonths(YearMonth(2012, 12), 1)
// YearMonth(2013,1)
YearMonth.addMonths(YearMonth(2013, 1), -1)
// YearMonth(2012,12)

;; Clojure
(addMonths {:year 2013, :month 7} 2)
;; {:year 2013, :month 9}
(addMonths {:year 2012, :month 12} 1)
;; {:year 2013, :month 1}
(addMonths {:year 2013, :month 1} -1)
;; {:year 2012, :month 12}

Java Class

public final class YearMonth implements YearMonthInterface {
    private final int year;
    private final int month;
    
    private YearMonth(int y, int m) { year = y; month = m; }

    public static YearMonth of(int y, int m) {
      if (m > 12) {
          // convert to zero-based months for math
          m--;
          // Carry any extra months over to the year
          y = y + (m / 12);
          // Adjust month to be within one year
          m = m % 12;
          // convert back to one-based months
          m++;
      } else if (m < 1) {
          // Carry any extra months over to the year, but the first year
          // in this case is still year-1
          y = y + (m / 12) - 1;
          // Adjust negative month to be within one year.
          // To get the positive month, subtract it from 12
          m = 12 + (m % 12);
      }
      return new YearMonth(y, m);
    }
    
    @Override
    public int getYear() { return year; }
    
    @Override
    public int getMonth() { return month; }
    
    public static YearMonth addMonths(YearMonthInterface ym,
                                      int addedMonths) {
        return of(ym.getYear(), ym.getMonth() + addedMonths);
    }
    
    @Override
    public String toString() {
        return new StringBuilder().append(year).append("-")
                                  .append(month).toString();
    }
}

Scala Case Class and Companion Object

A case class in Scala automates writing code similar to the Java class above. Scala does not have static methods. Instead, everything Java would call a "static" method goes in the companion object in Scala. A companion object is a singleton instance with the same name and in the same file as the class it belongs to.

case class YearMonth(override val year:Int,
                     override val month:Int) extends YearMonthTrait

object YearMonth {
  def addMonths(ym:YearMonthTrait, addedMonths:Int):YearMonth = {
    val newMonth = ym.month + addedMonths
    if (newMonth > 12) {
      // convert to zero-based months for math
      val m = newMonth - 1
      // Carry any extra months over to the year
      new YearMonth(ym.year + (m / 12), (m % 12) + 1)
    } else if (newMonth < 1) {
      // Carry any extra months over to the year, but the
      // first year in this case is still year-1
      val y = ym.year + (newMonth / 12) - 1
      // Adjust negative month to be within one year.
      // To get the positive month, subtract it from 12
      val m = 12 + (newMonth % 12)
      new YearMonth(y, m)
    } else {
      new YearMonth(ym.year, newMonth)
    }
  }
}

Clojure Function

Instead of declaring data types as classes, Clojure prefers to use immutable maps (hash maps). So we skip all data definition steps above and write a function that assumes a map with certain keys. These keys are analogous to the fields in Java and Scala.

(defn addMonths [ym, addedMonths]
      (let [newMonth (+ (:month ym) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :year (+ (:year ym) (quot m 12)),
                                   :month (+ (rem m 12) 1)))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ (:year ym) (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :year y :month m))
                 :else (assoc ym :month newMonth))))

Add an Implementing Class

Here we add a second implementing class that contains an additional field.

Test Implementing Class

// Java
YearMonth.addMonths(MonthlyA.of("One", 2013, 7), 2)
// 2013-9

// Scala
YearMonth.addMonths(MonthlyA("One", 2013, 7), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:otherField1 "One", :year 2013, :month 7} 2)
;; {:otherField1 "One", :year 2013, :month 9}

Java

public class MonthlyA implements YearMonthInterface {
    private final  String otherField1;
    private final int year;
    private final int month;

    private MonthlyA(String s, int y, int m) {
        otherField1 = s; year = y; month = m;
    }

    public static MonthlyA of(String s, int y, int m) {
        return new MonthlyA(s, y, m);
    }

    public String getOtherField1() { return otherField1; }

    @Override
    public int getYear() { return year; }

    @Override
    public int getMonth() { return month; }
}

Scala

case class MonthlyA(otherField1:String,
                    override val year:Int,
                    override val month:Int) extends YearMonthTrait

Clojure

No custom data structure is needed because Clojure leverages a Map.


Change Data Representation From Year & Month to YyyyMm

Test

// Java
YearMonth.addMonths(YearMonth.of(201307), 2)
// 2013-9

// Scala
YearMonth.addMonths(YearMonth(201307), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:yyyyMm 201307} 2)
;; {:yyyyMm 201309}

Java

Java requires that you manually update all the old code to be compatible with the new data format. While I'm at it, I'm going to add a convenience static factory method to the base implementation that takes the new data format.

public interface YearMonthInterface {

    ... old methods unchanged ...

    // New! @return yyyyMm or YearMonth.of(year, month).getYyyyMm()
    public int getYyyyMm();
}

public class YearMonth implements YearMonthInterface {

    ... old methods unchanged ...

    // New!
    public static YearMonth of(int YyyyMm) {
        return new YearMonth(YyyyMm / 100, YyyyMm % 100);
    }

    // New!
    @Override
    public int getYyyyMm() {
        return (year * 100) + month;
    }
}

public class MonthlyA implements YearMonthInterface {

    ... old methods unchanged ...

    // New!
    @Override
    public int getYyyyMm() {
        return YearMonth.of(year, month).getYyyyMm();
    }
}

Scala

Scala lets you add your implementation logic right in the trait instead of touching any implementing classes, but I'm going to add a new factory method to the base class that accepts the new data format the same way I did in Java.

Additional constructors in Scala are implemented as factory methods in the companion object. apply() is the default method name: calling the object as if it were a function invokes apply(), so you don't need to specify it in your client code. You can use it just like a normal factory/constructor (except for constructor pattern matching) as shown in the "Scala Test" example below.

trait YearMonthTrait {

    ... old methods unchanged ...

  // New!
  def yyyyMm:Int = (year * 100) + month
}

object YearMonth {
  // Add yyyyMm factory method to the YearMonth companion object
  // This is like the extra "of" method we just added to the Java version.
  def apply(yyyyMm:Int) = new YearMonth((yyyyMm / 100), (yyyyMm % 100))

    ... old methods unchanged ...

}

Clojure

I found it easiest/clearest to add two conversion functions to make the Clojure addMonths function handle both the new and old data formats.

(defn ymToOld [ym] (dissoc (assoc ym :year (quot (:yyyyMm ym) 100)
                                     :month (rem (:yyyyMm ym) 100))
                           :yyyyMm))

(defn ymToNew [ym] (dissoc (assoc ym :yyyyMm (+ (* (:year ym) 100)
                                                (:month ym)))
                           :year :month))

(defn addMonths [ym, addedMonths]
      (if (contains? ym :yyyyMm)
                     (ymToNew (addMonths (ymToOld ym), addedMonths))
          (let [newMonth (+ (:month ym) addedMonths)]

               ... same code from before ...

Add a New Implementing Class Using the New Data Format

Now that all the old code is working with the new data format, we can add a new class, MonthlyB that makes use of the new format internally.

Test New Class

All the old tests pass, even with the new code. I am only showing a few new tests for brevity.

// Java
YearMonth.addMonths(MonthlyB.of(1.1, 201307), 2);
// 2013-9

// Scala
YearMonth.addMonths(MonthlyB(1.1, 201307), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:otherField2 1.1 :yyyyMm 201307} 2)
;; {:otherField2 1.1, :yyyyMm 201309}

Java

In Java, we have to manually add support for the old data format to the new class.

public class MonthlyB implements YearMonthInterface {
    private final double otherField2;
    private final int yyyyMm;

    private MonthlyB(double d, int yyM) {
        otherField2 = d; yyyyMm = yyM;
    }

    public static MonthlyB of(double d, int yyM) {
        return new MonthlyB(d, yyM);
    }

    public double getOtherField2() { return otherField2; }

    @Override
    public int getYear() { return yyyyMm / 100; }

    @Override
    public int getMonth() { return (yyyyMm % 100); }

    @Override
    public int getYyyyMm() { return yyyyMm; }
}

Scala

In Scala, an adapter trait makes any new classes play nicely with the old code. It only needs to be specified once and can be "mixed in" to as many new classes as necessary.

trait YearMonthNew extends YearMonthTrait {
  def yyyyMm:Int
  def year:Int = yyyyMm / 100
  def month:Int = (yyyyMm % 100)
}

case class MonthlyB(otherField2:Double,
                    override val yyyyMm:Int) extends YearMonthNew

Clojure

Clojure does not specify data types.

Additional Considerations

  • Compiling Scala is slow, taking 2-3x as long as Java. Sbt, on the other hand, is extremely clever, maximizing processor usage and deciding not to compile everything unless it needs to, so compiling Scala with SBT may effectively be faster than compiling Java with Ant.
  • Compiled Clojure code executes about 50% slower than comparable Scala/Java code
  • Both Scala and Clojure require a few small jar files to compile which hold their specific APIs.

Conclusions

All three languages got the job done. All three required specifying the logic for data transformation - the addMonths function. Data structures (user-defined types) showed the biggest difference between the three languages.

In Java, more work is spent defining and updating the types than defining functions that work on them. In Scala, the up-front work of defining types is small and elegant; a minor distraction from the "real work" of transforming that data. In Clojure, all attention is placed on the functions while data structures almost disappear altogether. This is a very beautiful and fast way to code that yields a very simple system, but it lacks the safety guarantee that type-safety gives the other languages. Comprehensive unit test coverage can mitigate this risk, but that is another form of complexity with its own maintenance cost.

In the beginning, Java solved virtually every major issue with C++ and created the JVM which these other two languages are built on. But it's showing its age. I really have trouble finding a situation where Java would win. I suppose if a small jar file size was critical... Really, the biggest advantage Java has over Scala is faster compile times. Maybe if you write in Clojure, you could use small amounts of Java for performance in critical areas? Still, I'd rather use Scala for that than Java.

Both Scala and Clojure seem to eliminate a lot of work that is required in Java, though they take fundamentally different approaches to doing so.

Wednesday, September 18, 2013

Object Equality is Context-Relative

This post is part of a series on Comparing Objects.

Equality

Object equality depends on the context in which you compare things. For those of us who are implementation-minded, that means that

public boolean equals(Object other)
is fundamentally wrong because there is no one definition of equality that will work in every context.

Take for example, the issue of user input and output escaping. TaintedString is a simple wrapper for String which shares the same method signatures but no common ancestor. All user input must be turned into TaintedStrings to show that it is untrusted. No output formats should accept a TaintedString - it must be encoded according to the output type before writing it out. But to dirty-check a persistent object for saving, you need to compare the raw String from the database to a TaintedString from the user to see if it changed.

In one context, one class needs to actually equal a class which is totally unrelated from an inheritance point of view. But in most other contexts, TaintedString and String must be totally incompatible so that you can't accidentally write a TaintedString to output without first encoding it.

hashCode(), equals(Object other), and compareTo(Object other) are fine for defining equality and ordering in a single (default) context. What about multiple contexts? The Comparator interface has a compare(Object o1, Object o2) method which compares two objects in a given context. Now you can sort people by last name alphabetically in a phone book using one Comparator, and in another context by the order they arrived to purchase tickets using another Comparator. But Comparator is genericized to work with a single kind of object.
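As a quick sketch of those two contexts (the Person class and its fields are hypothetical, not from this post):

import java.util.Comparator;

public class Person {
    final String lastName;
    final int arrivalOrder;

    Person(String lastName, int arrivalOrder) {
        this.lastName = lastName;
        this.arrivalOrder = arrivalOrder;
    }

    // Phone-book context: alphabetical by last name.
    public static final Comparator<Person> BY_LAST_NAME =
            (a, b) -> a.lastName.compareTo(b.lastName);

    // Ticket-line context: order of arrival.
    public static final Comparator<Person> BY_ARRIVAL =
            (a, b) -> Integer.compare(a.arrivalOrder, b.arrivalOrder);
}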

Looking at some of the Hibernate code this morning, I found the following method:

public boolean equals(Object o, Object o2)

Here at last is a way to compare a TaintedString to a String even though they do not share a common ancestor (besides Object) or implement a common interface. Because of the contract between hashCode(), equals(), and Comparator, equality testing and hashing should be tied to the Comparator interface. Comparator already defines equality - it returns 0 when things are equal, something else otherwise. Having a separate equals() method at all is essentially a performance optimization. The default implementation of Comparator.hashCode(Object o) could be return o.hashCode();. This could be overridden whenever that might be incompatible with the other methods.
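One way to sketch that tie-in (the name Equator and its shape are my own illustration, not an existing Java or Hibernate interface):

// A context-specific notion of equality and hashing, analogous to the way
// Comparator is a context-specific notion of ordering.
public interface Equator<T> {
    boolean eq(T left, T right);

    // Must agree with eq(): if eq(a, b) is true, then hash(a) == hash(b).
    int hash(T item);
}

A hash-based collection could accept an Equator the same way a TreeSet accepts a Comparator, making the equality context explicit instead of baked into the elements.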

Types of Equality

Referential equality is painfully limited (though doing better than that for comparing functions would be really hard). Universal equality (defining equals() and hashCode() on your class) is often good enough for most simple applications. Having to specify equality explicitly in every context would be too much work for little projects. So letting universal equality cover the base cases, but allowing context-relative equality to override it, seems like the ideal solution.

Inspiration

The inspiration for this post was my seriously flawed understanding of the theoretical discussion about "Set is not a functor" on twitter, between Daniel Spiewak (@djspiewak), Jonathan Sterling (@jonsterling), and Viktor Klang (@viktorklang). A sub-topic of mapping a set and comparing Sets and Maps came up.

Tuesday, July 9, 2013

Immutable Java: Lists and Other Collections

Lists

What happens to the UNSAFE_ZONES list in the following code when it is passed to displayZones()?

public static final List<String> UNSAFE_ZONES = Arrays.asList(
        "Africa/Cairo",
        "Africa/Johannesburg",
        "America/Anchorage",
        ...);

displayZones(UNSAFE_ZONES);

A method named displayZones(List zones) sounds pretty safe and it probably won't change the contents of the list you pass it. But unless you dig into that method very carefully, you can't know that for sure. Even if displayZones() is very simple and safe today, someone could use it to wrap third-party logic tomorrow, or make it pass your UNSAFE_ZONES to another method that changes it.

Maybe everything works perfectly, but you don't sleep well at night because someone could, at any time, intentionally or accidentally change your list. Maybe it's already happening somewhere, but it's caused by such a rare data condition that you just haven't noticed yet. Maybe you sleep well, but keeping track of every place that everything could possibly be modified in your application is using valuable thought-space, distracting you from more interesting problems.

Time zones don't change very often. This list probably only needs to be updated once or twice a year. If you make releases more often than that, there is no reason that a list like this needs to change *ever* except at compile time, or maybe when it is read from a file on application start-up. Any attempt by your program to change it on the fly is an error. So make it immutable.

Here is an immutable List in Java 5 or later.

Immutable List

public static final List<String> TIME_ZONES =
        Collections.unmodifiableList(Arrays.asList(
                "Africa/Cairo",
                "Africa/Johannesburg",
                "America/Anchorage",
                ...));

People always bring up concurrency in relation to immutable data structures, and immutability is a godsend with regard to concurrency. You can pass immutable objects and collections around freely between any number of threads without any locking, synchronization, defensive copies, or contention, which is a big win both in terms of performance and simplicity. But on a day-to-day level, even with a single thread, the huge benefit to this style of coding is that you aren't left wondering if something changed your collection, because that is just not possible.

Imagine now that you are the author of displayZones() and that you want to communicate to people who use your API that displayZones() will never modify the list that it is sent. One way to do this is to write a comment in the JavaDoc like, "I promise never to modify your list inside displayZones()." For this to be effective, people have to 1. read the JavaDoc and 2. believe you. If I wrote that, would you believe me? Heck, I wouldn't believe myself if I wrote it last month!

Wouldn't it be nice if the compiler would throw a nice fat warning if someone ever updated displayZones() to modify the list it was sent? Since TIME_ZONES is an ordered list and displayZones() will always display them in order, you can declare it to take an Iterable instead of a list:

public void displayZones(Iterable<String> zones) { ...

Since Iterable only provides a way to traverse the data in order (strictly speaking its Iterator has an optional remove() method, but accepting an Iterable clearly signals read-only intent), the caller knows that even if they pass you a modifiable list, they don't need to worry about you changing it. No defensive copies, no worries. They can pass any Collection, either mutable or immutable, and it won't matter.

There is still one benefit to passing an immutable collection. If you ever change displayZones to take a mutable List and the caller is already passing a mutable List, they won't get a compiler warning. Won't they be surprised when their list is modified! Immutability here future-proofs the caller's code. If the caller is passing a mutable list though, maybe it means they don't care if it is modified? In any case, unnecessary use of mutable data is a dangerous trap.

Other Collections

List is far from the only type of collection. Let's take a moment to look at what it's like to create an immutable Set or Map in Java (I think it works in Java 5 and 6 if you just fill in the empty <>s). If anyone knows a briefer/better way to do this, please leave a comment:

Immutable Set

private static final Set<String> validModes = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(MODE_PREADD, MODE_ADD,
                                    MODE_PREUPDATE, MODE_UPDATE)));

EnumSet is preferred for enums because it executes faster and has a briefer syntax. EnumSets are ordered according to their "natural ordering" which is the order in which the enum constants are declared.

private static final Set<Mode> validModes = Collections.unmodifiableSet(
        EnumSet.of(Mode.PREADD, Mode.ADD, Mode.PREUPDATE, Mode.UPDATE));

If you are using all values of an enum in order:

private static final Set<Mode> validModes = Collections.unmodifiableSet(
        EnumSet.allOf(Mode.class));

Immutable Map

private static final Map<String,String> shortNameHash;
static {
    Map<String,String> m = new HashMap<>();
    m.put(SERVER_NAME_DEMO, "demo");
    m.put(SERVER_NAME_UAT, "uat");
    m.put(SERVER_NAME_INTEGRATION, "integration");
    m.put(SERVER_NAME_DEV, "dev");
    shortNameHash = Collections.unmodifiableMap(m);
}

It is critical in the Map example that the temporary Map m be scoped inside a dedicated block so that it passes out of scope and can never be accessed by anything after the immutable version of it has been created. Map m remains mutable forever, and a lexical scope (the static block) is the simplest way to keep any user-accessible code from maintaining a direct reference to it.

Type Casting

As I've said before, it can be a pain to cast a collection in Java, especially compared to Scala. Unlike (invariant) generic collections, you can painlessly cast an array to its super-type. No need to suppress any unchecked or rawtypes warnings because this is just how arrays "work" (they are covariant). Fortunately, we happen to start with an array in most of the above examples. The example below shows an enum that implements an interface and provides an immutable list of its members (with the type of the interface):

public enum TimeFrame implements DropDownItemInterface {
    ...
    public static final List<DropDownItemInterface> ddiVals =
            Collections.unmodifiableList(Arrays.asList(
                (DropDownItemInterface[]) values()));

If you try casting the resulting list instead, you will see why I bothered to point this out (and why so many people prefer dynamic languages).

Effectiveness

Up to this point, Java is a little wordy, but effective. Where it really breaks down is that Iterable only covers the List case; Map, for instance, has no read-only super-interface at all. If you had a getShortName(Map m) method, there is no effective way to tell the caller that this method cannot modify the map you pass it. Google Guava falls short here too because its ImmutableMap data structure inherits from Map instead of the other way around. This needs to be fixed at the Java API level, or else people need to start importing from some new collections API instead of java.util.

Scala and Clojure both make their collections immutable by default. In Scala, the mutable and immutable versions of each collection share a common read-only super-type, rather than the mutable interface sitting at the top the way java.util's collections do. In either language, you could say getShortName(ImmutableMap m) (or similar) and have the benefits I outlined above with Iterable. Java could do this too, and I feel very strongly that it should.

The reason why Scala and Clojure collections can be immutable by default is that they are implemented to allow very lightweight copies to be made very quickly. The immutable collections in these languages still have add() and put() methods on them (or equivalent). They just return an entirely new collection that includes the modification, leaving the original untouched. In a hash-based collection, only the small part of the structure on the path to the change needs to be copied into the new collection. Everything else can be shared because the collection is immutable!

Java could allow the same kind of fast, shallow copies of collections, but it has the ball and chain of 15-year-old add() and put() methods that return a boolean or a previous value instead of the underlying collection. Because this pollutes the namespace for those methods, new method names (like append() and prepend()) would have to be introduced for all modification operations so that they could return an immutable, modified copy of the original collection. The old methods could be deprecated over time. This could lead to some short-term confusion, with people wondering why their immutable collection wasn't changed by an append() call, but I personally believe that it would be worth it in the long run.
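As a sketch of what such an API could look like (append() and prepend() are the names suggested above; the rest of the shape is my assumption, not a real java.util interface):

// A list whose "modification" methods return a new list and leave the original untouched.
public interface ImmutableList<E> extends Iterable<E> {
    int size();
    E get(int index);

    ImmutableList<E> append(E item);    // new list with item added at the end
    ImmutableList<E> prepend(E item);   // new list with item added at the front
}

Deprecating the boolean-returning add() in favor of methods like these would let Java collections share structure between versions the way Scala's and Clojure's do.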

Sometimes you need mutable data structures and by all means, use them. But when you don't, prefer immutability and you'll sleep better, think clearer, and write more robust code.