Thursday, June 12, 2014

Comparators

This post is part of a series on Comparing Objects.

Some things lend themselves to default ordering, such as sorting users by their alphabetized last names. Yet users are often sorted by other criteria, such as the number of purchases they make per year, the amount their account is in debit, or what time they clicked the "purchase" button during a limited promotion. To represent this in Java, simply define a comparator for each context:

enum UserByPromoClick implements Comparator<User> {
    INSTANCE;
    public int compare(User left, User right) {
        ...
    }
}

enum UserByPurchases implements Comparator<User> {
    ...

enum UserByDebit implements Comparator<User> {
    ...

Implementation note: compare(left, right) and compareTo(other) are both instance methods, which means you need an instance of the appropriate Comparator in order to call compare(). Most comparators are stateless and immutable. This means they should be implemented as enums with a single instance that is shared and reused by all clients.

Various sort orders are often combined. Two accounts equally in debit could be sorted to show up next to each other on an expense report. Their names would only be used as the "tie breaker" in this ordering.

public int compare(User left, User right) {
    // Assuming getPromoClickMs returns Integer.MAX_VALUE
    // if the user didn't click on the promo.
    if (left.getPromoClickMs() > right.getPromoClickMs()) { return 1; }
    if (left.getPromoClickMs() < right.getPromoClickMs()) { return -1; }

    // In a tie, sort by name instead.
    return UserByName.INSTANCE.compare(left, right);
}
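To make the enum-singleton and tie-breaker ideas concrete, here is a minimal runnable sketch. DemoUser, DemoUserByName, DemoUserByPromoClick, and ComparatorDemo are hypothetical names invented for this illustration; the real User class is not shown in this post, so only two assumed accessors are modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for User with only the accessors used below.
final class DemoUser {
    private final String lastName;
    private final int promoClickMs;

    DemoUser(String lastName, int promoClickMs) {
        this.lastName = lastName;
        this.promoClickMs = promoClickMs;
    }

    String getLastName() { return lastName; }
    int getPromoClickMs() { return promoClickMs; }
}

// Stateless comparator as a single-instance enum.
enum DemoUserByName implements Comparator<DemoUser> {
    INSTANCE;

    @Override
    public int compare(DemoUser left, DemoUser right) {
        return left.getLastName().compareTo(right.getLastName());
    }
}

enum DemoUserByPromoClick implements Comparator<DemoUser> {
    INSTANCE;

    @Override
    public int compare(DemoUser left, DemoUser right) {
        if (left.getPromoClickMs() > right.getPromoClickMs()) { return 1; }
        if (left.getPromoClickMs() < right.getPromoClickMs()) { return -1; }
        // In a tie, delegate to the shared name comparator instance.
        return DemoUserByName.INSTANCE.compare(left, right);
    }
}

final class ComparatorDemo {
    // Returns last names in promo-click order, with ties broken by name.
    static List<String> sortedNames(List<DemoUser> users) {
        List<DemoUser> copy = new ArrayList<>(users);
        copy.sort(DemoUserByPromoClick.INSTANCE);
        List<String> names = new ArrayList<>();
        for (DemoUser u : copy) { names.add(u.getLastName()); }
        return names;
    }

    public static void main(String[] args) {
        // Smith and Jones tie on click time, so the name decides their order.
        System.out.println(sortedNames(Arrays.asList(
                new DemoUser("Smith", 500),
                new DemoUser("Jones", 500),
                new DemoUser("Adams", 900))));
        // prints [Jones, Smith, Adams]
    }
}
```

Note that the tie-breaker is reached through INSTANCE, because compare() is an instance method even on an enum.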

The implementation of Objects.equals(left, right) performs a referential equality check for speed and to ensure that each object is equal to itself. This has the side effect that Objects.equals(null, null) returns true, which makes some sense: undefined is undefined. But should we ever compare anything to null? For a while I thought, "What harm could come from just sorting nulls last?"

What harm indeed! Really, there is no meaningful way to compare a value to null because null represents "undefined." Well, isGreaterThan(other) can meaningfully return false if other is null, but the general-purpose compare(left, right) method cannot. It must return less-than (any negative int), greater-than (any positive int), or equal (0). The only rational thing to do if either parameter to the compare(a, b) method is null is to throw an exception.

if (left == null || right == null) {
    throw new IllegalArgumentException("Can't compare nulls");
}
if (left == right) { return 0; } // Same instance, so trivially equal

If your class can be represented as an integer, compare() could be implemented as a simple subtraction. Just keep in mind that if the result of the subtraction could fall outside the range Integer.MIN_VALUE to Integer.MAX_VALUE, it wraps around and gives you a result with the wrong sign. If that could happen, you must use greater-than/less-than comparisons instead of subtraction.

int ret = left.getPromoClickMs() - right.getPromoClickMs();
if (ret != 0) { return ret; }

// In a tie, sort by name instead.
return UserByName.INSTANCE.compare(left, right);
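The overflow risk with subtraction is easy to reproduce. The sketch below uses a hypothetical SubtractionOverflowDemo class with arbitrarily chosen values and contrasts subtraction with the standard Integer.compare() method (available since Java 7), which never overflows.

```java
final class SubtractionOverflowDemo {
    // Subtraction-based comparison: wrong sign when the difference overflows.
    static int compareBySubtraction(int left, int right) {
        return left - right;
    }

    // Overflow-safe comparison from the standard library.
    static int compareSafely(int left, int right) {
        return Integer.compare(left, right);
    }

    public static void main(String[] args) {
        int small = -2_000_000_000;
        int large = 2_000_000_000;
        // The true difference is -4,000,000,000, which does not fit in an int,
        // so the subtraction wraps around to a positive number.
        System.out.println(compareBySubtraction(small, large) > 0); // prints true (wrong order)
        System.out.println(compareSafely(small, large) < 0);        // prints true (correct order)
    }
}
```

If both values are known to be non-negative, as with a click time in milliseconds, the subtraction cannot overflow, but Integer.compare() costs nothing extra and removes the need to reason about it.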

You can have as many repetitions of this kind of thing as you want:

ret = Xxxxx.compare(this, that);
if (ret != 0) { return ret; }

Persistence and Proxy Objects

When implementing a Comparator (as opposed to Comparable), neither object can be trusted to be initialized, so getters should be used instead of direct access to instance fields on both objects. If you implement your Comparator as a separate class and your fields are private, the compiler will enforce this for you. But for Comparable and for Comparators implemented in the same class file as the object they are comparing, the compiler will NOT warn you, so you must be very careful.

Sunday, September 29, 2013

Refactoring in Java, Scala, and Clojure

Update: Attend my presentation on this post at DevNexus February 25th 2014

Motivation

I don't use the words "strongly-typed" or "dynamic" much in this post, but thinking about the relative costs and benefits of type safety was the primary inspiration behind it. You may want to keep your own mental tally of how type-safety helps and how it hurts in the following examples. Because type-safety means so many different things in different languages, I can only assess its practical merits (or detriments) in the context of real-world examples - hence this post. But maybe after looking at these examples, you can infer general principles from them?

I personally find type safety to be useful at work because of the person-years of development that went into the system that I work on most. But I am not trying to convert people to type-safety, or away from it. My goal is to make the issues more visible so that we can all write better code more easily in the future. When I gave this talk to the Asheville Coders League, I felt some measure of satisfaction that one person told me afterwards that they were going to look into Clojure and another that they would look into Scala.

Problem

When I change an interface in Java, I then have to update all its implementations, which can be difficult and time consuming. Making a wrapper class is often even more time consuming. This pain in the neck is sometimes called "The Expression Problem" and it makes a good example for comparing these three languages.

For this comparison, I will use a class that models a Year/Month combination and a function, "addMonths" that takes the number of months to add (positive or negative) and returns a new YearMonth. Originally we used a data structure with 2 fields (year and month), but it complicated the database queries for ranges of months, so we changed it to a single int of the format YyyyMm. Now we can use > and < to compare YyyyMm fields in our SQL queries - and the raw data is still human readable.

Despite wild claims that Object Oriented Programming is all about mutation, I'm going to use immutable classes in all three languages (a few mutable local variables are used, but never exposed outside of the function that declares them).

Full source examples from this article are available on GitHub both before (xxxx1) and after (xxxx2) refactoring.

Here's the original Java interface, translated into Scala and Clojure:

Original Interface

Java Interface

public interface YearMonthInterface {
    public int getYear();
    public int getMonth();
}

Scala Trait

Scala has traits instead of interfaces. A Scala trait can include implementations (but not constructors) as we'll see later.

trait YearMonthTrait {
  def year:Int
  def month:Int
}

Clojure

No interface is necessary, but the compiler won't tell you if you fail to match your data to your functions. Protocols could be used, but that's probably not typical and it's certainly not needed for this simple example.

Base YearMonth Implementation

A few simple tests should provide the best overview:

Tests of Base Implementation

// Java
YearMonth.addMonths(YearMonth.of(2013, 7), 2);
// 2013-9
YearMonth.addMonths(YearMonth.of(2012, 12), 1);
// 2013-1
YearMonth.addMonths(YearMonth.of(2013, 1), -1);
// 2012-12

// Scala
YearMonth.addMonths(YearMonth(2013, 7), 2)
// YearMonth(2013,9)
YearMonth.addMonths(YearMonth(2012, 12), 1)
// YearMonth(2013,1)
YearMonth.addMonths(YearMonth(2013, 1), -1)
// YearMonth(2012,12)

;; Clojure
(addMonths {:year 2013, :month 7} 2)
;; {:year 2013, :month 9}
(addMonths {:year 2012, :month 12} 1)
;; {:year 2013, :month 1}
(addMonths {:year 2013, :month 1} -1)
;; {:year 2012, :month 12}

Java Class

public final class YearMonth implements YearMonthInterface {
    private final int year;
    private final int month;
    
    private YearMonth(int y, int m) { year = y; month = m; }

    public static YearMonth of(int y, int m) {
      if (m > 12) {
          // convert to zero-based months for math
          m--;
          // Carry any extra months over to the year
          y = y + (m / 12);
          // Adjust month to be within one year
          m = m % 12;
          // convert back to one-based months
          m++;
      } else if (m < 1) {
          // Carry any extra months over to the year, but the first year
          // in this case is still year-1
          y = y + (m / 12) - 1;
          // Adjust negative month to be within one year.
          // To get the positive month, subtract it from 12
          m = 12 + (m % 12);
      }
      return new YearMonth(y, m);
    }
    
    @Override
    public int getYear() { return year; }
    
    @Override
    public int getMonth() { return month; }
    
    public static YearMonth addMonths(YearMonthInterface ym,
                                      int addedMonths) {
        return of(ym.getYear(), ym.getMonth() + addedMonths);
    }
    
    @Override
    public String toString() {
        return new StringBuilder().append(year).append("-")
                                  .append(month).toString();
    }
}
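For comparison, the month normalization inside of() can be sketched more compactly with Math.floorDiv() and Math.floorMod() (both available since Java 8), which round toward negative infinity and so handle the negative case without a separate branch. This is an alternative formulation, not the code from the original project; MonthMath and normalize() are hypothetical names.

```java
final class MonthMath {
    // Returns {year, month} with month normalized into the range 1..12.
    static int[] normalize(int year, int month) {
        // Convert to zero-based months for the math.
        int zeroBased = month - 1;
        // floorDiv rounds toward negative infinity, so month 0 carries -1 year.
        int y = year + Math.floorDiv(zeroBased, 12);
        // floorMod is never negative for a positive divisor; convert back to one-based.
        int m = Math.floorMod(zeroBased, 12) + 1;
        return new int[] { y, m };
    }

    public static void main(String[] args) {
        int[] a = normalize(2012, 13);
        System.out.println(a[0] + "-" + a[1]); // prints 2013-1
        int[] b = normalize(2013, 0);
        System.out.println(b[0] + "-" + b[1]); // prints 2012-12
    }
}
```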

Scala Case Class and Companion Object

A case class in Scala generates the equivalent of the boilerplate written by hand in the Java version (fields, accessors, and toString). Scala does not have static methods. Instead, everything Java would call a "static" method goes in the companion object in Scala. A companion object is a singleton instance with the same name and in the same file as the class it belongs to.

case class YearMonth(override val year:Int,
                     override val month:Int) extends YearMonthTrait

object YearMonth {
  def addMonths(ym:YearMonthTrait, addedMonths:Int):YearMonth = {
    val newMonth = ym.month + addedMonths
    if (newMonth > 12) {
      // convert to zero-based months for math
      val m = newMonth - 1
      // Carry any extra months over to the year
      new YearMonth(ym.year + (m / 12), (m % 12) + 1)
    } else if (newMonth < 1) {
      // Carry any extra months over to the year, but the
      // first year in this case is still year-1
      val y = ym.year + (newMonth / 12) - 1
      // Adjust negative month to be within one year.
      // To get the positive month, subtract it from 12
      val m = 12 + (newMonth % 12)
      new YearMonth(y, m)
    } else {
      new YearMonth(ym.year, newMonth)
    }
  }
}

Clojure Function

Instead of declaring data types as classes, Clojure prefers to use immutable maps (hash maps). So we skip all data definition steps above and write a function that assumes a map with certain keys. These keys are analogous to the fields in Java and Scala.

(defn addMonths [ym, addedMonths]
      (let [newMonth (+ (:month ym) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :year (+ (:year ym) (quot m 12)),
                                   :month (+ (rem m 12) 1)))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ (:year ym) (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :year y :month m))
                 :else (assoc ym :month newMonth))))

Add an Implementing Class

Here we add a second implementing class that contains an additional field.

Test Implementing Class

// Java
YearMonth.addMonths(MonthlyA.of("One", 2013, 7), 2);
// 2013-9

// Scala
YearMonth.addMonths(MonthlyA("One", 2013, 7), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:otherField1 "One", :year 2013, :month 7} 2)
;; {:otherField1 "One", :year 2013, :month 9}

Java

public class MonthlyA implements YearMonthInterface {
    private final String otherField1;
    private final int year;
    private final int month;

    private MonthlyA(String s, int y, int m) {
        otherField1 = s; year = y; month = m;
    }

    public static MonthlyA of(String s, int y, int m) {
        return new MonthlyA(s, y, m);
    }

    public String getOtherField1() { return otherField1; }

    @Override
    public int getYear() { return year; }

    @Override
    public int getMonth() { return month; }
}

Scala

case class MonthlyA(otherField1:String,
                    override val year:Int,
                    override val month:Int) extends YearMonthTrait

Clojure

No custom data structure is needed because Clojure leverages a Map.

Change Data Representation From Year & Month to YyyyMm

Test

// Java
YearMonth.addMonths(YearMonth.of(201307), 2);
// 2013-9

// Scala
YearMonth.addMonths(YearMonth(201307), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:yyyyMm 201307} 2)
;; {:yyyyMm 201309}

Java

Java requires that you manually update all the old code to be compatible with the new data format. While I'm at it, I'm going to add a convenience static factory method to the base implementation that takes the new data format.

public interface YearMonthInterface {

    ... old methods unchanged ...

    // New! @return yyyyMm or YearMonth.of(year, month).getYyyyMm()
    public int getYyyyMm();
}

public class YearMonth implements YearMonthInterface {

    ... old methods unchanged ...

    // New!
    public static YearMonth of(int yyyyMm) {
        return new YearMonth(yyyyMm / 100, yyyyMm % 100);
    }

    // New!
    @Override
    public int getYyyyMm() {
        return (year * 100) + month;
    }
}

public class MonthlyA implements YearMonthInterface {

    ... old methods unchanged ...

    // New!
    @Override
    public int getYyyyMm() {
        return YearMonth.of(year, month).getYyyyMm();
    }
}

Scala

Scala lets you add your implementation logic right in the trait instead of touching any implementing classes, but I'm going to add a new factory method to the base class that accepts the new data format the same way I did in Java.

Additional constructors in Scala are implemented as factory methods in the companion object. apply() is the method Scala calls when you invoke an object as if it were a function, so you don't need to name it in your client code. You can use it just like a normal factory/constructor (except for constructor pattern matching) as shown in the Scala test example above.

trait YearMonthTrait {

    ... old methods unchanged ...

  // New!
  def yyyyMm:Int = (year * 100) + month
}

object YearMonth {
  // Add yyyyMm factory method to the YearMonth companion object
  // This is like the extra "of" method we just added to the Java version.
  def apply(yyyyMm:Int) = new YearMonth((yyyyMm / 100), (yyyyMm % 100))

    ... old methods unchanged ...

}

Clojure

I found it easiest/clearest to add two conversion methods to make the Clojure function handle both the new and old data formats.

(defn ymToOld [ym] (dissoc (assoc ym :year (quot (:yyyyMm ym) 100)
                                     :month (rem (:yyyyMm ym) 100))
                           :yyyyMm))

(defn ymToNew [ym] (dissoc (assoc ym :yyyyMm (+ (* (:year ym) 100)
                                                (:month ym)))
                           :year :month))

(defn addMonths [ym, addedMonths]
      (if (contains? ym :yyyyMm)
                     (ymToNew (addMonths (ymToOld ym), addedMonths))
          (let [newMonth (+ (:month ym) addedMonths)]

               ... same code from before ...

Add a New Implementing Class Using the New Data Format

Now that all the old code is working with the new data format, we can add a new class, MonthlyB, that makes use of the new format internally.

Test New Class

All the old tests pass, even with the new code. I am only showing a few new tests for brevity.

// Java
YearMonth.addMonths(MonthlyB.of(1.1, 201307), 2);
// 2013-9

// Scala
YearMonth.addMonths(MonthlyB(1.1, 201307), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:otherField2 1.1 :yyyyMm 201307} 2)
;; {:otherField2 1.1, :yyyyMm 201309}

Java

In Java, we have to manually add support for the old data format to the new class.

public class MonthlyB implements YearMonthInterface {
    private final double otherField2;
    private final int yyyyMm;

    private MonthlyB(double d, int yyM) {
        otherField2 = d; yyyyMm = yyM;
    }

    public static MonthlyB of(double d, int yyM) {
        return new MonthlyB(d, yyM);
    }

    public double getOtherField2() { return otherField2; }

    @Override
    public int getYear() { return yyyyMm / 100; }

    @Override
    public int getMonth() { return (yyyyMm % 100); }

    @Override
    public int getYyyyMm() { return yyyyMm; }
}

Scala

In Scala, an adapter trait makes any new classes play nicely with the old code. It only needs to be specified once and can be "mixed in" to as many new classes as necessary.

trait YearMonthNew extends YearMonthTrait {
  def yyyyMm:Int
  def year:Int = yyyyMm / 100
  def month:Int = (yyyyMm % 100)
}

case class MonthlyB(otherField2:Double,
                    override val yyyyMm:Int) extends YearMonthNew

Clojure

Clojure does not specify data types.

Additional Considerations

Compiling Scala is slow, taking 2-3x as long as Java. sbt, on the other hand, is extremely clever, maximizing processor usage and deciding not to compile everything unless it needs to, so compiling Scala with sbt may effectively be faster than compiling Java with Ant.
Compiled Clojure code executes about 50% slower than comparable Scala/Java code.
Both Scala and Clojure require a few small jar files, which hold their specific APIs, in order to compile.

Conclusions

All three languages got the job done. All three required specifying the logic for data transformation - the addMonths function. Data structures (user-defined types) showed the biggest difference between the three languages.

In Java, more work is spent defining and updating the types than defining functions that work on them. In Scala, the up-front work of defining types is small and elegant; a minor distraction from the "real work" of transforming that data. In Clojure, all attention is placed on the functions while data structures almost disappear altogether. This is a very beautiful and fast way to code that yields a very simple system, but it lacks the safety guarantee that type-safety gives the other languages. Comprehensive unit test coverage can mitigate this risk, but that is another form of complexity with its own maintenance cost.

In the beginning, Java solved virtually every major issue with C++ and created the JVM which these other two languages are built on. But it's showing its age. I really have trouble finding a situation where Java would win. I suppose if a small jar file size was critical... Really, the biggest advantage Java has over Scala is faster compile times. Maybe if you write in Clojure, you could use small amounts of Java for performance in critical areas? Still, I'd rather use Scala for that than Java.

Both Scala and Clojure seem to eliminate a lot of work that is required in Java, though they take fundamentally different approaches to doing so.

Tuesday, July 9, 2013

Immutable Java: Lists and Other Collections

Update 2017-09-14

While the techniques in this article still work, I prefer to use Paguro. Here's how:

List

public static final List<String> UNSAFE_ZONES =
        vec("Africa/Cairo",
            "Africa/Johannesburg",
            "America/Anchorage");

Set

private static final Set<String> validModes =
        set(MODE_PREADD, MODE_ADD,
            MODE_PREUPDATE, MODE_UPDATE);

Map

private static final Map<String,String> shortNameHash =
        map(tup(SERVER_NAME_DEMO, "demo"),
            tup(SERVER_NAME_UAT, "uat"),
            tup(SERVER_NAME_INTEGRATION, "integration"),
            tup(SERVER_NAME_DEV, "dev"));

Paguro

The above have many advantages over the original suggestions in this article:

Brevity and Clarity - no extra junk, just declare your collection.
No static initializer blocks for maps. If you've ever created a dependency loop inside these blocks you'll appreciate this.
Immutability - These Paguro collections are just as safe as Collections.unmodifiables. You can't modify them in place, but you can modify them, producing an entirely new collection with items added, removed, or changed. Java's unmodifiable collections require copying the entire collection in order to change it - an O(n) operation. Paguro collections are designed to copy-on-write with maximum sharing between versions of a collection, so that modifications are O(log n) (usually with a high base, often approaching O(1)). Paguro also provides mutable builders to create immutable collections even faster (if you care about that).

Original Post

Lists

What happens to the UNSAFE_ZONES list in the following code when it is passed to displayZones()?

public static final List<String> UNSAFE_ZONES = Arrays.asList(
        "Africa/Cairo",
        "Africa/Johannesburg",
        "America/Anchorage",
        ...);

displayZones(UNSAFE_ZONES);

A method named displayZones(List zones) sounds pretty safe and it probably won't change the contents of the list you pass it. But unless you dig into that method very carefully, you can't know that for sure. Even if displayZones() is very simple and safe today, someone could use it to wrap third-party logic tomorrow, or make it pass your UNSAFE_ZONES to another method that changes it.

Maybe everything works perfectly, but you don't sleep well at night because someone could, at any time, intentionally or accidentally change your list. Maybe it's already happening somewhere, but it's caused by such a rare data condition that you just haven't noticed yet. Maybe you sleep well, but keeping track of every place that everything could possibly be modified in your application is using valuable thought-space, distracting you from more interesting problems.

Time zones don't change very often. This list probably only needs to be updated once or twice a year. If you make releases more often than that, there is no reason that a list like this needs to change *ever* except at compile time, or maybe when it is read from a file on application start-up. Any attempt by your program to change it on the fly is an error. So make it immutable.

Here is an immutable List in Java 5 or later.

Immutable List

public static final List<String> TIME_ZONES =
        Collections.unmodifiableList(Arrays.asList(
                "Africa/Cairo",
                "Africa/Johannesburg",
                "America/Anchorage",
                ...));

People always bring up concurrency with relation to immutable data structures, and immutability is a godsend with regard to concurrency. You can pass immutable objects and collections around freely between any number of threads without any locking, synchronization, defensive copies, or contention which is a big win both in terms of performance and simplicity. But on a day-to-day level, even with a single thread, the huge benefit to this style of coding is that you aren't left wondering if something changed your collection, because that is just not possible.
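A small sketch of what "not possible" means in practice: the wrapper returned by Collections.unmodifiableList() rejects every mutation with an UnsupportedOperationException, whereas a bare Arrays.asList() list still allows set(). UnmodifiableListDemo and rejectsSet() are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class UnmodifiableListDemo {
    static final List<String> ZONES =
            Collections.unmodifiableList(Arrays.asList(
                    "Africa/Cairo",
                    "Africa/Johannesburg",
                    "America/Anchorage"));

    // Returns true if the list refused to replace its first element.
    static boolean rejectsSet(List<String> list) {
        try {
            list.set(0, "Europe/Paris");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Arrays.asList() alone is fixed-size but still lets callers overwrite elements.
        System.out.println(rejectsSet(Arrays.asList("a", "b"))); // prints false
        // The unmodifiable wrapper refuses the same call.
        System.out.println(rejectsSet(ZONES));                   // prints true
    }
}
```

The failure happens at run time rather than compile time, which is the limitation the rest of this post discusses.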

Imagine now that you are the author of displayZones() and that you want to communicate to people who use your API that displayZones() will never modify the list that it is sent. One way to do this is to write a comment in the JavaDoc like, "I promise never to modify your list inside displayZones()." For this to be effective, people have to 1. read the JavaDoc and 2. believe you. If I wrote that, would you believe me? Heck, I wouldn't believe myself if I wrote it last month!

Wouldn't it be fantastic if the compiler would throw a nice fat warning if someone ever updated displayZones() to modify the list it was sent? Unfortunately (thank you alexandroid) even Iterator has a remove() method, and even in Java 8, streams have an iterator() method, so there is no safe interface to pass to a function.

// Unsafe - iterator has the remove() method
public void displayZones(Iterable zones) { ...

If Iterable provided only read-only methods for traversing it in order, the caller could pass a modifiable list without worrying about you changing it, because you couldn't. No defensive copies, no worries. They could pass any Collection, either mutable or immutable, and it wouldn't matter. Alas, this is not the case.
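To see why Iterable is not a safe parameter type, consider this sketch. IterableRemoveDemo and sneakyDisplay() are hypothetical names; the method signature looks read-only, yet it empties the caller's list through Iterator.remove().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class IterableRemoveDemo {
    // Looks harmless from the signature, but removes every element it visits.
    static void sneakyDisplay(Iterable<String> zones) {
        Iterator<String> it = zones.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
            it.remove();
        }
    }

    public static void main(String[] args) {
        List<String> zones = new ArrayList<>(Arrays.asList(
                "Africa/Cairo", "Africa/Johannesburg"));
        sneakyDisplay(zones);
        System.out.println(zones.isEmpty()); // prints true
    }
}
```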

There is still one benefit to passing an immutable collection. If you ever change displayZones to modify the List it receives and the caller is already passing a mutable List, they won't get a compiler warning. Won't they be surprised when their list is modified! Immutability here future-proofs the caller's code. If the caller is passing a mutable list though, maybe it means they don't care if it is modified? In any case, unnecessary use of mutable data is a dangerous trap.

Other Collections

List is far from the only type of collection. Let's take a moment to look at what it's like to create an immutable Set or Map in Java (I think it works in Java 5 and 6 if you just fill in the empty <>s). If anyone knows a briefer/better way to do this, please leave a comment:

Immutable Set

private static final Set<String> validModes = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(MODE_PREADD, MODE_ADD,
                                    MODE_PREUPDATE, MODE_UPDATE)));

EnumSet is preferred for enums because it executes faster and has a briefer syntax. EnumSets are ordered according to their "natural ordering" which is the order in which the enum constants are declared.

private static final Set<Mode> validModes = Collections.unmodifiableSet(
        EnumSet.of(Mode.PREADD, Mode.ADD, Mode.PREUPDATE, Mode.UPDATE));

If you are using all values of an enum in order:

private static final Set<Mode> validModes = Collections.unmodifiableSet(
        EnumSet.allOf(Mode.class));

Immutable Map

private static final Map<String,String> shortNameHash;
static {
    Map<String,String> m = new HashMap<>();
    m.put(SERVER_NAME_DEMO, "demo");
    m.put(SERVER_NAME_UAT, "uat");
    m.put(SERVER_NAME_INTEGRATION, "integration");
    m.put(SERVER_NAME_DEV, "dev");
    shortNameHash = Collections.unmodifiableMap(m);
}

It is critical in the Map example that the temporary Map m be scoped inside a dedicated block so that it passes out of scope and can never be accessed by anything after the immutable version of it has been created. Map m remains mutable forever, and a lexically scoped block is the simplest way to keep any user-accessible code from maintaining a direct reference to it.
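The same scoping guarantee can be had from a private static helper method instead of a static initializer block; the mutable map is a local variable and never escapes. This is only a sketch of one alternative: ServerNames, buildShortNames(), and the key strings are hypothetical stand-ins for the SERVER_NAME_* constants, whose values are not shown in this post.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ServerNames {
    static final Map<String,String> SHORT_NAMES = buildShortNames();

    private static Map<String,String> buildShortNames() {
        // m is local to this method, so nothing can keep a reference to the mutable map.
        Map<String,String> m = new HashMap<>();
        m.put("demo.example.com", "demo");
        m.put("uat.example.com", "uat");
        m.put("integration.example.com", "integration");
        m.put("dev.example.com", "dev");
        return Collections.unmodifiableMap(m);
    }

    public static void main(String[] args) {
        System.out.println(SHORT_NAMES.get("uat.example.com")); // prints uat
    }
}
```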

Type Casting

As I've said before, it can be a pain to cast a collection in Java, especially compared to Scala. Unlike (invariant) generic collections, you can painlessly cast an array to its super-type. No need to suppress any unchecked or rawtypes warnings because this is just how arrays "work" (they are covariant). Fortunately, we happen to start with an array in most of the above examples. The example below shows an enum that implements an interface and provides an immutable list of its members (with the type of the interface):

public enum TimeFrame implements DropDownItemInterface {
    ...
    public static final List<DropDownItemInterface> ddiVals =
            Collections.unmodifiableList(Arrays.asList(
                (DropDownItemInterface[]) values()));

If you try casting the resulting list instead, you will see why I bothered to point this out (and why so many people prefer dynamic languages).

Effectiveness

Up to this point, Java is a little wordy, but effective. Where it really breaks down is that List is the only collection which extends an immutable interface. If you had a getShortName(Map m) method, there is no effective way to tell the caller that this method cannot modify the map you pass it. Google Guava falls short here too because its ImmutableMap data structure inherits from Map instead of the other way around. This needs to be fixed at the Java API level, or else, people need to start importing from some new collections API instead of java.util.

Scala and Clojure both make all their collections immutable by default. The mutable version of each type of collection is a sub-class of the immutable one. In either language, you could say getShortName(ImmutableMap m) (or similar) and have the benefits I outlined above. Java could do this too, and I feel very strongly that they should.

The reason why Scala and Clojure collections can be immutable by default is that they are implemented to allow very lightweight copies to be made very quickly. The immutable collections in these languages still have add() and put() methods on them (or equivalent). They just return an entirely new collection which includes the modification of the old one. In a hash-table based collection, only the hash bucket which is changed even needs to be copied over to the new collection. The other buckets can be shared because the collection is immutable!
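The idea of sharing structure between versions can be sketched with something much simpler than a hash table: an immutable singly-linked stack. This is not how the Scala or Clojure collections are implemented; PersistentStack is a hypothetical class chosen only because it shows "modification" returning a new collection that reuses the old one without copying it.

```java
final class PersistentStack<T> {
    private static final PersistentStack<Object> EMPTY =
            new PersistentStack<Object>(null, null, 0);

    private final T head;
    private final PersistentStack<T> tail;
    private final int size;

    private PersistentStack(T head, PersistentStack<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    // The empty stack holds no elements, so one instance can serve every type.
    @SuppressWarnings("unchecked")
    static <T> PersistentStack<T> empty() {
        return (PersistentStack<T>) EMPTY;
    }

    // Returns a new stack; this one is untouched and becomes the new stack's tail.
    PersistentStack<T> prepend(T item) {
        return new PersistentStack<>(item, this, size + 1);
    }

    T head() { return head; }
    PersistentStack<T> tail() { return tail; }
    int size() { return size; }

    public static void main(String[] args) {
        PersistentStack<String> one = PersistentStack.<String>empty().prepend("Plus");
        PersistentStack<String> two = one.prepend("Minus");
        System.out.println(one.size());        // prints 1 (the original is unchanged)
        System.out.println(two.size());        // prints 2
        System.out.println(two.tail() == one); // prints true (shared, not copied)
    }
}
```

Because nothing can ever change the original stack, handing it to the new one as its tail is safe, and prepend() runs in constant time regardless of size.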

Java could allow the same kind of fast, shallow copies of collections, but it has the ball and chain of 15-year-old add() and put() methods that return a boolean value instead of the underlying collection. Because this pollutes the namespace for those methods, new method names (like append() and prepend()) would have to be made for all modification operations so that they could return an immutable modified copy of the original collection. The old methods could be deprecated over time. This could lead to some short-term confusion, with people wondering why their immutable collection wasn't changed by an append() call, but I personally believe that it would be worth it in the long run.

Sometimes you need mutable data structures and by all means, use them. But when you don't, prefer immutability and you'll sleep better, think clearer, and write more robust code.

Monday, July 1, 2013

Using Scala Collections (or Guava) from Java

I've been finding more and more reasons to use Scala instead of Java, but migrating a large existing project (e.g. at work) is hard and takes time. So I've been looking into ways to use more and more Scala-type-thinking in Java. Specifically, I prefer immutability for simplicity's sake (it's not just for multithreading any more).

Here's a quick test-run of creating an immutable map the way I would in Scala, but doing it in Java. I had to use 6 initial elements to make this fair because the Guava Map has a 5-element constructor (Scala has a 4-element constructor). It amuses me that the plain Java solution, the one that is most deeply rooted in imperative programming, is the only one to explicitly rely on a lexically scoped block to ensure immutability of the underlying elements in the map.

// Scala
val OP_MAP = Map(("+", "Plus"),
                 ("-", "Minus"),
                 ("*", "Times"),
                 ("/", "Divided By"),
                 ("%", "Modulo"),
                 ("^", "To the power of"))

// Google Guava
private static final ImmutableMap<String,String> OP_MAP =
        new ImmutableMap.Builder<String,String>()
                .put("+", "Plus")
                .put("-", "Minus")
                .put("*", "Times")
                .put("/", "Divided By")
                .put("%", "Modulo")
                .put("^", "To the power of")
                .build();

// Java
private static final Map<String,String> OP_MAP;
static {
    Map<String,String> m = new HashMap<>();
    m.put("+", "Plus");
    m.put("-", "Minus");
    m.put("*", "Times");
    m.put("/", "Divided By");
    m.put("%", "Modulo");
    m.put("^", "To the power of");
    OP_MAP = Collections.unmodifiableMap(m);
}

// Scala collections from Java (based on decompiling the above Scala example)
@SuppressWarnings({"unchecked", "rawtypes"})
private static final WrappedArray<Tuple2<String,String>> wa =
        Predef.wrapRefArray(
                (Tuple2<String,String>[]) new Tuple2[] {
                        new Tuple2<>("+", "Plus"),
                        new Tuple2<>("-", "Minus"),
                        new Tuple2<>("*", "Times"),
                        new Tuple2<>("/", "Divided By"),
                        new Tuple2<>("%", "Modulo"),
                        new Tuple2<>("^", "To the power of") });

@SuppressWarnings("unchecked")
private static final scala.collection.immutable.Map<String,String> OP_MAP =
        (scala.collection.immutable.Map<String,String>)
                scala.Predef$.MODULE$.Map().apply(wa);

Conclusions

Wow. Using Scala from Java can get ugly! Of course, if I were willing to use a mutable Map, the Scala conversion would be simple and painless. But immutability was my goal.

Guava's Builder makes it slightly more elegant than the plain Java solution. But I often found it useful to have a collection that is the same as another with just one additional element. Unfortunately, there is no put() method on Map that returns a new immutable Map. No union(), intersect(), or difference() set operations. There is no forEach(), flatMap(), foldLeft(), foldRight(), map() or other internal iterator.
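A minimal sketch of what such a put() could look like in plain Java. The class name ImMaps and the method name plus() are hypothetical (they are not part of the JDK or Guava), and the helper copies the whole map on every call, so it costs O(n) where Scala's immutable maps share structure:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the JDK or Guava.
final class ImMaps {
    private ImMaps() { }

    /** Returns a new unmodifiable map holding everything in base plus one more entry. */
    static <K,V> Map<K,V> plus(Map<K,V> base, K key, V value) {
        Map<K,V> copy = new HashMap<K,V>(base); // copy, so base is never touched
        copy.put(key, value);
        return Collections.unmodifiableMap(copy);
    }
}
```

Because the result wraps a private copy, nobody holds a reference to the mutable HashMap underneath, which is the same trick the static initializer above relies on.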

While Guava is the winner at solving this particular problem, it's only a baby-step forward from the straight Java solution. I see that a lot of thought was put into the copyOf() method of Guava collections, but I'd much prefer internal iterators etc. that work together as an immutable API as described above. Let me know what I'm missing in the comments.

Sadly, Java 8 seems to only reinforce the assumption of mutability in their new interfaces. All the new iterators modify the underlying collection instead of returning a shallowly modified copy the way the Scala collections do. I'll probably be sticking with the straight-forward Java solution until I make the move to Scala. I might be tempted to wrap some Scala collections in Java-friendly classes that do all of the above that I mentioned, but of course, that's work...

Thanks to the folks on Stack Overflow for their invaluable help decompiling the Scala code!

Saturday, June 1, 2013

Shortcut for creating a typesafe List or Set in Java

Given a method:

weekPlanner(List<DayOfWeek> dowl)

I've been writing the following code to pass a typesafe list to it:

List<DayOfWeek> favDays = new ArrayList<>();
favDays.add(DayOfWeek.FRIDAY);
favDays.add(DayOfWeek.SATURDAY);
favDays.add(DayOfWeek.SUNDAY);
weekPlanner(favDays);

Today I finally realized that I can do this:

weekPlanner(Arrays.asList(DayOfWeek.FRIDAY,
                          DayOfWeek.SATURDAY,
                          DayOfWeek.SUNDAY));

It even says so in the JavaDoc for Arrays.asList(), but who knew? You can also wrap your new list in Collections.unmodifiableList()! I was very excited about using this to make an ImmutableList class so that I could make a call like the following:

weekPlanner(ImList.of(DayOfWeek.FRIDAY,
                      DayOfWeek.SATURDAY,
                      DayOfWeek.SUNDAY));
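A minimal sketch of how such an ImList.of() might be built from Arrays.asList() and Collections.unmodifiableList(). The class is hypothetical (only its name comes from the call above), and it returns a plain java.util.List rather than a distinct immutable type, so the compiler still cannot tell callers that the list is read-only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical class; only the name ImList comes from the call above.
final class ImList {
    private ImList() { }

    /** Returns an unmodifiable list containing the given items in order. */
    @SafeVarargs
    static <T> List<T> of(T... items) {
        // Copy first: Arrays.asList() alone is a view backed by the array,
        // so a caller who passed an explicit array could still change it.
        return Collections.unmodifiableList(new ArrayList<T>(Arrays.asList(items)));
    }
}
```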

But then I realized that the first thing I'd want to do is to declare my weekPlanner method so that the caller would know that the method didn't modify the list or have side-effects. If weekPlanner() only uses sequential access to the list, I could declare it as an Iterable:

weekPlanner(Iterable<DayOfWeek> favDays)

Probably though I should really be using an immutable Set:

weekPlanner(ImSet<DayOfWeek> favDays)

But now I can't pass a mutable Set to it anymore because the java.util.Set interface would have to extend an immutable set to make that possible. This mostly explains why Scala collections do not extend the Java Collections API. Another reason is that the List.add() method in Java (and other similar methods) returns a boolean instead of a new or modified List. Having to avoid these bogus Java methods in Scala would be error prone. But that doesn't explain why the Java API doesn't tack on methods like ImmutableList<E> append(E... args). I guess the remove() method would be difficult to retrofit, but I'd get a lot of mileage from just the append method...

I have loved Java for years and am tremendously appreciative of all the effort that went into making such a great language and making it basically freely available. But smarter people than I must have found the mutable skeletons in Java's closet years ago. Why did their voices go unheard by Sun and later Oracle? What is the down-side of adding a few Immutable parent interfaces to the Java Collections API interfaces (Collection, List, Map, Set, etc.)?

I'd like to think that Martin Odersky would have argued for immutability before he left Sun and focused instead on making Scala. But Scala was conceived in 2001 and released in 2003. I don't know if immutability was a key feature from the beginning, but that's a long time for Java to ignore it. Maybe someone who knows more about this will enlighten me with a comment...

At least for me, the default assumption of public mutability of objects in Java is my primary motivation for moving to a language like Scala. The newer JVM languages have other great perks, but that's probably the one that will make me take action.

Tuesday, April 23, 2013

Casting a Collection: Java vs. Scala

I'm always stubbing my toes against this in Java - I have a collection of one type that I want to cast to a collection of a super-type (or interface) instead so that I can add more varied things to it.

Java 7:

public class CollectionCastTest {
    class B implements Runnable {
        @Override
        public void run() { ; }
    }

    public void main(String... args) {

        // First we'll try with a list that looks like it's allowed
        // to contain anything between B and Object.
        List<B> bList = new ArrayList<>();
        bList.add(new B());

        // Java error: incompatible types
        // List<Runnable> rList1 = bList;

        // Java error: unchecked cast
        // List<Runnable> rList2 = (List<Runnable>) bList;

        // Works, but has a time cost
        List<Runnable> rList3 = new ArrayList<>();
        rList3.addAll(bList);

        // Just works!
        List<? extends Runnable> rList4 = bList;

        // Interestingly, this doesn't work.
        // Java error: addAll in List cannot be applied to...
        // List<? extends Runnable> rList5 = new ArrayList<>();
        // rList5.addAll(bList);

        // Arrays can cast most things, but copying into them takes time.
        List<Runnable> rList6 = Arrays.asList(
                bList.toArray(new Runnable[bList.size()]));
        List<? extends Runnable> rList7 = Arrays.asList(
                bList.toArray(new Runnable[bList.size()]));


    }
}
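The rList5 failure above follows from what the wildcard means: a List<? extends Runnable> might really be a list of any one sub-type of Runnable, so the compiler refuses to add anything to it. A sketch of the usual rule of thumb (extends for lists you read from, super for lists you add to), using hypothetical class and method names:

```java
import java.util.List;

// Hypothetical names throughout; a sketch of the read/write wildcard rule.
final class WildcardSketch {
    static final class Task implements Runnable {
        boolean ran = false;
        @Override public void run() { ran = true; }
    }

    /** Reads from the list: "? extends" accepts List<Task> as well as List<Runnable>. */
    static int runAll(List<? extends Runnable> source) {
        int count = 0;
        for (Runnable r : source) {
            r.run();
            count++;
        }
        return count;
    }

    /** Adds to the list: "? super" accepts List<Task>, List<Runnable>, or List<Object>. */
    static void addTask(List<? super Task> sink) {
        sink.add(new Task());
    }
}
```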

Scala

Scala's immutable collections are covariant, meaning that you can substitute a collection of a sub-type where a collection of the super-type is expected. It's certainly powerful.

object CollectionCastTest {
  trait A {
    def hello:String
  }

  class B extends A {
    override def hello = "Hello World!"
  }

  def main(args: Array[String]) {
    val bList = List[B](new B())

    // Works
    val aList1:List[A] = bList

    // Works
    val aList2:List[A] = bList match {
      case l:List[A] => l
      case _ => throw new ClassCastException
    }
    
    val aList3:List[A] = List[A]() ++ bList
  }
}

But does Scala know that these lists are the same?

scala> val bList = List(new B())
bList: List[B] = List(B@555301c4)

scala> val aList:List[A] = List(new B())
aList: List[A] = List(B@23014460)

scala> aList == bList
res6: Boolean = false

Whoops - no. This is because it is using referential equality for B. Why? Because I didn't define an equals() method on A or B so it uses the default equals(). Let's use the same B() in both lists and see what happens:

scala> val b = new B()
b: B = B@4ffe5c31

scala> val bList = List(b)
bList: List[B] = List(B@4ffe5c31)

scala> val aList:List[A] = List[A](b)
aList: List[A] = List(B@4ffe5c31)

scala> aList == bList
res0: Boolean = true

Using different B() objects is possible too, if we make Scala use an equality test based on the fields of the object instead of its address. We could define equals() methods that inspect the object's fields, but this is the default implementation of equals() for case classes, so we can make this simple change:

-   class B extends A {
+   case class B() extends A {

then

scala> val bList = List(new B())
bList: List[B] = List(B@555301c4)

scala> val aList:List[A] = List(new B())
aList: List[A] = List(B@23014460)

scala> aList == bList
res6: Boolean = true
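The same principle holds on the Java side: List.equals() compares element by element using each element's equals(), regardless of the declared element type of either list. A sketch with hypothetical classes, where ValueB hand-writes roughly what a case class generates:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical classes; a sketch of how element equality drives list equality.
final class ListEqualitySketch {
    /** No equals() override: instances are equal only to themselves. */
    static final class PlainB { }

    /** Field-based equals() and hashCode(), roughly what a case class generates. */
    static final class ValueB {
        final String hello;
        ValueB(String hello) { this.hello = hello; }
        @Override public boolean equals(Object other) {
            return (other instanceof ValueB)
                   && Objects.equals(hello, ((ValueB) other).hello);
        }
        @Override public int hashCode() { return Objects.hashCode(hello); }
    }

    /** Two lists, each holding a different PlainB: not equal. */
    static boolean plainListsEqual() {
        List<PlainB> left = Arrays.asList(new PlainB());
        List<Object> right = Arrays.<Object>asList(new PlainB());
        return left.equals(right);
    }

    /** Two lists, each holding a different but equivalent ValueB: equal. */
    static boolean valueListsEqual() {
        List<ValueB> left = Arrays.asList(new ValueB("Hello World!"));
        List<Object> right = Arrays.<Object>asList(new ValueB("Hello World!"));
        return left.equals(right);
    }
}
```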

Wednesday, January 30, 2013

The Language After Java: Reflections on my first JavaOne

My thoughts around my first JavaOne conference centered around, "What is the next big thing going to be?" in the sense of "what do I need to learn to stay current?" I definitely came back with a short list of very promising, but currently underdog technologies:

JUnit (or unit testing in general) is easy and important enough to be part of every project. I even wonder if good unit tests are easier and more beneficial than type-safety in your programming language!
Scala (successor to Java)
Git (successor to Subversion)
Maven (successor to Ant)
Functional Programming (it's finally ready for prime-time)
Algorithms (more than Data Structures)

These thoughts motivated me to make some tool changes and to take Martin Odersky's "Functional Programming Principles in Scala" (which was mind-blowing).

I've seen the rise of Data Structures, then Object Oriented Programming, and now Functional Programming and Algorithms. Functional Programming was really a dark horse from my perspective because Lisp is the second-oldest programming language, yet FP failed to become mainstream until very recently.

My functional experience before Scala was confined to e-lisp and XSLT, neither of which makes a particularly good general-purpose programming language. The recent death of Moore's Law and the triumph of concurrent processing in cloud services like AWS have given Functional programming better performance and code-comprehension characteristics than imperative programming. I definitely didn't see that coming.

Sometimes the Don Quixote principle makes bad popular ideas successful - if enough people believe in something it becomes true. I wondered about that and Functional Programming. "Computer Science" has lived in the Math department of most universities. If you get enough math geeks doing programming, of course they will start to favor a style that looks and behaves like functions from the field of math. So what, I thought, if Functional Programming was based on Lambda Calculus - is it practical for solving business problems?

But this was not the right question. The people asking, "What can a program possibly do?" and "What is the best way to do X?" are the ones creating the technological innovations that ultimately drive the rest of the industry. The limitations of what programs can do (and how efficiently those tasks can be accomplished) constitute a good part of the study of algorithms and the inspiration for new languages. Sure, you can do most things in most general-purpose languages, but as James Roper said and Havoc Pennington quoted in his recent talk (slides available here), just because you can do something in a language doesn't mean that you will. Some things are just too difficult in some languages. This is why knowing both Imperative and Functional styles is very helpful in addition to studying Algorithms.

The easy problems may continue to be solved in imperative languages without much thinking about algorithms at all: transform one kind of text to another, deliver the right data for a given request, or generate a report from a database. But the hard problems are increasingly going to require concurrent processing with the most effective algorithms. Sure, it's easier to do some things in a functional language, but a great algorithm in a limited language beats a crummy algorithm in a concise and beautiful language any day.

Java is a great language, but it is showing its age:

Primitives and API classes (like List and Map) are all designed to be mutable. Concurrent programming with this inherent unsafety is a real bear. Even without concurrency, it's just easier to reason about things that don't change. Just using Java collections properly requires immutability!
Collections are managed through iteration which puts the burden of concurrent programming on the user of those collections. This will be fixed by Lambdas in Java 8.

Scala is the language most similar to Java that overcomes these limitations, so you may find more of that and less Java going forward. In any case, my search for "The Language After Java" has led me instead to a short list of really promising tools, and a rapidly growing interest in Functional Programming and Algorithms. I only wish I had hit on these ideas earlier in my career. Thank you JavaOne - and thank you John for getting me to go there!

Friday, January 4, 2013

Ternary Operator (?:) in Java

Overview

Like the if/else statement, the ternary operator creates a logical branching in the code where it is used. Unlike if/else statements, the ternary operator is an expression, meaning that it produces (returns?) a value. Expressions like (a * b) can produce a numeric (int, long, float, double) value and (a || b) can produce a boolean value. But only functions and the ternary operator can yield any type of object or primitive.

Here are two simple examples. Each creates a String s and sets its value depending on the value of isFriend:

// Set salutation using 'if'
String s = null;
if (isFriend) { s = "Hey, " + fName; } else { s = "Hello, " + fName; }

// Set salutation using '? :'
String s = (isFriend ? "Hey, " : "Hello, ") + fName;

It's called "The Ternary Operator" because it is the only operator (in the languages that use it) that takes three operands. Perhaps it would be better to call it "Evaluative-If" or "Question-mark Colon" but I'm not campaigning to rename it today.

Functional programmers like the ternary operator because it creates a block of code that produces a value - much like a closure.

Known Good and Bad Uses

There are countless ways to write bad code, and some language features have more potential for abuse than others. Because of its brevity, the ternary operator can be a particular nightmare, especially when used with statements that rely on operator precedence:

// DO NOT DO THIS - USE PARENTHESES!
a == b || c & d ? e ^ b : d | b

Also, the ? : is so tiny on the page that it becomes very hard to see when used with lengthy (multi-line) operands. It's designed for short things that generally fit on one line and produce a value.

I have yet to find a significant timing difference between ?: and if/else statements, so we can rule out performance as a reason to choose one syntax over the other.

There was a discussion about the ternary operator on StackExchange recently and one person provided a great example of how NOT to use the ternary operator:

// Nesting the ternary operator is EVIL - DO NOT DO THIS!
int median1(int a, int b, int c) {
    return
        (a < b)
        ?
            (b < c)
            ? b
            :
                (a < c)
                ? c
                : a
        :
            (a < c)
            ? a
            :
                (b < c)
                ? c
                : b;
}

Nesting the ternary operator (? ? : :) is always evil because it's not clear which question-mark goes with which colon. But chaining (? : ? : ? :) with proper indentation and short blocks of code can be very clear. Note: the examples below take advantage of the short-circuiting nature of return statements to remove the else clause for brevity:

// With one 'if' to remove the nesting
int median2(int a, int b, int c) {
  if (a < b) {
    return (b < c) ? b :
           (a < c) ? c : a;
  }
  return (a < c) ? a :
         (b < c) ? c : b;
}

I find that easier to read than using if statements:

// All 'if' statements
int median3(int a, int b, int c) {
  if (a < b) {
    if (b < c) { return b; }
    if (a < c) { return c; }
    return a;
  }
  if (a < c) { return a; }
  if (b < c) { return c; }
  return b;
}

In Java, you can call a super-class constructor from a sub-class constructor only as the first statement of that sub-class constructor, before any other statements run. The ternary operator is one of the few ways (a static helper method is another) to change your input data before passing it to the super-class constructor.

public class MyClass extends MySuperclass {
  public MyClass(int a) {
    super(a > 0 ? true : false);
    ...
  }
}
Anywhere that a value is needed from a very short calculation, the ternary operator can be useful. Especially for preventing NullPointerExceptions:

out.print(name == null ? "" : name);

Conclusion

While the ternary operator has potential for misuse (particularly when nested, poorly indented, covering large blocks of code, or where operator precedence is critical), it also has potential for some good as shown above. Just be careful to use it only for good, ideally in situations that take advantage of its evaluative nature. Many thanks to Programmers.stackexchange.com for inspiration for this post.

Monday, September 17, 2012

Java Closures and The Start-End Problem

I use the term, "Start-End Problem" to describe a resource that needs to be opened and closed, or a header and footer that needs to be printed in a way that would tempt you to make a class with start() and end() methods, that are meant to be called as a pair with other code in-between. Some time around 1997, my mentor Jack Follansbee told me to avoid this pattern whenever practical, because it's too easy to forget to call the end() method. Another problem would be if the code in-between performs a return, continue, break, throws an Exception, or otherwise avoids reaching the end() method.

Java's try-catch-finally block solves the return and Exception problems, but does not ensure that you will remember to call the end() method in the finally block. There are three traditional Java approaches to this problem:

Pre-evaluating the middle code somehow (e.g. with a buffer) and passing it to a procedure which appends to the Start and End of the buffer.
```
public void doStartEnd(StringBuilder sB) {
    sB.insert(0, "Start");
    sB.append("End");
}
```
This has limited applications, but is nice when it works.
Dependency Injection: Creating a special-purpose procedure with variables to adapt it to multiple related purposes. If the number of variables is small (or 0) this works very well. But sometimes multiple variables need to be passed to this procedure and even a complex data structure must be created to hold the updated values for the return type. It seems a waste of time and energy to load all the variables into the procedure arguments like passengers on a bus, then unload their updated values from the return data-type one by one.
```
public class ReturnObject {
    public int count;
    public boolean showedAnything;
}
public ReturnObject doStartEnd(int count, String middle, boolean showedAnything) {
    System.out.println("Start");
    if (!showedAnything) { System.out.println("New stuff"); }
    for (; count < 5; count++) {
       showedAnything = true;
       System.out.println(middle);
    }
    System.out.println("End");
    ReturnObject ro = new ReturnObject();
    ro.count = count;
    ro.showedAnything = showedAnything;
    return ro;
}
```
Real world examples can get complicated very quickly. Someone on StackExchange: Programmers recently said they routinely found procedures with 10 or 20 parameters where they worked and that they "died a little bit inside" every time they found one. There are numerous ways to cause bugs with this approach: transposing values, confusing Java's pass-by-value for primitive types with pass-by-reference... just to name a few. The coding effort involved can be impressive.
Create an interface for the middle code being executed and have an object implement that interface. A well designed object, possibly with a Builder pattern, can mitigate some of the Bus Station issues, but since this pattern is just the Dependency Injection pattern turned inside-out it sometimes just pushes the dependency injection overhead of the above solution from a procedure to an object. This used to be the most general solution to the start-end problem. The first example below uses an abstract class and an anonymous implementation but that is just a minor variation on this technique.
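The third approach can be sketched as follows. The names Middle and StartEndRunner are hypothetical, and the output goes to a StringBuilder rather than System.out only to keep the sketch easy to check:

```java
// Hypothetical names; a sketch of the interface-based approach.
interface Middle {
    void run(StringBuilder out);
}

final class StartEndRunner {
    private StartEndRunner() { }

    /** Writes "Start", runs the caller's middle code, then always writes "End". */
    static void doStartEnd(StringBuilder out, Middle middle) {
        out.append("Start\n");
        try {
            middle.run(out);
        } finally {
            out.append("End\n"); // runs even if middle.run() throws
        }
    }
}
```

The caller supplies only the middle, for instance as an anonymous implementation of Middle, so there is no end() call left to forget.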

A functional programmer would say that a lexical closure would be the obvious solution to this problem. It allows an arbitrary amount of code (including the caller's local variables) to be executed in the middle of some other code. No pre-evaluation, no Bus Station, no extra objects or interfaces. The outer "enclosing" code can do the start() and end() logic with the "enclosed" code in a little magic closure envelope in the middle. Java doesn't have closures or function pointers, but an anonymous inner class looks a little like a closure and the following code compiles, works, and even solves the Start-End problem for some special cases, though it might not win many beauty contests:

public class ClosureTest {
    private static abstract class StartEnd {
        public void doStartEnd() {
            System.out.println("Start");
            middle();
            System.out.println("End");
        }
        public abstract void middle();
    }

    public static void main(String[] args) {
        new StartEnd() {
            @Override
            public void middle() {
                System.out.println("Middle");
            }
        }.doStartEnd();
    }
}

Output:

Start
Middle
End

A sad limitation of this technique is that Java tries to prevent you from updating the main() method's local variables inside the anonymous inner class by forcing you to make them final (immutable). The following WILL NOT COMPILE:

public static void main(String[] args) {
    int count = 0;
    new StartEnd() {
        @Override
        public void middle() {
            // ERROR: local variable count is accessed from within
            // inner class; needs to be declared final
            for (; count < 5; count++) {
                System.out.println("Middle");
            }
        }
    }.doStartEnd();
    System.out.println("Total Count " + count);
}

A mutable wrapper class will work around this restriction. Uglier, but it works:

private static class MutableRef<T> {
    public T count;
}
public static void main(String[] args) {
    final MutableRef<Integer> mr = new MutableRef<Integer>();
    mr.count = 0;
    new StartEnd() {
        @Override
        public void middle() {
            for (; mr.count < 5; mr.count++) {
                System.out.println("Middle");
            }
        }
    }.doStartEnd();
    System.out.println("Total Count " + mr.count);
}

Output:

Start
Middle
Middle
Middle
Middle
Middle
End
Total Count 5

Java 7's try-with-resources feature provides my favorite solution to this particular issue. Not a true general-purpose lexical closure, but sure looks like one for solving this particular problem:

public class ClosureTest {
    private static class StartEnd implements AutoCloseable {
        public StartEnd() { System.out.println("Start"); }
        @Override
        public void close() { System.out.println("End"); }
    }

    public static void main(String[] args) {
        int count = 0;
        try (StartEnd se = new StartEnd()) {
            for (; count < 5; count++) {
                System.out.println("Middle");
            }
        } // end of StartEnd
        System.out.println("Total Count " + count);
    }
}

One more detail... If you compile with -Xlint it may complain, "warning: [try] auto-closeable resource se is never referenced in body of corresponding try statement." I have found that using this pattern in my own code I usually use the se variable within the corresponding try block. But for the times that I don't, a preventCompilerWarning() method eliminates the warning:

public class ClosureTest {
    private static class StartEnd implements AutoCloseable {
        public StartEnd() { System.out.println("Start"); }
        @Override
        public void close() { System.out.println("End"); }
        public void preventCompilerWarning() { }
    }

    public static void main(String[] args) {
        int count = 0;
        try (StartEnd se = new StartEnd()) {
            se.preventCompilerWarning();
            for (; count < 5; count++) {
                System.out.println("Middle");
            }
        }
        System.out.println("Total Count " + count);
    }
}

Voilà - the Start-End problem solved! Unlike a lexical closure in a functional language, this technique executes the close() method even when a return statement is reached or an Exception is thrown. This may be good or bad depending on your situation.
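A small sketch of that behavior, using hypothetical names and recording events in a list instead of printing them. In both methods close() runs before control leaves the try block:

```java
import java.util.List;

// Hypothetical names; records events in a list instead of printing them.
final class CloseOrderSketch {
    static final class StartEnd implements AutoCloseable {
        private final List<String> log;
        StartEnd(List<String> log) {
            this.log = log;
            log.add("Start");
        }
        @Override public void close() { log.add("End"); }
    }

    /** Returns from inside the try block; close() still runs before the caller resumes. */
    static int earlyReturn(List<String> log) {
        try (StartEnd se = new StartEnd(log)) {
            log.add("Middle");
            return 42;
        }
    }

    /** Throws from inside the try block; close() still runs before the exception propagates. */
    static void earlyThrow(List<String> log) {
        try (StartEnd se = new StartEnd(log)) {
            log.add("Middle");
            throw new IllegalStateException("boom");
        }
    }
}
```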

Notice that the local variable count is being used "inside" the StartEnd block without being visible to that code? No dependency injection, interfaces, objects, etc. This is the essence of a lexical closure - a little bubble of extra variable scope without otherwise violating the privacy of the enclosed code. Hopefully you can imagine some of the versatile coding paradigms which the new try-with-resources block makes available in Java 7.

For a more general solution to the problem of closures in Java before Java 8, check out this post.

Wednesday, September 14, 2011

Automatic POJO Generation From a Database

Like JPA (or presumably Spring), Hibernate "reverse engineering" tools can generate POJOs (Plain Old Java Objects) from database tables and vice-versa. Generating database tables from Java code is probably best used as a one-time short-cut, suitable for rapid prototyping. Because everything in an application is dependent on the database (and not vice-versa), future changes must be made in the database first (and any existing data migrated there first as well), then propagated to all affected parts of the application.

I have found that anything which the "database reverse engineering" process does not generate for me breaks, usually sooner rather than later. Also that the hardest part of managing maintenance of a large system is the constant refactoring. To that end, I have developed 2 goals for the database reverse-engineering process:

It should keep my application in synch with any database changes, automatically updating as much of my application as is practical.
Where #1 is not possible, it should cause any affected areas of my application to generate a compile-time error.

Imagine a database table:

CREATE TABLE `user` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `last_modifier_id` bigint(20) unsigned DEFAULT NULL,
  `company_id` bigint(20) unsigned NOT NULL,
  `identifier` varchar(64) NOT NULL COMMENT 'PIN.  Unique within a company',
  `first_name_c` varchar(40) DEFAULT NULL,
etc.

Database generation tools normally generate the following sort of fields:

public class User implements java.io.Serializable {

private long id;
private User lastModifier;
private Company company;
private String identifier;
private String firstNameC;
etc.

public long getId() { return id; }
public void setId(long x) { id = x; }

public User getLastModifier() { return lastModifier; }
public void setLastModifier(User x) { lastModifier = x; }

public Company getCompany() { return company; }
public void setCompany(Company x) { company = x; }

public String getIdentifier() { return identifier; }
public void setIdentifier(String x) { identifier = x; }

public String getFirstNameC() { return firstNameC; }
public void setFirstNameC(String x) { firstNameC = x; }
etc.

I have modified the generator to also generate the following fields for each column in the table (prefixed with SQL_) and for each field in the Java object (prefixed with HQL_):

public static final String SQL_user = "user"; // table name
public static final String SQL_id = "id";
public static final String SQL_last_modifier_id = "last_modifier_id";
public static final String SQL_company_id = "company_id";
public static final String SQL_identifier = "identifier";
public static final String SQL_first_name_c = "first_name_c";
etc.

public static final String HQL_User = "User"; // class name
public static final String HQL_id = "id";
public static final String HQL_lastModifier = "lastModifier";
public static final String HQL_company = "company";
public static final String HQL_identifier = "identifier";
public static final String HQL_firstNameC = "firstNameC";
etc.

This allows me to write code like the following:

Crit<User> dupIdentCrit = Crit.create(User.class);
dupIdentCrit.add(Restrictions.eq(User.HQL_company, this));
dupIdentCrit.add(Restrictions.eq(User.HQL_identifier, identifier));
List<User> dupIdentifierUsers = dupIdentCrit.list();

Here's an example of building a SQL statement for use with JDBC:

StringBuilder sqlB = new StringBuilder("select * from " + User.SQL_user +
                                       " where " + User.SQL_company_id +
                                       " = ?");
if (!includeDeleted) {
    sqlB.append(" and " + User.SQL_is_deleted + " is false");
}
sqlB.append(" order by " + User.SQL_last_name_c + ", " +
            User.SQL_first_name_c + ", " +
            User.SQL_middle_name_c);

When the database changes and I rebuild my objects, I get compile-time errors, showing me every piece of code I need to fix.

I am working on modifying the generator to also generate the following for any varchar or char fields because their maximum length is defined in the database columns:

public static final int MAX_identifier = 64;
public static final int MAX_firstNameC = 40;
etc.

I am considering parsing comments on each column to look for something like: 'min: 6' and generate the following from it:

public static final int MIN_identifier = 6;

That same technique could be used with int columns to define minimum and maximum values which could be used in validation, in the GUI, and in the help of an application. I'm also experimenting with using the HQL_ tokens as the names of the input fields on the GUI screens.
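A sketch of how such MIN_ and MAX_ constants might feed validation. The class UserLimits and the method validateIdentifier() are hypothetical illustrations rather than generator output, and the constant values simply repeat the ones shown above:

```java
// Hypothetical sketch: the constants mirror the generated ones above,
// and validateIdentifier() is an illustrative name, not generator output.
final class UserLimits {
    static final int MAX_identifier = 64; // from the varchar(64) column
    static final int MIN_identifier = 6;  // from a 'min: 6' column comment

    private UserLimits() { }

    /** Returns null when the value is acceptable, otherwise a message for the user. */
    static String validateIdentifier(String value) {
        if (value == null || value.length() < MIN_identifier) {
            return "Identifier must be at least " + MIN_identifier + " characters";
        }
        if (value.length() > MAX_identifier) {
            return "Identifier must be at most " + MAX_identifier + " characters";
        }
        return null;
    }
}
```

If the column is later widened or the comment changes, regenerating the constants updates the validation and its messages without any hand edits.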

I'm not sure how or even if this would work with stored procedures.

Special thanks to J. Michael Greata, whose interest and suggestions over the past several years have encouraged and shaped the development of this project. Also to Arash Tavakoli whose post in the LinkedIn Java Architects group inspired me to organize my thoughts into this posting.

Thursday, September 8, 2011

Checked vs. Unchecked Exceptions

Two-Sentence Review: Checked exceptions have to be declared in the method signature and dealt with by the calling code; Unchecked, RuntimeExceptions don't. The Java compiler enforces this rule with a compile-time error.

I've been enjoying, "Java: The Good Parts" by Jim Waldo and just finished chapter 3: "Exceptions." At the end of the chapter, Mr. Waldo takes a humorously firm stance against RuntimeExceptions. Indeed, RuntimeExceptions can be evilly misused. Yet I believe there is a time and a place for these exceptions. Knowing when to use them requires an understanding of who is at fault for a particular problem, and whether the problem is recoverable or not.

The following code sample bit me the other day:

// EVIL - NEVER DO THIS!
public static void write(String s) {
    try {
        out.write(s);
    } catch (IOException ioe) {
        throw new IllegalStateException();
    }
}

It is evil (as Mr. Waldo says) for several reasons:

This method is not being responsible for handling its own problems. IOExceptions happen for good reasons other than coding errors - users can delete files, network connections can close, etc. Better than hiding this exception, this method might retry the write() or recover some other way. If recovery were not practical, this method would do better to let the original exception bubble up to the caller. Another improvement might be to have this method report the error some other suitable way (e.g. to the user).
Wrapping a checked exception with a RuntimeException means that this method signature is missing vital information that the caller really needs to know about. It hides critical details (any problem with the write() now causes an unexpected exception).
Absentmindedly wrapping an exception with another exception serves only to complicate the stack trace - it further hides the cause of the problem. It's good to wrap an exception if you have critical information to add (e.g. the method handles two streams and you wrap an exception with the message of which stream it applies to). In the example above, nothing is added. In fact, this code doesn't wrap an exception; it throws it away and substitutes another, which is even worse.
The IllegalStateException is blank, giving the unfortunate caller no idea what went wrong.
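As a rough sketch of the alternatives described in the list above, the method can either declare the IOException and let it bubble up, or wrap it only when there is information to add while keeping the original as the cause. The class name SafeWriter, the method writeLabeled, and the streamName parameter are made up for illustration, and the Writer is passed in as an argument only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.Writer;

final class SafeWriter {
    private SafeWriter() {}

    // Option 1: declare the checked exception and let it bubble up unchanged,
    // so the method signature tells the caller what can go wrong.
    static void write(Writer out, String s) throws IOException {
        out.write(s);
    }

    // Option 2: wrap only to add information (here, which stream failed)
    // and keep the original exception as the cause so nothing is hidden.
    static void writeLabeled(String streamName, Writer out, String s)
            throws IOException {
        try {
            out.write(s);
        } catch (IOException ioe) {
            throw new IOException("Could not write to " + streamName, ioe);
        }
    }
}
```

Either way, the caller still sees a checked exception and the stack trace still leads back to the real cause.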

The following shows proper usage of RuntimeExceptions:

/** Creates or returns an existing immutable YearMonth object.
@param year a valid year
@param month a one-based month between 1-12
@return the relevant YearMonth object
*/
public static YearMonth valueOf(int year, int month) {
    if ( (month < 1) || (month > 12) ) {
        throw new IllegalArgumentException("Month must be a positive" +
                                           " integer such that 0 < n < 13");
    }
...

This is good because:

Valid input values are documented clearly for the caller in the JavaDoc.
The inputs are checked at the beginning of the method, before any processing is done, to fail-fast and fail-loudly if passed garbage.
It throws a RuntimeException to inform the caller that they have made a coding error.
The exception provides information about what the caller did wrong.

You wouldn't want to use a checked exception because it would force the responsible caller's code to check their input values twice:

YearMonth ym;
// Responsibly check inputs before calling valueOf()
if (m > 12) {
    out.write("Month too big");
} else if (m < 1) {
    out.write("Month too small");
} else {
    // I already checked my inputs.  Why do I have to check for an exception
    // too?  Doing so doesn't benefit me in any way. It just makes the
    // interface unnecessarily difficult to use!
    try {
        ym = YearMonth.valueOf(y, m);
    } catch (Exception e) {
        out.println("month still invalid: " + e.getMessage());
    }
}

The critical nuance here is that RuntimeExceptions should be used to indicate a programming error on the part of the caller of the function that throws it. They are generally used in the first few lines of the function as a defensive check for invalid input values. Each RuntimeException should include a description of the problem (not be blank).
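For contrast, here is a minimal sketch of how the responsible caller looks when valueOf() throws an unchecked exception. This YearMonth is a stripped-down stand-in written for this example: it always creates a new object rather than reusing existing ones, and the getters, the MonthCaller class, and the describe() method are hypothetical.

```java
final class YearMonth {
    private final int year;
    private final int month;

    private YearMonth(int year, int month) {
        this.year = year;
        this.month = month;
    }

    // Fail fast on bad input with an unchecked exception and a useful message.
    static YearMonth valueOf(int year, int month) {
        if ((month < 1) || (month > 12)) {
            throw new IllegalArgumentException(
                    "Month must be between 1 and 12 but was " + month);
        }
        return new YearMonth(year, month);
    }

    int getYear() { return year; }
    int getMonth() { return month; }
}

final class MonthCaller {
    private MonthCaller() {}

    static String describe(int y, int m) {
        // Responsibly check inputs before calling valueOf()
        if (m > 12) {
            return "Month too big";
        } else if (m < 1) {
            return "Month too small";
        }
        // No try/catch is needed: the inputs were already checked once.
        YearMonth ym = YearMonth.valueOf(y, m);
        return ym.getYear() + "-" + ym.getMonth();
    }
}
```

A caller who skips the checks still gets a loud IllegalArgumentException, but a caller who does them is not made to handle the same problem twice.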

RuntimeExceptions can also be used to catch invalid state as follows:

enum Numb { ONE, TWO; }
private Numb num = null;

public void init(Numb n) {
    if (n == null) {
        throw new IllegalArgumentException("init cannot take a null Numb");
    }
    num = n;
}

public void showNum() {
    // Enum constants must be qualified outside of a switch statement.
    if (Numb.ONE == num) {
        out.println("1");
    } else if (Numb.TWO == num) {
        out.println("2");
    } else {
        throw new IllegalStateException("Unhandled value of Numb or called" +
                                        " showNum without initializing the" +
                                        " num");
    }
}

When some programmer adds THREE to Numb and doesn't account for the new possible value, it fails hard and fast, making the problem easy to find and fix.

RuntimeExceptions are a good way to make code fail hard and fast without forcing the caller to check their input values twice. Used properly, they can make sneaky coding errors obvious. Used improperly, they can make obvious errors sneaky.

Wednesday, August 31, 2011

Object Mutability

This post is part of a series on Comparing Objects.

I've read a lot lately about making objects immutable whenever possible. "Programming in Scala" lists Immutable Object Tradeoffs as follows:

Advantages of Immutable Objects

Often easier to reason about because they do not have complex state.
Can be passed freely (without making defensive copies) to things that might try to modify them
Impossible for two threads accessing the same immutable object to corrupt it.
They make safe Hashtable keys (if you put a mutable object in a Hashtable, then change it in a way that changes its hashcode, the Hashtable will no longer be able to use it as a key because it will look for that object in the wrong bucket and not find it).

Disadvantages

Sometimes require a large object graph to be copied (to create a new, modified version of the object). This can cause performance and garbage collection bottlenecks.
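The hashtable-key point in the list of advantages is easy to reproduce. The following is a small sketch with a deliberately mutable key class; MutableName and LostKeyDemo are invented names for this demonstration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// A key whose hashCode depends on a field that can change.
final class MutableName {
    private String name;

    MutableName(String name) { this.name = name; }

    void setName(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        return (o instanceof MutableName)
                && Objects.equals(name, ((MutableName) o).name);
    }

    @Override public int hashCode() { return Objects.hashCode(name); }
}

final class LostKeyDemo {
    private LostKeyDemo() {}

    // Adds a key, mutates it, then asks the set whether it still holds it.
    static boolean stillFoundAfterMutation() {
        Set<MutableName> names = new HashSet<>();
        MutableName key = new MutableName("Smith");
        names.add(key);          // stored under the hash of "Smith"
        key.setName("Jones");    // the hash code is now that of "Jones"
        return names.contains(key);
    }
}
```

stillFoundAfterMutation() returns false even though the very same object is still inside the set, because the lookup uses the new hash code while the entry was filed under the old one.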

For most purposes, an object representing a month can be made immutable - February 2003 will never become anything other than what it is. But a User record is not immutable. People get married or change their name for other reasons. Phone numbers, addresses, hair color, height, weight, and virtually every other aspect of a person can change. Yet the person is still the same person. This is what surrogate keys model in a database - that everything about a record can change, yet it can still be meaningfully the same record.

In order to use an object in a hash-backed Collection (in Java), its hashcode must NOT change. The simplest way to accomplish this is to make the hashcode of a mutable persistent object its surrogate key and to use that key as a primary comparison in the equals method as well (see my older post on Implementing equals(), hashcode(), and compareTo()).
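A minimal sketch of that approach follows, assuming the record has already been saved and therefore has a key. Customer, its fields, and SurrogateKeyDemo are hypothetical names chosen for this example.

```java
import java.util.HashSet;
import java.util.Set;

final class Customer {
    private final long id;      // surrogate key: assigned once, never changes
    private String lastName;    // ordinary mutable data

    Customer(long id, String lastName) {
        this.id = id;
        this.lastName = lastName;
    }

    void setLastName(String lastName) { this.lastName = lastName; }

    String getLastName() { return lastName; }

    // Identity is decided by the surrogate key alone.
    @Override public boolean equals(Object o) {
        return (o instanceof Customer) && (id == ((Customer) o).id);
    }

    // The hash code depends only on the key, so it never changes.
    @Override public int hashCode() { return Long.hashCode(id); }
}

final class SurrogateKeyDemo {
    private SurrogateKeyDemo() {}

    static boolean stillFoundAfterRename() {
        Set<Customer> customers = new HashSet<>();
        Customer c = new Customer(42L, "Smith");
        customers.add(c);
        c.setLastName("Jones");   // does not affect equals() or hashCode()
        return customers.contains(c);
    }
}
```

Here stillFoundAfterRename() returns true: the name changed, but the object is still found in its original bucket.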

To make an immutable object, you sometimes need a mutable constructor object, like StringBuilder and String. StringBuilder allows you to change your object as many times as you want, then get an immutable version by calling toString(). This is clean and safe, but has some small costs in time and memory (transforming the mutable StringBuilder into a new immutable String object, then throwing away the StringBuilder). An alternative that I have not seen much is to create an immutable interface, extend a mutable interface from it, and then extend your object from that.
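The StringBuilder pattern looks like this in practice. The class PhraseBuilder and its join() method are just illustrative names.

```java
final class PhraseBuilder {
    private PhraseBuilder() {}

    // Mutate the builder freely, then hand back an immutable String.
    static String join(String... words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(word);
        }
        // toString() copies the contents into a new String; the builder
        // itself becomes garbage once this method returns.
        return sb.toString();
    }
}
```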

Here's an example based on java.util.List. Pretend each interface or class is in its own file:

// All the immutable-friendly methods from java.util.List.
// Interfaces like these could easily be retrofitted into
// the existing Java collections framework
public interface ImmutableList<E> {
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);
    boolean containsAll(Collection<?> c);
    boolean equals(Object o);
    int hashCode();
    E get(int index);
    int indexOf(Object o);
    int lastIndexOf(Object o);
    ListIterator<E> listIterator();
    ListIterator<E> listIterator(int index);
    List<E> subList(int fromIndex, int toIndex);
}

// This interface adds the mutators
public interface java.util.List<E> extends ImmutableList<E> {
    boolean add(E e);
    boolean remove(Object o);
    boolean addAll(Collection<? extends E> c);
    boolean addAll(int index, Collection<? extends E> c);
    boolean removeAll(Collection<?> c);
    boolean retainAll(Collection<?> c);
    void clear();
    E set(int index, E element);
    void add(int index, E element);
    E remove(int index);
}

public class java.util.ArrayList<E> implements List<E> {
    // just as it is now...
}

public class MyClass {
    static void someMethod(ImmutableList<String> ils) {
        // can't change the list
    }

    public static void main(String[] args) {
        List<String> myStrings = new ArrayList<String>();
        myStrings.add("hello");
        myStrings.add("world");
        someMethod(myStrings);
        // Totally safe:
        System.out.println(myStrings.get(1));
    }
}

This doesn't solve the problem of passing a list to existing untrusted code that might try to change it. It also doesn't prevent the calling code from modifying myStrings from a separate thread while someMethod() is working on it. But it does provide a way (going forward) for a method like someMethod() to declare that it cannot modify the list. The programmer of someMethod() cannot compile her code if she tries to modify the list (well, short of using reflection).
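Since java.util.List itself cannot be changed from outside the JDK, the idea can only be tried today with your own pair of interfaces. The following is a cut-down sketch, not a replacement for the collections framework; ReadOnlyList, GrowableList, ArrayGrowableList, and ReadOnlyDemo are all hypothetical names, and only a few methods are included.

```java
import java.util.ArrayList;
import java.util.List;

// The read-only view: no mutators are declared here.
interface ReadOnlyList<E> {
    int size();
    E get(int index);
}

// The mutable interface extends the read-only one and adds a mutator.
interface GrowableList<E> extends ReadOnlyList<E> {
    boolean add(E e);
}

// A simple implementation that delegates to java.util.ArrayList.
final class ArrayGrowableList<E> implements GrowableList<E> {
    private final List<E> items = new ArrayList<>();

    @Override public int size() { return items.size(); }
    @Override public E get(int index) { return items.get(index); }
    @Override public boolean add(E e) { return items.add(e); }
}

final class ReadOnlyDemo {
    private ReadOnlyDemo() {}

    // This method declares, through its parameter type, that it only reads.
    static String joinAll(ReadOnlyList<String> words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(words.get(i));
        }
        // words.add("oops");  // would not compile: ReadOnlyList has no add()
        return sb.toString();
    }
}
```

The owner of the list keeps a GrowableList reference and can still add to it, while joinAll() sees only the read-only methods. A cast back to GrowableList would still defeat this, so it is a declaration of intent enforced by the compiler rather than a runtime guarantee.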

Guaranteed immutability can be critical in writing concurrent code and for keys in hashtables. Not all objects can be made immutable, but many of those objects have immutable surrogate keys that, if used properly, work around the pitfalls of mutability. Limiting mutability and avoiding common mutable object pitfalls can lead to fewer bugs, easier readability, and improved maintainability.