Monday, March 28, 2016

Safe Harbor replacement allows the US Federal Trade Commission to spy on European citizens

There are some limits, but it's ironic that "Safe Harbor" was declared invalid by the highest European court for supposedly allowing this. Now the replacement explicitly allows it!

How so? The fourth bullet point of the EU-US Privacy Shield Framework Fact Sheet reads:

The U.S. Federal Trade Commission (FTC) has committed to work closely with the DPA to provide enforcement assistance, which, in appropriate cases, could include information sharing and investigative assistance pursuant to the U.S. SAFE WEB ACT.

The US Safe Web Act

[A]llows increased cooperation with foreign law enforcement authorities through confidential information sharing, provision of investigative assistance, and enhanced staff exchanges. In certain limited circumstances it enables the FTC to obtain information in domestic or foreign consumer protection matters from third parties without tipping off investigative targets.

So the US Federal Trade Commission can obtain information without alerting the people being observed and share it. That's a pretty good definition of "spying." Having the one agreement reference the other by name means that the EU-US Privacy Shield doesn't have this language in it anywhere, but still effectively guarantees the right to spy.

As an aside, if someone knows where the now defunct "Safe Harbor" agreement allows spying, I hope you will point me to it. I thought I was familiar with that legislation and was not aware of any explicit "spying allowed" clause. I always thought the agreement that allowed the US and UK to spy on each other was the "Five Eyes" alliance created in the 1940's, but that didn't seem to get any press.

IANAL

Thursday, June 12, 2014

Comparators

This post is part of a series on Comparing Objects.

Some things lend themselves to default ordering, such as sorting users by their alphabetized last names. Yet users are often sorted by other criteria, such as the number of purchases they make per year, the amount their account is in debit, or what time they clicked the "purchase" button during a limited promotion. To represent this in Java, simply define a comparator for each context:
enum UserByPromoClick implements Comparator<User> {
    INSTANCE;
    public int compare(User left, User right) {
        ...
    }
}

enum UserByPurchases implements Comparator<User> {
    ...

enum UserByDebit implements Comparator<User> {
    ...
Implementation note: compare(left, right) and compareTo(other) are both instance methods, which means you need an instance of the appropriate Comparator in order to use them.  Most comparators are stateless and immutable, so they can be implemented as enums with a single instance shared and reused by all clients.
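Here is a minimal, complete sketch of the enum-singleton pattern described above. The User class, its fields, and the ComparatorDemo helper are made up for illustration; only the enum-with-INSTANCE shape comes from the snippets above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical User class; only the fields needed for this sketch.
final class User {
    private final String lastName;
    private final int purchasesPerYear;

    User(String lastName, int purchasesPerYear) {
        this.lastName = lastName;
        this.purchasesPerYear = purchasesPerYear;
    }

    String getLastName() { return lastName; }
    int getPurchasesPerYear() { return purchasesPerYear; }
}

// One stateless comparator per sort context, each a single-instance enum.
enum UserByPurchases implements Comparator<User> {
    INSTANCE;

    @Override
    public int compare(User left, User right) {
        // Integer.compare avoids the overflow risk of plain subtraction.
        return Integer.compare(left.getPurchasesPerYear(),
                               right.getPurchasesPerYear());
    }
}

class ComparatorDemo {
    // Returns last names ordered by purchases per year, ascending.
    static List<String> namesByPurchases(List<User> users) {
        List<User> sorted = new ArrayList<>(users);
        sorted.sort(UserByPurchases.INSTANCE);
        List<String> names = new ArrayList<>();
        for (User u : sorted) { names.add(u.getLastName()); }
        return names;
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("Ng", 12), new User("Abbot", 3), new User("Diaz", 7));
        System.out.println(namesByPurchases(users)); // [Abbot, Diaz, Ng]
    }
}
```

Clients never construct the comparator; they just pass UserByPurchases.INSTANCE wherever a Comparator<User> is expected.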

Various sort orders are often combined. Two accounts equally in debit could be sorted to show up next to each other on an expense report. Their names would only be used as the "tie breaker" in this ordering.
public int compare(User left, User right) {
    // Assuming getPromoClickMs returns Integer.MAX_VALUE
    // if the user didn't click on the promo.
    if (left.getPromoClickMs() > right.getPromoClickMs()) { return 1; }
    if (left.getPromoClickMs() < right.getPromoClickMs()) { return -1; }

    // In a tie, sort by name instead.
    return UserByName.INSTANCE.compare(left, right);
}
The implementation of Objects.equals(left, right) performs a referential equality check for speed and to ensure that each object is equal to itself.  This has the side effect that Objects.equals(null, null) returns true, which makes some sense: undefined is undefined.  But should we ever compare anything to null? For a while I thought, "What harm could come from just sorting nulls last?"


What harm indeed!  Really, there is no meaningful way to compare a value to null because null represents "undefined." Well, isGreaterThan(other) can meaningfully return false if other is null, but the general-purpose compare(left, right) method cannot.  It must return less-than (any negative int), greater-than (any positive int), or equal (0). The only rational thing to do if either parameter to the compare(a, b) method is null is to throw an exception.
if (left == null || right == null) {
    throw new IllegalArgumentException("Can't compare nulls");
}
if (left == right) { return 0; } // Same instance, so trivially equal
If your class can be represented as an integer, compare() could be implemented as a simple subtraction.  Just keep in mind that if the difference could fall outside the range Integer.MIN_VALUE to Integer.MAX_VALUE, it wraps around and gives you a result with the wrong sign. So if that could happen, you must use greater-than/less-than comparisons (or Integer.compare()) instead of subtraction.
int ret = left.getPromoClickMs() - right.getPromoClickMs();
if (ret != 0) { return ret; }

// In a tie, sort by name instead.
return UserByName.INSTANCE.compare(left, right);
You can have as many repetitions of this kind of thing as you want:
ret = Xxxxx.compare(this, that);
if (ret != 0) { return ret; }
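
The overflow caveat with subtraction is easy to reproduce. This is a small sketch with made-up method names, comparing the smallest possible int to 1; Integer.compare() is the standard-library alternative.

```java
class SubtractionOverflowDemo {
    // Subtraction-based comparison: wrong when the difference overflows int.
    static int compareBySubtraction(int left, int right) {
        return left - right;
    }

    // Overflow-safe comparison from the standard library.
    static int compareSafely(int left, int right) {
        return Integer.compare(left, right);
    }

    public static void main(String[] args) {
        int small = Integer.MIN_VALUE;
        int big = 1;
        // MIN_VALUE - 1 wraps around to MAX_VALUE, so "small" looks greater.
        System.out.println(compareBySubtraction(small, big) > 0); // true
        // Integer.compare reports the correct ordering.
        System.out.println(compareSafely(small, big) < 0);        // true
    }
}
```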

Persistence and Proxy Objects

When implementing a Comparator (as opposed to Comparable), neither object can be trusted to be initialized, so getters should be used instead of direct access to instance fields on both objects. If you implement your Comparator as a separate class and your fields are private, the compiler will enforce this for you. But for Comparable and for Comparators implemented in the same class file as the object they are comparing, the compiler will NOT warn you, so you must be very careful.
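
As a rough sketch of the hazard, assume a hypothetical Account class that loads its name lazily, the way a persistence proxy might. Because the comparator is nested in the same class, direct field access would compile, so the getter has to be a deliberate choice.

```java
import java.util.Comparator;

// Hypothetical entity that loads its name lazily, roughly as a proxy might.
class Account {
    private final String storedName; // stands in for the database row
    private String name;             // null until getName() is first called

    Account(String storedName) { this.storedName = storedName; }

    String getName() {
        if (name == null) { name = storedName; } // simulated lazy load
        return name;
    }

    // Nested in Account, so `left.name` would compile here even though it
    // may still be null. Going through the getter triggers the load.
    enum ByName implements Comparator<Account> {
        INSTANCE;

        @Override
        public int compare(Account left, Account right) {
            return left.getName().compareTo(right.getName());
        }
    }
}

class ProxyComparatorDemo {
    public static void main(String[] args) {
        int result = Account.ByName.INSTANCE.compare(
                new Account("Adams"), new Account("Baker"));
        System.out.println(result < 0); // true
    }
}
```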

Thursday, February 6, 2014

Degrees of Lazy Evaluation

The topic of lazy evaluation often comes up when thinking about transforming collections of data. Lazy Evaluation is particularly useful for values that are expensive to compute, but it is also useful as a way of chaining functions to apply in a single pass through the underlying data. Laziness can be coded manually, or baked into an API such as a collection or transformation. Implementing some kinds of laziness is more expensive than others. Sometimes laziness may be inappropriate, such as when a database connection is open and waiting to be read from. I couldn't find degrees of laziness with a quick web search, so I'm defining some here.

Levels of Evaluation Laziness

0: Eager
The entire data source is evaluated immediately. Most statements in most languages work this way most of the time, because that's the way the underlying bytecodes and hardware are designed to work. Eager evaluation of a collection can be done concurrently from inside the collection. The reduce function is implemented this way in Clojure.
1: Delayed
The entire source data is evaluated on the first usage of the result. Examples: builder patterns, isTrueForAll(), forEach(), Java 8 collection transformations, and default Hibernate joined table loading. Delayed evaluation of a collection can be done concurrently from inside the collection.
2: Chunked
The source data is evaluated in the minimum size chunks for each call: filter(), contains(). This level is defined by asymptotic minimum and maximum laziness of levels 1 and 3.
3: Incremental
A maximum of one item is evaluated per call: map(). Incremental evaluation cannot be done concurrently from inside the collection because that would require evaluating more than one item at a time.

Level 2 is weird. A filter that matches nothing, or only the last element is effectively level 1. A filter that matches every element is effectively level 3. I call it level 2 in both cases because of its range. I think it's important to acknowledge level 2 as a unique level because it's chunked. A balanced hashtable of a certain size and load could evaluate one bucket at a time for contains() and be very reliably level 2 without ever being level 1 or level 3. Concurrent processing of Level 2 may be possible inside a collection in some circumstances, but its practicality may be limited when the chunk sizes are not known in advance.

For a single element value, Lazy levels 1, 2, and 3 are indistinguishable - it's either Eager or it's Lazy - period. Two elements can be either Lazy 1 or Lazy 3. With 3 or more elements, Lazy 2 becomes a meaningful description of the laziness.
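
To make the difference between level 0 and level 3 concrete, here is a small hand-rolled sketch in Java. The names eagerMap, lazyMap, and evaluationsForFirst are mine, not from any library. It counts how many source elements get transformed when the caller only wants the first result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class LazinessDemo {
    // Level 0 (eager): every element is transformed before returning.
    static <A, B> List<B> eagerMap(List<A> source, Function<A, B> f) {
        List<B> out = new ArrayList<>();
        for (A a : source) { out.add(f.apply(a)); }
        return out;
    }

    // Level 3 (incremental): one element is transformed per next() call.
    static <A, B> Iterator<B> lazyMap(Iterator<A> source, Function<A, B> f) {
        return new Iterator<B>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public B next() { return f.apply(source.next()); }
        };
    }

    // Counts applications of f when only the first result is consumed.
    static int evaluationsForFirst(boolean lazy) {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> data = Arrays.asList(1, 2, 3, 4);
        Function<Integer, Integer> doubler = i -> {
            calls.incrementAndGet();
            return i * 2;
        };
        if (lazy) {
            lazyMap(data.iterator(), doubler).next();
        } else {
            eagerMap(data, doubler).get(0);
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(evaluationsForFirst(false)); // 4
        System.out.println(evaluationsForFirst(true));  // 1
    }
}
```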

The results of lazy evaluation can be cached, particularly when that evaluation is expensive and the values will be reused. Clojure Sequences are based on a linked-list data structure which lends itself to (Lazy level 3) incremental evaluation. They are also cached/memoized by default, making them very safe, but somewhat resource intensive.
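
Caching a lazily computed value can be sketched in a few lines of Java. The Memoized class below is a hypothetical helper, not how Clojure implements its sequences; it only shows the idea of computing on first use and reusing the result afterwards.

```java
import java.util.function.Supplier;

// Hypothetical helper: computes the value on first get(), then reuses it.
final class Memoized<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private boolean computed = false;
    private T value;

    Memoized(Supplier<T> delegate) { this.delegate = delegate; }

    @Override
    public synchronized T get() {
        if (!computed) {
            value = delegate.get(); // the expensive work happens once
            computed = true;
        }
        return value;
    }
}

class MemoizedDemo {
    // Returns {first result, second result, number of times computed}.
    static int[] run() {
        int[] calls = {0};
        Memoized<Integer> answer = new Memoized<>(() -> {
            calls[0]++;
            return 42;
        });
        int first = answer.get();
        int second = answer.get();
        return new int[] {first, second, calls[0]};
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // 42 42 1
    }
}
```

The trade-off is the one mentioned above: the cached value stays in memory for as long as the Memoized instance is reachable.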

Using Clojure Sequences requires care with some data sources in order to process a finite subset of the data that fits in memory, or to iterate without holding onto the head of the sequence (thus making the head eligible for garbage collection before the end of the sequence is processed)1. Clojure evaluates (some?) sequences in chunks for efficiency, but incremental evaluation can be forced with a little cleverness.1

Implementation Concerns

Lazy level 0-3 can be represented as a lightweight, one-pass disposable view*, or in a heavier memoized persistent data structure like a linked list. Since views and linked lists traverse the underlying data incrementally, neither is suitable for internal concurrency. Either of these can be processed concurrently externally if necessary, but doing so seems like a design failure at some level - why eagerly process something you went to the trouble of making lazy?

What's a View?

The Iterator interface in Java and some other languages is not suitable for using concurrently. It has 2 methods: hasNext() and next() and lets you enumerate an underlying data source once. Some data sources are ordered, and some are not - the iterator respects that.

The problem is that even if you synchronize both of these methods, two threads can both call hasNext() when there is only one element left. At any time, a broken client can call next() without calling hasNext(). In each of these cases, a thread that calls hasNext() and gets "true" can still call next() and get NoSuchElementException. Yuck!

A better interface is one I'm calling a View (possibly because I misunderstood Paul Phillips using that word). It can be lazily evaluated or iterate over a previously realized collection, but it has a single method: next(), which returns either the next element (whether or not it's null) or a sentinel value: USED_UP. Because checking for an element and fetching it happen in a single call, a synchronized implementation is thread safe. An arbitrary number of threads can safely and quickly call:

T item = next();
while (item != USED_UP) {
    // do something with item
    item = next();
}
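
One possible sketch of such a View in Java follows. The View interface, IteratorView, and ViewDemo are hypothetical names. To keep the sentinel simple, next() is typed as Object here rather than a generic T, which is a simplification of the loop above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical View: one method, so "check then fetch" is a single step.
interface View {
    // Sentinel meaning "no more elements"; distinct from null,
    // which remains a legal element.
    Object USED_UP = new Object();

    Object next();
}

// Wraps an ordinary Iterator; synchronizing next() makes each
// hasNext()/next() pair atomic with respect to other callers.
final class IteratorView implements View {
    private final Iterator<?> source;

    IteratorView(Iterator<?> source) { this.source = source; }

    @Override
    public synchronized Object next() {
        return source.hasNext() ? source.next() : USED_UP;
    }
}

class ViewDemo {
    // Pulls items until the sentinel appears.
    static List<Object> drain(View view) {
        List<Object> out = new ArrayList<>();
        Object item = view.next();
        while (item != View.USED_UP) {
            out.add(item);
            item = view.next();
        }
        return out;
    }

    public static void main(String[] args) {
        View view = new IteratorView(Arrays.asList("a", null, "b").iterator());
        System.out.println(drain(view));                  // [a, null, b]
        System.out.println(view.next() == View.USED_UP);  // true
    }
}
```

A caller can never see NoSuchElementException from this wrapper: once the source is exhausted, every further call just returns USED_UP.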

Footnotes

1 Many thanks to my friend Jeff Dik for pointing this out and correcting my earlier comments.

Sunday, January 26, 2014

Upgrading From Windows XP Before April 8th 2014

The Threat

Security experts have been predicting that malware creators all over the world are finding exploits [in Windows XP] and holding on to them. They know if they unleash an exploit now, it will be fixed. But if they are patient and wait, and hope Microsoft doesn't find the vulnerability, then they can use it for maximum gain come April 9.
The same holds true for Office 2003. Support for it ends on April 15, one week later.

Source: http://www.networkworld.com/community/blog/why-april-9th-might-be-its-worst-day-2014

Upgrade Options: Windows, Mac, Android, or Linux

What kind of computer or operating system you use is determined by what software you need to run.

Office

The first question you should ask yourself is exactly how compatible you need to be with the latest version of Microsoft Office. This is not a yes-or-no question. We are all somewhere on a sliding scale of MS Office compatibility. Very few people require full compatibility with the most obscure features of Microsoft Office. MS Office isn't even compatible with different versions of itself!

100% compatibility with the very latest version of MS Office requires Windows 7 or 8 which probably won't run on your old XP machine. You can purchase the necessary hardware and software from any reputable computer store except a Mac store.

You can run the latest MS Office on Mac, Android, or Linux, but only by installing a virtual machine, then installing Windows, then installing MS Office. This is a pain in the neck, (both the install and the ongoing maintenance) but it can be done and is an expensive but effective way to meet an occasional need for the latest MS Office. If bleeding edge MS Office is the primary reason for having the computer, it's easier and cheaper to buy a Windows computer and be done with it.

Many home users meet their basic office needs with Google Docs which comes free with your home Gmail account. It is compatible only with very basic MS Office documents - no fancy templates, embedded objects, or macros, but it's also much safer from virus threats as a result. It may not be advanced enough for sharing documents with customers and prospects, but for home use it works great with letters, posters, simple spreadsheets, and for sharing them with friends.

LibreOffice is free, runs on any operating system (comes pre-installed on popular Linux flavors), and is roughly equivalent to being one version behind MS Office which is the best you can do natively on the Mac anyway. It's easy to use, powerful, and has Visio-like tools and PDF conversion built-in. This is what I use almost exclusively, even though I have several versions of MS Office installed in virtual Windows machines. Try it out to see if you can use it instead of paying for Microsoft Office:

  • Install LibreOffice
  • Tools -> Options -> Load/Save -> General
    • Uncheck "Warn when not saving in ODF"
    • Document Type: Text, Always Save As: Word 97...2003 (NOT Template)
    • Document Type: Spreadsheet, Always Save As: Excel 97...2003 (NOT Template)
  • Tools -> Options -> Load/Save -> General -> Microsoft Office
    • Check all the boxes.

Other Software

GoToMeeting/GotoWebinar works on Windows, Mac, Android, and now Linux. I haven't tried the Linux version yet, but it only allows you to attend meetings - not share your screen or use a web-cam. Screen sharing on Linux also works well using Skype (or TeamViewer).

Photoshop is Windows and Mac only, but I have found that a combination of Darktable and GIMP meets 100% of my needs (though Photoshop is more convenient and user-friendly if you can afford it).

Linux

Given the above limitations, Linux is a great way to turbocharge an old computer. I switched to Linux about 5 years ago because of the reliability, ease of use, security, and availability of free software. I hope I never have to go back. The only thing I have used Windows for in the past 6 months is GoToMeeting. I've been using Ubuntu Linux 13.10 which is a more Mac-like experience and a little easier to upgrade. Mint Cinnamon Linux is more like Windows 7.

Try either one out by burning a Live CD and booting from it. That will show you if you need to purchase an nVidia graphics card (on a desktop) or a new wireless card (on a laptop) for compatibility reasons, but these can be acquired very cheaply. If you are buying hardware, a Solid State Drive (SSD) can be a miracle for an aging computer. My 7-year-old Ubuntu laptop with an SSD boots in less than 8 seconds and shuts down in less than 3.

I recommend Mint/Cinnamon Linux or Ubuntu over Xubuntu and Lubuntu. The former are just as lightweight but have more usability features than the latter. Otherwise, this article is pretty good and fills in more details.

Monday, October 28, 2013

Expression Problem: Discussion

logaan was kind enough to leave some comments on twitter regarding my recent post on Refactoring in Java, Scala, and Clojure. I needed more than 140 characters so I am responding here.

Discussion

Glen replied: (Untyped) Clojure wraps the function with converters instead of wrapping a user-defined data type. Functional view of same issue?

logaan replied: My understanding is that the expression problem is linked to polymorphism. Clojure's solution is protocols. Protocols allow you to add more types to existing behaviour and more behaviour to existing types. In Scala it needs implicits.

Reply

Thank you Logaan for your insightful comments. I really appreciate you taking the time to read my article and offer constructive criticism. Forming this response was very educational for me.

Chris Houser's Solution to the Expression Problem on InfoQ was a very interesting talk and, I assume, the basis for your criticism. It turns out that there was a debate on Lambda the Ultimate about the nature of The Expression Problem in response to Chouser's presentation.

Since Clojure rejects type safety as a design goal, applying Clojure to the Expression problem seemed somewhat of a stretch to me. On the other hand, Martin Odersky's paper points out:

The term expression problem was originally coined by Phil Wadler in a post on the Java-Genericity mailing list, in which he also proposed a solution written in an extended version of Generic Java. Only later it appeared that Wadler’s solution could not be typed.

If Philip Wadler's solution to his 1998 problem was not type-safe, that made me think the door was open to applying this problem to a dynamic language like Clojure.

In terms of needing Scala's implicits to solve the Expression problem, there may be some aspect to this that I was not understanding. It was actually Odersky's paper that made me think traits were the solution to this problem. I am impressed with how well they solve it. Especially compared to all the typing Java requires when I have 8 implementing classes of an interface that needs to change!

Ultimately, the exact definition of Wadler's original problem is much less interesting to me than solving or side-stepping this general type of problem. In response to your insightful criticism, I have renamed my post, "Refactoring in 3 Languages" and provided an alternate Clojure solution below. Thank you again for taking the time to respond.

Alternate Solution

The Clojure page on protocols actually mentions the Expression Problem. But protocols look like more complication than my little example requires. I can imagine how more complicated "Expression-Like Problems" are well served by records and protocols, yet I feel that Clojure's sweetest spot involves side-stepping these issues when practical by ignoring data specification as much as possible (by leveraging maps).

Here is an alternate "solution" to my earlier post using records and protocols:

(defrecord YearMonth [year month])

(defprotocol MONTH_ADDIBLE 
  (addMonths [MONTH_ADDIBLE mos]))

(extend-type YearMonth
  MONTH_ADDIBLE
  (addMonths [ym, addedMonths]
      (let [newMonth (+ (:month ym) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :year (+ (:year ym) (quot m 12)),
                                   :month (+ (rem m 12) 1)))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ (:year ym) (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :year y :month m))
                 :else (assoc ym :month newMonth)))))
                 
;; Tests
(addMonths (YearMonth. 2013, 7) 2)
;; #user.YearMonth{:year 2013, :month 9}
(addMonths (YearMonth. 2012, 12) 1)
;; #user.YearMonth{:year 2013, :month 1}
(addMonths (YearMonth. 2013, 1) -1)
;; #user.YearMonth{:year 2012, :month 12}

;; With an additional field
(addMonths (assoc (YearMonth. 2013, 7) :otherField1 "One") 2)
;; #user.YearMonth{:year 2013, :month 9, :otherField1 "One"}
(addMonths (assoc (YearMonth. 2012, 12) :otherField1 "One") 1)
;; #user.YearMonth{:year 2013, :month 1, :otherField1 "One"}
(addMonths (assoc (YearMonth. 2013, 1) :otherField1 "One") -1)
;; #user.YearMonth{:year 2012, :month 12, :otherField1 "One"}

(defrecord YearMonth2 [yyyyMm])

(defn yearAndMonthToYm [year month] (+ (* year 100) month))

(extend-type YearMonth2
  MONTH_ADDIBLE
  (addMonths [ym, addedMonths]
      (let [year (quot (:yyyyMm ym) 100),
            newMonth (+ (rem (:yyyyMm ym) 100) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :yyyyMm (yearAndMonthToYm (+ year (quot m 12)),
                                                             (+ (rem m 12) 1))))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ year (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :yyyyMm (yearAndMonthToYm y, m)))
                 :else (assoc ym :yyyyMm (yearAndMonthToYm year, newMonth))))))

(addMonths (YearMonth2. 201307) 2)
;; #user.YearMonth2{:yyyyMm 201309}

;; With an additional field
(addMonths (assoc (YearMonth2. 201307) :otherField2 1.1) 2)
;; #user.YearMonth2{:yyyyMm 201309, :otherField2 1.1}

Sunday, September 29, 2013

Refactoring in Java, Scala, and Clojure

Update: Attend my presentation on this post at DevNexus February 25th 2014

Motivation

I don't use the words Strongly-typed or Dynamic much in this post, but thinking about the relative costs and benefits of type safety was the primary inspiration behind it. You may want to keep your own mental tally of how type-safety helps and how it hurts in the following examples. Because type-safety means so many different things in different languages, I can only assess its practical merits (or detriments) in the context of real-world examples - hence this post. But maybe after looking at these examples, you can infer general principles from them?

I personally find type safety to be useful at work because of the person-years of development that went into the system that I work on most. But I am not trying to convert people to type-safety, or away from it. My goal is to make the issues more visible so that we can all write better code more easily in the future. When I gave this talk to the Asheville Coders League, I felt some measure of satisfaction that one person told me afterwards that they were going to look into Clojure and another that they would look into Scala.

Problem

When I change an interface in Java, I then have to update all its implementations which can be difficult and time consuming. Making a wrapper class is often even more time consuming. This pain in the neck is sometimes called, "The Expression Problem" and it makes a good example for comparing these three languages.

For this comparison, I will use a class that models a Year/Month combination and a function, "addMonths" that takes the number of months to add (positive or negative) and returns a new YearMonth. Originally we used a data structure with 2 fields (year and month), but it complicated the database queries for ranges of months, so we changed it to a single int of the format YyyyMm. Now we can use > and < to compare YyyyMm fields in our SQL queries - and the raw data is still human readable.
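
As a quick sketch of why the single-int format helps, the helper names below are made up for illustration, but the arithmetic is the same multiply-by-100 encoding used throughout this post. Ordinary integer comparison then agrees with chronological order.

```java
class YyyyMmDemo {
    // Packs a year and a one-based month into a single int, e.g. 201307.
    static int toYyyyMm(int year, int month) {
        return (year * 100) + month;
    }

    // Unpacks the year part, e.g. 201307 -> 2013.
    static int yearOf(int yyyyMm) { return yyyyMm / 100; }

    // Unpacks the month part, e.g. 201307 -> 7.
    static int monthOf(int yyyyMm) { return yyyyMm % 100; }

    public static void main(String[] args) {
        int july2013 = toYyyyMm(2013, 7);
        System.out.println(july2013);           // 201307
        System.out.println(yearOf(july2013));   // 2013
        System.out.println(monthOf(july2013));  // 7
        // December 2012 sorts before January 2013 with a plain < comparison.
        System.out.println(toYyyyMm(2012, 12) < toYyyyMm(2013, 1)); // true
    }
}
```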

Despite wild claims that Object Oriented Programming is all about mutation, I'm going to use immutable classes in all three languages (a few mutable local variables are used, but never exposed outside of the function that declares them).

Full source examples from this article are available on Github both before (xxxx1) and after (xxxx2) refactoring.

Here's the original Java interface, translated into Scala and Clojure:

Original Interface

Java Interface

public interface YearMonthInterface {
    public int getYear();
    public int getMonth();
}

Scala Trait

Scala has traits instead of interfaces. A Scala trait can include implementations (but not constructors) as we'll see later.

trait YearMonthTrait {
  def year:Int
  def month:Int
}

Clojure

No interface is necessary, but the compiler won't tell you if you fail to match your data to your functions. Protocols could be used, but that's probably not typical and it's certainly not needed for this simple example.

Base YearMonth Implementation

A few simple tests should provide the best overview:

Tests of Base Implementation

// Java
YearMonth.addMonths(YearMonth.of(2013, 7), 2);
// 2013-9
YearMonth.addMonths(YearMonth.of(2012, 12), 1);
// 2013-1
YearMonth.addMonths(YearMonth.of(2013, 1), -1);
// 2012-12

// Scala
YearMonth.addMonths(YearMonth(2013, 7), 2)
// YearMonth(2013,9)
YearMonth.addMonths(YearMonth(2012, 12), 1)
// YearMonth(2013,1)
YearMonth.addMonths(YearMonth(2013, 1), -1)
// YearMonth(2012,12)

;; Clojure
(addMonths {:year 2013, :month 7} 2)
;; {:year 2013, :month 9}
(addMonths {:year 2012, :month 12} 1)
;; {:year 2013, :month 1}
(addMonths {:year 2013, :month 1} -1)
;; {:year 2012, :month 12}

Java Class

public final class YearMonth implements YearMonthInterface {
    private final int year;
    private final int month;
    
    private YearMonth(int y, int m) { year = y; month = m; }

    public static YearMonth of(int y, int m) {
      if (m > 12) {
          // convert to zero-based months for math
          m--;
          // Carry any extra months over to the year
          y = y + (m / 12);
          // Adjust month to be within one year
          m = m % 12;
          // convert back to one-based months
          m++;
      } else if (m < 1) {
          // Carry any extra months over to the year, but the first year
          // in this case is still year-1
          y = y + (m / 12) - 1;
          // Adjust negative month to be within one year.
          // To get the positive month, subtract it from 12
          m = 12 + (m % 12);
      }
      return new YearMonth(y, m);
    }
    
    @Override
    public int getYear() { return year; }
    
    @Override
    public int getMonth() { return month; }
    
    public static YearMonth addMonths(YearMonthInterface ym,
                                      int addedMonths) {
        return of(ym.getYear(), ym.getMonth() + addedMonths);
    }
    
    @Override
    public String toString() {
        return new StringBuilder().append(year).append("-")
                                  .append(month).toString();
    }
}

Scala Case Class and Companion Object

A case class in Scala generates the boilerplate that the equivalent Java code spells out by hand. Scala does not have static methods. Instead, everything Java would call a "static" method goes in the companion object in Scala. A companion object is a singleton instance with the same name and in the same file as the class it belongs to.

case class YearMonth(override val year:Int,
                     override val month:Int) extends YearMonthTrait

object YearMonth {
  def addMonths(ym:YearMonthTrait, addedMonths:Int):YearMonth = {
    val newMonth = ym.month + addedMonths
    if (newMonth > 12) {
      // convert to zero-based months for math
      val m = newMonth - 1
      // Carry any extra months over to the year
      new YearMonth(ym.year + (m / 12), (m % 12) + 1)
    } else if (newMonth < 1) {
      // Carry any extra months over to the year, but the
      // first year in this case is still year-1
      val y = ym.year + (newMonth / 12) - 1
      // Adjust negative month to be within one year.
      // To get the positive month, subtract it from 12
      val m = 12 + (newMonth % 12)
      new YearMonth(y, m)
    } else {
      new YearMonth(ym.year, newMonth)
    }
  }
}

Clojure Function

Instead of declaring data types as classes, Clojure prefers to use immutable maps (hash maps). So we skip all data definition steps above and write a function that assumes a map with certain keys. These keys are analogous to the fields in Java and Scala.

(defn addMonths [ym, addedMonths]
      (let [newMonth (+ (:month ym) addedMonths)]
           (cond (> newMonth 12)
                    ;; convert to zero-based months for math
                    (let [m (- newMonth 1)]
                         ;; Carry any extra months over to the year
                         (assoc ym :year (+ (:year ym) (quot m 12)),
                                   :month (+ (rem m 12) 1)))
                 (< newMonth 1)
                    ;; Carry any extra months over to the year, but the
                    ;; first year in this case is still year-1
                    (let [y (dec (+ (:year ym) (quot newMonth 12))),
                          ;; Adjust negative month to be within one year.
                          ;; To get the positive month, subtract it from 12
                          m (+ 12 (rem newMonth 12))]
                       (assoc ym :year y :month m))
                 :else (assoc ym :month newMonth))))

Add an Implementing Class

Here we add a second implementing class that contains an additional field.

Test Implementing Class

// Java
YearMonth.addMonths(MonthlyA.of("One", 2013, 7), 2)
// 2013-9

// Scala
YearMonth.addMonths(MonthlyA("One", 2013, 7), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:otherField1 "One", :year 2013, :month 7} 2)
;; {:otherField1 "One", :year 2013, :month 9}

Java

public class MonthlyA implements YearMonthInterface {
    private final  String otherField1;
    private final int year;
    private final int month;

    private MonthlyA(String s, int y, int m) {
        otherField1 = s; year = y; month = m;
    }

    public static MonthlyA of(String s, int y, int m) {
        return new MonthlyA(s, y, m);
    }

    public String getOtherField1() { return otherField1; }

    @Override
    public int getYear() { return year; }

    @Override
    public int getMonth() { return month; }
}

Scala

case class MonthlyA(otherField1:String,
                    override val year:Int,
                    override val month:Int) extends YearMonthTrait

Clojure

No custom data structure is needed because Clojure leverages a Map.


Change Data Representation From Year & Month to YyyyMm

Test

// Java
YearMonth.addMonths(YearMonth.of(201307), 2)
// 2013-9

// Scala
YearMonth.addMonths(YearMonth(201307), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:yyyyMm 201307} 2)
;; {:yyyyMm 201309}

Java

Java requires that you manually update all the old code to be compatible with the new data format. While I'm at it, I'm going to add a convenience static factory method to the base implementation that takes the new data format.

public interface YearMonthInterface {

    ... old methods unchanged ...

    // New! @return yyyyMm or YearMonth.of(year, month).getYyyyMm()
    public int getYyyyMm();
}

public class YearMonth implements YearMonthInterface {

    ... old methods unchanged ...

    // New!
    public static YearMonth of(int YyyyMm) {
        return new YearMonth(YyyyMm / 100, YyyyMm % 100);
    }

    // New!
    @Override
    public int getYyyyMm() {
        return (year * 100) + month;
    }
}

public class MonthlyA implements YearMonthInterface {

    ... old methods unchanged ...

    // New!
    @Override
    public int getYyyyMm() {
        return YearMonth.of(year, month).getYyyyMm();
    }
}

Scala

Scala lets you add your implementation logic right in the trait instead of touching any implementing classes, but I'm going to add a new factory method to the base class that accepts the new data format the same way I did in Java.

Additional constructors in Scala are implemented as factory methods in the companion object. apply() is the default name for a method, so you don't need to specify it in your client code. You can use it just like a normal factory/constructor (except for constructor pattern matching) as shown in the Scala test above.

trait YearMonthTrait {

    ... old methods unchanged ...

  // New!
  def yyyyMm:Int = (year * 100) + month
}

object YearMonth {
  // Add yyyyMm factory method to the YearMonth companion object
  // This is like the extra "of" method we just added to the Java version.
  def apply(yyyyMm:Int) = new YearMonth((yyyyMm / 100), (yyyyMm % 100))

    ... old methods unchanged ...

}

Clojure

I found it easiest/clearest to add two conversion functions so that the Clojure function handles both the new and old data formats.

(defn ymToOld [ym] (dissoc (assoc ym :year (quot (:yyyyMm ym) 100)
                                     :month (rem (:yyyyMm ym) 100))
                           :yyyyMm))

(defn ymToNew [ym] (dissoc (assoc ym :yyyyMm (+ (* (:year ym) 100)
                                                (:month ym)))
                           :year :month))

(defn addMonths [ym, addedMonths]
      (if (contains? ym :yyyyMm)
                     (ymToNew (addMonths (ymToOld ym), addedMonths))
          (let [newMonth (+ (:month ym) addedMonths)]

               ... same code from before ...

Add a New Implementing Class Using the New Data Format

Now that all the old code is working with the new data format, we can add a new class, MonthlyB, that makes use of the new format internally.

Test New Class

All the old tests pass, even with the new code. I am only showing a few new tests for brevity.

// Java
YearMonth.addMonths(MonthlyB.of(1.1, 201307), 2);
// 2013-9

// Scala
YearMonth.addMonths(MonthlyB(1.1, 201307), 2)
// YearMonth(2013,9)

;; Clojure
(addMonths {:otherField2 1.1 :yyyyMm 201307} 2)
;; {:otherField2 1.1, :yyyyMm 201309}

Java

In Java, we have to manually add support for the old data format to the new class.

public class MonthlyB implements YearMonthInterface {
    private final double otherField2;
    private final int yyyyMm;

    private MonthlyB(double d, int yyM) {
        otherField2 = d; yyyyMm = yyM;
    }

    public static MonthlyB of(double d, int yyM) {
        return new MonthlyB(d, yyM);
    }

    public double getOtherField2() { return otherField2; }

    @Override
    public int getYear() { return yyyyMm / 100; }

    @Override
    public int getMonth() { return (yyyyMm % 100); }

    @Override
    public int getYyyyMm() { return yyyyMm; }
}
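
To see why those extra getters matter, here is a small self-contained sketch. The interface and MonthlyB are trimmed copies of the code above so that it compiles on its own. MonthlyBDemo and its label method are hypothetical, standing in for "old" client code that only knows about getYear() and getMonth().

```java
// Trimmed copy of the interface above so this sketch compiles by itself.
interface YearMonthInterface {
    int getYear();
    int getMonth();
    int getYyyyMm();
}

// Trimmed copy of MonthlyB: stores only yyyyMm and derives year and month.
final class MonthlyB implements YearMonthInterface {
    private final double otherField2;
    private final int yyyyMm;

    private MonthlyB(double d, int yyM) { otherField2 = d; yyyyMm = yyM; }

    static MonthlyB of(double d, int yyM) { return new MonthlyB(d, yyM); }

    public double getOtherField2() { return otherField2; }
    @Override public int getYear() { return yyyyMm / 100; }
    @Override public int getMonth() { return yyyyMm % 100; }
    @Override public int getYyyyMm() { return yyyyMm; }
}

final class MonthlyBDemo {
    // Hypothetical old client code: written before yyyyMm existed,
    // so it only calls the original two getters.
    static String label(YearMonthInterface ym) {
        return ym.getYear() + "-" + ym.getMonth();
    }

    public static void main(String[] args) {
        System.out.println(label(MonthlyB.of(1.1, 201307))); // 2013-7
    }
}
```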

Scala

In Scala, an adapter trait makes any new classes play nicely with the old code. It only needs to be specified once and can be "mixed in" to as many new classes as necessary.

trait YearMonthNew extends YearMonthTrait {
  def yyyyMm:Int
  def year:Int = yyyyMm / 100
  def month:Int = (yyyyMm % 100)
}

case class MonthlyB(otherField2:Double,
                    override val yyyyMm:Int) extends YearMonthNew

Clojure

No new code is needed because Clojure does not specify data types.

Additional Considerations

  • Compiling Scala is slow, taking 2-3x as long as Java. SBT, on the other hand, is extremely clever: it maximizes processor usage and recompiles only what it needs to, so compiling Scala with SBT may effectively be faster than compiling Java with Ant.
  • Compiled Clojure code executes about 50% slower than comparable Scala/Java code.
  • Both Scala and Clojure need a few small jar files, which hold their language-specific APIs, in order to compile.

Conclusions

All three languages got the job done. All three required specifying the logic for data transformation - the addMonths function. Data structures (user-defined types) showed the biggest difference between the three languages.

In Java, more work is spent defining and updating the types than defining functions that work on them. In Scala, the up-front work of defining types is small and elegant; a minor distraction from the "real work" of transforming that data. In Clojure, all attention is placed on the functions while data structures almost disappear altogether. This is a very beautiful and fast way to code that yields a very simple system, but it lacks the guarantees that type safety gives the other languages. Comprehensive unit test coverage can mitigate this risk, but that is another form of complexity with its own maintenance cost.

In the beginning, Java solved virtually every major issue with C++ and created the JVM which these other two languages are built on. But it's showing its age. I really have trouble finding a situation where Java would win. I suppose if a small jar file size was critical... Really, the biggest advantage Java has over Scala is faster compile times. Maybe if you write in Clojure, you could use small amounts of Java for performance in critical areas? Still, I'd rather use Scala for that than Java.

Both Scala and Clojure seem to eliminate a lot of work that is required in Java, though they take fundamentally different approaches to doing so.

Wednesday, September 18, 2013

Comparing Objects is Relative

This post is part of a series on Comparing Objects.

Equality

For those of us who are implementation minded, that means that boolean equals(Object other) is a flawed API because there is no one definition of equality that will work in every context. In a previous post on Using Java Collections Effectively by Implementing equals() and hashCode(), I quoted Josh Bloch and said that "The behavior of equals(), hashCode(), and compareTo() must be consistent." Java subtly encourages us to define a single definition of equality and comparison per class.

Java programmers have been "ensuring symmetry by controlling only one side of the equation" for years. There's nothing wrong with defining a *default* context for comparison and equality. Just don't mistake default equality and compareTo() as the only context for comparing objects.

A small potato and a slice of bread might have an equal number of calories (they are equivalent in one context), but the potato takes much more energy to heat up (they are different in another context). So ideally, we'd like Heating equality and Caloric equality to be different for small potatoes vs. bread.

A context-relative definition of equality might look like this:

boolean equals(Object left, Object right)

Hmm... equality and sorting have a good deal of overlap. The Comparator interface already defines something like equality - it returns 0 when things are sorted the same ("equally"), something else otherwise. Better still, any implementation of Comparator provides a context for that comparison.

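interface Sugars {
    int gramsSugar();
    // Integer.compare avoids the overflow that plain subtraction risks.
    public static final Comparator<Sugars> COMPARATOR = (left, right) ->
            Integer.compare(left.gramsSugar(), right.gramsSugar());
}

interface Heating {
    int cookingEnergy();
    // Integer.compare avoids the overflow that plain subtraction risks.
    public static final Comparator<Heating> COMPARATOR = (left, right) ->
            Integer.compare(left.cookingEnergy(), right.cookingEnergy());
}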

Could we use this for context-relative equality?
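
As a minimal sketch of what that could mean, the two-argument equals from above can be written as a thin wrapper around any Comparator. Everything here is hypothetical: ContextEquality, equalIn, the Snack class, and the calorie and energy numbers are made up for illustration.

```java
import java.util.Comparator;

// Hypothetical food type; the numbers used below are invented for illustration.
final class Snack {
    final int calories;
    final int cookingEnergy;

    Snack(int calories, int cookingEnergy) {
        this.calories = calories;
        this.cookingEnergy = cookingEnergy;
    }

    // Each Comparator is one context for comparison.
    static final Comparator<Snack> BY_CALORIES =
            Comparator.comparingInt(s -> s.calories);
    static final Comparator<Snack> BY_COOKING_ENERGY =
            Comparator.comparingInt(s -> s.cookingEnergy);
}

final class ContextEquality {
    private ContextEquality() {}

    // Two objects are "equal in this context" when the context's
    // Comparator says they sort the same, that is, when it returns 0.
    static <T> boolean equalIn(Comparator<? super T> context, T left, T right) {
        return context.compare(left, right) == 0;
    }

    public static void main(String[] args) {
        Snack potato = new Snack(130, 9);
        Snack bread = new Snack(130, 2);
        System.out.println(equalIn(Snack.BY_CALORIES, potato, bread));       // true
        System.out.println(equalIn(Snack.BY_COOKING_ENERGY, potato, bread)); // false
    }
}
```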

The ubiquitous ArrayList, Set, and Map use equals(), hashCode(), and compareTo(), which are all fixed in their one-sided ways. But the classes implementing SortedSet and SortedMap behave very differently from the other collections when passed a separate Comparator.

Using Context-Relative Equality (in Java)

When you pass a Comparator to TreeSet or TreeMap, you are using context-relative equality. All comparisons for get(), put(), and contains() are made with the Comparator, NOT with the methods that the other standard collections use! With a little care you can use this to your advantage:

static class Food implements Heating, Sugars {
    private final int gramsSugar;
    private final int cookingEnergy;
    private Food(int g, int c) { gramsSugar = g; cookingEnergy = c; }
    @Override public int gramsSugar() { return gramsSugar; }
    @Override public int cookingEnergy() { return cookingEnergy; }
    @Override public String toString() {
        return "Food(" + gramsSugar + "," + cookingEnergy + ")";
    }
}

public static void main(String[] args) {
    List<Food> foods = Arrays.asList(new Food(5,3), new Food(4,4));
    SortedSet<Food> foodsByHeat = new TreeSet<>(Heating.COMPARATOR);
    foodsByHeat.addAll(foods);
    System.out.println("Foods by heat:");
    for (Food f : foodsByHeat) { System.out.println(f); }

    SortedSet<Food> foodsBySugar = new TreeSet<>(Sugars.COMPARATOR);
    foodsBySugar.addAll(foods);
    System.out.println("Foods by sugar:");
    for (Food f : foodsBySugar) { System.out.println(f); }
}

Output

Foods by heat:
Food(5,3)
Food(4,4)
Foods by sugar:
Food(4,4)
Food(5,3)
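
The Comparator decides membership as well as order. Here is a small sketch using plain Strings and the JDK's String.CASE_INSENSITIVE_ORDER instead of the Food class. ComparatorMembershipDemo and its helper method are hypothetical names for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ComparatorMembershipDemo {
    private ComparatorMembershipDemo() {}

    // Builds a set whose idea of "same element" ignores case.
    static Set<String> caseInsensitiveSet(List<String> words) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(words);
        return set;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Potato", "Bread");

        Set<String> byEquals = new HashSet<>(words);        // uses equals()/hashCode()
        Set<String> ignoringCase = caseInsensitiveSet(words); // uses the Comparator

        System.out.println(byEquals.contains("POTATO"));     // false
        System.out.println(ignoringCase.contains("POTATO")); // true
        System.out.println(ignoringCase.add("bread"));       // false
        System.out.println(ignoringCase.size());             // 2
    }
}
```

This is where the "little care" comes in: any two elements the Comparator scores as 0 count as duplicates, so a TreeSet built with Heating.COMPARATOR would keep only one of two foods that have the same cookingEnergy.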

More

Here is the Source for the above.

Here is my earlier SetInterface test.