Thursday, June 12, 2014

Comparators

This post is part of a series on Comparing Objects.

Some things lend themselves to default ordering, such as sorting users by their alphabetized last names. Yet users are often sorted by other criteria, such as the number of purchases they make per year, the amount their account is in debit, or what time they clicked the "purchase" button during a limited promotion. To represent this in Java, simply define a comparator for each context:
enum UserByPromoClick implements Comparator<User> {
    INSTANCE;
    public int compare(User left, User right) {
        ...
    }
}

enum UserByPurchases implements Comparator<User> {
    ...

enum UserByDebit implements Comparator<User> {
    ...
Implementation note: compare(left, right) and compareTo(other) are both instance methods, which means you need to create an instance of the appropriate Comparator in order to use them.  Most comparators are immutable and unchanging.  This means they should be implemented as Enums with a single immutable instance shared and reused by all clients.

Various sort orders are often combined. Two accounts equally in debit could be sorted to show up next to each other on an expense report. Their names would only be used as the "tie breaker" in this ordering.
public int compare(User left, User right) {
    // Assuming getPromoClickMs returns Integer.MAX_VALUE
    // if the user didn't click on the promo.
    if (left.getPromoClickMs() > right.getPromoClickMs()) { return 1; }
    if (left.getPromoClickMs() < right.getPromoClickMs()) { return -1; }

    // In a tie, sort by name instead.
    return UserByName.compare(left, right);
}
The implementation of Objects.equals(left, right) performs a referential equality check for speed and to ensure that each object is equal to itself.  This has the side effect that if someone passes compare(null, null) it returns true, which makes sense some: undefined is undefined.  But should we ever compare anything to null? For a while I thought, "What harm could come from just sorting nulls last?"

Sortorder

What harm indeed!  Really, there is no meaningful way to compare a value to null because null represents "undefined." Well, isGreaterThan(other) can meaningfully return false if other is null, but the general-purpose compare(left, right) method cannot.  It must return less-than (any negative int), greater-than (any positive int), or equal (0). The only rational thing to do if either parameter to the compare(a, b) method is null is to throw an exception.
if (left == null) {
    throw new IllegalArgumentException("Can't compare nulls");
}
if (left == right) { return 0; } // Also ensures right != null
If your class can be represented as an integer, compare() could be implemented as a simple subtraction.  Just keep in mind that if your subtraction could exceed Integer.MIN_VALUE, it can roll and give you a positive result instead of a negative one. So if that could happen, you must use greater-than/less-than comparasions instead of subtraction.
int ret = left.getPromoClickMs() - right.getPromoClickMs();
if (ret != 0) { return ret; }

// In a tie, sort by name instead.
return UserByName.compare(left, right);
You can have as many repetitions of this kind of thing as you want:
ret = Xxxxx.compare(this, that);
if (ret != 0) { return ret; }

Persistence and Proxy Objects

When implementing a Comparator (as opposed to Comparable), then neither object can be trusted to be initialized and get/set should be used instead of direct access to instance fields on both objects. If you implement your Comparator as a separate class and your fields are private, the compiler will enforce this for you. But for Comparable and for Comparators implemented in the same class file as the object they are comparing, the compiler will NOT warn you, so you must be very careful.

Thursday, February 6, 2014

Degrees of Lazy Evaluation

The topic of lazy evaluation often comes up when thinking about transforming collections of data. Lazy Evaluation is particularly useful for values that are expensive to compute, but it is also useful as a way of chaining functions to apply in a single pass through the underlying data. Laziness can be coded manually, or baked into an API such as a collection or transformation. Implementing some kinds of laziness is more expensive than others. Sometimes laziness may be inappropriate, such as when a database connection is open and waiting to be read from. I couldn't find degrees of laziness with a quick web search, so I'm defining some here.

Levels of Evaluation Laziness

0: Eager
The entire data source is evaluated immediately. Most statements in most languages work this way most of the time, because that's the way the underlying bytecodes and hardware are designed to work. Eager evaluation of a collection can be done concurrently from inside the collection. The reduce function is implemented this way in Clojure.
1: Delayed
The entire source data is evaluated on the first usage of the result. Examples: builder patterns, isTrueForAll(), forEach(), Java 8 collection transformations, and default Hibernate joined table loading. Delayed evaluation of a collection can be done concurrently from inside the collection.
2: Chunked
The source data is evaluated in the minimum size chunks for each call: filter(), contains(). This level is defined by asymptotic minimum and maximum laziness of levels 1 and 3.
3: Incremental
A maximum of one item is evaluated per call: map(). Incremental evaluation cannot be done concurrently from inside the collection because that would require evaluating more than one item at a time.

Level 2 is weird. A filter that matches nothing, or only the last element is effectively level 1. A filter that matches every element is effectively level 3. I call it level 2 in both cases because of its range. I think it's important to acknowledge level 2 as a unique level because it's chunked. A balanced hashtable of a certain size and load could evaluate one bucket at a time for contains() and be very reliably level 2 without ever being level 1 or level 3. Concurrent processing of Level 2 may be possible inside a collection in some circumstances, but its practicality may be limited when the chunk sizes are not known in advance.

For a single element value, Lazy levels 1, 2, and 3 are indistinguishable - it's either Eager or it's Lazy - period. Two elements can be either Lazy 1 or Lazy 3. With 3 or more elements, Lazy 2 becomes a meaningful description of the laziness.

The results of lazy evaluation can be cached, particularly when that evaluation is expensive and the values will be reused. Clojure Sequences are based on a linked-list data structure which lends itself to (Lazy level 3) incremental evaluation. They are also cached/memoized by default, making them very safe, but somewhat resource intensive.

Using Clojure Sequences requires care with some data sources in order to process a finite subset of the data that fits in memory, or to iterate without holding onto the head of the sequence (thus making the head eligible for garbage collection before the end of the sequence is processed)1. Clojure evaluates (some?) sequences in chunks for efficiency, but incremental evaluation can be forced with a little cleverness.1

Implementation Concerns

Lazy level 0-3 can be represented as a lightweight, one-pass disposable view*, or in a heavier memoized persistent data structure like a linked list. Since views and linked lists traverse the underlying data incrementally, neither is suitable for internal concurrency. Either of these can be processed concurrently externally if necessary, but doing so seems like a design failure at some level - why eagerly process something you went to the trouble of making lazy?

What's a View?

The Iterator interface in Java and some other languages is not suitable for using concurrently. It has 2 methods: hasNext() and next() and lets you enumerate an underlying data source once. Some data sources are ordered, and some are not - the iterator respects that.

The problem is that even if you synchronize both of these methods, two threads can both call hasNext() when there is only one element left. At any time, a broken client can call next() without calling hasNext(). In each of these cases, a thread that calls hasNext() and gets "true" can still call next() and get NoSuchElementException. Yuck!

A better interface, which I'm calling a View (possibly because I misunderstood Paul Phillips using that word). It can be lazily evaluated or iterate over a previously realized collection, but it has a single method: next() which can return the next element (whether or not its null) or a sentinal value: USED_UP. This is 100% thread safe. An arbitrary number of threads can safely and quickly call:

T item = next();
while (item != USED_UP) {
    // do something with item
    item = next();
}

Footnotes

1 Many thanks to my friend Jeff Dik for pointing this out and correcting my earlier comments.

Sunday, January 26, 2014

Upgrading From Windows XP Before April 8th 2014

The Threat

Security experts have been predicting that malware creators all over the world are finding exploits [in Windows XP] and holding on to them. They know if they unleash an exploit now, it will be fixed. But if they are patient and wait, and hope Microsoft doesn't find the vulnerability, then they can use it for maximum gain come April 9.
The same holds true for Office 2003. Support for it ends on April 15, one week later.

Source: http://www.networkworld.com/community/blog/why-april-9th-might-be-its-worst-day-2014

Upgrade Options: Windows, Mac, Android, or Linux

What kind of computer or operating system you use is determined by what software you need to run.

Office

The first question you should ask yourself is exactly how compatible you need to be with the latest version of Microsoft Office. This is not a yes-or-no question. We are all somewhere on a sliding scale of MS Office compatibility. Very few people require full compatibility with the most obscure features of Microsoft Office. MS Office isn't even compatible with different versions of itself!

100% compatibility with the very latest version of MS Office requires Windows 7 or 8 which probably won't run on your old XP machine. You can purchase the necessary hardware and software from any reputable computer store except a Mac store.

You can run the latest MS Office on Mac, Android, or Linux, but only by installing a virtual machine, then installing Windows, then installing MS Office. This is a pain in the neck, (both the install and the ongoing maintenance) but it can be done and is an expensive but effective way to meet an occasional need for the latest MS Office. If bleeding edge MS Office is the primary reason for having the computer, it's easier and cheaper to buy a Windows computer and be done with it.

Many home users meet their basic office needs with Google Docs which comes free with your home Gmail account. It is compatible only with very basic MS Office documents - no fancy templates, embedded objects, or macros, but it's also much safer from virus threats as a result. It may not be advanced enough for sharing documents with customers and prospects, but for home use it works great with letters, posters, simple spreadsheets, and for sharing them with friends.

LibreOffice is free, runs on any operating system (comes pre-installed on popular Linux flavors), and is roughly equivalent to being one version behind MS Office which is the best you can do natively on the Mac anyway. It's easy to use, powerful, and has Visio-like tools and PDF conversion built-in. This is what I use almost exclusively, even though I have several versions of MS Office installed in virtual Windows machines. Try it out to see if you can use it instead of paying for Microsoft Office:

  • Install LibreOffice
  • Tools -> Options -> Load/Save -> General
    • Unckeck "Warn when not saving in ODF"
    • Document Type: Text, Always Save As: Word 97...2003 (NOT Template)
    • Document Type: Spreadsheet, Always Save As: Word 97...2003 (NOT Template)
  • Tools -> Options -> Load/Save -> General -> Microsoft Office
    • Check all the boxes.

Other Software

GoToMeeting/GotoWebinar works on Windows, Mac, Android, and now Linux. I haven't tried the Linux version yet, but it only allows you to attend meetings - not share your screen or use a web-cam. Screen sharing on Linux also works well using Skype (or TeamViewer).

Photoshop is Windows and Mac only, but I have found that a combination of Darktable and GIMP meets 100% of my needs (though Photoshop is more convenient and user-friendly if you can afford it).

Linux

Given the above limitations, Linux is a great way to turbocharge an old computer. I switched to Linux about 5 years ago because of the reliability, ease of use, security, and availability of free software. I hope I never have to go back. The only thing I have used Windows for in the past 6 months is GoToMeeting. I've been using Ubuntu Linux 13.10 which is a more Mac-like experience and a little easier to upgrade. Mint Cinnamon Linux is more like Windows 7.

Try either one out by burning a Live CD and booting from it. That will show you if you need to purchase an nVidia graphics card (on a desktop) or a new wireless card (on a laptop) for compatibility reasons, but these can be acquired very cheaply. If you are buying hardware, a Solid State Drive (SSD) can be a miracle for an aging computer. My 7-year-old Ubuntu laptop with an SSD boots in less than 8 seconds and shuts down in less than 3.

I recommend Mint/Cinnamon Linux or Ubuntu over Xubuntu and Lubuntu. The former are just as lightweight but have more usability features than the latter. Otherwise, this article is pretty good and fills in more details