Showing posts with label Java Collections. Show all posts
Showing posts with label Java Collections. Show all posts

Thursday, June 12, 2014

Comparators

This post is part of a series on Comparing Objects.

Some things lend themselves to default ordering, such as sorting users by their alphabetized last names. Yet users are often sorted by other criteria, such as the number of purchases they make per year, the amount their account is in debit, or what time they clicked the "purchase" button during a limited promotion. To represent this in Java, simply define a comparator for each context:
enum UserByPromoClick implements Comparator<User> {
    INSTANCE;
    public int compare(User left, User right) {
        ...
    }
}

enum UserByPurchases implements Comparator<User> {
    ...

enum UserByDebit implements Comparator<User> {
    ...
Implementation note: compare(left, right) and compareTo(other) are both instance methods, which means you need to create an instance of the appropriate Comparator in order to use them.  Most comparators are immutable and unchanging.  This means they should be implemented as Enums with a single immutable instance shared and reused by all clients.

Various sort orders are often combined. Two accounts equally in debit could be sorted to show up next to each other on an expense report. Their names would only be used as the "tie breaker" in this ordering.
public int compare(User left, User right) {
    // Assuming getPromoClickMs returns Integer.MAX_VALUE
    // if the user didn't click on the promo.
    if (left.getPromoClickMs() > right.getPromoClickMs()) { return 1; }
    if (left.getPromoClickMs() < right.getPromoClickMs()) { return -1; }

    // In a tie, sort by name instead.
    return UserByName.compare(left, right);
}
The implementation of Objects.equals(left, right) performs a referential equality check for speed and to ensure that each object is equal to itself.  This has the side effect that if someone passes compare(null, null) it returns true, which makes sense some: undefined is undefined.  But should we ever compare anything to null? For a while I thought, "What harm could come from just sorting nulls last?"

Sortorder

What harm indeed!  Really, there is no meaningful way to compare a value to null because null represents "undefined." Well, isGreaterThan(other) can meaningfully return false if other is null, but the general-purpose compare(left, right) method cannot.  It must return less-than (any negative int), greater-than (any positive int), or equal (0). The only rational thing to do if either parameter to the compare(a, b) method is null is to throw an exception.
if (left == null) {
    throw new IllegalArgumentException("Can't compare nulls");
}
if (left == right) { return 0; } // Also ensures right != null
If your class can be represented as an integer, compare() could be implemented as a simple subtraction.  Just keep in mind that if your subtraction could exceed Integer.MIN_VALUE, it can roll and give you a positive result instead of a negative one. So if that could happen, you must use greater-than/less-than comparasions instead of subtraction.
int ret = left.getPromoClickMs() - right.getPromoClickMs();
if (ret != 0) { return ret; }

// In a tie, sort by name instead.
return UserByName.compare(left, right);
You can have as many repetitions of this kind of thing as you want:
ret = Xxxxx.compare(this, that);
if (ret != 0) { return ret; }

Persistence and Proxy Objects

When implementing a Comparator (as opposed to Comparable), then neither object can be trusted to be initialized and get/set should be used instead of direct access to instance fields on both objects. If you implement your Comparator as a separate class and your fields are private, the compiler will enforce this for you. But for Comparable and for Comparators implemented in the same class file as the object they are comparing, the compiler will NOT warn you, so you must be very careful.

Tuesday, July 9, 2013

Immutable Java: Lists and Other Collections

Update 2017-09-14

While the techniques in this article still work, I prefer to use Paguro. Here's how:

List

public static final List<String> UNSAFE_ZONES =
        vec("Africa/Cairo",
            "Africa/Johannesburg",
            "America/Anchorage");

Set

private static final Set<String> validModes =
        set(MODE_PREADD, MODE_ADD,
            MODE_PREUPDATE, MODE_UPDATE)));

Map

private static final Map<String,String> shortNameHash =
        map(tup(SERVER_NAME_DEMO, "demo"),
            tup(SERVER_NAME_UAT, "uat"),
            tup(SERVER_NAME_INTEGRATION, "integration"),
            tup(SERVER_NAME_DEV, "dev"));

Paguro

The above have many advantages over the original suggestions in this article:

  • Brevity and Clarity - no extra junk, just declare your collection.
  • No static initializer blocks for maps. If you've ever created a dependency loop inside these blocks you'll appreciate this.
  • Immutability - These Paguro collections are just as safe as Collections.unmodifiables. You can't modify them in place, but you can modify them, producing an entirely new collection with items added, removed, or changed. Java's unmodifiable collections require copying the entire collection in order to change it - an O(n) operation. Paguro collections are designed to copy-on-write with maximum sharing between versions of a collection, so that modifications are O(log n) (usually with a high base, often approaching O(1)). Paguro also provides mutable builders to create immutable collections even faster (if you care about that).

Original Post

Lists

What happens to the UNSAFE_ZONES list in the following code when it is passed to doSomething()?

public static final List<String> UNSAFE_ZONES = Arrays.asList(
        "Africa/Cairo",
        "Africa/Johannesburg",
        "America/Anchorage",
        ...);

displayZones(UNSAFE_ZONES);

A method named displayZones(List zones) sounds pretty safe and it probably won't change the contents of the list you pass it. But unless you dig into that method very carefully, you can't know that for sure. Even if displayZones() is very simple and safe today, someone could use it to wrap third-party logic tomorrow, or make it pass your UNSAFE_ZONES to another method that changes it.

Maybe everything works perfectly, but you don't sleep well at night because someone could, at any time, intentionally or accidentally change your list. Maybe it's already happening somewhere, but it's caused by such a rare data condition that you just haven't noticed yet. Maybe you sleep well, but keeping track of every place that everything could possibly be modified in your application is using valuable thought-space, distracting you from more interesting problems.

Time zones don't change very often. This list probably only needs to be updated once or twice a year. If you make releases more often than that, there is no reason that a list like this needs to change *ever* except at compile time, or maybe when it is read from a file on application start-up. Any attempt by your program to change it on the fly is an error. So make it immutable.

Here is an immutable List in Java 5 or later.

Immutable List

public static final List<String> TIME_ZONES =
        Collections.unmodifiableList(Arrays.asList(
                "Africa/Cairo",
                "Africa/Johannesburg",
                "America/Anchorage",
                ...));

People always bring up concurrency with relation to immutable data structures, and immutability is a godsend with regard to concurrency. You can pass immutable objects and collections around freely between any number of threads without any locking, synchronization, defensive copies, or contention which is a big win both in terms of performance and simplicity. But on a day-to-day level, even with a single thread, the huge benefit to this style of coding is that you aren't left wondering if something changed your collection, because that is just not possible.

Imagine now that you are the author of displayZones() and that you want to communicate to people who use your API that displayZones() will never modify the list that it is sent. One way to do this is to write a comment in the JavaDoc like, "I promise never to modify your list inside displayZones()." For this to be effective, people have to 1. read the JavaDoc and 2. believe you. If I wrote that, would you believe me? Heck, I wouldn't believe myself if I wrote it last month!

It would be fantastic if the compiler would throw a nice fat warning if someone ever updated displayZones() to modify the list it was sent? Unfortunately (thank you alexandroid) even iterator has a remove() method and even in Java 8, streams have a toIterator() method, so there is no safe interface to pass to a function.

// Unsafe - iterator has the remove() method
public void displayZones(Iterable zones) { ...

If Iterable provided only read-only methods for traversing it in order, the caller would could pass a modifiable list without worrying about you changing it, because you could't. No defensive copies, no worries. They could pass any Collection, either mutable or immutable and it wouldn't matter. Alas, this it not the case.

There is still one benefit to passing an immutable collection. If you ever change displayZones to take a mutable List and the caller is already passing a mutable List, they won't get a compiler warning. Won't they be surprised when their list is modified! Immutability here future-proofs the caller's code. If the caller is passing a mutable list though, maybe it means they don't care if it is modified? In any case, unnecessary use of mutable data is a dangerous trap.

Other Collections

List is far from the only type of collection. Let's take a moment to look at what it's like to create an immutable Set or Map in Java (I think it works in Java 5 and 6 if you just fill in the empty <>s). If anyone knows a briefer/better way to do this, please leave a comment:

Immutable Set

private static final Set<String> validModes = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(MODE_PREADD, MODE_ADD,
                                    MODE_PREUPDATE, MODE_UPDATE)));

EnumSet is preferred for enums because it executes faster and has a briefer syntax. EnumSets are ordered according to their "natural ordering" which is the order in which the enum constants are declared.

private static final Set<Mode> validModes = Collections.unmodifiableSet(
        EnumSet.of(Mode.PREADD, Mode.ADD, Mode.PREUPDATE, Mode.UPDATE));

If you are using all values of an enum in order:

private static final Set<Mode> validModes = Collections.unmodifiableSet(
        EnumSet.allOf(Mode.class));

Immutable Map

private static final Map<String,String> shortNameHash;
static {
    Map<String,String> m = new HashMap<>();
    m.put(SERVER_NAME_DEMO, "demo");
    m.put(SERVER_NAME_UAT, "uat");
    m.put(SERVER_NAME_INTEGRATION, "integration");
    m.put(SERVER_NAME_DEV, "dev");
    shortNameHash = Collections.unmodifiableMap(m);
}

It is critical in the Map example that the temporary Map m be scoped inside a dedicated block so that it passes out of scope and can never be accessed by anything after the immutable version of it has been created. Map m remains mutable forever, and a lexical closure (block) is the simplest way to keep any user accessible code from maintaining a direct reference to it.

Type Casting

As I've said before, it can be a pain to cast a collection in Java, especially compared to Scala. Unlike (invariant) generic collections, you can painlessly cast an array to its super-type. No need to suppress any unchecked or rawtypes warnings because this is just how arrays "work" (they are covariant). Fortunately, we happen to start with an array in most of the above examples. The example below shows an enum that implements an interface and provides an immutable list of its members (with the type of the interface):

public enum TimeFrame implements DropDownItemInterface {
    ...
    public static final List<DropDownItemInterface> ddiVals =
            Collections.unmodifiableList(Arrays.asList(
                (DropDownItemInterface[]) values()));

If you try casting the resulting list instead, you will see why I bothered to point this out (and why so many people prefer dynamic languages).

Effectiveness

Up to this point, Java is a little wordy, but effective. Where it really breaks down is that List is the only collection which extends an immutable interface. If you had a getShortName(Map m) method, there is no effective way to tell the caller that this method cannot modify the map you pass it. Google Guava falls short here too because its ImmutableMap data structure inherits from Map instead of the other way around. This needs to be fixed at the Java API level, or else, people need to start importing from some new collections API instead of java.util.

Scala and Clojure both make all their collections immutable by default. The mutable version of each type of collection is a sub-class of the immutable one. In either language, you could say getShortName(ImmutableMap m) (or similar) and have the benefits I outlined above. Java could do this too, and I feel very strongly that they should.

The reason why Scala and Clojure collections can be immutable by default is that they are implemented to allow very lightweight copies to be made very quickly. The immutable collections in these languages still have add() and put() methods on them (or equivalent). They just return an entirely new collection which includes the modification of the old one. In a hash-table based collection, only the hash bucket which is changed even needs to be copied over to the new collection. The other buckets can be shared because the collection is immutable!

Java could allow the same kind of fast, shallow copies of collections, but it has the ball and chain of 15-year-old add() and put() methods that return a boolean value instead of the underlying collection. Because this pollutes the namespace for those methods, new method names (like append() and prepend()) would have to be made for all modification operations so that they could return an immutable modified copy of the original collection. The old methods could be deprecated over time. This could lead to some short-term confusion, with people wondering why their immutable collection wasn't changed by an append() call, but I personally believe that it would be worth it in the long run.

Sometimes you need mutable data structures and by all means, use them. But when you don't, prefer immutability and you'll sleep better, think clearer, and write more robust code.

Monday, July 1, 2013

Using Scala Collections (or Guava) from Java

I've been finding more and more reasons to use Scala instead of Java, but migrating a large existing project (e.g. at work) is hard and takes time. So I've been looking into ways to use more and more Scala-type-thinking in Java. Specifically preferring immutability for simplicity sake (it's not just for multithreading any more).

Here's a quick test-run of creating an immutable map the way I would in Scala, but doing it in Java. I had to use 6 initial elements to make this fair because the Guava Map has a 5-element constructor (Scala has a 4-element constructor). It amuses me that the plain Java solution, the one this is most deeply rooted in imperative programming, is the only one to explicitly rely on a lexical closure to ensure immutability of the underlying elements in the map.

// Scala
val OP_MAP = Map(("+", "Plus"),
                 ("-", "Minus"),
                 ("*", "Times"),
                 ("/", "Divided By"),
                 ("%", "Modulo"),
                 ("^", "To the power of"))

// Google Guava
private static final ImmutableMap<String,String> OP_MAP =
        new ImmutableMap.Builder<String,String>()
                .put("+", "Plus"),
                .put("-", "Minus"),
                .put("*", "Times"),
                .put("/", "Divided By"),
                .put("%", "Modulo"),
                .put("^", "To the power of")
                .build();

// Java
private static final Map<String,String> OP_MAP;
static {
    Map<String,String> m = new HashMap<>();
    m.put("+", "Plus");
    m.put("-", "Minus");
    m.put("*", "Times");
    m.put("/", "Divided By");
    m.put("%", "Modulo");
    m.put("^", "To the power of");
    OP_MAP = Collections.unmodifiableMap(m);
}

// Scala collections from Java (based on decompiling the above Scala example)
@SuppressWarnings({"unchecked", "rawtypes"})
private static final WrappedArray<Tuple2<String,String>> wa =
        Predef.wrapRefArray(
                (Tuple2<String,String>[]) new Tuple2[] {
                        new Tuple2<>("+", "Plus"),
                        new Tuple2<>("-", "Minus"),
                        new Tuple2<>("*", "Times"),
                        new Tuple2<>("/", "Divided By"),
                        new Tuple2<>("%", "Modulo"),
                        new Tuple2<>("^", "To the power of") });

@SuppressWarnings("unchecked")
private static final scala.collection.immutable.Map<String,String> OP_MAP =
        (scala.collection.immutable.Map<String,String>)
                scala.Predef$.MODULE$.Map().apply(wa);

Conclusions

Wow. Using Scala from Java can get ugly! Of course, if I was willing to use a mutable Map, the Scala conversion would be simple and painless. But immutability was my goal.

Guava's Builder makes it slightly more elegant than the plain Java solution. But I often found it useful to have a collection that is the same as another with just one additional element. Unfortunately, there is no put() method on Map that returns a new immutable Map. No union(), intersect(), or difference() set operations. There is no forEach(), flatmap(), foldL(), foldR(), map() or other iterator.

While Guava is the winner at solving this particular problem, it's only a baby-step forward from the straight Java solution. I see that a lot of thought was put into the copyOf() method of Guava collections, but I'd much prefer internal iterators etc. that work together as an immutable API as described above. Let me know what I'm missing in the comments.

Sadly, Java 8 seems to only reinforce the assumption of mutability in their new interfaces. All the new iterators modify the underlying collection instead of returning a shallowly modified copy the way the Scala collections do. I'll probably be sticking with the straight-forward Java solution until I make the move to Scala. I might be tempted to wrap some Scala collections in Java-friendly classes that do all of the above that I mentioned, but of course, that's work...

Thanks to the folks on Stack Overflow for their invaluable help decompiling the Scala code!

Tuesday, April 23, 2013

Casting a Collection: Java vs. Scala

I'm always stubbing my toes against this in Java - I have a collection of one type that I want to cast to a collection of a super-type (or interface) instead so that I can add more varied things to it.

Java 7:

public class CollectionCastTest {
    class B implements Runnable {
        @Override
        public void run() { ; }
    }

    public void main(String... args) {

        // First we'll try with a list that looks like it's allowed
        // to contain anything between B and Object.
        List<B> bList = new ArrayList<>();
        bList.add(new B());

        // Java error: incompatible types
        // List<Runnable> rList1 = bList;

        // Java error: unchecked cast
        // List<Runnable> rList2 = (List<Runnable>) bList;

        // Works, but has a time cost
        List<Runnable> rList3 = new ArrayList<>();
        rList3.addAll(bList);

        // Just works!
        List<? extends Runnable> rList4 = bList;

        // Interestingly, this doesn't work.
        // Java error: addAll in List cannot be applied to...
        // List<? extends Runnable> rList5 = new ArrayList<>();
        // rList5.addAll(bList);

        // Primitive arrays can cast most things, but using them takes time.
        List<Runnable> rList6 = Arrays.asList(
                bList.toArray(new Runnable[bList.size()]));
        List<? extends Runnable> rList7 = Arrays.asList(
                bList.toArray(new Runnable[bList.size()]));


    }
}

Scala

Scalas collections are covariant, meaning that you can substitute a sub-type and get away with it. It's certainly powerful.

object CollectionCastTest {
  trait A {
    def hello:String
  }

  class B extends A {
    override def hello = "Hello World!"
  }

  def main(args: Array[String]) {
    val bList = List[B](new B())

    // Works
    val aList1:List[A] = bList

    // Works
    val aList2:List[A] = bList match {
      case l:List[A] => l
      case _ => throw new ClassCastException
    }
    
    val aList3:List[A] = List[A]() ++ bList
  }
}

But does Scala know that these lists are the same?

scala> val bList = List(new B())
bList: List[B] = List(B@555301c4)

scala> val aList:List[A] = List(new B())
aList: List[A] = List(B@23014460)

scala> aList == bList
res6: Boolean = false

Whoops - no. This is because it is using referential equality for B. Why? Because I didn't define an equals() method on A or B so it uses the default equals(). Let's use the same B() in both lists and see what happens:

scala> val b = new B()
b: B = B@4ffe5c31

scala> val bList = List(b)
bList: List[B] = List(B@4ffe5c31)

scala> val aList:List[A] = List[A](b)
aList: List[A] = List(B@4ffe5c31)

scala> aList == bList
res0: Boolean = true

Using different objects for B() are possible too, if we make Scala use an equality test based on the fields of the object instead of it's address. We could define equals() methods that inspect the object's fields, but this is the default implementation of equals() for case classes, so we can make this simple change:

-   class B extends A {
+   case class B extends A {

then

scala> val bList = List(new B())
bList: List[B] = List(B@555301c4)

scala> val aList:List[A] = List(new B())
aList: List[A] = List(B@23014460)

scala> aList == bList
res6: Boolean = true

Wednesday, August 31, 2011

Object Mutability

This post is part of a series on Comparing Objects.

I've read a lot lately about making objects immutable whenever possible. "Programming in Scala" lists Immutable Object Tradeoffs as follows:

Advantages of Immutable Objects

  1. Often easier to reason about because they do not have complex state.
  2. Can be passed freely (without making defensive copies) to things that might try to modify them
  3. Impossible for two threads accessing the same immutable object to corrupt it.
  4. They make safe Hashtable keys (if you put a mutable object in a Hashtable, then change it in a way that changes its hashcode, the Hashtable will no longer be able to use it as a key because it will look for that object in the wrong bucket and not find it).

Disadvantages

  1. Sometimes require a large object graph to be copied (to create a new, modified version of the object). This can cause performance and garbage collection bottlenecks.

For most purposes, an object representing a month can be made immutable - February 2003 will never become anything other than what it is. But a User record is not immutable. People get married or change their name for other reasons. Phone numbers, addresses, hair color, height, weight, and virtually every other aspect of a person can change. Yet the person is still the same person. This is what surrogate keys model in a database - that everything about a record can change, yet it can still be meaningfully the same record.

In order to use an object in a hash-backed Collection (in Java), its hashcode must NOT change. The simplest way to accomplish this is to make the hashcode of a mutable persistent object its surrogate key and to use that key as a primary comparison in the equals method as well (see my older post on Implementing equals(), hashcode(), and compareTo()).

To make an immutable object, you sometimes need a mutable constructor object, like StringBuilder and String. StringBuilder allows you to change your object as many times as you want, then get an immutable version by calling toString(). This is clean and safe, but has some small costs in time and memory (transforming the immutable StringBuilder into a new immutable String object, then throwing away the StringBuilder). An alternative that I have not seen much is to create an immutable interface, extend a mutable interface from it, and then extend your object from that.

Here's an example based on java.util.List. Pretend each interface or class is in its own file:

// All the immutable-friendly methods from java.util.List.
// Interfaces like these could easily be retrofitted into
// the existing Java collections framework
public interface ImmutableList {
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);
    boolean containsAll(Collection<?> c);
    boolean equals(Object o);
    int hashCode();
    E get(int index);
    int indexOf(Object o);
    int lastIndexOf(Object o);
    ListIterator<E> listIterator();
    ListIterator<E> listIterator(int index);
    List<E> subList(int fromIndex, int toIndex);
}

// This interface adds the mutators
public interface java.util.List extends ImmutableList {
    boolean add(E e);
    boolean remove(Object o);
    boolean addAll(Collection<? extends E> c);
    boolean addAll(int index, Collection<? extends E> c);
    boolean removeAll(Collection<?> c);
    boolean retainAll(Collection<?> c);
    void clear();
    E set(int index, E element);
    void add(int index, E element);
    E remove(int index);
}

public class java.util.ArrayList implements List {
    // just as it is now...
}

public class MyClass {
    someMethod(ImmutableList<String> ils) {
        // can't change the list
    }

    public static void main(String[] args) {
        List<String> myStrings = new ArrayList<String>();
        myStrings.add("hello");
        myStrings.add("world");
        someMethod(myStrings);
        // Totally safe:
        System.out.println(myStrings.get(1));
    }
}

This doesn't solve the problem of passing a list to existing untrusted code that might try to change it. It also doesn't prevent the calling code from modifying myStrings from a separate thread while someMethod() is working on it. But it does provide a way (going forward) for a method like someMethod() to declare that it cannot modify the list. The programmer of someMethod() cannot compile her code if she tries to modify the list (well, short of using reflection).

Guaranteed immutability can be critical in writing concurrent code and for keys in hashtables. Not all objects can be made immutable, but many of those objects have immutable surrogate keys that, if used properly, work around the pitfalls of mutability. Limiting mutability and avoiding common mutable object pitfalls can lead to fewer bugs, easier readability, and improved maintainability.

Wednesday, December 29, 2010

Using Java Collections Effectively by Implementing equals() and hashCode()

IMPORTANT: The techniques in this post, while interesting, are outdated and sub-optimal. In short, follow standard equals() and hashCode() practice, but TEST your classes using something like TestUtils. I find a bug almost every time I use that.

This post is the first in a series on Comparing Objects.

These three methods must be implemented correctly in order for the Java collections to work properly.  Even though popular IDEs automatically generate stubs of some of these methods, you should still understand how they work, particularly how the three methods work together because I don't see many IDE's writing meaningful compareTo() methods yet. For much of what follows, I am endebted to Joshua Bloch and his book, Effective Java. Buy it, read it, live it.
  1. The behavior of equals(), hashCode(), and compareTo() must be consistent.
  2. You must base these methods on fields whose values do not change while they are in a collection.
If you store an object in a collection (e.g. as a key in a HashMap) and it's hashcode changes, you generally won't be able to retrieve it from that hashtable anymore! Thanks to "Programming in Scala" Section 30.2 p. 657. See also my later post on Object Mutability. You can use collections effectively with mutable objects so long as those objects use surrogate keys. In these examples I store my surrogate key in a private long id field with public getId() and setId() methods as many popular frameworks expect.

hashCode()

hashCode() is meant to provide a very cheap "can-equal" test.  It allows the put() and contains() methods on hashtables to run blazingly fast.  In small hashtables, the low bits from hashCode() determine which hash bucket an object belongs in.  In larger hashtables, all the bits are used.  The (presumably more expensive) equals() test is then applied against all the other objects already in that bucket.  If you had all your objects return some number, say, 31 for their hashCode(), this would completely destroy the performance of any hashtable based collections, since all objects would go in the same hash bucket and each object would have to be compared to all other objects using equals().

Bloch's Item 9 states, "Always override hashCode() when you override equals()". The following are specifically required (see: Object.hashCode()):
  1. x.hashCode() must always equal y.hashCode() when x.equals(y).
  2. It's OK for x.hashCode() to equal y.hashCode() when x.equals(y) is false, but it's good to minimize this.
Truncating the database row number from a long to an int is an ideal way to ensure efficient, equal distribution of values. If you don't use surrogate keys, you need to construct an int from the "significant" fields (the ones that uniquely identify this object):

@Override
public int hashCode() {
    if (id == 0) {
        return intField1 + intField2 + objField3.hashCode();
    }
    // return (possibly truncated) surrogate key
    return (int) id;
}

If your object does not have a surrogate key, then the field-by-field comparison in this solution is correct, though not quite as fast. If you like playing with bits, you can sometimes or and shift various fields into your hashcode in a way that is very efficient and not too hard to read.

equals()

a.equals(b) should return true only when a and b represent the same object. Bloch (Item 8) says that the equals() method must be reflexive, symmetric, transitive and a few other things as well which I won't cover here. For any non-null value:
  • x.equals(x) must be true.
  • If x.equals(y) then y.equals(x) must be true.
  • If x.equals(y) and y.equals(z) then x.equals(z) must also be true.
The following should get you off to a good start in writing an equals method that is all of the above.  Checking the hashCode() should be cheap and guarantees that two objects can't equal each other if their hashCodes are different.

@Override
public boolean equals(Object other) {
    // Cheapest operation first...
    if (this == other) { return true; }

    if ( (other == null) ||
         !(other instanceof MyClass) ||
         (this.hashCode() != other.hashCode()) ) {
        return false;
    }
    // Details...
    final MyClass that = (MyClass) other;

    // If this is a database object and both have the same surrogate key (id),
    // they are the same.
    if ( (id != 0) && (that.getId() != 0) ) {
        return (id == that.getId());
    }

    // If this is not a database object, compare significant fields here.
    // Return true only if they are all sufficiently the same.
    if (!this.getParent().equals(that.getParent())) {
        return false;
    }

    if (description == null) {
        if (that.getDescription() != null) {
            return false;
        }
    } else if (that.getDescription() == null) {
        return false;
    } else {
        // For each test, check and only return a non-zero result
        int ret = description.compareTo(that.getDescription());
        if (ret != 0) { return false; }
    }

    // Compare other fields
    // If all the same, return true
    return true;
}

Both objects must be valid before you compare them.  Your equals() method should either compare significant fields OR surrogate keys - not both! The danger of providing a field-by-field equals comparison for a database object is that it will work in some cases with invalid objects, but it not always.  This is a case where it's much better to fail fast, than to be scratching your head when an intermittent bug crops up in production. For database objects, using surrogate keys acknowledges that everything about an object can change over time, yet it is still essentially the same object (The Artist Formerly Known as Prince). For non-database objects, (including those that just haven't been given a surrogate key yet) you must compare individual fields.

With care, you can ensure consistency of equals() and compareTo() by defining one in terms of the other, but be careful not to create an infinite loop by defining them both in terms of each other!

Persistence/Hibernate

Persistence or communication frameworks create temporary surrogate objects in order to avoid fetching any extra objects from the database before they are needed.  Hibernate replaces a surrogate object with the actual object the first time a field other than id is accessed, or any methods other than persistent field accessors are accessed.  All of the above examples are designed to work with a persistence framework like Hibernate.

So your object can trust itself to be initialized inside equals(), hashcode(), and compareTo(). It should NOT trust that the other object being compared to is initialized! You can access the this.whatever fields directly, but always use that.getWhatever().

Scala's Case Classes

Declaring your class as a "case" class in Scala takes care of all the above items for you. It prevents inheritance, but for simple classes, it saves a ton of thought and typing! For non-case classes, you must do more work in Scala than in Java in order to support meaningful equals comparisons with inheritance. You have to implement a canEqual() method as well to support the idea that a parent class might think it was "close enough" to a child class, but the child might think they were different (because it defines extra fields relevant to the equals() method), so the child must implement canEqual() and the parent must check it in order to block the parent from thinking they are equal. I've never been bitten by this in Java, but I don't immediately see what prevents it.

Clojure

All Clojure's common built-in datatypes are immutable and implement the above methods for you, making them extremely easy to work with.

SerialVersionUID

I have not verified this, but it stands to reason that if you change hashCode() you probably need to update the SerialVersionUID just as you would if you changed any persistent field. Otherwise, you may end up with two of the same objects in a set (one with the old hashCode and one with the new). I'm not sure if this can happen in practice or not. Maybe someone will post test code in the comments that proves it one way or the other?