Showing posts with label hashing. Show all posts
Showing posts with label hashing. Show all posts

Wednesday, September 18, 2013

Comparing Objects is Relative

This post is part of a series on Comparing Objects.

Equality

For those of us who are implementation minded, that means that boolean equals(Object other) is a flawed API because there is no one definition of equality that will work in every context. In a previous post on Using Java Collections Effectively by Implementing equals() and hashCode(), I quoted Josh Bloch and said that "The behavior of equals(), hashCode(), and compareTo() must be consistent." Java subtly encourages us to define a single definition of equality and comparison per class.

Java programmers have been "ensuring symmetry by controlling only one side of the equation" for years. There's nothing wrong with defining a *default* context for comparison and equality.  Just don't mistake default equality and compareTo() as the only context for comparing objects.

A small potato and a slice of bread might have an equal number of calories (they are equivalent in one context), but the potato takes much more energy to heat up (they are different in another context). So Ideally, we'd like Heating equality and Caloric equality to be different for small potatoes vs. bread.

A context-relative definition of equality might look like this

boolean equals(Object left, Object right)

Hmm... equality and sorting have a good deal of overlap. The Comparator interface already defines something like equality - it returns 0 when things are sorted the same ("equally"), something else otherwise. Better still, any implementation of Comparator provides a context for that comparison.

interface Sugars {
    int gramsSugar();
    public static final Comparator<Sugars> COMPARATOR = (left, right) ->
            left.gramsSugar() - right.gramsSugar();
}

interface Heating {
    int cookingEnergy();
    public static final Comparator<Heating> COMPARATOR = (left, right) ->
            left.cookingEnergy() - right.cookingEnergy();
}

Could we use this for context-relative equality?

The ubiquitous ArrayList, Set, and Map use equals(), hashCode(), and compareTo() are all fixed in their one-sided ways. But the classes implementing SortedSet and SortedMap behave very differently from the other collections when passed a separate Comparator.

Using Context-Relative Equality (in Java)

When you pass a Comparator to TreeSet or TreeMap, you are using context-relative equality. All comparisons for get(), put(), and contains() are made with the Comparator NOT with the methods that the other standard collections use! With a little care you can use this to your advantage:

static class Food implements Heating, Sugars {
    private final int gramsSugar;
    private final int cookingEnergy;
    private Food(int g, int c) { gramsSugar = g; cookingEnergy = c; }
    @Override public int gramsSugar() { return gramsSugar; }
    @Override public int cookingEnergy() { return cookingEnergy; }
    @Override public String toString() {
        return "Food(" + gramsSugar + "," + cookingEnergy + ")";
    }
}

public static void main(String[] args) {
    List<Food> foods = Arrays.asList(new Food(5,3), new Food(4,4));
    SortedSet<Food> foodsByHeat = new TreeSet<>(Heating.COMPARATOR);
    foodsByHeat.addAll(foods);
    System.out.println("Foods by heat:");
    for (Food f : foodsByHeat) { System.out.println(f); }

    SortedSet<Food> foodsBySugar = new TreeSet<>(Sugars.COMPARATOR);
    foodsBySugar.addAll(foods);
    System.out.println("Foods by sugar:");
    for (Food f : foodsBySugar) { System.out.println(f); }
}

Output

Foods by heat:
Food(5,3)
Food(4,4)
Foods by sugar:
Food(4,4)
Food(5,3)

More

Here is the Source for the above.

Here is my earlier SetInterface test.

Wednesday, August 31, 2011

Object Mutability

This post is part of a series on Comparing Objects.

I've read a lot lately about making objects immutable whenever possible. "Programming in Scala" lists Immutable Object Tradeoffs as follows:

Advantages of Immutable Objects

  1. Often easier to reason about because they do not have complex state.
  2. Can be passed freely (without making defensive copies) to things that might try to modify them
  3. Impossible for two threads accessing the same immutable object to corrupt it.
  4. They make safe Hashtable keys (if you put a mutable object in a Hashtable, then change it in a way that changes its hashcode, the Hashtable will no longer be able to use it as a key because it will look for that object in the wrong bucket and not find it).

Disadvantages

  1. Sometimes require a large object graph to be copied (to create a new, modified version of the object). This can cause performance and garbage collection bottlenecks.

For most purposes, an object representing a month can be made immutable - February 2003 will never become anything other than what it is. But a User record is not immutable. People get married or change their name for other reasons. Phone numbers, addresses, hair color, height, weight, and virtually every other aspect of a person can change. Yet the person is still the same person. This is what surrogate keys model in a database - that everything about a record can change, yet it can still be meaningfully the same record.

In order to use an object in a hash-backed Collection (in Java), its hashcode must NOT change. The simplest way to accomplish this is to make the hashcode of a mutable persistent object its surrogate key and to use that key as a primary comparison in the equals method as well (see my older post on Implementing equals(), hashcode(), and compareTo()).

To make an immutable object, you sometimes need a mutable constructor object, like StringBuilder and String. StringBuilder allows you to change your object as many times as you want, then get an immutable version by calling toString(). This is clean and safe, but has some small costs in time and memory (transforming the immutable StringBuilder into a new immutable String object, then throwing away the StringBuilder). An alternative that I have not seen much is to create an immutable interface, extend a mutable interface from it, and then extend your object from that.

Here's an example based on java.util.List. Pretend each interface or class is in its own file:

// All the immutable-friendly methods from java.util.List.
// Interfaces like these could easily be retrofitted into
// the existing Java collections framework
public interface ImmutableList {
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);
    boolean containsAll(Collection<?> c);
    boolean equals(Object o);
    int hashCode();
    E get(int index);
    int indexOf(Object o);
    int lastIndexOf(Object o);
    ListIterator<E> listIterator();
    ListIterator<E> listIterator(int index);
    List<E> subList(int fromIndex, int toIndex);
}

// This interface adds the mutators
public interface java.util.List extends ImmutableList {
    boolean add(E e);
    boolean remove(Object o);
    boolean addAll(Collection<? extends E> c);
    boolean addAll(int index, Collection<? extends E> c);
    boolean removeAll(Collection<?> c);
    boolean retainAll(Collection<?> c);
    void clear();
    E set(int index, E element);
    void add(int index, E element);
    E remove(int index);
}

public class java.util.ArrayList implements List {
    // just as it is now...
}

public class MyClass {
    someMethod(ImmutableList<String> ils) {
        // can't change the list
    }

    public static void main(String[] args) {
        List<String> myStrings = new ArrayList<String>();
        myStrings.add("hello");
        myStrings.add("world");
        someMethod(myStrings);
        // Totally safe:
        System.out.println(myStrings.get(1));
    }
}

This doesn't solve the problem of passing a list to existing untrusted code that might try to change it. It also doesn't prevent the calling code from modifying myStrings from a separate thread while someMethod() is working on it. But it does provide a way (going forward) for a method like someMethod() to declare that it cannot modify the list. The programmer of someMethod() cannot compile her code if she tries to modify the list (well, short of using reflection).

Guaranteed immutability can be critical in writing concurrent code and for keys in hashtables. Not all objects can be made immutable, but many of those objects have immutable surrogate keys that, if used properly, work around the pitfalls of mutability. Limiting mutability and avoiding common mutable object pitfalls can lead to fewer bugs, easier readability, and improved maintainability.