Glen on Programming: 2011

Wednesday, September 14, 2011

Automatic POJO Generation From a Database

Like JPA (or presumably Spring), Hibernate "reverse engineering" tools can generate POJOs (Plain Old Java Objects) from database tables and vice-versa. Generating database tables from Java code is probably best used as a one-time short-cut, suitable for rapid prototyping. Because everything in an application is dependent on the database (and not vice-versa), future changes must be made in the database first (and any existing data migrated there first as well), then propagated to all affected parts of the application.

I have found that anything which the "database reverse engineering" process does not generate for me breaks, usually sooner rather than later. Also that the hardest part of managing maintenance of a large system is the constant refactoring. To that end, I have developed 2 goals for the database reverse-engineering process:

It should keep my application in synch with any database changes, automatically updating as much of my application as is practical.
Where #1 is not possible, it should cause any affected areas of my application to generate a compile-time error.

Imagine a database table:

CREATE TABLE `user` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `last_modifier_id` bigint(20) unsigned DEFAULT NULL,
  `company_id` bigint(20) unsigned NOT NULL,
  `identifier` varchar(64) NOT NULL COMMENT 'PIN.  Unique within a company',
  `first_name_c` varchar(40) DEFAULT NULL,
etc.

Database generation tools normally generate the following sort of fields:

public class User implements java.io.Serializable {

private long id;
private User lastModifier;
private Company company;
private String identifier;
private String firstNameC;
etc.

public long getId() { return id; }
public void setId(long x) { id = x; }

public User getLastModifier() { return lastModifier; }
public void setLastModifier(User x) { lastModifier = x; }

public Company getCompany() { return company; }
public void setCompany(Company x) { company = x; }

public String getIdentifier() { return identifier; }
public void setIdentifier(String x) { identifier = x; }

public String getFirstNameC() { return firstNameC; }
public void setFirstNameC(String x) { firstNameC = x; }
etc.

I have modified the generator to also generate the following fields for each column in the table (prefixed with SQL_) and for each field in the Java object (prefixed with HQL_):

public static final String SQL_user = "user"; // table name
public static final String SQL_id = "id";
public static final String SQL_last_modifier_id = "last_modifier_id";
public static final String SQL_company_id = "company_id";
public static final String SQL_identifier = "identifier";
public static final String SQL_first_name_c = "first_name_c";
etc.

public static final String HQL_User = "User"; // class name
public static final String HQL_id = "id";
public static final String HQL_lastModifier = "lastModifier";
public static final String HQL_company = "company";
public static final String HQL_identifier = "identifier";
public static final String HQL_firstNameC = "firstNameC";
etc.

This allows me to write code like the following:

Crit<User> dupIdentCrit = Crit.create(User.class);
dupIdentCrit.add(Restrictions.eq(User.HQL_company, this));
dupIdentCrit.add(Restrictions.eq(User.HQL_identifier, identifier));
List<User> dupIdentifierUsers = dupIdentCrit.list();

Here's an example of building a SQL statement for use with JDBC:

StringBuilder sqlB = new StringBuilder("select * from " + User.SQL_user +
                                       " where " + User.SQL_company_id +
                                       " = ?");
if (!includeDeleted) {
    sqlB.append(" and " + User.SQL_is_deleted + " is false");
}
sqlB.append(" order by " + User.SQL_last_name_c + ", " +
            User.SQL_first_name_c + ", " +
            User.SQL_middle_name_c);

When the database changes and I rebuild my objects, I get compile-time errors, showing me every piece of code I need to fix.

I am working on modifying the generator to also generate the following for any varchar or char fields because their maximum length is defined in the database columns:

public static final int MAX_identifier = 64;
public static final int MAX_firstNameC = 40;
etc.

I am considering parsing comments on each column to look for something like: 'min: 6' and generate the following from it:

public static final int MIN_identifier = 6;

That same technique could be used with int columns to define minimum and maximum values which could be used in validation, in the GUI, and in the help of an appliction. I'm also experimenting with using the HQL_ tokens as the names of the input fields on the GUI screens.

I'm not sure how or even if this would work with stored procedures.

Special thanks to J. Michael Greata, whose interest and suggestions over the past several years have encouraged and shaped the development of this project. Also to Arash Tavakoli whose post in the LinkedIn Java Architects group inspired me to organize my thoughts into this posting.

Thursday, September 8, 2011

Checked vs. Unchecked Exceptions

Two-Sentence Review: Checked exceptions have to be declared in the method signature and dealt with by the calling code; Unchecked, RuntimeExceptions don't. The Java compiler enforces this rule with a compile-time error.

I've been enjoying, "Java: The Good Parts" by Jim Waldo and just finished chapter 3: "Exceptions." At the end of the chapter, Mr. Waldo takes a humorously firm stance against RuntimeExceptions. Indeed, RuntimeExceptions can be evilly misused. Yet I believe there is a time and a place for these exceptions. Knowing when to use them requires an understanding of who is at fault for a particular problem, and whether the problem is recoverable or not.

The following code sample bit me the other day:

// EVIL - NEVER DO THIS!
public static void write(String s) {
    try {
        out.write(s);
    } catch (IOException ioe) {
        throw new IllegalStateException();
    }
}

It is evil (as Mr. Waldo says) for several reasons:

This method is not being responsible for handling its own problems. IOExceptions happen for good reasons other than coding errors - users can delete files, network connections can close, etc. Better than hiding this exception this method might retry the write() or recover some other way. If recovery were not practical, this method would do better to let the original exception bubble up to the caller. Another improvement might be to have this method report the error some other suitable way (e.g. to the user).
Wrapping a checked exception with a RuntimeException means that this method signature is missing vital information that the caller really needs to know about. It hides critical details (any problem with the write() now causes an unexpected exception).
Absentmindedly wrapping an exception with another exception serves only to complicate the stack trace - it further hides the cause of the problem. It's good to wrap an exception if you have critical information to add (e.g. the method handles two streams and you wrap an exception with the message of which stream it applies to). In the example above, nothing is added. In fact, this code doesn't wrap an exception, it throws it away and substitutes another which is even worse.
The IllegalStateException is blank, giving the unfortunate caller no idea what went wrong.

The following shows proper usage of RuntimeExceptions:

/** Creates or returns an existing immutable YearMonth object.
@param year a valid year
@param month a one-based month between 1-12
@return the relevant YearMonth object
*/
public static YearMonth valueOf(int year, int month) {
    if ( (month < 1) || (month > 12) ) {
        throw new IllegalArgumentException("Month must be a positive" +
                                           " integer such that 0 < n < 13");
    }
...

This is good because:

Valid input values are documented clearly for the caller in the JavaDoc.
The inputs are checked at the beginning of the method, before any processing is done, to fail-fast and fail-loudly if passed garbage.
It throws a RuntimeException to inform the caller that they have made a coding error.
The exception provides information about what the caller did wrong.

You wouldn't want to use a checked exception because it would force the responsible caller's code to check their input values twice:

YearMonth ym;
// Responsibly check inputs before calling valueOf()
if (m > 12) {
    out.write("Month too big");
} else if (m < 1) {
    out.write("Month too small");
} else {
    // I already checked my inputs.  Why do I have to check for an exception
    // too?  Doing so doesn't benefit me in any way. It just makes the
    // interface unnecessarily difficult to use!
    try {
        ym = YearMonth.valueOf(y, m);
    } catch (Exception e) {
        out.println("month still invalid: " + e.getMessage());
    }
}

The critical nuance here is that RuntimeExceptions should be used to indicate a programming error on the part of the caller of the function that throws it. They are generally used in the first few lines of the function as a defensive check for invalid input values. Each RuntimeException should include a description of the problem (not be blank).

RuntimeExceptions can also be used to catch invalid state as follows:

enum Numb { ONE, TWO; }
private Numb num = null;

public void init(Numb n) {
    if (n == null) {
        throw new IllegalArgumentException("init cannot take a null Numb");
    }
    num = n;
}

public void showNum() {
    if (ONE == num) {
        out.println("1");
    } else if (TWO == num) {
        out.println("2");
    } else {
         throw new IllegalStateException("Unhandled value of Numb or called" +
                                         " showNum without initializing the" +
                                         " num");
    }
}

When some programmer adds THREE to Numb and doesn't account for the new possible value, it fails hard and fast, making the problem easy to find and fix.

RuntimeExceptions are a good way to make code fail hard and fast without forcing the caller to check their input values twice. Used properly they can make sneaky coding errors obvious. Used improperly, they can make and obvious errors sneaky.

Wednesday, August 31, 2011

Object Mutability

This post is part of a series on Comparing Objects.

I've read a lot lately about making objects immutable whenever possible. "Programming in Scala" lists Immutable Object Tradeoffs as follows:

Advantages of Immutable Objects

Often easier to reason about because they do not have complex state.
Can be passed freely (without making defensive copies) to things that might try to modify them
Impossible for two threads accessing the same immutable object to corrupt it.
They make safe Hashtable keys (if you put a mutable object in a Hashtable, then change it in a way that changes its hashcode, the Hashtable will no longer be able to use it as a key because it will look for that object in the wrong bucket and not find it).

Disadvantages

Sometimes require a large object graph to be copied (to create a new, modified version of the object). This can cause performance and garbage collection bottlenecks.

For most purposes, an object representing a month can be made immutable - February 2003 will never become anything other than what it is. But a User record is not immutable. People get married or change their name for other reasons. Phone numbers, addresses, hair color, height, weight, and virtually every other aspect of a person can change. Yet the person is still the same person. This is what surrogate keys model in a database - that everything about a record can change, yet it can still be meaningfully the same record.

In order to use an object in a hash-backed Collection (in Java), its hashcode must NOT change. The simplest way to accomplish this is to make the hashcode of a mutable persistent object its surrogate key and to use that key as a primary comparison in the equals method as well (see my older post on Implementing equals(), hashcode(), and compareTo()).

To make an immutable object, you sometimes need a mutable constructor object, like StringBuilder and String. StringBuilder allows you to change your object as many times as you want, then get an immutable version by calling toString(). This is clean and safe, but has some small costs in time and memory (transforming the immutable StringBuilder into a new immutable String object, then throwing away the StringBuilder). An alternative that I have not seen much is to create an immutable interface, extend a mutable interface from it, and then extend your object from that.

Here's an example based on java.util.List. Pretend each interface or class is in its own file:

// All the immutable-friendly methods from java.util.List.
// Interfaces like these could easily be retrofitted into
// the existing Java collections framework
public interface ImmutableList {
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);
    boolean containsAll(Collection<?> c);
    boolean equals(Object o);
    int hashCode();
    E get(int index);
    int indexOf(Object o);
    int lastIndexOf(Object o);
    ListIterator<E> listIterator();
    ListIterator<E> listIterator(int index);
    List<E> subList(int fromIndex, int toIndex);
}

// This interface adds the mutators
public interface java.util.List extends ImmutableList {
    boolean add(E e);
    boolean remove(Object o);
    boolean addAll(Collection<? extends E> c);
    boolean addAll(int index, Collection<? extends E> c);
    boolean removeAll(Collection<?> c);
    boolean retainAll(Collection<?> c);
    void clear();
    E set(int index, E element);
    void add(int index, E element);
    E remove(int index);
}

public class java.util.ArrayList implements List {
    // just as it is now...
}

public class MyClass {
    someMethod(ImmutableList<String> ils) {
        // can't change the list
    }

    public static void main(String[] args) {
        List<String> myStrings = new ArrayList<String>();
        myStrings.add("hello");
        myStrings.add("world");
        someMethod(myStrings);
        // Totally safe:
        System.out.println(myStrings.get(1));
    }
}

This doesn't solve the problem of passing a list to existing untrusted code that might try to change it. It also doesn't prevent the calling code from modifying myStrings from a separate thread while someMethod() is working on it. But it does provide a way (going forward) for a method like someMethod() to declare that it cannot modify the list. The programmer of someMethod() cannot compile her code if she tries to modify the list (well, short of using reflection).

Guaranteed immutability can be critical in writing concurrent code and for keys in hashtables. Not all objects can be made immutable, but many of those objects have immutable surrogate keys that, if used properly, work around the pitfalls of mutability. Limiting mutability and avoiding common mutable object pitfalls can lead to fewer bugs, easier readability, and improved maintainability.

Thursday, August 25, 2011

Typesafe List in Java 5+ part two...

Looks like John's comment on yesterday's post had us both up half the night working on the same thing. I'm posting my version here because it has a parameter that lets the caller determine the trade-off between speed and safety. When debugging, you can use Check.ALL (or leave it null). In performance-critical code, you can later set it to NONE. Check.FIRST is for situations where you just a sanity check - it's fast and better than nothing.

I've also included some test code and a routine that takes advantage of the String.valueOf() methods to coerce each element of the original list to a String, if necessary, much the way that many dynamically typed languages do. I wonder if it would be worthwhile to further optimize this method for long lists by dividing the original list into runs of items that do not need casting and copying those runs with stringList.addAll(list.subList(fromIdx, toIdx)) instead of copying one item at a time?

John's solution uses B.class.isAssignableFrom(a.getClass()) while mine uses a instanceof B. I think either works in this case because a List can only hold objects, not primitives. If we were dealing with primitives, John's solution would handle more cases than mine. Not sure if there are performance considerations, but I doubt they would be significant.

import java.util.ArrayList;
import java.util.List;

public class TypeSafe {

    public enum Check {
        NONE,
        FIRST,
        ALL;
    }

    @SuppressWarnings({"unchecked"})
    public static <T> List<T> typeSafeList(List list,
                                           Class<T> clazz,
                                           Check safety) {
        if (clazz == null) {
            throw new IllegalArgumentException("typeSafeList() requires a non-null class parameter");
        }
        
        // Should we perform any checks?
        if (safety != Check.NONE) {
            if ( (list != null) && (list.size() > 0) ) {
                for (Object item : list) {
                    if ( (item != null) &&
                         !(item instanceof String) ) {
                        throw new ClassCastException(
                                "List contained a(n) " +
                                item.getClass().getCanonicalName() +
                                " which is not a(n) " +
                                clazz.getCanonicalName());
                    }
                    // Should we stop on first success?
                    if (safety == Check.FIRST) {
                        break;
                    }
                    // Default (Check.ALL) checks every item in the list.
                }
            }
        } // end if perform any checks
        return (List<T>) list;
    } // end typeSafeList()

    public static List<String> coerceToStringList(List list) {
        if (list == null) {
            return null;
        }
        try {
            // Return the old list if it's already safe
            return typeSafeList(list, String.class, Check.ALL);
        } catch (ClassCastException cce) {
            // Old list is not safe.  Make new one.
            List<String> stringList = new ArrayList<String>();

            for (Object item : list) {
                if (item == null) {
                    stringList.add(null);
                } else if (item instanceof String) {
                    stringList.add((String) item);
                } else {
                    // If this throws a classCastException, so be it.
                    stringList.add(String.valueOf(item));
                }
            }
            return stringList;
        }
    } // end typeSafeList()

    @SuppressWarnings({"unchecked"})
    private static List makeTestList() {
        List l = new ArrayList();
        l.add("Hello");
        l.add(null);
        l.add(new Integer(3));
        l.add("world");
        return l;
    } // end makeTestList()

    public static void main(String[] args) {
        List unsafeList = makeTestList();
        List<String> stringList = coerceToStringList(unsafeList);

        System.out.println("Coerced strings:");
        for (String s : stringList) {
            System.out.println(s);
        }

        List<String> safeList = typeSafeList(unsafeList,
                                             String.class,
                                             Check.ALL);

        System.out.println("Safe-casted list:");
        for (String s : safeList) {
            System.out.println(s);
        }
    } // end main()
}

Wednesday, August 24, 2011

Checking an Unchecked Cast in Java 5 or Later

If query.list() returns an untyped List, then the following code will produce an ugly warning: "Unchecked Cast: 'java.util.List' to 'java.util.List<String>'

List<String> tags = null;
try {
    ...
    tags = (List<String>) query.list();
    ...

Unfortunately, you can only use @SuppressWarnings on a declaration. I don't want to add this annotation to my method declaration because my method does a lot more than this one cast and this is a useful warning to detect in the rest of the method. One solution is to add a new variable to make a declaration to use @SuppressWarnings on:

List<String> tags = null;
try {
    ...
    @SuppressWarnings("unchecked")
    List<String> tempTags = (List<String> query.list();
    tags = tempTags;
    ...

Now it produces an inspection warning in IDEA 10.5, "Local variable tempTags is redundant." I should probably just ignore this warning or turn it off altogether, but I wasn't quite comfortable with that either. In situations like these, I naturally turn to Joshua Bloch for guidance and his item #77 - Eliminate Unchecked Warnings has this advice in bold:

If you can't eliminate a warning, and you can prove that the code that provoked the warning is typesafe, then (and only then) suppress the warning with an @SuppressWarnings("unchecked") annotation.

I could probably prove that my query is only going to return Strings, but it occurred to me that there is a more general way to satisfy all the above requirements:

@SuppressWarnings({"unchecked"})
private List<String> getListOfStringsFromList(List list) {
    if ( (list != null) && (list.size() > 0) ) {
        for (Object item : list) {
            if ( (item != null) && !(item instanceof String) ) {
                throw new ClassCastException("List contained non-strings Elements: " + item.getClass().getCanonicalName());
            }
        }
    }
    return (List<String>) list;
}

The method does nothing but make a typesafe cast on a list, so I can @SuppressWarnings on the whole thing. It throws an undeclared runtime ClassCastException, but only if the caller makes a programming error and this exception would get thrown anyway wherever the list was eventually used if I didn't throw it here. In short, this method takes a programming error that cannot be caught by the compiler and does the next best thing: fail-fast, making the error easy to find. Using the above solution hides all that complexity and preserves the intent of the original line of code:

List<String> tags = null;
try {
    ...
    tags = getListOfStringsFromList(query.list());
    ...

I thought this was interesting enough to post. Well, at least interesting enough for people who are considering a Java logo tattoo. If you liked this, you probably want to check out John Yeary's Response and my Updated Generic Version (part 2 of this post).

Wednesday, March 9, 2011

Question about MySQL Update speed

I originally posted this as a question, but Eric Wood helped me solve it in email, so I've added the solution below. The minimum time MySQL with InnoDB tables takes to do an update on a 3Ghz Core 2 Duo runing 64-bit Ubuntu 10.10 is somewhere around 0.06 seconds, though I wonder if hard drive speed could be the gating factor? My average time using Hibernate is 0.15 seconds. I think JDBC would approach 0.06 seconds. This was done using the mysql command line client. Any thoughts would be appreciated. Also timings vs. other databases if you have a sense.

mysql> CREATE TABLE `test` (
    ->   `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    ->   `last_click_date` datetime DEFAULT NULL,
    ->   `is_active` bit(1) NOT NULL DEFAULT b'0' COMMENT 'True if user is still logged in.',
    ->   PRIMARY KEY (`id`),
    ->   UNIQUE KEY `id` (`id`)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.21 sec)

mysql> insert into test (last_click_date, is_active) values ('2011-03-07 14:18:16', b'1');
Query OK, 1 row affected (0.11 sec)

mysql> insert into test (last_click_date, is_active) values ('2011-03-07 14:18:16', b'1');
Query OK, 1 row affected (0.11 sec)

mysql> insert into test (last_click_date, is_active) values ('2011-03-07 14:18:16', b'1');
Query OK, 1 row affected (0.06 sec)

mysql> update test set last_click_date = '2011-03-07 14:18:16' where id = 1;
Query OK, 0 rows affected (0.07 sec)
Rows matched: 1  Changed: 0  Warnings: 0

mysql> update test set last_click_date = now() where id = 1;
Query OK, 1 row affected (0.06 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> update test set is_active = b'0' where id = 1;
Query OK, 1 row affected (0.07 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> update test set is_active = b'0' where id = 1;
Query OK, 0 rows affected (0.02 sec)
Rows matched: 1  Changed: 0  Warnings: 0

mysql> update test set is_active = b'0' where id = 1;
Query OK, 0 rows affected (0.10 sec)
Rows matched: 1  Changed: 0  Warnings: 0

mysql> update test set last_click_date = 20110307141816 where id = 1;
Query OK, 0 rows affected (0.06 sec)
Rows matched: 1  Changed: 0  Warnings: 0

It turns out that if I change the table to use the MyISAM engine, all the updates take 0.00 seconds.

Thursday, January 6, 2011

10 Most Important Password Manager Features

Maybe 1 in 60 of my accounts reports their passwords stolen every year. For every site that reports a break-in, a few others are probably broken into and don't know it or don't report it. I would guess that if you have accounts at 30 different sites, you should probably assume that one of them gets broken into every year. You can't stop people from discovering passwords this way, but if you use a unique, strong password, you can contain the damage so that a hacker cannot leverage the knowledge of one of your passwords to break into your other accounts.

I just watched How to choose a strong password and while that's good advice, most people can neither remember nor type a good password, or at least not more than one or two good passwords. The only practical way to use a unique, strong password for every site is to use a good password manager. As such, I'm proposing a Password Manager Feature Manifesto for people to use to compare password managers and decide which one is best for them.

Password Manager Feature Manifesto

A password manager needs to do certain things to be worthwhile:

1.) Store passwords securely, in one place so you can find them, change them, secure them as a unit. It always seemed to me that storing your passwords in your browser was a little bit like taping your wallet to the outside of your front door - you are putting your valuables in the most vulnerable place. KeePassX (without any plug-ins) is completely separate from your browser. Browser integration is not necessarily bad, but I think it loose some points from a security perspective. In any case, the passwords must be secured by a strong master password and encrypted on disk (and maybe in memory when possible too).

2.) Generate random passwords - people don't manually make strong passwords. Collecting entropy for the randomness is a huge plus. KeePassX and LastPass both generate strong passwords for you.

3.) Make it equally easy to store your credit-card number or license key for Photoshop as to store a password to a web site.

4.) Must be backed up every time you make a change. LastPass has this built-in. KeePassX must be used with something like Dropbox or SpiderOak and set to save automatically after every change.

5.) To be shared between multiple computers, e.g. LastPass or KeyPass/Dropbox

6.) Needs to be relatively easy to use

7.) To work on all major operating systems (Windows, OS-X, Linux). I look for this every time I choose software. I hate being tied to one vendor's operating system or browser.

If a password manager doesn't do all of those things, I'm not really interested in it. One thing that's not important yet, but I bet it will become critical for most people in the next few years:

8.) To work on your phone or other mobile device. Here is where LastPass may move ahead of KeePassX.

9.) Popular OpenSource software is recommended for security

And finally, not critical, but the icing on the cake:

10.) It's free, or at least a reasonable price.

That leaves KeePassX the clear #1 for me and LastPass #2. LastPass could threaten KeePassX if they keep improving on #6 and #8 - specifically, it is very hard to log into sites with LastPass that have the user ID on one screen and password on the next.

Sadly, no password manager can remember your operating-system login when you boot up your computer, so you have to remember that password yourself. Also, the master password for your manager. But for most people that's just 2 passwords to remember and type, and that's fairly do-able.

I have to thank DigitalMan for his contributions to this article by talking about this with me and sending articles about break-ins, security, and passwords for the past few years, and for encouraging me to improve my own password practices.

Update 2012-09-20: DigitalMan added the following excellent insights:

Re: Point 8: the iPhone now has an excellent, completely free, Open Source app which makes your KeePass database fully functional on the iPhone (and presumably the iPad as well): miniKeePass. To me, that further buries the case that Closed Source LastPass is a better option.

Lastly, Point 9 is crucial to me and not second tier. I'll leave you with my favorite Bruce Schneier quote:

As a cryptography and computer security expert, I have never understood the current fuss about the open source software movement. In the cryptography world, we consider open source necessary for good security; we have for decades. Public security is always more secure than proprietary security. It's true for cryptographic algorithms, security protocols, and security source code. For us, open source isn't just a business model; it's smart engineering practice.