Wednesday, September 14, 2011

Automatic POJO Generation From a Database

Like JPA (or presumably Spring), Hibernate "reverse engineering" tools can generate POJOs (Plain Old Java Objects) from database tables and vice-versa. Generating database tables from Java code is probably best used as a one-time short-cut, suitable for rapid prototyping. Because everything in an application is dependent on the database (and not vice-versa), future changes must be made in the database first (and any existing data migrated there first as well), then propagated to all affected parts of the application.

I have found that anything which the "database reverse engineering" process does not generate for me breaks, usually sooner rather than later. Also that the hardest part of managing maintenance of a large system is the constant refactoring. To that end, I have developed 2 goals for the database reverse-engineering process:

  1. It should keep my application in synch with any database changes, automatically updating as much of my application as is practical.
  2. Where #1 is not possible, it should cause any affected areas of my application to generate a compile-time error.

Imagine a database table:

CREATE TABLE `user` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `last_modifier_id` bigint(20) unsigned DEFAULT NULL,
  `company_id` bigint(20) unsigned NOT NULL,
  `identifier` varchar(64) NOT NULL COMMENT 'PIN.  Unique within a company',
  `first_name_c` varchar(40) DEFAULT NULL,
etc.

Database generation tools normally generate the following sort of fields:

public class User implements java.io.Serializable {

private long id;
private User lastModifier;
private Company company;
private String identifier;
private String firstNameC;
etc.

public long getId() { return id; }
public void setId(long x) { id = x; }

public User getLastModifier() { return lastModifier; }
public void setLastModifier(User x) { lastModifier = x; }

public Company getCompany() { return company; }
public void setCompany(Company x) { company = x; }

public String getIdentifier() { return identifier; }
public void setIdentifier(String x) { identifier = x; }

public String getFirstNameC() { return firstNameC; }
public void setFirstNameC(String x) { firstNameC = x; }
etc.

I have modified the generator to also generate the following fields for each column in the table (prefixed with SQL_) and for each field in the Java object (prefixed with HQL_):

public static final String SQL_user = "user"; // table name
public static final String SQL_id = "id";
public static final String SQL_last_modifier_id = "last_modifier_id";
public static final String SQL_company_id = "company_id";
public static final String SQL_identifier = "identifier";
public static final String SQL_first_name_c = "first_name_c";
etc.

public static final String HQL_User = "User"; // class name
public static final String HQL_id = "id";
public static final String HQL_lastModifier = "lastModifier";
public static final String HQL_company = "company";
public static final String HQL_identifier = "identifier";
public static final String HQL_firstNameC = "firstNameC";
etc.

This allows me to write code like the following:

Crit<User> dupIdentCrit = Crit.create(User.class);
dupIdentCrit.add(Restrictions.eq(User.HQL_company, this));
dupIdentCrit.add(Restrictions.eq(User.HQL_identifier, identifier));
List<User> dupIdentifierUsers = dupIdentCrit.list();

Here's an example of building a SQL statement for use with JDBC:

StringBuilder sqlB = new StringBuilder("select * from " + User.SQL_user +
                                       " where " + User.SQL_company_id +
                                       " = ?");
if (!includeDeleted) {
    sqlB.append(" and " + User.SQL_is_deleted + " is false");
}
sqlB.append(" order by " + User.SQL_last_name_c + ", " +
            User.SQL_first_name_c + ", " +
            User.SQL_middle_name_c);

When the database changes and I rebuild my objects, I get compile-time errors, showing me every piece of code I need to fix.

I am working on modifying the generator to also generate the following for any varchar or char fields because their maximum length is defined in the database columns:

public static final int MAX_identifier = 64;
public static final int MAX_firstNameC = 40;
etc.

I am considering parsing comments on each column to look for something like: 'min: 6' and generate the following from it:

public static final int MIN_identifier = 6;

That same technique could be used with int columns to define minimum and maximum values which could be used in validation, in the GUI, and in the help of an appliction. I'm also experimenting with using the HQL_ tokens as the names of the input fields on the GUI screens.

I'm not sure how or even if this would work with stored procedures.

Special thanks to J. Michael Greata, whose interest and suggestions over the past several years have encouraged and shaped the development of this project. Also to Arash Tavakoli whose post in the LinkedIn Java Architects group inspired me to organize my thoughts into this posting.

Thursday, September 8, 2011

Checked vs. Unchecked Exceptions

Two-Sentence Review: Checked exceptions have to be declared in the method signature and dealt with by the calling code; Unchecked, RuntimeExceptions don't. The Java compiler enforces this rule with a compile-time error.

I've been enjoying, "Java: The Good Parts" by Jim Waldo and just finished chapter 3: "Exceptions." At the end of the chapter, Mr. Waldo takes a humorously firm stance against RuntimeExceptions. Indeed, RuntimeExceptions can be evilly misused. Yet I believe there is a time and a place for these exceptions. Knowing when to use them requires an understanding of who is at fault for a particular problem, and whether the problem is recoverable or not.

The following code sample bit me the other day:

// EVIL - NEVER DO THIS!
public static void write(String s) {
    try {
        out.write(s);
    } catch (IOException ioe) {
        throw new IllegalStateException();
    }
}

It is evil (as Mr. Waldo says) for several reasons:

  1. This method is not being responsible for handling its own problems. IOExceptions happen for good reasons other than coding errors - users can delete files, network connections can close, etc. Better than hiding this exception this method might retry the write() or recover some other way. If recovery were not practical, this method would do better to let the original exception bubble up to the caller. Another improvement might be to have this method report the error some other suitable way (e.g. to the user).
  2. Wrapping a checked exception with a RuntimeException means that this method signature is missing vital information that the caller really needs to know about. It hides critical details (any problem with the write() now causes an unexpected exception).
  3. Absentmindedly wrapping an exception with another exception serves only to complicate the stack trace - it further hides the cause of the problem. It's good to wrap an exception if you have critical information to add (e.g. the method handles two streams and you wrap an exception with the message of which stream it applies to). In the example above, nothing is added. In fact, this code doesn't wrap an exception, it throws it away and substitutes another which is even worse.
  4. The IllegalStateException is blank, giving the unfortunate caller no idea what went wrong.

The following shows proper usage of RuntimeExceptions:

/** Creates or returns an existing immutable YearMonth object.
@param year a valid year
@param month a one-based month between 1-12
@return the relevant YearMonth object
*/
public static YearMonth valueOf(int year, int month) {
    if ( (month < 1) || (month > 12) ) {
        throw new IllegalArgumentException("Month must be a positive" +
                                           " integer such that 0 < n < 13");
    }
...

This is good because:

  1. Valid input values are documented clearly for the caller in the JavaDoc.
  2. The inputs are checked at the beginning of the method, before any processing is done, to fail-fast and fail-loudly if passed garbage.
  3. It throws a RuntimeException to inform the caller that they have made a coding error.
  4. The exception provides information about what the caller did wrong.

You wouldn't want to use a checked exception because it would force the responsible caller's code to check their input values twice:

YearMonth ym;
// Responsibly check inputs before calling valueOf()
if (m > 12) {
    out.write("Month too big");
} else if (m < 1) {
    out.write("Month too small");
} else {
    // I already checked my inputs.  Why do I have to check for an exception
    // too?  Doing so doesn't benefit me in any way. It just makes the
    // interface unnecessarily difficult to use!
    try {
        ym = YearMonth.valueOf(y, m);
    } catch (Exception e) {
        out.println("month still invalid: " + e.getMessage());
    }
}

The critical nuance here is that RuntimeExceptions should be used to indicate a programming error on the part of the caller of the function that throws it. They are generally used in the first few lines of the function as a defensive check for invalid input values. Each RuntimeException should include a description of the problem (not be blank).

RuntimeExceptions can also be used to catch invalid state as follows:

enum Numb { ONE, TWO; }
private Numb num = null;

public void init(Numb n) {
    if (n == null) {
        throw new IllegalArgumentException("init cannot take a null Numb");
    }
    num = n;
}

public void showNum() {
    if (ONE == num) {
        out.println("1");
    } else if (TWO == num) {
        out.println("2");
    } else {
         throw new IllegalStateException("Unhandled value of Numb or called" +
                                         " showNum without initializing the" +
                                         " num");
    }
}

When some programmer adds THREE to Numb and doesn't account for the new possible value, it fails hard and fast, making the problem easy to find and fix.

RuntimeExceptions are a good way to make code fail hard and fast without forcing the caller to check their input values twice. Used properly they can make sneaky coding errors obvious. Used improperly, they can make and obvious errors sneaky.