Wednesday, July 30, 2008

Hibernate automatic POJO and mapping file generation from MySQL

Let's face it. Updating Hibernate .hbm.xml and .java files for every database change is phenomenally boring and a waste of time. It's the sort of thing that makes people switch to Ruby/ActiveRecord. I have recently undergone an effort to get the Hibernate reverse engineering tools to generate .hbm.xml and .java files as close as possible to what I want. In theory that should allow me to add new methods to the files in my source tree and still generally get hands-free updates by running the hibernate updater and diffing/patching. In practice, getting Hibernate to give me exactly what I want is very hard, so in the first week I moved from 3% of my changes being automatically applied to 80%. In the next month, I moved up to about 95% and it will probably slowly continue to improve.

The basic outline of the process is simple:

  • Save the generated files from the last time I ran the generator:

    mv generatedSchema generatedSchema.1
  • Run the generator again and diff the files it generates this time against the ones it generated previously

  • Patch those changes to my source tree.

I use the ant task, not the Eclipse integration because I'm in love with the command line. Assuming that you have read and absorbed all the documentation about hibernate reverse-engineering, I'll list a few key points from my hibernate.reveng.xml.

hibernate.reveng.xml

For some reason BIGINT was being reverse-engineered as Long (the object) instead of long (the primitive). I'm not so sure whether my VARCHAR mapping is a good idea...

I included an example of renaming a column, and a much more useful example of naming a parent-child relationship. The syntax of the foreign-key mapping was not obvious to me, so I hope you find this example helpful:

<hibernate-reverse-engineering>
 <type-mapping>
  <sql-type jdbc-type="BIGINT" hibernate-type="long" not-null="true"></sql-type>
  <sql-type jdbc-type="VARCHAR" length="1" hibernate-type="char" not-null="true"></sql-type>
 </type-mapping>
 <table-filter match-catalog="mydb" match-name="temp_user" exclude="true"/>
 <table catalog="mydb" name="company"> 
  <column name="identifier_c" property="coIdentifierC" type="string" />
 </table>
 <table catalog="mydb" name="tree"> 
  <foreign-key constraint-name="tree_ibfk_parent">
   <many-to-one property="parent" />
   <set property="children" />
  </foreign-key>
 </table>
</hibernate-reverse-engineering>

pojo/*.ftl Files

In addition to these customizations I've made extensive use of the Freemarker POJO templates. I made a number of very minor changes for personal preference, but one very important change for diff and patch to work. I added a two-line comment before and after the fields (in PojoFields.ftl) and the get/sets (in PojoPropertyAccessors.ftl):

// Begin Auto-generated Accessors (by myGenerator project)
// Manual editing is STRONGLY DISCOURAGED below this point!

...

// End Auto-generated Accessors (by myGenerator project)
// Manual editing is OK below this point

Having two lines of comments is very important because:

  • Hibernate reverse-engineering tools like to shuffle the field order even when you changed one of the related tables.

  • Diff uses the previous 2-3 lines for comparison to confirm it's found the right place in the file and a two-line comment plus one blank line makes it very happy.

diff and patch

After running the hibernate ant task (and tweaking the output with some canned perl regular expression replacements), I run diff (telling it to ignore the timestamped Hibernate comment):

diff -I '^\(//\|<!--\) Generated ' -druN generatedSchema.1/ generatedSchema/ >patch.txt

Often I check what changes were made:

less patch.txt

From my source tree, I do a dry-run of patching the differences.

cd WEB-INF/src
patch --dry-run -l -p1 <../../../../../scorecardSchemaGen/patch.txt

If there are any failures:

  • Patch those files manually performing any cleanup necessary to make patches more likely to succeed automatically next time
  • Copy the new generated version of the manually patched file from generatedSchema to generatedSchema.1
  • Re-run the diff command so that it will find no differences in the generated version(s) of the manually patched file(s).
  • Perform another dry-run patch
  • Repeat until there are no errors in the dry-run

Now there will be no failures when you patch for real:

cd WEB-INF/src
patch -l -p1 <../../../../../scorecardSchemaGen/patch.txt

Finally, clean up the backup files made by patch:

rm company/project/section/*.orig

MySQL Conventions

When choosing optimal types and variable names for the columns in MySQL, one should take into account how Hibernate will reverse-engineer it so as to make sure the resulting files are as close as possible to what you want. In addition, MySQL has no "check" function and the MySQL BOOLEAN becomes TINYINT(1) which allows values from -128 to 127. Here are the subset of data types that are really easy to use with all tools and provide some data integrity by eliminating possibilities for meaningless values.

  • Boolean: Start each name with is_ whenever it makes sense - the code reads more like English that way. Use positive column/variable names. E.g. instead of preventUpdate, isNotDirty, or doesntHaveErrors, use allowUpdate, isClean, or isErrorFree. Declare in MySQL as:
    is_something bit(1) NOT NULL DEFAULT FALSE

    Note: Java/Hibernate like this but PHP and some other tools may not. Here are two work-arounds for other tools:

    select if(is_something, 1, 0) from my_table;
    or
    select if(is_something, true, false) from my_table;
  • Integers:Try to only use MySQL (Java) types: INT (int), BIGINT (long), and maybe TINYINT (byte). The size (the number in parenthesis after the type in MySQL) has no effect that I can see. The SMALLINT (short) data type requires a trade-off between saving space and limiting possible values and ease of use in Java. It requires too much type casting for my taste when working with it in Java. You even have to use casts for situations like:
    i = (short) (i + 1);
    myObject.setShort( (short) 0 );

    Unless you require a smaller data type for data integrity or for space considerations, It's easiest to stick with INT. Stay away from MySQL's MEDIUMINT: It's a 3-byte integer which is probably why Hibernate does not recognize it. 64-bit (8-byte) integers like BIGINT (long) didn't used to even be available on hardware. The manuals don't make any note of whether they are supported by hardware or not. Anyone know?

  • Strings: Use VARCHAR with the appropriate length. That way, it's easier to support utf-8 or other variable-width character encodings if you want to.

  • Naming: I started off naming my columns description_c and year_i, but ended up dropping the type-specific suffix because Java and MySQL are both type-safe anyway and the extra letter made the code slightly harder to read. I still use the word _date for dates because that seems to make the code easier to read (to my eye).

  • Foreign Keys: MySQL uses a separate constraint name and foreign key name. The constraint name has to be unique within the database so it's automatically assigned the tablename_ibfk_1 where the "1" is a sequence number incremented for each new constraint. If I drop and recreate a table in development, but merely add a new column in test, then the sequence numbers could be different in different environments and issuing a command to change a constraint in one environment will change a *different* constraint in another environment. This would be really bad.

    To avoid this, I use the following template to declare a foreign key with MySQL InnoDB (the default database, MYISAM ignores the "on-delete cascade" clause):

    CREATE TABLE process ( 
        ... 
        company_id BIGINT UNSIGNED NOT NULL, 
        constraint `process_fk_company` FOREIGN KEY (company_id) 
            REFERENCES company(id) on delete cascade, 
        ... 
    ) ENGINE=InnoDB;

    That way I'm sure that the constraint name (which is what I have to reference in the Hibernate file) will not change. It's confusing because MySQL syntax mixes foreign key column names and constraint names. E.g. DROP FOREIGN KEY constraint-name. To find what constraint name is used for a given foreign key, use the SHOW CREATE TABLE command. Constraint names in mysql look like `tablename_ibfk_1`