Glen on Programming: functional programming

Wednesday, January 30, 2013

The Language After Java: Reflections on my first JavaOne

My thoughts after my first JavaOne conference centered on the question "What is the next big thing going to be?", in the sense of "What do I need to learn to stay current?" I definitely came back with a short list of very promising but currently underdog technologies:

JUnit (or unit testing in general) is easy and important enough to be part of every project. I even wonder if good unit tests are easier and more beneficial than type-safety in your programming language!
Scala (successor to Java)
Git (successor to Subversion)
Maven (successor to Ant)
Functional Programming (it's finally ready for prime-time)
Algorithms (more than Data Structures)

These thoughts motivated me to make some tool changes and to take Martin Odersky's "Functional Programming Principles in Scala" course (which was mind-blowing).

I've seen the rise of Data Structures, then Object Oriented Programming, and now Functional Programming and Algorithms. Functional Programming was really a dark horse from my perspective because Lisp is the second-oldest programming language, yet FP failed to become mainstream until very recently.

My functional experience before Scala was confined to e-lisp and XSLT, neither of which makes a particularly good general-purpose programming language. The recent death of Moore's Law and the triumph of concurrent processing in cloud services like AWS have given functional programming better performance and code-comprehension characteristics than imperative programming. I definitely didn't see that coming.

Sometimes the Don Quixote principle makes bad popular ideas successful - if enough people believe in something it becomes true. I wondered about that and Functional Programming. "Computer Science" has lived in the Math department of most universities. If you get enough math geeks doing programming, of course they will start to favor a style that looks and behaves like functions from the field of math. So what, I thought, if Functional Programming was based on Lambda Calculus - is it practical for solving business problems?

But this was not the right question. The people asking, "What can a program possibly do?" and "What is the best way to do X?" are the ones creating the technological innovations that ultimately drive the rest of the industry. The limitations of what programs can do (and how efficiently those tasks can be accomplished) constitute a good part of the study of algorithms and the inspiration for new languages. Sure, you can do most things in most general-purpose languages, but as James Roper said and Havoc Pennington quoted in his recent talk (slides available here), just because you can do something in a language doesn't mean that you will. Some things are just too difficult in some languages. This is why knowing both Imperative and Functional styles is very helpful in addition to studying Algorithms.

The easy problems may continue to be solved in imperative languages without much thinking about algorithms at all: transform one kind of text to another, deliver the right data for a given request, or generate a report from a database. But the hard problems are increasingly going to require concurrent processing with the most effective algorithms. Sure, it's easier to do some things in a functional language, but a great algorithm in a limited language beats a crummy algorithm in a concise and beautiful language any day.

Java is a great language, but it is showing its age:

Primitives and API classes (like List and Map) are all designed to be mutable. Concurrent programming with this inherent unsafety is a real bear. Even without concurrency, it's just easier to reason about things that don't change. Just using Java collections properly requires immutability: the keys of a HashMap, for example, must not change in any way that affects equals() or hashCode() while they are in the map.
Collections are managed through iteration which puts the burden of concurrent programming on the user of those collections. This will be fixed by Lambdas in Java 8.
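
To make those two complaints concrete, here is a small sketch. The class and method names are invented for illustration, and it assumes Java 8 or later for the stream calls. It wraps a list in an unmodifiable view so accidental mutation fails fast, and it lets the library drive the iteration instead of the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; not from the original post.
final class AgingJavaSketch {

    // Returns a read-only view over a private copy, so callers cannot mutate shared state.
    static List<String> readOnlyCopy(List<String> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    // Internal iteration (Java 8+): the stream, not the caller, walks the list.
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                    .map(String::length)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = readOnlyCopy(Arrays.asList("Java", "Scala", "Lisp"));
        System.out.println(lengths(names)); // prints "[4, 5, 4]"
        try {
            names.add("XSLT");
        } catch (UnsupportedOperationException e) {
            System.out.println("names is read-only");
        }
    }
}
```

Because the loop lives inside the library call, the calling code has no index or iterator of its own to protect when the data is shared.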

Scala is the language most similar to Java that overcomes these limitations, so you may find more of that and less Java going forward. In any case, my search for "The Language After Java" has led me instead to a short list of really promising tools, and a rapidly growing interest in Functional Programming and Algorithms. I only wish I had hit on these ideas earlier in my career. Thank you JavaOne - and thank you John for getting me to go there!

Friday, January 4, 2013

Ternary Operator (?:) in Java

Overview

Like the if/else statement, the ternary operator creates a logical branching in the code where it is used. Unlike if/else statements, the ternary operator is an expression, meaning that it produces (evaluates to) a value. Expressions like (a * b) produce a numeric (int, long, float, double) value and (a || b) produces a boolean value. The ternary operator stands out among the operators because, like a method call, it can yield any type of object or primitive.

Here are two simple examples. Each creates a String s and sets its value depending on the value of isFriend:

// Set salutation using 'if'
String s = null;
if (isFriend) { s = "Hey, " + fName; } else { s = "Hello, " + fName; }

// Set salutation using '? :'
String s = (isFriend ? "Hey, " : "Hello, ") + fName;

It's called "The Ternary Operator" because it is the only operator (in the languages that use it) that takes three operands. Perhaps it would be better to call it "Evaluative-If" or "Question-mark Colon" but I'm not campaigning to rename it today.

Functional programmers like the ternary operator because it creates a block of code that produces a value - much like a closure.

Known Good and Bad Uses

There are countless ways to write bad code, and some language features have more potential for abuse than others. Because of its brevity, the ternary operator can be a particular nightmare, especially when used with statements that rely on operator precedence:

// DO NOT DO THIS - USE PARENTHESES!
a == b || c & d ? e ^ b : d | b

Also, the ? : is so tiny on the page that it becomes very hard to see when used with lengthy (multi-line) operands. It's designed for short things that generally fit on one line and produce a value.
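
To see why the precedence-dependent one-liner in the warning above is risky, here is how Java groups it, assuming all five variables are booleans (the class and method names are made up for this sketch). In Java, == binds tighter than &, which binds tighter than ^, then |, then ||, with ?: coming last.

```java
// Hypothetical demo; assumes a, b, c, d and e are all booleans.
final class PrecedenceSketch {

    // The expression exactly as written in the warning above.
    static boolean implicit(boolean a, boolean b, boolean c, boolean d, boolean e) {
        return a == b || c & d ? e ^ b : d | b;
    }

    // The same expression with the compiler's grouping spelled out.
    static boolean explicit(boolean a, boolean b, boolean c, boolean d, boolean e) {
        return ((a == b) || (c & d)) ? (e ^ b) : (d | b);
    }

    // Tries all 32 input combinations; true means the two forms always agree.
    static boolean agreeOnAllInputs() {
        for (int bits = 0; bits < 32; bits++) {
            boolean a = (bits & 1) != 0;
            boolean b = (bits & 2) != 0;
            boolean c = (bits & 4) != 0;
            boolean d = (bits & 8) != 0;
            boolean e = (bits & 16) != 0;
            if (implicit(a, b, c, d, e) != explicit(a, b, c, d, e)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnAllInputs()); // prints "true"
    }
}
```

The two methods compute the same thing, but only the second one lets a reader confirm that without consulting a precedence table.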

I have yet to find a significant timing difference between ?: and if/else statements, so we can rule out performance as a reason to choose one syntax over the other.

There was a discussion about the ternary operator on StackExchange recently and one person provided a great example of how NOT to use the ternary operator:

// Nesting the ternary operator is EVIL - DO NOT DO THIS!
int median1(int a, int b, int c) {
    return
        (a < b)
        ?
            (b < c)
            ? b
            :
                (a < c)
                ? c
                : a
        :
            (a < c)
            ? a
            :
                (b < c)
                ? c
                : b;
}

Nesting the ternary operator (? ? : :) is always evil because it's not clear which question-mark goes with which colon. But chaining (? : ? : ? :) with proper indentation and short blocks of code can be very clear. Note: the examples below rely on the fact that a return statement exits the method immediately, which lets them drop the else clause for brevity:

// With one 'if' to remove the nesting
int median2(int a, int b, int c) {
  if (a < b) {
    return (b < c) ? b :
           (a < c) ? c : a;
  }
  return (a < c) ? a :
         (b < c) ? c : b;
}

I find that easier to read than using if statements:

// All 'if' statements
int median3(int a, int b, int c) {
  if (a < b) {
    if (b < c) { return b; }
    if (a < c) { return c; }
    return a;
  }
  if (a < c) { return a; }
  if (b < c) { return c; }
  return b;
}

In Java, you can call a super-class constructor from a sub-class constructor only in the first line of that sub-class constructor, before any other statements run. The ternary operator is one of the few ways (calling a static method is another) to change your input data before passing it to the super-class constructor.

public class MyClass extends MySuperclass {
  public MyClass(int a) {
    super(a > 0 ? true : false);
    ...
  }
}
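
As a fuller sketch of adjusting an argument on its way into super(...), the classes below are hypothetical (Shape, Polygon and ClampedPolygon do not come from the original post). The first sub-class uses the ternary operator inline; the second shows a static helper method doing the same job, which is an option when the adjustment is too long for one line.

```java
// Hypothetical classes for illustration only.
final class SuperCallSketch {
    public static void main(String[] args) {
        System.out.println(new Polygon(1).sides);         // prints "3"
        System.out.println(new ClampedPolygon(50).sides); // prints "12"
    }
}

class Shape {
    final int sides;
    Shape(int sides) { this.sides = sides; }
}

class Polygon extends Shape {
    // Ternary: the argument is adjusted inside the super(...) call itself.
    Polygon(int requestedSides) {
        super(requestedSides < 3 ? 3 : requestedSides);
    }
}

class ClampedPolygon extends Shape {
    // Alternative: a static method may be evaluated as an argument to super(...).
    ClampedPolygon(int requestedSides) {
        super(clamp(requestedSides));
    }

    // Keeps the side count between 3 and 12 (arbitrary limits chosen for the sketch).
    private static int clamp(int n) {
        return Math.max(3, Math.min(n, 12));
    }
}
```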

Anywhere that a value is needed from a very short calculation, the ternary operator can be useful, especially for substituting a default when a value might be null (a common way to head off NullPointerExceptions):

out.print(name == null ? "" : name);
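
Here is a slightly larger sketch of the same idea, with invented names. Calling trim() directly on a null String would throw a NullPointerException, so the ternary supplies an empty default first. The second method shows java.util.Objects.toString(Object, String), available since Java 7, as a library alternative.

```java
import java.util.Objects;

// Hypothetical example class; not from the original post.
final class NullDefaultSketch {

    // Ternary form: fall back to "" when name is null, then call methods safely.
    static String tidy(String name) {
        String safe = (name == null) ? "" : name;
        return safe.trim();
    }

    // Library form: Objects.toString returns the second argument when the first is null.
    static String tidyWithObjects(String name) {
        return Objects.toString(name, "").trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + tidy(null) + "]");                // prints "[]"
        System.out.println("[" + tidyWithObjects("  Glen ") + "]"); // prints "[Glen]"
    }
}
```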

Conclusion

While the ternary operator has potential for misuse (particularly when nested, poorly indented, covering large blocks of code, or where operator precedence is critical), it also has potential for some good as shown above. Just be careful to use it only for good, ideally in situations that take advantage of its evaluative nature. Many thanks to Programmers.stackexchange.com for inspiration for this post.

Monday, September 17, 2012

Java Closures and The Start-End Problem

I use the term "Start-End Problem" to describe a resource that needs to be opened and closed, or a header and footer that need to be printed, in a way that would tempt you to make a class with start() and end() methods that are meant to be called as a pair with other code in between. Some time around 1997, my mentor Jack Follansbee told me to avoid this pattern whenever practical, because it's too easy to forget to call the end() method. Another problem arises if the code in between performs a return, continue, or break, throws an Exception, or otherwise avoids reaching the end() method.
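
A minimal sketch of that failure mode, using invented names (Report, naive, guarded). The first method returns early and never reaches end(); the second wraps the middle code in try/finally, which is the guard discussed next.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not from the original post.
final class StartEndProblemSketch {

    // A start()/end() pair that records its calls so the order can be inspected.
    static final class Report {
        final List<String> log = new ArrayList<>();
        void start() { log.add("start"); }
        void end()   { log.add("end"); }
    }

    // The early return skips end(), leaving the pair unbalanced.
    static List<String> naive(boolean bailOut) {
        Report r = new Report();
        r.start();
        if (bailOut) {
            return r.log;
        }
        r.log.add("middle");
        r.end();
        return r.log;
    }

    // finally runs on return and on exceptions, but only if someone remembers to write it.
    static List<String> guarded(boolean bailOut) {
        Report r = new Report();
        r.start();
        try {
            if (bailOut) {
                return r.log;
            }
            r.log.add("middle");
        } finally {
            r.end();
        }
        return r.log;
    }

    public static void main(String[] args) {
        System.out.println(naive(true));   // prints "[start]"
        System.out.println(guarded(true)); // prints "[start, end]"
    }
}
```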

Java's try-catch-finally block solves the return and Exception problems, but does not ensure that you will remember to call the end() method in the finally block. There are three traditional Java approaches to this problem:

Pre-evaluating the middle code somehow (e.g. with a buffer) and passing it to a procedure which appends to the Start and End of the buffer.
```
public void doStartEnd(StringBuilder sB) {
    sB.insert(0, "Start");
    sB.append("End");
}
```
This has limited applications, but is nice when it works.
Dependency Injection: Creating a special-purpose procedure with variables to adapt it to multiple related purposes. If the number of variables is small (or 0) this works very well. But sometimes multiple variables need to be passed to this procedure, and a complex data structure may even have to be created to hold the updated values for the return type. It seems a waste of time and energy to load all the variables into the procedure arguments like passengers on a bus, then unload their updated values from the return data-type one by one.
```
public class ReturnObject {
    public int count;
    public boolean showedAnything;
}
public ReturnObject doStartEnd(int count, String middle, boolean showedAnything) {
    System.out.println("Start");
    if (!showedAnything) { System.out.println("New stuff"); }
    for (; count < 5; count++) {
       showedAnything = true;
       System.out.println(middle);
    }
    System.out.println("End");
    ReturnObject ro = new ReturnObject();
    ro.count = count;
    ro.showedAnything = showedAnything;
    return ro;
}
```
Real world examples can get complicated very quickly. Someone on StackExchange: Programmers recently said they routinely found procedures with 10 or 20 parameters where they worked and that they "died a little bit inside" every time they found one. There are numerous ways to cause bugs with this approach: transposing values, or confusing Java's pass-by-value treatment of primitives with the reference-like behavior of objects... just to name a few. The coding effort involved can be impressive.
Create an interface for the middle code being executed and have an object implement that interface. A well-designed object, possibly with a Builder pattern, can mitigate some of the Bus Station issues, but since this pattern is just the Dependency Injection pattern turned inside-out, it sometimes just pushes the dependency injection overhead of the above solution from a procedure to an object. This used to be the most general solution to the start-end problem. The first example below uses an abstract class and an anonymous implementation but that is just a minor variation on this technique.

A functional programmer would say that a lexical closure would be the obvious solution to this problem. It allows an arbitrary amount of code (including the caller's local variables) to be executed in the middle of some other code. No pre-evaluation, no Bus Station, no extra objects or interfaces. The outer "enclosing" code can do the start() and end() logic with the "enclosed" code in a little magic closure envelope in the middle. Java doesn't have closures or function pointers, but an anonymous inner class looks a little like a closure and the following code compiles, works, and even solves the Start-End problem for some special cases, though it might not win many beauty contests:

public class ClosureTest {
    private static abstract class StartEnd {
        public void doStartEnd() {
            System.out.println("Start");
            middle();
            System.out.println("End");
        }
        public abstract void middle();
    }

    public static void main(String[] args) {
        new StartEnd() {
            @Override
            public void middle() {
                System.out.println("Middle");
            }
        }.doStartEnd();
    }
}

Output:

Start
Middle
End

A sad limitation of this technique is that Java tries to prevent you from updating the main() method's local variables inside the anonymous inner class by forcing you to make them final (immutable). The following WILL NOT COMPILE:

public static void main(String[] args) {
    int count = 0;
    new StartEnd() {
        @Override
        public void middle() {
            // ERROR: local variable count is accessed from within
            // inner class; needs to be declared final
            for (; count < 5; count++) {
                System.out.println("Middle");
            }
        }
    }.doStartEnd();
    System.out.println("Total Count " + count);
}

A mutable wrapper class will work around this restriction. Uglier, but it works:

private static class MutableRef<T> {
    public T count;
}
public static void main(String[] args) {
    final MutableRef<Integer> mr = new MutableRef<Integer>();
    mr.count = 0;
    new StartEnd() {
        @Override
        public void middle() {
            for (; mr.count < 5; mr.count++) {
                System.out.println("Middle");
            }
        }
    }.doStartEnd();
    System.out.println("Total Count " + mr.count);
}

Output:

Start
Middle
Middle
Middle
Middle
Middle
End
Total Count 5

Java 7's try-with-resources feature provides my favorite solution to this particular issue. Not a true general-purpose lexical closure, but sure looks like one for solving this particular problem:

public class ClosureTest {
    private static class StartEnd implements AutoCloseable {
        public StartEnd() { System.out.println("Start"); }
        @Override
        public void close() { System.out.println("End"); }
    }

    public static void main(String[] args) {
        int count = 0;
        try (StartEnd se = new StartEnd()) {
            for (; count < 5; count++) {
                System.out.println("Middle");
            }
        } // end of StartEnd
        System.out.println("Total Count " + count);
    }
}

One more detail... If you compile with -Xlint it may complain, "warning: [try] auto-closeable resource se is never referenced in body of corresponding try statement." When I use this pattern in my own code, I usually end up using the se variable within the corresponding try block. But for the times that I don't, a preventCompilerWarning() method eliminates the warning:

public class ClosureTest {
    private static class StartEnd implements AutoCloseable {
        public StartEnd() { System.out.println("Start"); }
        @Override
        public void close() { System.out.println("End"); }
        public void preventCompilerWarning() { }
    }

    public static void main(String[] args) {
        int count = 0;
        try (StartEnd se = new StartEnd()) {
            se.preventCompilerWarning();
            for (; count < 5; count++) {
                System.out.println("Middle");
            }
        }
        System.out.println("Total Count " + count);
    }
}

Voilà - the Start-End problem solved! Unlike a lexical closure in a functional language, this technique executes the close() method even when a return statement is reached or an Exception is thrown. This may be good or bad depending on your situation.
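
To check that claim, here is a sketch that records events in a list instead of printing them. The StartEnd class here is a variation on the one above, and the method names are invented. One method leaves the try block with a return, the other with an exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; a variation on the ClosureTest code above.
final class CloseAlwaysRunsSketch {

    // Resource that logs its lifecycle to a list supplied by the caller.
    static final class StartEnd implements AutoCloseable {
        private final List<String> log;
        StartEnd(List<String> log) {
            this.log = log;
            log.add("Start");
        }
        @Override
        public void close() { log.add("End"); }
    }

    // Leaves the try block early with a return; close() still runs.
    static List<String> earlyReturn() {
        List<String> log = new ArrayList<>();
        try (StartEnd se = new StartEnd(log)) {
            log.add("Middle");
            return log;
        }
    }

    // Leaves the try block with an exception; close() runs before the catch clause.
    static List<String> withException() {
        List<String> log = new ArrayList<>();
        try (StartEnd se = new StartEnd(log)) {
            throw new IllegalStateException("boom");
        } catch (IllegalStateException e) {
            log.add("Caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(earlyReturn());   // prints "[Start, Middle, End]"
        System.out.println(withException()); // prints "[Start, End, Caught]"
    }
}
```

Note the ordering in the second case: the resource is closed before the catch clause runs, so any cleanup in close() has already happened by the time the exception is handled.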

Notice that the local variable count is being used "inside" the StartEnd block without being visible to that code? No dependency injection, interfaces, objects, etc. This is the essence of a lexical closure - a little bubble of extra variable scope without otherwise violating the privacy of the enclosed code. Hopefully you can imagine some of the versatile coding paradigms which the new try-with-resources block makes available in Java 7.

For a more general solution to the problem of closures in Java before Java 8, check out this post.
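
For comparison, on Java 8 and later the middle code can be passed as a lambda. This is not the approach from the linked post, only a sketch with invented names (LambdaStartEndSketch, doStartEnd, run) that uses the standard Runnable and AtomicInteger types. Lambdas can only capture effectively final locals, so the counter still needs a mutable holder, much like the MutableRef workaround above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; assumes Java 8 or later.
final class LambdaStartEndSketch {

    // Runs the caller's code between the start and end steps; finally covers exceptions.
    static void doStartEnd(List<String> out, Runnable middle) {
        out.add("Start");
        try {
            middle.run();
        } finally {
            out.add("End");
        }
    }

    // Counts to five inside the lambda and returns the final count.
    static int run(List<String> out) {
        AtomicInteger count = new AtomicInteger();
        doStartEnd(out, () -> {
            while (count.get() < 5) {
                out.add("Middle");
                count.incrementAndGet();
            }
        });
        return count.get();
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        int total = run(out);
        System.out.println(out);
        System.out.println("Total Count " + total); // prints "Total Count 5"
    }
}
```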