Exploring alternative error handling in Java
October 8, 2016
Using checked exceptions in Java has always been a pain, especially since the release of Java 8. However, there are other ways of handling errors.
Error handling is a fascinating and necessary topic in every
programming language. As with any problem, though, there
are very different ways of dealing with it. Some languages go with
language primitives like try and catch, others use error codes as
return values for handling erroneous conditions. Functional or
functionally inspired languages like Elm, Haskell
or Rust tend to do the latter, but with data types instead of
codes. This enables the implementation of error handling as libraries,
which can evolve with usage patterns and requirements, without
changing the core language.
One of my all-time favorite talks on this topic is Growing a Language by Guy Steele, one of the designers of the Scheme programming language and a member of the team behind the Java programming language. It goes into more detail. If you haven’t seen it yet, do yourself a favor and take a look. It’s phenomenal.
Exception handling in Java
Java obviously implements try and catch for handling
exceptions. Exceptions themselves even blend in nicely with the rest
of the language, since all of them are defined as classes.
Especially with the introduction of lambdas in Java 8 though, there are some annoying dark corners. A simple example might be trying to count the number of lines of all files in a given directory.
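Path directory = Paths.get("tardis");
Files.list(directory).mapToLong(p -> {
    // try-with-resources closes the file once its lines are counted;
    // read errors during count() arrive wrapped in UncheckedIOException
    try (Stream<String> lines = Files.lines(p)) {
        return lines.count();
    }
    catch (IOException | UncheckedIOException ex) {
        return 0L; // Ugh...
    }
}).sum();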
Since addition on long has a neutral element (0), this approach would
generally work fine. Imagine deserializing an arbitrary object from
each file though, where no such default exists. Is an exception while
processing one of the files a programming error? Should a
RuntimeException be thrown, so that no other file is processed?
What if we wanted to know the exact path of every file we weren’t
able to process? Currently, the only way to handle such cases
would be to drop down to a good old for(..) loop.
But that’s boring and tedious, so let’s look for an alternative.
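For comparison, the loop-based version might look roughly like the sketch below. The class name LoopLineCounter and its fields total and failed are made up for this illustration. It works, but the bookkeeping for failed paths has to be threaded through mutable state by hand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: counts lines with a plain loop and remembers failures.
final class LoopLineCounter {
    long total = 0L;
    final List<Path> failed = new ArrayList<>();

    static LoopLineCounter count(Path directory) throws IOException {
        LoopLineCounter counter = new LoopLineCounter();
        // DirectoryStream is Iterable, so the enhanced for loop works on it.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory)) {
            for (Path p : entries) {
                try (Stream<String> lines = Files.lines(p)) {
                    counter.total += lines.count();
                } catch (IOException | UncheckedIOException ex) {
                    // Opening fails with IOException, reading with UncheckedIOException.
                    counter.failed.add(p);
                }
            }
        }
        return counter;
    }
}
```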
Error handling with Pair
Whether the next file should be processed after an error in the previous one depends on the business use case, of course. But let’s pretend we would like to know which paths didn’t quite work out and still use the nice Stream API.
Well, what might happen in the above case? Either we get an
IOException or we get a long. Naively, we could model this as a
Pair or a Tuple, maybe even throwing in an enum to define which
state we are in. I called it FileCountResult in this case.
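// Variant 1: a value plus an optional error path
class FileCountResult {
    final Long value;
    final Optional<Path> errorPath;
}
// OR
// Variant 2: an explicit tag plus both fields
class FileCountResult {
    enum ResultType { ERROR, LINES }
    final ResultType type;
    final Long value;
    final Path errorPath;
}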
This approach comes with some caveats, however. For one, we might be
able to initialize this result with both a long and a Path, which is
rubbish, since we want either one or the other. Especially with
the enum, there have to be methods to ensure consistency, which have
to be tested… which is more code… which has to be
maintained… which… you get the point. Also, the compiler can be
quite helpful in checking these invariants for us, if we nudge it a
little.
In Haskell or Elm, we would be able to describe the FileCountResult in the
following way.
data FileCountResult
    = Lines Integer
    | PathError Path
Basically, it means: There is a data type called FileCountResult. This
type will either be the number of lines as an Integer or the
erroneous Path. Thus the error is just a data type.
It’s possible to model this in Java as well, but a tad more verbosely.
public abstract class FileCountResult {
    private FileCountResult() {}
    public abstract <T> T match(
        Function<Long, T> handleLines,
        Function<Path, T> handlePathError
    );
    public static final class Lines extends FileCountResult {
        private final Long value;
        // implement match
    }
    public static final class PathError extends FileCountResult {
        private final Path errorPath;
        // implement match
    }
}
So, what do we have here? We declare an abstract class FileCountResult, with an abstract method match to be implemented by
all subclasses. For every branch in Haskell (the | symbol), there is
a corresponding subclass in Java. Thus, in this case we have one for
the number of lines and one for the error. Both subclasses implement
the match method, using the corresponding function. This basically
implements pattern matching for this class and can be viewed as an
implementation of the visitor pattern in Java land.
To really seal the deal, we should make the constructors of both
subclasses private and implement two static factory methods.
Now, let’s simply implement a convenience method for retrieving the
number of lines as a FileCountResult.
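public static FileCountResult countingLines(Path p) {
    // close the file again; read errors during count() arrive as UncheckedIOException
    try (Stream<String> lines = Files.lines(p)) {
        return new Lines(lines.count());
    } catch (IOException | UncheckedIOException ex) {
        return new PathError(p);
    }
}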
Here is how the directory example from above might look then.
List<FileCountResult> results = Files.list(directory).
    map(p -> countingLines(p)).
    collect(toList());
Now retrieving the erroneous paths is just a few stream operations away and is left as an exercise to the reader. My solution can be found here.
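For readers who want to compare notes, here is one possible way to put the pieces of this section together. It is a sketch and not the linked solution: the factory names lines and pathError and the helper errorPaths are assumptions made for this example. The subclasses are private, so the static factories are the only way to create a result.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

abstract class FileCountResult {
    // Private constructor: only the nested classes below can extend this type.
    private FileCountResult() {}

    public abstract <T> T match(Function<Long, T> handleLines,
                                Function<Path, T> handlePathError);

    // Static factories (hypothetical names) replace the private constructors.
    public static FileCountResult lines(long value) { return new Lines(value); }
    public static FileCountResult pathError(Path path) { return new PathError(path); }

    public static FileCountResult countingLines(Path p) {
        // try-with-resources closes the file handle behind the stream.
        try (Stream<String> content = Files.lines(p)) {
            return lines(content.count());
        } catch (IOException | UncheckedIOException ex) {
            return pathError(p);
        }
    }

    // Keeps only the paths of results that represent an error.
    public static List<Path> errorPaths(List<FileCountResult> results) {
        return results.stream()
            .flatMap(r -> r.<Stream<Path>>match(n -> Stream.empty(), Stream::of))
            .collect(Collectors.toList());
    }

    private static final class Lines extends FileCountResult {
        private final long value;
        private Lines(long value) { this.value = value; }
        @Override
        public <T> T match(Function<Long, T> handleLines,
                           Function<Path, T> handlePathError) {
            return handleLines.apply(value);
        }
    }

    private static final class PathError extends FileCountResult {
        private final Path errorPath;
        private PathError(Path errorPath) { this.errorPath = errorPath; }
        @Override
        public <T> T match(Function<Long, T> handleLines,
                           Function<Path, T> handlePathError) {
            return handlePathError.apply(errorPath);
        }
    }
}
```

Because match forces both cases to be handled, errorPaths cannot accidentally treat a line count as a failure or the other way around.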
Generalizing our approach
So let’s recap for a second here: To implement data types in Java which
represent just a finite number of distinct forms (called
sum types in Haskell), we can use abstract classes and implement the
visitor pattern. Concretely, we encoded the result of counting the
number of lines in a file as either a resulting long or an erroneous
Path.
This pattern seems like something we might generalize. Instead of
either having a long or a Path we could have either any T or any
E. The T in this case stands for the resulting type (so it could
be Long, Integer, we don’t really care) and E stands for the
error, but it could be any type as well. Consequently, I renamed
FileCountResult to Result<T, E> in the following class.
public abstract class Result<T, E> {
    private Result() {}
    public abstract <R> R match(
        Function<T, R> handleValue,
        Function<E, R> handleError
    );
    // imagine static factory methods
    // Result<T, E> ok(T value) { .. }
    // Result<T, E> err(E error) { ... }
    public static final class Ok<T, E> extends Result<T, E> {
        private final T value;
        // implement match and private constructor
    }
    public static final class Err<T, E> extends Result<T, E> {
        private final E err;
        // implement match and private constructor
    }
}
Now, representing the FileCountResult is just a matter of choosing
concrete types for T and E, respectively: Result<Long, Path>.
In Elm, this definition basically looks like this:
type Result value error = Ok value | Err error
map and flatMap
Thinking about Result<T, E>, we might notice that it’s not that
different from an Optional<T>. Basically it’s the same, just with
a reason why a computation failed instead of just nothing.
There are two very useful methods implemented by Optional that we might
consider for Result<T, E> as well: map and flatMap. To provide
some motivation for the former, let’s say we have a pure Function<T, R>, like squaring a number, which we would like to apply to the
result if it is present. Currently, we would need to match on the
Result<T, E> and only apply the function if we have a value of type T.
That becomes annoying very quickly, especially in succession. That’s
what map is for.
public <R> Result<R, E> map(Function<T, R> mapper) {
    return match(
        value -> ok(mapper.apply(value)),
        error -> err(error)
    );
}
// Now we can do:
Result.ok(5).
    map(x -> x * x).
    map(x -> x * 2).
    match(value -> value, err -> 0); // 50
Ok, so what about flatMap? Consider three functions, all three of
which take some value and return a Result<T, E>. How would we go about
composing them?
Result<String, SomeError> readStringFromFile(Path value);
Result<Integer, SomeError> parseInt(String value);
Result<String, SomeError> format(Integer value);
// Errr, well:
readStringFromFile(Paths.get("winteriscoming")).match(
    value -> parseInt(value).match(
        value2 -> format(value2).match(
            result -> ok(result),
            err -> err(err)
        ),
        err -> err(err)
    ),
    err -> err(err)
);
Wow! That was awful. Consider yourself lucky that you didn’t have to type
that. I’m not even sure that would compile… Let’s look for an
alternative. What we actually want is a method which takes a
Function<T, Result<R, E>> and applies that function only if there
is no error.
public <R> Result<R, E> flatMap(Function<T, Result<R, E>> mapper) {
    return match(
        value -> mapper.apply(value),
        error -> err(error)
    );
}
// Let's try again:
readStringFromFile(Paths.get("hodor")).
    flatMap(this::parseInt).
    flatMap(this::format).
    match(result -> result, err -> "Nooooooooooooooooo");
Whoa, that’s much better. There are a number of other methods that are quite useful when working with such a type. Take a look at this snippet, where I went ahead and implemented some of them. We might even go ahead and try to generalize catching errors and converting them to results, although Java’s type erasure would make life harder than it should be.
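To make the pieces of this section concrete, here is a minimal, self-contained sketch of Result<T, E> with match, map and flatMap as described above. It is not the linked snippet. The attempt helper, the ThrowingSupplier interface and the ResultDemo class are hypothetical additions that illustrate one way of converting exceptions to results. The helper fixes the error type to Exception, because a type variable such as E cannot be used in a catch clause.

```java
import java.util.function.Function;

abstract class Result<T, E> {
    private Result() {}

    public abstract <R> R match(Function<T, R> handleValue, Function<E, R> handleError);

    public static <T, E> Result<T, E> ok(T value) { return new Ok<>(value); }
    public static <T, E> Result<T, E> err(E error) { return new Err<>(error); }

    public <R> Result<R, E> map(Function<T, R> mapper) {
        return match(value -> ok(mapper.apply(value)), error -> err(error));
    }

    public <R> Result<R, E> flatMap(Function<T, Result<R, E>> mapper) {
        return match(value -> mapper.apply(value), error -> err(error));
    }

    // Hypothetical: a Supplier variant that is allowed to throw.
    @FunctionalInterface
    interface ThrowingSupplier<T> { T get() throws Exception; }

    // Hypothetical: runs the action and turns any exception into an Err.
    public static <T> Result<T, Exception> attempt(ThrowingSupplier<T> action) {
        try {
            return ok(action.get());
        } catch (Exception ex) {
            return err(ex);
        }
    }

    private static final class Ok<T, E> extends Result<T, E> {
        private final T value;
        private Ok(T value) { this.value = value; }
        @Override
        public <R> R match(Function<T, R> handleValue, Function<E, R> handleError) {
            return handleValue.apply(value);
        }
    }

    private static final class Err<T, E> extends Result<T, E> {
        private final E error;
        private Err(E error) { this.error = error; }
        @Override
        public <R> R match(Function<T, R> handleValue, Function<E, R> handleError) {
            return handleError.apply(error);
        }
    }
}

// Made-up usage: parse a number, divide 100 by it, double the quotient.
final class ResultDemo {
    static Result<Integer, Exception> parse(String raw) {
        return Result.attempt(() -> Integer.parseInt(raw.trim()));
    }

    static Result<Integer, Exception> hundredDividedBy(int n) {
        return Result.attempt(() -> 100 / n);
    }

    static String describe(String raw) {
        return parse(raw)
            .flatMap(ResultDemo::hundredDividedBy)
            .map(x -> x * 2)
            .match(value -> "ok: " + value,
                   ex -> "failed: " + ex.getClass().getSimpleName());
    }
}
```

The first failing step short-circuits the rest of the chain, so describe never has to check for errors between the individual steps.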
Authenticating a user
As one last example, take a look at authenticating a user with
exceptions vs Result<T, E>. Let’s assume there are two things that
might go wrong. Either the user cannot be found, or the password is
not correct. That sounds a lot like a sum type again. I am just
showing pseudo code here again, since otherwise this post might become
even longer.
public class AuthException extends Exception {
    public static final class UnknownUsername extends AuthException {
        private final String username;
        public UnknownUsername(String username) { this.username = username; }
    }
    public static final class WrongPassword extends AuthException {}
}
Authenticating a user with Exceptions
public Response authenticate(String username, String password) {
    try {
        User user = findByUsername(username);
        user = checkPassword(user, password);
        return Response.ok(user).build();
    }
    catch (AuthException ex) {
        return Response.badRequest(ex.toJson()).build();
    }
}
public User findByUsername(String username) throws AuthException {
    Optional<User> user = // db call and stuff
    return user.orElseThrow(() -> new UnknownUsername(username));
}
public User checkPassword(User user, String loginPassword) throws AuthException {
    // throw is a statement in Java, so it cannot be a branch of ?:
    if (!user.getPassword().equals(loginPassword)) {
        throw new WrongPassword();
    }
    return user;
}
Authenticating a user with Result<T, AuthException>
public Response authenticate(String username, String password) {
    return findByUsername(username).
        flatMap(u -> checkPassword(u, password)).
        map(u -> Response.ok(u).build()).
        formatError(e -> Response.badRequest(e.toJson()).build());
}
public Result<User, AuthException> findByUsername(String username) {
    Optional<User> user = // db call and stuff
    // the explicit type argument lets Java infer AuthException as the error type
    return user.<Result<User, AuthException>>map(Result::ok).
        orElse(Result.err(new UnknownUsername(username)));
}
public Result<User, AuthException> checkPassword(User user, String loginPassword) {
    return user.getPassword().equals(loginPassword) ?
        ok(user) :
        err(new WrongPassword());
}
Comparing both approaches
Syntactically, working with the Result is much noisier than using
the built-in try and catch mechanisms provided by Java. As evident
in the file example though, it provides some benefits when working
with streams, precisely because it’s just implemented like any other
ordinary class.
Even though the bare essence of Result<T, E> is a mere 50 LOC
without comments (including match, map, flatMap), it’s very
flexible, extensible (as shown by the number of additional methods I
implemented) and models most of what checked exceptions offer without any
language primitives. Even multiple exceptions can be implemented by
using the same trick with abstract classes we used for the Result
itself, as seen with the AuthException.
Additionally, for me personally, it somewhat forces me to think in a more
fine-grained way about what might actually happen in my code and to tear
these concerns apart, because in the end, it’s just simple composition
with map and flatMap that brings them all together again.
Conclusion
So, should we start using only Result<T, E> from now on? Of course
not, but the approach to modeling it, and sum types in Java more
generally, is quite a handy trick to have in your toolbox.
It lends itself quite well to modeling state machines among other things, because it enables the compiler to check some constraints for us, instead of us pinning them down with tests. Adding an additional state with more and different data requires just adding another class.
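As a rough illustration of the state machine idea, consider the following sketch. The ConnectionState type, its three states and the retry rule are entirely made up for this example. Each state carries only the data that makes sense for it, and match makes every caller handle all three cases.

```java
import java.util.function.Function;

// Hypothetical three-state machine; names and the retry rule are illustrative only.
abstract class ConnectionState {
    private ConnectionState() {}

    public abstract <R> R match(Function<Disconnected, R> onDisconnected,
                                Function<Connecting, R> onConnecting,
                                Function<Connected, R> onConnected);

    public static ConnectionState disconnected() { return new Disconnected(); }
    public static ConnectionState connected(String sessionId) { return new Connected(sessionId); }

    // Transition: each state decides what a retry means for it.
    public ConnectionState retry() {
        return this.<ConnectionState>match(
            d -> new Connecting(1),
            c -> c.attempt < 3 ? new Connecting(c.attempt + 1) : new Disconnected(),
            c -> c);
    }

    public String describe() {
        return match(
            d -> "disconnected",
            c -> "connecting (attempt " + c.attempt + ")",
            c -> "connected as " + c.sessionId);
    }

    static final class Disconnected extends ConnectionState {
        private Disconnected() {}
        @Override
        public <R> R match(Function<Disconnected, R> onDisconnected,
                           Function<Connecting, R> onConnecting,
                           Function<Connected, R> onConnected) {
            return onDisconnected.apply(this);
        }
    }

    static final class Connecting extends ConnectionState {
        private final int attempt;
        private Connecting(int attempt) { this.attempt = attempt; }
        @Override
        public <R> R match(Function<Disconnected, R> onDisconnected,
                           Function<Connecting, R> onConnecting,
                           Function<Connected, R> onConnected) {
            return onConnecting.apply(this);
        }
    }

    static final class Connected extends ConnectionState {
        private final String sessionId;
        private Connected(String sessionId) { this.sessionId = sessionId; }
        @Override
        public <R> R match(Function<Disconnected, R> onDisconnected,
                           Function<Connecting, R> onConnecting,
                           Function<Connected, R> onConnected) {
            return onConnected.apply(this);
        }
    }
}
```

In this encoding, a fourth state would mean a new class plus a fourth parameter on match, and every existing call to match would stop compiling until it handles the new case.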
Besides, Elm and, in part, Rust really do handle errors this way.
Thanks for reading, that’s it for now. Have a nice day!