Strong typing–you either love it or hate it! Elaborate type systems catch a lot of programming errors at compile time. On the other hand they impose the burden of type annotation on the programmer, and they sometimes disallow valid programs or force type casting.
In Java there’s been a never-ending discussion about the lack of support for const types. People coming from a C++ background see it as a major flaw. Others would defend Java from constness with their own bodies.
I’m always on the lookout for solutions that let programmers have their cake and eat it too. Pluggable types are just the thing. You want const enforcement? Add a constness plug to your compiler. You want compile-time null-reference checks? Add another plug.
How is it possible?
Java annotations as type modifiers
Did you know that Java has a system of user-defined annotations (C# has a similar thing)? Annotations start with the @ sign and can be used, for instance, to annotate classes or variables. The Java compiler doesn’t do much with them other than embed them as metadata in class files. They can be retrieved at runtime as part of the reflection mechanism.
I never thought of annotations as more than an ad hoc curiosity until I read Matthew Papi’s Ph.D. thesis, Practical pluggable types for Java. The thesis proposes the use of annotations to extend the Java type system. This work contributed to the JSR 308 proposal, which allows annotations to appear anywhere types are used. For instance, you could annotate a declaration of an object with the @NonNull qualifier:
@NonNull Object nnObj;
or declare a read-only list of read-only elements:
class UnmodifiableList<T> implements @ReadOnly List<@ReadOnly T> { ... }
What distinguishes such annotations from bona fide comments is that they can be type checked by user-defined checkers. These checkers can be plugged into the compiler and run during compilation. They get access to abstract syntax trees (ASTs) produced by the language parser, so they can also do semantic checks (e.g., detect method calls through a potentially null pointer).
Examples of type system extensions
A simple NonNull checker detects assignments of potentially null objects to non-null objects, as in:
Object obj;
@NonNull Object nnObj = obj; // warning!
The checker must know that the default qualifier is @Nullable (potentially null) and that @NonNull is a “subtype” of @Nullable (strictly speaking, @NonNull T is a subtype of @Nullable T). In fact, simple type annotations of the @Nullable variety may be defined within the program and type-checked by the generic Basic Type Checker provided as part of the type checker framework. For instance, the subtyping relationship may be expressed declaratively as:
@TypeQualifier
@SubtypeOf( { Nullable.class } )
public @interface NonNull { }
However, a specialized checker can do more than just check type correctness. The NonNull checker, for instance, issues a warning every time the program tries to dereference a @Nullable object (remember that @Nullable is the default implicit annotation).
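For instance (a minimal sketch; Bar and maybeGetBar are hypothetical):

@Nullable Bar bar = maybeGetBar(); // @Nullable is the default; shown here for clarity
bar.toString();                    // warning: possible null dereference
if (bar != null) {
    bar.toString();                // OK: bar cannot be null in this branch
}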
An even more useful checker enforces immutability constraints (the generalization of const types). The IGJ checker (Immutability Generic Java) introduces, among others, the following annotations:
- @Mutable: The default. Allows the object to be modified.
- @ReadOnly: An object cannot be modified through this reference. This is analogous to a C++ const reference.
- @Immutable: An object may not be modified through any reference (not only the current one). Java Strings are immutable.
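To make these distinctions concrete, here is a minimal sketch (the Point class is made up for illustration; methods are annotated using the receiver syntax shown in the next section):

class Point {
    private int x, y;
    int getX() @ReadOnly { return x; }              // promises not to modify this
    void move(int dx, int dy) { x += dx; y += dy; } // @Mutable receiver by default
}

@ReadOnly Point r = new Point();
int x = r.getX();  // OK: getX is a @ReadOnly method
r.move(1, 1);      // error: cannot mutate through a @ReadOnly reference

@Immutable Point frozen = new @Immutable Point();
// frozen can never be mutated, through this or any other reference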
Incidentally, similar immutability constraints are built into the type system of the D programming language, except that D’s const and immutable (formerly known as invariant) are transitive.
Try it!
Suppose you are a fan of immutability constraints and you want to try them right now. First, download the JSR 308 compiler and checkers from the pluggable types web site. To compile your annotated program, use the command line:
javac -processor checkers.igj.IGJChecker <source-files>
If your program doesn’t obey immutability constraints specified in your source code, you’ll see a bunch of warnings and errors.
Now for the annotations. You might want to do them bottom up:
- Annotate methods that don’t modify this:
void foo() @ReadOnly;
These methods may only call other @ReadOnly methods of the same object (the checker will tell you if you break this rule).
- Annotate arguments that are not modified by a method, e.g.,
void foo(@ReadOnly Bar bar);
- Annotate objects that never change after creation:
@Immutable Bar bar = new @Immutable Bar();
And so on…
You can pass @Immutable as well as @Mutable objects to methods that take @ReadOnly arguments, but not the other way around.
Of course, any attempt to modify a @ReadOnly or @Immutable object will be flagged as an error by the checker.
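Putting these rules together (a sketch; Bar, its @ReadOnly method log, and its mutating method clear are all hypothetical):

void print(@ReadOnly Bar bar) {
    bar.log();   // OK: log is a @ReadOnly method
    bar.clear(); // error: clear mutates its receiver
}

void update(@Mutable Bar bar) { bar.clear(); }

@Mutable Bar m = new Bar();
@Immutable Bar i = new @Immutable Bar();
print(m);  // OK: @Mutable converts to @ReadOnly
print(i);  // OK: @Immutable converts to @ReadOnly
update(i); // error: @Immutable does not convert to @Mutable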
Nullness checker
The nullness checker protects your program from dereferencing null or uninitialized references. For instance, you can only call a method or access a field of an object that is typed as @NonNull.
Since the implicit default annotation is @Nullable, the following code will be flagged as an error:
Bar bar;
bar.toString();
There are some interesting issues related to nullness. For instance, during object construction, this cannot be considered @NonNull. Consequently, you cannot call regular methods from an object’s constructor. To remedy this situation without compromising the type system, a special annotation, @Raw, was added. Inside the constructor, this (and any @NonNull subobject under construction) is considered @Raw. So if you want to call an object’s method from inside its constructor, that method must be annotated @Raw.
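Here is how that might look (a sketch; the Widget class is made up, and @Raw is attached to the receiver using the same syntax as @ReadOnly above):

class Widget {
    @NonNull String name;

    Widget(String n) {
        init();        // OK only because init is declared @Raw
        // toString(); // error: this is still @Raw inside the constructor
        name = n;
    }

    void init() @Raw {
        // may run on a partially constructed object, so it must not
        // assume that @NonNull fields (like name) are initialized yet
    }
}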
Polymorphism
It’s possible to overload functions based on their nullness annotation. For instance, if you know that a function’s argument is @NonNull, you might want to skip a null test:
String stringize(Bar bar) {
    if (bar != null)
        return bar.toString();
    else
        return "";
}

// Specialized version
String stringize(@NonNull Bar bar) {
    return bar.toString();
}
Interestingly, since the compiler does flow analysis, it knows that inside the if statement bar is not null, even though it’s (implicitly) declared as @Nullable, so it won’t flag it as an error. (This is called flow-dependent typing–I implemented something like that in my Dr. Dobb’s article using local variable shadowing.)
Let’s try something more challenging–declaring the identity function, a function that simply returns its argument. The naive declaration
T identity(T x);
has a problem when nullability annotations are in force. Since the default annotation is @Nullable, the above is equivalent to
@Nullable T identity(@Nullable T x);
When called with a @NonNull argument, this function will return a @Nullable result. That’s not good! You can ameliorate this situation by declaring a separate overload:
@NonNull T identity(@NonNull T x);
The problem is that, when you keep adding new kinds of type annotations to your system, the number of overloads of identity starts growing exponentially.
Enter polymorphic annotation @PolyNull. The identity function can pass the nullness annotation through, if you declare it as follows:
@PolyNull T identity(@PolyNull T x);
Amazingly enough, @PolyNull will also work with multiple arguments. Consider the max function:
@PolyNull T max(@PolyNull T x, @PolyNull T y);
Through the magic of type unification, this declaration does the right thing. The nullability annotations of both arguments are unified to the lowest common denominator–the most derived annotation in the subtyping hierarchy to which both types can be implicitly converted. So, if both arguments are @NonNull, their unification is @NonNull, and the result is @NonNull. But if even one of the arguments is @Nullable, then the result is @Nullable. Notice that this declaration covers four possible overloads:
@NonNull T max(@NonNull T x, @NonNull T y);
@Nullable T max(@NonNull T x, @Nullable T y);
@Nullable T max(@Nullable T x, @NonNull T y);
@Nullable T max(@Nullable T x, @Nullable T y);
For non-static methods, the annotation of this may also take part in the unification:
@PolyNull T foo(@PolyNull T x) @PolyNull;
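To see how the unification plays out at call sites, consider this sketch (using the max declaration above, with T instantiated to String):

@NonNull String a = "foo";
@NonNull String b = "bar";
@Nullable String c = null;

@NonNull String r1 = max(a, b);  // both arguments @NonNull: result is @NonNull
@Nullable String r2 = max(a, c); // one argument @Nullable: result is @Nullable
@NonNull String r3 = max(a, c);  // error: the result may be null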
Shortcomings
One of the limitations of user-defined type annotations is that they can’t take part in program optimization. That’s because the type checker is not allowed to modify the AST. (See Walter Bright’s blog in Dr. Dobb’s about optimization opportunities based on immutability in the D programming language.)
Acknowledgments
Michael Ernst, who was Matthew Papi’s doctoral adviser, clarified some of the issues for me. Since I’m attending his seminar, I’ll be blogging more about his work.
Comments

January 18, 2009 at 6:22 pm
“Elaborate type systems catch a lot of programming errors at compile time. On the other hand they impose the burden of type annotation on the programmer, and they sometimes disallow valid programs or force type casting.”
You’re not talking about elaborate type systems there. You’re talking about pathetic and impractical type systems like Java’s.
An elaborate type system most certainly does not burden you with type annotations and disallows far fewer valid programs than Java’s type system does. Further, an elaborate type system disallows type casting altogether.
I hope this helps.
January 19, 2009 at 4:53 am
You wrote:
“On the other hand they impose the burden of type annotation on the programmer, and they sometimes disallow valid programs or force type casting.”
My opinion:
1. A burden of type annotation? I don’t understand. If you are e.g. declaring a function which is to be called from other modules, you have to state the types of its parameters – otherwise you might end up calling the function with arguments it cannot process. This is a burden? On the other hand if you mean [unnecessary type “annotations” in the body of a method in Java], then you are right – but the fact that Java requires each declaration of a variable to have an explicit type, does not imply that it is the programmer who has to fill it in. If the type of a variable can be inferred from its initial value, why does not the IDE automatically fill it in? (Hint: Because the IDE has been implemented by the same organization which designed the Java language?)
2. How can a consistent type-system disallow valid programs? Such an idea is completely insane. Unless you are talking about crappy type-systems only.
3. Sometimes forces type casting? Just because many popular programming languages do/allow type casting in a similar way to C/C++ or Java, it does not imply that it is the best way to do it.
Contrary to the previous comment, I do not believe our opinions will help you in any way because you are simply unable to understand them.
January 19, 2009 at 6:33 am
1. “If you are e.g. declaring a function which is to be called from other modules, you have to state the types of its parameters – otherwise you might end up calling the function with arguments it cannot process.”
This is absolute nonsense. Haskell — QED
2. How can a consistent type-system disallow valid programs?
It’s a known law of theoretical computer science. Have fun figuring it out 🙂
January 19, 2009 at 6:44 am
This is a nice, informative article; I don’t understand the vitriol behind the comments. I was unaware the term “elaborate type system” was clearly and categorically defined. Elaborate seems like a somewhat meaningless adjective in this usage, kind of like saying “cool type system” or “better type system”. These are undefined value judgements and IMO worthy of an informed comment rather than condemnation.
A few of my comments:
“If you are e.g. declaring a function which is to be called from other modules, you have to state the types of its parameters”. This hasn’t been the case in my experience. Type inference systems like HM used by F# and OCaml don’t force you to declare types on functions. The compiler can infer them for you. I don’t think your comment is accurate for this reason.
“How can a consistent type-system disallow valid programs?” Although I have not programmed in Haskell, I do know from experience that type system constraints in F#/OCaml sometimes require you to annotate your program with metadata in order to compile. For instance, I can write a correct string manipulation function but in some cases the type inference engine sees ambiguity in the types and I need to add annotations. I see this as a programming language type system disallowing a valid program. Perhaps you’re thinking of something different. I am unaware of a true, correct, one-type-system-to-rule-them-all. It seems like type systems are always a tradeoff between competing concerns… and designing a programming language is much the same. I’m not sure speaking in absolutes is helping me understand your comment. Some more examples of your statements would be helpful.
January 19, 2009 at 7:42 am
“This is absolute nonsense. Haskell — QED”.
May I ask you to do a favor? Translate this piece of pseudo-code into Haskell for me:
module 1 (implementation ‘impl1’):
int fn(int x) {
return (x*x)
}
int restricted-fn(restricted_int x) {
return fn(x)
}
module 1 (declaration):
type restricted_int(x) = ((x is int) and (abs(x)<10))
function fn(int x) returns int
function restricted-fn(restricted_int x) returns int
module 2 (implementation):
import declarations from module 1
do {
restricted-fn(1)
restricted-fn(20)
}
application 1:
load module 1, use implementation impl1
execute module 2
January 19, 2009 at 7:59 am
2. How can a consistent type-system disallow valid programs?
It’s a known law of theoretical computer science. Have fun figuring it out 🙂
—
G.E.B. – Yes, it’s fun.
—
I meant something like this: [unable to compile a program because it fails type checking (for whatever reason)] is equivalent to saying that [you have been unable to prove at least that part of the validity of the program which is enforced by the type system].
—
If a program is indeed valid (= checked by a man) but it fails to compile (by a machine) in programming language X, then the type-system has to be extended to accommodate for this program. In other words, an “elaborate” type-system has to be extensible.
January 19, 2009 at 8:26 am
Hamlet D’Arcy wrote: “For instance, I can write a correct string manipulation function but in some cases the type inference engine sees ambiguity in the types and I need to add annotations. I see this as a programming language type system disallowing a valid program.”
In order for me to understand what you are saying I need to resolve the following question:
1. Is that some form of type-casting in the sense “int x = fn(); struct S *y = (struct S*)x;” as in C/C++,
2. or are you just helping the type-checker to finish a proof it cannot figure by itself?
If it is the latter, then I wouldn’t call it “a type system disallowing a valid program”. Some people may interpret it as “the type system is enabling valid programming”, or they may call it “a future-proof type system”, or “a form of an extensible type system”.
January 23, 2009 at 1:22 am
1. Type inference.
2. Can you have a List of @Nullable Strings?
January 23, 2009 at 2:52 am
1. Whole program analysis? Think of a library function returning an object. Is the object @Nullable or @NonNull? You can’t tell unless you look into the implementation.
2. Strings are reference types, so @Nullable is the default. String s = null is valid.
January 23, 2009 at 5:32 am
@anonymous – the annotation is you helping the compiler figure out the type, not performing a conversion.
@Ricky – Yes you can have a list of @Nullable Strings.
@Bartosz – There is a mechanism to apply a “Nullable Configuration” file to a foreign API, marking externally the Nullability of methods. So to create the file you do need to look in the implementation, but once created you can easily integrate 3rd party libraries that don’t use the annotations into yours which does.
It seems like some of the anonymous posts are trolls.
January 23, 2009 at 11:42 am
Ricky’s comment was very terse, so I assumed that he meant pure type inference, without any annotations. If you allow partial annotations, you may run a special tool over your code that will infer the remaining types. Of course, it will do it conservatively.
February 4, 2009 at 2:53 pm
In my humble opinion, the Annotations just add unnecessary complexity to the language.
The elegance and simplicity that was Java originally is slowly fading away.
I am willing to put up with Templating (which arrived in Java 1.5 from C++ and others), as it is very valuable in actually helping with type safety and ultimately simplifying the code. But Annotations are really there for the nasty and ugly thing that is Code Generators (like EJB Compilers, etc.).
The end result is that it’s becoming increasingly complex to even compile Java, since you need to put together Annotation add-ons and EJB compilers, etc.
About @NonNull: declaring something as “NonNull” is only going to force the developer who is about to assign a “non-null” value into either extra checking for null (which is usually unnecessary) or some worse perversion, not resulting in anything good.
If you are coding:
String someStr;
someStr.indexOf("something");
Then most modern editors (Eclipse, etc.) will immediately alert you to your silliness. No Annotations needed…
February 4, 2009 at 7:30 pm
The beauty of annotations is that they are optional. In fact, even in D, the use of const or immutable in the client program is mostly optional. A non-annotated program will compile just fine.
There are some clients who are very interested in using NonNull for better reliability. Wall Street firms were among them (maybe not any more 😉).
February 5, 2009 at 6:38 pm
I guess I just think that:
NonNull != Reliability
But it’s one of those formulas that is really hard to prove 🙂
February 5, 2009 at 6:58 pm
Well, don’t you think that Eclipse flagging null dereference increases the reliability of your programs? NonNull serves the same purpose but can cover more cases because it can be propagated across method call boundaries.
February 5, 2009 at 7:16 pm
Why not just copy the more advanced languages with decent type systems?
http://functionaljava.googlecode.com/svn/artifacts/2.17/javadoc/fj/data/Option.html
February 5, 2009 at 7:29 pm
If there already is a perfect language then instead of copying it, why not just use it? Unfortunately there is no consensus as to which solutions are the best, so we have to discuss the pros and cons and pick the best fit for a given language. I’m trying not to take sides, just present the options.
February 5, 2009 at 7:53 pm
There is not a consensus that there is a perfect language, but there is certainly consensus that Java is inferior and impractical in every possible aspect.
I don’t use it, unless I’m writing for Functional Java – which exists simply to help make Java somewhat useful in the event that irrational authoritarianism forces me to use it. It is also apparently useful for many others who are in this position – thankfully I am not.
I hate knowing Java back-to-front. You see it for what it really is.