Sunday, November 12, 2006

Essential Java Generics

Java Generics may seem quite intimidating but it is not really that hard. It is however important to remember a few simple rules.

Subtyping

The Liskov Substitution Principle does not apply for generic elements!
  • Integer is a subtype of Number.
  • Integer is a subtype of Comparable.
  • List is a subtype of Collection.
  • List<Integer> is a subtype of Collection<Integer>.
  • List<Integer> is not a subtype of List<Number>.
  • List<Integer> is not a subtype of List<Comparable>.
// Example why generic elements are not proper subtypes:
List<Integer> integers = new LinkedList<Integer>;
List<Number> numbers = integers;            // Won't compile!
numbers.add(3.14);                          // since integers cannot contain 3.14 


Wildcards

To loosen the constraint above wildcards may be used. Wildcards are used with the keywords extends and super.
  • <? extends Number> means all types that are subclasses of Number are allowed.
  • <? super Integer> means all types that are superclasses of Integer are allowed.

The Get and Put Principle: use extends only when you intend to get values out of a structure, use super only when you intend to put values into a structure.

This also implies: don’t use any wildcards when you intend to both get and put values into and out of a structure.

// Copy all elements, subclasses of T, from source to dest which contains elements that are superclasses of T.
public static <T> void copy(List<? super T> dest, List<? extends T> source) {
   for (int i = 0; i < source.size(); i++) {
       dest.set(i, source.get(i));
   }
}                     

// Extends wildcard violation
List<Integer> integers = new LinkedList<Integer>();
List<? extends Number> numbers = integers;   
numbers.get(i);                                 // Works fine!
numbers.add(3);                                 // Won't compile!

// Super wildcard violation
List<Number> numbers = new LinkedList<Number>();
List<? super Integer> integers = numbers;
numbers.add(3);                                 // Works fine!
int i = numbers.get(0);                         // Won't' compile!
Object o = numbers.get(0);                      // Works fine since object is the upper bound!

Additional to the above principle there are also a few restrictions on wildcards: Don’t use wilcards when creating instances, specifying generic method calls and extending superclasses:

List<?> integers = new LinkedList<?>();         // Won't compile!
List<?> integers = Lists.<?>factory();          // Won't compile!
class AnyList implements List<?> {}             // Won't compile!

Bounds

Bounds are used to make sure that generic parameters are of a specified subtype.

// The generic parameter of Query must extend (or implement) Entity and Entity must have a getId method!
public class Dao<T extends Entity> {  
   T createOrUpdate(T entity) {
       if (entity.getId() != null) {
           return update(entity);
       } else {
           return create(entity);
       }
   }
}    
// Using Dao
Dao<Person> personDao = new Dao<Person>(); // Works fine since Person extends Entity!
Dao<String> stringDao = new Dao<String>(); // Wont compile since String does not extend Entity!

Bounds may also be used in more advanced ways. The example below is a simplified version from java.util.Collections and show a recursive bound. The generic parameter T is also used inside the bound Comparable<T> to make sure that the objects contained in the collection are comparable amongst themselves.

// The method max takes a parameter which must contain elements of a subclass of Comparable.
// In addition the Comparable class must be comparable with the declared type
public static <T extends Comparable<T>> T max(Collection<T> collection) {
   T currentMax = collection.iterator().next();
   for (T element: collection) {
       if (currentMax.compareTo(element) < 0) currentMax = element;
   }                                                          
   return currentMax;
}

By far the most difficult generic declaration comes from java.lang.Enum and looks like this Class Enum<E extends Enum<E>> implements Comparable<E>. Like the above declaration this is a recursive bound but it is even more constrained than the above. The key to understanding this is to know how enums are implemented in Java.

// Declaring an enum
enum Tapir {MALAYAN, BRAZILIAN, BAIRD, MOUNTAIN}

// creates a class similar to this!
public final class Tapir extends Enum<Tapir> implements Comparable<Tapir> {                      
   private Tapir(String name, int ordinal) {super(name, ordinal)}
   public static final Tapir MALAYAN = new Tapir("MALAYAN", 0);
   public static final Tapir BRAZILIAN = new Tapir("BRAZILIAN", 1);
   public static final Tapir BAIRD = new Tapir("BAIRD", 2);
   public static final Tapir MOUNTAIN = new Tapir("MOUNTAIN", 3);
   private static final Tapir[] VALUES = {MALAYAN, BRAZILIAN, BAIRD, MOUNTAIN};
   public static Tapir[] values() {return VALUES.clone()}
   public static Tapir valueOf(String name) {
       for (Tapir t: VALUES) if t.getName().equals(name) return t;
       throw new IllegalArumentException();
   }
}

As can be seen in the code above E extends Enum<E> maps to Tapir extends Enum<Tapir> and Comparable<E> maps to Comparable<Tapir>. This makes sure that enums of one type can only be compared with enums of the same type. The innermost generic parameter makes the subclass’ type available to the superclass allowing it to define methods whose parameters and return values are that of the subclass’.

Generic parameters may also have multiple bounds. The signature of the simplified example above actually looks like this:

// Actual signature of max from Java Collections
public static <T extends Object & Comparable<? super T>> T max(Collection<? extends T> collection);

When multiple bounds appear the first bound is used for erasure and the reason for the Object in the signature above is that it makes the signature backwards compatible. The reason for the super and the extends are the same as above to make the method more generic.

Erasure

Java Generics is implemented by erasure. This means that the generic information is removed when the class is compiled:
  • The erasure of List<Integer>, List<String>, List<List<Integer>> is List.
  • The erasure of Comparable<? super T> is Comparable.
  • The erasure of Object & Comparable is the leftmost, Object.

Another thing to be aware of is that the implementation of generics with erasure forces the compiler to insert additional methods into the class files.

// Additional methods are compiled into generic classes
public interface Foo<T> {
   void foo(T param);
}

public class Bar implements Foo<Bar> {
                          
   // This method will appear twice once with Object as parameter and once with Bar.
    public void foo(Bar param) {}
   
   public static void main(String[] args) {
       for (Method m : Bar.class.getMethods())
           if (m.getName().startsWith("foo"))
               System.out.println(m.toGenericString());
   }
}
                              
$ java Bar
public void Bar.foo(Bar)
public volatile void Bar.foo(java.lang.Object)
                                                     

This can trip you up if you try to overload a method to accept Object as a parameter too. It has never been a good idea to overload with Object as well as a subclass of Object but now it will not even compile:

Error:line (6)name clash: foo(java.lang.Object) in Bar and foo(T) in Foo<Bar> have the same erasure, yet neither overrides the other

Compatibility

All in all the implementation of generics in Java is a wonderful example of craftsmanship. The solution is binary compatible both backward and forward allowing new code to use old libraries as well as allowing old code to use new libraries. I do, however, wish that they would have skipped the compatibility and made generic classes aware of what they are at runtime.

If you want to know more about generics I highly recommend the book Java Generics by Philip Wadler and Maurice Naftalin.

2 comments:

Unknown said...

Great stuff Anders! Very nice summary. Some of those things are still not 100% natural to me when encountering certain situations, so I tend to look up some of the specfics from time to time. The enum stuff I had forgotten about.
But one thing stilll confuses me. Generics are in the JVM 3 specification and there are methods to get some of that information, so some information is stored about them for sure.
See GenericSignatureFormatError, getGenericSuperclass, getGenericInterfaces from the Class class and ParameterizedType.

Anders Janmyr said...

The generic class knows that it is a generic class and has access to the information you mention.

The problem comes when you try to use an instance of a generic class.

List<String> strings = new ArrayList<String>();
assert strings instanceof ArrayList; // true
assert strings instanceof List; // true
assert strings instanceof List<?>; //true
assert strings instanceof List<String>; //compile-time error

The information that strings is a List of String is lost with the erasure. I miss the runtime information the most when dealing with OR frameworks that use class information to map classes to tables.