
Fast Fail Iterators

What is a mysterious field called modCount doing in the List Iterator classes? How can we
efficiently handle iterating over collections when another thread may be updating the
collection at the same time?

AbstractList

The AbstractList class in the java.util package is normally only examined by
developers intending to implement a List class. AbstractList is an abstract
class: it provides a partial implementation of some type of behavior and is
designed to be extended by subclasses which supply the complete
implementation. In particular, AbstractList provides a skeletal
implementation of the java.util.List interface, and is designed for
subclasses which will be backed by a random access data structure such as
an array (e.g. Object[]). ArrayList is one such subclass of AbstractList.

Iterators

One service provided by AbstractList is the Iterator, or rather the Iterators,
which are required by the java.util.List interface. The java.util.List interface
specifies three methods which return Iterators:

Iterator iterator();
ListIterator listIterator();
ListIterator listIterator(int index);
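
As a quick sketch of how the first and third of these are typically used on a
List implementation such as ArrayList (the demo class below is purely
illustrative, not part of AbstractList):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

public class IteratorUsageDemo
{
    public static void main(String[] args)
    {
        List list = new ArrayList();
        list.add("a");
        list.add("b");
        list.add("c");

        // iterator(): simple forward traversal
        for (Iterator it = list.iterator(); it.hasNext(); )
        {
            System.out.println(it.next());
        }

        // listIterator(int): start at a given index and walk backwards,
        // something a plain Iterator cannot do
        for (ListIterator it = list.listIterator(list.size()); it.hasPrevious(); )
        {
            System.out.println(it.previous());
        }
    }
}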

AbstractList provides implementations for all three Iterator methods, and
also provides class implementations for the Iterator objects. One aspect of
these Iterator classes is of particular interest to us: a mysterious field
called modCount. The following behavior is described in the AbstractList
comments:

[Talking about modCount]


* If the value of this field changes unexpectedly, the iterator (or list
* iterator) will throw a ConcurrentModificationException in
* response to the next, remove, previous,
* set or add operations. This provides
* fail-fast behavior, rather than non-deterministic behavior in
* the face of concurrent modification during iteration.
How does this work, and why is it useful?

Concurrency Problems

The issue under consideration is iterating through a collection while that
collection may be being changed by another thread. This is a concurrency
issue. In particular, what happens if thread 1 is iterating through a
collection, and thread 2 changes the size of the collection at the same time
by adding or deleting an element? This is a classic concurrency problem,
which can result in one of four scenarios.

Corruption

The first scenario is to do nothing about this concurrency problem. To
frame this in terms of a database analogy, it would be the equivalent of
performing updates to your database outside the bounds of a transaction.
If you are lucky, the iteration in thread 1 will occur before or after the
collection update in thread 2, and nothing bad happens. However, you
obviously have a race condition, and it is likely that at some point the
iteration will occur concurrently with the update. Then the outcome is
undefined, usually leading to either data corruption or some undefined
error. For example, a concurrent delete may result in the Iterator trying to
access an element beyond the end of the now shorter collection, causing an
IndexOutOfBoundsException. The other case to consider is a concurrent
addition, which may result in one element of the collection being skipped
during the iteration. It is equally likely that other problems could occur,
such as a NullPointerException while a resize of the collection forces a new
underlying array to be created and copied into. Normally, this "do nothing
about it" scenario is not an acceptable solution. Returning to our analogy,
current database technology imposes the constraint that one cannot update
the database outside the bounds of a transaction. Instead, it requires us to
lock the resources while the update is being performed.
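
To make the race concrete, here is a small hypothetical demonstration (not
from AbstractList). Since ArrayList's own Iterator already carries the
fail-fast protection described later in this article, the sketch uses a plain
index-based scan to expose the underlying problem: one thread scans the list
with no locking while another shrinks it, and depending on timing the scan
may complete, silently skip elements, or blow up with an
IndexOutOfBoundsException:

import java.util.ArrayList;
import java.util.List;

public class UnsafeScanDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        final List list = new ArrayList();
        for (int i = 0; i < 100000; i++)
        {
            list.add(new Integer(i));
        }

        // Thread 2: shrink the list while the main thread is scanning it
        Thread updater = new Thread(new Runnable()
        {
            public void run()
            {
                while (!list.isEmpty())
                {
                    list.remove(list.size() - 1);
                }
            }
        });
        updater.start();

        // Thread 1 (this thread): index-based scan with no locking.
        // The size can change between the size() check and the get(),
        // so this may read stale values or fail part-way through.
        try
        {
            for (int i = 0; i < list.size(); i++)
            {
                Object o = list.get(i);
            }
        }
        catch (IndexOutOfBoundsException e)
        {
            System.out.println("Scan failed mid-way: " + e);
        }
        updater.join();
    }
}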

Locking The Collection

Just as databases lock tables and rows while performing updates, we could
lock the collection while we iterate over it to solve the concurrent
access/update problem. In this scenario, we prevent any thread from
changing the collection while iteration takes place, thus guaranteeing
predictable behavior. So, we might use a Vector object for our collection,
since that class has its accessors and updaters synchronized on the Vector
object itself. Then your iteration can lock the Vector simply:

public void search(Vector list)
{
    // Synchronizing on the Vector itself ensures
    // that no other threads can use the Vector
    // accessors or updaters while we are in this
    // code block
    synchronized (list)
    {
        ListIterator iterator = list.listIterator();
        while (iterator.hasNext())
        {
            Object o = iterator.next();
            // do something with o
            // ...
        }
    }
}
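
The same pattern applies if you prefer to wrap an unsynchronized list with
Collections.synchronizedList rather than use a Vector; the wrapper
synchronizes the individual accessors and updaters, but its documentation
still requires you to lock the wrapper manually while iterating. A minimal
sketch (the demo class name is invented):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class SynchronizedListDemo
{
    public static void main(String[] args)
    {
        // The wrapper synchronizes individual accessors and updaters,
        // but iteration still needs an explicit lock on the wrapper
        List list = Collections.synchronizedList(new ArrayList());
        list.add("a");
        list.add("b");

        synchronized (list)
        {
            for (Iterator it = list.iterator(); it.hasNext(); )
            {
                Object o = it.next();
                // do something with o
            }
        }
    }
}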

By taking a pessimistic approach, we have guaranteed that we will avoid
any corruption of the collection and that the iteration is consistent.
However, there is a cost to this scenario. First, we have to use a
synchronized collection, which has a slightly higher overhead than an
unsynchronized collection. Second, and much more importantly for
performance, the threads are serialized for access and update to the
collection, including for the duration of this and any other synchronized
iterations. Each thread trying to update, access, or iterate over the
collection must wait for any other thread doing so to finish before it can
start. In the worst case, this converts a multithreaded program into an
effectively single-threaded program, regardless of the number of CPUs
available.

Copy On Write

A third scenario is to use a List class like the CopyOnWriteArrayList class,
due to be available in the java.util.concurrent package of the 1.5
distributions. In this case, the Iterator carries on iterating over the
previous version of the list, while the updater gets to use a new version of
the list. In database parlance, these are optimistic transactions; each
session gets to carry on its transaction as if the other didn't run, though
the Iterator is now using out-of-date information (in a database
transaction, one session would fail to commit its transaction). This solution
can carry a heavy performance overhead from the copying, especially if
there are many concurrency conflicts.
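
A small sketch of that snapshot behavior, assuming the 1.5
java.util.concurrent package is available (the demo class itself is
illustrative):

import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyOnWriteDemo
{
    public static void main(String[] args)
    {
        List list = new CopyOnWriteArrayList();
        list.add("a");
        list.add("b");

        // The iterator is bound to the array that existed when it was created
        Iterator iterator = list.iterator();

        // This update copies the underlying array; the existing iterator
        // keeps reading the old copy and never throws
        // ConcurrentModificationException
        list.add("c");

        while (iterator.hasNext())
        {
            System.out.println(iterator.next()); // prints a and b only
        }
        System.out.println(list.size());         // prints 3
    }
}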

Fast Fail

The fourth scenario for handling the concurrency issue is to (optimistically)
assume that multiple threads with access to the collection will mostly not
encounter concurrent update problems. In this case, locking the collection
is an unnecessary overhead. Instead, you can simply catch those few
occasions when concurrent access to the collection is an issue, and handle
those cases specially. This is the tactic that AbstractList takes with its
Iterators.

AbstractList includes an instance variable called modCount (short for
"modification count"). Every time any change is made to the AbstractList
object, the modCount is incremented. (Subclasses of AbstractList are
expected to adhere to this contract, though it cannot be enforced.) The
AbstractList Iterators retain the value of modCount that existed when they
were created, and check on every access and update whether that
modCount variable has changed from its creation-time value. If it has, the
Iterator throws a java.util.ConcurrentModificationException. The iteration
code looks like this:

public void search(List list)
{
    try
    {
        ListIterator iterator = list.listIterator();
        while (iterator.hasNext())
        {
            Object o = iterator.next();
            // do something with o
            // ...
        }
    }
    catch (ConcurrentModificationException concEx)
    {
        // handle this case, maybe just iterate the collection again
        // e.g. by calling search(list) once more
    }
}

This scenario has a very small overhead: the additional cost of updating
the modCount variable on any change to the collection, and of comparing it
against the expected value on each step of the iteration. This should be
negligible in most cases.
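
The check itself is tiny. A minimal sketch of the idea (the class and field
layout below are invented for illustration; the real AbstractList source is
more involved) might look like this:

import java.util.ConcurrentModificationException;
import java.util.Iterator;

public class FailFastListSketch
{
    private Object[] elements = new Object[0];
    private int size = 0;
    protected int modCount = 0;

    public void add(Object o)
    {
        // Every structural change bumps the modification count
        modCount++;
        Object[] larger = new Object[size + 1];
        System.arraycopy(elements, 0, larger, 0, size);
        larger[size++] = o;
        elements = larger;
    }

    public Iterator iterator()
    {
        return new Iterator()
        {
            // Remember the modCount in force when this iterator was created
            private int expectedModCount = modCount;
            private int cursor = 0;

            public boolean hasNext()
            {
                return cursor < size;
            }

            public Object next()
            {
                // The fail-fast check: any structural modification since
                // this iterator was created is detected and reported
                // immediately rather than risking undefined behavior
                if (modCount != expectedModCount)
                {
                    throw new ConcurrentModificationException();
                }
                return elements[cursor++];
            }

            public void remove()
            {
                throw new UnsupportedOperationException();
            }
        };
    }
}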

Optimistic Vs. Pessimistic

The two alternative scenarios of locking the collection or throwing an
exception on detecting concurrent access/update are special cases of the
more general pessimistic vs. optimistic transaction scenarios. Pessimistic
transactions lock the resources they need to ensure that they can complete
their activity without errors, at the cost of making all other transactions
wait for the resources to be freed. Optimistic transactions avoid locking
resources, instead detecting whether conflicts have occurred which require
rolling back and re-applying the transaction.

Optimistic transactional processing is usually more efficient where reads
predominate over writes; pessimistic transactional processing is usually
more efficient where writes predominate over reads. In the case of
Iterators, if an update is unlikely to occur during the iteration, the "fast
fail" scenario is more efficient. But if updates often occur during iteration
of the collection, then "fast fail" would end up producing many
ConcurrentModificationExceptions and also require the iteration to be
re-run many times. In that situation locking the collection is usually more
efficient.

It is also possible to mix pessimistic and optimistic modes, by locking some
resources but not others, and this is a way of fine-tuning an application. In
the scenarios described in this article, that mixed-mode option is not
available, but many concurrency conflicts are more complicated and can
benefit from a mixed-mode approach.

So there you have it: an interesting technique in which an optimistic
approach solves a concurrency issue. Although you have to be careful when
using optimistic approaches to these types of problems, done right they can
radically improve the scalability characteristics of your application. In this
instance, we have seen how some clever coding allows safe concurrent
access to Iterators without forcing that access to be mutually exclusive.
