/dev/null

Cameron Purdy
Friday Jul 23, 2010

It's Maps all the way down

From "A Brief History of Time":

A well-known scientist (some say it was Bertrand Russell) once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the center of a vast collection of stars called our galaxy. At the end of the lecture, a little old lady at the back of the room got up and said: "What you have told us is rubbish. The world is really a flat plate supported on the back of a giant tortoise." The scientist gave a superior smile before replying, "What is the tortoise standing on?" "You're very clever, young man, very clever", said the old lady. "But it's turtles all the way down!"

Coherence uses a analogous architecture, but replace "turtle" in the above story with "Java Map interface". For example, a common use case is to cache a limited amount of data in an application server tier, backed by an In-Memory Data Grid (IMDG). The topology for this use case is referred to as "size-limited near caching with a partitioned cache"; in other words, the application server uses a near cache to keep recently- and commonly-used data locally in memory, while the entire data set (up to a few terabytes) is cached by the In-Memory Data Grid. Coherence, as an IMDG, automatically partitions all of the data across the available cache servers (aka storage-enabled nodes).

In this scenario, the cache that is being used on the application server is a Java Map, implemented by Near Cache. The Near Cache is backed by two Java Maps: The first is an in-memory size-limited Java Map (where the "near" data gets stored), implemented by Local Cache, and the second is the partitioned Java Map, implemented by Partitioned Cache Service. The partitioned Map is actually a thin ClassLoader-aware veneer that handles serialization and deserialization, and delegates to an underlying binary-only Java Map, also implemented by Partitioned Cache Service. The binary implementation uses TCMP messaging to communicate across the cluster to the server(s) that own the partition(s) involved with a given client-side operation. When an operation request is received by one of those servers, it is translated into an invocation against a member-specific Java Map, implemented by the Storage module of the Partitioned Cache Service. That invocation is then typically delegated to a multiplexing Java Map, implemented by a Partition-Aware Backing Map (PABM). The invocation is then delivered to an actual data-managing Java Map, which is the backing map that was configured for that cache. The backing map may in turn be composed of a sequence and/or tree of delegating maps, such as when the cache is backed by a database; in this case, the backing map is a database-aware Java Map, implemented by Read/Write Backing Map (RWBM). The Read/Write Backing Map will store cache data in memory by delegating to a configured backing Java Map, implemented for example by a Local Cache or a thread-safe Hash Map. (There are even more turtles than this, since there are fault-tolerant "safe" layers, the option for client/server proxy layers, client- and subject-specific security layers, and "backup" redundancy layers.)

All in all, Coherence contains roughly 80 Map extensions and implementations, and a few dozen more in the unit and functional tests packages. Many of the early Map implementations in Coherence were based on the AbstractMap implementation that was part of the Java Collections framework, but AbstractMap turned out to be a rather poor base class for Map implementations for a number of reasons, including:

  • At first glance, AbstractMap implements everything except for one method. Unfortunately, that one method is entrySet(), which means that instead of implementing a Map of keys to values (which makes sense), you have to implement a Set of key/value ("Entry") pairs (which doesn't typically make sense).

  • In most cases, you manage the internal data structures by key, not by entry, so you end up overriding the keySet() implementation with a custom key set, and writing an entrySet() implementation that delegates back to the Map methods themselves or to the key set.

  • The AbstractMap method implementations all delegate to the Entry Set, so simple operations such as isEmpty(), size(), remove and even get() iterate through the entries -- typically, in most cases iterating through all of the entries! To create an efficient Map implementation, you thus have to override most of the concrete implementations from AbstractMap. Furthermore, if you implement the entrySet() as describe above, the Map methods must be overridden to avoid infinite recursion.

As a result, in many cases AbstractMap ends up providing only a handy implementation of hashCode(), toString() and equals(). To help address this, Coherence provides two abstract Map implementations:

  • When a Map knows all of its keys, i.e. when a Map has a data structure that contains all of its keys, then it should extend AbstractKeySetBasedMap.

  • When a Map doesn't hold all of its keys in a data structure, then it should extend AbstractKeyBasedMap. A good example of this is when the keys and values are stored on disk or in another system, or in another Map that is delegated to, and thus the delegating Map implementation does not have to store all of the keys redundantly.

In both cases, the keySet(), entrySet() and values() methods are implemented by the abstract class, and typically do not have to be overridden. To override any of them, there is a standard inner class factory pattern that allows sub-classes to override the KeySet, EntrySet and ValuesCollection inner classes, for example:

protected Set> instantiateEntrySet()

Like AbstractMap, a number of methods are implemented, but should generally be overridden for efficiency purposes. For example, implementations extending AbstractKeyBasedMap should implement and/or override the following methods:

/**
* Create an iterator over the keys in this Map. The Iterator must
* support remove() if the Map supports removal.
*
* @return a new instance of an Iterator over the keys in this Map
*/
protected abstract Iterator<K> iterateKeys();

/**
* Returns the value to which this map maps the specified key.
*
* @param oKey  the key object
*
* @return the value to which this map maps the specified key,
*         or null if the map contains no mapping for this key
*/
public abstract V get(Object oKey);

/**
* Associates the specified value with the specified key in this map.
*
* @param oKey    key with which the specified value is to be associated
* @param oValue  value to be associated with the specified key
*
* @return previous value associated with specified key, or <tt>null</tt>
*         if there was no mapping for key
*/
public V put(K oKey, V oValue)
    {
    throw new UnsupportedOperationException();
    }

/**
* Removes the mapping for this key from this map if present.
*
* @param oKey key whose mapping is to be removed from the map
*
* @return previous value associated with specified key, or <tt>null</tt>
*         if there was no mapping for key.  A <tt>null</tt> return can
*         also indicate that the map previously associated <tt>null</tt>
*         with the specified key, if the implementation supports
*         <tt>null</tt> values.
*/
public V remove(Object oKey)
    {
    throw new UnsupportedOperationException();
    }

/**
* Returns the number of key-value mappings in this map.
*
* @return the number of key-value mappings in this map
*/
public int size()
    {
    // this begs for sub-class optimization
    int c = 0;
    for (Iterator iter = iterateKeys(); iter.hasNext(); )
        {
        iter.next();
        ++c;
        }
    return c;
    }

/**
* Returns <tt>true</tt> if this map contains a mapping for the specified
* key.
*
* @return <tt>true</tt> if this map contains a mapping for the specified
*         key, <tt>false</tt> otherwise.
*/
public boolean containsKey(Object oKey)
    {
    // this begs for sub-class optimization
    for (Iterator iter = iterateKeys(); iter.hasNext(); )
        {
        if (equals(oKey, iter.next()))
            {
            return true;
            }
        }
    return false;
    }

Additionally, depending on the implementation, it may make sense to override a few additional methods:

/**
* Clear all key/value mappings.
*/
public void clear()
    {
    // this begs for sub-class optimization
    for (Iterator iter = iterateKeys(); iter.hasNext(); )
        {
        iter.next();
        iter.remove();
        }
    }

/**
* Returns <tt>true</tt> if this map contains no key-value mappings.
*
* @return <tt>true</tt> if this map contains no key-value mappings
*/
public boolean isEmpty()
    {
    return size() == 0;
    }

/**
* Removes the mapping for this key from this map if present. This method
* exists to allow sub-classes to optmiize remove functionalitly for
* situations in which the original value is not required.
*
* @param oKey key whose mapping is to be removed from the map
*
* @return true iff the Map changed as the result of this operation
*/
protected boolean removeBlind(Object oKey)
    {
    if (containsKey(oKey))
        {
        remove(oKey);
        return true;
        }
    else
        {
        return false;
        }
    }

In fact, many of those methods are overridden by AbstractKeySetBasedMap, which extends AbstractKeyBasedMap. Implementations extending AbstractKeySetBasedMap still have to implement or override get(), put() and remove, and may chose to override clear() and removeBlind(), but do not have to override iterateKeys(), containsKey(), isEmpty() or size(). There are two new methods that do need to be implemented or overridden:

/**
* Obtain a set of keys that are represented by this Map.
* <p/>
* The AbstractKeySetBasedMap only utilizes the internal key set as a
* read-only resource.
*
* @return an internal Set of keys that are contained by this Map
*/
protected abstract Set<K> getInternalKeySet();

/**
* Determine if this Iterator should remove an iterated item by calling
* remove on the internal key Set Iterator, or by calling removeBlind on
* the map itself.
*
* @return true to remove using the internal key Set Iterator or false to
*         use the {@link AbstractKeyBasedMap#removeBlind(Object)} method
*/
protected boolean isInternalKeySetIteratorMutable()
    {
    return false;
    }

The isInternalKeySetIteratorMutable() method allows for an optimization to be made in instantiateKeyIterator() with respect to providing an iterator over the keys of the Map:

/**
* Factory pattern: Create a mutable Iterator over the keys in the Map
*
* @return a new instance of Iterator that iterates over the keys in the
*         Map and supports element removal
*/
protected Iterator instantiateKeyIterator()
    {
    Iterator iter = getInternalKeySet().iterator();
    if (!isInternalKeySetIteratorMutable())
        {
        iter = new KeyIterator(iter);
        }
    return iter;
    }

Remember, this method was required by the AbstractKeyBasedMap super class; since the AbstractKeySetBasedMap knows all of its keys, it can implement this method by delegating to the "internal" key set.

These two abstract implementations serve as the basis for over 20 Map implementations in Coherence, and when they were introduced, their use (in the place of AbstractMap) resulted in the elimination of thousands of lines of code.

Next up: The purpose of Binary, and the various "serialization" Maps.

Comments:

Post a Comment:
Comments are closed for this entry.

Archives
Links
Referrers

The views expressed on this blog are my own and do not necessarily reflect the views of my employer.
Content copyright 2002, 2003, 2004, 2005, 2006, 2007 by Cameron Purdy. All rights reserved.