Chapter 14. Improving performance

14.1. Understanding Collection performance

We've already spent quite some time talking about collections. In this section we will highlight a couple more issues about how collections behave at runtime.

14.1.1. Taxonomy

NHibernate defines three basic kinds of collections:

  • collections of values

  • one to many associations

  • many to many associations

This classification distinguishes the various table and foreign key relationships but does not tell us quite everything we need to know about the relational model. To fully understand the relational structure and performance characteristics, we must also consider the structure of the primary key that is used by NHibernate to update or delete collection rows. This suggests the following classification:

  • indexed collections

  • sets

  • bags

All indexed collections (maps, lists, arrays) have a primary key consisting of the <key> and <index> columns. In this case collection updates are usually extremely efficient - the primary key may be efficiently indexed and a particular row may be efficiently located when NHibernate tries to update or delete it.

Sets have a primary key consisting of <key> and element columns. This may be less efficient for some types of collection element, particularly composite elements or large text or binary fields; the database may not be able to index a complex primary key as efficiently. On the other hand, for one to many or many to many associations, particularly in the case of synthetic identifiers, it is likely to be just as efficient. (Side-note: if you want SchemaExport to actually create the primary key of a <set> for you, you must declare all columns as not-null="true".)

Bags are the worst case. Since a bag permits duplicate element values and has no index column, no primary key may be defined. NHibernate has no way of distinguishing between duplicate rows. NHibernate resolves this problem by completely removing (in a single DELETE) and recreating the collection whenever it changes. This might be very inefficient.

Note that for a one-to-many association, the "primary key" may not be the physical primary key of the database table - but even in this case, the above classification is still useful. (It still reflects how NHibernate "locates" individual rows of the collection.)

14.1.2. Lists, maps and sets are the most efficient collections to update

From the discussion above, it should be clear that indexed collections and (usually) sets allow the most efficient operation in terms of adding, removing and updating elements.

There is, arguably, one more advantage that indexed collections have over sets for many to many associations or collections of values. Because of the structure of an ISet, NHibernate doesn't ever UPDATE a row when an element is "changed". Changes to an ISet always work via INSERT and DELETE (of individual rows). Once again, this consideration does not apply to one to many associations.

After observing that arrays cannot be lazy, we would conclude that lists, maps and sets are the most performant collection types. (With the caveat that a set might be less efficient for some collections of values.)

Sets are expected to be the most common kind of collection in NHibernate applications.

There is an undocumented feature in this release of NHibernate. The <idbag> mapping implements bag semantics for a collection of values or a many to many association and is more efficient than any other style of collection in this case!

14.1.3. Bags and lists are the most efficient inverse collections

Before you ditch bags forever, note that there is one particular case in which bags (and also lists) perform much better than sets. For a collection with inverse="true" (the standard bidirectional one-to-many relationship idiom, for example), we can add elements to a bag or list without needing to initialize (fetch) the bag elements! This is because IList.Add() must always succeed for a bag or list, whereas adding to a set must first check whether the element is already present. This can make the following common code much faster.

Parent p = (Parent) sess.Load(typeof(Parent), id);
Child c = new Child();
c.Parent = p;
p.Children.Add(c);  // no need to fetch the collection!
sess.Flush();

14.1.4. One shot delete

Occasionally, deleting collection elements one by one can be extremely inefficient. NHibernate isn't completely stupid, so it knows not to do that in the case of a newly-empty collection (if you called list.Clear(), for example). In this case, NHibernate will issue a single DELETE and we are done!

Suppose we add a single element to a collection of size twenty and then remove two elements. NHibernate will issue one INSERT statement and two DELETE statements (unless the collection is a bag). This is certainly desirable.

However, suppose that we remove eighteen elements, leaving two, and then add three new elements. There are two possible ways to proceed:

  • delete eighteen rows one by one and then insert three rows

  • remove the whole collection (in one SQL DELETE) and insert all five current elements (one by one)

NHibernate isn't smart enough to know that the second option is probably quicker in this case. (And it would probably be undesirable for NHibernate to be that smart; such behaviour might confuse database triggers, etc.)

Fortunately, you can force this behaviour (ie. the second strategy) at any time by discarding (ie. dereferencing) the original collection and returning a newly instantiated collection with all the current elements. This can be very useful and powerful from time to time.
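To see why the second strategy is probably quicker in the example above, it helps to count statements. The sketch below is plain arithmetic rather than NHibernate code; the class and method names are made up for illustration, and it assumes every row costs exactly one INSERT or DELETE.

```csharp
using System;

// Back-of-the-envelope count of SQL statements for the two strategies.
// Illustrative only: OneShotDeleteEstimate is a hypothetical name,
// not part of NHibernate.
public static class OneShotDeleteEstimate
{
    // Strategy 1: one DELETE per removed row, then one INSERT per added row.
    public static int RowByRow(int removed, int added)
    {
        return removed + added;
    }

    // Strategy 2: a single DELETE for the whole collection, then one INSERT
    // for every element the collection contains afterwards.
    public static int OneShot(int originalSize, int removed, int added)
    {
        int remaining = originalSize - removed + added;
        return 1 + remaining;
    }
}
```

For a collection of twenty elements with eighteen removed and three added, RowByRow(18, 3) gives 21 statements, while OneShot(20, 18, 3) gives 6.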

We have already shown how you can use lazy initialization for persistent collections in the chapter about collection mappings. A similar effect is achievable for ordinary object references, using proxies. We have also mentioned how NHibernate caches persistent objects at the level of an ISession. More aggressive caching strategies may be configured upon a class-by-class basis.

In the next section, we show you how to use these features, which may be used to achieve much higher performance, where necessary.

14.2. Proxies for Lazy Initialization

NHibernate implements lazy initializing proxies for persistent objects using runtime IL generation (via the excellent Castle.DynamicProxy library).

The mapping file declares a class or interface to use as the proxy interface for that class. The recommended approach is to specify the class itself:

<class name="Eg.Order" proxy="Eg.Order">

The runtime type of the proxies will be a subclass of Order. Note that the proxied class must implement a default constructor with at least protected visibility and that all methods, properties and events of the class should be declared virtual.

There are some gotchas to be aware of when extending this approach to polymorphic classes, eg.

<class name="Eg.Cat" proxy="Eg.Cat">
    ......
    <subclass name="Eg.DomesticCat" proxy="Eg.DomesticCat">
        .....
    </subclass>
</class>

Firstly, a proxy obtained for Cat can never be cast to DomesticCat, even if the underlying instance is an instance of DomesticCat.

Cat cat = (Cat) session.Load(typeof(Cat), id);  // instantiate a proxy (does not hit the db)
if ( cat.IsDomesticCat ) // hit the db to initialize the proxy
{                  
    DomesticCat dc = (DomesticCat) cat;       // Error!
    ....
}

Secondly, it is possible to break proxy ==.

Cat cat = (Cat) session.Load(typeof(Cat), id);            // instantiate a Cat proxy
DomesticCat dc = 
    (DomesticCat) session.Load(typeof(DomesticCat), id);  // requires a new DomesticCat proxy!
Console.Out.WriteLine(cat==dc);                            // false

However, the situation is not quite as bad as it looks. Even though we now have two references to different proxy objects, the underlying instance will still be the same object:

cat.Weight = 11.0;  // hit the db to initialize the proxy
Console.Out.WriteLine( dc.Weight );  // 11.0
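The first two gotchas can be reproduced without NHibernate at all. The following is a hand-written sketch of what the generated proxies amount to, assuming each proxy simply subclasses the type it was requested as and delegates to one shared underlying instance. CatProxy, DomesticCatProxy and ProxyDemo are hypothetical names; the real proxies are generated at runtime and are more involved.

```csharp
using System;

public class Cat
{
    private double weight;
    public virtual double Weight
    {
        get { return weight; }
        set { weight = value; }
    }
}

public class DomesticCat : Cat { }

// Stand-in for a proxy requested as Cat: it subclasses Cat only,
// so it can never be cast to DomesticCat.
public class CatProxy : Cat
{
    private readonly Cat target;
    public CatProxy(Cat target) { this.target = target; }
    public override double Weight
    {
        get { return target.Weight; }
        set { target.Weight = value; }
    }
}

// Stand-in for a proxy requested as DomesticCat.
public class DomesticCatProxy : DomesticCat
{
    private readonly DomesticCat target;
    public DomesticCatProxy(DomesticCat target) { this.target = target; }
    public override double Weight
    {
        get { return target.Weight; }
        set { target.Weight = value; }
    }
}

public static class ProxyDemo
{
    public static void Run()
    {
        DomesticCat real = new DomesticCat();         // the single underlying instance
        Cat cat = new CatProxy(real);                 // like Load(typeof(Cat), id)
        DomesticCat dc = new DomesticCatProxy(real);  // like Load(typeof(DomesticCat), id)

        Console.WriteLine(cat is DomesticCat);               // False: the cast would fail
        Console.WriteLine(object.ReferenceEquals(cat, dc));  // False: two distinct proxies
        cat.Weight = 11.0;
        Console.WriteLine(dc.Weight);                        // 11: state is shared
    }
}
```

Both proxies forward to the same object, which is why a change made through one reference is visible through the other even though the references themselves are not equal.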

Thirdly, you may not use a proxy for a sealed class or a class with any sealed or non-virtual methods.

Finally, if your persistent object acquires any resources upon instantiation (eg. in initializers or default constructor), then those resources will also be acquired by the proxy. The proxy class is an actual subclass of the persistent class.

These problems are all due to fundamental limitations in .NET's single inheritance model. If you wish to avoid these problems, each of your persistent classes must implement an interface that declares its business methods. You should specify these interfaces in the mapping file, e.g.

<class name="Eg.Cat" proxy="Eg.ICat">
    ......
    <subclass name="Eg.DomesticCat" proxy="Eg.IDomesticCat">
        .....
    </subclass>
</class>

where Cat implements the interface ICat and DomesticCat implements the interface IDomesticCat. Then proxies for instances of Cat and DomesticCat may be returned by Load() or Enumerable(). (Note that Find() does not return proxies.)

ICat cat = (ICat) session.Load(typeof(Cat), catid);
IEnumerable en = session.Enumerable("from cat in class Eg.Cat where cat.Name='fritz'");
en.MoveNext();
ICat fritz = (ICat) en.Current;

Relationships are also lazily initialized. This means you must declare any properties to be of type ICat, not Cat.

Certain operations do not require proxy initialization:

  • Equals(), if the persistent class does not override Equals()

  • GetHashCode(), if the persistent class does not override GetHashCode()

  • The identifier getter method (if the class does not use a custom accessor for the identifier property)

NHibernate will detect persistent classes that override Equals() or GetHashCode().

Exceptions that occur while initializing a proxy are wrapped in a LazyInitializationException.

Sometimes we need to ensure that a proxy or collection is initialized before closing the ISession. Of course, we can always force initialization by calling cat.Sex or cat.Kittens.Count, for example. But that is confusing to readers of the code and is not convenient for generic code. The static methods NHibernateUtil.Initialize() and NHibernateUtil.IsInitialized() provide the application with a convenient way of working with lazily initialized collections or proxies. NHibernateUtil.Initialize(cat) will force the initialization of a proxy, cat, as long as its ISession is still open. NHibernateUtil.Initialize( cat.Kittens ) has a similar effect for the collection of kittens.

14.3. Using batch fetching

NHibernate can make efficient use of batch fetching, that is, NHibernate can load several uninitialized proxies if one proxy is accessed. Batch fetching is an optimization for the lazy loading strategy. There are two ways you can tune batch fetching: on the class and the collection level.

Batch fetching for classes/entities is easier to understand. Imagine you have the following situation at runtime: You have 25 Cat instances loaded in an ISession, each Cat has a reference to its Owner, a Person. The Person class is mapped with a proxy, lazy="true". If you now iterate through all cats and get the Owner of each, NHibernate will by default execute 25 SELECT statements, to retrieve the proxied owners. You can tune this behavior by specifying a batch-size in the mapping of Person:

<class name="Person" lazy="true" batch-size="10">...</class>

NHibernate will now execute only three queries, loading 10, 10 and 5 owners respectively. Batch fetching is a blind guess as far as performance optimization goes: its benefit depends on the number of uninitialized proxies in a particular ISession.

You may also enable batch fetching of collections. For example, if each Person has a lazy collection of Cats, and 10 persons are currently loaded in the ISession, iterating through all persons will generate 10 SELECTs, one for every read of Person.Cats. If you enable batch fetching for the Cats collection in the mapping of Person, NHibernate can pre-fetch collections:

<class name="Person">
    <set name="Cats" lazy="true" batch-size="3">
        ...
    </set>
</class>

With a batch-size of 3, NHibernate will load 3, 3, 3, 1 collections in 4 SELECTs. Again, the value of the attribute depends on the expected number of uninitialized collections in a particular ISession.
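The 10, 10, 5 and 3, 3, 3, 1 patterns quoted in this section follow from simple division. The sketch below reproduces that arithmetic under the assumption that every proxy or collection is still uninitialized when the first one is accessed. BatchPlan is a made-up helper, not an NHibernate type, and the batches NHibernate actually issues depend on what the ISession holds at the time.

```csharp
using System;
using System.Collections.Generic;

// Illustrative arithmetic only; BatchPlan is a hypothetical name.
public static class BatchPlan
{
    // Splits `uninitialized` proxies or collections into SELECTs that load
    // at most `batchSize` each, and returns how many each SELECT loads.
    public static List<int> Sizes(int uninitialized, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException("batchSize");
        List<int> sizes = new List<int>();
        for (int left = uninitialized; left > 0; left -= batchSize)
        {
            sizes.Add(Math.Min(left, batchSize));
        }
        return sizes;
    }
}
```

BatchPlan.Sizes(25, 10) yields three batches (10, 10, 5) and BatchPlan.Sizes(10, 3) yields four (3, 3, 3, 1). A batch size of 1 corresponds to the unbatched case of one SELECT per proxy or collection.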

Batch fetching of collections is particularly useful if you have a nested tree of items, ie. the typical bill-of-materials pattern.

14.4. The Second Level Cache

An NHibernate ISession is a transaction-level cache of persistent data. It is possible to configure a cluster or process-level (ISessionFactory-level) cache on a class-by-class and collection-by-collection basis. You may even plug in a clustered cache. Be careful: caches are never aware of changes made to the persistent store by another application (though they may be configured to regularly expire cached data). In NHibernate 1.0, the second-level cache does not work correctly in combination with distributed transactions.

By default, NHibernate uses HashtableCache for process-level caching. You may choose a different implementation by specifying the name of a class that implements NHibernate.Cache.ICacheProvider using the property hibernate.cache.provider_class.

Table 14.1. Cache Providers

Cache | Provider class | Type | Cluster Safe | Query Cache Supported
Hashtable (not intended for production use) | NHibernate.Cache.HashtableCacheProvider | memory | | yes
ASP.NET Cache (System.Web.Cache) | NHibernate.Caches.SysCache.SysCacheProvider, NHibernate.Caches.SysCache | memory | | yes
Prevalence Cache | NHibernate.Caches.Prevalence.PrevalenceCacheProvider, NHibernate.Caches.Prevalence | memory, disk | | yes

14.4.1. Cache mappings

The <cache> element of a class or collection mapping has the following form:

<cache 
    usage="read-write|nonstrict-read-write|read-only"                (1)
/>
(1) usage specifies the caching strategy: read-write, nonstrict-read-write or read-only

Alternatively (preferably?), you may specify <class-cache> and <collection-cache> elements in hibernate.cfg.xml.

The usage attribute specifies a cache concurrency strategy.

14.4.2. Strategy: read only

If your application needs to read but never modify instances of a persistent class, a read-only cache may be used. This is the simplest and best performing strategy. It's even perfectly safe for use in a cluster.

<class name="Eg.Immutable" mutable="false">
    <cache usage="read-only"/>
    ....
</class>

14.4.3. Strategy: read/write

If the application needs to update data, a read-write cache might be appropriate. This cache strategy should never be used if serializable transaction isolation level is required. If you wish to use this strategy in a cluster, you should ensure that the underlying cache implementation supports locking. The built-in cache providers do not.

<class name="Eg.Cat" .... >
    <cache usage="read-write"/>
    ....
    <set name="Kittens" ... >
        <cache usage="read-write"/>
        ....
    </set>
</class>

14.4.4. Strategy: nonstrict read/write

If the application only occasionally needs to update data (ie. if it is extremely unlikely that two transactions would try to update the same item simultaneously) and strict transaction isolation is not required, a nonstrict-read-write cache might be appropriate.

The following table shows which providers are compatible with which concurrency strategies.

Table 14.2. Cache Concurrency Strategy Support

Cache | read-only | nonstrict-read-write | read-write
Hashtable (not intended for production use) | yes | yes | yes
SysCache | yes | yes | yes
PrevalenceCache | yes | yes | yes

Refer to Chapter 20, NHibernate.Caches for more details.

14.5. Managing the ISession Cache

Whenever you pass an object to Save(), Update() or SaveOrUpdate() and whenever you retrieve an object using Load(), Find(), Enumerable(), or Filter(), that object is added to the internal cache of the ISession. When Flush() is subsequently called, the state of that object will be synchronized with the database. If you do not want this synchronization to occur or if you are processing a huge number of objects and need to manage memory efficiently, the Evict() method may be used to remove the object and its collections from the cache.

IEnumerable cats = sess.Enumerable("from Eg.Cat as cat"); //a huge result set
foreach( Cat cat in cats )
{
    DoSomethingWithACat(cat);
    sess.Evict(cat);
}

NHibernate will evict associated entities automatically if the association is mapped with cascade="all" or cascade="all-delete-orphan".

The ISession also provides a Contains() method to determine if an instance belongs to the session cache.

To completely evict all objects from the session cache, call ISession.Clear().

For the second-level cache, there are methods defined on ISessionFactory for evicting the cached state of an instance, entire class, collection instance or entire collection role.

14.6. The Query Cache

Query result sets may also be cached. This is only useful for queries that are run frequently with the same parameters. To use the query cache you must first enable it by setting the property hibernate.cache.use_query_cache=true. This causes the creation of two cache regions - one holding cached query result sets (NHibernate.Cache.IQueryCache), the other holding timestamps of most recent updates to queried tables (NHibernate.Cache.UpdateTimestampsCache). Note that the query cache does not cache the state of any entities in the result set; it caches only identifier values and results of value type. So the query cache is usually used in conjunction with the second-level cache.

Most queries do not benefit from caching, so by default queries are not cached. To enable caching, call IQuery.SetCacheable(true). This call allows the query to look for existing cache results or add its results to the cache when it is executed.

If you require fine-grained control over query cache expiration policies, you may specify a named cache region for a particular query by calling IQuery.SetCacheRegion().

IList blogs = sess.CreateQuery("from Blog blog where blog.Blogger = :blogger")
    .SetEntity("blogger", blogger)
    .SetMaxResults(15)
    .SetCacheable(true)
    .SetCacheRegion("frontpages")
    .List();

If the query should force a refresh of its query cache region, call IQuery.SetForceCacheRefresh(true). This is particularly useful in cases where underlying data may have been updated via a separate process (i.e., not modified through NHibernate) and allows the application to selectively refresh the query cache regions based on its knowledge of those events. This is an alternative to eviction of a query cache region. If you need fine-grained refresh control for many queries, use this function instead of a new region for each query.