Getting ideal performance

Statement batching

Batching is an ability of DataObjects.NET to join sequences of individual SQL statements into batches, so each batch is sent as part of a single roundtrip to database server.

DataObjects.Net batch can be formed of the following “parts”:

Individual CRUD instructions (CReate, Update, Delete statements)
Future queries (prefetch API utilizes them as well)
Up to one regular query or fetch command (just because its result is necessary “right now”, i.e. there it is always either just one or absents in batch queue)

All these cases were shown in scenario described below (future queries example), so no separate batching example is necessary here.

Lazy loading

For loading data from a database into memory it’s handy to design things so that as you load an object of interest you also load the objects that are related to it. This makes loading easier on the developer using the object, who otherwise has to load all the objects he needs explicitly. However, if you take this to its logical conclusion, you reach the point where loading one object can have the effect of loading a huge number of related objects – something that hurts performance when only a few of the objects are actually needed. Lazy loading is, in fact, an agreement on interrupting this greedy loading process on certain relationships or properties. But to ensure the further availability of not yet loaded parts of the graph, availability markers inside each fetched object are maintained so that if not yet loaded data is requested, it would be loaded.

Default lazy loading behavior

DataObjects.Net supports lazy loading for all persistent properties except primary keys:

References to entity – persistent fields of type IEntity (and thus Entity) and its implementers;
EntitySets – persistent fields of type EntitySet<T> and its inheritors;
Structures – persistent fields where field type is inheritor of Structure;
Regular fields – persistent fields of primitive type such as int, string, DateTime, nullable types, arrays of primitive types, etc.

Note

References to entity and EntitySets are lazy load fields by default.

See the following example:

[HierarchyRoot]
public class Person : Entity
{
  [Key, Field]
  public int Id { get; private set; }

  [Field(Length = 200)]
  public string Name { get; set; }

  [Field]
  public DateTime BirthDay { get; set; }

  [Field(LazyLoad = true, Length = 65536)]
  public byte[] Photo { get; set; }

  [Field(LazyLoad = true, Length = 8192)]
  public byte[] Avatar { get; set; }

  [Field(LazyLoad = true)]
  public Address Address { get; set; }

  [Field]
  public Person Manager { get; private set; }

  [Field]
  [Association(PairTo = "Manager")]
  public EntitySet<Person> Employees { get; private set; }
}

public class Address : Structure
{
  [Field(Length = 60)]
  public string Street { get; set; }

  [Field(Length = 15)]
  public string City { get; set; }

  [Field(Length = 15)]
  public string Region { get; set; }

  [Field(Length = 10)]
  public string PostalCode { get; set; }

  [Field(Length = 15)]
  public string Country { get; set; }
}

In this example the following fields are subject of lazy loading:

Manager property is entity reference. It isn’t explicitly marked as lazy, so key of referenced Manager will be loaded by default, but Manager instance it identifies will be resolved only on attempt to read the value of this property.
Employees property is EntitySet. All EntitySets are lazy – nothing additional is loaded for each EntitySet by default. Even EntitySet instance is created on the first attempt to access it. More complete description of EntitySet loading behavior is provided below.
Address property is structure. Structures are simply aggregates containing sets of fields, they aren’t lazy by default, but in this case the whole aggregate is lazy. So none of its fields will be loaded by default.
Photo and Avatar are regular persistent fields. They’re marked as lazy here, so their values will be fetched on the first attempt to read them. By default they won’t A query returning Person objects won’t pull values of these properties by default.

Reference fields

Reference fields are internally represented as a set of fields (columns) forming the key of referenced object – so there is no reference to materialized Entity or its Key. Consequences:

We are able to materialize it at any moment: we must just create Key object and resolve it using Session.Query.SingleOrDefault<T>(Key key) method. Actually this is very close to actual process.
All the materialized objects aren’t tightly bound with each other, so garbage collector is able to reclaim the memory used by any of them, that isn’t directly referenced from your code. Of course, this isn’t always true – e.g. to make this work, default Session cache (SessionCacheType.LruWeak) must be used. This solution allows you to process unlimited sets of objects sequentially even in a single transaciton.

So marking a reference field as [Field(LazyLoad=true)] will lead to lazy loading of reference key (foreign key value); reference itself is anyway resolved lazily.

Thus marking a reference field as [Field(LazyLoad=true)] is not recommended. Reading additional value per each row is normally quite cheap (keys are usually short), but getting additional rountrip to database for it is not.

Collection properties (EntitySets)

EntitySet is intrinsically lazy:

If property of EntitySet type isn’t read accessed yet, EntitySet isn’t even created. So EntitySet creation is lazy.
EntitySet maintains its own state. EntitySet state answers on the following questions:
- Is EntitySet content loaded or not, fully or partially?
- What’s the count of items in EntitySet, is it known at all or not?
- What are known keys of contained entities? Does this set contain some specific key?
State of any created EntitySet is set to “nothing is loaded yet”.
If EntitySet state is “nothing is loaded yet”, any attempt to access its properties or methods, that must lead to partial state loading, leads to a request, that can be described as “try to load the state completely first”. Prcesily, EntitySet asks prefetcher to load up to 32 of items it must contain:
- if result contains less than 32 items, it’s clear that EntitySet state is now “fully loaded”, so any subsequent operations won’t lead to any fetches.
- Otherwise, its state is set to “partially loaded”, and fetched items will be added to its “known items” set. If loaded part of its state already contains the answer to requested operation (e.g. if user invoked .Contains method, and key it was looking for is located in “known items” set now), nothing additional will happen. Otherwise an additional request will be performed (e.g. loading of actual Count property value).
EntitySet enumeration leads to full state loading. So if enumeration is the first operation on a particular EntitySet in the current transaction, there will be no partial state loading attempt.
If a particular read operation on EntitySet has completed, its subsequent invocation in the same transaction won’t lead to database hit.

Note

Such a “greedy” partial state loading behavior is logical: DataObjects.Net is able to perform about 10K queries per second, buts materialization speed reaches 400K entities per second. This means there is almost no difference between fetching 1 entity or 40. That’s why DO4 tries to load more even you didn’t ask it for this: it does this because even if there are all 32 items, the cost of loading all of them will be similar to loading one. High probability of getting less than 32 items is another reason for doing this: we assume most of collections are either empty, or contain just few items, and only some of them are filled by hundreeds of items. So we get all the contents of most collections by a single query instead of torturing RDBMS by multiple ones. Note that partial state loading may expose negative effect on large collections: if some collection contains e.g. 1000 items, an attempt to read its Count property will first lead to partial state loading query, and then to a query fetching Count value. On the other hand, you can deal with this case using our prefetch API. Later we’ll offer more options allowing to control partial state loading behavior more precisely.

Marking a EntitySet field as [Field(LazyLoad=true)] is impossible: such fields are intrinsically lazy.

Regular or Structure properties

[Field(LazyLoad=true)] ensures undrelying columns behing such a property won’t be requested from database:

On fetches (Session.Query.Single<T>(...) method group)
On queries, if they aren’t explicitly selected in final projection. .Select(person => new {Person = person, Photo = person.Photo}) is an example of such selection.

If field marked by [Field(LazyLoad=true)] is not fetched, its value will be loaded on attempt to read it.

But it’s important to know that generally any field in DataObjects.Net can be actually loaded lazily, even if it isn’t marked as lazy loading field. This is related to inheritance:

If you query for Shape objects, DataObjects.Net requests only the fields declared in this type and its ancestors. So fields declared in Circle object (e.g. Radius) will not be loaded by default, even if particular fetched instance is Circle.
This is related to fetches as well. Methods from Session.Query.Single<T>(...) method group either fetch all the fields of exact type, if it’s known (by default DataObjects.Net keeps cached up to 16K types of entities it recently dealed with – it considers type of entity never changes during its lifetime). If it’s unknown, it fetches just fields of specified type T; the query it sends delivers a type discriminator as well, so actual entity type will also be cached.
Any attempt to load some field value will lead to a query fetching all non-lazy values that must be loaded in additional to requested value. So if you fetched Rectangle as Shape, attempt to read its Width property will lead to loading of both Width and Height values, if both were not available.
Prefetch API allows to load the complete state of sets of partially loaded entities much faster.

Let’s look at prefetch API usage example for persitent model we defined earlier:

var persons = session.Query.All<Person>()
  .Prefetch(p => p.Avatar)
  .Prefetch(p => p.Photo)
  .Prefetch(p => p.Employees);
foreach (var person in persons) {
  // Some code here...
}

As you see, here we make sure all the lazy properties including EntitySet are loaded before further processing using the way minimizing database chattiness (prefetch API).

Prefetch API

Prefetch API provides a way of grouping Entity or EntitySet state fetch request together to execute multiple of them during a single query or database server roundtrip. DataObjects.Net combines individual prefetch queries together and joining the resulting ones in batches (so it’s a part of generalized batching feature).

Prefetch to IQueryable<T>

Let’s see prefetch API in action:

[HierarchyRoot]
public class Person : Entity
{
  [Key, Field]
  public int Id { get; private set; }

  [Field(LazyLoad = true, Length = 65536)]
  public byte[] Photo { get; set; }

  [Field]
  public Person Manager { get; set; }

  [Field]
  [Association(PairTo = "Manager")]
  public EntitySet<Person> Employees { get; private set; }
  ...
}

var persons = session.Query.All<Person>()
  .Prefetch(p => p.Photo) // Lazy load field
  .Prefetch(p => p.Employees // EntitySet Employees
    .Prefetch(e => e.Photo)) // and lazy load field for each EntitySet item
  .Prefetch(p => p.Manager); // Referenced entity
foreach (var person in persons) {
  // some code here...
}

This tiny piece of code ensures the following data will be available without any additional roundtrips to a database:

Person.* - all primitive non-lazy fields
Person.Manager - reference field
Person.Photo - lazy loading field
Person.Employees - EntitySet field
Person.Photo - lazy loading field for any item in Person.Employees

As you see, we use the same .Prefetch(...) method everywhere, no matter whether we should handle primitive field, reference field, EntitySet or make a nested prefetch call.

Take a look at the resulting SQL queries that are executed by DataObjects.Net for this prefetch expression tree:

-- Batch 1
SELECT [a].[Id], 101 AS [TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM [dbo].[Person] [a];

-- Batch 2
SELECT [a].[Id], [a].[TypeId], [a].[Photo] FROM (
  SELECT [b].[Id], 101 AS [TypeId], [b].[Name], [b].[BirthDay],
    [b].[Photo], [b].[Manager.Id]
  FROM [dbo].[Person] [b]) [a]
WHERE [a].[Id] IN (@p1_0_0_0, @p1_0_1_0);

SELECT [a].[Id], 101 AS [TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM [dbo].[Person] [a]
WHERE ([a].[Manager.Id] = @p2_0);

SELECT [a].[Id], 101 AS [TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM [dbo].[Person] [a]
WHERE ([a].[Manager.Id] = @p3_0);

Notice, after the execution of the first query DataObjects.Net performs an additional batch to fetch the rest of the data. Actually there could be a set of such batches – this depends on the complexity of prefetch expression tree.

Prefetch to Enumerable<T>

Prefetch API in DataObjects.Net actually isn’t bound to IQueryable<T> provider – it is a completely separate API based on low-level delayed queries implementation. DataObjects.Net itself relies on it to load necessary parts of state of Entities. As a result, prefetch API can be used for any IEnumerable<T> – i.e. the original collection must not be necessarily an IQueryable<T>. The following code leads to absolutely the same queries as the one above:

var personIds = session.Query.All<Person>().Select(p => p.Id);
var prefetchedPersons = session.Query.Many<Person, int>(personIds)
  .Prefetch(p => p.Photo)
  .Prefetch(p => p.Employees
    .Prefetch(e => e.Photo))
  .Prefetch(p => p.Manager);
foreach (var person in prefetchedPersons) {
  // some code here...
}

Syntactic sugar (shortcuts) in Prefetch API

Starting from DataObjects.Net 4.4, Prefetch API provides a nice shortcut for describing several Prefetch expressions on the same node as anonymous type expression. This simplifies the overall Prefetch expression tree and make it more clean and readable:

var persons = session.Query.All<Person>())
    .Prefetch(p => new { p.Photo, p.Manager })
    .Prefetch(p => p.Employees
    .Prefetch(e => new { e.Photo, e.Manager }));

Note the usage of the construct: Prefetch(p => new { p.Photo, p.Manager }), it does exactly the same as the following code but with less efforts

.Prefetch(p => p.Photo)
  .Prefetch(p => p.Manager)

Delayed (future) queries

DataObjects.Net allows to execute a set of queries as a single batch (roundtrip to database server).

Again, let’s show this on example:

[HierarchyRoot]
public class Person : Entity
{
  [Key, Field]
  public int Id { get; private set; }

  [Field(Length = 200)]
  public string Name { get; set; }

  [Field]
  public DateTime BirthDay { get; set; }

  [Field]
  public Person Manager { get; set; }

  [Field]
  [Association(PairTo = "Manager")]
  public EntitySet<Person> Employees { get; private set; }

  public override string ToString()
  {
    return Name;
  }
}

var employee = new Person {Name = "Employee"};
var manager  = new Person {Name = "Manager"};
manager.Employees.Add(employee);

var simpleCompiledQuery = session.Query.Execute(() =>
  from person in session.Query.All<Person>()
  orderby person.Name
  select person
  );
var managedPersonCount = session.Query.ExecuteFutureScalar(() => (
  from person in session.Query.All<Person>()
  where person.Manager!=null
  select person
  ).Count());
var personsWithEmployees = session.Query.ExecuteFuture(() =>
  from person in session.Query.All<Person>()
  where person.Employees.Count!=0
  select person
  );

Console.WriteLine("All persons: {0}",
  simpleCompiledQuery.ToCommaDelimitedString());
Console.WriteLine("Managed person count: {0}",
  managedPersonCount.Value);
Console.WriteLine("Person with employees: {0}",
  personsWithEmployees.ToCommaDelimitedString());

As you noticed, we use Session.Query.ExecuteFutureXxx methods here to delay query execution.

Notice that Session.Query.Execute call was also not executed “immediately”: any compiled query is executed on its enumeration. So in this case its enumeration lead to execution of the whole bunch of queued future queries.

Note

How many batches DataObjects.Net sends to database server to perform the whole code shown above?

Correct answer is just one batch:

-- Batch 1
exec sp_executesql N'
INSERT INTO [dbo].[Person] ([Id], [Name], [BirthDay], [Manager.Id])
VALUES (@p1_0, @p1_1, @p1_2, @p1_3);

INSERT INTO [dbo].[Person] ([Id], [Name], [BirthDay], [Manager.Id])
VALUES (@p2_0, @p2_1, @p2_2, @p2_3);

SELECT COUNT_BIG(*) AS [column]
FROM (
  SELECT [a].[Id], 101 AS [TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
  FROM [dbo].[Person] [a]
  WHERE ([a].[Manager.Id] IS NOT NULL)) [b];

SELECT [a].[Id], 101 AS [TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM [dbo].[Person] [a]
WHERE ((SELECT COUNT_BIG(*) FROM (
  SELECT [b].[Id], 101 AS [TypeId], [b].[Name], [b].[BirthDay], [b].[Manager.Id]
  FROM [dbo].[Person] [b]
  WHERE ([b].[Manager.Id] = [a].[Id])) [c]) <> 0);

SELECT [a].[Id], [a].[TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM (
  SELECT [b].[Id], 101 AS [TypeId], [b].[Name], [b].[BirthDay], [b].[Manager.Id]
  FROM [dbo].[Person] [b]) [a]
ORDER BY [a].[Name] ASC;'

So future queries in conjunction with generalized statement batching is a very powerful feature.