Batching is the ability of DataObjects.Net to combine sequences of individual SQL statements into batches, so that each batch is sent to the database server in a single roundtrip.
A DataObjects.Net batch can be formed from several kinds of “parts”.
All these cases appear in the future queries scenario described below, so no separate batching example is necessary here.
When loading data from a database into memory, it’s handy to design things so that as you load an object of interest, you also load the objects related to it. This makes loading easier on the developer using the object, who otherwise would have to explicitly load all the objects they need. However, if you take this to its logical conclusion, you reach the point where loading one object can have the effect of loading a huge number of related objects – something that hurts performance when only a few of the objects are actually needed. Lazy loading is, in fact, an agreement to interrupt this greedy loading process at certain relationships or properties. To keep the not-yet-loaded parts of the graph available, each fetched object maintains availability markers, so that when not-yet-loaded data is requested, it gets loaded on demand.
DataObjects.Net supports lazy loading for all persistent properties except primary keys:
References to entities – persistent fields of type IEntity (and thus Entity) or one of its implementors;
EntitySets – persistent fields of type EntitySet<T> or its inheritors;
Structures – persistent fields whose type is an inheritor of Structure;
Regular fields – persistent fields of a primitive type such as int, string, DateTime, a nullable type, an array of a primitive type, etc.
Note
Entity references and EntitySets are lazy load fields by default.
See the following example:
[HierarchyRoot]
public class Person : Entity
{
[Key, Field]
public int Id { get; private set; }
[Field(Length = 200)]
public string Name { get; set; }
[Field]
public DateTime BirthDay { get; set; }
[Field(LazyLoad = true, Length = 65536)]
public byte[] Photo { get; set; }
[Field(LazyLoad = true, Length = 8192)]
public byte[] Avatar { get; set; }
[Field(LazyLoad = true)]
public Address Address { get; set; }
[Field]
public Person Manager { get; private set; }
[Field]
[Association(PairTo = "Manager")]
public EntitySet<Person> Employees { get; private set; }
}
public class Address : Structure
{
[Field(Length = 60)]
public string Street { get; set; }
[Field(Length = 15)]
public string City { get; set; }
[Field(Length = 15)]
public string Region { get; set; }
[Field(Length = 10)]
public string PostalCode { get; set; }
[Field(Length = 15)]
public string Country { get; set; }
}
In this example the following fields are subject to lazy loading: Photo, Avatar and Address.
Reference fields are internally represented as a set of fields (columns) forming the key of the referenced object – there is no reference to a materialized Entity or its Key. Consequences:
Marking a reference field with [Field(LazyLoad=true)] only makes loading of the reference key (the foreign key value) lazy; the reference itself is resolved lazily anyway.
Thus marking a reference field with [Field(LazyLoad=true)] is not recommended: reading an additional value per row is normally quite cheap (keys are usually short), but an additional roundtrip to the database for it is not.
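A minimal sketch of this behavior, assuming the Person model defined above and an open Session with an active transaction (the variable names are illustrative only):

```csharp
// Assumes the Person model above and an open Session/transaction.
var person = session.Query.All<Person>().First();

// Manager is not marked with LazyLoad = true, so its key columns
// (the foreign key value) are fetched together with the other columns.
// Resolving the reference to a materialized Entity still happens lazily:
var manager = person.Manager;  // may hit the database to materialize the Entity

// Had Manager been declared as [Field(LazyLoad = true)], even reading
// the foreign key value would cost an extra roundtrip - which is why
// LazyLoad is not recommended for reference fields.
```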
EntitySet is intrinsically lazy:
If a property of EntitySet type hasn’t been read yet, the EntitySet isn’t even created. So EntitySet creation is lazy.
EntitySet maintains its own state, which describes which of its items are already loaded and whether its content is loaded completely.
The state of any newly created EntitySet is “nothing is loaded yet”.
If the EntitySet state is “nothing is loaded yet”, any attempt to access a property or method that must lead to partial state loading leads to a request that can be described as “try to load the state completely first”. Precisely, the EntitySet asks the prefetcher to load up to 32 of the items it must contain.
EntitySet enumeration leads to full state loading. So if enumeration is the first operation on a particular EntitySet in the current transaction, there will be no partial state loading attempt.
Once a particular read operation on an EntitySet has completed, its subsequent invocations in the same transaction won’t lead to a database hit.
Note
Such a “greedy” partial state loading behavior is logical: DataObjects.Net is able to perform about 10K queries per second, but its materialization speed reaches 400K entities per second. This means there is almost no difference between fetching 1 entity or 40. That’s why DO4 tries to load more even if you didn’t ask for it: even if all 32 items are present, the cost of loading all of them is similar to loading one. The high probability of getting fewer than 32 items is another reason for doing this: we assume most collections are either empty or contain just a few items, and only some of them are filled with hundreds of items. So we get the entire contents of most collections with a single query instead of torturing the RDBMS with multiple ones. Note that partial state loading may have a negative effect on large collections: if a collection contains e.g. 1000 items, an attempt to read its Count property will first lead to a partial state loading query, and then to a query fetching the Count value. On the other hand, you can deal with this case using our prefetch API. Later we’ll offer more options allowing to control partial state loading behavior more precisely.
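These rules can be illustrated with a short sketch, again assuming the Person model defined earlier and an open Session with an active transaction:

```csharp
var person = session.Query.All<Person>().First();

// Reading the property creates the EntitySet; its state is still
// "nothing is loaded yet", so no items are fetched at this point:
var employees = person.Employees;

// The first read operation triggers the "try to load the state
// completely first" request (up to 32 items are loaded):
var count = employees.Count;

// Enumeration loads the state completely. If enumeration were the
// first operation in this transaction, no partial state loading
// attempt would be made:
foreach (var employee in employees) {
  // ...
}

// A completed read operation repeated in the same transaction
// is served from the cached EntitySet state - no database hit:
var countAgain = employees.Count;
```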
Marking an EntitySet field with [Field(LazyLoad=true)] is not allowed: such fields are intrinsically lazy.
[Field(LazyLoad=true)] ensures the underlying columns behind such a property won’t be requested from the database by default:
If a field marked with [Field(LazyLoad=true)] hasn’t been fetched, its value is loaded on the first attempt to read it.
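For example (a sketch assuming the Person model defined above and an open Session with an active transaction):

```csharp
var person = session.Query.All<Person>().First();
// Name and BirthDay are loaded at this point; Photo, Avatar and Address
// are not, because they are marked with [Field(LazyLoad = true)].

var name = person.Name;    // no database roundtrip
var photo = person.Photo;  // the first read triggers a query fetching the value
var again = person.Photo;  // already loaded - no additional roundtrip
```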
But it’s important to know that, in general, any field in DataObjects.Net can actually be loaded lazily, even if it isn’t marked as a lazy load field. This is related to inheritance.
Let’s look at a prefetch API usage example for the persistent model we defined earlier:
var persons = session.Query.All<Person>()
.Prefetch(p => p.Avatar)
.Prefetch(p => p.Photo)
.Prefetch(p => p.Employees);
foreach (var person in persons) {
// Some code here...
}
As you see, here we ensure all the lazy properties, including the EntitySet, are loaded before further processing, using the approach that minimizes database chattiness (the prefetch API).
The prefetch API provides a way of grouping Entity or EntitySet state fetch requests together, so that many of them are executed in a single query or database server roundtrip. DataObjects.Net combines individual prefetch queries together and joins the resulting ones into batches (so it’s a part of the generalized batching feature).
Let’s see prefetch API in action:
[HierarchyRoot]
public class Person : Entity
{
[Key, Field]
public int Id { get; private set; }
[Field(LazyLoad = true, Length = 65536)]
public byte[] Photo { get; set; }
[Field]
public Person Manager { get; set; }
[Field]
[Association(PairTo = "Manager")]
public EntitySet<Person> Employees { get; private set; }
...
}
var persons = session.Query.All<Person>()
.Prefetch(p => p.Photo) // Lazy load field
.Prefetch(p => p.Employees // EntitySet Employees
.Prefetch(e => e.Photo)) // and lazy load field for each EntitySet item
.Prefetch(p => p.Manager); // Referenced entity
foreach (var person in persons) {
// some code here...
}
This tiny piece of code ensures the following data will be available without any additional roundtrips to the database: the Photo of each person, the contents of the Employees EntitySet, the Photo of each employee and the Manager of each person.
As you see, we use the same .Prefetch(...) method everywhere, no matter whether we handle a primitive field, a reference field, an EntitySet, or make a nested prefetch call.
Take a look at the resulting SQL queries that are executed by DataObjects.Net for this prefetch expression tree:
-- Batch 1
SELECT [a].[Id], 101 AS [TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM [dbo].[Person] [a];
-- Batch 2
SELECT [a].[Id], [a].[TypeId], [a].[Photo] FROM (
SELECT [b].[Id], 101 AS [TypeId], [b].[Name], [b].[BirthDay],
[b].[Photo], [b].[Manager.Id]
FROM [dbo].[Person] [b]) [a]
WHERE [a].[Id] IN (@p1_0_0_0, @p1_0_1_0);
SELECT [a].[Id], 101 AS [TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM [dbo].[Person] [a]
WHERE ([a].[Manager.Id] = @p2_0);
SELECT [a].[Id], 101 AS [TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM [dbo].[Person] [a]
WHERE ([a].[Manager.Id] = @p3_0);
Notice that after the execution of the first query DataObjects.Net performs an additional batch to fetch the rest of the data. Actually there can be a set of such batches – this depends on the complexity of the prefetch expression tree.
The prefetch API in DataObjects.Net actually isn’t bound to the IQueryable<T> provider – it is a completely separate API based on the low-level delayed query implementation. DataObjects.Net itself relies on it to load the necessary parts of Entity state. As a result, the prefetch API can be used with any IEnumerable<T> – i.e. the original collection doesn’t have to be an IQueryable<T>. The following code leads to exactly the same queries as the one above:
var personIds = session.Query.All<Person>().Select(p => p.Id);
var prefetchedPersons = session.Query.Many<Person, int>(personIds)
.Prefetch(p => p.Photo)
.Prefetch(p => p.Employees
.Prefetch(e => e.Photo))
.Prefetch(p => p.Manager);
foreach (var person in prefetchedPersons) {
// some code here...
}
Starting from DataObjects.Net 4.4, the prefetch API provides a nice shortcut for describing several prefetch expressions on the same node as an anonymous type expression. This simplifies the overall prefetch expression tree and makes it cleaner and more readable:
var persons = session.Query.All<Person>()
.Prefetch(p => new { p.Photo, p.Manager })
.Prefetch(p => p.Employees
.Prefetch(e => new { e.Photo, e.Manager }));
Note the usage of the construct Prefetch(p => new { p.Photo, p.Manager }): it does exactly the same as the following code, but with less effort:
.Prefetch(p => p.Photo)
.Prefetch(p => p.Manager)
DataObjects.Net allows executing a set of queries as a single batch (one roundtrip to the database server).
Again, let’s show this with an example:
[HierarchyRoot]
public class Person : Entity
{
[Key, Field]
public int Id { get; private set; }
[Field(Length = 200)]
public string Name { get; set; }
[Field]
public DateTime BirthDay { get; set; }
[Field]
public Person Manager { get; set; }
[Field]
[Association(PairTo = "Manager")]
public EntitySet<Person> Employees { get; private set; }
public override string ToString()
{
return Name;
}
}
var employee = new Person {Name = "Employee"};
var manager = new Person {Name = "Manager"};
manager.Employees.Add(employee);
var simpleCompiledQuery = session.Query.Execute(() =>
from person in session.Query.All<Person>()
orderby person.Name
select person
);
var managedPersonCount = session.Query.ExecuteFutureScalar(() => (
from person in session.Query.All<Person>()
where person.Manager != null
select person
).Count());
var personsWithEmployees = session.Query.ExecuteFuture(() =>
from person in session.Query.All<Person>()
where person.Employees.Count != 0
select person
);
Console.WriteLine("All persons: {0}",
simpleCompiledQuery.ToCommaDelimitedString());
Console.WriteLine("Managed person count: {0}",
managedPersonCount.Value);
Console.WriteLine("Person with employees: {0}",
personsWithEmployees.ToCommaDelimitedString());
As you noticed, we use Session.Query.ExecuteFutureXxx methods here to delay query execution.
Notice that the Session.Query.Execute call was also not executed “immediately”: any compiled query is executed on its enumeration. So in this case its enumeration led to the execution of the whole set of queued future queries.
Note
How many batches does DataObjects.Net send to the database server to execute the whole code shown above?
The correct answer is just one batch:
-- Batch 1
exec sp_executesql N'
INSERT INTO [dbo].[Person] ([Id], [Name], [BirthDay], [Manager.Id])
VALUES (@p1_0, @p1_1, @p1_2, @p1_3);
INSERT INTO [dbo].[Person] ([Id], [Name], [BirthDay], [Manager.Id])
VALUES (@p2_0, @p2_1, @p2_2, @p2_3);
SELECT COUNT_BIG(*) AS [column]
FROM (
SELECT [a].[Id], 101 AS [TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM [dbo].[Person] [a]
WHERE ([a].[Manager.Id] IS NOT NULL)) [b];
SELECT [a].[Id], 101 AS [TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM [dbo].[Person] [a]
WHERE ((SELECT COUNT_BIG(*) FROM (
SELECT [b].[Id], 101 AS [TypeId], [b].[Name], [b].[BirthDay], [b].[Manager.Id]
FROM [dbo].[Person] [b]
WHERE ([b].[Manager.Id] = [a].[Id])) [c]) <> 0);
SELECT [a].[Id], [a].[TypeId], [a].[Name], [a].[BirthDay], [a].[Manager.Id]
FROM (
SELECT [b].[Id], 101 AS [TypeId], [b].[Name], [b].[BirthDay], [b].[Manager.Id]
FROM [dbo].[Person] [b]) [a]
ORDER BY [a].[Name] ASC;
So future queries in conjunction with generalized statement batching are a very powerful feature.