EDIT (21/05/15): After DMing with
@maustinjones, I have added stat for paged
RetrieveMultiple as well. I have updated the post to include this.
EDIT (22/05/15): Updated post to reflect correct behaviour of no-lock when parallelising. Thanks @maustinjones. Please also refer to the follow up post ->
http://nycrmdev.blogspot.com.au/2015/05/redux-retrievemultiple-with-large.html on why paging cookie, is the recommended approach.
MSCRM limits the maximum result size to 5000 records, when you use the
RetrieveMultiple request. In order to overcome this and retrieve all the records, you'll have to use paging. PFEDynamics team have also released an open-source library called
PFE Xrm Core Library that utilises
Task Parallel Library.
There is also
ExecuteMultipleRequest that you can use to send bunch of requests in one go, and process the responses. This just wanted to document by findings, about the performance of these options:
- Just using Parallel.ForEach
- ExecuteMultipleRequest
- PFE Xrm Core
- Paged RetrieveMultiple
Run 1 (Batch size 1000):
I have not included the paged
RetrieveMultiple in this scenario, as it is too slow, and I am too impatient to wait for it to complete.
Run 2 (Batch size 5000):
Run 3 (Batch size 5000):
Observations:
- Reducing the page size causes a drop in performance. I got better performance with 5000 records in one page, than 1000 records
- Just using Parallel.ForEach is faster than PFE Xrm Core (some overhead in instantiating the service perhaps?)
- ExecuteMultipleRequest is significantly slower than Parallel.ForEach
Best Practice tip from PFE Dynamics:
If you read the source code for the
ParallelServiceProxy class, the help text for the
RetrieveMultiple method actually has this note:
IMPORTANT!! This approach should only be used if multiple queries for varying entity types are required or the result set can't be expressed in a single query. In the latter case,
leverage NoLock=true where possible to reduce database contention.
So ideally, you should be using
RetrieveMultiple method in
ParallelProxy, to retrieve records from multiple entities. But, in my case I am retrieving from the same entity, hence I am using the
no-lock option the fetchxml.
DMed with @maustinjones, and here is an important point to consider, before choosing a technique.
Query pages shouldn't be retrieved in parallel. only separate queries altogether
When you use TPL to submit a bunch of RetrieveMultiple requests in one go, there is a chance of encountering database lock issues, even though you use no-lock. So, it is OK to parallelise queries that fetch from different entities, but using PagingCookie is the recommended approach.
To be honest, I did not experience any lock issues during the test, but standard CRM load was not simulated during the test. I am not sure what would have happened, if users were viewing contacts/running reports on contacts, and I am running these tests at the same time.
Database contentention and locking is applicable only in scenarios, where no-lock is not feasible.
TPL just blitzes the other techniques in these tests, but I'll have to test this further to see if lock contention issues arise, in much larger datasets. Do consider PFE Xrm Core, if you are looking to use RetrieveMultiple from multiple entities. It is written by the PFE Dynamics guys and is open-source.
Best Practice tips from msdn regarding TPL:
This article in msdn is a must read if you are thinking of using TPL ->
https://msdn.microsoft.com/en-us/library/dd997392%28v=vs.110%29.aspx.
I had performance issues with TPL, as I had
Console.WriteLine inside the lambda for ForEach, and it basically killed the performance. The article told me why, and so it is quite an useful read.
Note about the code:
I quickly wrote this just to test the performance, and not production quality in mind. If you read the code, you can see I am sending 20 requests at a time, inside an infinite loop to retrieve all the pages. I exit the loop when any one of the pages, returns no result. I haven't tried
ExecuteMultipleRequest inside the
Parallel.ForEach. It would be interesting to see what the performance will be in that case.
Code: Here is the code, I tested with, for reference.
using Microsoft.Pfe.Xrm;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Messages;
using Microsoft.Xrm.Sdk.Query;
using Microsoft.Xrm.Tooling.Connector;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using System.Xml.Linq;
namespace Playground
{
class Program
{
static void Main(string[] args)
{
RetrieveAllPages(@"<fetch count='5000' no-lock='true' page='{0}' >
<entity name='account' >
<attribute name='name' />
<order attribute='accountid' />
<filter>
<condition attribute='statecode' operator='eq' value='0' />
</filter>
</entity>
</fetch>");
RetrieveAllPages(@"<fetch count='5000' no-lock='true' page='{0}' >
<entity name='contact' >
<attribute name='fullname' />
<order attribute='contactid' />
<filter>
<condition attribute='statecode' operator='eq' value='0' />
</filter>
</entity>
</fetch>");
}
static void RetrieveAllPages(string fetchXml)
{
Console.WriteLine("Entity: {0}\n", XElement.Parse(fetchXml).Element("entity").Attribute("name").Value);
var organisationSvcManager = new OrganizationServiceManager(new Uri("CRMURL"),
"[username]", "[password]", "[domain]");
var crmSvc = new CrmServiceClient(new NetworkCredential("[username]", "[password]", "[domain]"), Microsoft.Xrm.Tooling.Connector.AuthenticationType.AD, "CRMURL", "80", "[Org]");
var isDone = false;
int pageStart = 1, pageEnd = 20;
if (crmSvc.IsReady)
{
var stopWatch = new Stopwatch();
stopWatch.Start();
var results = new List<Entity>();
while (!isDone)
{
IDictionary<string, QueryBase> queries = new Dictionary<string, QueryBase>();
//Console.WriteLine("Page {0} to {1}", pageStart, pageEnd);
for (int i = pageStart; i <= pageStart + 19; i++)
{
queries.Add(i.ToString(), new FetchExpression(string.Format(fetchXml, i)));
};
pageStart = pageEnd + 1;
pageEnd = pageStart + 19;
var threadNum = 1;
Parallel.ForEach(queries,
(query) =>
{
if (!isDone)
{
var pageResults = crmSvc.OrganizationServiceProxy.RetrieveMultiple(query.Value).Entities;
if (!pageResults.Any())
{
isDone = true;
return;
}
results.AddRange(pageResults);
}
});
}
Console.WriteLine(results.Count);
stopWatch.Stop();
Console.WriteLine("Parallel: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
Console.WriteLine(results.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");
var resultCount = 0;
pageStart = 1;
pageEnd = 20;
stopWatch.Restart();
isDone = false;
while (!isDone)
{
//Console.WriteLine("Page {0} to {1}", pageStart, pageEnd);
var executeMultipleRequest = new ExecuteMultipleRequest
{
Requests = new OrganizationRequestCollection(),
Settings = new ExecuteMultipleSettings() { ContinueOnError = true, ReturnResponses = true }
};
for (var i = pageStart; i <= pageStart + 19; i++)
{
executeMultipleRequest.Requests.Add(new RetrieveMultipleRequest()
{
Query = new FetchExpression(string.Format(fetchXml, i))
});
};
var executeMultipleResponses = ((ExecuteMultipleResponse)crmSvc.OrganizationServiceProxy.Execute(executeMultipleRequest)).Responses.ToList();
executeMultipleResponses.ForEach(x =>
{
var executeMultipleResultEntities = ((RetrieveMultipleResponse) x.Response).EntityCollection.Entities;
if (!executeMultipleResultEntities.Any())
{
isDone = true;
return;
}
resultCount = resultCount + executeMultipleResultEntities.Count;
});
pageStart = pageEnd + 1;
pageEnd = pageStart + 19;
}
Console.WriteLine(resultCount);
stopWatch.Stop();
Console.WriteLine("ExecuteMultiple: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
Console.WriteLine(results.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");
IDictionary<string, QueryBase> entityQuery = new Dictionary<string, QueryBase>();
entityQuery.Add("result", new FetchExpression(fetchXml));
stopWatch.Restart();
var queryResult = organisationSvcManager.ParallelProxy.RetrieveMultiple(entityQuery, true,
(pair, exception) => Console.WriteLine("{0} throwed {1}", pair.Key, exception.Message));
stopWatch.Stop();
Console.WriteLine(queryResult.Values.First().Entities.Count);
Console.WriteLine("PFE.Xrm: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
Console.WriteLine(queryResult.Values.First().Entities.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");
resultCount = 0;
var pagedResults = new EntityCollection();
var fetchToQuery = new FetchXmlToQueryExpressionRequest {FetchXml = string.Format(fetchXml, 1)};
var retrieveQuery = ((FetchXmlToQueryExpressionResponse)crmSvc.OrganizationServiceProxy.Execute(fetchToQuery)).Query;
retrieveQuery.PageInfo = new PagingInfo {PageNumber = 1};
stopWatch.Restart();
do
{
pagedResults = crmSvc.OrganizationServiceProxy.RetrieveMultiple(retrieveQuery);
resultCount += pagedResults.Entities.Count;
retrieveQuery.PageInfo.PageNumber++;
retrieveQuery.PageInfo.PagingCookie = pagedResults.PagingCookie;
} while (pagedResults.MoreRecords);
Console.WriteLine(resultCount);
stopWatch.Stop();
Console.WriteLine("Paged RetrieveMultiple: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
Console.WriteLine(resultCount / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");
}
}
}
}