Sticky

This blog has moved to www.dreamingincrm.com. Please update your feed Url. Thank you.

21 May 2015

RetrieveMultiple performance on large datasets

EDIT (21/05/15): After DMing with @maustinjones, I have added stat for paged RetrieveMultiple as well. I have updated the post to include this.

EDIT (22/05/15): Updated post to reflect correct behaviour of no-lock when parallelising. Thanks @maustinjones. Please also refer to the follow up post -> http://nycrmdev.blogspot.com.au/2015/05/redux-retrievemultiple-with-large.html on why paging cookie, is the recommended approach.

MSCRM limits the maximum result size to 5000 records, when you use the RetrieveMultiple request. In order to overcome this and retrieve all the records, you'll have to use paging. PFEDynamics team have also released an open-source library called PFE Xrm Core Library that utilises Task Parallel Library.

There is also ExecuteMultipleRequest that you can use to send bunch of requests in one go, and process the responses. This just wanted to document by findings, about the performance of these options:
  1. Just using Parallel.ForEach
  2. ExecuteMultipleRequest
  3. PFE Xrm Core
  4. Paged RetrieveMultiple

Run 1 (Batch size 1000):
I have not included the paged RetrieveMultiple in this scenario, as it is too slow, and I am too impatient to wait for it to complete.


Run 2 (Batch size 5000):



Run 3 (Batch size 5000):


Observations:
  1. Reducing the page size causes a drop in performance. I got better performance with 5000 records in one page, than 1000 records
  2. Just using Parallel.ForEach is faster than PFE Xrm Core (some overhead in instantiating the service perhaps?)
  3. ExecuteMultipleRequest is significantly slower than Parallel.ForEach
Best Practice tip from PFE Dynamics:

If you read the source code for the ParallelServiceProxy class, the help text for the RetrieveMultiple method actually has this note:

IMPORTANT!! This approach should only be used if multiple queries for varying entity types are required or the result set can't be expressed in a single query. In the latter case,
leverage NoLock=true where possible to reduce database contention.

So ideally, you should be using RetrieveMultiple method in ParallelProxy, to retrieve records from multiple entities. But, in my case I am retrieving from the same entity, hence I am using the no-lock option the fetchxml.

DMed with @maustinjones, and here is an important point to consider, before choosing a technique. 

Query pages shouldn't be retrieved in parallel. only separate queries altogether

When you use TPL to submit a bunch of RetrieveMultiple requests in one go, there is a chance of encountering database lock issues, even though you use no-lock. So, it is OK to parallelise queries that fetch from different entities, but using PagingCookie is the recommended approach.

To be honest, I did not experience any lock issues during the test, but standard CRM load was not simulated during the test. I am not sure what would have happened, if users were viewing contacts/running reports on contacts, and I am running these tests at the same time.
Database contentention and locking is applicable only in scenarios, where no-lock is not feasible.

TPL just blitzes the other techniques in these tests, but I'll have to test this further to see if lock contention issues arise, in much larger datasets. Do consider PFE Xrm Core, if you are looking to use RetrieveMultiple from multiple entities. It is written by the PFE Dynamics guys and is open-source.

Best Practice tips from msdn regarding TPL:

This article in msdn is a must read if you are thinking of using TPL -> https://msdn.microsoft.com/en-us/library/dd997392%28v=vs.110%29.aspx.

I had performance issues with TPL, as I had Console.WriteLine inside the lambda for ForEach, and it basically killed the performance. The article told me why, and so it is quite an useful read.

Note about the code:
I quickly wrote this just to test the performance, and not production quality in mind. If you read the code, you can see I am sending 20 requests at a time, inside an infinite loop to retrieve all the pages. I exit the loop when any one of the pages, returns no result. I haven't tried ExecuteMultipleRequest inside the Parallel.ForEach. It would be interesting to see what the performance will be in that case.
Code: Here is the code, I tested with, for reference.

using Microsoft.Pfe.Xrm;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Messages;
using Microsoft.Xrm.Sdk.Query;
using Microsoft.Xrm.Tooling.Connector;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using System.Xml.Linq;

namespace Playground
{
    class Program
    {
        static void Main(string[] args)
        {
            RetrieveAllPages(@"<fetch count='5000' no-lock='true' page='{0}' >
                                  <entity name='account' >
                                    <attribute name='name' />
                                    <order attribute='accountid' />
                                    <filter>
                                        <condition attribute='statecode' operator='eq' value='0' />
                                    </filter>
                                   </entity>
                                </fetch>");

            RetrieveAllPages(@"<fetch count='5000' no-lock='true' page='{0}' >
                                  <entity name='contact' >
                                    <attribute name='fullname' />
                                    <order attribute='contactid' />
                                    <filter>
                                      <condition attribute='statecode' operator='eq' value='0' />
                                    </filter>
                                  </entity>
                                </fetch>");
        }

        static void RetrieveAllPages(string fetchXml)
        {
            Console.WriteLine("Entity: {0}\n", XElement.Parse(fetchXml).Element("entity").Attribute("name").Value);
            var organisationSvcManager = new OrganizationServiceManager(new Uri("CRMURL"),
                "[username]", "[password]", "[domain]");
            var crmSvc = new CrmServiceClient(new NetworkCredential("[username]", "[password]", "[domain]"), Microsoft.Xrm.Tooling.Connector.AuthenticationType.AD, "CRMURL", "80", "[Org]");
            var isDone = false;
            int pageStart = 1, pageEnd = 20;
            if (crmSvc.IsReady)
            {
                var stopWatch = new Stopwatch();
                stopWatch.Start();
                var results = new List<Entity>();

                while (!isDone)
                {
                    IDictionary<string, QueryBase> queries = new Dictionary<string, QueryBase>();
                    //Console.WriteLine("Page {0} to {1}", pageStart, pageEnd);
                    for (int i = pageStart; i <= pageStart + 19; i++)
                    {
                        queries.Add(i.ToString(), new FetchExpression(string.Format(fetchXml, i)));

                    };
                    pageStart = pageEnd + 1;
                    pageEnd = pageStart + 19;
                    var threadNum = 1;
                    Parallel.ForEach(queries,
                        (query) =>
                        {
                            if (!isDone)
                            {
                                var pageResults = crmSvc.OrganizationServiceProxy.RetrieveMultiple(query.Value).Entities;
                                if (!pageResults.Any())
                                {
                                    isDone = true;
                                    return;
                                }
                                results.AddRange(pageResults);
                            }
                        });
                }

                Console.WriteLine(results.Count);
                stopWatch.Stop();
                Console.WriteLine("Parallel: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(results.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");

                var resultCount = 0;
                pageStart = 1;
                pageEnd = 20;
                stopWatch.Restart();
                isDone = false;
                while (!isDone)
                {
                    //Console.WriteLine("Page {0} to {1}", pageStart, pageEnd);
                    var executeMultipleRequest = new ExecuteMultipleRequest
                    {
                        Requests = new OrganizationRequestCollection(),
                        Settings = new ExecuteMultipleSettings() { ContinueOnError = true, ReturnResponses = true }
                    };

                    for (var i = pageStart; i <= pageStart + 19; i++)
                    {
                        executeMultipleRequest.Requests.Add(new RetrieveMultipleRequest()
                        {
                            Query = new FetchExpression(string.Format(fetchXml, i))
                        });
                    };
                    var executeMultipleResponses = ((ExecuteMultipleResponse)crmSvc.OrganizationServiceProxy.Execute(executeMultipleRequest)).Responses.ToList();
                    executeMultipleResponses.ForEach(x =>
                    {
                        var executeMultipleResultEntities = ((RetrieveMultipleResponse) x.Response).EntityCollection.Entities;
                        if (!executeMultipleResultEntities.Any())
                        {
                            isDone = true;
                            return;
                        }
                        resultCount = resultCount + executeMultipleResultEntities.Count;
                    });
                    pageStart = pageEnd + 1;
                    pageEnd = pageStart + 19;
                }
                Console.WriteLine(resultCount);
                stopWatch.Stop();
                Console.WriteLine("ExecuteMultiple: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(results.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");

                IDictionary<string, QueryBase> entityQuery = new Dictionary<string, QueryBase>();
                entityQuery.Add("result", new FetchExpression(fetchXml));
                stopWatch.Restart();
                var queryResult = organisationSvcManager.ParallelProxy.RetrieveMultiple(entityQuery, true,
                    (pair, exception) => Console.WriteLine("{0} throwed {1}", pair.Key, exception.Message));
                stopWatch.Stop();
                Console.WriteLine(queryResult.Values.First().Entities.Count);
                Console.WriteLine("PFE.Xrm: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(queryResult.Values.First().Entities.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");

                resultCount = 0;
                var pagedResults = new EntityCollection();
                var fetchToQuery = new FetchXmlToQueryExpressionRequest {FetchXml = string.Format(fetchXml, 1)};
                var retrieveQuery = ((FetchXmlToQueryExpressionResponse)crmSvc.OrganizationServiceProxy.Execute(fetchToQuery)).Query;
                retrieveQuery.PageInfo = new PagingInfo {PageNumber = 1};
                stopWatch.Restart();
                do
                {
                    pagedResults = crmSvc.OrganizationServiceProxy.RetrieveMultiple(retrieveQuery);
                    resultCount += pagedResults.Entities.Count;
                    retrieveQuery.PageInfo.PageNumber++;
                    retrieveQuery.PageInfo.PagingCookie = pagedResults.PagingCookie;
                } while (pagedResults.MoreRecords);
                Console.WriteLine(resultCount);
                stopWatch.Stop();
                Console.WriteLine("Paged RetrieveMultiple: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(resultCount / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");
            }
        }
    }
}

No comments:

Post a Comment