
22 May 2015

Redux: RetrieveMultiple on large datasets

In trying to optimise RetrieveMultiple performance, I thought I had hit a sweet spot by using TPL, with RetrieveMultiple requests spanning pages in parallel. But after talking to @maustinjones, I have come to the conclusion that it is probably not the best approach. The datasets I tested on, contact and account, are large, but not significantly large. After looking at the SQL trace logs, I have decided to stick with the paging cookie approach when dealing with large datasets.

In order to understand why the paging cookie approach is more efficient, it is important to understand how your FetchXml is translated into SQL.

Scenario:
Retrieve all records from the account entity. Total number of records: 10,177.

Approach 1 - TPL with RetrieveMultiple parallelised across pages

Request 1

FetchXml:
<fetch page="1" no-lock="true" count="5000">
  <entity name="account">
    <attribute name="name"/>
    <order attribute="accountid"/>
    <filter>
      <condition attribute="statecode" value="0" operator="eq"/>
    </filter>
  </entity>
</fetch>

Translated SQL:
exec sp_executesql N'select top 5001 "account0".Name as "name", "account0".AccountId as "accountid"
from AccountBase as "account0" WITH (NOLOCK)
where (("account0".StateCode = @StateCode0))
order by "account0".AccountId asc',N'@StateCode0 int',@StateCode0=0

Request 2

FetchXml:
<fetch page="2" no-lock="true" count="5000">
  <entity name="account">
    <attribute name="name"/>
    <order attribute="accountid"/>
    <filter>
      <condition attribute="statecode" value="0" operator="eq"/>
    </filter>
  </entity>
</fetch>

Translated SQL:
exec sp_executesql N'select top 10001 "account0".Name as "name", "account0".AccountId as "accountid"
from AccountBase as "account0" WITH (NOLOCK)
where (("account0".StateCode = @StateCode0))
order by "account0".AccountId asc',N'@StateCode0 int',@StateCode0=0

Request 3

FetchXml:
<fetch page="3" no-lock="true" count="5000">
  <entity name="account">
    <attribute name="name"/>
    <order attribute="accountid"/>
    <filter>
      <condition attribute="statecode" value="0" operator="eq"/>
    </filter>
  </entity>
</fetch>

Translated SQL:
exec sp_executesql N'select top 15001 "account0".Name as "name", "account0".AccountId as "accountid"
from AccountBase as "account0" WITH (NOLOCK)
where (("account0".StateCode = @StateCode0))
order by "account0".AccountId asc',N'@StateCode0 int',@StateCode0=0

Request 4

FetchXml:
<fetch page="4" no-lock="true" count="5000">
  <entity name="account">
    <attribute name="name"/>
    <order attribute="accountid"/>
    <filter>
      <condition attribute="statecode" value="0" operator="eq"/>
    </filter>
  </entity>
</fetch>

Translated SQL:
exec sp_executesql N'select top 20001 "account0".Name as "name", "account0".AccountId as "accountid"
from AccountBase as "account0" WITH (NOLOCK)
where (("account0".StateCode = @StateCode0))
order by "account0".AccountId asc',N'@StateCode0 int',@StateCode0=0

Request 5

FetchXml:
<fetch page="5" no-lock="true" count="5000">
  <entity name="account">
    <attribute name="name"/>
    <order attribute="accountid"/>
    <filter>
      <condition attribute="statecode" value="0" operator="eq"/>
    </filter>
  </entity>
</fetch>

Translated SQL:
exec sp_executesql N'select top 25001 "account0".Name as "name", "account0".AccountId as "accountid"
from AccountBase as "account0" WITH (NOLOCK)
where (("account0".StateCode = @StateCode0))
order by "account0".AccountId asc',N'@StateCode0 int',@StateCode0=0


As you can see, the SQL queries that are generated are not exactly optimal. There are only 10,177 records in total, but 45,533 records (5001 + 10001 + 10177 + 10177 + 10177) are retrieved across the five requests. The correct number of records (10,177) is returned only after further in-memory processing.
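For reference, this is roughly how the parallelised requests were issued. It is a minimal sketch only: CreateService (passed in as a factory delegate) is a hypothetical placeholder for however you construct a per-thread IOrganizationService, and the page count is hard-coded to 5 to match the trace above.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Query;

public static class TplRetrieve
{
    public static IList<Entity> RetrieveAccountsInParallel(Func<IOrganizationService> createService)
    {
        const string fetchTemplate = @"
            <fetch page='{0}' no-lock='true' count='5000'>
              <entity name='account'>
                <attribute name='name'/>
                <order attribute='accountid'/>
                <filter>
                  <condition attribute='statecode' value='0' operator='eq'/>
                </filter>
              </entity>
            </fetch>";

        var results = new ConcurrentBag<Entity>();

        // Each page is requested independently. Without a paging cookie the server
        // has to re-read every earlier page for each request, which is what produces
        // the growing 'top 10001', 'top 15001', ... queries in the trace above.
        Parallel.ForEach(Enumerable.Range(1, 5), page =>
        {
            // Each thread gets its own service instance (hypothetical factory).
            IOrganizationService service = createService();
            var fetch = new FetchExpression(string.Format(fetchTemplate, page));
            EntityCollection pageResult = service.RetrieveMultiple(fetch);
            foreach (Entity entity in pageResult.Entities)
            {
                results.Add(entity);
            }
        });

        return results.ToList();
    }
}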

Approach 2 - PagingCookie

Request 1

FetchXml:
<fetch page="1" no-lock="true" count="5000">
  <entity name="account">
    <attribute name="name"/>
    <order attribute="accountid"/>
    <filter>
      <condition attribute="statecode" value="0" operator="eq"/>
    </filter>
  </entity>
</fetch>

Paging Cookie:
NULL

Translated SQL:
exec sp_executesql N'select top 5001 "account0".Name as "name", "account0".AccountId as "accountid" from AccountBase as "account0" where (("account0".StateCode = @StateCode0)) order by "account0".AccountId asc',N'@StateCode0 int',@StateCode0=0

Request 2

FetchXml:
<fetch page="2" no-lock="true" count="5000">
  <entity name="account">
    <attribute name="name"/>
    <order attribute="accountid"/>
    <filter>
      <condition attribute="statecode" value="0" operator="eq"/>
    </filter>
  </entity>
</fetch>

Paging Cookie:
<cookie page="1"><accountid first="{4005F675-164D-E411-80D3-00155D018813}" last="{71C7E14D-357C-E411-80D8-00155D018813}"/></cookie>

Translated SQL:
exec sp_executesql N'select top 5001 "account0".Name as "name", "account0".AccountId as "accountid" from AccountBase as "account0" where (("account0".StateCode = @StateCode0) and (("account0".AccountId > @AccountId0))) order by "account0".AccountId asc',N'@StateCode0 int,@AccountId0 uniqueidentifier',@StateCode0=0,@AccountId0='71C7E14D-357C-E411-80D8-00155D018813'

Request 3

FetchXml:
<fetch page="3" no-lock="true" count="5000">
  <entity name="account">
    <attribute name="name"/>
    <order attribute="accountid"/>
    <filter>
      <condition attribute="statecode" value="0" operator="eq"/>
    </filter>
  </entity>
</fetch>

Paging Cookie:
<cookie page="2"><accountid first="{77C7E14D-357C-E411-80D8-00155D018813}" last="{968467AD-357C-E411-80D8-00155D018813}"/></cookie>

Translated SQL:
exec sp_executesql N'select top 5001 "account0".Name as "name", "account0".AccountId as "accountid" from AccountBase as "account0" where (("account0".StateCode = @StateCode0) and (("account0".AccountId > @AccountId0))) order by "account0".AccountId asc',N'@StateCode0 int,@AccountId0 uniqueidentifier',@StateCode0=0,@AccountId0='968467AD-357C-E411-80D8-00155D018813'

10,179 records (5001 + 5001 + 177) are retrieved across three requests, and further processing cuts the result set down to the correct count of 10,177.
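And this is roughly the paging cookie loop behind Approach 2: a minimal sketch following the standard SDK pattern of feeding EntityCollection.PagingCookie back into the next fetch until MoreRecords is false. The SecurityElement.Escape call is there because the cookie itself contains quotes and angle brackets that have to be escaped inside the fetch attribute.

using System.Collections.Generic;
using System.Security;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Query;

public static class PagedRetrieve
{
    public static IList<Entity> RetrieveAllAccounts(IOrganizationService service)
    {
        var results = new List<Entity>();
        int page = 1;
        string pagingCookie = null;

        while (true)
        {
            // Inject the cookie returned by the previous page, XML-escaped so it can
            // sit inside the fetch attribute. The first request has no cookie.
            string cookieAttribute = pagingCookie == null
                ? string.Empty
                : string.Format(" paging-cookie='{0}'", SecurityElement.Escape(pagingCookie));

            string fetchXml = string.Format(@"
                <fetch page='{0}'{1} no-lock='true' count='5000'>
                  <entity name='account'>
                    <attribute name='name'/>
                    <order attribute='accountid'/>
                    <filter>
                      <condition attribute='statecode' value='0' operator='eq'/>
                    </filter>
                  </entity>
                </fetch>", page, cookieAttribute);

            EntityCollection result = service.RetrieveMultiple(new FetchExpression(fetchXml));
            results.AddRange(result.Entities);

            if (!result.MoreRecords)
            {
                break;
            }

            // e.g. <cookie page="1"><accountid first="{...}" last="{...}"/></cookie>
            pagingCookie = result.PagingCookie;
            page++;
        }

        return results;
    }
}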

Random Notes:
Looking at how the paging cookie is translated to SQL reminds me of this post from the PFE Dynamics guys -> Dangers of Guid.NewGuid. I believe the entity primary key is generated using NEWSEQUENTIALID, which is also why it is not advisable to insert records directly into the MSCRM database. This is probably how the "> uniqueidentifier" query is optimised: since the Guids are sequential, the clustered index on the primary key can be used efficiently in this query.


Conclusion:
So, compared to the TPL approach, the paging cookie approach:
  1. Reduces the number of queries sent to the database
  2. Respects the count size specified in the FetchXml
  3. Does less in-memory processing
  4. Reduces the chances of encountering a database lock

In spite of all these observations, and understanding how efficient the paging cookie queries are, I still cannot comprehend how TPL is faster, even though it queries the database inefficiently.

Credit: A big thank you to @maustinjones for helping me understand the performance pitfalls of using TPL in this scenario.
