Batch remove
By Anatoly Mironov
To add items to a list in bulk, or remove them in bulk. Well to do that you can use SPLinq, Server Object Model and … web.ProcessBatchData, which is the most effective. I did an experiment today. I created 1000 tasks in my task list three times and removed all items in three different ways and took time. First I put 1000 items with ajax and listdata.svc:
function pad(n) {
if (n >= 10000) return n;
if (n >= 1000) return '0' + n;
if (n >= 100) return '00' + n;
if (n >= 10) return '000' + n;
return '0000' + n;
}
var s;
var counter = 0;
var title = "task "
function foo() {
counter++;
if (counter > 10000) {
window.clearInterval(s);
}
else {
var value = {};
value.title = title + pad(counter);
createNewTask(value);
}
}
s = setInterval(foo, 200);
Then I ran three different scenarios. Here is the resut for VMWare machine Windows Server 2008 R2, 4GB RAM, SharePoint 2010 SP1:
1000 items
10 000 items
20 000 items
Batch
00h:00m:28s.950ms (29s)
00h:04m:20s.730ms (261s)
00h:08m:35s.260ms (515s)
Object Model
00h:01m:37s.860ms (98s)
00h:16m:40s.770ms (1001s)
00h:33m:50s.250ms (2030s)
SPLinq
00h:02m:24s.070ms (144s)
02h:17m:02s.570ms (8223s)
11h:51m:08s.130ms (42668s)
This result confirms another comparision. See even advantages and disadvantages of linq2sp vs caml. The time includes even the instantiation of the SPSite and SPWeb objects and getting of task ids to remove, this to get a fair comparision with SPLinq, where this instantiation is implicit (within the context).
Average time for removing one list item:
1000 items
10 000 items
20 000 items
Batch
29.0ms
26.2ms
25.8ms
Object Model
97.9ms
100.1ms
101.5ms
SPLinq
144.1ms
822.3ms
2133.4ms
Some curious facts: 1. SPLinq method left one list item after removing 10 000 batch (task 08787) and three items while removing 20 000 batch (task 07550, task 12042, task 19663). Why? 2. I opened the list sometimes and I found that deleting of items took very long time in the beginning, and then it became faster and faster. I suppose the whole data context was saved after every submission. I am wondering if it would be different if I’d set ObjectTrackingEnabled to false, which is true by default. See the whole blog entry for the code I ran to compare these three ways to remove list items.
internal static void Remove1000()
{
var stopWatch = new Stopwatch();
stopWatch.Start();
//uncomment one of them
//BatchRemove();//00:00:28.95
//ObjectModelRemove();//00:01:37.86
//LinqToSPRemove(); //00:02:24.07
stopWatch.Stop();
// Get the elapsed time as a TimeSpan value.
var ts = stopWatch.Elapsed;
// Format and display the TimeSpan value.
var elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
ts.Hours, ts.Minutes, ts.Seconds,
ts.Milliseconds / 10);
Console.WriteLine("RunTime " + elapsedTime);
}
private static void LinqToSPRemove()
{
var ctx = new OppgaverDataContext("http://takana/");
var taskList = ctx.GetList<OppgaverTask>("Oppgaver");
var tasks = from o in ctx.Oppgaver select o;
foreach (var t in tasks)
{
taskList.DeleteOnSubmit((OppgaverTask) t);
}
ctx.SubmitChanges();
}
private static void ObjectModelRemove()
{
using (var site = new SPSite("http://takana"))
{
using (var web = site.OpenWeb())
{
var lst = web.Lists\["Oppgaver"\];
var query = new SPQuery
{
ViewFields = "<FieldRef Name='ID'/>",
IncludePermissions = false,
RowLimit = 20000
};
// first get all the id's to be deleted
var al = new ArrayList();
do
{
var myItems = lst.GetItems(query);
foreach (SPListItem item in myItems) // note use enumeration
{
al.Add(item.ID);
}
query.ListItemCollectionPosition = myItems.ListItemCollectionPosition;
}
while (query.ListItemCollectionPosition != null);
for (int i = 0; i < al.Count; i++)
{
var item = lst.GetItemById((int) al\[i\]);
item.Delete();
}
}
}
}
private static void BatchRemove()
{
using(var site = new SPSite("http://takana"))
{
using (var web = site.OpenWeb())
{
var lst = web.Lists\["Oppgaver"\];
var query = new SPQuery
{
ViewFields = "<FieldRef Name='ID'/>",
IncludePermissions = false,
RowLimit = 20000
};
// first get all the id's to be deleted
var al = new ArrayList();
do
{
var myItems = lst.GetItems(query);
foreach (SPListItem item in myItems) // note use enumeration
{
al.Add(item.ID.ToString());
}
query.ListItemCollectionPosition = myItems.ListItemCollectionPosition;
}
while (query.ListItemCollectionPosition != null);
var sbDelete = new StringBuilder();
sbDelete.Append("<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?><Batch>");
var listguid = lst.ID.ToString();
var bcount = 0;
for (int i = 0; i < al.Count; i++)
{
if (bcount > 1000)
{
sbDelete.Append("</Batch>");
lst.ParentWeb.ProcessBatchData(sbDelete.ToString());
sbDelete = new StringBuilder();
sbDelete.Append("<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?><Batch>");
bcount = 0;
}
bcount++;
sbDelete.Append("<Method>");
sbDelete.Append("<SetList Scope=\\"Request\\">" + listguid + "</SetList>");
sbDelete.Append("<SetVar Name=\\"ID\\">" + al\[i\] + "</SetVar>");
sbDelete.Append("<SetVar Name=\\"Cmd\\">Delete</SetVar>");
sbDelete.Append("</Method>");
}
sbDelete.Append("</Batch>");
web.ProcessBatchData(sbDelete.ToString());
}
}
}
To create many items as quickly as possible you can use the same ProcessBatchData, e.g. in a console application:
//takes approx. 02m:35s
internal static void Add20Thousends()
{
using (var site = new SPSite("http://takana"))
{
using (var web = site.OpenWeb())
{
var lst = web.Lists\["Oppgaver"\];
var sb = new StringBuilder();
sb.Append("<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?><Batch>");
var listguid = lst.ID.ToString();
for (var i = 20000; i < 40000; i++)
{
sb.Append("<Method>");
sb.Append("<SetList>" + listguid + "</SetList>");
sb.Append("<SetVar Name=\\"ID\\">New</SetVar>");
sb.Append("<SetVar Name=\\"Cmd\\">Save</SetVar>");
sb.Append("<SetVar Name=\\"urn:schemas-microsoft-com:office:office#Title\\">" +
string.Format("task {0:00000}", i+1) +
"</SetVar>");
sb.Append("</Method>");
}
sb.Append("</Batch>");
Console.WriteLine("Sending batch job");
web.ProcessBatchData(sb.ToString());
}
}
}
Or even better in a powershell script:
if(-not(Get-PSSnapin |
Where { $\_.Name -eq "Microsoft.SharePoint.PowerShell"})) {
Add-PSSnapin Microsoft.SharePoint.PowerShell
}
$web = get-spweb http://dev
$list = $web.Lists\["Oppgaver"\]
$listid = $list.ID.ToString()
$b = '<?xml version="1.0" encoding="UTF-8"?><Batch>'
$cmd = '<Method>'
$cmd += '<SetList>{0}</SetList>'
$cmd += '<SetVar Name="ID">New</SetVar>'
$cmd += '<SetVar Name="Cmd">Save</SetVar>'
$cmd += '<SetVar Name="urn:schemas-microsoft-com:office:office#Title">task {1:00000}</SetVar>'
$cmd += '<SetVar Name="urn:schemas-microsoft-com:office:office#DueDate">{2}</SetVar>'
$cmd += '</Method>'
$i = 0
$d = \[DateTime\]::UtcNow
$d = $d.AddDays(-30)
$iso = \[string\]::concat($d.ToString("s"), "Z")
while ($i -lt 1000) {
$b += $cmd -f $listid, ++$i, $iso
}
$b += '</Batch>'
$web.ProcessBatchData($b)
Pay attention to quotation marks, the attributes must be double quoted otherwise you’ll get an exception:
Exception calling "ProcessBatchData" with "1" argument(s): "0x80070057"
At line:1 char:22
+ $web.processbatchdata <<<< ($b)
+ CategoryInfo : NotSpecified: (:) \[\], MethodInvocationException
+ FullyQualifiedErrorId : DotNetMethodException
If you wonder how to access specific columns read the RPC documentation on msdn. DueDate field is accepting a date in ISO 8601 format. This batch functionality can be used for populating lists in development environments. By the way, here I can recommend a nice application from codeplex: SharePointTestDataTool.sharepointtestdatatool
Comments from Wordpress.com
Mattias - Jan 4, 2012
My code look something like this and it works when i tried delete 40000 items.
private void DeleteStuff(SPWeb web)
{
StringBuilder sbDelete;
var query = new SPQuery
query.Query = "Something";
query.RowLimit = 5000
do
{
sbDelete = new StringBuilder;
sbDelete = sbDelete.Append("<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?><Batch OnError=\\"Return\\">");
var myItems = lst.GetItems(query);
foreach (SPListItem item in myItems) // note use enumeration
{
sbDelete.Append("<Method>");
sbDelete.Append("<SetList Scope=\\"Request\\">" + Convert.ToString(item.ParentList.ID) +
"</SetList>");
sbDelete.Append("<<SetVar Name=\\"ID\\">" + Convert.ToString(item.ID) + "</SetVar");
sbDelete.Append("<SetVar Name=\\"Cmd\\">Save</SetVar>");
sbDelete.Append("<SetVar Name=\\"Cmd\\">Delete</SetVar>");
sbDelete.Append("</Method>");
}
sbDelete.Append("</Batch>");
web.ProcessBatchData(sbDelete.ToString());
query.ListItemCollectionPosition = myItems.ListItemCollectionPosition;
} while (query.ListItemCollectionPosition != null);
}
David - Jan 2, 2012
Very good stuff, thank you!
Anatoly Mironov - Jan 2, 2012
Cool that you liked it!
Anatoly Mironov - Jan 4, 2012
Thank you Mattias for your comment and improvement suggestsions. Do you mean a kind of for loop which invokes BatchRemove which only “takes care” of 1000 items at a time?
Mattias - Jan 4, 2012
I’ve tried similiar code to delete 40 000 objects and it works great. Make sure to use the rowlimit property, otherwise if the list becomes to big to handle it times out. I use this in a timerjob. Also, instead of if(bcount > 1000), do it in the rowlimit instead. Rowlimit = 1000 and run the method without “if(bcount>1000)”
Anatoly Mironov - Nov 2, 2012
Thanks. Glad to hear!
ponniprakasam - Nov 2, 2012
Good Work.Highly Useful Post