## Running "find" in Parallel

From: andrew cooke <andrew@...>

Date: Mon, 21 Sep 2009 19:19:16 -0400

It's not uncommon to want to process a whole pile of files in some
way.  Typically I do something like:

find . -name "*.foo" -exec do_something \{} \;

But this processes each file in turn.  If you've got a multi-core
computer and look at, say, the output from "htop" (a nicer version of
"top") you'll see just one core is being used.
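To make the one-invocation-per-file behaviour visible, here's a throwaway sketch (a scratch directory and a tiny sh wrapper standing in for do_something; it prints the wrapper's PID, which differs per call):

```shell
# Create a scratch directory with a couple of matching files.
dir=$(mktemp -d)
touch "$dir/a.foo" "$dir/b.foo"

# With \; find runs the command once per file: two different PIDs print.
find "$dir" -name "*.foo" -exec sh -c 'echo "$$: $1"' sh {} \;

rm -r "$dir"
```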

I can't find a way to do this with find alone, but "xargs" has an
option "-P" which sets how many processes it will run at once.  So the
following uses up to 4 processes instead of just 1:

find . -name "*.foo" -print0 | xargs -0 -P4 do_something
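Incidentally, the "-print0" / "-0" pair is doing real work here: it keeps filenames containing spaces as single arguments.  A quick check (throwaway directory, counting arguments with a tiny sh script):

```shell
dir=$(mktemp -d)
touch "$dir/has space.foo"

# Plain pipe: xargs splits on whitespace, so one file looks like two args.
find "$dir" -name "*.foo" | xargs sh -c 'echo "$# args"' sh          # 2 args

# NUL-separated: the filename survives as a single argument.
find "$dir" -name "*.foo" -print0 | xargs -0 sh -c 'echo "$# args"' sh  # 1 arg

rm -r "$dir"
```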

However, it's not quite equivalent.  That passes a whole pile of
arguments to each call of "do_something", while the first "find"
example runs it once per file.  It's kind of irrelevant here, but it
means the above is really a parallel version of:

find . -name "*.foo" -exec do_something \{} +
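You can see the difference between the two "-exec" forms with a throwaway directory and echo standing in for do_something:

```shell
dir=$(mktemp -d)
touch "$dir/a.foo" "$dir/b.foo" "$dir/c.foo"

# \; runs the command once per file: three lines of output.
find "$dir" -name "*.foo" -exec echo {} \; | wc -l

# + batches the filenames into as few invocations as possible: one line here.
find "$dir" -name "*.foo" -exec echo {} + | wc -l

rm -r "$dir"
```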

Also, the man page for find suggests using "-execdir" rather than
"-exec", since it runs the command from the directory containing each
file, which is more secure.

I think you can get the same thing with xargs by using "-n1".  So

find . -name "*.foo" -print0 | xargs -0 -P4 -n1 do_something

should start a new "do_something" for each arg, but keep 4 of them "on
the boil"...
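A cheap way to check that the processes really do run in parallel is to substitute "sleep 1" for do_something (the filename argument is simply ignored by the sh wrapper) and time it:

```shell
dir=$(mktemp -d)
touch "$dir/a.foo" "$dir/b.foo" "$dir/c.foo" "$dir/d.foo"

t0=$(date +%s)
# 4 files x 1 second each, but up to 4 at once: about 1 second total.
find "$dir" -name "*.foo" -print0 | xargs -0 -P4 -n1 sh -c 'sleep 1' sh
t1=$(date +%s)
echo "elapsed: $((t1 - t0))s"

rm -r "$dir"
```

Drop the "-P4" and the same run takes about 4 seconds.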

Andrew