# C[omp]ute


© 2006-2013 Andrew Cooke (site) / post authors (content).

## Sector/Sphere - Distributed Computing on Widespread, Heterogeneous Networks

From: "andrew cooke" <andrew@...>

Date: Sun, 17 May 2009 16:59:02 -0400 (CLT)

Some snippets from the manual:

"Sector can be regarded as a distributed storage/file system. It supports
most file system semantics"

"When a file is uploaded to Sector from one location, the file will be
duplicated multiple times on other nodes. Sector duplicates files [on]
clusters as far away as possible. If the system is deployed over wide area
networks, Sector can be used [as] a content distribution network."
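
To get a feel for what "as far away as possible" might mean, here is a minimal sketch in Python. It is not Sector's actual placement logic, which the snippet above doesn't describe; the cluster names, the `DISTANCE` table and `pick_replica_clusters` are all made up for illustration. It greedily picks, for each copy, the cluster whose nearest already-used cluster is farthest away.

```python
from typing import Dict, List

# Hypothetical inter-cluster "distances" (say, network hops); not from the manual.
DISTANCE: Dict[str, Dict[str, int]] = {
    "chicago": {"denver": 1, "london": 4, "tokyo": 6},
    "denver": {"chicago": 1, "london": 5, "tokyo": 5},
    "london": {"chicago": 4, "denver": 5, "tokyo": 6},
    "tokyo": {"chicago": 6, "denver": 5, "london": 6},
}


def pick_replica_clusters(
    source: str, distance: Dict[str, Dict[str, int]], copies: int
) -> List[str]:
    """Greedy farthest-first choice of clusters to hold copies of a file."""
    chosen = [source]  # the upload location already holds the file
    # Sorted so that ties are broken the same way on every run.
    candidates = sorted(c for c in distance if c != source)
    while candidates and len(chosen) <= copies:
        # Score each candidate by its distance to the closest cluster already
        # holding the file, then take the candidate with the largest score.
        best = max(candidates, key=lambda c: min(distance[c][s] for s in chosen))
        chosen.append(best)
        candidates.remove(best)
    return chosen[1:]  # only the new replica locations


print(pick_replica_clusters("chicago", DISTANCE, 2))  # prints ['tokyo', 'london']
```

Denver is skipped here because it sits right next to the upload location, which is the behaviour you'd want if the replicas are also meant to serve as a content distribution network.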

"consider the following example application. Assume we have a collection
of astronomical images from the Sloan Digital Sky Survey and the goal is
to find brown dwarfs (a stellar object) in these images. The SDSS dataset
is stored in N files, named SDSS1.dat, …, SDSSn.dat, each consisting [of]
one or more images."

"At the abstract level, Sphere allows users to define a data processing
function (UDF, or user defined function) that accepts a group of Sector
files as inputs, and writes the result to a group of Sector files as
outputs, when necessary. We use the term Sphere streams (or simplified as
streams) to describe the group of Sector files, either as inputs or
outputs.

Inputs -> UDF -> Outputs

In the paradigm above, both the inputs and the outputs can be single or
multiple streams, whereas the outputs of one Sphere processing can be the
input of the next processing"
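
A rough sketch of the Inputs -> UDF -> Outputs idea, in plain Python on a single machine. This is not the Sphere API: `Stream`, `run_udf` and both example functions are hypothetical names, a "stream" is modelled as a dict from file name to bytes, and the "find dim pixels" rule is a toy stand-in for real brown dwarf detection. It only shows how a UDF maps one group of files to another and how the output of one step feeds the next.

```python
from typing import Callable, Dict

# Stand-in for a Sphere stream: a group of files, file name -> contents.
Stream = Dict[str, bytes]
# A user-defined function sees one file at a time and returns its result.
UDF = Callable[[str, bytes], bytes]


def run_udf(inputs: Stream, udf: UDF, suffix: str) -> Stream:
    """Apply udf to every file in the input stream, producing an output stream."""
    return {name + suffix: udf(name, data) for name, data in inputs.items()}


def find_candidates(name: str, data: bytes) -> bytes:
    """Toy UDF: keep only 'pixels' dimmer than an arbitrary threshold."""
    return bytes(b for b in data if b < 50)


def count_candidates(name: str, data: bytes) -> bytes:
    """Toy UDF: report how many candidates the previous step kept."""
    return str(len(data)).encode()


# Fake image data, named after the files in the manual's example.
images: Stream = {
    "SDSS1.dat": bytes([10, 200, 30]),
    "SDSS2.dat": bytes([90, 40]),
}

# The outputs of one processing step are the inputs of the next.
candidates = run_udf(images, find_candidates, ".cand")
counts = run_udf(candidates, count_candidates, ".count")
print(counts)  # prints {'SDSS1.dat.cand.count': b'2', 'SDSS2.dat.cand.count': b'1'}
```

In the real system the files would presumably be processed on the nodes that store them; here everything runs in one process, so only the shape of the data flow carries over.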

http://sector.sourceforge.net/doc/index.htm
http://sector.sourceforge.net/doc.html

"Simple.  Distributed"

Andrew