| Andrew Cooke | Contents | Latest | RSS | Twitter | Previous | Next

C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Lepl parser for Python.

Colorless Green.

Photography around Santiago.

SVG experiment.

Professional Portfolio

Calibration of seismometers.

Data access via web services.

Cache rewrite.

Extending OpenSSH.

Last 100 entries

The Black Dork Lives!; The UN Requires Unaninmous Decisions; LPIR - Steganography in Practice; How I Am 6; Clear Explanation of Verizon / Level 3 / Netflix; Teenage Girls; Formalising NSA Attacks; Switching Brakes (Tektro Hydraulic); Naim NAP 100 (Power Amp); AKG 550 First Impressions; Facebook manipulates emotions (no really); Map Reduce "No Longer Used" At Google; Removing RAID metadata; New Bike (Good Bike Shop, Santiago Chile); Removing APE Tags in Linux; Compiling Python 3.0 With GCC 4.8; Maven is Amazing; Generating Docs from a GitHub Wiki; Modular Shelves; Bash Best Practices; Good Emergency Gasfiter (Santiago, Chile); Readings in Recent Architecture; Roger Casement; Integrated Information Theory (Or Not); Possibly undefined macro AC_ENABLE_SHARED; Update on Charges; Sunburst Visualisation; Spectral Embeddings (Distances -> Coordinates); Introduction to Causality; Filtering To Help Colour-Blindness; ASUS 1015E-DS02 Too; Ready Player One; Writing Clear, Fast Julia Code; List of LatAm Novels; Running (for women); Building a Jenkins Plugin and a Jar (for Command Line use); Headphone Test Recordings; Causal Consistency; The Quest for Randomness; Chat Wars; Real-life Financial Co Without ACID Database...; Flexible Muscle-Based Locomotion for Bipedal Creatures; SQL Performance Explained; The Little Manual of API Design; Multiple Word Sizes; CRC - Next Steps; FizzBuzz; Update on CRCs; Decent Links / Discussion Community; Automated Reasoning About LLVM Optimizations and Undefined Behavior; A Painless Guide To CRC Error Detection Algorithms; Tests in Julia; Dave Eggers: what's so funny about peace, love and Starship?; Cello - High Level C Programming; autoreconf needs tar; Will Self Goes To Heathrow; Top 5 BioInformatics Papers; Vasovagal Response; Good Food in Vina; Chilean Drug Criminals Use Subsitution Cipher; Adrenaline; Stiglitz on the Impact of Technology; Why Not; How I Am 5; Lenovo X240 OpenSuse 13.1; NSA and GCHQ - Psychological Trolls; Finite Fields in Julia (Defining Your Own Number Type); Julian Assange; Starting Qemu on OpenSuse; Noisy GAs/TMs; Venezuela; Reinstalling GRUB with EFI; Instructions For Disabling KDE Indexing; Evolving Speakers; Changing Salt Size in Simple Crypt 3.0.0; Logarithmic Map (Moved); More Info; Words Found in Voynich Manuscript; An Inventory Of 3D Space-Filling Curves; Foxes Using Magnetic Fields To Hunt; 5 Rounds RC5 No Rotation; JP Morgan and Madoff; Ori - Secure, Distributed File System; Physical Unclonable Functions (PUFs); Prejudice on Reddit; Recursion OK; Optimizing Julia Code; Cash Handouts in Brazil; Couple Nice Music Videos; It Also Works!; Adaptive Plaintext; It Works!; RC5 Without Rotation (2); 8 Years...; Attack Against Encrypted Linux Disks; Pushing Back On NSA At IETF; Summary of Experimental Ethics; Very Good Talk On Security, Snowden; Locusts are Grasshoppers!; Vagrant (OpenSuse and IDEs); Interesting Take On Mandela's Context

© 2006-2013 Andrew Cooke (site) / post authors (content).

Optimising Clojure

From: andrew cooke <andrew@...>

Date: Wed, 31 Aug 2011 08:54:22 -0300

I had some Clojure code that needed optimising.  Eventually I achieved a 5x
speedup.  This is how.


Finding Something to Measure
----------------------------

First, I needed something to measure, so that I could tell how effective my
changes were.  This was complicated by three things: the need to generate test
data to process; the delayed action of the JIT; garbage collection.

Generating test data is quite expensive, so I needed to carefully tune
parameters so that my processing used more time than the test data generation
(not necessary for the simple timings, but important for profiling, since I
could find no simple way to delay profiling to start after the data was
ready).  Then I placed the processing in a loop, repeating it multiple times
(re-using the same test data).  Each pass through the loop was timed - the
successive times let me see when the JIT engaged (typically the first
calculation was 50% slower) and also reduce the effect of GC by selecting the
lowest value.


Profiling
---------

Once I had a good test running in my IDE, I needed to make standalone Java
classes to profile.  The simplest way to do this seems to be to use a build
tool called leiningen - https://github.com/technomancy/leiningen - which was
very easy to use.  You just download and run it, create an empty project, and
modify the project descriptor file (which I then copied into my existing
project).

My file contents are:

  (defproject compressive "0.0.0"
    :description "set-related experiments"
    :aot [#"com.isti.compset.*"]
    :main com.isti.compset.stack
    :dependencies [[org.clojure/clojure "1.3.0-beta1"]
		   [org.clojure/clojure-contrib "1.2.0"]])

where:

  - aot indicates which classes need compiling
  - main indicates the location of the "main" file
  - dependencies are downloaded automatically

I also added the (-main) method and put (:gen-class) in my ns at the top of
the file.

With all that, "lein compile" generated a set of classes that could be run
with java:

  java -cp classes:lib/clojure-1.3.0-beta1.jar \
    -agentlib:hprof=cpu=samples,depth=10,file=hprof com.isti.compset.stack

giving a profile in the file "hprof".

This profile was dominated by expected methods related to generating
and consuming sequences, and adding and multiplying generic numbers.


Vector of Doubles
-----------------

The central loop in my code sums various spectra and the square of their
values.  The spectra were stored as "vec" instances.  I hope to remove the
use of generic numbers by replacing this with a vector of doubles and, if
necessary, adding some type annotations to various functions.

So I replaced "vec" with

  (defn d-vec [collection]
    (apply conj (vector-of :double) collection))

However, this did not have the desired effect - the code was
significantly slower and the trace was dominated by calls to "Vec.count".

I asked for help on stack overflow http://stackoverflow.com/q/7223297/181772
but didn't get any very useful response, so I removed these changes.


Array of Doubles
----------------

At this point I was a little disheartened.  The traces didn't have much useful
information (that I could see, even using a nice little tool called PerfAnal
http://java.sun.com/developer/technicalArticles/Programming/perfanal/).

So I looked at my inner loop and decided that I would simply rewrite it as I
thought it should be written, relying on experience rather than profiling.  I
also found this page http://clojure.org/java_interop on the use of arrays
helpful.

I made the following changes:

 - replace the vector in which I was accumulating results with a native array
   of doubles (using double-array)

 - mutate the accumulator rather than creating a new instance each loop

 - use additional indices to identify the "active range" of the accumulator
   (which may get smaller over time, due to restricted data ranges in
   spectra).    This had not been necessary when I was generating new
   immutable instances - the instances simply decreased in size.

 - change the spectra from vectors to arrays.

 - replace some maps and reduces with explicit loops that mutate data.

This was, initially a complete disaster.  Running time increased by roughly a
hundredfold.


Type Annotations
----------------

Rather than reverting these changes, which seemed justified from experience, I
looked again at profiling.  The profile was now dominated by methods related
to java introspection.

This is a known issue with Clojure and has a simple solution - add

  (set! *warn-on-reflection* true)

and remove all warnings by adding type hints.  I did not enable this for all
code - only for the central loop and related functions.  And I used the
"^doubles" hint which indicates a native array of doubles.

This, finally, gave a significant speedup - about 5x faster than the initial
code.


Conclusions
-----------

 - Clojure code is simplest, and reasonably efficient, when you stay within
   Clojure's own types (vec, seq etc).

 - Profiling with hprof can give broad information, but I, at least, couldn't
   understand it in detail (I normally can read profiling data!) - there
   seemed to be information missing, or I simply do not understand how Clojure
   works.

 - It was fairly simple to change the code to use typed, mutable data in the
   inner loop, which gave significant speedups.

 - Dynamic interop with Java is apallingly slow, but easy to fix once you see
   that it is happening.

 - More generally, Clojure works fairly well for exploratory data processing.
   It's nicely high-level and easy to work with, and can be optimised where
   necessary.  I have also implemented this particular algorithm in C and on
   GPUs, and I am sure it is faster in both cases, but (without doing any
   formal measurements) the code feels close enough to C for me to experiment
   with new ideas quickly.  I doubt this would have been possible in Python,
   for example (even with numpy - the inner loop is annoyingly hard to
   vectorize)

More on (vector-of :double)

From: andrew cooke <andrew@...>

Date: Wed, 31 Aug 2011 22:50:48 -0300

Justin Kramer at http://stackoverflow.com/q/7223297/181772 sugegsted using 

  (into (vector-of :double) collection)

to create vectors of doubles.  This seemed like a good idea for the immutable
spectra (it's probably not clear above but I introduced arrays of soubles in
two places - the mutable accumulator used to sum spectra, and the indvidual
spectra themselves).  However, when I tried it, I found that there was no way
to add type hints for these vectors.  That led to warnings of dynamic code
when the contents were used.

So using arrays of doubles is heping in two ways - it's supporting fast,
mutable access and also helping the compiler infer (from the ^doubles hint for
the array) the tyoe of the content.

Andrew

PS I posted this on HN where it gained some view, but no useful comments.

Comment on this post