Andrew Cooke | Contents | Latest | RSS | Previous | Next

C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Choochoo Training Diary

Last 100 entries

Surprise Paradox; [Books] Good Author List; [Computing] Efficient queries with grouping in Postgres; [Computing] Automatic Wake (Linux); [Computing] AWS CDK Aspects in Go; [Bike] Adidas Gravel Shoes; [Computing, Horror] Biological Chips; [Books] Weird Lit Recs; [Covid] Extended SIR Models; [Art] York-based Printmaker; [Physics] Quantum Transitions are not Instantaneous; [Computing] AI and Drum Machines; [Computing] Probabilities, Stopping Times, Martingales; bpftrace Intro Article; [Computing] Starlab Systems - Linux Laptops; [Computing] Extended Berkeley Packet Filter; [Green] Mainspring Linear Generator; Better Approach; Rummikub Solver; Chilean Poetry; Felicitations - Empowerment Grant; [Bike] Fixing Spyre Brakes (That Need Constant Adjustment); [Computing, Music] Raspberry Pi Media (Audio) Streamer; [Computing] Amazing Hack To Embed DSL In Python; [Bike] Ruta Del Condor (El Alfalfal); [Bike] Estimating Power On Climbs; [Computing] Applying Azure B2C Authentication To Function Apps; [Bike] Gearing On The Back Of An Envelope; [Computing] Okular and Postscript in OpenSuse; There's a fix!; [Computing] Fail2Ban on OpenSuse Leap 15.3 (NFTables); [Cycling, Computing] Power Calculation and Brakes; [Hardware, Computing] Amazing Pockit Computer; Bullying; How I Am - 3 Years Post Accident, 8+ Years With MS; [USA Politics] In America's Uncivil War Republicans Are The Aggressors; [Programming] Selenium and Python; Better Walking Data; [Bike] How Fast Before Walking More Efficient Than Cycling?; [COVID] Coronavirus And Cycling; [Programming] Docker on OpenSuse; Cadence v Speed; [Bike] Gearing For Real Cyclists; [Programming] React plotting - visx; [Programming] React Leaflet; AliExpress Independent Sellers; Applebaum - Twilight of Democracy; [Politics] Back + US Elections; [Programming,Exercise] Simple Timer Script; [News] 2019: The year revolt went global; [Politics] The world's most-surveilled cities; [Bike] Hope Freehub; [Restaurant] Mama Chau's (Chinese, Providencia); [Politics] Brexit Podcast; [Diary] Pneumonia; [Politics] Britain's Reichstag Fire moment; install cairo; [Programming] GCC Sanitizer Flags; [GPU, Programming] Per-Thread Program Counters; My Bike Accident - Looking Back One Year; [Python] Geographic heights are incredibly easy!; [Cooking] Cookie Recipe; Efficient, Simple, Directed Maximisation of Noisy Function; And for argparse; Bash Completion in Python; [Computing] Configuring Github Jekyll Locally; [Maths, Link] The Napkin Project; You can Masquerade in Firewalld; [Bike] Servicing Budget (Spring) Forks; [Crypto] CIA Internet Comms Failure; [Python] Cute Rate Limiting API; [Causality] Judea Pearl Lecture; [Security, Computing] Chinese Hardware Hack Of Supermicro Boards; SQLAlchemy Joined Table Inheritance and Delete Cascade; [Translation] The Club; [Computing] Super Potato Bruh; [Computing] Extending Jupyter; Further HRM Details; [Computing, Bike] Activities in ch2; [Books, Link] Modern Japanese Lit; What ended up there; [Link, Book] Logic Book; Update - Garmin Express / Connect; Garmin Forerunner 35 v 230; [Link, Politics, Internet] Government Trolls; [Link, Politics] Why identity politics benefits the right more than the left; SSH Forwarding; A Specification For Repeating Events; A Fight for the Soul of Science; [Science, Book, Link] Lost In Math; OpenSuse Leap 15 Network Fixes; Update; [Book] Galileo's Middle Finger; [Bike] Chinese Carbon Rims; [Bike] Servicing Shimano XT Front Hub HB-M8010; [Bike] Aliexpress Cycling Tops; [Computing] Change to ssh handling of multiple identities?; [Bike] Endura Hummvee Lite II; [Computing] Marble Based Logic; [Link, Politics] Sanity Check For Nuclear Launch; [Link, Science] Entropy and Life

© 2006-2017 Andrew Cooke (site) / post authors (content).

Writing Clear, Fast Julia Code

From: andrew cooke <andrew@...>

Date: Wed, 30 Apr 2014 20:22:53 -0400

Disclaimer: Let’s get something clear. This is about writing a Julia package, not about doing data analysis. Julia means different things to different people - this post is about using Julia as a “real programming language” (not for scripting, not for data analysis).

Every language has its own set of idioms - its own approach to structuring programs. If you’re an experienced developer you’re fluent in the languages you use every day. Writing good, idiomatic code comes naturally. But working with a new language is a shock - suddenly things don’t fit together. Your programs are uncomfortable, ungainly, unbalanced.

This was my problem. I wrote a Julia library to calculate CRCs (checksums). But things didn’t turn out as I expected - the maths was fairly easy, but my code was an ugly mess.

Worse, my code was an inefficient, ugly mess. And making it faster was making it uglier and uglier.

Eventually I stalled. I spent a week (or two) rewriting, trying to work out how to better express my ideas in Julia. Eventually made something I was happy with - https://github.com/andrewcooke/CRC.jl/blob/master/src/CRC.jl.

The new code is 2/3 the size (excluding new functionality I added later) and 2.5x faster. It’s also easier to understand.

Here I want to record what I learned. There’s little that’s really new. But still, it’s useful to collect things in one place.

1 - Simplify

Obvious, yes, but it’s easy to get carried away. Julia lets you write code that is independent of the size of integers; the compiler can automatically generate routines for different types (8, 16, 32, 64, 128 bits). So it seemed natural to write a CRC routine that could calculate checksums not just for bytes, but also for streams of 16, 32, 64 and 128 bit values.

Then there’s the size of the polynomial. And the size of the intermediate accumulator. And the size of the index into the lookup table. I let the user choose all of these, and soon had a combinatorial explosion of different integer types, requiring checks for inconsistencies and tweaked algorithms for special cases.

So I backed up; stripped things to the bone. Now, data are only bytes (all that anyone uses anyway). Lookup table indices are 8 bits (16 bits uses too much cache for too little gain). Only the accumulator size can vary (and is chosen internally - it must be large enough to contain the polynomial).

2 - “If” Statements Are A Code Smell

Simplified, my code still had too much gnarly logic. Too many “if” statements.

In an OO language, having many “ifs” often means you need classes for different cases. Then the conditionals change to method calls (the type of the object selecting which code to execute without an explicit “if” statement).

In Julia the equivalent is to dispatch on types. So instead of having one function with many logical paths, I have several, smaller, simpler functions, each to handle a particular type.

3 - Untangle Types

At this point I had a bunch of functions that dispatched on a complicated, parameterized type that described everything about the CRC.

When I looked more closely there were two separate (orthogonal) ideas that “drove” the CRC calculation.

First, whether the input data are reflected or not. This changes how we do the calculation (reflected data are not reflected byte by byte - that would be much too slow - instead “all the rest of the maths” is reflected, and the data are left untouched).

Second, whether we use 0, 1 or multiple lookup tables.

So I broke my complicated type into two: one had two subtypes (for the two directions); the other had three subtypes (for 0, 1 and many tables). Which gave 6 different versions of the main routine (called “extend”), one for each combination of the two types.

This was a huge step. Suddenly “extend” (the main loop for calculating the checksum) became much simpler. The separation of logic was obvious. Everything “just made sense”.

4 - Types Are For Data

But still, I had duplication. A handful of details (like the amount of padding needed to align data with the accumulator) were being re-calculated in multiple places.

Looking more carefully, there were two sets of values, corresponding to whether the data were reflected are not (indicated by one of the two types above). So it made sense to place the values inside the types, calculating them once in the appropriate constructors.

At this point it was easy to see the parallel between standard OO design and Julia’s types / methods. Each type was like a class; these variables were instance attributes. But Julia’s approach made it easier to also handle the lookup tables, dispatching on both together.

5 - Efficiency Requires Typing

Now the code was looking pretty good. As a bonus, the six different functions that calculated the checksums (one for each combination of the two main types) had all arguments clearly typed.

This is critical for high performance Julia code - you want the compiler to be able to figure out the exact type of everything in the inner loop. And you want those types to be single, concrete values (not unions).

A side note - This is the great power of Julia. In high level routines you can write generic code like you would in, say, Python. And then, when you get down to the innermost loops, you can write code that is effectively C with a nice syntax. It’s not that the authors of Julia have found a way to magically make elegant, abstract, code perfectly efficient. They have not. Instead they have made a language which can seamlessly transition from elegant abstraction to low-level, painstaking detail.

6 - Writing Fast Code Hasn’t Changed Much

Continuing with the point above, there’s nothing unusual about writing fast code in Julia (at least most of the time; it’s a new language and some bugs are still being worked out, so you can see the occasional strangeness). For example:

  • Bounds checking has a cost, of course, but can be disabled with the @inbounds macro (assuming you know the code is correct).
  • Loop unrolling is easy with the @nexprs macro in Base.Cartesian (see the examples in CRC.jl).
  • Allocation and GC are not a good idea in the inner loop. Pre-allocate arrays as necessary.
  • Function calls are not free. Prefer iteration over recursion. Inlining of small functions works, mostly (though there are some related bugs which I suspect won’t be completely fixed til the 0.4 release).
  • In critical code, directly indexing arrays (with @inbounds) seems to be slightly faster than the iterator protocol (it also makes typing the container easier).
  • Julia’s core library routines are very good. Smart people are putting a lot of effort into making them efficient.
  • There’s a useful (if somewhat clunky) graphical profiler - see the ProfileView package.

Summary

Pulling everything into one list:

  1. Simplify. Simple, clear code can be made fast.
  2. Dispatch on types instead of using explicit conditionals.
  3. Untangle types - you can dispatch on combinations of types.
  4. Store data in types (they are Julia’s objects).
  5. The inner loop needs to be clearly typed.
  6. At a low level, efficient Julia code is much like C or Fortran.

Hope that helps!

Andrew

Thanks, this does help.

From: "Campbell, Keith C." <keithc@...>

Date: Thu, 1 May 2014 12:25:20 +0000

I particularly like your discussion of performance in section 5 'Efficiency=
 Requires Typing'.  Understanding how Julia can deliver good performance, a=
nd what is required to get there, can be confusing and frustrating.  Your e=
xplanation is nice and clear.

Comment on this post