© 2006-2013 Andrew Cooke (site) / post authors (content).

Lessons Learned from AppEngine's Data Store

From: andrew cooke <andrew@...>

Date: Tue, 2 Aug 2011 19:57:31 -0400

This is a brief summary of the things I've learnt while using Google's
AppEngine Data Store - a "NoSQL" database designed for high performance.

  1 - Do this!  I was wary of AppEngine because of lock-in, etc., but you can
      easily get Django working, which avoids learning a whole new framework,
      and Django non-rel promises to free you from lock-in completely, if
      needed (but see below).

      No amount of reading about "NoSQL" taught me what I learnt writing code
      - if you're a programmer, the GAE Data Store is a great intro.

  2 - Think hard about how your application works up-front.  This is a big
      shift from SQL, where you probably had a logical, normalised,
      independent data model and then mapped between that and your
      application with SQL.  You can't do that with the Data Store.  Instead,
      you need to design the data model around the actions that occur in your
      application.

      In other words: with SQL you have the luxury of a layer of isolation
      (SQL) between your database and your data access objects.  With the
      Data Store, your database maps directly to your data access objects.

  3 - Think hard about where you need transactions, and where not.  Again,
      this is reflected directly in the data model.  The one aspect of the
      Data Store that has impressed me most is how Google has managed to
      combine scalability with transactions.

      For me, the necessary structure was pretty clear - I have users that
      "own" certain objects, so those "owned" objects are children of the
      users.  This lets me guarantee consistency where I need it (where users
      can see an account balance, for example).  Separate from that, and free
      of any transactions or trees, are the main data in my application.
      These are not guaranteed to be immediately consistent, but are much more
      efficiently handled.  The data model reflects all this.

  4 - Think about how caching can fail.  This isn't NoSQL-specific, but it's
      important anyway: caching gets easier the less strict you are about
      behaviour.  Choose the design so that if you cache too much, or for too
      long, it's not a problem - make it generous by default (so, for example,
      I have a resource that expires after a certain time, but I don't care
      whether caching extends that - what is important is that I never
      over-restrict a user).

      Related: use negative caching only where it is absolutely critical.
      It's so easy to get in a mess here...

  5 - Carefully choose the keys for your cache.  They should reflect the
      entire state you are caching, so that you don't need to worry about
      retrieving inconsistent data.

  6 - Clean out your database in a separate thread.  Omit non-critical
      write/delete operations from views.  Instead, delegate them to a
      background worker task.  This is particularly true when deleting - it's
      a slow, painful process to delete large amounts of data from the store.

      Inconsistency is your friend.  Much of your code has to work assuming
      inconsistent data anyway, so consider turning it to your advantage -
      assume very little and then tidy things later in a separate, batch task.

  7 - Don't rely on Django non-rel until you understand the store without it.
      The non-rel package was a great help when I was starting - my initial
      code looked like a nice, familiar Django project.  Then I began
      wondering just what "eventual consistency" might mean and realised I had
      some very nasty bugs, because non-rel doesn't currently support
      transactions.
      And even when transactions are added to non-rel (they are work in
      progress), I would suggest using the basic models Google provides until
      you understand the system in detail.  Despite reading much of the
      documentation I really didn't grasp how everything worked until I had
      used the API.

      So I would suggest the following: if it helps, start with non-rel to get
      your first views working; switch to Google's models until you understand
      transactions; go back to non-rel if and when you are confident it makes
      sense.
      [Beware that the non-rel and related packages bundle Django 1.3, while
      1.2 is the latest supported directly by AppEngine.  I switched back to
      1.2 when I switched models, and it was worth it just for the reduced
      deploy time.]

In summary:

 - Your data model must reflect the actions your code performs.
 - Your data model must reflect the transactions your code needs.
 - Simplify cache use with careful key choice and relaxed behaviour.
 - Don't try to keep everything consistent in your views - delegate 
   "clean-up" to worker tasks.
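
Item 2 above argues that the data model should be shaped by what the application does. As a minimal sketch in plain Python (an in-memory dict stands in for the Data Store; `Icon`, `Store` and `my_icons_page` are hypothetical names, not AppEngine's API), the entity below carries everything a "my icons" page displays, so that page is answered by one filter plus one sort, with no join:

```python
from dataclasses import dataclass, field


@dataclass
class Icon:
    # Hypothetical entity shaped around the "my icons" page.
    icon_id: str
    owner: str          # filter property for the page's query
    owner_display: str  # copied from the user at write time (denormalised)
    created: int        # sort property for the page's query


@dataclass
class Store:
    # In-memory stand-in for the Data Store: id -> entity.
    icons: dict = field(default_factory=dict)

    def put(self, icon: Icon) -> None:
        self.icons[icon.icon_id] = icon

    def query_by_owner(self, owner: str) -> list:
        # One equality filter and one sort order: the kind of query an
        # index can serve directly, without joining to a users table.
        hits = [i for i in self.icons.values() if i.owner == owner]
        return sorted(hits, key=lambda i: i.created, reverse=True)


def my_icons_page(store: Store, owner: str) -> list:
    # The view reads exactly what it shows; no second lookup per icon.
    return [(i.icon_id, i.owner_display) for i in store.query_by_owner(owner)]
```

The cost is that `owner_display` must be rewritten on every icon if a user changes their name: cheap reads in exchange for more expensive, less consistent writes, which is the shift away from a normalised SQL model that the item describes.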
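
Item 3 above describes making each user's "owned" objects children of that user so that they can be updated together in a transaction. The sketch below models that in plain Python under a simplifying assumption about the Data Store of that period: a key is a path of (kind, id) pairs, the first pair identifies the entity group, and a transaction may only touch one group. `SingleGroupTransaction` and `group_of` are hypothetical illustrations, not Google's API:

```python
# A key is a path of (kind, id) pairs; its first pair names the entity group.
# e.g. (("User", "alice"), ("Balance", "main")) lives in alice's group.


def group_of(key):
    return key[0]


class SingleGroupTransaction:
    """Hypothetical transaction that may only touch one entity group."""

    def __init__(self, store):
        self.store = store      # dict: key -> value
        self.group = None       # fixed by the first key used
        self.writes = {}        # buffered until commit

    def _check(self, key):
        if self.group is None:
            self.group = group_of(key)
        elif group_of(key) != self.group:
            raise ValueError(f"{key} is outside entity group {self.group}")

    def get(self, key):
        self._check(key)
        return self.writes.get(key, self.store.get(key))

    def put(self, key, value):
        self._check(key)
        self.writes[key] = value

    def commit(self):
        # All buffered writes land together.
        self.store.update(self.writes)
```

Updating alice's balance and one of her icons in the same transaction works; touching bob's balance inside that transaction raises `ValueError`. Data outside any user's tree (the main, eventually consistent data in item 3) would simply be written without a transaction.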
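
Item 4 above recommends caching that errs on the generous side and avoids negative caching. One way to sketch both ideas together is to cache only "allowed" answers. `GenerousAccessCache` is a hypothetical class, and `check_access` is an assumed slow, authoritative permission check supplied by the caller:

```python
import time


class GenerousAccessCache:
    """Caches only positive answers ("access allowed").

    If an entry outlives the real expiry, the user merely keeps access a
    little longer; the cache can never wrongly refuse someone.
    """

    def __init__(self, check_access, ttl_seconds=60.0, clock=time.monotonic):
        self.check_access = check_access   # authoritative (slow) check
        self.ttl = ttl_seconds
        self.clock = clock
        self._allowed_until = {}           # user -> time the entry expires

    def allowed(self, user):
        now = self.clock()
        if self._allowed_until.get(user, float("-inf")) > now:
            return True                    # cached positive answer
        ok = self.check_access(user)
        if ok:
            self._allowed_until[user] = now + self.ttl
        # A refusal is deliberately NOT cached (no negative caching).
        return ok
```

A stale entry can only extend access by up to `ttl_seconds`, never deny it, and a refusal is re-checked on every request, so a newly granted user is never locked out by the cache.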
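
Item 5 above says a cache key should capture the entire state being cached. A minimal sketch, assuming the cached value depends only on the keyword arguments passed in (`cache_key` is a hypothetical helper, and the `icon_id`, `size` and `version` names below are invented for illustration):

```python
import hashlib
import json


def cache_key(kind, **state):
    """Build a cache key from *all* of the state that determines the value.

    Any change in that state yields a different key, so a stale entry is
    simply never looked up again rather than served.
    """
    # sort_keys makes the key independent of argument order.
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    return f"{kind}:{digest}"
```

`cache_key("icon", icon_id="abc", size=48, version=3)` and the same call with the arguments in a different order give the same key, while bumping `version` gives a new one, so an updated icon can never be served from an entry built for the old one. Hashing keeps the key short whatever the state contains.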
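
Item 6 above suggests that views should do only the minimum and leave deletes to a background task. On AppEngine that work would go to a task queue; the plain-Python sketch below uses a `deque` as a hypothetical stand-in, with a dict as the store. `CleanupQueue`, `delete_account_view` and the string key layout are all invented for the example:

```python
from collections import deque


class CleanupQueue:
    """Hypothetical stand-in for a background task queue."""

    def __init__(self, store, batch_size=100):
        self.store = store              # dict: key -> value
        self.batch_size = batch_size    # bound the work done per run
        self.pending = deque()

    def schedule_delete(self, key):
        # Cheap enough to call from a request handler.
        self.pending.append(key)

    def run_batch(self):
        # Called by the background worker; returns how many keys it handled.
        done = 0
        while self.pending and done < self.batch_size:
            # pop(..., None): the data may already be gone, and that is fine.
            self.store.pop(self.pending.popleft(), None)
            done += 1
        return done


def delete_account_view(store, queue, user):
    # Hypothetical view: make the visible change now...
    store[f"user/{user}"]["deleted"] = True
    # ...and defer the slow bulk delete to the worker.
    for key in [k for k in store if k.startswith(f"icon/{user}/")]:
        queue.schedule_delete(key)
```

Because `run_batch` uses `pop(key, None)`, the worker tolerates keys that have already disappeared, in the spirit of "assume very little and then tidy things later".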
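
Item 7 above mentions nasty bugs from running without transactions. The classic one is a lost update: two requests read the same value and both write back a modified copy. The sketch below reproduces it deterministically and contrasts it with a version-checked write that is retried on conflict, similar in spirit to the optimistic concurrency the Data Store's transactions use. `VersionedStore` and both request functions are hypothetical:

```python
class VersionedStore:
    """Hypothetical store where every value carries a version number."""

    def __init__(self):
        self.data = {}                      # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, 0))

    def blind_write(self, key, value):
        # What a plain read-modify-write does: no check at all.
        version, _ = self.read(key)
        self.data[key] = (version + 1, value)

    def write_if_unchanged(self, key, seen_version, value):
        # Optimistic check: refuse if someone wrote since we read.
        version, _ = self.read(key)
        if version != seen_version:
            return False
        self.data[key] = (version + 1, value)
        return True


def two_requests_blind(store, key):
    """Both requests read before either writes; the +10 is lost."""
    _, a = store.read(key)
    _, b = store.read(key)
    store.blind_write(key, a + 10)
    store.blind_write(key, b + 5)


def two_requests_checked(store, key):
    """Same interleaving, but the stale write is rejected and retried."""
    va, a = store.read(key)
    vb, b = store.read(key)
    store.write_if_unchanged(key, va, a + 10)
    if not store.write_if_unchanged(key, vb, b + 5):
        vb, b = store.read(key)             # re-read fresh state
        store.write_if_unchanged(key, vb, b + 5)
```

Starting from zero, `two_requests_blind` leaves 5 (the +10 is silently lost), while `two_requests_checked` leaves 15.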


The Site in Question

From: andrew cooke <andrew@...>

Date: Tue, 9 Aug 2011 09:49:10 -0400

BTW, the site on which the above is based is http://www.parti.cl - it provides
user-icons as a service to other sites.  So if you want attractive icons to
put next to users, you just load the images from there.
