Andrew Cooke | Contents | Latest | RSS | Twitter | Previous | Next


Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Lepl parser for Python.

Colorless Green.

Photography around Santiago.

SVG experiment.

Professional Portfolio

Calibration of seismometers.

Data access via web services.

Cache rewrite.

Extending OpenSSH.

C-ORM: docs, API.

Last 100 entries

Get Coffee Quotes for Cat Soft LLC; [Bike] Servicing Shimano XT Front Hub HB-M8010; [Bike] Aliexpress Cycling Tops; Now Is Cat Soft LLC's Chance To Save Up To 32% On Mail; Call Center Services for Cat Soft LLC; [Computing] Change to ssh handling of multiple identities?; [Bike] Endura Hummvee Lite II; [Computing] Marble Based Logic; Sanity Check For Nuclear Launch; Entropy and Life; [Link, Bike] Cheap Cycling Jerseys; [Link, Music] Music To Steal 2017; [Link, Future] Simulated Brain Drives Robot; [Link, Computing] Learned Index Structures; Solo Air Equalization; Update: Higher Pressures; Psychology; [Bike] Exercise And Fuel; Continental Race King 2.2; Removing Lowers; Mnesiacs; [Maths, Link] Dividing By Zero; [Book, Review] Ray Monk - Ludwig Wittgenstein: The Duty Of Genius; [Link, Bike, Computing] Evolving Lacing Patterns; [Jam] Strawberry and Orange Jam; [Chile, Privacy] Biometric Check During Mail Delivery; [Link, Chile, Spanish] Article on the Chilean Drought; [Bike] Extended Gear Ratios, Shimano XT M8000 (24/36 Chainring); [Link, Politics, USA] The Future Of American Democracy; Mass Hysteria; [Review, Books, Links] Kazuo Ishiguro - Never Let Me Go; [Link, Books] David Mitchell's Favourite Japanese Fiction; [Link, Bike] Rear Suspension Geometry; [Link, Cycling, Art] Strava Artwork; [Link, Computing] Useful gcc flags; [Link] Voynich Manuscript Decoded; [Bike] Notes on Servicing Suspension Forks; [Links, Computing] Snap, Flatpack, Appimage; [Link, Computing] Oracle is leaving Java (to die); [Link, Politics] Cubans + Ultrasonics; [Book, Link] Laurent Binet; VirtualBox; [Book, Link] No One's Ways; [Link] The Biggest Problem For Cyclists Is Bad Driving; [Computing] Doxygen, Sphinx, Breathe; [Admin] Brokw Recent Permalinks; [Bike, Chile] Buying Bearings in Santiago; [Computing, Opensuse] Upgrading to 42.3; [Link, Physics] First Support for a Physics Theory of Life; [Link, Bike] Peruvian Frame Maker; [Link] Awesome Game Theory Tit-For-Tat Thing; [Food, Review] La Fabbrica - Good Italian Food In Santiago; [Link, Programming] MySQL UTF8 Broken; [Link, Books] Latin American Authors; [Link, Computing] Optimizatin Puzzle; [Link, Books, Politics] Orwell Prize; [Link] What the Hell Is Happening With Qatar?; [Link] Deep Learning + Virtual Tensor Machines; [Link] Scaled Composites: Largest Wingspan Ever; [Link] SCP Foundation; [Bike] Lessons From 2 Leading 2 Trailing; [Link] Veg Restaurants in Santiago; [Link] List of Contemporary Latin American Authors; [Bike] FTHR; [Link] Whoa - NSA Reduces Collection (of US Residents); [Link] Red Bull's Breitbart; [Link] Linux Threads; [Link] Punycode; [Link] Bull / Girl Statues on Wall Street; [Link] Beautiful Chair Video; Update: Lower Pressures; [Link] Neat Python Exceptions; [Link] Fix for Windows 10 to Avoid Ads; [Link] Attacks on ZRTP; [Link] UK Jazz Invasion; [Review] Cuba; [Link] Aricle on Gender Reversal of US Presidential Debate; {OpenSuse] Fix for Network Offline in Updater Applet; [Link] Parkinson's Related to Gut Flora; Farellones Bike Park; [Meta] Tags; Update: Second Ride; Schwalbe Thunder Burt 2.1 v Continental X-King 2.4; Mountain Biking in Santiago; Books on Ethics; Security Fail from Command Driven Interface; Everything Old is New Again; Interesting Take on Trump's Lies; Chutney v6; References on Entropy; Amusing "Alexa.." broadcast; The Shame of Chile's Education System; Playing mp4 gifs in Firefox on Opensuses Leap 42.2; Concurrency at Microsoft; Globalisation: Uk -> Chile; OpenSuse 42.2 and Synaptics Touch-Pads; Even; Cherry Jam; Lebanese Writer Amin Maalouf; C++ - it's the language of the future; Learning From Trump

© 2006-2017 Andrew Cooke (site) / post authors (content).

Offside Parsing Works in LEPL

From: andrew cooke <andrew@...>

Date: Sat, 12 Sep 2009 12:14:54 -0400

I just got a complete test working for offside (whitespace/indentation
sensitive) parsing working in LEPL (my Python parser -

What follows is a re-formatted version of a test from this file -

Here's the grammar (note that I have hardly any structure - there's no
clear definition of statements or commands or variables, it's just
enough to use the indentation-aware code):

# these are the basic tokens that the lexer
# recognises - whitespace is then handled
# automatically
word = Token(Word(Lower()))
continuation = Token(r'\\')
symbol = Token(Any('()'))

# the ~ here means these are used to match
# but discarded from the results
introduce = ~Token(':')
comma = ~Token(',')

# first we need to define how a single
# logical line can continue over many
# lines in the text
CLine = CLineFactory(continuation)

# if we don't want lines to continue,
# we could just use the BLine() matcher

# next a minimal language definition that
# says statements are sequence of words
statement = word[1:]

# argument lists can extend over multiple
# lines (the parser will "know" their extent
# because they are inside (...))
args = Extend(word[:, comma]) > tuple

# and a function header is some words followed
# by the argument list
function = \
  word[1:] & ~symbol('(') & args & ~symbol(')')

# now we get to the interesting part.  we
# introduce blocks, which are indented
# relative to the surrounding text
block = Delayed()

# and lines which are what are inside blocks.
# note that a block is a valid line
# because we can nest blocks, and an empty
# line can appear too.  finally we collect
# the output in a Python list so we can
# see the structrue in the result
line = Or(CLine(statement),
          Line(Empty()))        > list

# now we can define the block: it comes
# after a function header or statement
# (both those end in introduce - ":") and
# contains lines.
block += \
  CLine((function | statement) & introduce) \
  & Block(line[1:])

# and a program is a list of lines.
program = (line[:] & Eos())

# the usual LEPL way to make a parser,
# with a new configuration type. the
# policy argument is the number of spaces
# needed in an indent for a single block.
return program.string_parser(

And here's the text that we will parse:

this is a grammar with a similar
line structure to python

if something:
  then we indent
  something else

def function(a, b, c):
  we can nest blocks:
    like this
  and we can also \
    have explicit continuations \
    with \
any \

same for (argument,
  which do not need the
  continuation marker

Running the parser against that text gives the following, where the
nested lists indicate that we have matcher the block structure

[ [],
  ['this', 'is', 'a', 'grammar', 'with', 'a', 'similar'],
  ['line', 'structure', 'to', 'python'],
  ['if', 'something',
    ['then', 'we', 'indent']],
    ['something', 'else'],
  ['def', 'function', ('a', 'b', 'c'),
    ['we', 'can', 'nest', 'blocks',
      ['like', 'this']],
    ['and', 'we', 'can', 'also', 'have', 'explicit',
     'continuations', 'with', 'any', 'indentation'],
  ['same', 'for', ('argument', 'lists'),
    ['which', 'do', 'not', 'need', 'the'],
    ['continuation', 'marker']]]

I hope to release a beta containing this in the next few days, and
will then start working on documentation.  When the docs are done I
will release a new version.

If you want to try this now, you can get the code from the hg repo -


What's so Neat...

From: andrew cooke <andrew@...>

Date: Sat, 12 Sep 2009 12:44:10 -0400

...about this is that - despite some need to rewrite things - it all
fits into the existing LEPL architecture.  This is a "big deal"
because whitespace parsing mixes information between different levels
of the parser.  The presence of "(...)" or a continuation marker like
"\" influences what the whitespace "means", so while we can detect
indentation in the lexer, we cannot interpret it until the parser
itself is running.  But at the same time, we want to avoid the need to
explicitly add tokens for continuation markers and indentations
"inside" the definitions for statements, expressions etc - the line
structure should be as isolated as possible (imagine having to write a
grammar where between each word you need to include the possibility
that the continuation character appears at that particular point).

Another problem was the "global" state required to handle the current
indentation.  It turns out that LEPL's concept of monitors was a
perfect match for this.

Related to the above was the issue of how to provide a clean,
declarative syntax.  To do this I built on the ideas already
implemented for tokens, and extended streams with filters.  It took a
few iterations, but I am really happy with the final result.

And using LEPL's generic configuration and graph rewriting means that
these new extensions can be integrated with the existing code without
breaking other modules....

I'm *so* pleased this has worked :o)


More Offside Documentation

From: andrew cooke <andrew@...>

Date: Wed, 16 Sep 2009 22:20:00 -0400

There's now an initial draft of a new chapter at


Delayed due to State

From: andrew cooke <andrew@...>

Date: Sat, 19 Sep 2009 09:25:55 -0400

Offside support has been delayed slightly because it breaks when used
with memoisation.  This is because (I think) the current indentation
level is not taken into account by memoizers.

Consider the end of a block.  At the end of the block another line is
attempted.  This fails because the indentation is incorrect.  So the
block ends, decrementing the indentation level, and the line is tried
again outside the block.  However, *exactly* the same stream is used
for the line matcher in both cases.  So the second time the memoizer
for the line says "nope, we already know this failed".  When it should
have succeeded, because the indentation level is now correct.

The only clean solution I can see is to introduce the concept of
global (ie per thread) state (a dictionary) in which values (like
current indentation can be stored).  Memoizers then combine the hash
of that state with the hash of the stream to detect repetition.

But will that be sufficient?  What about when two such cases above are
nested?  Will the "inner" case be expanded?  I think so, but am not
100% sure.


Comment on this post