
C[omp]ute


© 2006-2017 Andrew Cooke (site) / post authors (content).

Offside Parsing Works in LEPL

From: andrew cooke <andrew@...>

Date: Sat, 12 Sep 2009 12:14:54 -0400

I just got a complete test working for offside (whitespace/indentation
sensitive) parsing in LEPL (my Python parser -
http://www.acooke.org/lepl)

What follows is a re-formatted version of a test from this file -
http://code.google.com/p/lepl/source/browse/src/lepl/offside/_test/pithon.py?spec=svn362f24c528fa6988e13953eebb1325956295696b&r=362f24c528fa6988e13953eebb1325956295696b


Here's the grammar (note that I have hardly any structure - there's no
clear definition of statements or commands or variables, it's just
enough to use the indentation-aware code):

# these are the basic tokens that the lexer
# recognises - whitespace is then handled
# automatically
word = Token(Word(Lower()))
continuation = Token(r'\\')
symbol = Token(Any('()'))

# the ~ here means these are used to match
# but discarded from the results
introduce = ~Token(':')
comma = ~Token(',')

# first we need to define how a single
# logical line can continue over many
# lines in the text
CLine = CLineFactory(continuation)

# if we don't want lines to continue,
# we could just use the BLine() matcher

# next a minimal language definition that
# says a statement is a sequence of words
statement = word[1:]

# argument lists can extend over multiple
# lines (the parser will "know" their extent
# because they are inside (...))
args = Extend(word[:, comma]) > tuple

# and a function header is some words followed
# by the argument list
function = \
  word[1:] & ~symbol('(') & args & ~symbol(')')


# now we get to the interesting part.  we
# introduce blocks, which are indented
# relative to the surrounding text
block = Delayed()

# and lines, which are what blocks contain.
# note that a block is a valid line
# because we can nest blocks, and an empty
# line can appear too.  finally we collect
# the output in a Python list so we can
# see the structure in the result
line = Or(CLine(statement),
          block,
          Line(Empty()))        > list

# now we can define the block: it comes
# after a function header or statement
# (both those end in introduce - ":") and
# contains lines.
block += \
  CLine((function | statement) & introduce) \
  & Block(line[1:])

# and a program is a list of lines.
program = (line[:] & Eos())

# the usual LEPL way to make a parser,
# with a new configuration type. the
# policy argument is the number of spaces
# needed in an indent for a single block.
return program.string_parser(
  OffsideConfiguration(policy=2))


And here's the text that we will parse:

this is a grammar with a similar
line structure to python

if something:
  then we indent
else:
  something else

def function(a, b, c):
  we can nest blocks:
    like this
  and we can also \
    have explicit continuations \
    with \
any \
       indentation

same for (argument,
          lists):
  which do not need the
  continuation marker


Running the parser against that text gives the following, where the
nested lists indicate that we have matched the block structure
correctly:

[ [],
  ['this', 'is', 'a', 'grammar', 'with', 'a', 'similar'],
  ['line', 'structure', 'to', 'python'],
  [],
  ['if', 'something',
    ['then', 'we', 'indent']],
  ['else',
    ['something', 'else'],
  []],
  ['def', 'function', ('a', 'b', 'c'),
    ['we', 'can', 'nest', 'blocks',
      ['like', 'this']],
    ['and', 'we', 'can', 'also', 'have', 'explicit',
     'continuations', 'with', 'any', 'indentation'],
    []],
  ['same', 'for', ('argument', 'lists'),
    ['which', 'do', 'not', 'need', 'the'],
    ['continuation', 'marker']]]


I hope to release a beta containing this in the next few days, and
will then start working on documentation.  When the docs are done I
will release a new version.

If you want to try this now, you can get the code from the hg repo -
http://code.google.com/p/lepl/source/checkout

Andrew

What's so Neat...

From: andrew cooke <andrew@...>

Date: Sat, 12 Sep 2009 12:44:10 -0400

...about this is that - despite some need to rewrite things - it all
fits into the existing LEPL architecture.  This is a "big deal"
because whitespace parsing mixes information between different levels
of the parser.  The presence of "(...)" or a continuation marker like
"\" influences what the whitespace "means", so while we can detect
indentation in the lexer, we cannot interpret it until the parser
itself is running.  But at the same time, we want to avoid the need to
explicitly add tokens for continuation markers and indentations
"inside" the definitions for statements, expressions etc - the line
structure should be as isolated as possible (imagine having to write a
grammar where between each word you need to include the possibility
that the continuation character appears at that particular point).
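
To make the idea of a "logical line" concrete, here is a rough
standalone sketch in plain Python.  It is NOT how LEPL does it (LEPL
works with tokens, streams and matchers); logical_lines and everything
inside it are hypothetical names made up for illustration.  It only
shows that the indentation of a physical line matters when that line
starts a new logical line, and that a trailing "\" or an open "("
changes that:

```python
def logical_lines(text, continuation="\\"):
    """Join physical lines into (indent, words) logical lines.

    Illustration only: a line continues if it ends with the
    continuation marker or if a "(" is still unclosed.  Only the
    indentation of the first physical line is recorded.
    """
    result = []
    indent = None   # indent of the logical line being built
    words = []
    depth = 0       # number of unclosed "("
    for raw in text.splitlines():
        stripped = raw.strip()
        if indent is None:
            if not stripped:
                continue  # blank line between logical lines
            indent = len(raw) - len(raw.lstrip(" "))
        explicit = stripped.endswith(continuation)
        if explicit:
            stripped = stripped[:-len(continuation)].rstrip()
        depth += stripped.count("(") - stripped.count(")")
        words.extend(stripped.split())
        if not explicit and depth <= 0:
            result.append((indent, words))
            indent, words, depth = None, [], 0
    if words:
        result.append((indent, words))
    return result


sample = (
    "def function(a, b, c):\n"
    "  and we can also \\\n"
    "    have explicit \\\n"
    "any \\\n"
    "       indentation\n"
    "same for (argument,\n"
    "          lists):\n"
    "  which do not\n"
)
lines = logical_lines(sample)
for indent, words in lines:
    print(indent, words)
```

The second logical line is reported with indent 2 even though its
continuation lines are indented by 4, 0 and 7 spaces, which is the
behaviour the grammar author wants without mentioning "\" anywhere in
the definition of a statement.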

Another problem was the "global" state required to handle the current
indentation.  It turns out that LEPL's concept of monitors was a
perfect match for this.

Related to the above was the issue of how to provide a clean,
declarative syntax.  To do this I built on the ideas already
implemented for tokens, and extended streams with filters.  It took a
few iterations, but I am really happy with the final result.

And using LEPL's generic configuration and graph rewriting means that
these new extensions can be integrated with the existing code without
breaking other modules....

I'm *so* pleased this has worked :o)

Andrew

More Offside Documentation

From: andrew cooke <andrew@...>

Date: Wed, 16 Sep 2009 22:20:00 -0400

There's now an initial draft of a new chapter at
http://www.acooke.org/lepl/offside.html

Andrew

Delayed due to State

From: andrew cooke <andrew@...>

Date: Sat, 19 Sep 2009 09:25:55 -0400

Offside support has been delayed slightly because it breaks when used
with memoisation.  This is because (I think) the current indentation
level is not taken into account by memoizers.

Consider the end of a block.  At the end of the block another line is
attempted.  This fails because the indentation is incorrect.  So the
block ends, decrementing the indentation level, and the line is tried
again outside the block.  However, *exactly* the same stream is used
for the line matcher in both cases.  So the second time the memoizer
for the line says "nope, we already know this failed", when it should
have succeeded, because the indentation level is now correct.

The only clean solution I can see is to introduce the concept of
global (ie per thread) state (a dictionary) in which values (like
current indentation) can be stored.  Memoizers then combine the hash
of that state with the hash of the stream to detect repetition.
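
A toy model of the problem and the proposed fix, in plain Python.
This is only a sketch under my own assumptions, not LEPL code: Parser,
match_line and end_of_block are hypothetical names, the "stream" is
just a string, and the state is combined with the stream by building a
tuple key (which the dict then hashes) rather than by combining hashes
by hand:

```python
class Parser:
    """Toy line matcher whose result depends on shared state."""

    def __init__(self, key_includes_state):
        self.state = {"indent": 2}  # hypothetical global state
        self.key_includes_state = key_includes_state
        self.memo = {}

    def _key(self, stream):
        if self.key_includes_state:
            # memo key is the stream plus a snapshot of the state
            return (stream, tuple(sorted(self.state.items())))
        return stream  # memo key ignores the state

    def match_line(self, stream):
        """Succeed only if the line is at the current indentation."""
        key = self._key(stream)
        if key not in self.memo:
            spaces = len(stream) - len(stream.lstrip(" "))
            self.memo[key] = spaces == self.state["indent"]
        return self.memo[key]


def end_of_block(parser, stream):
    inside = parser.match_line(stream)   # tried inside the block
    parser.state["indent"] -= 2          # the block ends
    outside = parser.match_line(stream)  # same stream, tried again
    return inside, outside


print(end_of_block(Parser(False), "next line"))  # (False, False)
print(end_of_block(Parser(True), "next line"))   # (False, True)
```

With the stream-only key the second attempt returns the stale failure;
with the state included in the key the line is matched again and
succeeds once the indentation level has dropped.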

But will that be sufficient?  What about when two such cases are
nested?  Will the "inner" case be expanded?  I think so, but am not
100% sure.

Andrew
