Detailed Discussion of Message Dispatch in ParserCombinator Library for Julia

From: andrew cooke <andrew@...>

Date: Sun, 28 Jun 2015 07:13:57 -0300

Introduction

I am going to describe some of the decisions behind the
ParserCombinator library, which, I hope, will show how useful Julia's
method-centred multiple dispatch can be.

OK, so I wanted to write a parser combinator library in Julia.

If you don't know what a parser combinator library is, here's a quick
sketch:

  There's a pretty neat, easy-to-understand way of writing parsers by
  hand, common in functional languages, where you construct a
  "tree" (or even a DAG) of functions that take a stream of characters
  as input and return some structured output as a result (along with
  the remaining text).

  The stream of characters is (of course) what you want to parse, and
  the structured output becomes your AST.

  Here's a simple (untested!) example (in Julia):

      # Backtrack is assumed to be defined elsewhere,
      # e.g. type Backtrack<:Exception end
      function equals(text)
          return function _(string)
              n = length(text)
              if length(string) >= n && string[1:n] == text
                  return string[n+1:end], text   # remaining text, result
              else
                  throw(Backtrack())
              end
          end
      end

  By itself that doesn't seem very useful, but consider

      function follows(m1, m2)
          return function _(string)
              string1, result1 = m1(string)
              string2, result2 = m2(string1)
              return string2, [result1, result2]
          end
      end

  which can be used like:

      grammar = follows(equals("hello"), equals("world"))
      grammar("helloworld")

  and returns ("", ["hello", "world"]) - the (empty) remaining input
  together with the two results.

  And, of course, you can go crazy, with functions that test
  alternatives, or repeat a given number of times, or as often as
  possible, etc etc.
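
  For example, a matcher that tries one alternative and falls back to a
  second when the first backtracks might look like this (again untested,
  and in the same sketchy style):

      function alternatives(m1, m2)
          return function _(string)
              try
                  return m1(string)
              catch e
                  # only a failed match triggers the fallback;
                  # anything else is a real error
                  isa(e, Backtrack) || rethrow(e)
                  return m2(string)
              end
          end
      end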

So that's the kind of thing I wanted to write.  But when you start
digging into the details, it's not quite so simple.


Complications

First, Julia isn't intended to be used as a functional language.  It's
not designed to support deep, recursive calls of functions (in fact,
the stack limit in a simple test is a little greater than 100,000 on
my machine, so providing you write combinators for multiple matches
that work with iteration rather than recursion, that's probably not
such a serious issue).
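
For example, a "repeat as often as possible" combinator can be written
in the style of the earlier (untested) sketch using a loop rather than
recursion:

    function star(m)
        return function _(string)
            results = Any[]
            while true
                try
                    string, result = m(string)
                    push!(results, result)
                catch e
                    # a failed match just ends the repetition
                    isa(e, Backtrack) || rethrow(e)
                    return string, results
                end
            end
        end
    end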

Second, there are different variations on the general idea of "parser
combinators", which work in slightly different ways.  For example, you
might want to cache calls to matchers so that when the "same" call is
made a second time, the cached value is used.  Or you might want to
write the parser so that all possible parses (of an ambiguous grammar)
are returned.  Or you might want to restrict backtracking so that
error messages are more reliable.  Or you might want to display a
record of what is happening to help with debugging.  Or...

These variations, on a little inspection, tend to be connected more
with how the parser "executes" than with how the individual matchers
are implemented.

What do I mean by that?  Take, for example, the idea that you cache
results to avoid repeated evaluation.  That is trivial in a "lazy"
language.  But in an eager language (like Julia) you need to intercept
each call to a function, so you can check whether it was cached
earlier.  These are, in a sense, "meta" issues that have little to do
with checking whether one string is equal to another.
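
In the function-based sketch above, that interception has to be done by
hand, by wrapping every matcher - something like this hypothetical
helper:

    # a hypothetical wrapper: repeated calls with the same matcher and
    # the same input re-use the earlier answer instead of re-doing the
    # work (failed matches are not cached here, to keep the sketch short)
    cache = Dict()

    function cached(matcher)
        return function _(string)
            key = (matcher, string)
            if !haskey(cache, key)
                cache[key] = matcher(string)
            end
            return cache[key]
        end
    end

Having to thread a wrapper like that through every combinator is
exactly the kind of "meta" plumbing that the trampoline, described
below, moves into one place.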

So the question is: can we write our library in a way that makes it
easy to change how the parser "executes"?  And the answer is,
thankfully, yes (or I wouldn't be writing this article)!

How do we do this?  We need two things.  First, we need to use a
"trampoline" to take control of execution.  Second, we need some way
of allowing different behaviors to be "plugged in".  In Julia, this is
typically done by multiple dispatch with methods.  And it works really
well.


Trampolines

All a trampoline is, really, is a "manual" replacement for what a
compiler does automatically: it's a loop, with a stack, that calls
each function in turn, checks what the result is, and then calls
another function, as appropriate.

In practice, what that means is that when we write a function, and we
need to call some other function, we don't just "make the call".
Instead, we return a "message" to the trampoline that says "please
call this other function and then give me the result".

In addition, of course, we need the main loop, which receives these
messages and does the work.
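
To make the message idea concrete, here is a toy example that has
nothing to do with parsing: instead of recursing, each step returns
either a final result or a request for another call, and a loop keeps
handing control back.

    immutable Done
        value
    end

    immutable Call
        thunk          # a zero-argument function to call next
    end

    # count down to zero without ever growing the call stack
    countdown(n) = n == 0 ? Done(0) : Call(() -> countdown(n - 1))

    function trampoline(msg)
        while isa(msg, Call)
            msg = msg.thunk()     # "please call this function for me"
        end
        msg.value
    end

    # trampoline(countdown(1000000)) returns 0, with no stack overflow

The parser's trampoline adds an explicit stack of waiting parent
matchers on top of this basic loop, so that each result can be routed
back to whoever asked for it.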

That may sound complicated, but in fact it's not so much work.
Particularly when you're only evaluating matchers in a parser, which
all work in a similar way.

Here's most of the code needed for the main loop, taken from
parsers.jl:

    type NoCache<:Config
        source::Any
        stack::Array{Tuple{Matcher, State},1}
    end

    function dispatch(k::NoCache, e::Execute)
        push!(k.stack, (e.parent, e.parent_state))
        execute(k, e.child, e.child_state, e.iter)
    end

    function dispatch(k::NoCache, s::Success)
        (parent, parent_state) = pop!(k.stack)
        success(k, parent, parent_state, s.child_state, s.iter, s.result)
    end

    function dispatch(k::NoCache, f::Failure)
        (parent, parent_state) = pop!(k.stack)
        failure(k, parent, parent_state)
    end

Those functions are simply responding to the different message types
(Execute, Success and Failure) by manipulating the stack and calling
the appropriate matchers.  Everything is driven by this main loop:

    while true
        msg = dispatch(k, msg)
    end
    
(plus some extra details for handling the setup and teardown).

Now if you're trying to understand the above in detail, you're
probably wondering what execute(), success() and failure() are.  These
are the methods where the matcher itself is implemented.

For example, an "equals" matcher looks like this:

    type Equal<:Matcher
        string
    end

    function execute(k::Config, m::Equal, s::Clean, i)
        for x in m.string
            if done(k.source, i)
                return FAILURE
            end
            y, i = next(k.source, i)
            if x != y
                return FAILURE
            end
        end
        Success(DIRTY, i, Any[m.string])
    end

where FAILURE is the singleton Failure() message and most of the work
is just checking for equality, character by character.

A more complex matcher, which calls out to child matchers, has work
spread across several methods.  Its success() method is called when a
child matcher successfully matches, for example.
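
To make that concrete, here is a rough sketch (not the library's actual
code) of a matcher that matches two children in sequence.  It assumes
that the Execute constructor takes (parent, parent_state, child,
child_state, iter), that CLEAN is the Clean() singleton (by analogy
with DIRTY), and that returning FAILURE signals failure:

    type Seq<:Matcher
        left::Matcher
        right::Matcher
    end

    # two custom states record where we are and what has matched so far
    type SeqLeft<:State end           # waiting for the left child
    type SeqRight<:State              # waiting for the right child
        left_result
    end

    # on a clean start, ask the trampoline to run the left child
    function execute(k::Config, m::Seq, s::Clean, i)
        Execute(m, SeqLeft(), m.left, CLEAN, i)
    end

    # the left child matched: save its result and run the right child
    function success(k::Config, m::Seq, s::SeqLeft, child_state, i, result)
        Execute(m, SeqRight(result), m.right, CLEAN, i)
    end

    # the right child matched: combine both results and report success
    function success(k::Config, m::Seq, s::SeqRight, child_state, i, result)
        Success(DIRTY, i, vcat(s.left_result, result))
    end

    # if either child fails, this matcher fails too
    function failure(k::Config, m::Seq, s::State)
        FAILURE
    end

The SeqLeft and SeqRight states are this matcher's "memory" between
calls - they hold exactly the data that a closure captured in the
original functional sketch.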

You may still be a little confused, however, because Equal, above, is
a type, not a function.  That's because it makes more sense in Julia
to work this way.

Let me explain in more detail...


Grammar As Types

Originally, when I sketched how parser combinators worked, I described
them as functions that returned functions that parsed data (see
"equals()" and "follows()" above).

But in Julia, when you use a trampoline, it makes much more sense to
use types.  So constructing a grammar is not done by calling
functions, but by calling type constructors (like "Equal()" above).

That's because Julia is based around methods, which are dispatched on
multiple types.  By using types to describe our grammar we can then
use method dispatch to implement the work.

And that's what you can see above.  The execute() method includes
"m::Equal", which means that specific execute() is called ONLY when m
is an instance of Equal.  So that execute() is equivalent, roughly, to
the anonymous function in the original sketch.

It's similar to OOP, where the method "belongs" to the type.

To make it completely clear, let's work through things in more detail.

We have to start somewhere, and I am trying to avoid the details of
how we bootstrap or return a final result, since the important idea is
"in the middle".  So let's just assume that some other matcher (an
instance of a type) is similar to the "follows()" function above, in
that it contains two Equal children.  And we'll start with the
execute() method associated with that matcher returning an Execute
message to the trampoline, which contains the first Equal child
(containing the string "hello").

So the parent matcher is saying to the trampoline "call this Equal for
me".

The main loop of the trampoline receives the Execute message, saves
the parent matcher on the stack, and then calls the dispatch() method
for Execute.  That results in calling the execute(... m::Equal, ...)
method, which checks the input against "hello" in the child.

Assuming that matches, the child returns a Success message, which the
trampoline receives.  The trampoline pops the original matcher from
the stack and calls its success() method.  That, in turn, then returns
another Execute message, to match "world", etc etc.

Obviously these matchers need to save data as they work.  The parent
matcher discussed above, for example, needs to save "hello" while it
calls the second Equal for "world".  This is done in the State types
that appear in the dispatch code.

A very nice side effect of having this explicit state is that it makes
adding memoization very easy, because we know exactly when the context
is identical (when the state is identical) and so can decide when to
use the cache.


Was That Worth It?!

OK, so at this point you're probably starting to see how things work
(I hope).  It may help to think of the grammar (the DAG of types) as a
"program" that the trampoline is executing - it's very like an
interpreter, where the grammar is the "program" that the trampoline
"executes" given some input (which is being parsed).

But you may ask "OK, I kind-of get how it works, but was all that
complexity worth it?"

Well, first of all, it's not as complex as it sounds.  It's different,
sure.  Which means it takes some getting used to.  But with a little
experience it becomes surprisingly easy to understand.  The matchers,
for example, are "just" state machines, where the State instances
describe the state, and the execute(), success() and failure() methods
drive transitions from one state to another.

More than "not as bad as it looks" - it's actually surprisingly
elegant.  In a traditional interpreter the main loop is a case
statement that checks the type of what is being executed and then
calls the appropriate function.  Here, in a sense, the function is
always the same - "execute()" - and the choice of which particular
execute() to run is made for us by Julia's type system.  Things get
even better when many matchers have similar functionality because they
can, by sharing a common super-type, share a single implementation.
In the case of the ParserCombinator library, for example, most
matchers that call a child matcher are derived from a Delegate
super-type, which provides common support.
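
As a rough sketch of that idea (again, not the library's actual
definitions, and with the same assumptions about Execute and CLEAN as
before), matchers that simply wrap one child can share behaviour
through an abstract super-type:

    abstract Wrapper<:Matcher   # subtypes are assumed to have a child field

    # one shared method: on a clean start, every Wrapper just asks the
    # trampoline to run its child
    function execute(k::Config, m::Wrapper, s::Clean, i)
        Execute(m, s, m.child, CLEAN, i)
    end

Each concrete sub-type then only needs its own success() method to
decide what to do with the child's result.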

And, second of all, it's easy - almost trivial - to hack things at a
"meta" level.  Take the example of caching results.  All you need to
do is add a Map (the cache) to a new sub-type of Config and provide
new dispatch() functions as above: the dispatch for Success should
populate the Map with results; the dispatch for Execute should check
the Map in case a value already exists.  That's it!
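
A rough sketch of what that could look like (illustrative names and
field order, not the library's actual caching code):

    type Cached<:Config
        source::Any
        stack::Array{Tuple{Matcher, State, Any},1}  # parent, state, cache key
        cache::Dict{Any, Any}                       # key => Success message
    end

    function dispatch(k::Cached, e::Execute)
        key = (e.child, e.child_state, e.iter)
        push!(k.stack, (e.parent, e.parent_state, key))
        if haskey(k.cache, key)
            k.cache[key]          # re-use the result computed earlier
        else
            execute(k, e.child, e.child_state, e.iter)
        end
    end

    function dispatch(k::Cached, s::Success)
        (parent, parent_state, key) = pop!(k.stack)
        k.cache[key] = s          # remember the result for next time
        success(k, parent, parent_state, s.child_state, s.iter, s.result)
    end

    # (the Failure case pops the stack in the same way and could also
    # cache the failure)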


Multiple Dispatch v OOP

If the same ideas were implemented in a traditional OOP language, much
of what I have described above would carry across nicely.  There would
be a Trampoline base class that would be subclassed for different
approaches to execution, for example.

But Julia's multiple dispatch wins out when you want behavior to "cut
across" more than one type.

In the ParserCombinator library, one of the approaches to execution
emulates the Parsec library from Haskell.  This doesn't allow
backtracking if the source has been successfully matched.  But one
"magic" matcher, Try(), changes this.  So this one particular matcher
has to "know" about the trampoline in more detail than normal, so that
it can enable backtracking.

How do you do that in a traditional OO language?  It's not so clear,
because the matchers are, presumably, classes that are quite separate
from the trampoline (in Java, for example, you start having to worry
about "package private" access)

But in Julia a method can dispatch on multiple types.  So there's
nothing to stop you having an execute() method that is for one
specific trampoline type and one specific matcher type.  And which is
only called when both those types are used together.
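
A purely illustrative sketch of the pattern (made-up names and
semantics; only the dispatch is the point, and CLEAN is assumed as
before):

    type Restricted<:Config
        source::Any
        stack::Array{Tuple{Matcher, State},1}
        allow_backtrack::Bool
    end

    type Rewind<:Matcher
        child::Matcher
    end

    # this method is chosen only when a Rewind matcher runs under a
    # Restricted config; any other combination falls back to more
    # generic execute() methods
    function execute(k::Restricted, m::Rewind, s::Clean, i)
        k.allow_backtrack = true            # reach into the config directly
        Execute(m, s, m.child, CLEAN, i)    # then delegate to the child
    end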

(It may confuse more than it helps, but a similar idea is used here -
http://acooke.org/cute/FizzBuzzin0.html)


Conclusions

I've tried to show how you can implement a parser combinator library
with more than one way for the parser to "execute", by providing
pluggable trampolines.

Much of the approach can be done in any OO language (and the general
idea comes from an earlier attempt in Python, called Lepl).  But
Julia's methods make things particularly easy.

If you want to try ParserCombinator for yourself, please check it out
at https://github.com/andrewcooke/ParserCombinator.jl

Thanks!
