
C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.


© 2006-2017 Andrew Cooke (site) / post authors (content).

Implementing a Regular Expression Engine

From: "andrew cooke" <andrew@...>

Date: Sun, 22 Mar 2009 20:21:29 -0400 (CLT)

A week or two ago I posted a message here saying I had implemented a
regular expression engine.  I deleted it today...

It would be more correct to say I have been learning about how to
implement a regular expression engine by failing to implement one. 
However, I am making some progress.

My initial take on regular expressions was to treat them as a tree
(basically the parse tree you would expect from their string
representation).  While that wasn't necessarily wrong, it made life a lot
harder than it should have been - it is much easier to model them as
directed graphs.
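To illustrate the difference, here is a minimal Python sketch of compiling a parse tree into graph form. It uses a Thompson-style construction and is not the actual LEPL code. The tuple-based tree and the node kinds "lit", "seq", "alt" and "star" are hypothetical. Each graph node maps to a list of (label, target) edges, with None marking an empty transition. Alternatives share the same start and end nodes, and repetition becomes a loop; neither shared nodes nor cycles can appear in a tree.

```python
from itertools import count

# Hypothetical parse tree for a(bc|b*d), as nested tuples.
TREE = ("seq", ("lit", "a"),
        ("alt", ("seq", ("lit", "b"), ("lit", "c")),
                ("seq", ("star", ("lit", "b")), ("lit", "d"))))

def compile_tree(tree):
    """Compile a parse tree into a directed graph (Thompson-style NFA).

    Returns (graph, start, accept), where graph maps each node to a list
    of (label, target) edges and a label of None is an empty transition.
    """
    graph = {}
    ids = count()

    def new_node():
        node = next(ids)
        graph[node] = []
        return node

    def build(node, start, end):
        # Wire the fragment for `node` so that it runs from start to end.
        kind = node[0]
        if kind == "lit":
            graph[start].append((node[1], end))
        elif kind == "seq":
            # Thread fresh intermediate nodes between consecutive parts.
            parts = node[1:]
            current = start
            for i, part in enumerate(parts):
                nxt = end if i == len(parts) - 1 else new_node()
                build(part, current, nxt)
                current = nxt
        elif kind == "alt":
            # Every alternative shares the same start and end nodes.
            for part in node[1:]:
                build(part, start, end)
        elif kind == "star":
            # A private loop node: the body runs from it back to itself.
            loop = new_node()
            graph[start].append((None, loop))
            build(node[1], loop, loop)
            graph[loop].append((None, end))
        else:
            raise ValueError("unknown node kind: %r" % (kind,))

    start, accept = new_node(), new_node()
    build(tree, start, accept)
    return graph, start, accept
```

For a(bc|b*d) this produces six nodes, with the b* loop sitting on its own node. The numbering differs from the hand-drawn NFA further down, but the shape is the same.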

Once translated into graph form, it's pretty easy to generate a
non-deterministic finite automaton (NFA) that does the appropriate
matching, because the matching loop is pretty much the same as the core
loop of a trampolined recursive descent parser.  You can even get all the
different matches by back-tracking.  Technically, then, I am implementing
the NFA (with eta transitions, which are useful for ordering different
parts of the automaton) using a PDA (pushdown automaton), which is more
general and so explains the connection to recursive descent parsing.

That's as far as I have got.  A previous post here gives references on how
to translate the NFA to a DFA (deterministic finite automaton).  Being
deterministic, a DFA is a lot easier to "run"; the flip side is that it may
have exponentially more states, and there's no way to implement it so that
all the variations (non-greedy matching etc.) are returned.

My aim is to make this NFA a standard LEPL matcher.  The DFA
implementation will be used for lexing.  In the future I'd also like to
see whether compiling Any, Literal, Word etc. (and combinations with And,
Or, Repeat) to an NFA makes things faster.

Andrew

http://en.wikipedia.org/wiki/Automata_theory - the table at the bottom is
particularly useful.

Initial DFA Results

From: "andrew cooke" <andrew@...>

Date: Tue, 24 Mar 2009 20:52:27 -0400 (CLT)

This is sweet - it's nice when a piece of code does things you didn't
really expect it to do (I mean, in theory, I can see why it works, but in
practice, the way it solves the problems is almost "intelligent").

Here's an example for the regexp a(bc|b*d):

     a      d
  0 ---> 1 +--> 2* <--+----.
           |b     c/d |    |
           `--> 3 +---'    |
                  |b      d|
                  `--> 4 +-'
                  '      |b
                  |      |
                  `------'

or, as printed (the [...] lists are the original NFA nodes):

  0 [0]       a:1,
  1 [3, 5, 6] d:2;b:3,
  2 [1, 2]    label,
  3 [4, 5, 6] b:4;[c-d]:2,
  4 [5, 6]    b:4;d:2
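Running the DFA is just one table lookup per character, with no back-tracking. Here is a minimal Python sketch. The transitions are copied from the printed table above, with [c-d] expanded into separate c and d entries. It assumes the whole input must match, which the post doesn't specify. DFA, ACCEPTING and dfa_match are hypothetical names.

```python
# Transitions copied from the printed DFA; "[c-d]" expanded to c and d.
DFA = {
    0: {"a": 1},
    1: {"d": 2, "b": 3},
    2: {},
    3: {"b": 4, "c": 2, "d": 2},
    4: {"b": 4, "d": 2},
}
ACCEPTING = {2}   # state 2 carries the "label", i.e. it is the match state

def dfa_match(text, dfa=DFA, accepting=ACCEPTING, start=0):
    """Return True if the whole of `text` is accepted by the DFA."""
    state = start
    for char in text:
        state = dfa[state].get(char)
        if state is None:       # no transition for this character: fail
            return False
    return state in accepting
```

It gives a single yes/no answer. Unlike the NFA matcher, it can't enumerate alternative matches.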

Andrew

Original NFA

From: "andrew cooke" <andrew@...>

Date: Tue, 24 Mar 2009 21:11:09 -0400 (CLT)

Just for fun, here's the underlying NFA (with eta/empty transitions).
This is easier to construct, both by hand and in code.

      a     b      c
  0 ---> 3 +--> 4 ----------+> 1 ---> 2 "label"
           |                |
           |              d |
           `--> 5 +--> 6 ---'
           '      |
           |   b  |
           `------'

0 a:3
1 2
2 label
3 b:4 5
4 c:1
5 b:5 6,
6 d:1

(This is all left to right except for the b* loop from/to 5 at the bottom)

Andrew

Epsilon!

From: "andrew cooke" <andrew@...>

Date: Tue, 24 Mar 2009 21:25:35 -0400 (CLT)

I've been calling my epsilons etas!  Ooops.  Andrew
