C[omp]ute

© 2006-2015 Andrew Cooke (site) / post authors (content).

Empty Loops in Regular Expressions

From: andrew cooke <andrew@...>

Date: Thu, 8 Jul 2010 09:04:14 -0400

Regular expressions (extended ones in particular) have empty transitions that
can occur in loops.  For example, (?(1)a) matches "a" only if group 1 has
matched, and otherwise matches the empty string, so (?(1)a)* could match the
empty string repeatedly without ever advancing.
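
The conditional loop above can be tried in Python's standard "re" module (not
RXPY).  This is only a sketch: the leading "(x)?" is an assumption added so
that the pattern actually defines group 1, and the test strings are invented.

```python
import re

# Assumption: "(x)?" is added so that the pattern defines a group 1.
pattern = re.compile(r'(x)?(?(1)a)*')

# Group 1 does not match here, so each pass round the loop matches "".
# The engine has to notice this and stop, giving an empty match.
empty = pattern.match('aaa')
print(empty.span())   # (0, 0)

# Group 1 matches "x", so each pass round the loop consumes one "a".
full = pattern.match('xaaa')
print(full.span())    # (0, 4)
```

The first case is the dangerous one: without some protection inside the
engine, the loop would repeat at offset 0 forever.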

To some extent this is already avoided at compile time, by refusing to parse
things like a**, but there are many possible cases.  The problems for an
implementation are then:

- Whether to warn or reject such cases
- If not rejected, whether to try to avoid infinite loops during evaluation

I am currently working on this with RXPY.  In general, I want to (i) provide a
safe system as a default, but (ii) allow the user complete control.  So it
seems that two flags are necessary: one to disable compile time errors and
one to disable run time safety.
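
A minimal sketch of how two such flags could be modelled.  The names
LoopSafety, ALLOW_EMPTY_LOOPS and NO_RUNTIME_CHECK are hypothetical, chosen
only for illustration; they are not RXPY's actual flags.

```python
import enum

class LoopSafety(enum.Flag):
    """Hypothetical flags; safe behaviour is the default (no flags set)."""
    DEFAULT = 0
    ALLOW_EMPTY_LOOPS = 1   # disable compile time errors
    NO_RUNTIME_CHECK = 2    # disable run time safety

def compile_time_errors(flags):
    """True if empty loops should be rejected when compiling."""
    return LoopSafety.ALLOW_EMPTY_LOOPS not in flags

def runtime_checks(flags):
    """True if the evaluator should guard against empty loops."""
    return LoopSafety.NO_RUNTIME_CHECK not in flags
```

Keeping the flags independent gives four combinations, from fully safe (the
default) to complete user control with both checks disabled.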

The implementation must also consider efficiency and ease of implementation /
maintenance.  It seems to me that many (but not all) cases could be
automatically rewritten to a safer version, but I don't currently have good
graph rewriting support (I talk about graphs here because the "opcodes" in
RXPY are nodes on a graph; the regular expression is "compiled" to a graph of
these nodes that is then "evaluated" against the input).

Since I do not have rewriting, and because that is not a complete solution
anyway, I need some other runtime scheme.  The best I have found so far is to
add additional nodes that "break" any dangerous loops.  A machine can then
verify that input has been consumed between each encounter with such a node.
This confines almost all of the cost to those expressions that need the check.
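
The idea of a loop-breaking node can be sketched with a toy backtracking
evaluator.  Everything here (Match, Char, Split, Checkpoint, run) is a
hypothetical illustration and not RXPY's real opcode set; it only shows a
node that fails when it is revisited without any input having been consumed.

```python
class Match:
    """Terminal node: the expression has matched."""

class Char:
    """Consume one literal character, then continue at `next`."""
    def __init__(self, char, next=None):
        self.char, self.next = char, next

class Split:
    """Try each branch in order (an empty transition to each)."""
    def __init__(self, *branches):
        self.branches = list(branches)

class Checkpoint:
    """Loop-breaking node: fails if revisited at the same offset."""
    def __init__(self, next=None):
        self.next = next

def run(node, text, pos=0, seen=None):
    """Return the end offset of the first match found, or None."""
    seen = seen or {}
    if isinstance(node, Match):
        return pos
    if isinstance(node, Char):
        if pos < len(text) and text[pos] == node.char:
            return run(node.next, text, pos + 1, seen)
        return None
    if isinstance(node, Checkpoint):
        if seen.get(node) == pos:
            return None           # nothing consumed since the last visit
        return run(node.next, text, pos, {**seen, node: pos})
    for branch in node.branches:  # Split
        end = run(branch, text, pos, seen)
        if end is not None:
            return end
    return None

def build_optional_star():
    """Graph for (a?)*, with a Checkpoint inside the loop."""
    loop = Split()
    optional = Split(Char('a', next=loop), loop)
    loop.branches = [Checkpoint(next=optional), Match()]
    return loop
```

With the Checkpoint in place, run(build_optional_star(), 'aa') returns 2 and
the evaluator stops once the optional "a" starts matching nothing.  Only
graphs that contain a Checkpoint pay for the extra bookkeeping.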

The logic that generates these nodes can also be used to generate compile
time errors.  The graph API includes "consumer(lenient)" where lenient is a
boolean.  If lenient is true then consumer returns True except when a node (or
sequence of nodes) *cannot* consume input.  Repeating a node (or sequence)
for which it returns False gives a compile time error.  If lenient is false
then consumer returns True only when a node (or sequence) *guarantees*
consumption (in all cases).  This can be used to detect when to add the
runtime check node (i.e. when it returns False).
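
The two modes of consumer(lenient) can be sketched on a few simplified node
types.  Only the name consumer(lenient) comes from the description above; the
classes Char, Empty, Maybe and Sequence and the helper classify_loop are
hypothetical, and the real graph nodes in RXPY are certainly richer.

```python
class Char:
    """A literal character: always consumes input."""
    def consumer(self, lenient):
        return True

class Empty:
    """An empty transition: can never consume input."""
    def consumer(self, lenient):
        return False

class Maybe:
    """An optional body: may consume, but does not guarantee it."""
    def __init__(self, body):
        self.body = body
    def consumer(self, lenient):
        return self.body.consumer(True) if lenient else False

class Sequence:
    """Consumes (or guarantees consumption) if any child does."""
    def __init__(self, *children):
        self.children = children
    def consumer(self, lenient):
        return any(child.consumer(lenient) for child in self.children)

def classify_loop(body):
    """Decide how to treat a repeat of `body`."""
    if not body.consumer(lenient=True):
        return 'error'   # cannot consume: reject at compile time
    if not body.consumer(lenient=False):
        return 'check'   # might not consume: add a runtime check node
    return 'safe'        # always consumes: no extra node needed
```

For example, classify_loop(Maybe(Char())) gives 'check', while
classify_loop(Sequence(Maybe(Char()), Char())) gives 'safe' because the
second child guarantees consumption.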

This is not completely implemented, but an initial attempt is looking very
positive - "spinning" in empty loops is avoided at very little cost, and the
integration of logic for compile and runtime checks reduces the code / detail
required.

Andrew
