
C[omp]ute

© 2006-2017 Andrew Cooke (site) / post authors (content).

Useful Java Properties Parsing Idiom

From: "andrew cooke" <andrew@...>

Date: Sun, 19 Mar 2006 17:07:48 -0400 (CLT)

Java isn't the easiest language to write simple parsers in.  Often that
doesn't matter because you can use XML, but at work I had a requirement
for "easy configuration" which, I think, meant "enough of your damn Spring
verbosity, it's gotta be text".

I have the impression (although now I can't think of any examples... ah, I
think it's used by Tiles configuration, at least) that it's quite common
to use Java properties files for config.  They inherit usefully, support
comments, and are "easy to read".

More exactly - since Properties files are of course used for config - the
following idiom seems to be fairly common:

  # two column table with string values

  table1.sql.table=qualified.name
  table1.sql.column.column1.header=HEADER1
  table1.sql.column.column1.type=varchar
  table1.sql.column.column2.header=HEADER2
  table1.sql.column.column2.type=varchar

  # junk loader and image

  table1.java.loader=java.lang.Object
  table1.java.image=java.lang.Object

And I hope it's intuitively obvious that's some kind of SQL related
description of tables.  You don't need to know more than that about the
application (or I'd have to kill you).

But how to parse that?

First, what does it parse to?  There's a fairly obvious tree model, which
I had designed earlier (and which matches the above nicely):

  trees
   +- table1
       +- modelName (=table1)
       +- table (=qualified.name)
       +- loader (=java.lang.Object)
       +- image (=java.lang.Object)
       +- columns
           +- column1
           |   +- modelName (=column1)
           :   +- type (=varchar)
               +- header (=HEADER1)

The nice thing to note is that just because of the way the names in the
properties file are built, you have to define a minimal path through the
tree for any value.  So you can build the tree using incomplete nodes "as
you go" and don't need to force the user to supply values in any order.

However that means that you can end up with incomplete nodes, so you need
a recursive verify() operation that checks everything is defined.
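
A minimal sketch of such a recursive verify() follows.  The class
CheckedNode, its fields and the "needsValue" flag are all assumptions made
for illustration (the original tree classes are not shown), and
computeIfAbsent needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node class for illustration; not the original tree classes.
class CheckedNode {
  private final String name;
  private final boolean needsValue;
  private String value; // null until the configuration supplies it
  private final Map<String, CheckedNode> children = new LinkedHashMap<>();

  CheckedNode(String name, boolean needsValue) {
    this.name = name;
    this.needsValue = needsValue;
  }

  // Get-or-create, so values can arrive in any order.
  CheckedNode addChild(String childName, boolean childNeedsValue) {
    return children.computeIfAbsent(
        childName, n -> new CheckedNode(n, childNeedsValue));
  }

  void setValue(String value) {
    this.value = value;
  }

  // Returns the dotted path of every node that needs a value but has none.
  List<String> verify() {
    List<String> missing = new ArrayList<>();
    verify("", missing);
    return missing;
  }

  private void verify(String prefix, List<String> missing) {
    String path = prefix.isEmpty() ? name : prefix + "." + name;
    if (needsValue && value == null) {
      missing.add(path);
    }
    for (CheckedNode child : children.values()) {
      child.verify(path, missing); // recurse into the subtree
    }
  }
}
```

Collecting every missing path, rather than failing on the first one, lets a
single run report all the gaps in a configuration file.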

It also simplifies things to have
  Node addNode(String name)
methods that add the node if it's missing and return the named node.  This
helps you quickly step down through the tree as you handle any particular
specification.
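
A get-or-create addNode might look something like the sketch below.  The
class name PathNode and its fields are made up for illustration, and
computeIfAbsent needs Java 8, so code from 2006 would have needed an
explicit lookup instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative node type; PathNode is an assumed name, not the original class.
class PathNode {
  final String name;
  final Map<String, PathNode> children = new LinkedHashMap<>();

  PathNode(String name) {
    this.name = name;
  }

  // Returns the existing child with this name, creating it first if absent.
  PathNode addNode(String childName) {
    return children.computeIfAbsent(childName, PathNode::new);
  }
}
```

Because each call returns the child, a whole path can be walked in one
expression, for example
root.addNode("table1").addNode("columns").addNode("column1"), and repeating
the same call later returns the same node rather than a duplicate.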

One thing I didn't do, which I probably should have done, was tokenize the
name values.  Instead I repeatedly parse them, which is inefficient (but
allows the separator value to be used if context requires it).

However, the neatest trick was the use of enums.  Many of the name tokens
are constants (sql, column, header, image, etc).  So I had code like:

  private static enum Domain {SQL, JAVA, META};
  private static enum SqlSpec {COLUMN, TABLE, OTHER};

And here's how the token is converted to an enum:

  private SqlSpec getSqlSpec(String line) throws ModelException {
    return (SqlSpec)getEnum(line, 2, SqlSpec.values(), SqlSpec.OTHER);
  }

  private Enum getEnum(String line, int chunk, Enum[] values, Enum deflt)
  throws ModelException {
    String name = getChunk(chunk, line);
    for (Enum value: values) {
      if (value.name().equalsIgnoreCase(name)) return value;
    }
    return deflt;
  }

where getChunk gets the numbered token (as I said, that bit is not so cool).
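
The original getChunk isn't shown.  A plausible sketch, assuming zero-based
chunk numbers (which is what the calls above imply, since chunk 2 of
"table1.sql.column.column1.header" has to be "column") and using
IllegalArgumentException where the real code presumably threw
ModelException:

```java
// Hypothetical stand-in for the getChunk helper mentioned in the text.
class Chunks {
  // Returns the zero-based, dot-separated chunk of a property name.
  static String getChunk(int chunk, String line) {
    String[] parts = line.split("\\."); // re-splits the name on every call
    if (chunk < 0 || chunk >= parts.length) {
      throw new IllegalArgumentException(
          "No chunk " + chunk + " in: " + line);
    }
    return parts[chunk];
  }
}
```

Every caller has to know its own depth in the name, which is the fragile
part.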

So then the parse logic becomes trivial (this is already one switch down):

  private void parseSql(String line, String value, Table table)
  throws ModelException {
    switch(getSqlSpec(line)) {
    case COLUMN:
      String name = getChunk(3, line);
      Column column = table.addColumn(name);
      parseColumn(line, value, column); // further descent
      break;
    case TABLE:
      table.setDatabaseName(value);
      break;
    case OTHER:
      throw new RuntimeException("Unsupported specification: " + line);
    }
  }

A summary:
 - allow any order in specification
 - construct paths with incomplete values
 - do a final recursive verify (verification in tree node classes)
 - Node addNode(String name) simplifies rapid traversal
 - use enums for sets of constant values
 - use nested switch statements on the enums

And now I'm going to change it to use StringTokenizer and avoid that ugly
getChunk...

Andrew

Now With Tokens (Extra Crunch)

From: "andrew cooke" <andrew@...>

Date: Sun, 19 Mar 2006 17:24:41 -0400 (CLT)

It was easy to rewrite with tokens.  The code is about the same - slightly
messier, I think, but none of the explicit depths are there (in the calls
to getChunk).
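
The rewritten code isn't shown, so the following is only a sketch of the
general shape, with made-up names (TokenDemo, describe, describeSql).  Each
level consumes its own token with nextToken() and passes the tokenizer
down, so no level needs to know its depth:

```java
import java.util.Locale;
import java.util.StringTokenizer;

// Illustrative only; these names are not from the original code.
class TokenDemo {
  enum Domain { SQL, JAVA, META }

  // Describes a property name such as "table1.sql.column.column1.header".
  static String describe(String key) {
    StringTokenizer tokens = new StringTokenizer(key, ".");
    String table = tokens.nextToken();
    Domain domain =
        Domain.valueOf(tokens.nextToken().toUpperCase(Locale.ROOT));
    switch (domain) {
      case SQL:
        return describeSql(table, tokens); // descend with the same tokenizer
      default:
        return table + " " + domain;
    }
  }

  // Picks up wherever the caller stopped; no explicit chunk index needed.
  private static String describeSql(String table, StringTokenizer tokens) {
    String spec = tokens.nextToken();
    if (spec.equals("column")) {
      return table + " column " + tokens.nextToken() + " " + tokens.nextToken();
    }
    return table + " " + spec;
  }
}
```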

Maybe that doesn't sound like much, but of the four errors I found via my
unit tests, three were related to those values.  :o/

Still, I guess being wise after the event is better than never.

Refinements and Comments on the Properties Parser

From: "andrew cooke" <andrew@...>

Date: Thu, 23 Mar 2006 19:36:28 -0400 (CLT)

1 - Not Properties

It had to happen - at some point I changed the format so that the order
mattered.  And then found out that java.util.Properties doesn't preserve
ordering in the files.  So I ended up writing my own (which is not a lot
of work and allowed me to keep track of source/line for better error
reporting).
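
A minimal sketch of an order-preserving reader that records line numbers.
The names (OrderedConfig, Entry, parse) are assumptions, and it handles
only "key=value" lines, blank lines and "#" comments, not the full
Properties syntax (escapes, continuation lines, ":" separators):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative replacement for java.util.Properties; names are assumed.
class OrderedConfig {
  static final class Entry {
    final String key;
    final String value;
    final int lineNumber; // kept for error reporting

    Entry(String key, String value, int lineNumber) {
      this.key = key;
      this.value = value;
      this.lineNumber = lineNumber;
    }
  }

  // Returns entries in file order, each tagged with its 1-based line number.
  static List<Entry> parse(String text) {
    List<Entry> entries = new ArrayList<>();
    String[] lines = text.split("\\r?\\n");
    for (int i = 0; i < lines.length; i++) {
      String trimmed = lines[i].trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue; // skip blank lines and comments
      }
      int eq = trimmed.indexOf('=');
      if (eq < 0) {
        throw new IllegalArgumentException(
            "Line " + (i + 1) + ": missing '=' in: " + trimmed);
      }
      entries.add(new Entry(trimmed.substring(0, eq).trim(),
                            trimmed.substring(eq + 1).trim(),
                            i + 1));
    }
    return entries;
  }
}
```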

2 - Theory

A colleague here was using a simple parser package and I realised I'd just
rushed in without thinking about theory, as usual.  A little reflection
shows that what I'm doing here is basically recursive descent (although in
practice there's little recursion) with context-dependent lexing.

That made me look again at the lexer part, which was kind-of mixed in with
the rules/productions.  I ended up pulling it out into a separate (inner)
class:

  protected static class Lexer<E extends Enum<E>>  {

    private E[] values;

    public Lexer(E[] values) {
      this.values = values;
    }

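    public E lex(String name, Line line) {
      for (E value: values) {
        if (value.name().toLowerCase().startsWith(name.toLowerCase())) {
          return value;
        }
      }
      // error can include name and list of expected values
      // (the exception type here is a placeholder)
      throw new IllegalArgumentException("Unexpected token '" + name
        + "', expected one of " + java.util.Arrays.toString(values));
    }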
  }

To use this, define an enum and create a lexer for it:

  private static enum TableSpec {COLUMN, DBNAME, BINDER, IMAGE, ERROR}
  private static Lexer<TableSpec> tableLexer =
    new Lexer<TableSpec>(TableSpec.values());

This leaves the rules/production in a simple switch:

  private void parseTable(StringTokenizer tokens, Node node, ...) {
    switch(tableLexer.lex(tokens)) {
    case COLUMN:
      Node child = node.addChild(...); // accumulate
      parseColumn(tokens, child, ..); // recursion
      break;
    case DBNAME:
      // ...
    case BINDER:
      // ...
    }
  }

3 - Accumulate the Parse Tree

Using an accumulator approach to constructing the parse tree simplifies
the productions (there's no backtracking so no need to worry about
destructive state).  So each node has an "addChild" method, which returns
either a known value (if traversing an already existing path), or a new
value.

4 - Embedded Names

There were two kinds of nodes in my tree.  One kind had a fixed set of
children.  Parsing the children required a lexer as described above. 
Typically this corresponded to selecting an option (for example, a data
type).

Other nodes could take "any" value and the String token is used directly
(the StringTokenizer splits on "." - see earlier discussion).  Typically
these nodes are names and the parent includes a map from name to node.
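
The two kinds of node can be sketched side by side.  Everything here
(EmbeddedNames, ColumnSpec, the Table and Column fields) is assumed for
illustration, and Enum.valueOf stands in for the Lexer class to keep the
example short:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

// Illustrative only; these classes are not the original tree classes.
class EmbeddedNames {
  enum ColumnSpec { HEADER, TYPE } // fixed set of children

  static final class Column {
    String header;
    String type;
  }

  static final class Table {
    // Free-form names: the parent keeps a map from name to node.
    final Map<String, Column> columns = new LinkedHashMap<>();

    Column addColumn(String name) {
      return columns.computeIfAbsent(name, n -> new Column());
    }
  }

  // The tokenizer is assumed to be positioned just after "column".
  static void parseColumn(StringTokenizer tokens, String value, Table table) {
    // A name node: the token is used directly as the map key.
    Column column = table.addColumn(tokens.nextToken());
    // A fixed-children node: the token must match one of the enum values.
    switch (ColumnSpec.valueOf(tokens.nextToken().toUpperCase(Locale.ROOT))) {
      case HEADER:
        column.header = value;
        break;
      case TYPE:
        column.type = value;
        break;
    }
  }
}
```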

5 - Damn Enums

I still can't work out a way to generate the full set of enums
automatically (see the Lexer class above, which takes an array as an
argument).  I tried both using the generic type parameter and passing the
Enum itself, with no success (i.e. I'd like new Lexer<TableSpec>() or
new Lexer<TableSpec>(TableSpec) to be possible).  I guess the latter should
be TableSpec.class - somehow I seem to be confusing classes and objects.
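
For what it's worth, the TableSpec.class route does work, because
Class.getEnumConstants() returns the enum's values as an array.  The no-arg
form, new Lexer<TableSpec>(), cannot work, since the type parameter is
erased at runtime and the constructor has nothing to inspect.  A sketch,
with ClassLexer as an assumed name and the Line argument dropped for
brevity:

```java
import java.util.Arrays;
import java.util.Locale;

enum TableSpec { COLUMN, DBNAME, BINDER, IMAGE, ERROR }

// Variant of the Lexer above that takes the enum's Class object.
class ClassLexer<E extends Enum<E>> {
  private final E[] values;

  ClassLexer(Class<E> type) {
    this.values = type.getEnumConstants(); // all constants, in declaration order
  }

  // Matches case-insensitively, accepting a prefix of the enum name.
  E lex(String name) {
    String wanted = name.toLowerCase(Locale.ROOT);
    for (E value : values) {
      if (value.name().toLowerCase(Locale.ROOT).startsWith(wanted)) {
        return value;
      }
    }
    throw new IllegalArgumentException("Unexpected token '" + name
        + "', expected one of " + Arrays.toString(values));
  }
}
```

The lexer is then created with new ClassLexer<TableSpec>(TableSpec.class).
EnumSet.allOf(TableSpec.class) is another standard way to get every value
from the class object.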
