Andrew Cooke | Contents | Latest | RSS | Previous | Next

C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Choochoo Training Diary

Last 100 entries

[Programming] React Leaflet; AliExpress Independent Sellers; Applebaum - Twilight of Democracy; [Politics] Back + US Elections; [Programming,Exercise] Simple Timer Script; [News] 2019: The year revolt went global; [Politics] The world's most-surveilled cities; [Bike] Hope Freehub; [Restaurant] Mama Chau's (Chinese, Providencia); [Politics] Brexit Podcast; [Diary] Pneumonia; [Politics] Britain's Reichstag Fire moment; install cairo; [Programming] GCC Sanitizer Flags; [GPU, Programming] Per-Thread Program Counters; My Bike Accident - Looking Back One Year; [Python] Geographic heights are incredibly easy!; [Cooking] Cookie Recipe; Efficient, Simple, Directed Maximisation of Noisy Function; And for argparse; Bash Completion in Python; [Computing] Configuring Github Jekyll Locally; [Maths, Link] The Napkin Project; You can Masquerade in Firewalld; [Bike] Servicing Budget (Spring) Forks; [Crypto] CIA Internet Comms Failure; [Python] Cute Rate Limiting API; [Causality] Judea Pearl Lecture; [Security, Computing] Chinese Hardware Hack Of Supermicro Boards; SQLAlchemy Joined Table Inheritance and Delete Cascade; [Translation] The Club; [Computing] Super Potato Bruh; [Computing] Extending Jupyter; Further HRM Details; [Computing, Bike] Activities in ch2; [Books, Link] Modern Japanese Lit; What ended up there; [Link, Book] Logic Book; Update - Garmin Express / Connect; Garmin Forerunner 35 v 230; [Link, Politics, Internet] Government Trolls; [Link, Politics] Why identity politics benefits the right more than the left; SSH Forwarding; A Specification For Repeating Events; A Fight for the Soul of Science; [Science, Book, Link] Lost In Math; OpenSuse Leap 15 Network Fixes; Update; [Book] Galileo's Middle Finger; [Bike] Chinese Carbon Rims; [Bike] Servicing Shimano XT Front Hub HB-M8010; [Bike] Aliexpress Cycling Tops; [Computing] Change to ssh handling of multiple identities?; [Bike] Endura Hummvee Lite II; [Computing] Marble Based Logic; [Link, Politics] Sanity Check For Nuclear Launch; [Link, Science] Entropy and Life; [Link, Bike] Cheap Cycling Jerseys; [Link, Music] Music To Steal 2017; [Link, Future] Simulated Brain Drives Robot; [Link, Computing] Learned Index Structures; Solo Air Equalization; Update: Higher Pressures; Psychology; [Bike] Exercise And Fuel; Continental Race King 2.2; Removing Lowers; Mnesiacs; [Maths, Link] Dividing By Zero; [Book, Review] Ray Monk - Ludwig Wittgenstein: The Duty Of Genius; [Link, Bike, Computing] Evolving Lacing Patterns; [Jam] Strawberry and Orange Jam; [Chile, Privacy] Biometric Check During Mail Delivery; [Link, Chile, Spanish] Article on the Chilean Drought; [Bike] Extended Gear Ratios, Shimano XT M8000 (24/36 Chainring); [Link, Politics, USA] The Future Of American Democracy; Mass Hysteria; [Review, Books, Links] Kazuo Ishiguro - Never Let Me Go; [Link, Books] David Mitchell's Favourite Japanese Fiction; [Link, Bike] Rear Suspension Geometry; [Link, Cycling, Art] Strava Artwork; [Link, Computing] Useful gcc flags; [Link] Voynich Manuscript Decoded; [Bike] Notes on Servicing Suspension Forks; [Links, Computing] Snap, Flatpack, Appimage; [Link, Computing] Oracle is leaving Java (to die); [Link, Politics] Cubans + Ultrasonics; [Book, Link] Laurent Binet; VirtualBox; [Book, Link] No One's Ways; [Link] The Biggest Problem For Cyclists Is Bad Driving; [Computing] Doxygen, Sphinx, Breathe; [Admin] Brokw Recent Permalinks; [Bike, Chile] Buying Bearings in Santiago; [Computing, Opensuse] Upgrading to 42.3; [Link, Physics] First Support for a Physics Theory of Life; [Link, Bike] Peruvian Frame Maker; [Link] Awesome Game Theory Tit-For-Tat Thing; [Food, Review] La Fabbrica - Good Italian Food In Santiago; [Link, Programming] MySQL UTF8 Broken; [Link, Books] Latin American Authors

© 2006-2017 Andrew Cooke (site) / post authors (content).

Using Regexps to Identify Integer Ranges

From: andrew cooke <andrew@...>

Date: Thu, 25 Jul 2013 18:08:00 -0400

This was a cute question on StackOverflow - can you use a regexp to identify
integers in some range?  For example, all values between 1234 and 4321?

Obviously you can, because you could use (1234|1235|...|4320|4321).  But I
(wrongly - see below) assumed that was impractical.  So I sat down to generate
something more compact by hand.

It turns out that the algorithm is pretty easy.  If you have a range like
123-599 split it into an edge case, 123-199, and a "block", 200-599.  The
block can be matched trivially, while the edge case can be handled as a common
prefix, 1, plus recursion on 23-99 (which has an edge of 23-29 and a block of
30-99).

The code is at http://stackoverflow.com/a/17840228/181772 and the generated
expressions are pretty good (if you read them, they "make sense"):

   0 - 0     0
   0 - 1     [0-1]
   0 - 2     [0-2]
   0 - 9     \d
  00 - 10    (0\d|10)
  00 - 11    (0\d|1[0-1])
 000 - 101   (0\d\d|10[0-1])
   1 - 1     1
   1 - 2     [1-2]
   1 - 9     [1-9]
  01 - 10    (0[1-9]|10)
  01 - 11    (0[1-9]|1[0-1])
 001 - 101   (0(0[1-9]|[1-9]\d)|10[0-1])
 000 - 123   (0\d\d|1([0-1]\d|2[0-3]))
 111 - 123   1(1[1-9]|2[0-3])
 123 - 222   (1(2[3-9]|[3-9]\d)|2([0-1]\d|2[0-2]))
 123 - 333   (1(2[3-9]|[3-9]\d)|2\d\d|3([0-2]\d|3[0-3]))
 123 - 444   (1(2[3-9]|[3-9]\d)|[2-3]\d{2}|4([0-3]\d|4[0-4]))
 000 - 321   ([0-2]\d{2}|3([0-1]\d|2[0-1]))
 111 - 321   (1(1[1-9]|[2-9]\d)|2\d\d|3([0-1]\d|2[0-1]))
 222 - 321   (2(2[2-9]|[3-9]\d)|3([0-1]\d|2[0-1]))
 321 - 333   3(2[1-9]|3[0-3])
 321 - 444   (3(2[1-9]|[3-9]\d)|4([0-3]\d|4[0-4]))
 123 - 321   (1(2[3-9]|[3-9]\d)|2\d\d|3([0-1]\d|2[0-1]))
 111 - 121   1(1[1-9]|2[0-1])
 121 - 222   (1(2[1-9]|[3-9]\d)|2([0-1]\d|2[0-2]))
1234 - 4321  (1(2(3[4-9]|[4-9]\d)|[3-9]\d{2})|[2-3]\d{3}|4([0-2]\d{2}|3([0-1]\d|2[0-1])))
 000 - 999   \d\d{2}


Then someone commented that (1234|1235|...|4320|4321) actually works just fine
- the string is smaller than memory available, which is all that matters.  And
that seems to be true for quite large numbers (it should compile down to a
compact DFA anyway, so it's only the input string size that is a worry).

So the following is a reasonable solution for many cases:

  import re

  def regexp(lo, hi):
      fmt = '%%0%dd' % len(str(hi))
      return re.compile('(%s)' % '|'.join(fmt % i for i in range(lo, hi+1)))


When does the above fail (run out of memory)?  This is a table of the
expression size for various number ranges (approximately):

  0-1  1
  0-10  20
  0-100  300
  0-1000  4,000
  0-10000  50,000
  0-100000  600,000
  0-1000000  7,000,000
  0-10000000  80,000,000
  0-100000000  900,000,000
  0-1000000000  10,000,000,000
  0-10000000000  110,000,000,000
  0-100000000000  1,200,000,000,000

so 9 digit numbers require about 1GB of memory.

Andrew

Comment on this post