Andrew Cooke | Contents | Latest | RSS | Previous | Next

C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Choochoo Training Diary

Last 100 entries

Surprise Paradox; [Books] Good Author List; [Computing] Efficient queries with grouping in Postgres; [Computing] Automatic Wake (Linux); [Computing] AWS CDK Aspects in Go; [Bike] Adidas Gravel Shoes; [Computing, Horror] Biological Chips; [Books] Weird Lit Recs; [Covid] Extended SIR Models; [Art] York-based Printmaker; [Physics] Quantum Transitions are not Instantaneous; [Computing] AI and Drum Machines; [Computing] Probabilities, Stopping Times, Martingales; bpftrace Intro Article; [Computing] Starlab Systems - Linux Laptops; [Computing] Extended Berkeley Packet Filter; [Green] Mainspring Linear Generator; Better Approach; Rummikub Solver; Chilean Poetry; Felicitations - Empowerment Grant; [Bike] Fixing Spyre Brakes (That Need Constant Adjustment); [Computing, Music] Raspberry Pi Media (Audio) Streamer; [Computing] Amazing Hack To Embed DSL In Python; [Bike] Ruta Del Condor (El Alfalfal); [Bike] Estimating Power On Climbs; [Computing] Applying Azure B2C Authentication To Function Apps; [Bike] Gearing On The Back Of An Envelope; [Computing] Okular and Postscript in OpenSuse; There's a fix!; [Computing] Fail2Ban on OpenSuse Leap 15.3 (NFTables); [Cycling, Computing] Power Calculation and Brakes; [Hardware, Computing] Amazing Pockit Computer; Bullying; How I Am - 3 Years Post Accident, 8+ Years With MS; [USA Politics] In America's Uncivil War Republicans Are The Aggressors; [Programming] Selenium and Python; Better Walking Data; [Bike] How Fast Before Walking More Efficient Than Cycling?; [COVID] Coronavirus And Cycling; [Programming] Docker on OpenSuse; Cadence v Speed; [Bike] Gearing For Real Cyclists; [Programming] React plotting - visx; [Programming] React Leaflet; AliExpress Independent Sellers; Applebaum - Twilight of Democracy; [Politics] Back + US Elections; [Programming,Exercise] Simple Timer Script; [News] 2019: The year revolt went global; [Politics] The world's most-surveilled cities; [Bike] Hope Freehub; [Restaurant] Mama Chau's (Chinese, Providencia); [Politics] Brexit Podcast; [Diary] Pneumonia; [Politics] Britain's Reichstag Fire moment; install cairo; [Programming] GCC Sanitizer Flags; [GPU, Programming] Per-Thread Program Counters; My Bike Accident - Looking Back One Year; [Python] Geographic heights are incredibly easy!; [Cooking] Cookie Recipe; Efficient, Simple, Directed Maximisation of Noisy Function; And for argparse; Bash Completion in Python; [Computing] Configuring Github Jekyll Locally; [Maths, Link] The Napkin Project; You can Masquerade in Firewalld; [Bike] Servicing Budget (Spring) Forks; [Crypto] CIA Internet Comms Failure; [Python] Cute Rate Limiting API; [Causality] Judea Pearl Lecture; [Security, Computing] Chinese Hardware Hack Of Supermicro Boards; SQLAlchemy Joined Table Inheritance and Delete Cascade; [Translation] The Club; [Computing] Super Potato Bruh; [Computing] Extending Jupyter; Further HRM Details; [Computing, Bike] Activities in ch2; [Books, Link] Modern Japanese Lit; What ended up there; [Link, Book] Logic Book; Update - Garmin Express / Connect; Garmin Forerunner 35 v 230; [Link, Politics, Internet] Government Trolls; [Link, Politics] Why identity politics benefits the right more than the left; SSH Forwarding; A Specification For Repeating Events; A Fight for the Soul of Science; [Science, Book, Link] Lost In Math; OpenSuse Leap 15 Network Fixes; Update; [Book] Galileo's Middle Finger; [Bike] Chinese Carbon Rims; [Bike] Servicing Shimano XT Front Hub HB-M8010; [Bike] Aliexpress Cycling Tops; [Computing] Change to ssh handling of multiple identities?; [Bike] Endura Hummvee Lite II; [Computing] Marble Based Logic; [Link, Politics] Sanity Check For Nuclear Launch; [Link, Science] Entropy and Life

© 2006-2017 Andrew Cooke (site) / post authors (content).

Using Constraint Programming to Identify Groups

From: andrew cooke <andrew@...>

Date: Mon, 29 Aug 2011 23:43:08 -0300

This is the text of my answer at
http://stackoverflow.com/questions/7076349/is-there-a-good-way-to-do-this-type-of-mining/7237972#7237972
- it would be better to read it there, but I wanted a local copy in case it
was deleted at some point.

Andrew


This is a little late, but this problem has been worrying me for some time.  I
was sure it could be solved with mixed integer / linear programming techniques
and asked for help in this question: http://stackoverflow.com/q/7137334/181772

However, after getting a reply there, I had an insight that your problem, at
least as I understand it, is so simple (when framed as a constraint program)
that you can solve it trivially with a simple program (which you already
knew).  In other words, constraint programming would be a cool way to solve
this, but, at least with the approach I found, would give you the same answer
as something much simpler.

I'll explain below my reasoning, how I would implement it with a constraint
solving package, and then give the final, trivial, algorithm.

Mixed integer programming solution
----------------------------------

The most important detail is the difference between horizontal and vertical
groups.  As far as i can see, anything that aligns vertically can be in the
same group.  But horizontal groups are different - components have to be close
together.

The hardest part of solving a problem with constraints seems to be finding a
way to describe the limits in a way that the solver can understand.  I won't
go into the details here, but solvers are frustratingly limited.  Luckily I
think there is a way to do this here, and it is to consider horizontal
neighbours:  if there are N points in a row then we have `N-1` sets of
neighbours (for example, with 4 points A B C and D there are the three pairs
AB, BC, and CD).

For each pair, we can give a score, which is the number of spaces between them
(`S_i`) scaled by some factor `K`, and a flag (`F_i`) which is 0 or 1.  If the
pair are in the same horizontal group then we set the flag to 1, otherwise it
is zero.

It is critical to see that the set of flags for all the pairs *completely
defines a solution*.  For any row we can run along, placing pairs with a flag
of 1 in the same horizontal group, and starting a new horizontal group each
time the flag is 0.  Then, we can take all horizontal groups of size 1 and
convert them into vertical groups: any point that is not in a horizontal group
must be in a vertical group (even if it is a vertical group of just one).

So all we need now is a way to express an optimal solution in terms of the
flags.  I suggest that we want to minimise:

    sum(1 - F_i) + sum(K * S_i * F_i)

This has two terms.  The first is the sum of "one minus the flag" for each
pair.  The flag is 1 when the points are in the same horizontal group and 0
otherwise.  So minimising this value is the same as saying that we want as
*few* horizontal groups as possible.  If this was the only constraint then we
could set it to zero by making all the `F_i` 1 - by making all pairs on a row
members of the same group.

But the second term stops us from choosing such an extreme solution.  It
penalises groups with gaps.  If a pair are in the same group, but are
separated by `S_i` spaces, then we have a "penalty" of `K * S_i`.

So we have a trade-off.  We want horizontal groups, but we don't want gaps.
The final solution will depend on `K` - if it is large then we won't include
any spaces in horizontal groups.  But as it is decreased we will start to do
so, until when it is very small (tends to zero) we place everything in a row
in a single group.

Analytic solution
-----------------

OK, cool.  At this point we have a way to express the problem that we can give
to a constraint engine.

But it's trivial to solve!  we don't need no stinkin' constraint engine to
solve this - we can just look at the expression:

    sum(1 - F_i) + K * sum(S_i * F_i)

The two sums are over the same pairs, so we can move everything into the sum:

    sum(1 - F_i + K * S_i * F_i)
    sum(1 - F_i * (K * S_i - 1))

And then extract the constant (`N` here is the total number of pairs):

    N - sum(F_i * (K * S_i - 1))

Now note that each term in the sum is independent (and additive).  so for each
term, we want the minimum value.  we have two options:

 - if `F_i` is 0 then the entire term is 0.  

 - otherwise, `F_i` is 1 and the term is `K * S_i - 1`.

So the best choice depends on whether `K * S_i` is greater than 1.  if `K *
S_i` is greater than 1 then the smallest value of the term is 0, and `F_i`
should be 0.  Otherwise the second choice above is negative, and `F_i` should
be one.

Trivial algorithm
-----------------

What does this mean?  It means that for each pair we can simply look at the
number of spaces, `S_i`.  If that is greater than `1 / K` then the two points
should be in separate groups.  otherwise they should be in the same group.

So all this fancy maths and optimisation and constraints and bullshitting
comes down to: how far apart are two points in neighbouring pairs?  If they
are closer than some cut-off, put them in the same horizontal group.
Otherwise, put them in separate groups.

So here, finally, is your algorithm:

    choose some cut-off value, X
    place each point in its own, singleton, horizontal group
    for each row with more that one point:
        for each neighbouring pair in the row:
            if the space between the pair is less than X:
                joint into a single horizontal group
    for each column:
        join any single singleton groups into a single vertical group

Conclusion
----------

 - You can use constraint programming techniques to solve this problem.

 - But such techniques are restricted to problems that can be described in
   "the right" (typically, linear) way.

 - The simplest such approach I can find is equivalent to a trivial, direct
   algorithm that divides points in a row into horizontal groups depending on
   the number of spaces between them.

 - But this all depends on a whole pile of assumptions which may, of course,
   be over-simplifications, or just plain wrong.

Comment on this post