> As it turns out, the semantics are such that these loops are always guaranteed...

sriram_malhar · on May 11, 2018

That's right, Not in a spreadsheet. Excel will reject it. It has to be a directed acyclic graph.

But in Bloom, the only data structure you have available is a lattice, whose change operators have special properties. Consider a prominent example of lattices, namely, sets

The only way to add to a set is to union in a new set (of changes). So following the spirit of your example, suppose there are ΔC items to be added to C. This results in some rules getting triggered that generate ΔB changes to B, which in turn produces (possibly) fresh ΔC changes to C and so on in a loop.

You will find that given the operators, soon there will come a time when the deltas add nothing new to either table; the computation stops at that point. B and C have mutually coevolved to a new state.

Lattices are basically data structures (like sets) fitted with a partial order, and a merge operator (like set union) that has these three simple properties:

idempotence: The same stuff added to the set does not change the set

commutative: Two sets can be merged either way (A union B == B union A)

associative: Order of merges don't matter. A union (B union C) == (A union B) union C.

As an aside, these are the building blocks of CRDTs, which is why you get eventual consistency (all replicas eventually converge to the same result)

Edit: The merge operators are order-independent. The evaluation semantics don't care about the order in which rules are written down in the program. The Bloom docs speak of "disorder" which is a bit misleading, albeit catchy.

wruza · on May 11, 2018

I think I don’t understand something important from your explanation, since I still can’t see how gp comment’s loop is possible in Bloom. Not even “possible”, but how it is implemented to be predictable and ever halting.

More realistic example could be an invoice row with a tax or discount, where (discount / (price + discount)) != discount_percent, due to .00 rounding and ieee754 problems in general. How do such systems solve these disrepancies? Is stop point arbitrary or controllable?

edit: i.e. 3% of 11.99 = 0.3597, which is 0.0003 off from 0.36

sriram_malhar · on May 11, 2018

Recall that in Bloom, dataflow operations are on tables (B1 and C1). It makes no sense to add 1 to a table.

But you can add another set. Say the rules are

B = C + D

C = B + E

where each of these are tables, and + represents union. If you change D (add 1 row), then B will get that row. In turn C will get that row (second rule). Now the computation stops because C has nothing to give B that it doesn't already have. In effect, B and C will always have all of D and E.

A fix point is when things have stopped changing. When it comes to floating point (fixed points vs fix points!), one can have an epsilon beyond which all changes are deemed negligible. If that epsilon has more precision than you need, you have a deterministic answer. Bloom does not implement that out of the box, but you can implement your own lattice types to fit into the system.

pfraze · on May 11, 2018

That mostly makes sense so I apologize if this is pedantic, but:

Is it not possible for a set to have logically infinite sets in Bloom? For instance, "the set of all integers" ? If so, how does it determine the limits of set sizes - is it connected to bit counts? Eg, "The set of all 8bit integers" ?

russell_sears · on May 12, 2018

You can guarantee termination by limiting bit precision, but it is more interesting to limit the set of operators your program uses instead.

For instance, you might have an operator that translates from hostname to IPv4 address.

That function can return billions of different numbers, but if your program’s input only has 10 hostnames in it (and you don’t use string manipulation or other tricks to fabricate hostnames) then you can infer that the IP lookup function only adds 10 new symbols to the sets that your program maintains.

Searching for “herbrand universe” will produce formal writeups of these ideas.

sriram_malhar · on May 12, 2018

No, not in Bloom. The sets must be finite.

There is one aspect that is infinite, which is along the time axis. But the way the logic works is that you are never in a position to enumerate all possible behaviours. This is a very superficial answer, and perhaps you'll gain more insight from this paper on the logical semantics:

Dedalus: Datalog in Time and Space.

https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-...