
RCast 32: TOGL (Theory of Graphs Language)




Greg Meredith discusses his notion of TOGL (Theory of Graphs Language) with Isaac DeFrain and Christian Williams. The notes discussed during the talk are available here.


Transcript


Greg: First of all, thanks to everyone, especially to you, Derek, for hosting and editing and making sure that the community has access to this material. It’s a great service; I really appreciate it. Also, thanks to Christian and to Isaac. Today we’re just going to have some fun. 

I wanted to talk about a theory that could be used to inform some stuff that we do on RChain, but it has a wider range of applications. I call it TOGL. It’s an algebraic way to think about graphs. Another way to think about it for programmers: it’s a data type for graphs. 

A whole bunch of graph theory is done in terms of presentations of graphs as, essentially, nodes and edges—collections of vertices and collections of edges. That doesn’t really speak to the way that graphs naturally form or the way that we use graphs in the computational setting. There is a different way to formulate them as a data type, which is as a recursive algebraic data type. 

The analogy I want to emphasize is the way we think about the list data type. If you were going to write down the list data type as a domain equation, then it would be a functor, which takes some type A as an argument and produces a sum: either one, or A cross List of A. You can think of that as a cons representation. You have either the empty list or you cons a value onto the rest of the list.
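To make the analogy concrete, here is a minimal Scala sketch of that domain equation (the type and constructor names are illustrative assumptions, not anything from the notes):

```scala
// List[A] = 1 + A × List[A]: a list of A is either the empty list
// or a value consed onto the rest of the list.
sealed trait MyList[+A]
case object Empty extends MyList[Nothing]
case class Cons[A](head: A, tail: MyList[A]) extends MyList[A]
```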

That’s clearly a recursive presentation. It lines up with our way of thinking about lists, but it has other advantages. A nodes-and-edges presentation of graphs does not have that style; it doesn’t have that flavor of presentation. As a result, when you try to write algorithms for graphs, or even express graphs themselves, you can’t take advantage of this kind of recursive structure. There’s a lot more complexity that these algorithms have to wade through.

A lot of this complexity is needless. But in order to walk through it, I want to go through the construction carefully. There are some nice visualizations for it. I’ll make those visualizations available on the website. We can talk about them. It lends itself pretty nicely to an English description. Let me stop there and check in and see if what I’m saying makes sense to Isaac and Christian.

Christian: Yeah, this sounds like a good motivation: incorporate a better theory of the stuff that actually is the reason for the graph rather than just seeing it as an external mathematical structure from the beginning. 

Greg: In terms of the presentation, I’m using a style that’s known to type theorists and proof theorists, but maybe not so well known amongst programmers. I have this idea of a well-formed graph. I give a sequent theory and a proof theory to form graphs, and then determine when graphs are well-formed. You can prove when someone hands you a graph whether it’s a well-formed graph. We start with this idea that we’re going to have this domain of graphs, which is parametric in two things.

One is a universe from which we draw our vertices. And the other is a universe from which we draw some variables. The variables are important because we want to be able to let variables stand for vertices in graphs. There’s a wider construction within which we have another class of variables which would stand for whole graphs. 

Let’s begin simply and then move to more complicated structures. We have this domain of graphs that are going to be formed from a type for variables and a type for vertices. Then we have a notion of gamma, which is the list of variables that are utilized in the formation of the graph. Then we can form a sequent, which is, given this list, we want to formulate a graph over that list of variable references.

I use turnstile notation. We can read it as “thinks that.” Given some gamma, which is this list of variable references, gamma thinks that G is well-formed. That’s a way to turn the sequent notation into an English sentence. The general shape of the formation rules is going to be of this form: given some number of sequents, which determine that G1 through GN are in fact well-formed, we can combine the dependencies of each of the sequents. That combination of dependencies thinks that a way of combining the graphs is also well-formed. If you have some graph constructor (or ctor) K, then you’re going to form this K over the graphs and you’re going to form a similar kind of K over the sequents.

It says how you combine the dependency lists to get a combined graph that is well-formed. That sounds somewhat abstract, but we’ll give some good examples. Building out carefully, we’re going to say that as long as you draw a vertex from the universe of vertices, any gamma thinks that the vertex is well-formed. Likewise, if you draw a variable from the universe of variables, then a gamma that mentions X will think that X is well-formed. Then we have a notion of a well-formed graph.

We are building up recursively. Like I said, our gammas are just sequences of variables. Then we have a collection of graph expressions. There are just a few of them. The way I’ve done it in this notation is we’ve got a zero graph (or the empty graph). If you had a graph G, then if you wrote the vertex in parallel (or V pipe G), that’s adjoining a vertex V to the graph G. If you had two graphs, say G and H, then we can combine them with a tensor.

We basically just lay them side by side and that gives you a new graph. One of the things that is most distinctive in this approach is nominating a particular vertex with a variable. This is a let expression. We would say let X, which is a variable, equal V in some G. The reason we want to do that is that we want to be able to form edges, but we only ever form edges between variables. This let thing actually picks out a vertex and gives it a name. If you have two let expressions, then you can form an edge between the two. That’s the total set of graph expressions.

There’s actually a more recent writeup, but I couldn’t find the link to that one. This one is a little bit older than the more recent writeup, but it’s still pretty good in terms of getting the ideas across. 

I actually give English-language pronunciations, so you have: zero is the empty graph (or just empty). The pipe construction is adjoin V to G; the tensor construction juxtaposes G1 and G2 (or G and H). The let expression is let X be V in G. The edge construction connects G1 to G2 with an edge from X1 to X2. That’s where you’ve got this pairing: let X1 equal V1 in G1 with let X2 equal V2 in G2. That’s the formal expression. We pronounce it connect G1 to G2 with an edge from X1 to X2.
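As a rough illustration of the constructors just listed, here is a hedged Scala sketch of the graph expressions as a recursive algebraic data type, parametric in a variable type X and a vertex type V. The constructor names are assumptions for illustration; they are not the notation from the writeup.

```scala
sealed trait Graph[X, V]
case class Zero[X, V]() extends Graph[X, V]                                  // 0: the empty graph
case class AdjoinVertex[X, V](v: V, g: Graph[X, V]) extends Graph[X, V]      // v | g: adjoin the vertex v to g
case class AdjoinVariable[X, V](x: X, g: Graph[X, V]) extends Graph[X, V]    // x | g: adjoin the variable x to g
case class Tensor[X, V](g: Graph[X, V], h: Graph[X, V]) extends Graph[X, V]  // g ⊗ h: juxtapose g and h
case class Let[X, V](x: X, v: V, g: Graph[X, V]) extends Graph[X, V]         // let x = v in g: nominate a vertex
case class Edge[X, V](l1: Let[X, V], l2: Let[X, V]) extends Graph[X, V]      // connect l1 to l2 with an edge from x1 to x2
```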

In that way, the edges are anonymous, but you can also provide a name for them as well. For purposes of discussion, we start off with the anonymous (or unnamed) edges. Let me stop there and just check in with Isaac and Christian. Is this making sense?

Christian: Yeah, it’s making sense. That was my first question: when you declare this edge, is it just an existence or can it hold some structure? 

Greg: You can allow it to hold structure. I’m building the theory in pieces (or iteratively). 

We’ve talked about the structure of the sequents. Now we’re going to talk about the structure of the proofs. The proofs are basically what allow you to guarantee that a graph is well-formed. The foundation of the whole thing is that if you have no dependencies, then zero is well-formed.

If the little V is in the universe of vertices, then the empty dependency list gives you that the vertex is well-formed; there’s a similar rule for variables. There’s a rule which we call participation, which says: if you have some sequent that says G is well-formed and you know that V is a legal vertex, then you adjoin the vertex to the graph, and that gives you a well-formed graph where the dependency list is just the dependency list from the formation of the graph.

Correspondingly, there’s a similar rule adjoining a variable to a graph. You can adjoin a variable, remembering that ultimately for this to make sense, there has to be a let expression which would bind the variable to the graph. 

Christian: Are those slashes just formatting?

Greg: Yeah, I think what happened was somehow some semicolons got dropped. Those were supposed to be spaces. 

If you have that gamma one thinks that G1 is well-formed and gamma two thinks that G2 is well-formed, then you can combine the two dependency lists—so gamma one comma gamma two thinks that G1 tensor G2 is well-formed. The assumption is there’s a side condition that gamma one intersected with gamma two must be empty. They can’t share a variable.

Christian: What’s the reason for that? 

Greg: This is the meaning of tensor. Otherwise, they could potentially overlap in some way.

Then we have nomination. So if you have a legal vertex and you have that gamma comma X thinks that G is well-formed, then you can conclude that gamma thinks that let X equal V in G is well-formed. Notice that it’s not a requirement that V actually occur in G. In the case that it does, then you’re binding something there.

Finally, you have connection. So if gamma one thinks that let X1 equal V1 in G1 is a well-formed graph and gamma two thinks that let X2 equal V2 in G2 is well-formed, then you can form the graph that connects X1 to X2 in G1 and G2 (or between G1 and G2).
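As one plausible reading of those formation rules, here is a hedged Scala sketch of a well-formedness check over the Graph type sketched earlier. It computes the set of variables a graph depends on (its gamma), or fails when a side condition such as the disjointness requirement for tensor is violated.

```scala
def wellFormed[X, V](g: Graph[X, V]): Option[Set[X]] = g match {
  case Zero()               => Some(Set.empty)          // 0 is well-formed with no dependencies
  case AdjoinVertex(_, h)   => wellFormed(h)            // adjoining a vertex keeps the dependency list
  case AdjoinVariable(x, h) => wellFormed(h).map(_ + x) // adjoining a variable records a dependency
  case Tensor(g1, g2) =>
    for {
      d1 <- wellFormed(g1)
      d2 <- wellFormed(g2)
      if (d1 intersect d2).isEmpty                      // side condition: the gammas can't share a variable
    } yield d1 union d2
  case Let(x, _, h)         => wellFormed(h).map(_ - x) // nomination discharges the variable it binds
  case Edge(l1, l2) =>
    for { d1 <- wellFormed(l1); d2 <- wellFormed(l2) } yield d1 union d2
}
```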

There’s a set of English language interpretations of that in the document. That gives you a sense of how we give proof rules that are going to determine that you have created a graph. We just went through the graph formation rules. 

I’m hoping that you’ll express some reservations or questions because there should be some natural equations. For example, we want to equate G1 tensor G2 with G2 tensor G1. That should happen. Then we should have some rules about let: if you iterate let over the same variable, then only the outer binding should count; in other words, there’s a kind of shadowing that happens. You have to collapse that.

This is where the other writeup is a little better. You can calculate membership. You can ask whether or not V is in a particular graph. For example, you know that V is in the adjoining of V to G. You know that if V is in G, then V will necessarily be in G tensor G prime. You know that if V is in G, then V will be in let X equal V in G, and similarly for edges, where now you have two let forms to check. You can calculate membership in this way.
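A minimal sketch of that membership calculation over the same Graph type, following the rules just described:

```scala
// Does the vertex v occur in g?
def member[X, V](v: V, g: Graph[X, V]): Boolean = g match {
  case Zero()               => false
  case AdjoinVertex(w, h)   => w == v || member(v, h)         // v is in the adjoining of v to g
  case AdjoinVariable(_, h) => member(v, h)
  case Tensor(g1, g2)       => member(v, g1) || member(v, g2) // if v is in g, v is in g ⊗ g'
  case Let(_, _, h)         => member(v, h)                   // if v is in g, v is in let x = w in g
  case Edge(l1, l2)         => member(v, l1) || member(v, l2) // check both let forms of an edge
}
```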

With a notion of membership, you can actually characterize those equations. For example, if you tensor the empty graph with a graph, that should just be the same as the graph. To tensor G1 with G2 should be the same as G2 tensor G1. If you tensor G1 with G2 tensor G3, that should be the same as G1 tensor G2, tensored with G3: it’s associative. You should be able to permute the adjoining of any vertices. You should be able to collapse nested lets in the case that they mention the same vertex, and likewise the same variables.

There’s a form of extrusion. If you nominate a vertex with a variable (you’ve selected it) and then you tensor that with some graph, you should be able to pull the tensor inside the nomination. That’s important.

There are all kinds of things that are simplified because of this presentation. There are lots of things that you can do now with sub-graph isomorphism that are not hard because of the way we’ve formed the graphs. But there’s no such thing as a free lunch. 

The complexity moves over into checking these equalities. In particular, checking that something decomposes into a tensor in two graphs can be exceptionally hard. That’s where the complexity moves over. 

Christian: A quick question. I was expecting some rules that if G1 equals G2, then when you apply any of the constructors onto them, those are equal, so that equality applies locally?

Greg: Those are theorems rather than rules. You can prove those. The only place where you have to worry is in variable binding. The tensor stuff is a straightforward proof. The adjoining stuff is a straightforward proof. Edge formation is a straightforward proof. It’s only in the variable binding stuff you have to be a little more careful. You have to quantify over certain substitutions.

Let me now talk about some examples. There’s a family of examples starting from the discrete graph. All the nodes are discrete. Then you form a chain. You have one node connected to the next node connected to the next node. Basically, like a path or a chain. That’s the next level of complexity. Then you want to be able to form a cycle, and then you’d like to be able to form a complete graph.

Those are some standard examples from graph theory. If you look at each of these examples in terms of your favorite graph data type, it’s really interesting to see how the complexity of the representation changes as the complexity of the example changes. Essentially, the complexity of the examples gets greater and greater and greater, until finally you get to the complete graph and the complexity of the object is very high. The question is, what happens to the complexity of the representation as the complexity of the graph goes up?

What distinguishes this theory from other graph representations is that essentially the complexity of the representation stays almost constant. We get the one-liner descriptions of the graphs that remain one-liners as the complexity of the graph goes up. These one-liner descriptions don’t change very much, which means that there’s something right about this kind of formulation. That’s what makes this interesting. 

I describe these graphs in terms of morphisms from some other domain into the domain of graphs; for example, from the natural numbers. The discrete graph on the natural number zero is just the empty graph. Then the discrete graph on N is going to be equal to adjoin N, whatever that is; we’re going to take the naturals as our universe of vertices.

The discrete graph on N is N adjoined to zero, tensored with the discrete graph on N minus 1. If we introduce the notation of box N for N adjoined to zero, or put N in square braces, we can see that discrete N is just box N tensor box N minus 1 tensor yada, yada, yada…all the way down to zero.
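In the notation of the Graph sketch above, taking the naturals as the universe of vertices and strings as variables, that recursion might look like this (a sketch, not the writeup’s exact definition):

```scala
// box(n) adjoins the vertex n to the empty graph: [n] = n | 0
def box(n: Int): Graph[String, Int] = AdjoinVertex(n, Zero())

// discrete(0) = 0;  discrete(n) = [n] ⊗ discrete(n - 1)
def discrete(n: Int): Graph[String, Int] =
  if (n == 0) Zero() else Tensor(box(n), discrete(n - 1))
```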

It’s a nice crisp recursive specification of the graph that has a natural expansion that looks algebraically just like the graph looks if you were to draw the graph. We can use the idea of the discrete graph to make a chain. I’m doing something a little tricky: I’m building variables out of the domain of vertices. The way I distinguish them is with quotes. 

I have variables that are essentially quotes of natural numbers. The chain of zero is just the empty graph. The chain of one is: let the quote of 2 be the vertex 1 in box 1, where box 1 stands for adjoining 1 to the empty graph. Then chain of N is: let the quote of 2N equal N in the connection of (let the quote of 2N minus 1 equal N in box N) to chain of N minus 1.

Notice what that’s doing: chain of N, unless N is zero, is always a let form. Since we spell out exactly what chain of one is, it will bottom out at one. You can verify that for any N greater than or equal to one, chain of N is going to be a let form (or a nominated form), so each step of chain of N can form an edge. It’s got the right shape.
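Here is a hedged Scala sketch of the chain along those lines, building on box and the Graph type above. The quoting convention (quote(n) = "@" followed by n) and the exact variable-numbering scheme are assumptions for illustration; the writeup’s formulation may differ in detail.

```scala
def quote(n: Int): String = "@" + n

def chain(n: Int): Graph[String, Int] = n match {
  case 0 => Zero()
  case 1 => Let(quote(2), 1, box(1))           // let @2 = 1 in [1]
  case _ => chain(n - 1) match {
    case prev @ Let(_, _, _) =>
      // nominate the new vertex n twice: the inner nomination is consumed by the
      // edge to the head of chain(n - 1); the outer one keeps chain(n) in let form
      Let(quote(2 * n), n,
        Edge(Let(quote(2 * n - 1), n, box(n)), prev))
    case other => other                        // unreachable for n >= 1, by the lemma above
  }
}
```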

Christian: Are these quotes supposed to be suggestive of Rho quotes? 

Greg: They are. We’ll get to that later. Right now I’m just making things easy on us so that I don’t have to manage subscripts on variables. That’s what I’d have to do in order to express this in another way. Once you can adduce this little lemma that I made, then it’s easy to see that the chain form gives you exactly the chain graph that I mentioned.

Now once you’ve got that, then the cycle is even easier. The cycle of zero nodes is just the empty graph. The cycle of N nodes is: we take the chain of N and then you connect the end to the beginning. You pick out the one and you connect it to the end.

Again, notice what’s happened. The complexity of the discrete graph is a single line recursive function. The complexity of the chain is a two-line recursive function. The complexity of the cycle is a two-line recursive function. The complete graph requires the introduction of the notion of a for comprehension and the ability to go and pick out the entire vertex set. Then we can lift the edge computation from two nominated graphs (or graphs in the nominated form) to an edge formation over two for comprehensions.

Basically, it’s connecting everything in one for comprehension to connecting everything in the other for comprehension. We’re introducing a little bit of syntax that has a very intuitive meaning for programmers. The syntax is basically just like a Scala for comprehension or a Rholang for comprehension. For X from the vertex set of G, if some predicate on X holds, then you yield a particular graph. That’s the expression. 

If we have two of those expressions, that’s going to yield, if you expand it out, a bunch of let expressions, which can then be used to form an edge. This gives you a collection of nominated expressions—basically a Cartesian product of two nominated expressions—then you can form all edges between all the pairs. No surprise.

Again, it’s a single pair of recursive equations. The complete graph of zero is the empty graph. If you have N, then you go and form the complete graph of N minus 1, you adjoin a vertex, you make a graph that basically just has one vertex, and then you form all the edges from that vertex to all the others. That’s how that’s done.
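A loose Scala sketch of that construction, building on the Graph type, box, and quote above. Folding the edges together with tensor, and re-adjoining each existing vertex inside its nomination, are my simplifications of the comprehension form described here; the writeup’s formulation may differ.

```scala
// Collect the vertex set of a graph (the role played by "the vertex set of G" above).
def vertices[X, V](g: Graph[X, V]): Set[V] = g match {
  case Zero()               => Set.empty
  case AdjoinVertex(v, h)   => vertices(h) + v
  case AdjoinVariable(_, h) => vertices(h)
  case Tensor(g1, g2)       => vertices(g1) union vertices(g2)
  case Let(_, _, h)         => vertices(h)
  case Edge(l1, l2)         => vertices(l1) union vertices(l2)
}

// complete(0) = 0; complete(n) adds the vertex n and connects it to every
// vertex of complete(n - 1).
def complete(n: Int): Graph[String, Int] =
  if (n == 0) Zero()
  else {
    val prev = complete(n - 1)
    val edges = for (v <- vertices(prev).toList)
      yield Edge(Let(quote(n), n, box(n)), Let(quote(v), v, box(v)))
    val base: Graph[String, Int] = if (edges.isEmpty) Tensor(box(n), prev) else prev
    edges.foldLeft(base)((acc, e) => Tensor(acc, e))
  }
```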

The complexity of the description remains basically constant. We did do this for comprehension trick. But I argue it’s a very natural computational device. 

For-comprehension syntax has found its way into mathematics through set-builder notation. That’s over a hundred years old. It first found its way into the mainstream computing community through SQL: select-from-where is for-comprehension syntax. Then it found its way again through the Haskell do notation and generators in languages like Python. Then it was lifted again into the limelight in Scala with its for-comprehension notation. It’s found its way into mainstream mathematics and computing over and over and over again. It’s a very natural thing to want to include. Mathematicians should feel very comfortable with this, as should computer scientists.

In the presentation, we talk about wires and rewrite systems. There’s a natural way to encode the Lambda calculus in this setting. For example, Christian, in your work on graph enriched semantics for formulae, you might consider this kind of notation as a way to calculate these enriched gadgets. 

Christian: Yeah, that would be very nice. 

Greg: That was one of the reasons why I thought it would be interesting to have you on this call. Do you have a notation for these gadgets? And here is one. 

It turns out that you can also provide a theory of categories on top of this. The theory of categories built over a domain X and a domain V is a subdomain of the theory of graphs. There’s extra information about how paths have to give rise to arrows directly: there’s always an arrow that reflects a path.

Christian: I didn’t quite understand. 

Greg: A category is closed under composition. If I take two edges in a graph and I consider the path (two edges where the target of one is the source of the other), then I have to have an edge in my graph where the source of the first is the source of this new edge, and the target of the second is the target of the new edge. In fact, I have to do that for all paths.

Isaac: You’re just saying we can take any category and represent it as a graph, but it’s a particular type of graph because it has to have this composition.

Greg: That’s precisely what’s going on. One of the interesting things is that because we can do this kind of decoration stuff on edges, we can actually give a formulation of n-categories using this approach. I sketched this out for Mike and some others in Park City, Utah a couple of years ago, but I didn’t write it up here. If you’re ever interested, Christian, I can go through that construction.

Since we’re parametric in X and V, we can do all kinds of weird recursion things. I can make this new domain, say D of X and V, which is G of X plus D of X and V together with V plus D of X and V. Now you’ve got the variables mentioning this domain and the vertices mentioning this domain.
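A speculative Scala sketch of that reflective domain, building on the Graph type above; the wrapper names are mine. D of X and V is then Graph[RVar[X, V], RVert[X, V]].

```scala
// Variables are either drawn from X or are themselves (quoted) reflective graphs;
// vertices are either drawn from V or are themselves (quoted) reflective graphs.
sealed trait RVar[X, V]
case class BaseVar[X, V](x: X) extends RVar[X, V]
case class GraphVar[X, V](g: Graph[RVar[X, V], RVert[X, V]]) extends RVar[X, V]

sealed trait RVert[X, V]
case class BaseVert[X, V](v: V) extends RVert[X, V]
case class GraphVert[X, V](g: Graph[RVar[X, V], RVert[X, V]]) extends RVert[X, V]
```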

We get to compute graphs that talk about themselves, which is really interesting. You can take any graph and form the graph of its syntax, which I think is kind of fun. You get graphs that mention other graphs that talk about other graphs. That has a lot of obvious power.

That’s one thing. But then the other thing that’s really interesting is that we can form a type system, or we can use the LADL construction, if we take our collections as sets as opposed to multisets (or some other commutative monoid). Our formulae will range over true, which is our ground, and then the usual Boolean constructions: if phi and psi are formulae, you can form the conjunction, phi and psi. Then, if we don’t talk about negation yet, we can have zero, the logical lifting of the empty graph to the formula that picks out all graphs that are structurally equivalent to the empty graph. We can form a formula which, given a formula phi, adjoins a vertex to that formula; this would pick out all graphs that have that vertex adjoined. We can form let X equal V in phi, which picks out all the graphs that have this nomination as an outer layer. Then we can do the same thing for connection: I can connect the set of graphs that satisfy phi to the set of graphs that satisfy psi through X1 and X2 naming the vertices V1 and V2.
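A hedged sketch of that logical language as a data type, mirroring the graph constructors one for one; the names are assumptions, and negation is left out, as in the discussion:

```scala
sealed trait Formula[X, V]
case class True[X, V]() extends Formula[X, V]                                      // satisfied by every graph
case class And[X, V](phi: Formula[X, V], psi: Formula[X, V]) extends Formula[X, V] // conjunction
case class FZero[X, V]() extends Formula[X, V]                                     // graphs structurally equivalent to 0
case class FAdjoin[X, V](v: V, phi: Formula[X, V]) extends Formula[X, V]           // v adjoined to a graph satisfying phi
case class FLet[X, V](x: X, v: V, phi: Formula[X, V]) extends Formula[X, V]        // this nomination as the outer layer
case class FConnect[X, V](x1: X, v1: V, phi: Formula[X, V],
                          x2: X, v2: V, psi: Formula[X, V]) extends Formula[X, V]  // connect phi-graphs to psi-graphs via x1, x2
```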

Intriguingly, if you have internal structure on your vertices or your variables, and you extend them like we do in this reflective domain D, then you can utilize that structure in your formulae. Rather than just having your formulae specifically mention V’s and X’s, you could have formulae for V’s and X’s that exploit their internal structure. You can add additional stuff.

This is an extremely compact logical language for characterizing graphs. There are two different ways to form types of graphs. One is sort of these natural subdomains of G, like the category subdomain of G is a type, and then there’s this question about whether or not the types that we get from the formulae line up with those domains.

I have a conjecture here that talks about the relationship of those two different types. As a teaser, I’ll not mention exactly what that conjecture is, but I suggest that people go check out the writeup.

Christian: Which two types? 

Greg: Notice that we can pick out subdomains of a particular domain of graphs; for example, C of X and V is a subdomain of G of X and V. That’s a native notion of type. How do I programmatically or mathematically describe these subdomains? By hook or by crook, I get to use whatever mathematical gadgetry would allow me to pick out a subdomain. Then there’s this other way of picking out types, which is these formulae, this logical language that I’ve described.

Then the question is: what’s the relationship between those two? How expressive are the formulae in terms of the more native way of picking out subdomains? It turns out there’s an interesting relationship between those two. 

There are a lot of different operations that can be provided just by manipulating the X and the V. If I have a G that’s built on an X and a V, I could flip their roles. I can make graphs where the vertices become the variables and the variables become the vertices. 

Christian: Why would you want to do that? 

Greg: There are any number of technical reasons that have to do with relationships across syntax. You could eliminate the variables. You could say, given a graph that is built on X’s and V’s, only consider the graphs where the V’s are the variables as well as the vertices. Similarly, build a domain of graphs where the variables are the vertices as well as the variables.

Then you can do other kinds of things. You could combine the domains. So if I have G1 built on X1 and V1 and I have G2 built on X2 and V2, I can do a summation where I have G built on X1 plus X2 and V1 plus V2. I could do the same thing with the product. I can take the product of the two.

Then I can do interesting things like a crossover. I can do G of X1, V1, X2, V2, where the variable part is X1 plus V1 and the vertex part is X2 plus V2. There are lots of other operations like this. You can utilize those in Holland-style evolutionary processes to pick out graphs, which is particularly useful in a setting where you have reflection, like this D domain that I described before.
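As a small illustration, here is a hedged sketch of the simplest of those operations over the Graph type above: the flip that swaps the roles of variables and vertices. How far flip, sum, product, and crossover preserve structure is exactly the question of homomorphisms that comes up next.

```scala
// Swap the roles of variables and vertices: Graph[X, V] => Graph[V, X].
def flip[X, V](g: Graph[X, V]): Graph[V, X] = g match {
  case Zero()               => Zero()
  case AdjoinVertex(v, h)   => AdjoinVariable(v, flip(h))
  case AdjoinVariable(x, h) => AdjoinVertex(x, flip(h))
  case Tensor(g1, g2)       => Tensor(flip(g1), flip(g2))
  case Let(x, v, h)         => Let(v, x, flip(h))
  case Edge(l1, l2)         => Edge(flipLet(l1), flipLet(l2))
}

// Flipping a nomination keeps it a nomination, so edges stay edges.
def flipLet[X, V](l: Let[X, V]): Let[V, X] = Let(l.v, l.x, flip(l.g))
```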

Christian: For these operations, would you need to be careful or have some theorems that you’re still going to be able to make the kind of interpretations that you want, or that you’re going to preserve subdomains? 

Greg: The recommendation is that those are theorems. You prove when you have natural homomorphisms. Those homomorphisms will ultimately depend upon the internal structure of X and V. If you’ve got a lot of structure in V and not a lot of structure in X, flipping them around will not do very much. You won’t be able to preserve a lot. You get maximum preservation when you do the reflective domain. You’ve got this fractal-like gadget. If you flip it around, it doesn’t matter so much. You’ve got the fixed point of all those operations, so you’ll get maximal preservation like that.

I just want people to see that this formulation allows us to do this kind of design-level thinking that I’ve been trying to promote over and over again. In this presentation of graphs, I’ve climbed the hill so fast. You don’t see these kinds of operations available even in Diestel or in algebraic graph theory; this kind of stuff is just not available. You can’t see the forest for the trees in the nodes-and-edges formulation of graphs. Even Barr’s presentation of graphs, a categorical graph language, doesn’t have these kinds of constructions. In particular, you don’t get anything like LADL: this relationship between different ways of picking out subdomains, and then being able to talk about what’s representable in the logic and what’s not.

The point is that it gives you this design-level thinking. It gives you a vantage point from which to be able to see much more of the landscape of graphs. It’s a natural outgrowth of a more modern way of thinking about graphs. One comparison that I would recommend people go and take a look at is the notion of graphs and graph logics that grew out of process calculi.

Luca Cardelli and Giorgio Ghelli and Philippa Gardner came up with a graph logic that was built out of two steps. The first step is looking at a natural encoding of graphs in process calculi. There’s a natural way to think about graphs in process calculi that has to do with the prefixes encoding edges.

That insight goes all the way back to Milner and Hennessy’s insight with the Hennessy-Milner logic. Seeing the actions as giving you an edge between vertices, which are related to the process states—the terms. Then that gives rise to an encoding in process calculi. When you have the spatial behavioral logics, those logics can now talk about processes that are natural representations of graphs. 

Ghelli took it all the way to implementation. He implemented a graph query language based upon that logic that has a pretty high expressive power. It’s different from the one that I’ve provided here in terms of LADL, but they have similar expressive power. You have roughly the same kind of complexity issues: if you try to decompose the graph as a tensor, that’s going to result in certain kinds of complexity. But there are so many more things you can say in this setting that you can’t even say in the nodes-and-edges setting because it’s computationally too overpowering.

Christian: This graph logic related to process calculi is definitely what I was going to ask about in terms of the very basic, concrete, everyday applications of this idea in the Rho calculus. Are you translating Rho terms into graphs, with the communication channel relations between the processes that compose the terms?

Greg: I was trying not to tie myself to process calculi. This gives you a native notion of graph. Then you can go and think about all the different kinds of graphs that you could build out of process calculi. You could also think about all the different ways you might consider representing graphs in process calculi. Then you can compare that representation to this representation.

Christian: If you are inputting some type as a dependency that has its own notion of rewriting, how do you transfer that implicit graph of terms in rewrites of that language into this? Is there a nice way to do that? 

Greg: Yes, there is. That’s the example that I give for the Lambda calculus. There are a couple of different ways to do it. I gave one way. The graph of all reductions of a term is given by the red formula in the theory of graphs document. There are probably other ways to do this that are even more friendly, but I’ve got basically four equations that give the entire graph. It feels relatively tractable. In the document, I invite the ambitious reader to give a red formulation for the Rho calculus. It’s easier to do it for Rho than for Pi, just because you have fewer moving parts in Rho than you do in Pi. You could do something similar for Pi or Blue calculus or Ambient calculus or what have you. 

One of the things that I felt was really encouraging is that if you take the standard graph theory examples, the notation scales. If you try to calculate the graph of reductions, the notation scales. If you try to represent categories, the notation scales. Then there’s all this other stuff that the notation can do in a scalable fashion that is just not even approachable in ordinary graph theory, such as this syntactic domain where we have this recursive thing and I can now talk about the syntax of the graph in the graph. That’s just not even considered or contemplated in ordinary graph theory, let alone done with a handful of recursive equations.

Christian: That was the part that you kind of flew through. I thought it was really cool, but I had trouble thinking about what it would even mean for a graph to be able to talk about its own syntax. 

Greg: Well, fortunately, the math is there. Finally, we have this logical language that allows us to talk about sets or collections of graphs as formulae. The point is that it feels exceptionally scalable. The dog doesn’t bark; it’s the curious incident of the dog in the night-time. The notation is quite happy and familiar and comfortable with this very wide range of examples; it doesn’t break a sweat, doesn’t get excited or overexerted. We can cover lots and lots and lots of stuff. That’s the idea. It’s one of these things where I just hope that someone out there will pick it up and run with it. Goodness gracious, it takes years to take even just the Rho calculus all the way to market.

A very natural thing to do with it would be to try to build a competitor to some of the graph query or graph database things. Since this is so scalable, I suspect that it’s far superior to the existing graph databases, but we’d have to prove that by doing an implementation. 

Christian: Is the main difference the idea of the dependencies?

Greg: That’s exactly right. The main difference is how you build edges. In this approach, you only ever build edges between named vertices. Then you have to manage that naming technology. That’s what the dependencies are doing. It’s just applications of stuff we know and love from the Lambda calculus.

Lambda tells us how to deal with variables. If you just inject that in this most natural way into the rest of the graph constructions, then basically it’s just a conservative extension of what you would expect from a completely naive data type to add variable management. That’s how I arrived at this particular construction.