Subscribe to RCast on iTunes | Google Play | Stitcher
A talk between Greg Meredith and Christian Williams on LADL (Logic as A Distributive Law).
Greg: In a recent meeting with some of the members, they wanted to find out a little bit more about some of the research engagements from other groups on the LADL work. Since Christian Williams has been here with me this week, I thought we’d take some time to talk about his work with John Baez and then how that fits into the larger research program on LADL. Christian, you and John have a paper accepted somewhere?
Christian: Yes, we have written a paper called, “Enriched Lawvere Theories for Operational Semantics.” We just submitted it to this conference called, “Typology, Algebra and Categories in Logic,” and it was accepted there. We’re going to submit it to a journal soon.
Greg: Oh, that’s awesome. You want to tell us a little bit about what the paper covers and then we’ll try to fit it into the overall LADL research program?
Christian: That sounds good. So a pervasive idea in computer science is that you can describe languages using signatures and you can generate a language by describing some ground terms as well as operations you can do on those terms. For example, you can build up the Lambda calculus by variables and Lambda abstractions and applications. And then you can impose equations or rewrites on this language and begin to compute. This idea has a nice formalization in category theory with something called Lawvere theories. They are originally a formalization of a subject called the Universal Algebra that formalizes abstract algebraic structures such as monoids or groups. But this same idea applies to computer languages.
The basic idea of our paper was that normally these structures are presented with only equations. For example, in a monoid you have an identity element, so that A + 0 = A, but in computer science you need to account for not only equations but some actual directed rewrites from one kind of term to another. The process of computation is really directed in that way and it doesn’t make sense to treat everything like equivalences. For example, Beta reduction in the Lambda calculus is certainly directed and you forget information.
And so our idea is that in the category theory representation of computer science, the objects of the category are types and the morphisms are terms. So if you want to represent the higher dimensional notion of rewriting one term into another, you need that the thing of morphisms between any two objects. In a normal category, that thing of morphisms is just a set, it’s just a play, an old collection. But there’s this idea in category theory called enrichment—an enriched category—where between any two objects you could have a base of morphisms, like a vector space, a topological space, or other interesting structures. The one that we need for rewriting terms is a graph. To describe the so-called operational semantics of a language, you need a label-directed graph where the vertices, which here would be the morphisms (or terms) and then the edges, the labeled edges, are the rewrites between terms. We basically formalized this framework and prove some nice things about it.
Greg: It’s really nice to have someone else describing this. I can give a little bit of background from another perspective. Milner’s seminal paper functions as processes where he proves that the Pi calculus is Turing complete by giving a faithful encoding of the Lambda Calculus into the Pi calculus. He also introduced us two other new things, one of which is the reduction-based presentation of the Pi calculus. Up to that point, the Pi calculus was being presented in a CCS-like style using a label transition semantics. It was not until, I think it’s 1991, that he writes this paper, “Functions as Processes,” where he presents the Pi calculus in a more Lambda-like style with the common rule that those folks who know Rholang have come to know and love. And that raises all kinds of fun and interesting questions about the notion of equivalents, i.e. by simulation and how the comm was related to the labeled transition semantics. Those questions don’t get answered for almost a decade, and they’re still not completely answered to anyone’s general satisfaction.
He factors the presentation of the Pi calculus into three chunks. The first chunk is the grammar that presents all of the terms in the language. This is what corresponds to the Lawvere theory. This sort of plain vanilla Lawvere theory that Christian was talking about. And then he adds some equations. The reason you want needs to add equations is because the Pi calculus terms, when they’re presented syntactically, they make too many syntactical distinctions. For example, if we really mean a concurrent execution, literally right P PAR-Q, then that really should be the same as Q PAR-P. It should be no different in the ordering. To achieve that Milner imposes these equations like P PAR-Q is equal to Q PAR-P.
There are other equations, for example, at P PAR-0, which is the inert process, is just the same as P, and also that if that PAR is associative. So if I run P concurrently with Q and I run that a composite process with R, that should be the same as if I were to compose with a parallel composition, the Q with R and then run P parallel with that. So those are some of the equations that we get to whack down the syntax to something that more closely resembles the semantics.
But none of those equations are information losing. None of those equivalents is there any more information on one side then on the other. Then Milner adds a third section. So we’ve got the grammar, and then we’ve got the structural equivalents, and then we’ve got the operational semantics (or the rewrite rules).
Milner presents this format for the Pi calculus. It becomes the de facto format from that point on for just about any computational calculus. The syntax, some structural equivalents, and then some operational semantics (or rewrite rules). If you look at the wider range—if you step outside as sort of programming language semantics or concurrency theory—and you look to the industry, you see a very similar shape peeking out from specifications of real-life systems like the Java virtual machine.
The interesting point is that you can’t just have the naked syntax in the case of Java like you do in the Pi calculus. In Java, when you do the rewrite rules, you have to include the additional state, such as the registers or the heat or the threads. Those are all a part of the rewrite rules that go along with the JVM.
If you were to give a presentation that was in the same style as Milner gives for the Pi calculus, you’d actually have to enrich the syntax of Java to carry along all the state of the virtual machine. Then there would be some of corresponding equivalence rules. For example, one of the rules that Java shares with the Pi calculus is alpha equivalence. When the Pi calculus is a processes waiting for input, the variable where the input will be received and bound, that variable name doesn’t matter at all.
That’s the same as if you have a method with an argument. If you consistently rewrite the argument everywhere inside the method, as long as you don’t screw up and have it captured by some other binding, then that program is the same. Those two notions of equivalence are present both in Java and in the Pi calculus and in other calculi, so just the Lambda calculus and the Rho calculus.
We see that same shape for the presentation of languages, both in the theoretical side, but also on the very practical side, for the specification of virtual machines and other kinds of models of computation. When Christian and John are coming up with a framework for expressing these ways of presenting computational calculi, it’s an important thing because it helps us to understand exactly what goes into the formulation of the semantics of a programming language, or the formulation of the semantics of a virtual machine, or even other kinds of models of computation (such as a hardware design). These are sort of foundational or fundamental components in the trademark. Christian, there are limitations in terms of the presentation that you’ve viewed, right?
Greg: One of them that we’ve been talking about all week has to do with the handling of the bound variables. As I understand it in your presentation, you’re not currently addressing languages or computational models that have bound variables. Is that correct?
Christian: Yes, that’s correct. This is a classic example of a broad distinction in the ways of thinking between computer science and math. Categories, and in particular Lawvere theories, don’t know anything about variable binding. The only time that variable binding really happens in math is when the mathematician is doing their proofs. Within the formalism itself, it’s not there.
When you want to have a theory of the Lambda calculus, you need to specify that the Lambda abstraction binds the variable so that you recognize what you need to substitute and what you need to keep so-called fresh. If you have some big set of variables that you’re using, like you said, you need to avoid this variable capture substitution, then you need to be keeping track of which variables are bound in which ones are free.
None of that stuff is formalized in a normal Lawvere theory and the enrichment doesn’t do anything to help with that. What we’ve been talking about this week is this really interesting question of how theoretical computer scientists at this intersection between computation and categories have formalized this question. There’s a lot of really interesting math there. The solution is a little bit in flux, but we’re coming up with nice ideas and we’re going to formalize this in the coming months.
Greg: You’re right, the sort of classical or mainstream mathematics often doesn’t address this distinction between sort of object-level and meta-level and in particular, careful accounting of the notion of variables. But it does exist in logic. So there exists and for all our variable binders in logic, and correspondingly, it does exist in type theory. Of course, it does exist and programming languages and a lot of computer science phenomenon. Even when you’re doing ordinary calculus, you have a notion of a function that is dependent on a variable. We know that every student of calculus knows that if I write F of X = X squared + X + 1, that’s exactly the same function as F of Zed is equal to Zed squared + Zed + 1. In fact, there are rules for change and variables in Newton-style calculus or Lambda style calculus. There’s an awareness of it in classical mathematics, but not the kind of careful accounting of what the phenomena really is doing for you.
Christian: Mathematicians are used to taking some of the technicalities for granted. They say, if we ever need to actually implement this stuff, then we’ll take care of it. But mathematicians get the function in the world where everything is already already figured out.
Greg: That’s right. It’s actually quite interesting because there have been several different attempts to address this phenomenon, which in the literature is called nominality. One of the interesting points about nomonality is that there were at least two approaches that have been somewhat successful, but one was handling only the phenomena of substitution itself. And the other is sometimes called nominal theories. Handle that binder phenomenon. Not just the action of substitution but what it means to have a computation that’s frozen, waiting on some input from a variable and the role of that variable within that frozen computation. It’s also quite interesting that neither of those approaches really talk about the internal structure of the binder itself. The Pi calculus muddies the water a little bit because there are these names that are free and that are used for synchronization.
When a process is waiting on a channel for input and another processes is speaking of some output on that channel, that notion of name is potentially distinct from the notion of variable that’s being utilized by the process that’s waiting on input. The structure of name is also something that has been glossed over even by the folks who worry about nominal phenomenon. That is what the Rho calculus brings to the picture. In reality, if we want to have an effective comparison of names, and we have an infinite supply of names, in order for that to be realized in some computing hardware, we have to have some internal structure on these names, which we can use to pull the names apart in order to compare them.
Christian: Ultimately it’s a really good thing because a lot of the mathematical framework we were looking at with these nominal theories was based around the idea that we had accountably infinite set of Adams, and all we could do was shuffled these around, and we had no way to distinguish between these other than representing them as different symbols. It’s never going to be that dire of a situation. You’re always going to have a lot more structure to deal with and that will only make the managing of the names more efficient.
Greg: That’s exactly right. I think it’s useful for people to get a sense of how much there is to cover in terms of theory, even some of these really foundational issues have not yet been covered to everyone’s satisfaction. There are lots of little pockets of understanding all throughout the computing and theoretical computer science communities and they’re not all connected. Another part of the work is making sure that some of these things are connected, the different understandings from these different sub-communities are compared and contrasted, and we get an understanding of how the different solutions play together, and what phenomena we can cover it with each of these different solutions.
Christian: This is another reason why RChain is so important. There are all these really smart people out there that think about all of these amazing fancy things but never actually put any of it into practice or implement an actual system. RChain is a very rare system that’s calling for the best of all of these ideas. It’s really crucial that we get people together and begin to incorporate these ideas on a large scale to rise to the challenge that RChain represents.
Greg: I agree with you wholeheartedly. One of the things that has always been a driver for me is putting the best of our theoretical notions in the crucible of providing something of high utility. That’s exactly what we did with Rosette, and that’s exactly what we did with BizTalk, which are precursors to the RChain architecture. It’s very difficult to get good software without good mathematics behind it. It also works the other way. It’s very difficult to get good mathematics without testing it against both the rigors of the mathematical process, but also against the rigors of utility. It’s not to say that all mathematics should be applied, but rather that placing even the most abstract mathematics within a framework where we can test the ideas is really important and points to a characterization of computer science that I like a lot, which is experimental mathematics.
Christian: Yeah, definitely.
Greg: Another aspect of how this all fits into the LADL framework and points to some holes in our understanding: The whole LADL program is all about this notion of what types of logic is. We’ve had a notion of logic for about 3,000 years. Along the way there have been all these amazing, interesting logics that have been discovered. We started with the kind of Aristotelian conception of logic. That held sway for a long time. Then we began to notice that there were relationships between that and certain kinds of algebras, like Boolean Algebra. That led to some other intuitions that had to do with a dissatisfaction with certain kinds of proofs. Proofs of the existence of an entity that didn’t actually pony up and example were less satisfying.
So if you prove that a certain thing existed by showing that it was absurd if it didn’t exist, that’s a little less satisfying because you don’t have the thing in your hand to inspect and take a look at. It’s just like, well, it has to be there because it’s ridiculous for it not to exist. We get into a contradiction if it doesn’t exist.
Mathematicians felt dissatisfied with that. As they were exploring this idea, they developed a certain family of logics that dropped this notion of excluded middle, which gives rise to what’s called intuitionistic logic. Then it was discovered that the algebraic structures that were associated with intuitionistic logic corresponded precisely to the types that are associated with the Lambda calculus. That led to a whole bunch of other kinds of developments, both in logic and computing.
Along the same lines though there was a different strand was the notion of modal logics. At least initially, they were thought about in terms of possible worlds. A fellow by the name of Saul Kripke gives this way of thinking about modalities like necessity and possibility as sort of moving us along in this graph of possible worlds. When the concurrency theories started coming online, Milner and Hennessy make the observation that the graphs of the rewrites of processes look a lot like these graphs of possible worlds. They give this nice interpretation of modal formulary in this setting, which gives rise to the Hennessy-Milner logics. In a similar vein, Girard starts to notice that conjunction has different meanings if we live in a world where we’re sensitive to resources. In ordinary classical logic, if I say, “it is raining” and “it is raining,” then the meaning of that is just “it is raining.” So A and A is just A, and that sort of makes sense, if we were to think about that in terms of sets. If I intersect a set with itself, then I get back to the original set. That makes a certain amount of sense.
It also makes sense the other way around. If a union a set with itself, since sets don’t admit any duplication, then I get back the original set. Those are ways to think about logic if we’re insensitive to resources. But there are certain kinds of things, such as establishing a property of a chemical substance. Oftentimes, if I wanted to establish the property of a chemical substance, I have to use up some of the chemical substance that I might have on hand in an experiment in order to establish that property.
Establishing a particular property might be costly in some way. Another way to think about it is, if I have a dollar and Christian has a dollar, our sense of ownership suggests that we have $2 on the table (or $2 under discussion). That notion of conjunction is not the same as the classical notion of conjunction. Girard works out a notion of logic, linear logic, which is sensitive to resources. That turns out to be closely related to something called spatial logics, which are our sensitive not just to resource but to structure.
Computer programs have interior structure, and that structure corresponds to resources. Oftentimes if a particular computer program, for example, autonomous is maintaining a certain state that it encapsulates. All of these logics end up being correlated. Finally we get these kind of spatial behavioral logics, which Luis Caries and Luca Cardelli identify.
Just giving a brief history of logic over the four or five minutes. But along the way, nobody has stopped to say what a logic is. So we have all these examples of logics, but we don’t have someone coming forth and saying “a logic is blah.” The LADL approach is to say what a logic is: exactly what data one has to present in order to present a logic. This goes back to what we were talking about at the top of the conversation, which is: What do we have to present in order to present a programming language or to present a virtual machine? The work that Christian and John have done is part of an elaboration of a program that essentially Milner identified: a programming language or a model of computation needs a syntax, some structural equivalents, and some rewrite rules.
The LADL approach says, in order to present the logic that is consistent with two ideas, which, Christian identified. One of the ideas is sometimes called Curry-Howard. We want our terms—in our programming language, our programs—to be morphisms in some category. That’s our orientation. Then our types which correspond to formulae in some logic, those should be the objects in some category. Then it gets even more interesting. If the terms correspond to programs, then we want the proofs of the formulae to correspond to those terms.
That’s the way we need to organize a relationship between logics and programs. That’s one principle that’s pinning down how we get to package things up.
Another principal that pins down how we get to package things up, which might at first seem at odds with this notion (but in fact isn’t) is called realizability. Realizability has had a lot of success in the semantics of various logical frameworks. What realizability sizes that the meaning of any particular formula needs to be the collection of all the terms, all the programs that are witnesses to that particular formula.
Once you have those two principles laid out, you don’t have a lot of wiggle room in terms of one’s interpretation of a logic. It dictates the shape of what a logic is. And so the LADL algorithm says specifically that a logic is given in terms of a notion of computation, whatever your your universe of discourses that you’re going to write formulae about. Then there’s going to be a notion of collection. This is important because if our formulae are going to denote collections of programs, you have to tell me how I’m going to collect them. Am I going to collect them and sets or am I going to collect them in lists or am I going to collect them in trees or multisets, otherwise known as bags or graphs, whatever.
You have to pony up a notion of collection in order to gather up the programs that are the meanings are formulae. The next piece, the small insight, is well then formulae are represented by poking holes in your term constructors and other kinds of machinery like that. Decorating your term constructors so that you can construct terms over collections of terms. Then what the logic has to do is it has to take these terms that have holes poked in them, that you might substitute a whole bunch of terms that in those holes, and turn them into collections of terms.
You need sort of a core way of distributing terms over collections to collections over terms. That’s sometimes called the distributed law. The reason it’s called LADL is because it’s “Logic as A Distributive Law.” It sets forth what the data that have to be given are in order to specify a logic, in the same way that Milner had set forth what the data are to specify what a programming languages is, or a model of computation is.
Christian: A way to think about this distributive law, just as a really basic example, is: if recall, you’ve heard that phrase before from when times distributes over plus with numbers, and it’s about when you have two structures on the same object, you need them to be compatible in some way. So one of your structures is the term language and one is the collection language. And so a term of collections is analogous to a product of sums. So if your term language was like the theory of a monoid, something that has a binary operation, and your collection language was Boolean Algebra, so that you could have ands, ors, and nots, then you could write a formula like A x (B or C). Intuitively we know that that’s asking for the collection A B, or A C. That corresponds precisely to distributing that times over the Boolean or in the same way that you distribute times over plus.
Greg: That’s exactly right. That’s a very useful set of intuitions for people who are interested. To set this in context or Rholang and for RChain in general, I’ve been thinking about the role of spatial behavioral types in the support of concurrent programming for 30 years. When I was contemplating doing a PhD at Imperial, it was because I had seen a Samson Abramsky give a talk in Ottawa. I was so impressed by his talk on linear logic that I went and grabbed his paper, “Computational Interpretations of Linear Logic.” It changed my life. Once I read that paper, I said, “I’m never writing a piece of software again that doesn’t have this level of correctness behind it, that doesn’t use this kind of correct by construction methodology.”
It’s been a lifelong commitment. When I read the paper, I realized that Curry-Howard, this orientation of types for programs equal formula and a logic, and programs themselves equal proofs and a logic, that that had been carried out for sequential computation but hadn’t been carried out for concurrent computation. I wrote to Samson and said, “This is what I think is the next direction, to carry out the Curry-Howard program for logics for concurrency.” Samson said, “Well, come aboard because we just started working on exactly that idea.”
So I left my cushy digs at MCC, my corner office with the most advanced computing gadgets of the day, to go work on this stuff. By the time I got to namespace logic for the Rho calculus, I had been well-steeped in that program and had a really nice grounding and a nice logic. What I saw was another mountain. I climbed a particular summit, but I was very much inspired by Samson’s works. So Samson had done an earlier generation called domain theory and logical form, in which he gives an algorithm for deriving a logic from a domain presentation, which is really close to the ideas that are in LADL. So inspired, I realized that the work that I had done to generate namespace logic had an algorithmic underpinning. If it were appropriately generally characterized, that we could have not just one, but all kinds of logics.
The reason that becomes important from an applications point of view is that logics are the basis for query languages. The most brute force kind of simple way to think about this intuition is, suppose I have some database of programs and I have some types. I could search my database of programs with a type just by pulling the programs in one at a time and checking to see whether they type check. The ones that do type check, I return as a result of the query, and I leave the other ones behind. That’s a simple way to think about it.
There more efficient ways to think about it, but that becomes important when we think about programs today. Github is a gigantic data source of programs and we don’t have an effective way to search it. On the other end of the spectrum, if you look at a system called Hoogle, which is basically using Haskell types to search the cabal, which is the online module repository for Haskell programs. Unfortunately, the Haskell types—the functional types in general—are not rich enough to give us good search criteria. As an example, sort and shuffle have exactly the same type, but they mean the opposite.
If we have this richer characterization of logic, and in particular the spatial and behavioral characterizations which fall out of the LADL algorithm, then we get some particularly powerful search tools that still stay on the right side of decidable. In particular, you can can make your logic have this kind of x-ray vision into the programs without giving up decidability of the type check, which means the execution of your query won’t take forever and never give you an answer.
That ends up being really important. If we think about applying these logics as query languages to a broader class of algebraic structures—a really good example would be the Clifford algebras. Clifford algebras have a nickname, which are the geometric algebras. Over the last couple of decades they’ve made a lot of significant inroads as a tool for reasoning about geometry.
Now you can apply the LADL algorithm to the Clifford algebras and that gives you a query language for geometry. So you get a geometric database. That’s a very different kind of animal. One impossible query that you can’t do with today’s databases is, find me all the wires that run from the nose of the airplane to the midsection of the airplane that undergo a twist. That’s the kind of thing you can write down with these logics and query against.
Now when you think about the big geometric application that’s coming online in the next decade, that’s self-driving cars. What you need there is a high-power geometric query language so that these systems can reason about the terrain that they’re encountering. So LADL has the applications that this far. That’s why it’s important to carry out the research at the level of generality that we’re carrying.
Christian: I really like to tell my friends that nowadays Google does its best to search with keywords, but we’re going to have a method of searching that is as deep as you want.
Greg: That’s exactly right. Since you used the buzzword deep, I want to make one last sort of provocative statement before we close out the podcast. In today’s market, AI has become synonymous with deep learning, but I just don’t buy it. Not for a second. There are two reasons why.
One is because these deep learning systems are not compositional. If I train one of these nets to represent the age of a face, and I train another net to recognize the expression on a face, I can’t combine them to get a network that recognizes an expression and an age of a face. That’s really problematic if we want to scale up these systems. The argument I’ve made over and over and over again is, composition is how we scale.
The other critique that bothers me about deep learning is that there’s no answer to why. The answer to why a particular face is recognized as a particular age, or why a particular expression is recognized, on a particular phase, the network can’t tell you why. There’s no justification. The nodes in the network have been given certain weights; there’s just no answer to the question why.
Whereas in older systems like theorem proving systems and resolution systems and these kinds of certain kinds of puzzle solving systems, those kinds of systems could tell you why. They can provide you a justification for the plans that they had devised or the solutions that they had come up with. That was essentially the steps in the reasoning. That’s just missing from deep learning.
But when you think about the LADL algorithm, it provides a new class of theorem proving capabilities that can provide an enrichment of this older idea of what AI was about, that I think can have a lot of weight because it is compositional and can provide you with the answer to why.
Christian: That’s really interesting.
Greg: I just wanted to, to hang that little tidbit out there. I’m getting increasingly worried about this tendency to identify AI with deep learning. It’s not. When I was studying AI, it was a much larger field, and included these other kinds of characteristics of systems. When we bring those characteristics back into play where we’re going to start to have more robust solutions. I’m just really worried about the noise that can infect these deep learning solution so that they’re just a little bit off. When you iterate just a little bit off, you get lots and lots off.