Developer Learn Podcast RCast

RCast 34: Deep Dive Into TLA

Subscribe to RCast on iTunes | Google Play | Stitcher

RChain developer Pawel Szulc leads a discussion on TLA (Temporal Logic of Actions) with Greg Meredith, Isaac DeFrain, and Christian Williams.


Greg: It’s been difficult to get everyone in the same venue, partly because of time zone issues. As I mentioned the last time we were attempting this, I feel it’s representative of something that happens culturally. It’s really hard to get the theorists in the room with the practitioners; it’s also hard to get the practitioners in the room with the theorists. However, when it comes to grant-writing time, the theorists really need to justify their work in terms of the practitioners; when it comes to shipping and sweating bullets about the correctness of their code, the practitioners really want the assurance from the theorists that things are going to work out. 

We’re at that point where the rubber meets the road. Pawel, what we had ultimately wanted to talk about was your experience with temporal logic of actions, TLA+: what drove you to use it, what your experience was, and then talk about that experience in the context of what we ultimately want to deliver with the behavioral type system for Rholang.

Pawel: Let me tell you my story, which has been a fun experience with formal methods. I had absolutely no idea what formal methods were a year ago. I’ve been using them for some dependent type languages, but the term itself was completely new to me. Then I stumbled upon a presentation at one of the conferences in London two years ago. That presentation was prepared by Kathleen Fisher. She was talking about this program called HACMS. This was a DARPA program where they performed research on whether formal methods can eliminate an exploitable box.

As an army, they had an understanding that things like security of hardware and software being exploitable are important. They were researching whether formal methods would help. They had a tremendous amount of success, starting with a Boeing helicopter that could be driven with pilots or without them, which Boeing assured that it was safe and not exploitable. The researchers were able to hack into it within one day, which I found amusing. They spent 18 months teaching Boeing engineers formal methods, while at the same time implementing the colonel for the helicopter from scratch.

Within 18 months they were able to provide software that was completely unhackable. They had a team of white hat hackers who were not able to hack into it, even though they had all the access to the code itself. I read the paper; they go into a lot of details into how it was designed. It was mind-blowing. Having spent almost 15 years in the industry, whenever I tried to think about it, I try not to do that because that makes me crazy. We call ourselves software engineers. My parents are both engineers and they laugh at me. They always say, “Oh, you’re not an engineer. Your industry’s far away from being called engineers.”

I see in formal methods a little bit of hope. However, I strongly feel that with formal methods, our community still hasn’t derived all the terminology that is needed. But I see that formal methods can be divided. That’s not based on my own work, but from listening to people and reading their blog posts. Formal methods could be divided into two separate groups. 

One would be a formal verification of code and the other one would be formal verification of design. While you get better benefits, especially if you know what you’re doing with formal verification of code, where you can formally verify that, “This method is saying it’s going to do this. And I can guarantee you don’t have to look at the implementation. You have a formal guarantee that it’s actually going to do that thing.”

It’s tremendous and outstanding. I would just love to have a mainstream language in which I could program like that. But that becomes especially problematic if you design things where the people who you’re trying to create the software for are not fully aware of what they really want to have. Things are changing each sprint or iteration, whatever you name it. You get different design decisions being changed, and all that and each time are re-implementing that the action or any formal-like proofs can be cumbersome. 

That’s when you go into that other part where TLA shines. That’s formal verification of system design, where you can talk about the design and talk about properties, talk about environments in the design that you would like your system to have, but you don’t necessarily have to immediately write any code.

You can talk about what you would like to achieve and then you will have a model checker which you can think of like a brute force in which you can explore the state of all possible states in which a system can be and it will look for all the environments and all the properties hold. With TLA+ status, not only do you get an ability to write environments in terms of first-class logic, but you also have the concepts of temporal logic. When you can talk about things that will happen in order, we want to say, “If this thing happens then that thing must eventually happen. You can create sentences like that.”

Imagine that you have to provide software that will do this really complex thing. What do you normally do? You have a conversation with your colleagues. Maybe you write some stuff on the whiteboard. You spend time designing the idea of how this is going to work. Once you agree on what you want to have, then you spend weeks or even months putting that design into code. 

It takes time for the code to mature enough for you to find the first issues and errors with the design. Sometimes the issues are small and fixable; sometimes the issues are so fundamental that you have to spend months or weeks re-implementing the foundation. Your clause gives you this ability to verify that the very first drawings that you did on the whiteboard, the initial guarantees of the concept that the team had when designing the software.

For the process of moving this design, this idea from whiteboard to actual code, TLA+ will not give you those guarantees. It still requires whatever engineering skills we have up our sleeves going from the design to the overall high-level concepts of how the software should into the code base itself. But, as I said, when we are designing our system, we might fail twice. We might either fail in the initial design, where on our concept of how this thing should work is fundamentally broken. The other place where we can encounter problems is when going from our design to code.

Obviously, it can still break, but TLA+ gives us a language for specifying the system, and it gives us a tool that the model chapter understands the language of target specification. It can explore all the possible states in which the system can be and look for issues and problems. 

That’s what we’ve been doing with TLA+ in the RChain project. I got excited with TLA+ a few months before that. I play around with it in the evenings.I haven’t done anything professionally in it before. But then I was given a ticket. Apparently, there was a weird problem within Capser; at some point, a node that tried to join the network could sometimes hank while joining. 

It was a bizarre error. After investigating it and looking at the code I realized it was just a typical problem of a message being dropped and not being delivered. Our code was not robust enough to handle that situation, but the fix was easy. Then I wondered how many other issues like that exist in our code and in the current protocol that we have when we can put them on Casper. 

There are two scenarios. One is the Genesis ceremony and the other one in nodes joining the network. I thought this would a fun experiment; I would specify the whole thing in TLA+. I underestimated it. A thing that was was supposed to take less than a day took a day-and-a-half, but eventually ended up almost completely in a specification of Casper in TLA+, when it comes to how protocol in Casper works.

We didn’t specify Casper itself; we didn’t specify blocks being added, messages, and all that. It was more about the messages. Originally when I joined the project, I was part of the comm layers team. This was the issue I was most concerned with. I was just specifying what kind of messages are being exchanged and what environments we wanted to hold. The environments were simple. When we instantiate node, it goes through some state changes, especially during the Genesis ceremony, but also when it joins the network. It goes through some state changes and depending on which state it is in, it will handle incoming messages differently. I was able to easily specify each of the states in which the node can be and also specify how it handles incoming messages.

Temporal environments were simple. I would say things like, “Once we get a new node, eventually it has to end up in a state called running.” The running state is the state that we call the final study in which all nodes have to end up after going through the Genesis ceremony or after joining an existing network. 

It’s quite rewarding that when you look at the specification as it is defined today and it’s available in our Github, when you look at the specification, especially that parts at the very end where we describe how the node reacts to different messages depending on the state it is in, if you’ve seen the code base, you can see directly that this is the direct connection between the specification and the code itself.

If you know a little bit of TLA+, you almost read that specification in plain English. You see things like, given this state, given that message, this will happen. You have a few lines of specification telling you what will happen. That’s pretty much the whole story. 

While specifying in TLA+, we found the bug that we already knew about. Before implementing the software, I was able to fix “fix” the bug in the TLA+ specification. Once that was fixed, then the model checker immediately hit another error, which we were not aware of at that time. But it was in the protocol. That allows us to apply another fix in the specification, and then the whole spec started running correctly. All the environments and properties were holding, given the way the model checker was running. That gave us some sort of guarantees that the protocol that we have right now is correct.

Christian: This is really cool stuff. I’m just curious to start learning about TLA as I’m trying to contextualize how you can search such a large space of possibilities and to be able to detect conflicts and runtime. What’s the main reference that you recommend for TLA?

Pawel: TLA+ comes with two different tools. One actually allows you to write proofs. But the thing that allows you to use a model checker and explore all the possible states in which the system can be, which with a complex system that number of possible states just explodes to almost something like infinity. So yes, sometimes you have to limit the number and tell the model checker to not explore this area anymore. 

To give you an example, let’s say you want to model a network layer. For every process that you would model in that specification, it has an incoming queue and outgoing queue. When you want to send a message, the message derives an outgoing queue, and when the message is sent over the network, it disappears from that outgoing queue and that message ends up in some other processes’ incoming queue. 

Then you can model things like messages being dropped. For example, being removed from an outgoing queue and not arriving anywhere. Obviously, nothing limits the size of the queue itself. We can imagine a system that keeps on trying to send messages and they never pop out from the queue. You keep your specification clean, it has no limitations or issues, but when you create a model, you can actually say, “do not explore where all these states are bigger than five elements.”

There might be a bug in your system where your queue reaches six elements. Then you will never have an ability to find that bug because you explicitly told the model checker to stop exploring when you have more than five elements. So yes, you stumbled upon those limitations. 

When I was running the model checker for Casper protocol, I had to put those limitations exactly on the queues so the system could look send this tremendous amount of messages on the queue. The number of states would explode to infinity. 

As for resources where you can learn about TLA+, different people might say different things. TLA+ is a specification language that uses some elements from maps, from set theories from logic, but it is a language that is straight forward and figures for functional programmers. What you write are basically formulas. They are formulas as an argument that take two states. They take the state in which you are currently in and a possible next state to which the system can potentially move to in the next step. The formula that you write as an argument takes those two states and return type of reformed functional formula is a boolean value. Whether we allow transition from state one to the second state or we don’t allow it.

You also need to provide a formula like an initial state formula. Having only those two formulas, the model checker can just explore all the possible states, because it starts only with an initial state and then it’s using that second function of only returning true or false.

In the next state, let’s say B, and the formula says “false.” Can I move to state C, and the formula says “true.” Then the model checker can actually transition from an initial state to the state C. In its core, this specification language is very declarative. Whenever I show it to functional programmers, they quickly grasp the concept. However, people who tend to write software in a more imperative style, they tend to struggle because whenever they think about states and state’s transition—a state is just a set of variables with given values for those variables—they always think about a variable X changing from one to zero. 

When they look at that language, it looks almost like there are assignments in it. Even though the formula itself is returning a boolean value, true or false, whether I can go from state one to state two, the syntax that is being used looks a little bit imperative. Especially if you come from an imperative background, it looks appealing, it looks like assigning new values to existing variables, but that’s not what it is. People struggle with that. Then some people create a new language called Pascal.

It’s a language that has two different varieties. One mimics a tuple Pascal; the other mimics a little bit of C syntax. Those languages compile down to TLA+. Essentially, you write your specification in Pascal and then the system will compile it down to TLA+. I don’t like this approach. They try to use a C syntax that was designed for talking about statements being executed one after the other, wherein any imperative language we currently have, the determinism is basically written in their core. You have no deterministic things in it; that would be just crazy. 

When we specify things in TLA+, we want to be non-deterministic. You want to say either the message will be dropped or delivered. We don’t want to put any constraints when this happens.

Coming back to the question about resources, Leslie Lamport wrote a book about specifying systems. It’s not fully up to date, but the core concepts stayed the same. Leslie Lamport tried to write a second edition of this book, but then he said that eventually he learned that people no longer read books, they only watch Youtube videos, so he started recording tutorials about TLA+.

By the way,  those videos are amazing. They are hilarious. At the same time, you learn a lot. They are no longer than 15 minutes. Even if you’re on a commute, you can put your headset on. But if you want to go like into hardcore details, then Specifying Systems is a very good book. 

Greg: That was a really nice presentation of TLA. I want to point out that we can map all of those notions very nicely onto the Rho calculus and Rholang. In particular, the notion of state in process calculi, in general, is a process. A process is a state and processes transition via the comm rule. They transition from state to state via comm rule. When that transition happens, you accumulate a substitution. The substitution maps directly onto your idea that there are some variables with values. The substitution is going to say that in the case of the Rho calculus that was sent is now bound to some variable in the body of the for comprehension.

The substitution represents the mapping of variables to value. All of those ideas line up exactly with elements in the Rho calculus. It allows us to transport the ideas of TLA over to the Rho calculus whole cloth. That’s exactly what the behavioral types for Rholang do, but they do it in a different way than TLA. 

The formulae become types for Rholang programs. Instead of checking your design, you’re actually going to use these TLA-like formulae to check your running code. The code that you deploy can be model-checked to see whether or not it satisfies a particular formula. The big gap here, as you pointed out on several occasions when we’ve talked in public, Pawel, is that you have to be the one that provides the connection between the model that you’ve built and the code that you actually deploy. That lives in your brain as opposed to in any execution artifact. 

The advantage of a behavioral type system is that we can get all of the goodness of a system like TLA, but instead of on a model that’s only loosely connected to deployed code, it becomes directly applicable to deployed code. Does that make sense? 

Pawel: Yeah, but I have a question. I spent some time researching behavioral types. I couldn’t find that much information about it on the web. One of the things that I found was an old document about why RChain is needed. One of the chapters in the document was talking about behavioral types. Correct me if I’m wrong, but I understood that behavioral types provide a way of writing proofs about software, about your program, your function, whatever they will deliver, the same way languages like Dafny allows you to write proofs.

Greg: It’s not about proofs. Let’s go a little deeper because that’s a really important question. It raises all kinds of fun and interesting avenues to explore. First of all, one of the best papers for behavioral types is Luis Caires’s paper, “Behavioral and Spatial Observations in a Logic for the Pi-calculus.” The information density of that paper, which is one of my favorite measures, is perfect. Every page is packed with enough information to keep you completely engaged, but not so much that you’re drowning. I encourage everyone who wants to learn about behavioral types to read that paper. 

Having been strongly influenced by that paper (and other background material) I wrote a paper called “Namespace logic.” If you want to understand how the Rho calculus is behaviorally typed, that paper, which was published in the trustworthy computing at Etabs in 2005, provides that. There’s an implementation of the core concepts of that paper in a repo called Rhocamel under the lighthouse Github account. 

You can see that it will do the model checking, at least in part, for a lot of the different kinds of formulae. Then there’s also another piece of software called the Spatial Logic Model Checker (SLMC), which implements Caires’s behavioral types for the Pi calculus. In the SLMC, you can write down models in a Pi-like language; you can see the family resemblance to Rholang. Then you can type check them on the basis of the models. 

I strongly encourage people to go through those examples. If you go to the Spatial Logic Model Checker website, they provide a number of fairly convincing examples, starting from really simple things like a gossip protocol. In that example, you have a network of processes, one of which knows a secret. They all follow a protocol whereby they whisper to each other. In the end, you can prove that by following this whisper protocol, they all know the secret. 

You can do more sophisticated things, and I encourage people to play around with it, where you can have them only whisper to parties that have certain properties—they know of a username and a password or they know some other kind of authentication mechanism. In that case, only those authorized parties will learn the secret. Those who are not authorized will not learn the secret. 

Those are properties that you can write down, both at the protocol level and at the property level, and check that that’s the case. They do some considerably more sophisticated protocols in those examples. SLMC and Rhocamel both provide running examples. 

What’s interesting about this approach, as opposed to TLA, is TLA has no vision into the structure of programs. The temporal logic of actions is only talking about the evolution of programs. It doesn’t talk about how programs are structured. For example, you can write down a formula in the spatial logic or the namespace logic, which represents a notion of single-threadedness. You can prove structurally that the program is single-threaded. That’s something you can’t see with the temporal logic of actions. 

I don’t remember whether TLA is expressive enough to do this, but you can write down deadlock freedom properties in behavioral logic. 

Pawel: Yes, you can do that.

Greg: Notice that in the two kinds of examples that we’ve talked about, one is a security property. We want to make sure that information isn’t leaked from one group to another. In fact, we can be very sophisticated about these kinds of information control flow properties. The example I often give, because it’s both relevant and poignant, is you can imagine a doctor coming back from Sierra Leone having done volunteer work. They dutifully get checked to see if they have Ebola. They get checked, but there’s a mistake in the lab, so they’re cleared to go back into the general populace. But in fact, they have Ebola. 

Once it’s discovered that we now have an agent in the general populace in a big city that has Ebola, there’s a kind of control flow that you want. You don’t want to alert the media before you alert all the major health care providers. First, you want the group of major healthcare providers to know that there is a serious risk. After they are all prepped and everyone is apprised of the situation, then you want to alert the media. Being able to prove that information reaches a particular group before it reaches another group is the kind of thing you can actually write down in these kinds of logics.

Another kind of thing that you can do is not about security or information flow control. You want to be able to write down whether or not progress is made or whether or not things get stuck. These kinds of control flow or liveliness properties are also things that you can write down in these kinds of logics. The nice thing, in the case of Rholang, these logics allow our types. You just do a type check as a part of your overall development process. This is important. I thought about this quite a bit when I was thinking about the best place to introduce formal methods in the developer flow. 

Any developer will tell you that the biggest challenge to developing is context switching. If I’m working in a particular tool and I have to go over and work in a different tool, I can lose my concentration or lose the thread. Those kinds of interruptions are the most dangerous in terms of the committing errors. But if programmers are used to a particular right edit-compile, and that compilation step includes a type check, if that’s a part of their existing flow, you minimize the committing errors. That’s why it’s very natural to place it at that point in the development cycle. 

Let me widen the discussion, as this has largely been Pawel and myself. We’ve got these two mathematicians in the room. I want to hear their input. When you look at the behavioral types from the perspective of the LADL algorithm, which widens this to not just a programming language or toy programming language like the Rho calculus, then you can start to see these wider connections.

For example, the formula that picks out single-threadedness in the Rho calculus or the Pi calculus is the same formula that for a monoid picks out primality. It picks out those elements in the monoid which are prime. It says that there’s something about primality that is single-threadedness and this captures that notion exactly. It’s a pretty cool connection. We don’t see until we’re working at a particular level of generality, but once you establish those kinds of connections, my feeling is that it becomes another tool in the toolbox and helps make the connection between theoreticians and practitioners. Let me open it up to Christian and Isaac and get your feedback on the discussion so far.

Isaac: I have a question about TLA+ as I don’t have much exposure to it. I did watch a few of the videos and I have to agree with Pawel: Leslie’s videos are pretty hilarious. He teaches really well. He’s been in the field for 40+ years and obviously knows what he’s doing. I would definitely recommend checking those out. 

I understand TLA+ is a language for specification and you’re essentially writing in the language of set theory and temporal logic. You said that there was a way that you ran some models. I was just wondering how you actually found the particular bugs that you did find in running these models and how they actually present themselves. I feel that’s a disconnect that I currently have.

Pawel: The bugs that I found, I put a single temporal property on the system. First of all, I’ve modeled a concept of a node. A node was nothing else but just a few variables. One of the variables was just the state that the node can be. I don’t know how familiar you are with node transition states, but it was just a few states in which the node can be, they all end up in this state called “running.” Essentially, when the node is in state “running,” you can consider it a healthy running node that can do all the cool RChain stuff. 

But you have to get there. There’s a process of how you get there, depending on if you’re instantiating a new network. So you’re running the Genesis ceremony, or maybe you have a new node that is joining an existing network. I put a few of those nodes on array. I already have a concept of a network of nodes running on the network. 

Now I need the nodes to communicate. Each of the nodes can have a variable: outgoing messages and incoming messages. Those variables will be queues and queues can be represented as sequences: first in, first out. Then I need to model a network layer. 

Now the question is: What does it mean for a message to go from one node to another? If I’m trying to send a message, I’m putting this into an outgoing queue and sending messages, removing that message from an outgoing queue on one node and adding it to an incoming queue on another node. That’s the whole specifying the network layer. It’s less than ten lines of TLA specification. There’s no additional magic to it. We only define messages transitioning from one node to another. We also modeled a concept of messages being drugged. They don’t arrive at their destination.

Greg: Notice that the process calculi really have this covered in spades right there. The state transition is communication. 

Isaac: Right, instead of almost the other way around. 

Greg: Pawel has explicitly modeled that part of the picture, whereas for the process calculi, it’s built in. It’s that notion. It’s done at a level of abstraction where you could treat that as message passing across the network or you can treat it as shared memory. Names can be modeled as memory locations and now it’s communication via shared memory. In fact, all of those styles of communication have a natural representation in the process calculi. You don’t have to model that part of it. The modal operator with the logic is now directly related to the communication action. 

Pawel: If that modeling of communication layer was more or less understood, as I said at the beginning, there was this concept of a state in which each of the nodes can be. Those are state transitions where we go from one state to another, depending on which messages arrive and what kind of messages and which state you’re apparently in. 

I’ve written this simple temporal formula in which I said, eventually, all statuses for each of the nodes are equal to running. In other words, eventually all nodes are in state “running.” This is a single-line formula. You use quantifiers from logic for all nodes such that status of given node equals to run it. Eventually, over time, this is where I want to be. 

This is pretty much the liveness property of our system. I put some basic safety properties to check if things are working, such as no weirdness will happen. But I was more concerned with the liveness, because that was the initial area that I was assigned to in Jira was the liveness issue. The node was not progressing in the state transition. Now I have these elaborate properties, I will check that all the nodes eventually should end up running state. Then I run it and I found the first bug in the protocol itself, which was fixed. And then we found another one, which we also fixed.

Isaac: When you say that you found a bug and then you fixed it, how exactly do these bugs present themselves to you? As something that’s not satisfied that you expect to be satisfied? 

Pawel: We agree that I specified how nodes interact and how they communicate and how they send messages. I also specified a liveness property. I eventually want this thing to happen. I want this to be in this particular state. Then I run the model checker. I did something like a very simple environment: imagine that there are those three nodes which are running and here’s a node which is in state “new,” where each node joining the existing network starts in the state “new.” Then I run the model checker and after a few minutes or seconds stopped and said “temporal property broken”—I don’t exactly remember the link, whatever was being used—and then they gave me another example. They gave me exactly why what happened. That’s pretty nice because you follow the steps and then each of the steps is the name of the formula that we created. You write this next step formula, but you can combine it from set formulas, and then you can join the formulas with logic or this or that or that or that. Then each state gets its own name. You almost read it in plain English: messages being sent, messages being dropped, a node transitions to this state. But you can also get a God’s eye view of not only the given node, you understand all the nodes and all the states at each of the steps, how the variables were we’re changing and all that.

If things like temporal formula breaks, you either end up in a stuttering step or the loop and they just never reached whatever was supposed to be reached, you can see exactly why that happened. It was straight forward. I was going through this counterexample: okay, he sent this, that’s good; he sent that and lost his message, that’s not a problem because he’ll resend it, and he did resend it, but then he did something crazy and then I go back to code and see that it’s wrong. Then I fixed it and it worked.

Isaac: That’s really cool. It makes sense because, of course, how does it check that your formula’s not satisfied? A counterexample that doesn’t satisfy the formula and then there you go. It’s not satisfied. They can just give you that counterexample and then you can trace that execution path, all the state changes from the initial state to that point, then figure out exactly what went wrong. 

Greg: Notice that that’s exactly the kind of feedback that our type checker will be providing to developers. The type checker will say, sorry, your code doesn’t satisfy the formula, and here’s a trace that is the counterexample. This is just like what happens in Scala today when you fail the pattern match. It provides a counterexample. It gives you a data shape that won’t pattern match against your case selections. 

Pawel: Yes, so having something like that on the compile-time level from the user perspective.

Greg: Yes, exactly. This is really the thing that will begin to set us apart. The behavioral types set RChain apart at one level because it provides the kind of security that we’re going to need at the point where we’re running at 40,000 transactions per second. Imagine having these kinds of liveness issues or these security issues and no way to automatically check them. As a result, you’ve got these smart contracts, which are the vast majority of the exposure for these kinds of errors. There are only a handful of developers that are writing the core protocol, but then there’s the entire world of developers we hope will write mart contracts. The exposure to the commitment of these kinds of errors is much greater in the smart contracts than it is in the core protocol. 

Code auditing does not cut it. The DAO contracts were audited front to back, left to right, top to bottom, and there were still errors. This is the same as the parity wallet issue. Code auditing doesn’t cut it and we have to do something else. That’s what these tools and techniques provide. 

If we don’t do something else, when we get to Blockchain 2.0 kinds of speeds where we’re running with Visa and other world-class financial systems, then you get an error like this and you’re toast. In a heartbeat, at 40,000 transactions per second, in one second the network is taken down. And the only way to recover is a hard fork.

Pawel: Right now you’re addressing people who are already sold to this idea of formal methods and all that. But there will be people listening to this podcast who follow that idea that the only things you need are discipline and a little bit of testing.Tests are really good to link for reaching correctness, but they are not the only tool that we have up our sleeve. 

There’s also a very cool example of where they might fail. A few years ago back in the extreme programming area when they had this Wiki page where they were discussing different problems. Each week somebody would show up with a problem. The idea was to write a set of unit tests that will show where the error is and potentially how to fix it. There was a person who came up with a simple Java program that was running too few threats. Some threats were adding elements to the queue, other elements we’re removing elements from the queue.

He said there’s a bug in this code. The idea was to write a test that will show it. When people tried to do that and they spend like weeks on it and some like very well known people in the industry were reaching the conclusion that there is no bug in the code, this code is correct. Then this person who showed that challenge that there is a bug. He showed them a set of execution steps that were deadlocking this program. Even after he showed them where the deadlock is, he asked if they could now write a unit test that will show the error. 

One guy did, and he measured that in 16 hours working on that unit test, he found twenty-something or thirty-something more threats, and this test had to be run 200 times for it to fail with a probability of 50%. Someone else specified this problem with TLA+ within 30 minutes and already had an answer.

It’s really nice that the industry finally grasps this concept that writing tests for your software is not such a bad idea after all, but we shouldn’t stop here. The nineties already happened. We are in the 21st century. We should move forward. People should explore more, because no matter how cool the behavioral types will be, no matter how amazing RChain and Rholang will be, people still have to use that in order to explore those techniques. Don’t stop with a status score, always look for other ways to make yourself for more robust. 

Greg: I never stop looking for ways to make our code more robust. In particular, it’s not just the tools and technologies, it’s also methodology. Really simple things like: don’t check in code that doesn’t compile. 

In this case, if someone were to have the behavioral types, then they would make sure that they never check in. That doesn’t at least satisfy certain liveness safety and security properties, which I think is going to be a huge win. When we’re talking about something replacing the financial and transactional infrastructure, we have to have a level of assurance beyond the client-based website, where it literally says, “Don’t click the pay now button twice.” We’re talking about a different level of assurance of the quality of the code, which is a requirement in order for these networks to actually have utility.