Greg Meredith and Kayvan Kazeminejad discuss dApp development on RChain through the lens of RCAT.

Transcript


Greg: As we went out to the stratosphere over the last few podcasts, I wanted to bring it back home to the very practical, which is building applications on top of RChain. Kayvan, I know that you’ve done a lot of work with the RChain Asset Tracker [RCAT] backend. Recently, you and Kent did some bug hunting and I was wondering if you could talk about your adventure—what brought it about and how you guys ended up resolving the matter, and how RCAT fits into the whole ecosystem here. 

Kayvan: I think it was version .92 of RNode. We started seeing what we thought was a race condition: basically, intermittent results in testing the RSong and RCAT infrastructure. Without getting into a lot of detail, RSong and RCAT persist assets into the blockchain, in particular into RChain, and then track them through RNode and the RNode APIs: first the deploy API and finally the propose API. In order to save bandwidth and throughput and be more efficient, we bulk the payloads.

Instead of doing one asset at a time, storing it on the chain, and finalizing it, we stream the assets and then bulk them into batches of anywhere from a handful to a hundred assets, depending on the circumstances and on what else is going on. Then we do a deploy and a final propose.
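
The batching described here can be pictured with a short sketch. This is a minimal illustration in Python, not RCAT’s actual code: `batches`, `persist`, and the `deploy` and `propose` callables are hypothetical stand-ins for the real RNode deploy and propose calls, and the default batch size of 100 simply mirrors the upper bound mentioned above.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(assets: Iterable[bytes], max_size: int = 100) -> Iterator[List[bytes]]:
    """Group a stream of assets into lists of at most max_size items."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    stream = iter(assets)
    while True:
        chunk = list(islice(stream, max_size))
        if not chunk:
            return
        yield chunk


def persist(
    assets: Iterable[bytes],
    deploy: Callable[[List[bytes]], None],  # hypothetical stand-in for a deploy call
    propose: Callable[[], None],            # hypothetical stand-in for a propose call
    max_size: int = 100,
) -> int:
    """Deploy each batch, follow it with a propose, and count the batches sent."""
    sent = 0
    for batch in batches(assets, max_size):
        deploy(batch)
        propose()
        sent += 1
    return sent
```

With this shape, a failure inside one deploy affects the whole batch, which is why tracking down a single bad asset means iterating through the batch.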

As you can imagine, debugging is going to be hard, because you’ve got to iterate through that bulk list to figure out where things went wrong. That’s where a lot of the debugging effort was going. What we were seeing was that some of the deployments were getting through and some of them weren’t, and then it would pick up again, so it was intermittent. Naturally, after talking to Kent, talking to you, and going through the logs, we were leaning toward a race condition. But as Kent and I dove into the code and debugged it, we just couldn’t identify any race conditions.

Greg: I remember looking at the Rholang code and going, “There are no race conditions.” 

Kayvan: Yeah, you didn’t see one, we didn’t see one. Interestingly enough, this is the same Rholang code with very minor modifications since day one. I was hard-pressed to imagine it from there. We were looking at all the hard places to figure out where this thing was coming in. 

Finally, Kent had the a-ha moment. It turned out that one of the deployment parameters, even though its type was long, was actually expected to hold an integer. I was passing it max long, and naturally that was creating an overflow in some of the deploys.
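
To see why passing max long into something that expects an integer misbehaves, here is a minimal sketch in Python. It assumes JVM-style narrowing, where converting a 64-bit long to a 32-bit int keeps only the low 32 bits; the transcript does not say exactly how the overflow showed up inside RNode, and the function names are illustrative only.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
LONG_MAX = 2**63 - 1  # the largest signed 64-bit value ("max long")


def narrow_to_int32(value: int) -> int:
    """Keep only the low 32 bits, as a JVM long-to-int conversion does."""
    low = value & 0xFFFFFFFF
    return low - 2**32 if low > INT32_MAX else low


def checked_int32(value: int) -> int:
    """Reject values that do not fit in 32 bits instead of silently wrapping."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in a 32-bit integer")
    return value
```

Under that assumption, `narrow_to_int32(LONG_MAX)` yields `-1`, a value that looks valid to downstream code, whereas `checked_int32(LONG_MAX)` fails immediately with a message that names the offending value.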

In addition to that, some of the exceptions weren’t coming up into the console, and that made debugging harder. Once Kent identified that, we were able to fix that part and add additional logging, and now we get very nice, concise error messages when such things occur. At the same time, the team is looking into changing the type so it is more declarative about what type of parameter needs to be passed in.

Greg: If it was responding that way, it was probably swallowing an exception, which is dangerous, at a minimum. It was really good to find this particular roughness around the edges early, because these kinds of swallowed exceptions lead to erratic behaviors that look like race conditions when in fact they’re integer overflows or things like that. It was really good that we found this and started to shore up that piece of the API.
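
The difference between a swallowed exception and a reported one can be sketched in a few lines of Python. This is an illustration of the general pattern only, not the RNode code path; `deploy` is a hypothetical callable supplied by the caller.

```python
from typing import Callable, Iterable, List


def deploy_swallowing(batches: Iterable[List[int]], deploy: Callable[[List[int]], None]) -> int:
    """Anti-pattern: failures vanish, so the results merely look intermittent."""
    succeeded = 0
    for batch in batches:
        try:
            deploy(batch)
            succeeded += 1
        except Exception:
            pass  # the error is dropped; nothing reaches the console
    return succeeded


def deploy_reporting(batches: Iterable[List[int]], deploy: Callable[[List[int]], None]) -> int:
    """Same loop, but a failure names the batch and keeps the original cause."""
    succeeded = 0
    for index, batch in enumerate(batches):
        try:
            deploy(batch)
        except Exception as exc:
            raise RuntimeError(f"deploy failed for batch {index}: {exc}") from exc
        succeeded += 1
    return succeeded
```

In the first version, the caller only sees that some batches went through and some did not, which is easy to mistake for a race condition. In the second, the first failure stops the run and points at the batch and the underlying error.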

Kayvan: I totally agree. I’m just repeating what you have been saying. There are a lot of different opinions about RSong and RCAT, but I think from day one it’s been an extremely valuable way for us to measure performance, stability, and all the different aspects of RChain, RNode, and our infrastructure.

It’s a lot of code. No matter how much we try and how many unit tests and integration tests we have, unless you put it into real-life dApp development, you won’t really know. That’s where RSong has been valuable. 

Greg: I agree. Absolutely. One of the ways. 

Kayvan: That debugging definitely taught me a lot of valuable lessons. It also opened my eyes to how the persistence model works, because that’s where we thought the race condition might be coming from. I started looking at that part of the code, in particular RSpace, so much so that I joined the RSpace team to start doing core platform development, which I’m extremely grateful for and excited about.

Greg: That’s great. That’s a powerhouse team, to say the least, and it’s great to have you on board there. That’s really good.

Kayvan: It certainly is a powerhouse. I’ve been doing software for a long time and this team is awesome. It’s really a pleasure to work with them. It’s an honor.

Greg: Tell us a little bit about RCAT deployment and how people might use RCAT.

Kayvan: Thank you for asking. That’s a really good question. One of the things that we’ve been working on is making the deployment model of RSong, and in particular the infrastructure of RCAT and RNode, more seamless. What we want to do is have a deployment model for RCAT, but RCAT by itself is not going to work. We also want to have a deployment model for RNode, where we bring a number of RNodes up as validators and then have RSong start interacting with those.

These two are wired together. RCAT builds everything on Google Cloud. We could have easily done it on any other cloud, but we chose Google Cloud. The whole thing basically boils down to a number of declarative YAML configuration files.

Once the code is checked in, there is a Cloud Build YAML file that Google Cloud picks up and starts building from. It builds a Docker image. It basically builds everything inside Docker, and then it publishes the image to a Google registry. We don’t have any dependency on any particular build machines, since we build inside Docker; we don’t have any particular dependency on any infrastructure, because we deploy as a Docker image. Hopefully, that makes some sense.

Greg: That makes a lot of sense. I guess that means it would be pretty easy to deploy this on Azure, for example. 

Kayvan: Yes, it would be very easy to deploy on Azure. It would be very easy to build it on Azure as well. Both are true for that particular reason: we build and deploy it as Docker images. When I was looking for references on how to build this, a lot of the material that I looked at on Helm and Helm charts, Kubernetes, and Google Cloud infrastructure was coming from Azure. They’re heavily into this way of building with zero dependency on platforms.

What we had to do was add two more of these descriptive files, these YAML files, to the main RChain project. So now, as RChain is being built, we also build RNode inside Google Cloud, register those images, and deploy them. With the SRE team’s help, in particular Tomash, we have also added a handful of Python scripts to this.

Basically, once the dev branch in RChain is checked in, these build scripts get triggered, the associated Docker image gets registered with the Google Docker registry, and then it gets deployed into a Kubernetes cluster where we have persistent storage for RNode. We form three or four validators. It’s configured, but we haven’t really tested it to that extent. That takes care of our RNode deployment.

At the same time, the same philosophy applies to RSong and RCAT. Once the code gets modified and checked into Github, we build through Google Cloud and deploy to Google Cloud, and then these connections just start working. Everything is hot deployment, meaning there is no downtime. That’s one of the nice things that we get from being able to run in a Kubernetes cluster, as well as being able to handle a large volume of load.

We haven’t done a lot of experimentation with being able to scale up and down with RNode yet. The RNode part is work in progress right now. So stay tuned. I’ll keep you up to date as far as our progress goes.

Greg: Are there materials, where someone who might want to take RCAT and tweak it a little bit and turn it into something else, is that available? Do we have any documentation or is that something we still need to cook up? 

Kayvan: The answer is yes and no. I need to spend more time on documentation for that very specific use case that you just mentioned. There is some, but it could use some love. 

Greg: Understood. We’ve got a lot of irons in the fire. 

Kayvan: Certainly. As long as, every time they see a song or music, they replace that with a binary asset, then yes, that should work. I do need to add more documentation to add more color to that part of it.

However, as we’ve discussed, there is really no restriction, whether these are songs or images or videos. 

Greg: Basically, any digital asset, it seems like. And it’s not just a digital asset, it’s a packaging of digital assets. So it’s a digital asset plus the metadata associated with it. Is that correct?

Kayvan: Yes. The way it works, we try to loosely couple the different aspects of RCAT and basically look at things as different entities, or different assets that come at different times from different locations. There will be a manifest file that declaratively tells us how we can aggregate these different parts into products (or into assets).

For instance, as you mentioned, a song could have a couple of binary representations of the actual music asset. Then it will have metadata. A manifest file would tell us that for this song, these are the particular parts that need to exist in order for this song to be publishable. On the acquisition side, to use some of the RCAT terminology, we look for this manifest file. When the manifest file exists and we see that all the parts are there, we provision the asset. The act of provisioning the asset means that we have all the parts and we can formulate a product.

Once it goes through that pipeline and we vet that everything is valid, the metadata is valid, we know this asset, we know the owner of this asset, and all the things are lining up, now the asset becomes publishable. When an asset is publishable, it’s searchable. Basically, at that time, the asset has gone through the entire pipeline and is ready to be consumed. We’re keeping track of all this—as well as the consumption pattern—through the contract. Tokens are associated with the consumption pattern and those tokens are managed through the contract.
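
The manifest-driven acquisition step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not RCAT’s implementation: the `Manifest` and `Acquisition` classes, their fields, and the part names are all hypothetical, and the later validation and publishing steps are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Manifest:
    """Declares which parts must exist before an asset can be provisioned."""
    asset_id: str
    required_parts: Set[str]


@dataclass
class Acquisition:
    """Collects manifests and parts, which may arrive in any order."""
    manifests: Dict[str, Manifest] = field(default_factory=dict)
    received: Dict[str, Set[str]] = field(default_factory=dict)

    def add_manifest(self, manifest: Manifest) -> None:
        self.manifests[manifest.asset_id] = manifest

    def add_part(self, asset_id: str, part: str) -> None:
        self.received.setdefault(asset_id, set()).add(part)

    def missing_parts(self, asset_id: str) -> Optional[Set[str]]:
        """Return the parts still missing, or None if no manifest has arrived."""
        manifest = self.manifests.get(asset_id)
        if manifest is None:
            return None
        return manifest.required_parts - self.received.get(asset_id, set())

    def can_provision(self, asset_id: str) -> bool:
        """True only when the manifest exists and every required part is present."""
        return self.missing_parts(asset_id) == set()
```

Because parts and manifests are tracked separately, a binary that shows up before its manifest simply waits; provisioning becomes possible only once the manifest says the set is complete.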

Greg: That’s great. It feels like a pretty comprehensive architecture. There are two aspects to digital assets. One is, when the digital asset is the thing in itself—we’re talking about a song or a video—the digital asset is a thing in itself. Sometimes, the digital asset is a record of another thing. 

For example, when you’re talking about telecommunications, oftentimes you’ll have these digital asset databases of actual network hardware. That’s how people keep track of what’s out in the field. It sounds like, based on the way you’ve done it, the publishable part could correspond to, “yes, we’ve done a verification that in fact the record corresponds to what we normally check to see that the record is faithful to the representation in the field,” which is often an important thing. 

Sometimes an engineer will change out a network part and forget to update the database or they’ll update the database in advance of actually getting a part out to the field. They have to go through this process of verifying that there’s a faithful representation of the record to the physical asset. It sounds like the way you’ve organized it, that would be easy to slot in.

Kayvan: Yes, exactly. That use case could be met. I can’t tell you that we’re doing that today. We would need to add a little bit of coding to do that. The whole concept of separating the provisioning and publishing is to address such use cases. 

The one that I had in mind when I was doing this was: these assets often have policies that are associated with them, as far as where a certain asset can be consumed, what time they can be consumed, on what dates, expiration dates, start date, so forth and so on. 

It’s a prudent approach to separate the whole concept of provisioning and publishing. You could provision the asset, even though the policies that are associated with it may dictate that this asset cannot be seen in such and such geographical locations. The publication will be focused on publishing it in certain other geographical locations. Or you might want a temporary takedown on an asset. Even though it’s provisioned, you want to take it down because there is some controversy. It’s a prudent approach to separate them, to address the very use case that you mentioned and these additional ones as well.
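
One way to picture the separation of provisioning from publishing is to keep the policy check apart from the provisioning state. The sketch below is a hypothetical Python model, not RCAT’s contract logic: the `Policy` fields and the `is_viewable` function are assumptions chosen to mirror the geography, date-window, and takedown examples above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class Policy:
    """Publishing rules attached to an already provisioned asset."""
    blocked_regions: Set[str] = field(default_factory=set)
    start: Optional[date] = None   # first day the asset may be consumed
    end: Optional[date] = None     # last day the asset may be consumed
    taken_down: bool = False       # temporary takedown flag


def is_viewable(provisioned: bool, policy: Policy, region: str, today: date) -> bool:
    """An asset is viewable only if it is provisioned and the policy allows it."""
    if not provisioned or policy.taken_down:
        return False
    if region in policy.blocked_regions:
        return False
    if policy.start is not None and today < policy.start:
        return False
    if policy.end is not None and today > policy.end:
        return False
    return True
```

A takedown or a regional restriction then changes only the policy; the provisioned asset itself stays untouched and can become viewable again when the policy changes.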

Greg: Those kinds of use cases, they span both the digital asset domain, which a lot of people are familiar with, a takedown on Youtube, but also a physical service kind of access. In telecoms, we know what it means to deny service because someone hasn’t met a billing requirement or whatever it is. 

It sounds like it’s moving in the direction that supports a very wide range of use cases. Another aspect that I think about—and I know there’s a lot of debate on this—but why would one put data on the chain? I know for myself, I don’t think it’s a necessary component for certain kinds of applications. When we were first contemplating this, quite apart from the fact that we just wanted to show off that RChain could do this, the debate that needs to be covered is: if you don’t store these assets on chain, that means that some third party is holding the assets. I’m curious, what are your thoughts about this, even Derek, what do you think about this idea of being beholden to a third party versus community ownership of the data? 

Derek: We’re seeing on a regular basis the problems of centralization of the ownership of data. When you have one point of entry for an attack from a hacker, then they have everything that they need. To me, having worked a year in this industry now, one of the most important evolutions of this entire data structure and everything that’s happening with blockchain is that hackers will no longer have one point of entry.

Besides that, do you really want one person having control of all of your information? I was listening to Michael Lewis’s new podcast, where Citi Group was hacked and someone got control of his social security number and was able to rack up $16,000 in charges that had nothing to do with him. Now for months, he’s been trying to get this problem taken care of.

The law right now puts the onus of this completely on him, not Citi Group for being hacked. Everyone wipes their hands clean. His credit score has been ruined and he can no longer take out loans or anything of that nature because people were able to get into one point of entry and then basically ruin lives. From my perspective, that’s one of the most important aspects of what we’re doing here.

Greg: I have similar kinds of concerns. There are also other concerns. There are datasets that are terrifically sensitive. One of the attacks that Lawrence Lessig talks about with respect to blockchain is: What happens if someone attempts to store child pornography on the chain? Does that infect the whole chain? 

If you have the entire community essentially footing the bill for the data, then you can also point the finger. You can say, “This is the community that’s footing the bill for this data.” Without going around identity concerns, you can give a lot of information to law enforcement with respect to sensitive data if the community is responsible for maintaining the data. Now they’re the ones that have decided that they want to support the data being there. 

If there isn’t something like that, if it’s squirreled away in a dark hole on the internet somewhere, it becomes a lot harder for law enforcement to go after. Now, of course, if it’s just a single point of entry, you might argue that that might be easier to track. My sense is that if there’s a larger surface to go after, that gives law enforcement a better shot at being able to track down the sensitive data. At a minimum, I think it’s worthy of debate. We’re at a point now where we can have these kinds of debates. 

Before, when it was just the Google, Facebook, Amazon model, we couldn’t really have these kinds of debates. With blockchain, and in particular RChain, where we’ve got the kind of scalability that makes these things possible, we can now actually open up the debate. Kayvan, any thoughts from your side?

Kayvan: The part that concerns me a lot is that everything today has been really consolidated. You run on AWS or you run on GCP. There is nothing else. They can dictate a lot of the things that you just mentioned. I definitely don’t want to wake up one day and realize that Amazon has a different policy and now all my assets that have been stored on Amazon have to comply with some policy that they’ve decided on.

I look at this thing, first of all, as an awesome technical challenge that’s extremely intriguing. At the same time, de-referencing and getting rid of this trusted third party, whether it is one individual or a handful of individual parties, is extremely valuable. Particularly given where I came from, my background and all that, it’s very intriguing for me. I definitely feel better when that third party is a community I’m part of, as opposed to a community that I had to join because I had no other choice, or was born into, or any other circumstances like that.

Greg: I certainly understand what you’re saying. That speaks to the geopolitical appeal of blockchain in various regions around the world. I wanted to ask one last question. What do you think about this kind of approach to a blockchain-based Github? I wake up often at night wondering what happens if Github says, “We think your code is very valuable to use, it would be a shame if something happened to it.” I’ve been wondering about RChain (and blockchain in general) as the next-generation code repository. It certainly is going to be that way for smart contracts, but not all code is smart contracts. There’s going to be lots and lots of code, which are digital assets that are not smart contracts. I wonder if RCAT might also be the basis for a code management system or code check-in system.

Kayvan: I have the same concerns. It wasn’t very long ago that Github was purchased by Microsoft. I’m not saying anything one way or the other about Microsoft, but the very fact that we woke up one day, as people who depend so much on Github for everything that we do, and the ownership had changed right beneath us, puts those kinds of thoughts that you just mentioned in our heads: that we have no say in any of this. It’s quite possible that tomorrow a different order comes in, or a different policy gets exercised with Github, and we may or may not agree with that line of thought. Everything that we do from the time we wake up until the time we go to sleep is in Github. It is for me, it is for you, as it is for almost all developers.

If we take the technical aspects of Github out, it’s social sharing, gathering, tracking, and versioning: an asset-management type of system. It’s a perfect match. It’s a natural fit for blockchains, where communities can collaborate on and vet certain things; whether it’s code or music or assets shouldn’t matter. I’m totally with you on that. It is a huge concern. Blockchains are positioned well to provide the kinds of functionality that Github does. That’s the way that we should move.

As far as RCAT goes, if we take all the fluff out and just look at the kernel of it, it is an asset tracking system. It started as a POC for an asset tracking system, and then we added more functionality to it so we could do acquisition. We added more functionality to it so we could easily deploy in any environment. We basically added more layers to it. But at the heart of it, it’s an asset tracking system. It’s really well geared to do the kinds of things that Github does. Naturally, it would need more love to get there, more development, but the kernel is there. I’m definitely on board with that initiative. It is a scary thought that we rely so much on something that’s owned by one corporation.