For part 2 of the series on designing Nervos CKB-VM we discuss the design process and benefits of utilizing a RISC-V instruction set.
In part 1 of this series, we introduced the Nervos CKB-Virtual Machine (CKB-VM), a RISC-V instruction set based VM for executing smart contracts and written in Rust. We covered what the program does and why RISC-V was a good fit for this project. Now let’s look at its additional design benefits.
Figure 1. The layered architecture of Nervos .
We can design a VM that has advanced language features. For example, a VM for static verification or a VM that supports advanced data structures or various cryptographic algorithms. However, all blockchain projects must run under a Von Neumann CPU architecture (x86, x86_64, ARM) and all advanced VM features must be mapped to assembly instructions on current generation of CPU architecture, there is no way around this restriction.
For example, V8 might maintain the illusion that an infinite amount of memory is available, but deep down in V8, a sophisticated garbage collection algorithm must be implemented to support this behavior on a limited amount of linear memory space. Similarly, Haskell or Idris might have advanced static type checking that can prove (in a way) that software works correctly, but after type checking is done, there will still be a layer to translate type checked operations to unchecked raw x86_64 assembly instructions.
The main takeaway here is that we cannot deviate much from current system architectures. In other words, any VM will be implemented via plain assembly instructions at the very bottom of the software stack.
Why not Use a Real CPU Instruction Set?
An obvious point comes to mind: why not use a real CPU instruction set that complies with the current system architecture for our VM?
In this way, we will not lose any possibility for adding static verification, advanced data structures or cryptographic algorithms and we can maximize VM flexibility regardless of the data structures or algorithms that we provide in the VM. In addition, with a real CPU instruction set, we can really enable developers to develop any contracts that suit their needs, providing the maximum level of flexibility with the CKB.
It might appear that a higher level VM would make implementing certain things easier, but we believe this is not the case: any higher level VM, no matter how flexible, will inevitably introduce certain semantic constraints in its design. As a consequence, a higher level VM might embrace a variety of languages with different syntaxes, but these languages almost always share the same semantics underneath for performance reasons. The flexibility of a VM will be limited this way, which is not something we want to see in CKB.
Additionally, a higher level VM will most likely include higher level data structures and algorithms. We understand that as CKB developers, we can never imagine all of the different possible ways people might use the CKB. So any higher level data structures and algorithms we build may be suitable for one set of applications, but not for other applications. These embedded data structures or algorithms might slowly become a burden, serving no purpose beyond compatibility.
Additional Benefits of a VM with a Real CPU Instruction Set
In addition to flexibility, a VM with real CPU instruction set will provide additional benefits:
In contrast to instruction sets designed for VM implemented as software, CPU instruction sets designed for hardware are hard to modify once finalized and used in chips, which makes for a very stable base compared to software VM instruction sets. This property coincides with the requirement of a layer 1 blockchain VM, because a stable instruction set will mean less hard forks, without sacrificing flexibility.
A physical CPU only requires registers and a memory block to run and space in memory is conventionally specified as it is used in the stack during operation. By specifying the memory space size during VM operation, we can obtain the stack space usage during program execution based on the stack pointer within the VM, maximizing visibility of runtime status.
CKB VM programs can adjust the stack pointer, change area allocation in memory and even enlarge or shrink the stack area size as needed, providing increased VM flexibility. Current CPU instruction sets also provide a count of elapsed cycles, allowing a running overhead status query for the VM.
For a VM with real CPU instruction set, runtime overhead is easily managed.
When a CPU is provided, the number of cycles required for each instruction execution (without considering the pipeline) is fixed. We can use a real CPU as the implementation base for the VM of the CKB.
We can design the overhead requirements for the CKB VM based on the number of cycles required for running each instruction on a real CPU. Data obtained in this way will accurately reflect the required overhead of new algorithms when they are implemented.
Utilization of a real CPU instruction set has one key disadvantage when compared to VMs that implement cryptographic algorithms through opcodes or native VM instruction sets: performance.
After the decision has been made to go with a real CPU instruction set, the next step is to choose which set to use. Commonly used instruction sets include x86 or ARM instruction sets. According to our evaluation, we consider RISC-V instruction set to be the best choice for the CKB VM.