Minimum viable light client via FlyClient-style checkpointing

Using header_deps, CKB can capture previous block headers and accumulate them into a “Difficulty MMR” as described in the FlyClient white paper, forming the basis of a super-light client that can be implemented today without consensus changes. More information about this idea can be found here.

The light client can request the latest MMR commitment transaction and, with adequate SPV-style block confirmations, be confident in that commitment transaction’s validity.

The MMR root produced through this transaction gives the light client the ability to validate any prior block’s inclusion in the chain via a Merkle path. (header_deps lag the chain tip by 4 epochs; SPV-style verification can be used for any blocks in this window.)

To validate the integrity of this value, the light client does FlyClient-style validation, requesting random blocks (randomness based on latest block header) and proofs of their inclusion in the MMR root.
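As a sketch of that verification step, assuming a binary Merkle path encoded as (sibling, sibling-is-left) pairs and using SHA-256 in place of CKB’s actual hash function:

```python
import hashlib

def h(left: bytes, right: bytes) -> bytes:
    """Hash two child nodes into a parent (illustrative; CKB would use blake2b)."""
    return hashlib.sha256(left + right).digest()

def verify_path(leaf: bytes, path: list, root: bytes) -> bool:
    """Walk a Merkle path from a block-header leaf up to a committed root.

    `path` is a list of (sibling_hash, sibling_is_left) pairs — an assumed
    encoding for this sketch, not a wire format from the FlyClient paper.
    """
    node = leaf
    for sibling, sibling_is_left in path:
        node = h(sibling, node) if sibling_is_left else h(node, sibling)
    return node == root
```

A full FlyClient proof additionally carries the peak values so the sampled leaf can be checked against the bagged root, but the per-leaf check reduces to this path walk.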

FlyClient CKB info cell

The implementation of FlyClient on CKB begins with an “anyone can unlock” cell that allows for a user to add to the MMR. More information about MMRs or “Merkle Mountain Ranges” can be found here: 1, 2, 3, 4.

FlyClient uses a “Difficulty MMR” that captures difficulty information at each MMR node and uses it later for difficulty-based sampling (so samples are spread according to accumulated difficulty rather than block count). The maximum capacity the data of this cell would occupy is 3,103 bytes.

The cell contains the following fields:

  • byte32 MMR_ROOT //current MMR root

  • struct MMR_PEAK {
        byte32 peakValue //root of sub-tree
        uint128 accumulatedDifficulty //accumulated difficulty of sub-tree below the peak
    }

  • uint8 HIGHEST_PEAK //the highest peak of the mountain range

  • uint64 PREVIOUS_BLOCK //previous block processed by cell
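The append operation that keeps these fields consistent can be sketched as follows; the dict-based peak representation and SHA-256 hashing are illustrative assumptions, not the cell’s actual encoding:

```python
import hashlib

def merge(left: dict, right: dict) -> dict:
    # Parent peak: hash over children, difficulty accumulated from both sub-trees.
    value = hashlib.sha256(left["value"] + right["value"]).digest()
    return {"value": value,
            "difficulty": left["difficulty"] + right["difficulty"],
            "height": left["height"] + 1}

def append_header(peaks: list, header_hash: bytes, block_difficulty: int) -> list:
    """Append one block header as a leaf, merging equal-height peaks.

    `peaks` plays the role of the MMR_PEAK list in the cell data: each entry
    tracks a sub-tree root and its accumulated difficulty.
    """
    carry = {"value": header_hash, "difficulty": block_difficulty, "height": 0}
    while peaks and peaks[-1]["height"] == carry["height"]:
        carry = merge(peaks.pop(), carry)
    peaks.append(carry)
    return peaks
```

After each append, HIGHEST_PEAK corresponds to the height of the first entry, and MMR_ROOT would be recomputed by bagging the peak list.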

A type script should enforce the following constraints:

  1. The cell is used as input 1
  2. The transaction contains 1 input cell and 1 output cell or 2 input cells and 2 output cells (no fee vs fee paid)
  3. The header_deps referenced begin at (PREVIOUS_BLOCK+1)
  4. Input 1’s lock script is equal to the lock script of output 1
  5. Capacity of input 1 is equal to capacity of output 1

The type script of this cell will allow any user to use header_deps to read the block(s) following PREVIOUS_BLOCK, add them to the MMR, and store the updated accumulated difficulty value.

The type script could require a delay of a certain number of blocks since the last commitment to reduce the frequency of commitment transactions.
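A sketch of how these constraints might be checked, over an assumed simplified transaction view (the dict field names are illustrative, not CKB syntax):

```python
def check_commitment_tx(tx: dict, min_new_blocks: int = 1) -> bool:
    """Sketch of the type script rules listed above."""
    inputs, outputs = tx["inputs"], tx["outputs"]
    # Rule 2: 1-in/1-out (no fee) or 2-in/2-out (fee paid) shapes only.
    if (len(inputs), len(outputs)) not in [(1, 1), (2, 2)]:
        return False
    info_in, info_out = inputs[0], outputs[0]  # Rule 1: info cell is input 1
    # Rule 3: header_deps must start at PREVIOUS_BLOCK + 1 and be contiguous.
    prev = info_in["data"]["previous_block"]
    numbers = [hd["number"] for hd in tx["header_deps"]]
    if numbers != list(range(prev + 1, prev + 1 + len(numbers))):
        return False
    # Rules 4-5: the info cell's lock script and capacity are preserved.
    if info_in["lock"] != info_out["lock"]:
        return False
    if info_in["capacity"] != info_out["capacity"]:
        return False
    # Optional delay rule: require enough new blocks since the last commitment.
    return len(numbers) >= min_new_blocks
```

The `min_new_blocks` parameter is the optional delay rule mentioned above; setting it to, say, 100 would cap commitment frequency at one transaction per 100 blocks.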

MMR_ROOT is computed by hashing all current peaks together as a linked list (“bagging the peaks”), as recently recommended by Peter Todd (inventor of the MMR).

FlyClient CKB proofs

Process for delivering FlyClient CKB proofs:

  1. A client will request the latest block header from full nodes;

  2. Full node will deliver the latest MMR commitment transaction, along with a Merkle path proving its inclusion in a commitment root in a valid block;

  3. Full node will deliver all block headers (SPV-style verification) between the block referenced in (2) and the latest block header;

    (header_deps can only read header information after 4 epochs of block confirmations)

  4. Full node will deliver Merkle paths to show inclusion of the randomly sampled blocks in the MMR root contained in (2). Randomness is derived from a hash of the latest block header (FlyClient protocol).
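Step 4’s difficulty-based sampling can be sketched like this; it samples uniformly in accumulated difficulty, a simplification of FlyClient’s actual distribution, which is additionally skewed toward recent blocks:

```python
import hashlib

def sample_blocks(seed: bytes, difficulty_prefix_sums: list, k: int) -> list:
    """Derive k pseudo-random block indexes from the latest header hash,
    weighted by accumulated difficulty rather than by height.

    `difficulty_prefix_sums[n]` is the total difficulty up to and including
    block n — an assumed input shape for this sketch.
    """
    total = difficulty_prefix_sums[-1]
    picks = []
    for i in range(k):
        digest = hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
        target = int.from_bytes(digest, "big") % total
        # First block whose cumulative difficulty exceeds the sampled target.
        picks.append(next(n for n, s in enumerate(difficulty_prefix_sums)
                          if s > target))
    return picks
```

Because the seed is the latest header hash, the full node cannot predict which blocks will be sampled when it builds the chain, and both sides derive the same sample set deterministically.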

This technique produces proofs of about 2 megabytes; more information about proof size and additional details can be found here.

Censorship attacks

This solution is unfortunately prone to miner censorship of transactions. If miners are censoring a commitment transaction, the light client will fall back to SPV verification since the last commitment. Commitments will continue once the censorship ends.

If CKB is prone to censorship attacks, we are in an unfortunate situation overall.

Considerations for more robust implementation

In the interest of improving this, we can examine consensus rule changes to ensure this data is always updated.

In Grin and Beam, miners commit to the previous MMR state instead of previous block in the block header. CKB consensus rules could be changed to commit to previous MMR state (instead of previous block) or an additional field could be added to the block header for the MMR.

Putting hard fork considerations aside, the MMR root is a derived commitment rather than data the node already has (blocks). I have an intuition that this complicates fork resolution, and I feel fairly certain it complicates uncle validation; I would greatly appreciate thoughts from anyone who has examined this.

From RFC0020: “Our uncle definition is different from that of Ethereum, in that we do not consider how far away the two blocks’ first common ancestor is, as long as the two blocks are in the same epoch.”

If 1) an extra value is added to the block header: to validate the MMR value in an uncle header, a node would need the peak values of the MMR in the previous block. Or if 2) the previous-block commitment is changed to an MMR commitment: the node would need the MMR values at the point a fork occurred in order to validate the MMR value being committed to in the second block of the fork.

A second path to improving robustness is adding a consensus rule via soft fork that adopts the transaction-based solution, as described in Section 8.3 of the FlyClient paper. The rule would require that the second transaction of a valid block uses the on-chain contract to add the previous block (which is in the block header) to the MMR.

Looking forward to your observations and optimizations. This also needs a cool name; I am inspired by CKB’s unique ability to “see itself”, so I’m thinking about things around eyes and introspection!

*Many thanks to jjy, Cipher, Jan and Tannr for entertaining my thoughts at various times around the idea of using CKB to capture data for light clients.


@mathoticus good work :+1: Sounds like a perfect grants project.

An interesting problem is how to incentivize people to update the MMR cell in this user-level flyclient protocol.


Appreciate your feedback! Originally I had in the write-up “the shared benefit of a super-light client should provide the necessary incentives to ensure stakeholders keep this data updated” but it looks like that did not make it into the final version.

There are a couple of things here. We do have coordination problems and tragedy of the commons, but because anyone can update, maybe it’s not so bad.

If it’s just running the infrastructure to collect the blocks and paying a nominal transaction fee, you only need 1 person to do it. Could be any of us. A rough calculation for updating every 100 blocks: (3,200 bytes for headers + 65-byte signature + 72 bytes for inputs + 128 bytes for outputs) × (1,800/100) commitments per epoch × 6 epochs per day at 1,000 shannons/KB is 0.0037422 CKB per day in fees.
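As a sanity check, the arithmetic works out (assuming 1 CKB = 100,000,000 shannons, so 1,000 shannons/KB is 1 shannon per byte):

```python
# Reproducing the rough daily-fee estimate above.
tx_bytes = 3200 + 65 + 72 + 128           # headers + signature + inputs + outputs
updates_per_day = (1800 // 100) * 6       # commitments per epoch * epochs per day
shannons_per_day = tx_bytes * updates_per_day   # at 1 shannon per byte
ckb_per_day = shannons_per_day / 100_000_000    # 1 CKB = 10^8 shannons
print(ckb_per_day)  # 0.0037422
```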

Any wallet services that use the data would have an interest in keeping it relatively updated to minimize data their users need to sync.

The issue comes in when transaction fees go up imo. So someone would have to have enough interest in seeing this working even if it will cost them some money. Maybe they do, maybe they don’t. Worst case is if no one does, everyone downloads block headers until someone feels compelled to improve the situation and update the data.

A way around this is if miners would be willing to accept the transactions with no fee. Which is a bit awkward but does get us around the incentive problem.

I have mixed thoughts around bolting incentives onto this, the task of light client data feels like it is a protocol level concern. Curious if you or anyone else has any ideas.

Using tricks like DCKB, it’s possible to create a special DAO pool for FlyClient cell updates. A wallet/user can support FlyClient updates by depositing into the pool, and can withdraw from the pool at any time with no secondary issuance compensation. Anyone can post a FlyClient cell update to claim a small fee from the pool, funded by the accrued compensation.


Great job! Any ideas about how to use this light client to build a super-light wallet based on current infrastructure? I think it’s not practical to update full node code for the minimum viable light client in the near term, so how could the client get the proof from a full node?

Is it secure enough if we use some centralized light client servers to provide the proof, combined with ordinary full nodes to provide the trustless data? If it’s feasible, we could distribute some servers to kick off light-client-based applications before we have full confidence about a fork to support the light client protocol at the consensus level.

Thank you @Cipher for your feedback!

I was seeing this as primarily run as an off-chain service but hadn’t thought through all of the nuances that would need to be addressed.

The data contained in the MMR will allow a user to verify the integrity of the information they are getting (the hashes check out); however, this only delivers certainty that it is properly formed, not that it is accurate.

FlyClient has been designed so the light client can request proofs from a number of nodes and then choose the best one. For a centralized solution, we don’t want users to have to compare multiple sources.

Addressing data accuracy

I think ideally the user should be able to validate the information they are receiving strictly based on what is contained in the transaction.

In order for the user to trust what they are receiving, we can use a static value that identifies the FlyClient info cell. If the user is aware of this value, they can simply check the transaction for it. My idea is to use a TypeID to identify the FlyClient info cell.

(Without this, a different user could deploy a cell with the same data structure and use the FlyClient CKB type script to create bad data)

Any user with a full node can validate that the cell referenced by the TypeID was created correctly. After this is widely known and verified, users only need the TypeID and can trust the execution of CKB.

I came up with a scheme to do this, but I haven’t grokked all of the intricacies of CKB programming yet, so please check whether this makes sense. We should probably also hard-code a value for the number of header_deps to simplify execution.


The centralized server can deliver the latest commitment transaction and Merkle inclusion proof; the client will see the TypeID associated with the FlyClient info cell in the transaction and establish certainty that it is correct. The client can then request the traditional FlyClient proof from the server as well, and download block headers to gain confidence in the transaction’s inclusion in the chain.

Users can get the block headers from a centralized server or network of full nodes, similar to SPV in Bitcoin.

So it seems that a centralized server is necessary to provide the FlyClient info cell and its proof; besides, the client needs to connect to ordinary full nodes to avoid eclipse attacks.


And presumably an inclusion proof under some header?


Yep. I think I could have been clearer: I wanted to stress that the user would establish certainty using just transaction information, instead of having to compare results from different sources or having no certainty that the info in the cell is correct.

“The centralized server can deliver the latest commitment transaction and merkle inclusion proof, the client will see the TypeID associated with the FlyClient info cell in the transaction and establish certainty that it is correct. The client can then request the traditional FlyClient proof from the server as well, and download block headers to gain confidence in the transaction’s inclusion in the chain.”

If 1) an extra value is added to the block header: to validate the MMR value in an uncle header, a node would need the peak values of the MMR in the previous block,

The peak values at any previous point in the MMR are always available. They are no longer peaks, but they are still nodes in the full tree. Their node indexes can be found from a simple calculation based on the block height (leafLength) of the proposed header.

or if 2) previous block commitment is changed to MMR commitment: the node would need the MMR values at the point a fork occurred in order to validate the MMR value being committed to in the second block of the fork.

Regardless of whether the prevHash is part of a block header, all of these values are available to the full node, so it can still do this calculation with a function of the block height.
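To make the “simple calculation” concrete, here is a sketch assuming 0-based post-order node numbering, one common MMR convention:

```python
def peak_heights(leaf_count: int) -> list:
    """One peak per set bit in the leaf count's binary form, highest first."""
    return [i for i in range(leaf_count.bit_length() - 1, -1, -1)
            if leaf_count >> i & 1]

def peak_positions(leaf_count: int) -> list:
    """0-based post-order node positions of the peaks.

    A perfect sub-tree with 2^h leaves occupies 2^(h+1) - 1 nodes, so each
    peak's position follows directly from the heights above.
    """
    positions, offset = [], 0
    for height in peak_heights(leaf_count):
        size = 2 * (1 << height) - 1
        positions.append(offset + size - 1)
        offset += size
    return positions
```

For example, an MMR with 7 leaves (binary 111) has peaks of heights 2, 1, and 0, at node positions 6, 9, and 10; the same kind of calculation recovers the former peak positions for any historical leafLength.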


I proposed a grant here

Thanks for filling in the gaps in my assessment :slight_smile:

CKB Consensus is an optimized version of Nakamoto Consensus, more info here and here. I think any modifications to the block validation rules would require re-evaluating benchmarks, as well as ensuring it wouldn’t jeopardize anything based on the optimizations that have been made, an analysis which could be substantial.

The permissionless path is much more viable today. It allows for assessments of light client functionality prior to considering any changes to the node, or requiring anyone to upgrade their client software/infrastructure. Ultimately if a similar result can be achieved through application layer functionality, it is preferable.

I think hard forks have been normalized in some communities more than others. For example, the idea of hard forking Bitcoin for a feature (even a valuable one) is unthinkable, and this supports a decentralized community.

Other blockchains have a cadence of hard forking in arbitrary new features every ~6 months, but CKB has been designed to be a basic foundation and to push as much functionality as possible to the application layer to avoid this kind of situation.

The enormous coordination required for forks and possibility for fracturing the community or governance centralization means hard forks should ideally be very rare and uncontroversial.

Right now the community is still driven by imperfect off-chain governance, the thinking around governance is covered here in the positioning paper. The Nervos DAO can eventually be used for voting, once the proper mechanisms have been implemented on top of it.

Hopefully by the time the community approaches any major decisions we are well set up to live up to the values set forth with the project.

Some examples of potential hard fork decisions that have been mentioned:

  • Changing the default signature verification script
  • Adding the Vector (V) or Bit Manipulation (B) RISC-V instruction set extensions to CKB VM
  • Activation of the treasury fund

I believe I have found additional information in support of a permissionless transaction-oriented design, or possibly an iteration on the design that doesn’t resemble FlyClient.

The TxChain paper is mostly unrelated; it describes a scheme that uses an on-chain transaction per light client query. However, it does point out that the validity of n transactions can be proven by the acceptance of a single new transaction that references the n transactions.

With header_deps on CKB, similar certainty can be established for a block history, based on the execution of a contract on CKB VM. Each FlyClient on CKB commitment transaction adds valid new block headers following the previous commitment, while consuming the cell produced in the previous FlyClient on CKB transaction, creating proper dependent transactions.
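The dependency structure can be sketched as a simple chain check (the transaction view here is an assumption for illustration, not an actual CKB API):

```python
def verify_commitment_chain(txs: list) -> bool:
    """Each commitment transaction must spend the info cell created by the
    previous one, so accepting the latest transaction implies the whole
    chain of prior commitments was accepted."""
    for prev, cur in zip(txs, txs[1:]):
        if cur["consumed_cell"] != prev["created_cell"]:
            return False
    return True
```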

The properties described in TxChain are why we can use a centralized server to provide the latest FlyClient on CKB commitment transaction and the light client can establish trust in the data through block confirmations up to the chain tip.

Integrity is being derived from the execution of CKB VM and the honest majority assumption from the recent block headers, with FlyClient sampling as a secondary (and possibly unnecessary) proof.

Bumping this because it may be able to provide on-chain scripts with a readily available chain height/time.

Not sure how it would be functionally beneficial compared to header_deps, but there was an intuition.