CKBFS Protocol: A standard for Witnesses based content storage system

Sorry, still not seeing the full picture :thinking:

I’ll propose a few simple ideas (that you likely already considered) and you tell me why you chose the proposed design. @xxuejie did the same with the iCKB design, so it’s fair game.

Diffs

Reading your proposal once again, I noticed that the underlying reason why you chose the linked list (instead of other structures more commonly used in filesystems, like trees of inodes) is that you can recover the previous checksum and use it to checksum the full file. This is smart because:

Now we have a different issue, diffs.

Let’s assume we have a JSON file split across 100 txs and I need to update the part stored in the first tx. Then in your design I have to re-deploy all 100 txs, is that correct? Could you give more detail on the Branch Forking File section?
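To make the concern concrete, here is a minimal sketch. It assumes each cell in the list stores the running Adler-32 of all content up to and including its own chunk, which is how I read the proposal. The chunk contents and the helper name `chained_checksums` are made up for illustration.

```python
import zlib


def chained_checksums(chunks):
    """Return the running Adler-32 value after each chunk is appended."""
    running = 1  # Adler-32 starts from 1
    values = []
    for chunk in chunks:
        # Passing the previous value continues the checksum over the new chunk.
        running = zlib.adler32(chunk, running)
        values.append(running)
    return values


# Hypothetical file split into 100 chunks, one per tx.
chunks = [f"part-{i};".encode() for i in range(100)]
original = chained_checksums(chunks)

# The last running value is the checksum of the whole file.
assert original[-1] == zlib.adler32(b"".join(chunks))

# Edit only the first chunk and recompute the chain.
edited = [b"PART-0;"] + chunks[1:]
updated = chained_checksums(edited)

# Count how many stored checksums no longer match.
changed = sum(a != b for a, b in zip(original, updated))
print(f"{changed} of {len(chunks)} stored checksums changed")
```

In this sketch an edit to the first chunk invalidates every later stored value, which is why I suspect the whole list has to be re-deployed.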

Maybe it would be easier to support only immutable data, but having a CKBFS that fully supports diffs would prime it for broader adoption.

If we relax the requirement of a checksum over the full file, and with it the need for a linked list, we could switch back to the usual trees of inodes seen in filesystems and Merkle trees.

Sure, we may lose the ability to reference a file by its checksum (as the same file may have many different checksums, depending on the underlying structure), but is this really relevant?

We can just anchor all references to the root inode OutPoint.

Pretty sure there are better and more formalized designs based on an immutable FS and Merkle trees, this is just to give an idea.
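As a rough illustration of the tree alternative, here is a sketch of a plain binary Merkle tree over the same kind of chunks. This is not a proposal for the actual cell layout: SHA-256, the odd-level duplication rule, and the names `h` and `merkle_levels` are my own assumptions.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """Return every level of a binary Merkle tree: leaf hashes first, root last."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Odd number of nodes: pair the last node with itself.
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


# Hypothetical file split into 100 chunks, then edited in the first chunk only.
chunks = [f"part-{i};".encode() for i in range(100)]
before = merkle_levels(chunks)
after = merkle_levels([b"PART-0;"] + chunks[1:])

# Count the tree nodes whose hash differs after the edit.
changed_nodes = sum(
    a != b
    for level_a, level_b in zip(before, after)
    for a, b in zip(level_a, level_b)
)
total_nodes = sum(len(level) for level in before)
print(f"{changed_nodes} of {total_nodes} nodes changed")
```

Only the nodes on the path from the edited leaf up to the root change, so an update would touch a logarithmic number of cells instead of the whole list.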

So, why does the presented design use a Linked List instead of an inode Tree?

Homogeneity

The proposed design seems homogeneous, maybe too homogeneous.

Data:
  content_type: Bytes # String Bytes
  filename: Bytes # String Bytes
  index: Uint32Opt # referenced witnesses index.
  checksum: Uint32 # Adler32 checksum
  backlinks: Vec<BackLink>

Why does every element have to contain content_type and filename?

Every element in the list except the root does not seem to need this particular information.

Why put the full backlinks: Vec<BackLink>?

Strictly speaking, only the last one is used in the checksum verification; the rest is only used by the off-chain system to get all the relevant information faster. (An inode design, on the other hand, would make full use of this info tho.)

Generally, why put all this information in cell data?

Once the checksum is stored in cell data and verified, the integrity of the information in the witness is assured, so why not move all this data to the witness?

Why not employ at least two types of cells?

One type could store the file’s metadata and link all the data; the other would be used to store the actual data.
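A rough sketch of what I mean by two cell types follows. Everything here is hypothetical: the names `DataCell`, `MetadataCell` and `read_file` are mine, each data cell checksums only its own chunk, and the witnesses are modelled as one flat list even though on-chain they would live in different txs.

```python
from dataclasses import dataclass
import zlib


@dataclass(frozen=True)
class DataCell:
    # Hypothetical data cell: one per chunk, no filename or content_type.
    witness_index: int
    checksum: int  # Adler-32 of this chunk only


@dataclass(frozen=True)
class MetadataCell:
    # Hypothetical metadata cell: holds the file-level info once
    # and links the data cells in order.
    content_type: str
    filename: str
    chunks: tuple


def read_file(meta: MetadataCell, witnesses: list) -> bytes:
    """Reassemble a file, verifying each chunk against its data cell."""
    parts = []
    for cell in meta.chunks:
        data = witnesses[cell.witness_index]
        if zlib.adler32(data) != cell.checksum:
            raise ValueError(f"checksum mismatch at witness {cell.witness_index}")
        parts.append(data)
    return b"".join(parts)


# Usage with two made-up chunks.
witnesses = [b'{"hello": ', b'"world"}']
meta = MetadataCell(
    content_type="application/json",
    filename="hello.json",
    chunks=tuple(
        DataCell(witness_index=i, checksum=zlib.adler32(w))
        for i, w in enumerate(witnesses)
    ),
)
content = read_file(meta, witnesses)
```

With this split, replacing one chunk means creating one new data cell and one new metadata cell, while the other data cells are reused as they are.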

Why does every lock need to be <USER_DEFINED>?

Every element in the list (except possibly the root) does not seem to need this particular information as the inner cells are immutable, correct?

Why not directly specify an always-failure lock?

For now back to working on iCKB, I’ll be waiting for your replies. In the meantime, I wish you a nice day :hugs:

Love & Peace, Phroi
