Motivation
RPC get_transaction only looks for transactions in the chain and the transaction pool. It returns null for transactions which have been accepted into the pool and later removed. It’s possible to subscribe to the rejected_transaction events, but it requires a long TCP connection.
For most scenarios, it is too expensive to maintain a long connection to subscribe to the events. The alternative solution is polling the transaction via get_transaction. For transactions submitted successfully by send_transaction, if get_transaction returns null, the transaction has been removed. If the returned status is committed, the transaction has been successfully committed into the chain. However, this solution has some issues:
- There is a bug (ckb#2907) in ckb-rs. Calling get_transaction immediately after a successful send_transaction may return null. It may take a while for the status to become pending.
- There is no way to distinguish whether the transaction provided is unknown or recently removed.
- The RPC get_transaction returns the whole transaction each time, which is an unnecessary performance overhead for the polling scenario.
This RFC proposes a solution to address these issues.
Specification
- Find the root cause of ckb#2907 and fix it.
- RPC get_transaction no longer returns null; instead, the response field transaction may be null. The field tx_status.status adds two new statuses: rejected and unknown. A new field tx_status.reason tells why the transaction is rejected.
- RPC get_transaction will accept a new optional request parameter to tell it not to return the transaction field.
RPC get_transaction
RPC get_transaction no longer returns null; instead, the response field transaction may be null.
Field tx_status.status adds two statuses: rejected and unknown.
- rejected: The transaction has been recently removed from the pool. Due to storage limitations, the node can only hold the most recently removed transactions.
- unknown: The node has not seen the transaction, or it should be rejected but was cleared due to storage limitations.
When tx_status.status is rejected or unknown, the return field transaction must be null.
If tx_status.status is rejected, tx_status.reason tells why it is rejected. This is a new field of type string. New reasons may be added in future releases.
Also, get_transaction adds a request parameter verbosity of type Uint32, which defaults to 2.
- When verbosity is 0 (deprecated): this is reserved for compatibility and will be removed in the following release. It returns null as the RPC response when the status is rejected or unknown, mimicking the original behavior.
- When verbosity is 1: the RPC does not return the transaction content, and the field transaction must be null.
- When verbosity is 2: if tx_status.status is pending, proposed, or committed, the RPC returns the transaction content as field transaction; otherwise the field is null.
To support the rejected state, the node must store the recently removed transactions. These include transactions submitted via RPC, received via P2P networks, and from the reverted blocks due to the chain reorganization.
The default configuration keeps at most 10,000,000 transactions from the last 7 days. The configuration can be adjusted according to the node storage size. Since only the transaction hash is stored and each hash occupies 32 bytes, 10,000,000 entries take 320,000,000 bytes (about 300M) of disk storage space, not counting the additional overhead.
[tx_pool]
...
keep_rejected_tx_hashes_days = 7
keep_rejected_tx_hashes_count = 10_000_000
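As an illustration of how the two limits interact, here is a minimal in-memory sketch of a store bounded by both entry count and age. The class RejectedTxStore and its methods are hypothetical and do not reflect how the CKB node actually persists the hashes. The sketch assumes entries are inserted in chronological order.

```python
import time
from collections import OrderedDict
from typing import Optional


class RejectedTxStore:
    """Remembers the hashes of the most recently rejected transactions."""

    def __init__(self, max_count: int = 10_000_000, max_days: int = 7):
        self.max_count = max_count
        self.max_age = max_days * 24 * 60 * 60  # seconds
        # Maps tx hash -> insertion time, oldest entry first.
        self._entries: "OrderedDict[str, float]" = OrderedDict()

    def put(self, tx_hash: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._entries.pop(tx_hash, None)  # re-inserting refreshes the entry
        self._entries[tx_hash] = now
        self._evict(now)

    def contains(self, tx_hash: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        self._evict(now)
        return tx_hash in self._entries

    def _evict(self, now: float) -> None:
        # Drop the oldest entries while either limit is exceeded.
        while self._entries:
            oldest_hash, inserted_at = next(iter(self._entries.items()))
            too_old = now - inserted_at > self.max_age
            too_many = len(self._entries) > self.max_count
            if not (too_old or too_many):
                break
            del self._entries[oldest_hash]
```

With such a store, a hash that is still present maps to the rejected status, while a hash that has been evicted is indistinguishable from one the node has never seen and maps to unknown.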
Scenarios
First of all, it is strongly recommended that the application keeps a copy of the transaction locally until the transaction is confirmed in the chain or the application decides to discard it. Do not rely on the ckb node to persist pending transactions. The current ckb implementation does not restore the transaction pool after reboot. The pull request ckb#2656 adds such a feature, but the node may still lose pending transactions, for example because of a disk failure or because a sudden power loss prevents the dump from being saved.
Single Node
For applications that use only one node to send transactions and check their status, it’s recommended to poll get_transaction(verbosity = 1) after a successful send_transaction.
- If it returns unknown or rejected, the transaction is considered rejected, because after a successful send_transaction the node must have seen and accepted the transaction. Of course, there are situations where a node loses the transaction after a restart. In this case the application can resend the locally saved transaction via send_transaction. If send_transaction rejects the transaction, the application acts according to the error message. If send_transaction succeeds, the node must have lost the transaction and the application can continue to poll the transaction status.
- If the transaction is confirmed as rejected, the application can either drop it or reassemble it with new cells depending on the use case.
- If the transaction is confirmed as committed, it is better to wait for enough confirmations by newly mined blocks.
- If the transaction is not confirmed as rejected or committed after a long time, the application should trigger the broadcast by calling send_transaction again, or simply call send_transaction again regularly during status polling.
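The single-node flow above can be sketched as a polling loop. This is a minimal sketch under stated assumptions: send_transaction and get_status are callables supplied by the application (get_status is assumed to wrap get_transaction with verbosity 1 and return tx_status.status), and the intervals are arbitrary illustrative values. For simplicity, any error raised when resending is treated as a confirmed rejection, whereas a real application should inspect the error message.

```python
import time
from typing import Callable


def track_transaction(
    tx: dict,
    tx_hash: str,
    send_transaction: Callable[[dict], str],
    get_status: Callable[[str], str],
    poll_interval: float = 3.0,
    resend_every: int = 10,
    max_polls: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the transaction is committed or confirmed as rejected.

    Returns "committed", "rejected", or "timeout".
    """
    for attempt in range(1, max_polls + 1):
        status = get_status(tx_hash)
        if status == "committed":
            return "committed"
        if status in ("rejected", "unknown"):
            # The node may have lost the transaction (for example after a
            # restart), so resend the local copy. An error confirms rejection.
            try:
                send_transaction(tx)
            except Exception:
                return "rejected"
        elif attempt % resend_every == 0:
            # Still pending or proposed after a while: trigger a rebroadcast.
            # Errors are ignored here, which is a simplification.
            try:
                send_transaction(tx)
            except Exception:
                pass
        sleep(poll_interval)
    return "timeout"
```

After the function returns "committed", the application should still wait for enough confirmations before treating the transaction as final.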
Multiple Nodes
Some applications use several CKB nodes behind a load balancer for scalability and availability. It is recommended to use the Master-Standby load balancing strategy, where the client always connects to the current master. When the master fails, one of the standby nodes is promoted to be the new master. Sticky Sessions Management can also help. Such a load balancer associates each client with its own master and uses the other nodes as fallbacks. A popular association method is hashing the client IP into a number and choosing the master using the number.
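The hashing-based association can be sketched as follows. The function node_order is a hypothetical helper, and the choice of SHA-256 is an assumption; any hash that is stable across load balancer instances would do.

```python
import hashlib
from typing import List, Sequence


def node_order(client_ip: str, nodes: Sequence[str]) -> List[str]:
    """Return the nodes to try for a client: its master first, then fallbacks."""
    if not nodes:
        raise ValueError("at least one node is required")
    # Use a stable hash so that every load balancer instance picks the
    # same master for the same client IP.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return list(nodes[start:]) + list(nodes[:start])
```

The first element of the returned list is the client's master. The remaining elements are tried in order only when the master is unavailable, so a client keeps sending and polling through the same node as long as it is healthy.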
With a simple Round Robin strategy, or in the case of a node failure in Master-Standby, there will be a delay before the transaction appears in get_transaction after a successful send_transaction. The application must take this delay into account.
The appendix provides some suggestions to make transaction synchronization between multiple nodes more timely and reliable.
Workaround
For a CKB node which does not implement this RFC, here is a workaround.
The application uses get_transaction to poll transaction status after a successful send_transaction. If the RPC returns null within 20s after the send_transaction, the application can treat it as pending. If the RPC returns null after 20s, the application can consider it as rejected and try to call send_transaction again.
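The workaround can be expressed as a small classification helper. The function name classify_legacy is hypothetical, and the response is assumed to be the parsed JSON result of get_transaction, or None when the RPC returns null.

```python
from typing import Optional

# Grace period from the workaround above.
GRACE_PERIOD_SECONDS = 20.0


def classify_legacy(response: Optional[dict], seconds_since_send: float) -> str:
    """Interpret get_transaction on a node that does not implement this RFC."""
    if response is not None:
        # The node knows the transaction, so report the status it returned.
        return response["tx_status"]["status"]
    if seconds_since_send <= GRACE_PERIOD_SECONDS:
        # A null response shortly after sending is treated as pending.
        return "pending"
    # A null response after the grace period is treated as rejected.
    return "rejected"
```

When the helper returns "rejected", the application can call send_transaction again with its local copy of the transaction.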
Related Work
- The CKB Transactions Management Guideline provides some advice on how to manage pending transactions.
- Currently the Block Explorer does not mark transactions as rejected when the dep cells have been used by other transactions.
Related Feedback
The following dialogues have been edited to avoid privacy leaks. There are two roles, A: and B:, where A is the person who submits the feedback and B is the technical support.
2021-07-23
A: When sending a transaction, there are two kinds of failures. The first one is due to cell preemption, the transaction is rejected; the other one is sending a transaction without error, but the transaction is not available in the explorer and the node has no related information as well. What is the general cause of this?
2021-07-29
A: The transaction has been stuck for half an hour, node version 0.42
B: The first two cell deps have been spent.
A: Those two cells will be updated continually. After such a long time, they must have been spent, so this is not the reason why the transaction has been stuck. In fact, if the cell had been spent when the transaction was first sent to the node, it would have reported an error. But it did not report an error and the transaction is still pending, so we can infer that it was not spent at that time.
A: This is the first problem. Another problem is that if the time is too long, after one of the dep cells has been spent, calling the node rpc interface returns nothing. So we hope that the node can return something, such as rejected, to help the subsequent error handling logic.
Unresolved Questions
Because of P2P and PoW, transaction status transition could be very complex even after implementing this RFC. For example, a transaction can suddenly become unknown or rejected and later become pending again, because it is first removed from the pool and later relayed back from other peers.
There’s no RPC to tell whether a transaction has been broadcast to the P2P network. Such information would be a useful hint to determine whether to rebroadcast the transaction.
Appendix
Multi-Node Transaction Synchronization Suggestions
Whitelist
The nodes should add each other to the whitelist via the option [network].whitelist_peers in the configuration file ckb.toml.
The node connects to the whitelist nodes first and retries connecting after disconnections. The configuration option is an array in which each item looks like:
"/ip4/10.0.0.1/tcp/8115/p2p/QmWxucJPjKpfZuG7kTzYQLzRfv1h8nyMjnLBFxHDWFENjA"
The part after ip4/ is the IP of the node. If the nodes are in the same LAN, using the intranet IP takes advantage of the intranet bandwidth. The number 8115 is the p2p network listening port, which is configured via [network].listen_addresses in the same configuration file.
The last part after p2p/ is the peer-id of the node. The following command prints the node peer-id.
ckb peer-id from-secret --secret-path data/network/secret_key
Note that the ID changes after deleting the secret key file, which requires updating whitelist_peers to use the new peer ids.
Take an example of three nodes with IP 10.0.0.1, 10.0.0.2, and 10.0.0.3. After the three nodes have been initialized and have run ckb run at least once, use the command above to get their peer ids. Assume that the result is:
- 10.0.0.1: QmWxucJPjKpfZuG7kTzYQLzRfv1h8nyMjnLBFxHDWFENjA
- 10.0.0.2: QmTPYTsio5MGQkPTdVwYgM5xKcKGftx9qoBALhJi7oUKNt
- 10.0.0.3: QmQ7k9RYAgvWt5mWvbGG85SiXf23hjGSVjmtnsHMqzs7Hx
The node 10.0.0.2 should add the other two nodes into the whitelist like below.
whitelist_peers = [
"/ip4/10.0.0.1/tcp/8115/p2p/QmWxucJPjKpfZuG7kTzYQLzRfv1h8nyMjnLBFxHDWFENjA",
"/ip4/10.0.0.3/tcp/8115/p2p/QmQ7k9RYAgvWt5mWvbGG85SiXf23hjGSVjmtnsHMqzs7Hx"
]
Configure the other two nodes accordingly.
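For more than a handful of nodes, the whitelist entries can be generated instead of written by hand. The following sketch uses the example peer ids above. The helper whitelist_for is a hypothetical name, and the port defaults to the 8115 used in this example.

```python
from typing import Dict, List

# Example peer ids from the text above, keyed by node IP.
PEER_IDS: Dict[str, str] = {
    "10.0.0.1": "QmWxucJPjKpfZuG7kTzYQLzRfv1h8nyMjnLBFxHDWFENjA",
    "10.0.0.2": "QmTPYTsio5MGQkPTdVwYgM5xKcKGftx9qoBALhJi7oUKNt",
    "10.0.0.3": "QmQ7k9RYAgvWt5mWvbGG85SiXf23hjGSVjmtnsHMqzs7Hx",
}


def whitelist_for(node_ip: str, peer_ids: Dict[str, str], port: int = 8115) -> List[str]:
    """Build the whitelist_peers entries for one node from all nodes' peer ids."""
    return [
        f"/ip4/{ip}/tcp/{port}/p2p/{peer_id}"
        for ip, peer_id in peer_ids.items()
        if ip != node_ip  # a node does not whitelist itself
    ]


if __name__ == "__main__":
    # Print the entries to paste into each node's ckb.toml.
    for ip in PEER_IDS:
        print(ip, whitelist_for(ip, PEER_IDS))
```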
Transaction Multicast
The most straightforward way is to send the transaction to multiple nodes at the same time by calling their RPC method send_transaction.
Some load balancers support sending the matched requests to all backend nodes and returning the response from the fastest node. The application can also implement a send_transaction gateway to send the transactions to all the nodes.
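A gateway of this kind can be sketched in a few lines. The function multicast_send is hypothetical, and send_to_node is a callable supplied by the application that performs the actual send_transaction RPC against one node URL and returns the transaction hash.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def multicast_send(
    tx: dict,
    node_urls: Sequence[str],
    send_to_node: Callable[[str, dict], str],
) -> Dict[str, object]:
    """Send tx to every node in parallel.

    Returns a map from node URL to the returned hash, or to the exception
    raised for that node.
    """
    def attempt(url: str) -> object:
        try:
            return send_to_node(url, tx)
        except Exception as err:
            # One failing node must not block the others.
            return err

    with ThreadPoolExecutor(max_workers=max(1, len(node_urls))) as pool:
        results = list(pool.map(attempt, node_urls))
    return dict(zip(node_urls, results))
```

The caller can treat the submission as successful when at least one node returns a hash, and log the exceptions from the others for later inspection.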
Another option is to deploy a transaction forwarder on each node. The transaction forwarder listens for new_transaction events via the subscription RPC. When a new transaction is received, it forwards the transaction to other nodes via their RPC.