Spark Program | ckb-probe: Deep Observability Tool for CKB Nodes Based on Aya Kernel eBPF

Hi @Clair, all noted. Monday text-based sync on this thread works well, and March 23 for the first update is fine.

The first installment (113,156 CKB, 20%) has been disbursed:
https://explorer.nervos.org/en/transaction/0xd568abfd7872aecb63b75845f1c717d735e4ce255e490745a7b83c09568c5e8d

Please confirm once received. Looking forward to the first progress update on March 23.

Best,
Xingtian
On behalf of the Spark Program Committee


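Hi Xingtian,
The CKB has been received, thank you! However, my plan is to start work on March 16 and post the first progress update on March 23. Could you please confirm once more?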


Sorry, that was my mistake. I will edit the announcement above and change the date to the 23rd.


Wow, I noticed only now these two messages regarding USDI!

Never expected my hunch to be right, but glad to see more transparency on USDI reserves, keep it up!!

Thank you for your services, Phroi



Week 1 Report

I. Overview

This was the first week of the project. The core objectives were to complete an in-depth architectural study of the CKB codebase (full call-chain tracing across the P2P, storage, and synchronization layers), set up the local development environment, and deploy and verify a CKB testnet node. The following is a module-by-module summary.


II. Task Completion Details

1. CKB Source Code Architecture Study (P2P / Storage / Sync Layer Call-Chain Analysis)

Objective: Understand the CKB node’s overall assembly, P2P communication, persistent storage, and block synchronization mechanisms at the source-code level, laying the groundwork for identifying eBPF probe attachment points in subsequent phases.

Completion:

1.1 Overall Architecture and Startup Flow

CKB adopts a layered, modular architecture in which core components are interconnected through the Shared struct (shared/src/shared.rs:58). The complete node startup call chain is: main() → ckb_bin::run_app() (ckb-bin/src/lib.rs:61) → run subcommand entry (ckb-bin/src/subcommand/run.rs:17). Within run, four key steps are executed sequentially. First, Launcher::build_shared() (util/launcher/src/lib.rs:191) constructs Shared and SharedPackage. Shared holds the RocksDB storage instance, the chain-tip snapshot manager (SnapshotMgr), the transaction pool, and other global state, serving as the central hub that spans all subsystems. Second, start_chain_service() launches the chain service and returns a ChainController. Third, start_network_and_rpc() (util/launcher/src/lib.rs:403) assembles and starts the network layer: internally it creates SyncShared::new(shared), Synchronizer::new() (registered as Protocol ID 100), Relayer::new() (Protocol ID 101), and BlockFilter::new() (Protocol ID 121), then calls NetworkService::new().start() to bring up the P2P service. Finally, tx_pool_builder.start(network_controller) starts the transaction pool service.

1.2 P2P Network Layer (network/ crate)

Underlying Library — Tentacle: CKB’s P2P layer is built on the in-house Tentacle library. Key types include ServiceControl / ServiceAsyncControl (P2P control interfaces), ProtocolId(u32) (protocol identifiers), SessionId(u32) (session identifiers), and PeerId (public-key-based identity).

Core Abstraction Layer: On top of Tentacle, CKB defines its own protocol-handling trait CKBProtocolHandler (network/src/protocols/mod.rs:161), which exposes five methods: init, connected, disconnected, received, and notify. An adapter layer called CKBHandler bridges this trait to Tentacle’s native ServiceProtocol. The protocol context is encapsulated as CKBProtocolContext (:47), providing operations such as sending messages, disconnecting peers, adjusting scores, and banning. Global network state is held by NetworkState (network/src/network.rs:73), which contains key fields including peer_registry, peer_store, and local_peer_id. NetworkService (:827) is responsible for assembling and starting the P2P service, while NetworkController (:1334) serves as the external API entry point, exposing methods such as add_node, remove_node, and p2p_control.

Protocol Registry — SupportProtocols: Defined at network/src/protocols/support_protocols.rs:20, CKB registers 12 P2P sub-protocols: ID 0 Ping (heartbeat/RTT, max frame 1 KB), ID 1 Discovery (peer discovery, 512 KB), ID 2 Identify (identity/capability exchange, 2 KB), ID 3 Feeler (probing node reachability, 1 KB), ID 4 DisconnectMsg (disconnect notification, 1 KB), ID 100 Sync (block synchronization, 2 MB), ID 101 Relay (CompactBlock / transaction relay, 4 MB), ID 102 Time (time synchronization, 1 KB), ID 110 Alert (alerts, 128 KB), ID 120 LightClient (light client, 2 MB), ID 121 BlockFilter (block filters, 2 MB), and ID 130 HolePunching (NAT traversal, 512 KB).
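The registry above can be mirrored as a small lookup table. The following is a minimal sketch built only from the IDs and frame limits listed in this report; the `SubProtocol` enum and its method names are hypothetical and are not CKB's actual `SupportProtocols` definition.

```rust
/// Illustrative mirror of the sub-protocol table in this report.
/// Hypothetical type; NOT the real `SupportProtocols` enum from CKB.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubProtocol {
    Ping,
    Discovery,
    Identify,
    Feeler,
    DisconnectMsg,
    Sync,
    Relay,
    Time,
    Alert,
    LightClient,
    BlockFilter,
    HolePunching,
}

const KB: usize = 1024;
const MB: usize = 1024 * 1024;

impl SubProtocol {
    pub const ALL: [SubProtocol; 12] = [
        SubProtocol::Ping,
        SubProtocol::Discovery,
        SubProtocol::Identify,
        SubProtocol::Feeler,
        SubProtocol::DisconnectMsg,
        SubProtocol::Sync,
        SubProtocol::Relay,
        SubProtocol::Time,
        SubProtocol::Alert,
        SubProtocol::LightClient,
        SubProtocol::BlockFilter,
        SubProtocol::HolePunching,
    ];

    /// Protocol ID as listed in the report.
    pub fn id(self) -> u32 {
        match self {
            SubProtocol::Ping => 0,
            SubProtocol::Discovery => 1,
            SubProtocol::Identify => 2,
            SubProtocol::Feeler => 3,
            SubProtocol::DisconnectMsg => 4,
            SubProtocol::Sync => 100,
            SubProtocol::Relay => 101,
            SubProtocol::Time => 102,
            SubProtocol::Alert => 110,
            SubProtocol::LightClient => 120,
            SubProtocol::BlockFilter => 121,
            SubProtocol::HolePunching => 130,
        }
    }

    /// Maximum frame size in bytes as listed in the report.
    pub fn max_frame_bytes(self) -> usize {
        match self {
            SubProtocol::Ping
            | SubProtocol::Feeler
            | SubProtocol::DisconnectMsg
            | SubProtocol::Time => KB,
            SubProtocol::Identify => 2 * KB,
            SubProtocol::Alert => 128 * KB,
            SubProtocol::Discovery | SubProtocol::HolePunching => 512 * KB,
            SubProtocol::Sync | SubProtocol::LightClient | SubProtocol::BlockFilter => 2 * MB,
            SubProtocol::Relay => 4 * MB,
        }
    }

    /// Reverse lookup from a numeric protocol ID.
    pub fn from_id(id: u32) -> Option<SubProtocol> {
        Self::ALL.iter().copied().find(|p| p.id() == id)
    }
}
```

A probe that only sees a numeric protocol ID on the wire could use `SubProtocol::from_id(100)` to label the event as Sync traffic.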

Peer Management: PeerRegistry (peer_registry.rs:22) manages in-memory active sessions and enforces upper limits on inbound/outbound connections. It supports Block-Relay-Only connections (at most 2, relaying blocks only—not transactions) and employs an eviction strategy based on RTT, message timing, connection age, and network group. PeerStore (peer_store/peer_store_impl.rs:20) handles persistent address storage and internally comprises three sub-structures: AddrManager (address pool), BanList (banning by IP subnet with expiration support), and Anchors (Block-Relay-Only anchor nodes). Network grouping rules (network_group.rs) partition addresses by IPv4 /16 and IPv6 /64 to ensure topological diversity during eviction.
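To make the grouping rule concrete, here is a minimal sketch that derives a group key from an address using the /16 and /64 prefix lengths stated above. The `NetGroup` type and `net_group` function are hypothetical names for illustration; the real logic in network_group.rs may handle more cases.

```rust
use std::net::IpAddr;

/// Hypothetical group key: the address prefix that peers are bucketed by.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NetGroup {
    /// First 16 bits of an IPv4 address.
    V4([u8; 2]),
    /// First 64 bits of an IPv6 address.
    V6([u8; 8]),
}

/// Map an address to its group, using the prefix lengths from the report.
pub fn net_group(ip: IpAddr) -> NetGroup {
    match ip {
        IpAddr::V4(v4) => {
            let o = v4.octets();
            NetGroup::V4([o[0], o[1]])
        }
        IpAddr::V6(v6) => {
            let o = v6.octets();
            let mut prefix = [0u8; 8];
            prefix.copy_from_slice(&o[..8]);
            NetGroup::V6(prefix)
        }
    }
}
```

Two peers at 10.1.2.3 and 10.1.9.9 fall into the same group under this rule, so an eviction pass that protects one peer per group would treat them as a single bucket.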

Message Flow: On the send path: Protocol → context.send_message() → ServiceControl → Snappy compression (triggered only when the message exceeds 1 KB) → TCP. On the receive path: TCP → decompression → CKBHandler.received() → CKBProtocolHandler.received(). Broadcasting uses filter_broadcast() to support conditional multi-peer delivery.

1.3 Storage Layer (db/ + db-schema/ + store/ + freezer/)

Database Engine: CKB’s underlying storage engine uses the OptimisticTransactionDB variant of RocksDB (db/src/db.rs:27), providing optimistic transaction semantics—snapshot isolation for reads and writes with conflict detection at commit time. The engine is configured with a HyperClockCache (256 MB) and Ribbon Filters to accelerate lookups. Data compression employs a Snappy + LZ4 strategy.

Column Family Definitions (19 total): Defined at db-schema/src/lib.rs:1-59. CKB organizes on-chain data across 19 Column Families: Col 0 INDEX (bidirectional block-number ⇔ hash index), Col 1 BLOCK_HEADER (block_hash → HeaderView), Col 2 BLOCK_BODY ((block_hash, tx_index) → TransactionView), Col 3 BLOCK_UNCLE (uncle blocks), Col 4 META (metadata including TIP_HEADER, CURRENT_EPOCH, and db-version), Col 5 TRANSACTION_INFO (tx_hash → (block_hash, block_number, index)), Col 6 BLOCK_EXT (block extension info including total_difficulty), Col 7 BLOCK_PROPOSAL_IDS (proposal short IDs), Col 8 BLOCK_EPOCH (block-to-epoch mapping), Col 9 EPOCH (epoch data), Col 10 CELL (live Cells / UTXOs, OutPoint → CellEntry), Col 11 UNCLES (uncle index), Col 12 CELL_DATA (Cell data content), Col 13 NUMBER_HASH (block-number–hash pairs), Col 14 CELL_DATA_HASH (Cell data hashes), Col 15 BLOCK_EXTENSION (block extension fields), Col 16 CHAIN_ROOT_MMR (MMR tree, position → HeaderDigest), Col 17 BLOCK_FILTER (block filter data), and Col 18 BLOCK_FILTER_HASH (filter hashes).

High-Level Store API: ChainDB (store/src/db.rs:25) is the primary entry point for the storage layer. It internally holds three components: db: RocksDB (the database instance), freezer: Option<Freezer> (the cold-storage engine), and cache: Arc<StoreCache> (an application-level LRU cache covering multiple hot data types such as headers, cell_data, and uncles). The core read interface is defined by the ChainStore trait (store/src/store.rs:27), which contains 50+ methods including get_block(), get_block_header(), get_block_body(), get_transaction(), get_transaction_info(), get_cell(), get_cell_data(), have_cell(), get_tip_header(), get_current_epoch_ext(), and more.

Write Path: All block writes are performed atomically via StoreTransaction (store/src/transaction.rs:31). The complete flow is: StoreTransaction::begin() → insert_block() writes block data [Cols 1, 2, 3, 7, 13, 15] → insert_block_ext() writes extension info [Col 6] → attach_block() establishes main-chain indexes [Cols 0, 5, 11] → attach_block_cell() creates new Cells and deletes spent Cells [Cols 10, 12, 14] → insert_tip_header() updates the chain tip [Col 4] → insert_epoch_ext() updates the epoch [Cols 8, 9] → commit() performs the atomic RocksDB commit. Notably, insert_block and attach_block are two distinct phases: the former writes raw block data, while the latter builds main-chain indexes and updates UTXO state.
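The step-to-column mapping in the write path can be captured as data, which is handy when deciding which column families a storage probe should expect to see per step. This is a sketch that only restates the mapping given in this report; `WRITE_STEPS`, `columns_touched`, and `steps_writing` are hypothetical names, not CKB APIs.

```rust
/// Write-path steps and the column-family numbers each one touches,
/// copied from the report. Illustrative table, not CKB source.
pub const WRITE_STEPS: [(&str, &[u8]); 6] = [
    ("insert_block", &[1, 2, 3, 7, 13, 15]),
    ("insert_block_ext", &[6]),
    ("attach_block", &[0, 5, 11]),
    ("attach_block_cell", &[10, 12, 14]),
    ("insert_tip_header", &[4]),
    ("insert_epoch_ext", &[8, 9]),
];

/// Sorted, de-duplicated list of every column written by one block insert.
pub fn columns_touched() -> Vec<u8> {
    let mut cols: Vec<u8> = WRITE_STEPS
        .iter()
        .flat_map(|(_, c)| c.iter().copied())
        .collect();
    cols.sort_unstable();
    cols.dedup();
    cols
}

/// Names of the steps that write to the given column.
pub fn steps_writing(col: u8) -> Vec<&'static str> {
    WRITE_STEPS
        .iter()
        .filter(|(_, c)| c.contains(&col))
        .map(|(name, _)| *name)
        .collect()
}
```

Under this mapping, a single block insert touches columns 0 through 15, while columns 16 to 18 (CHAIN_ROOT_MMR, BLOCK_FILTER, BLOCK_FILTER_HASH) are not part of the listed flow.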

Read Path (Three-Tier Fallback): get_block(hash) searches in order: ① StoreCache (LRU in-memory cache) → ② Freezer (cold storage, hit when block_number < freezer.number()) → ③ RocksDB (hot data).
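The three-tier fallback can be sketched as follows. This is a simplified model under stated assumptions: blocks are keyed by number rather than hash, block contents are plain strings, and `Store`, `Source`, and the `freezer` vector (whose length stands in for freezer.number()) are hypothetical stand-ins, not the real ChainDB types.

```rust
use std::collections::HashMap;

/// Which tier answered the lookup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Cache,
    Freezer,
    RocksDb,
}

/// Toy model of the storage tiers. Hypothetical; not CKB's ChainDB.
pub struct Store {
    /// Stand-in for the in-memory LRU cache.
    pub cache: HashMap<u64, String>,
    /// Stand-in for cold storage: index = block number,
    /// `len()` plays the role of `freezer.number()`.
    pub freezer: Vec<String>,
    /// Stand-in for hot data in RocksDB.
    pub db: HashMap<u64, String>,
}

impl Store {
    /// Look up a block: cache first, then freezer, then the database.
    pub fn get_block(&self, number: u64) -> Option<(String, Source)> {
        if let Some(block) = self.cache.get(&number) {
            return Some((block.clone(), Source::Cache));
        }
        let idx = number as usize;
        if idx < self.freezer.len() {
            return Some((self.freezer[idx].clone(), Source::Freezer));
        }
        self.db.get(&number).map(|block| (block.clone(), Source::RocksDb))
    }
}
```

Returning the `Source` alongside the block is a choice made for this sketch; it is the kind of signal a latency probe would want in order to separate cache hits from cold-storage reads.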

Freezer Cold Storage: freezer/src/freezer.rs:30 implements a standalone archival storage system using memory-mapped, append-only, Snappy-compressed archive files, with a maximum data file size of 2 GB and 12-byte index entries. A background thread periodically migrates old blocks from RocksDB to the Freezer, reducing RocksDB’s data volume and compaction pressure. At read time, get_block() first checks freezer.number() to determine whether the target block has already been archived.

1.4 Synchronization Layer (sync/ + chain/)

Synchronizer — Block Sync Protocol (Protocol ID 100): Defined at sync/src/synchronizer/mod.rs:357. Key fields include chain: ChainController (chain service controller), shared: Arc<SyncShared> (sync-layer shared state), and fetch_channel: Option<Sender<FetchCMD>>. SyncShared (sync/src/types/mod.rs:991) wraps the global Shared and adds sync-specific state such as header_map and InFlightBlocks.

Timer-Driven Model: The Synchronizer’s core runtime logic is driven by four timers: SEND_GET_HEADERS (1 s interval, triggers start_sync_headers() to request headers from peers), IBD_BLOCK_FETCH (40 ms interval, find_blocks_to_fetch(IBD::In) for rapid block fetching during IBD), NOT_IBD_BLOCK_FETCH (200 ms interval, find_blocks_to_fetch(IBD::Out) for regular block fetching outside IBD), and TIMEOUT_EVICTION (1 s interval, eviction() for checking and evicting timed-out peers). During IBD, the block-fetch frequency is 5× that of non-IBD (40 ms vs. 200 ms)—a key design choice for synchronization performance.
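The timer table can be expressed as a small enum to check the 5× claim by arithmetic. The intervals are the ones stated above; the `SyncTimer` type and `fires_per_second` helper are hypothetical names used only for this sketch.

```rust
use std::time::Duration;

/// The four Synchronizer timers named in the report (illustrative enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncTimer {
    SendGetHeaders,
    IbdBlockFetch,
    NotIbdBlockFetch,
    TimeoutEviction,
}

impl SyncTimer {
    /// Interval of each timer, as listed in the report.
    pub fn interval(self) -> Duration {
        match self {
            SyncTimer::SendGetHeaders => Duration::from_secs(1),
            SyncTimer::IbdBlockFetch => Duration::from_millis(40),
            SyncTimer::NotIbdBlockFetch => Duration::from_millis(200),
            SyncTimer::TimeoutEviction => Duration::from_secs(1),
        }
    }
}

/// How many times a timer fires per second (integer division).
pub fn fires_per_second(timer: SyncTimer) -> u128 {
    1000 / timer.interval().as_millis()
}
```

With these intervals the IBD fetch timer fires 25 times per second against 5 times per second outside IBD, which is where the 5× ratio comes from.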

Message Dispatch: try_process (line 381) dispatches messages to different processors based on message type: GetHeaders → GetHeadersProcess, SendHeaders → HeadersProcess (validates continuity, updates shared_best_header), GetBlocks → GetBlocksProcess, SendBlock → BlockProcess (marks BLOCK_RECEIVED, forwards to the chain service), and InIBD → InIBDProcess (peers inform each other of their IBD status).

Headers-First Sync Strategy: ① Send a GetHeaders request to a peer → ② HeadersProcess receives Headers, validates continuity + PoW, stores them in header_map and marks them HEADER_VALID → ③ BlockFetcher (sync/src/synchronizer/block_fetcher.rs:18) decides which blocks to request from which peer based on header_map → ④ Send GetBlocks; the peer responds with SendBlock.

IBD State Determination: Located at shared/src/shared.rs:382. The condition is: when (current_time − tip_header.timestamp) > 24 hours, the node is considered to be in IBD. Once the node exits IBD, it never re-enters (a unidirectional design).
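The one-way behaviour can be modelled as a latch. The sketch below assumes millisecond timestamps and uses hypothetical names (`IbdState`, `MAX_TIP_AGE_MS`); it illustrates the rule described above and is not the code at shared/src/shared.rs:382.

```rust
/// 24 hours in milliseconds (assumed unit for this sketch).
pub const MAX_TIP_AGE_MS: u64 = 24 * 60 * 60 * 1000;

/// Hypothetical latch tracking whether IBD has finished.
pub struct IbdState {
    finished: bool,
}

impl IbdState {
    pub fn new() -> Self {
        IbdState { finished: false }
    }

    /// Returns true while the node is considered to be in IBD.
    /// Once the tip is recent enough, the latch flips and never flips back.
    pub fn is_initial_block_download(&mut self, now_ms: u64, tip_timestamp_ms: u64) -> bool {
        if self.finished {
            return false;
        }
        if now_ms.saturating_sub(tip_timestamp_ms) > MAX_TIP_AGE_MS {
            true
        } else {
            self.finished = true;
            false
        }
    }
}
```

The practical consequence for observability is that an "IBD exited" event should be expected at most once per process lifetime, even if the tip later falls more than 24 hours behind.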

Relayer — Relay Protocol (Protocol ID 101): Defined at sync/src/relayer/mod.rs:78. Responsible for rapid CompactBlock propagation, this is the primary mechanism for receiving new blocks during non-IBD steady-state operation. The processing flow is: receive CompactBlock → non-contextual verification (format, PoW) → contextual verification (parent block exists, timestamp, difficulty) → reconstruct the full block using transactions already present in the tx_pool → if any transactions are missing, send a GetBlockTransactions request to fill the gaps → once reconstruction succeeds, forward to the chain service. The concrete implementation is at sync/src/relayer/compact_block_process.rs:56.

Chain Service — Block Verification and Insertion (Multi-Threaded Architecture): ChainController (chain/src/chain_controller.rs:16) sends blocks via channel to the backend ChainService thread (chain/src/chain_service.rs:17). The ChainService thread performs non-contextual verification (PoW + transaction syntax + signatures) and calls insert_block() to write to the DB. Then OrphanBroker::process_lonely_block() routes the block: if the parent block is already stored, it forwards the block to the ConsumeUnverifiedBlocks thread; if the parent is absent, it enters the OrphanBlockPool (chain/src/utils/orphan_block_pool.rs:129) to wait. The OrphanBlockPool internally uses HashMap<ParentHash, HashMap<BlockHash, LonelyBlock>> to store orphan blocks; once a parent is verified, search_orphan_leader() recursively processes its descendant blocks.
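The nested-map layout of the orphan pool can be sketched as below. This is a simplified model: hashes are `u64` stand-ins for 32-byte hashes, `LonelyBlock` carries only its own hash and its parent's, and the descendant walk uses an explicit queue instead of recursion. `OrphanPool` and `remove_descendants` are hypothetical names, not the methods of CKB's OrphanBlockPool.

```rust
use std::collections::{HashMap, VecDeque};

/// Stand-in for a 32-byte block hash.
pub type Hash = u64;

/// Minimal orphan record for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LonelyBlock {
    pub hash: Hash,
    pub parent: Hash,
}

/// Orphans indexed by parent hash, then by their own hash.
#[derive(Default)]
pub struct OrphanPool {
    by_parent: HashMap<Hash, HashMap<Hash, LonelyBlock>>,
}

impl OrphanPool {
    /// Park a block whose parent has not been stored yet.
    pub fn insert(&mut self, block: LonelyBlock) {
        self.by_parent
            .entry(block.parent)
            .or_default()
            .insert(block.hash, block);
    }

    /// Remove and return every block that descends from `parent`,
    /// walking children, grandchildren, and so on.
    pub fn remove_descendants(&mut self, parent: Hash) -> Vec<LonelyBlock> {
        let mut out = Vec::new();
        let mut queue = VecDeque::from([parent]);
        while let Some(p) = queue.pop_front() {
            if let Some(children) = self.by_parent.remove(&p) {
                for (hash, block) in children {
                    queue.push_back(hash);
                    out.push(block);
                }
            }
        }
        out
    }

    /// Number of orphans currently waiting.
    pub fn len(&self) -> usize {
        self.by_parent.values().map(|m| m.len()).sum()
    }
}
```

Indexing by parent hash means the arrival of a missing parent releases its whole subtree with a handful of map lookups, rather than a scan over all orphans.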

ConsumeUnverifiedBlocks: Located at chain/src/verify.rs:31, running as an independent thread. It performs full contextual verification (UTXO checks, script execution, proposal window, uncle validation), computes total_difficulty, and determines whether a new best chain has emerged. If a fork switch is needed, it executes detach of old blocks + attach of new blocks, then atomically commits via StoreTransaction::commit() to RocksDB, updates SnapshotMgr to produce a new chain-tip snapshot, and fires a verify_callback to notify the sync layer of the verification result, completing the feedback loop.

1.5 Complete Call Chain: Block from Peer to Disk

[Peer]
  │ TCP / Tentacle
  ▼
NetworkService (receive and decompress message)
  │
  ├─ Protocol ID 100 ──→ Synchronizer.received()
  │                        └─ BlockProcess::execute()
  │
  └─ Protocol ID 101 ──→ Relayer.received()
                           └─ CompactBlockProcess::execute()
                                └─ Reconstruct full block via tx_pool
  │
  ▼
ChainController::asynchronous_process_remote_block()
  │  Send via channel
  ▼
ChainService thread
  │  ① Non-contextual verification (PoW, tx syntax, signatures)
  │  ② insert_block() → RocksDB [Cols 1,2,3,7,13,15]
  ▼
OrphanBroker::process_lonely_block()
  │  Parent stored? ──No──→ OrphanBlockPool (wait for parent)
  │       │
  │      Yes
  ▼
ConsumeUnverifiedBlocks thread
  │  ① Full contextual verification (UTXO, Script, Epoch, Uncle)
  │  ② Compute total_difficulty
  │  ③ New best chain? → Fork switch (detach_block + attach_block)
  │  ④ StoreTransaction::commit() → atomic RocksDB write
  │     [Cols 0,4,5,6,8,9,10,11,12,14]
  │  ⑤ Update SnapshotMgr → new chain-tip snapshot
  │  ⑥ verify_callback → notify sync layer of result
  ▼
[Block persisted]
  │
  ▼ (background)
Freezer::freeze() — periodically migrate old blocks to cold storage

1.6 Key File Index

Startup entry       ckb-bin/src/subcommand/run.rs:17
Service assembly    util/launcher/src/lib.rs:191 (build_shared), :403 (start_network)
Shared hub          shared/src/shared.rs:58 (struct), :382 (IBD determination)
Network service     network/src/network.rs:73 (State), :827 (Service), :1334 (Controller)
Protocol trait      network/src/protocols/mod.rs:47 (Context), :161 (Handler)
Protocol IDs        network/src/protocols/support_protocols.rs:20-57
Peer management     network/src/peer_registry.rs:22
DB engine           db/src/db.rs:27 (OptimisticTransactionDB)
Column families     db-schema/src/lib.rs:1-59
Store API           store/src/store.rs:27 (ChainStore trait)
Store writes        store/src/transaction.rs:131-415
ChainDB             store/src/db.rs:25
Freezer             freezer/src/freezer.rs:30-177
Synchronizer        sync/src/synchronizer/mod.rs:357 (struct), :869 (Handler impl)
BlockFetcher        sync/src/synchronizer/block_fetcher.rs:18
Relayer             sync/src/relayer/mod.rs:78
CompactBlock        sync/src/relayer/compact_block_process.rs:56
SyncShared          sync/src/types/mod.rs:991
ChainController     chain/src/chain_controller.rs:16
ChainService        chain/src/chain_service.rs:17
Contextual verify   chain/src/verify.rs:31
Orphan pool         chain/src/utils/orphan_block_pool.rs:129
DB migration        db-migration/src/lib.rs:34-394

2. Development Environment Setup

The local development machine runs Ubuntu 24.04 LTS (kernel version 5.15+, meeting eBPF feature requirements). The Rust toolchain was installed via rustup: the stable channel is used for userspace development, and the nightly channel is used for eBPF kernel-side compilation (as specified in rust-toolchain.toml). The rust-src component and bpf-linker were also installed to support cross-compilation for the BPF target. All system libraries required for compiling the CKB source (clang, llvm, pkg-config, libssl-dev, librocksdb-dev, etc.) have been installed and verified. The IDE is VS Code with the rust-analyzer extension, configured with automatic clippy checks and cargo fmt formatting. Additionally, auxiliary debugging tools such as bpftool and bpftrace were installed for eBPF program inspection and tracing.


3. CKB Testnet Node Deployment

A CKB testnet node was deployed using the official release binary downloaded from GitHub. The specific steps were: ckb init --chain testnet was used to generate testnet configuration files (ckb.toml and ckb-miner.toml); after confirming the bootnode configuration in ckb.toml, ckb run was executed to start the node. Once the node was running, connectivity was verified via the RPC interface (default http://127.0.0.1:8114): curl was used to call get_tip_block_number to confirm that block height was continuously increasing, and get_peers was called to confirm successful connections to multiple testnet peers. During the initial sync phase, the IBD speed was approximately one block per second, and the node has since synced to the latest height.
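CKB's JSON-RPC encodes integers such as the tip block number as 0x-prefixed hexadecimal strings, so checking that the height keeps increasing needs a small parsing step. The sketch below shows one way to do that on already-fetched result strings; `parse_hex_u64` and `is_advancing` are hypothetical helpers, and the HTTP call itself is left out.

```rust
/// Parse a 0x-prefixed hex string (as returned by the RPC) into a u64.
/// Returns None if the prefix is missing or the digits are invalid.
pub fn parse_hex_u64(s: &str) -> Option<u64> {
    let digits = s.strip_prefix("0x")?;
    u64::from_str_radix(digits, 16).ok()
}

/// True if successive samples never decrease and the last one is
/// strictly higher than the first. Any unparsable sample yields false.
pub fn is_advancing(samples: &[&str]) -> bool {
    let parsed: Option<Vec<u64>> = samples.iter().map(|s| parse_hex_u64(s)).collect();
    match parsed {
        Some(v) if v.len() >= 2 => {
            v.windows(2).all(|w| w[1] >= w[0]) && v.last() > v.first()
        }
        _ => false,
    }
}
```

Feeding it the `result` fields from a few get_tip_block_number calls taken some seconds apart gives a yes/no answer on whether the node is still syncing forward.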


III. Next Week’s Plan (Week 2)

The core objective for next week is to conduct a comprehensive symbol reconnaissance and analysis of the CKB binary, and to engineer the results into the first subcommand of the ckb-probe tool. This breaks down into the following four areas:

Comprehensive CKB Binary Symbol Reconnaissance: Obtain the precompiled binary published on CKB’s official GitHub Release page (Linux x86_64) as well as locally compiled debug and release builds from source. Use nm, readelf --syms, objdump -T, and similar tools to extract symbol tables from all three binaries. Perform a comparative analysis of whether the official release has been stripped, differences in symbol visibility (exported / local / undefined), and coverage differences between the two build methods. The output will be a symbol-difference matrix comparing the official release against self-compiled builds.

In-Depth Analysis of RocksDB Linkage: Focus on analyzing CKB’s RocksDB linking strategy. Use ldd to check dynamic link dependencies and nm -D to check whether rocksdb_* C API symbols are exposed in the dynamic symbol table. Cross-reference with the ckb-db crate’s Cargo.toml and build.rs to determine whether RocksDB is statically linked (bundled) or dynamically linked to the system library. Further analyze the symbol names and mangling of key RocksDB functions (e.g., rocksdb_open, rocksdb_put, rocksdb_get, rocksdb_write) in the final binary, providing a basis for selecting uprobe attachment points.

Tiered Report Generation: Organize the reconnaissance results into a structured, tiered report, classifying symbols into three levels: “directly uprobe-attachable symbols,” “symbols requiring DWARF-assisted location,” and “symbols that have been inlined or stripped and are unavailable.” The report will cover key functions across the P2P layer, storage layer (including RocksDB), synchronization layer, and transaction pool, serving as a selection reference manual for subsequent eBPF probe development.
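One possible shape for the three-level classification is sketched below. The decision rule is an assumption made for illustration (a symbol present in the symbol table is directly attachable; otherwise DWARF info is the fallback; otherwise it is unavailable), and the `Tier`, `SymbolFacts`, and `classify` names are hypothetical. The actual criteria are for the Week 2 report to define.

```rust
/// The three levels named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    /// Can be attached with a uprobe directly by symbol name.
    DirectUprobe,
    /// Needs DWARF debug info to locate.
    NeedsDwarf,
    /// Inlined or stripped; no usable attachment point.
    Unavailable,
}

/// Facts gathered about one candidate function (assumed inputs).
pub struct SymbolFacts {
    /// The function has an entry in the ELF symbol table.
    pub in_symtab: bool,
    /// Debug info describing the function is available.
    pub has_dwarf: bool,
}

/// Assumed rule: symbol table first, DWARF second, otherwise unavailable.
pub fn classify(facts: &SymbolFacts) -> Tier {
    if facts.in_symtab {
        Tier::DirectUprobe
    } else if facts.has_dwarf {
        Tier::NeedsDwarf
    } else {
        Tier::Unavailable
    }
}
```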

Implement the ckb-probe symbols Subcommand: Based on the above reconnaissance methodology, implement a symbols subcommand in the ckb-probe project. This command will accept the CKB binary path as input and automate symbol extraction, classification, and report generation. On the technical implementation side, the plan is to use the goblin or object crate for ELF file parsing, clap for the CLI framework, and support two output formats: terminal tables (with color-coded tier annotations) and JSON for downstream pipeline integration.


Self-Assessment: All tasks were completed on schedule. A precise, source-line-level understanding has been established of the complete code path from CKB node startup and assembly through P2P communication, storage persistence, and block synchronization and insertion. The development environment and testnet node are fully operational, providing a solid foundation for next week’s binary symbol analysis and ckb-probe tool development.

6 Likes

Wow, this is a great resource you are building here!

2 Likes

Hi @clair,

Great to see you posted this week’s progress update on time!

This week's CKB Probe update was detailed and clear. It reflects your professionalism and efficient execution, and it sets a strong example for the pace of the Spark Program as a whole.

Keep up this excellent momentum — we believe the CKB Probe project will proceed steadily as planned and bring more valuable exploration to the CKB ecosystem. If there’s anything you need the Committee’s support or feedback on, please raise it in the weekly sync.

Thank you for your effort and persistence. Looking forward to seeing more great updates from you next week!

Best,
Xingtian
On behalf of the Spark Program Committee

3 Likes

Week 2 Report: Comprehensive CKB Binary Symbol Reconnaissance

Period: 2026-03-23 ~ 2026-03-29
Author: clair
Project: ckb-probe — Deep observability tool for CKB full nodes, powered by eBPF
Repository: clairjoestar/ckb-probe (GitHub)


1. Goals for This Week

  1. Scan both the official Release and self-compiled builds of the CKB ELF binary, extracting full symbol information
  2. Analyze RocksDB linkage method (static vs. dynamic) and confirm uprobe feasibility
  3. Establish a three-tier symbol classification system (Tier 1 / 2 / 3) and generate a tiered report
  4. Implement the ckb-probe symbols subcommand, providing colored terminal output and JSON export
  5. Eliminate a key risk: the possibility that a stripped CKB binary lacks RocksDB symbols

2. Deliverables

2.1 ckb-probe symbols Subcommand Implementation

The core deliverable this week is the ckb-probe symbols subcommand, which performs automated symbol reconnaissance on any CKB ELF binary.

Architecture Design

main.rs                     CLI entry point, clap parsing and dispatch
  └── cli.rs                CLI definitions (SymbolsArgs)
  └── commands/
        └── symbols.rs      Analysis engine (876 lines)
ckb-probe-common/
  └── lib.rs                Shared types + ProbeTargets registry (333 lines)

Design Decisions:

  • Uses the goblin crate for ELF parsing — pure Rust implementation with no system dependencies (binutils / readelf)
  • Uses rustc-demangle for Rust symbol demangling, supporting {:#} format output for hash-free paths
  • Analysis engine decoupled from I/O: build_report() produces a SymbolReport struct; emit_terminal() / emit_json() each handle rendering independently
  • Shared types extracted into the ckb-probe-common crate, reserving a shared interface for the Week 3 eBPF programs

Command-Line Interface

ckb-probe symbols <BINARY> [OPTIONS]

Options:
  --json              Output in JSON format
  -v, --verbose       Show detailed info (mangled names, addresses, sizes)
  --filter <PATTERN>  Filter by substring (case-insensitive)
  --tier <N>          Show only Tier N (1, 2, or 3)

Example Usage:

# Basic scan
ckb-probe symbols ./ckb

# JSON export + jq query for Tier 1 count
ckb-probe symbols ./ckb --json | jq '.tier1 | length'

# Filter transaction-related symbols
ckb-probe symbols ./ckb --tier 1 --filter transaction

# Verbose mode
ckb-probe symbols ./ckb -v
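
The case-insensitive behaviour of `--filter` can be illustrated with a minimal sketch. The function name `filter_ci` is borrowed from the unit-test name `filter_ci_works`; its signature here is an assumption, not the actual ckb-probe code.

```rust
/// Keep only the names that contain `pattern`, ignoring ASCII/Unicode case.
/// Hypothetical signature; the real helper in symbols.rs may differ.
pub fn filter_ci<'a>(names: &[&'a str], pattern: &str) -> Vec<&'a str> {
    // Lowercase the pattern once, then compare against lowercased names.
    let needle = pattern.to_lowercase();
    names
        .iter()
        .copied()
        .filter(|name| name.to_lowercase().contains(&needle))
        .collect()
}
```

With this sketch, filtering the Tier 1 names by `TRANSACTION` or by `transaction` selects the same `rocksdb_transaction_*` entries.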

Core Analysis Pipeline

The analysis pipeline in build_report() within commands/symbols.rs:

| Step | Operation | Output |
| --- | --- | --- |
| 1 | Read ELF metadata (class, architecture) | ElfOverview |
| 2 | Count .symtab / .dynsym symbols; detect DWARF and strip status | strip_status |
| 3 | Enumerate dynamic dependencies (DT_NEEDED) | dynamic_deps |
| 4 | RocksDB linkage detection: check rocksdb_* symbols in .dynsym + .symtab | RocksdbLinkage |
| 5 | Build demangled lookup table: HashMap<String, Vec<ResolvedSym>> | lookup |
| 6 | Tier 1 exact match: look up C API symbol names directly | tier1: Vec<SymbolInfo> |
| 7 | Tier 2 substring match: is_direct_match() with smart compiler-noise filtering | tier2: Vec<SymbolInfo> |
| 8 | Tier 3 missing tracking: merge unmatched Tier 2 + expected-missing list | tier3_missing |
| 9 | Summarize + recommendations | ReportSummary |
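
Step 5 can be sketched with the standard library alone. The report states that the real tool relies on rustc-demangle's `{:#}` output to drop hash suffixes; the sketch below instead assumes the names are already demangled in the legacy form (ending in `::h` plus 16 hex digits) and strips that suffix by hand. The fields of `ResolvedSym` and the function names are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Assumed shape of one resolved symbol; the real ResolvedSym may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResolvedSym {
    pub raw_name: String,
    pub address: u64,
    pub size: u64,
}

/// Drop a trailing legacy hash suffix (`::h` followed by 16 hex digits).
/// Names without such a suffix (e.g. C symbols) are returned unchanged.
pub fn strip_hash_suffix(demangled: &str) -> &str {
    if let Some(idx) = demangled.rfind("::h") {
        let hash = &demangled[idx + 3..];
        if hash.len() == 16 && hash.bytes().all(|b| b.is_ascii_hexdigit()) {
            return &demangled[..idx];
        }
    }
    demangled
}

/// Group symbols by hash-free path. Each input tuple is
/// (raw name, demangled name, address, size).
pub fn build_lookup(symbols: &[(&str, &str, u64, u64)]) -> HashMap<String, Vec<ResolvedSym>> {
    let mut lookup: HashMap<String, Vec<ResolvedSym>> = HashMap::new();
    for &(raw, demangled, address, size) in symbols {
        lookup
            .entry(strip_hash_suffix(demangled).to_string())
            .or_default()
            .push(ResolvedSym { raw_name: raw.to_string(), address, size });
    }
    lookup
}
```

Keeping a `Vec` per key matters because several monomorphized copies of one generic function demangle to the same hash-free path.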

Noise Filtering for Tier 2 Symbol Matching

Rust binaries contain a large number of compiler-generated symbols (drop glue, GenFuture, Box wrappers) that embed target paths as generic parameters, e.g.:

core::ptr::drop_in_place<tokio::..::Cell<NetworkService::start<Handle>::{{closure}}>>

The is_direct_match() function filters noise using the following strategies:

  1. Prefix blocklist: Rejects known noise prefixes such as core::ptr::drop_in_place<, GenFuture<, and Box<
  2. Angle-bracket depth tracking: Traverses the string maintaining <> nesting depth; only accepts matches at depth 0 (top-level scope)
  3. Closure suffix compatibility: Foo::bar::{{closure}} is still treated as a valid match for Foo::bar
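
The three strategies above can be sketched as follows. This is a hypothetical re-implementation written from the description, not the code in symbols.rs; in particular, the identifier-boundary check and the handling of `->` are assumptions added to keep the sketch well behaved.

```rust
/// Returns true when `target` (e.g. "NetworkService::start") occurs in
/// `demangled` outside every `<...>` generic-argument list.
pub fn is_direct_match(demangled: &str, target: &str) -> bool {
    // Strategy 1: reject well-known compiler-generated wrappers up front.
    const NOISE_PREFIXES: [&str; 3] = ["core::ptr::drop_in_place<", "GenFuture<", "Box<"];
    if target.is_empty() || NOISE_PREFIXES.iter().any(|p| demangled.starts_with(p)) {
        return false;
    }

    let hay = demangled.as_bytes();
    let needle = target.as_bytes();
    let is_ident = |b: u8| b.is_ascii_alphanumeric() || b == b'_';

    // Strategy 2: walk the string while tracking `<`/`>` nesting depth.
    let mut depth: i32 = 0;
    for i in 0..hay.len() {
        match hay[i] {
            b'<' => depth += 1,
            // The `>` of a `->` return arrow does not close a generic list.
            b'>' if i > 0 && hay[i - 1] == b'-' => {}
            b'>' => depth -= 1,
            _ => {}
        }
        if depth == 0 && hay[i..].starts_with(needle) {
            // Assumed extra rule: the match must sit on identifier boundaries,
            // so `Foo::bar` does not match `Foo::bar_inner`.
            let before_ok = i == 0 || !is_ident(hay[i - 1]);
            let end = i + needle.len();
            // Strategy 3: a trailing `::{{closure}}` starts with ':', which is
            // not an identifier character, so closures are accepted.
            let after_ok = end == hay.len() || !is_ident(hay[end]);
            if before_ok && after_ok {
                return true;
            }
        }
    }
    false
}
```

Under these assumptions, `ckb_network::network::NetworkService::start::{{closure}}` matches `NetworkService::start`, while the same path nested inside `poll<...>` or `drop_in_place<...>` does not.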

2.2 Official Release vs. Self-Compiled Build Scan

A comprehensive scan was performed on two builds of CKB v0.205.0.

Binary Overview

| Dimension | Official Release | Self-Compiled |
| --- | --- | --- |
| File size | 51.6 MB | 903.2 MB |
| Architecture | ELF 64-bit x86_64 | ELF 64-bit x86_64 |
| .symtab symbol count | 78,847 | 152,937 |
| .dynsym symbol count | 522 | 512 |
| DWARF debug info | Absent | Present |
| Strip status | debuginfo-stripped (.symtab retained) | Not stripped (full .symtab + DWARF) |
| Total function symbols | 53,522 | 87,004 (+62.6%) |
| Function symbol breakdown | GLOBAL 9,819 / LOCAL 39,407 / WEAK 4,296 | GLOBAL 18,160 / LOCAL 64,558 / WEAK 4,286 |
| Object symbols | 9,095 | 22,957 |

Key Finding: The self-compiled build has 1.63× as many function symbols as the official Release (87,004 vs. 53,522). Most of the increase comes from LOCAL-binding crate-internal functions (64,558 vs. 39,407), which LTO eliminates in the official build.


2.3 RocksDB Linkage Analysis

Determination: Both Builds Use Static Linking

| Property | Official Release | Self-Compiled |
| --- | --- | --- |
| Linkage method | Static | Static |
| rocksdb_* C API symbol count | 151 | 155 |
| librocksdb.so in dynamic deps | Absent | Absent |
| rocksdb symbols in .dynsym | Absent | Absent |

Detection Logic (implemented in build_report())

if DT_NEEDED contains "rocksdb" OR .dynsym contains rocksdb_*  →  Dynamic
else if .symtab contains rocksdb_* function symbols > 0         →  Static
else                                                             →  Unknown

CKB compiles RocksDB from source via the librocksdb-sys crate and links it statically. The determination is based on:

  1. The dynamic dependency list (DT_NEEDED) does not contain librocksdb.so — ruling out dynamic linking
  2. .dynsym contains no rocksdb_*-prefixed symbols — not invoked via PLT
  3. .symtab contains 151–155 rocksdb_*-prefixed function symbols — confirming static embedding
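
The detection logic above translates almost line for line into Rust. The sketch below takes plain string slices instead of parsed ELF structures, so the function name and parameters are assumptions; only the three-way decision mirrors the report.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RocksdbLinkage {
    Static,
    Dynamic,
    Unknown,
}

/// Classify linkage from three inputs:
/// - `dt_needed`: library names from DT_NEEDED entries
/// - `dynsym`: symbol names from .dynsym
/// - `symtab_funcs`: function symbol names from .symtab
pub fn classify_rocksdb_linkage(
    dt_needed: &[&str],
    dynsym: &[&str],
    symtab_funcs: &[&str],
) -> RocksdbLinkage {
    let needs_lib = dt_needed.iter().any(|lib| lib.contains("rocksdb"));
    let in_dynsym = dynsym.iter().any(|s| s.starts_with("rocksdb_"));
    if needs_lib || in_dynsym {
        // A shared-library dependency or dynamic symbol means dynamic linking.
        RocksdbLinkage::Dynamic
    } else if symtab_funcs.iter().any(|s| s.starts_with("rocksdb_")) {
        // C API functions present only in .symtab: embedded statically.
        RocksdbLinkage::Static
    } else {
        RocksdbLinkage::Unknown
    }
}
```

Note that a fully stripped, statically linked binary would fall into `Unknown` under this rule, which is why the strip-status check is reported separately.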

Dynamic Dependency Differences

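| Library | Official Release | Self-Compiled |
| --- | --- | --- |
| libstdc++.so.6 | Present | Present |
| libgcc_s.so.1 | Present | Present |
| libpthread.so.0 | Present | Absent |
| libm.so.6 | Present | Present |
| libdl.so.2 | Present | Absent |
| libc.so.6 | Present | Present |
| ld-linux-x86-64.so.2 | Present | Present |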

The official Release additionally depends on libpthread.so.0 and libdl.so.2. Both libraries were merged into libc.so.6 in glibc 2.34, so a binary linked against a newer glibc no longer lists them.

Assessment

Static linking is the optimal configuration for uprobe instrumentation. RocksDB C API symbols (declared extern "C", no name mangling) are embedded directly in the CKB binary, allowing uprobes to attach by function name without resolving dynamic library paths.
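
Attaching by name ultimately means translating the symbol's virtual address into a file offset inside the binary. Attach libraries typically perform this step internally; the sketch below only illustrates the arithmetic using the standard ELF section-header fields. The `Section` struct is a hypothetical stand-in for a parsed section header, and the `.text` layout used in the usage note is made up for illustration, not read from a CKB binary.

```rust
/// The subset of an ELF section header needed for the translation.
pub struct Section {
    pub sh_addr: u64,   // virtual address of the section start
    pub sh_offset: u64, // file offset of the section start
    pub sh_size: u64,   // section size in bytes
}

/// Convert a symbol's virtual address (st_value) into a file offset.
/// Returns None when the address lies outside the given section.
pub fn symbol_file_offset(st_value: u64, section: &Section) -> Option<u64> {
    let end = section.sh_addr.checked_add(section.sh_size)?;
    if st_value < section.sh_addr || st_value >= end {
        return None;
    }
    Some(st_value - section.sh_addr + section.sh_offset)
}
```

For example, with an assumed `.text` section at `sh_addr = 0x600000` and `sh_offset = 0x5ff000`, the address 0x02293500 reported for rocksdb_get_pinned_cf would map to file offset 0x02292500.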


2.4 Risk Elimination: CKB Binary Stripped of RocksDB Symbols

This risk was listed in the Week 1 risk register as Impact: High / Probability: Medium and has been confirmed eliminated through actual scanning this week.

Risk Description

If the CKB binary undergoes a full strip (strip --strip-all) at release time, the .symtab section would be completely removed, making all RocksDB C API symbol names invisible. Uprobes would be unable to attach by function name, and ckb-probe’s core observability capabilities would be entirely lost.

Week 2 Investigation Conclusion: Risk Eliminated

Through ckb-probe symbols scans of both the official Release and self-compiled builds, the following was confirmed:

| Check Item | Official Release | Self-Compiled | Conclusion |
| --- | --- | --- | --- |
| .symtab present | Present (78,847 symbols) | Present (152,937 symbols) | Neither is fully stripped |
| rocksdb_* C API symbols | 151 | 155 | Sufficient uprobe targets |
| Strip level | debuginfo-stripped (DWARF only) | Not stripped | Symbol table fully retained |
| Tier 1 coverage | 15/20 (75%) | 16/20 (80%) | All primary read/write paths covered |

Core Conclusion: The official Release uses a debuginfo-stripped strategy — only DWARF debug information is removed (to reduce file size), while the .symtab symbol table is fully retained. All RocksDB extern "C" function symbols are intact and available; the uprobe approach is unaffected.


2.5 Three-Tier Symbol Classification Report

Tier 1 — RocksDB C API (Directly Probeable)

Stable extern "C" symbols with no name mangling, unchanged across CKB versions — the preferred uprobe targets.

| Symbol | Description | Official Release | Self-Compiled |
| --- | --- | --- | --- |
| rocksdb_get | Generic point read | 0x02291720 (358 B) | 0x021d0340 (348 B) |
| rocksdb_get_cf | Point read with CF | Missing | 0x021d04e0 (261 B) |
| rocksdb_get_pinned | Pinned point read (zero-copy) | 0x02293280 (575 B) | 0x021d2080 (446 B) |
| rocksdb_get_pinned_cf | Pinned read with CF (CKB primary read path) | 0x02293500 (538 B) | 0x021d22a0 (451 B) |
| rocksdb_put | Generic single write | 0x022912f0 (403 B) | 0x021cffd0 (151 B) |
| rocksdb_put_cf | Single write with CF | 0x022914b0 (280 B) | 0x021d00a0 (280 B) |
| rocksdb_delete | Generic single delete | Missing | Missing |
| rocksdb_delete_cf | Single delete with CF | Missing | Missing |
| rocksdb_write | WriteBatch atomic commit | 0x022915f0 (249 B) | 0x021d01f0 (266 B) |
| rocksdb_multi_get_cf | Batch read with CF | Missing | Missing |
| rocksdb_transaction_put_cf | Transaction write (CKB primary write path) | 0x02292f90 (155 B) | 0x021d1be0 (159 B) |
| rocksdb_transaction_delete_cf | Transaction delete | 0x02293050 (138 B) | 0x021d1cc0 (146 B) |
| rocksdb_transaction_get_cf | Transaction read | Missing | Missing |
| rocksdb_transaction_commit | Transaction commit | 0x02292b70 (112 B) | 0x021d15b0 (121 B) |
| rocksdb_optimistictransaction_begin | Begin optimistic transaction | 0x022931a0 (108 B) | 0x021d1f60 (113 B) |
| rocksdb_create_iterator_cf | Create iterator with CF | 0x022919d0 (183 B) | 0x021d0710 (204 B) |
| rocksdb_iter_seek | Seek iterator to key | 0x02291f30 (71 B) | 0x021d0c70 (70 B) |
| rocksdb_iter_seek_to_first | Seek iterator to first entry | 0x02291f10 (13 B) | 0x021d0c50 (13 B) |
| rocksdb_iter_next | Advance iterator | 0x02291fd0 (13 B) | 0x021d0d10 (13 B) |
| rocksdb_iter_destroy | Destroy iterator | 0x02291ed0 (36 B) | 0x021d0c10 (48 B) |

Coverage: Official Release 15/20 (75%) | Self-Compiled 16/20 (80%)

Missing Symbol Analysis:

| Missing Symbol | Official | Self-Compiled | Reason |
| --- | --- | --- | --- |
| rocksdb_get_cf | Missing | Present | CKB primarily uses get_pinned_cf; eliminated by LTO in official build |
| rocksdb_delete | Missing | Missing | CKB deletes via Transaction path; never called directly |
| rocksdb_delete_cf | Missing | Missing | Same as above; CKB uses transaction_delete_cf |
| rocksdb_multi_get_cf | Missing | Missing | CKB does not use the batch read API |
| rocksdb_transaction_get_cf | Missing | Missing | CKB transaction reads go through other paths |

The four symbols missing from both builds are RocksDB functions that CKB code never calls, so the linker's dead code elimination removes them as expected.

Tier 2 — Rust Cross-Crate Functions (Build-Specific)

Mangled Rust functions whose hash suffixes differ with each compilation, requiring dynamic resolution via rustc-demangle.

| Symbol Path | Description | Category | Official | Self-Compiled |
| --- | --- | --- | --- | --- |
| NetworkService::start | P2P service startup | P2P Network | Present | Present |
| CKBHandler::received | Protocol message receive callback | P2P Network | Missing | Missing |
| ServiceControl::send_message_to | Send message to specific peer | P2P Network | Missing | Missing |
| ServiceControl::disconnect | Disconnect a peer | P2P Network | Missing | Missing |
| Synchronizer::received | Sync protocol message handler | Sync | Missing | Missing |
| Synchronizer::try_process | Sync message dispatch | Sync | Missing | Missing |
| Relayer::received | Relay protocol message handler | Sync | Missing | Missing |
| HeadersProcess::execute | Process received block headers | Sync | Present | Present |
| BlockProcess::execute | Process received blocks | Sync | Missing | Present |
| BlockFetcher::fetch | Decide which blocks to fetch | Sync | Present | Present |
| CompactBlockProcess::execute | Process compact block relay | Sync | Present | Missing |
| ChainService::process_block | Chain service block processing entry | Chain Service | Missing | Missing |
| ChainController::asynchronous_process_remote_block | Async remote block submission | Chain Service | Missing | Present |
| ConsumeUnverifiedBlocks::verify_block | Full-context block verification | Chain Service | Missing | Missing |
| ChainDB::get_block | High-level block retrieval | Storage | Missing | Missing |
| StoreTransaction::insert_block | Write block data to DB | Storage | Present | Present |
| StoreTransaction::attach_block | Build main-chain index | Storage | Present | Present |
| StoreTransaction::commit | Atomically commit storage transaction | Storage | Missing | Present |
| RocksDB::get_pinned | Low-level pinned read wrapper | Storage | Present | Present |
| Freezer::freeze | Migrate old blocks to cold storage | Storage | Missing | Present |
| Freezer::retrieve | Read blocks from cold storage | Storage | Present | Present |

Coverage: Official Release 8/21 (38%) | Self-Compiled 11/21 (52%)

The self-compiled build retains key additional symbols: BlockProcess::execute, ChainController::asynchronous_process_remote_block, Freezer::freeze, and StoreTransaction::commit. These are eliminated by cross-crate inlining in the official Release’s prod profile (lto = true, codegen-units = 1).

Tier 3 — Unavailable Symbols (Inlined / Eliminated)

Crate-internal functions typically inlined away in release builds, unsuitable as uprobe targets.

| Symbol Path | Description | Missing Reason | Official | Self-Compiled |
| --- | --- | --- | --- | --- |
| compress | Snappy compression | Release inlining | Missing | Missing |
| decompress | Snappy decompression | Release inlining | Missing | Missing |
| StoreCache::get_header | LRU cache header read | Release inlining | Missing | Missing |
| SyncShared::insert_new_block | Crate-internal helper | Release inlining | Missing | Missing |
| SyncShared::is_initial_block_download | IBD check flag | Release inlining | Missing | Missing |
| OrphanBlockPool::insert | Orphan block pool insert | Crate-internal | Missing | Present |
| OrphanBlockPool::search_orphan_leader | Orphan block pool search | Crate-internal | Missing | Missing |
| PeerRegistry::accept | Accept inbound peer | Possibly inlined | Missing | Present |
| PeerRegistry::try_outbound_peer | Attempt outbound connection | Possibly inlined | Missing | Missing |
| PeerRegistry::remove | Remove peer | Possibly inlined | Present | Present |
| ServiceControl::filter_broadcast | Broadcast filter helper | Possibly inlined | Missing | Missing |
| ChainDB::get_block_header | Block header retrieval | Possibly inlined | Missing | Missing |

Tier 3 missing count: Official Release: 24 | Self-Compiled: 19
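
These two totals can be cross-checked against the Tier 2 and Tier 3 tables, assuming the count is the number of unmatched Tier 2 targets plus the number of expected-missing targets that were not found (the merge described in pipeline step 8). The function name is illustrative and not taken from the ckb-probe source; the inputs read the Tier 3 table as one expected-missing symbol found in the official Release and three in the self-compiled build.

```rust
/// Tier 3 missing count = unmatched Tier 2 targets
///                      + expected-missing targets that were not found.
pub fn tier3_missing_count(
    tier2_total: u32,
    tier2_found: u32,
    expected_total: u32,
    expected_found: u32,
) -> u32 {
    tier2_total.saturating_sub(tier2_found) + expected_total.saturating_sub(expected_found)
}
```

Under that assumption the official Release gives (21 − 8) + (12 − 1) = 24 and the self-compiled build gives (21 − 11) + (12 − 3) = 19, matching the reported figures.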


3. Coverage Summary

| Tier | Official Release | Self-Compiled | Difference |
| --- | --- | --- | --- |
| Tier 1 (RocksDB C API) | 15/20 (75%) | 16/20 (80%) | +1 |
| Tier 2 (Rust functions) | 8/21 (38%) | 11/21 (52%) | +3 |
| Tier 3 (Missing) | 24 | 19 | −5 |
| Total function symbols | 53,522 | 87,004 | +62.6% |
| RocksDB C API symbols | 151 | 155 | +4 |
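
The percentages in this summary follow from simple ratios. A small sketch of the arithmetic, with helper names chosen here for illustration:

```rust
/// Share of registered targets that were found, in percent.
pub fn coverage_percent(found: u32, total: u32) -> f64 {
    if total == 0 {
        0.0
    } else {
        f64::from(found) * 100.0 / f64::from(total)
    }
}

/// Relative growth from `base` to `new`, in percent.
/// Assumes `base` is non-zero.
pub fn growth_percent(base: u32, new: u32) -> f64 {
    (f64::from(new) - f64::from(base)) * 100.0 / f64::from(base)
}
```

For Tier 2, 8/21 and 11/21 come out at roughly 38.1% and 52.4%, which the table rounds to 38% and 52%; the function-symbol growth (87,004 − 53,522) / 53,522 is about 62.6%.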

4. Code Implementation Details

4.1 Project Structure

ckb-probe/
├── Cargo.toml                          # Workspace root config
├── ckb-probe/                          # Main CLI binary
│   ├── Cargo.toml                      # Deps: goblin, rustc-demangle, clap, serde, colored
│   └── src/
│       ├── main.rs                     # Entry: CLI parsing + command dispatch (17 lines)
│       ├── cli.rs                      # clap CLI definitions (52 lines)
│       └── commands/
│           ├── mod.rs                  # Module declarations
│           └── symbols.rs             # Analysis engine (876 lines)
├── ckb-probe-common/                   # Shared types library
│   ├── Cargo.toml
│   └── src/
│       └── lib.rs                      # Data structures + ProbeTargets registry (333 lines)
└── ckb-probe-ebpf/                     # Week 3+ eBPF programs (stub)

4.2 Dependency Choices

| Dependency | Version | Purpose |
| --- | --- | --- |
| goblin | 0.9 | Pure Rust ELF parsing; no binutils required |
| rustc-demangle | 0.1 | Rust symbol demangling |
| clap | 4 | CLI argument parsing (derive macros) |
| serde + serde_json | 1 | Report JSON serialization |
| colored | 2 | Colored terminal output |
| anyhow | 1 | Error handling |

4.3 Key Data Structures (ckb-probe-common)

// Symbol tier
enum SymbolTier { Tier1, Tier2, Tier3 }

// Functional category
enum SymbolCategory { RocksdbCApi, P2pNetwork, Sync, ChainService, Storage, TxPool, Other }

// Symbol info
struct SymbolInfo {
    raw_name, demangled_name, address, size, binding,
    tier, category, is_probe_target, description
}

// RocksDB linkage method
enum RocksdbLinkage { Static, Dynamic, Unknown }

// ELF overview
struct ElfOverview { elf_class, has_symtab, symtab_count, has_dynsym, ... }

// Full report
struct SymbolReport { binary_path, file_size, elf, rocksdb_linkage, tier1, tier2, tier3_missing, summary }

// Probe target registry
struct ProbeTargets  // tier1(): 20 targets, tier2(): 21 targets, tier3_expected_missing(): 12 targets
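
The listing above omits field types. A compilable sketch of the core types is shown below; every field type and the `tier1` constructor are assumptions made for illustration, and the actual definitions in ckb-probe-common may differ (for example, they presumably also derive serde traits for JSON output).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SymbolTier { Tier1, Tier2, Tier3 }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SymbolCategory { RocksdbCApi, P2pNetwork, Sync, ChainService, Storage, TxPool, Other }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SymbolInfo {
    pub raw_name: String,       // name as stored in the ELF symbol table
    pub demangled_name: String, // identical to raw_name for C symbols
    pub address: u64,           // st_value
    pub size: u64,              // st_size in bytes
    pub binding: String,        // assumed textual form: GLOBAL / LOCAL / WEAK
    pub tier: SymbolTier,
    pub category: SymbolCategory,
    pub is_probe_target: bool,
    pub description: String,
}

impl SymbolInfo {
    /// Hypothetical convenience constructor for a RocksDB C API symbol.
    pub fn tier1(name: &str, address: u64, size: u64, description: &str) -> Self {
        SymbolInfo {
            raw_name: name.to_string(),
            demangled_name: name.to_string(), // C symbols are not mangled
            address,
            size,
            binding: "GLOBAL".to_string(), // assumed binding for extern "C" symbols
            tier: SymbolTier::Tier1,
            category: SymbolCategory::RocksdbCApi,
            is_probe_target: true,
            description: description.to_string(),
        }
    }
}
```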

4.4 Test Coverage

symbols.rs contains 9 unit tests:

| Test | Validates |
| --- | --- |
| tier1_targets_all_start_with_rocksdb | All Tier 1 targets have rocksdb_* prefix |
| tier1_count | 20 Tier 1 targets registered |
| tier2_count | 21 Tier 2 targets registered |
| tier2_targets_are_rust_paths | All Tier 2 targets contain :: |
| demangle_c_symbol_is_identity | C symbol demangling is identity transform |
| demangle_rust_symbol | Rust mangled name correctly resolved |
| filter_ci_works | Case-insensitive filtering works |
| direct_match_accepts_closure | Closure suffixes treated as valid matches |
| direct_match_rejects_drop_in_place | drop_in_place noise correctly filtered |

5. Key Technical Decisions

5.1 Why a Three-Tier Classification?

| Tier | Stability | Use Case | Rationale |
| --- | --- | --- | --- |
| Tier 1 | Stable across versions | Production uprobe | extern "C" with no mangling; address changes but name is constant |
| Tier 2 | Build-specific | Dev/test environments | Rust mangled names contain hash suffixes; require dynamic resolution each time |
| Tier 3 | Unavailable | Debug builds only | Eliminated by LTO/inlining; absent in release builds |

5.2 Why Does Tier 2 Matching Require Noise Filtering?

The Rust compiler generates generic instantiations (drop glue, Future poll, Box wrappers) that embed target function paths inside <> generic parameters. A naive contains() substring match would produce a large number of false positives. is_direct_match() uses angle-bracket depth tracking to accept only matches at the top-level scope, reducing the Tier 2 false positive rate to zero.

5.3 Why goblin Instead of readelf/nm?

  • Cross-platform: Pure Rust implementation; runs on both macOS and Linux (convenient for CI)
  • No external dependencies: Does not require binutils installation
  • Programmable: Directly operates on the Elf struct; no need to parse command-line output
  • Performance: Single read + parse; avoids multiple fork/exec invocations

6. Risk Register Update

Eliminated Risks

| Risk | Impact | Probability | Week 2 Status | Investigation Conclusion |
| --- | --- | --- | --- | --- |
| CKB binary stripped, RocksDB symbols missing | High | Medium | Eliminated | Official Release is debuginfo-stripped; .symtab fully retained with 151 rocksdb_* C API symbols available |

7. Conclusions and Recommendations

This Week’s Achievements

  • ckb-probe symbols subcommand fully implemented with colored terminal output + JSON export + filtering
  • Completed full scan and comparison of official Release and self-compiled builds
  • Confirmed both builds use static RocksDB linking; the uprobe approach is viable
  • Established a three-tier classification registry of 53 probe targets (20 Tier 1 + 21 Tier 2 + 12 Tier 3)
  • Key risk eliminated: CKB binary is not fully stripped; RocksDB symbols are fully available

8. Week 3 Plan

Four eBPF Feasibility Validations

  • RocksDB uprobe latency measurement: Attach uprobes to a live CKB process and measure call latency for core functions such as rocksdb_get_pinned_cf, validating the entry/return timestamp collection pipeline
  • Multi-function uprobe: Simultaneously attach to multiple Tier 1 symbols (read / write / transaction / iterator) to verify the stability and performance overhead of concurrent probing
  • TCP kprobe: Attach to kernel functions such as tcp_sendmsg / tcp_recvmsg to capture CKB P2P network traffic characteristics, validating kprobe program type availability
  • sys_enter tracepoint: Attach to tracepoint/syscalls/sys_enter_* system call entry points to observe CKB file I/O and network syscall patterns
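
The entry/return pairing behind the latency measurement can be simulated in plain Rust. This is a userspace sketch of the bookkeeping only: in an eBPF program the in-flight table would be a BPF hash map and the timestamps would typically come from `bpf_ktime_get_ns`. The `LatencyTracker` type and its methods are hypothetical names, not part of ckb-probe.

```rust
use std::collections::HashMap;

/// Pairs uprobe entry timestamps with uretprobe return timestamps,
/// keyed by thread id.
#[derive(Default)]
pub struct LatencyTracker {
    in_flight: HashMap<u32, u64>, // tid -> entry timestamp (ns)
}

impl LatencyTracker {
    /// Called from the entry probe: remember when this thread entered.
    pub fn on_entry(&mut self, tid: u32, ts_ns: u64) {
        self.in_flight.insert(tid, ts_ns);
    }

    /// Called from the return probe: yields the call duration in ns,
    /// or None if no matching entry was recorded for this thread.
    pub fn on_return(&mut self, tid: u32, ts_ns: u64) -> Option<u64> {
        let start = self.in_flight.remove(&tid)?;
        ts_ns.checked_sub(start)
    }
}
```

Keying by thread id keeps concurrent calls on different threads apart; a recursive call on the same thread would overwrite the earlier entry in this simple form.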

ckb-probe check Subcommand Implementation

  • Automated environment pre-check: kernel version, BPF support, CAP_BPF capabilities, CKB process status, and symbol availability — all-in-one detection

Milestone 1 Completion Criteria (End of Week 3)

  • All three BPF program types (uprobe / kprobe / tracepoint) validated on a CKB process
  • check and symbols subcommands delivered and functional
5 Likes

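Also, starting next week, all subsequent weekly reports will be written and submitted in the Week 2 format, to keep the format consistent and standardized.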

3 Likes

Hi @clair,

Thank you for delivering the weekly report on time again. The committee unanimously sends its respect.

A small suggestion:
The current weekly reports include many technical details about debugging and bug fixing, which raises the reading barrier for community members without a deep technical background.

For future weekly reports, consider trimming this material and keeping the technical details for the final report. The weekly updates could instead preview and showcase interim results through videos, screenshots, or early preview releases.

This should save you time when writing the weekly reports and make the project's progress easier for community members to see at a glance, which helps boost engagement and attention.

Looking forward to next week’s update.

Best regards,

Xingtian

On behalf of the Spark Program Committee

2 Likes

Week 3 Report: eBPF Feasibility Validation Passed + check Subcommand Shipped

Period: 2026-03-30 ~ 2026-04-05
Author: ckb-probe Development Team
Project: ckb-probe — eBPF-Based Deep Observability Tool for CKB Full Nodes


1. Goals for This Week

Per the project plan, Week 3 required completing four eBPF feasibility validations (items 1–4), implementing the check subcommand (item 5), and reaching Milestone 1 by end of week:

  1. RocksDB uprobe latency measurement — entry/return pair attachment, measuring function execution duration
  2. Multi-function uprobe simultaneous attachment — all 19 Tier 1 symbols tested individually
  3. TCP kprobe — attaching to kernel tcp_sendmsg / tcp_recvmsg, capturing P2P network events
  4. sys_enter tracepoint — attaching to raw_syscalls/sys_enter, profiling syscall distribution
  5. Implement ckb-probe check subcommand — one-command environment detection + eBPF probe validation

2. Milestone 1 Completion Status

Milestone 1 (End of Week 3): Feasibility validation complete; three BPF program types verified against the CKB process. check and symbols subcommands delivered.

| Deliverable | Status | Notes |
| --- | --- | --- |
| uprobe/uretprobe validation | :white_check_mark: | 4 RocksDB function entry/return pairs attached successfully |
| Multi-function uprobe validation | :white_check_mark: | All 19 Tier 1 symbols tested individually; 15 confirmed attachable |
| kprobe/kretprobe validation | :white_check_mark: | tcp_sendmsg / tcp_recvmsg 4/4 attached, real-time event capture |
| tracepoint validation | :white_check_mark: | raw_syscalls/sys_enter attached successfully, real-time syscall capture |
| ckb-probe check subcommand | :white_check_mark: | 8 environment checks + eBPF probe validation |
| ckb-probe symbols subcommand | :white_check_mark: | Delivered in Week 2, no changes this week |
| eBPF kernel-side programs | :white_check_mark: | Built ckb-probe-ebpf from scratch: 6 uprobe pairs + 2 kprobe pairs + 1 tracepoint |
| xtask build system | :white_check_mark: | Dual-target eBPF compilation management |

Milestone 1 achieved.


3. Feature Demos

3.1 ckb-probe check — One-Command Environment Detection + eBPF Validation

When invoked without --pid, only environment checks are performed (8 items). When --pid is provided, actual attachment tests for all three eBPF probe types are appended:

$ ckb-probe check --binary ./ckb --pid $(pgrep -x ckb)

╔══════════════════════════════════════════════════════════════╗
║  ckb-probe environment check                               ║
╠══════════════════════════════════════════════════════════════╣
  ✅ Kernel version            6.8.0-106-generic (need >= 5.8)
  ✅ BPF config                BPF=y SYSCALL=y JIT=y
  ✅ BTF support               /sys/kernel/btf/vmlinux exists
  ✅ Permissions               running as root
  ✅ bpf() syscall             available
  ✅ uprobe support            /sys/kernel/debug/tracing/uprobe_events exists
  ✅ CKB process               1 instance(s), pid=3127545
  ✅ CKB symbols               2/3 key symbols found (symtab)
╚══════════════════════════════════════════════════════════════╝

  Result: 8/8 checks passed
  🎉 All checks passed!
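The kernel version line in the output above (6.8.0-106-generic, need >= 5.8) implies a comparison on the leading major.minor pair. A minimal sketch of such a check follows; the report does not show how ckb-probe implements it, so the function names `parse_kernel_release` and `kernel_at_least` are hypothetical.

```rust
/// Parses the leading "major.minor" of a kernel release string such as
/// "6.8.0-106-generic". Returns None if the string is not in that shape.
pub fn parse_kernel_release(release: &str) -> Option<(u32, u32)> {
    // Split on every non-digit so "6.8.0-106-generic" yields "6", "8", "0", ...
    let mut parts = release.split(|c: char| !c.is_ascii_digit());
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True when the release is at least `min`.
/// Tuple comparison is lexicographic: major first, then minor.
pub fn kernel_at_least(release: &str, min: (u32, u32)) -> bool {
    parse_kernel_release(release).map_or(false, |v| v >= min)
}
```

On Linux the release string can be read from /proc/sys/kernel/osrelease, so a call could look like `kernel_at_least(release.trim(), (5, 8))`. Unparseable input is treated as a failed check rather than a pass.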

3.2 uprobe Validation — RocksDB Function Attachment

A full scan of all 19 Tier 1 trace targets confirmed that 15 are uprobe-attachable in the official CKB v0.205.0 binary:

╔══════════════════════════════════════════════════════════════╗
║  ckb-probe eBPF validation                                 ║
╠══════════════════════════════════════════════════════════════╣
  ✅ ── uprobe latency ──      entry/return pair attach test
  ✅   rocksdb_get_pinned_cf   entry + return attached
  ✅   rocksdb_put             entry + return attached
  ✅   rocksdb_write           entry + return attached
  ❌   rocksdb_delete          symbol not in binary (expected)
  ✅   rocksdb_create_iterator_cf  entry + return attached
  ❌   rocksdb_multi_get_cf    symbol not in binary (expected)
  ✅ ── uprobe Tier 1 ──       all 19 Tier 1 symbol attach test
  ✅   rocksdb_get             symbol found, uprobe-attachable
  ✅   rocksdb_get_pinned      symbol found, uprobe-attachable
  ✅   rocksdb_get_pinned_cf   symbol found, uprobe-attachable
  ✅   rocksdb_put             symbol found, uprobe-attachable
  ✅   rocksdb_put_cf          symbol found, uprobe-attachable
  ✅   rocksdb_write           symbol found, uprobe-attachable
  ❌   rocksdb_delete          not found in binary
  ❌   rocksdb_delete_cf       not found in binary
  ❌   rocksdb_multi_get_cf    not found in binary
  ✅   rocksdb_transaction_put_cf  symbol found, uprobe-attachable
  ✅   rocksdb_transaction_delete_cf  symbol found, uprobe-attachable
  ❌   rocksdb_transaction_get_cf  not found in binary
  ✅   rocksdb_transaction_commit  symbol found, uprobe-attachable
  ✅   rocksdb_optimistictransaction_begin  symbol found, uprobe-attachable
  ✅   rocksdb_create_iterator_cf  symbol found, uprobe-attachable
  ✅   rocksdb_iter_seek       symbol found, uprobe-attachable
  ✅   rocksdb_iter_seek_to_first  symbol found, uprobe-attachable
  ✅   rocksdb_iter_next       symbol found, uprobe-attachable
  ✅   rocksdb_iter_destroy    symbol found, uprobe-attachable
  ✅ uprobe summary            latency pairs: 4/6, Tier 1 symbols: 15/19
╚══════════════════════════════════════════════════════════════╝

The 4 :cross_mark: entries are RocksDB functions not called by CKB (removed by the linker’s dead code elimination). These are expected absences and do not affect core monitoring capabilities.

3.3 Real-Time Event Capture — uprobe / kprobe / tracepoint

After successful attachment, ckb-probe check automatically collects 3 seconds of live event data. Below is actual output from a CKB v0.204.0 node (pid=3127545):

  ⏳ Collecting live events for 3 seconds...

  [uprobe] pid=3127545 tid=3310904 func=get_pinned_cf            latency=84.7μs
  [uprobe] pid=3127545 tid=3310904 func=get_pinned_cf            latency=67.0μs
  [uprobe] pid=3127545 tid=3310904 func=get_pinned_cf            latency=53.6μs
  [uprobe] pid=3127545 tid=3310904 func=write                    latency=44.9μs
  [uprobe] pid=3127545 tid=3310904 func=get_pinned_cf            latency=5783.8μs
  [uprobe] pid=3127545 tid=3310904 func=get_pinned_cf            latency=3848.8μs
  [uprobe] pid=3127545 tid=3310904 func=create_iterator_cf       latency=23.2μs
  [uprobe] pid=3127545 tid=3310904 func=get_pinned_cf            latency=82.1μs
  [uprobe] pid=3127545 tid=3310904 func=get_pinned_cf            latency=59.0μs
  [uprobe] pid=3127545 tid=3310904 func=get_pinned_cf            latency=54.6μs
  [syscall] pid=3127545 tid=3322756 nr=232 (epoll_wait)
  [uprobe] pid=3127545 tid=3310899 func=get_pinned_cf            latency=3293.9μs
  [syscall] pid=3127545 tid=3310527 nr=1 (write)
  [uprobe] pid=3127545 tid=3310899 func=get_pinned_cf            latency=8292.1μs
  [uprobe] pid=3127545 tid=3310899 func=get_pinned_cf            latency=3363.4μs
  [syscall] pid=3127545 tid=3322756 nr=232 (epoll_wait)
  [uprobe] pid=3127545 tid=3310904 func=write                    latency=51.6μs
  [syscall] pid=3127545 tid=3310904 nr=1 (write)
  [uprobe] pid=3127545 tid=3310904 func=get_pinned_cf            latency=6467.0μs
  [uprobe] pid=3127545 tid=3310904 func=get_pinned_cf            latency=9306.0μs
  [uprobe] pid=3127545 tid=3310904 func=create_iterator_cf       latency=25.4μs
  [uprobe] pid=3127545 tid=3310902 func=get_pinned_cf            latency=6416.9μs
  [tcp] pid=3127545 tid=3322756 dir=TX bytes=1471
  [tcp] pid=3127545 tid=3322757 dir=RX bytes=469

  📊 Captured 1873 uprobe, 2 tcp, 621 syscall events in 3s

Key observations:

  • uprobe latency measurement is working: get_pinned_cf (CKB’s primary read path) latency ranges from 41.8μs to 9306.0μs; write runs at ~44–51μs; create_iterator_cf at ~23–25μs
  • kprobe captured P2P network TCP send/receive events (TX 1471 bytes / RX 469 bytes)
  • tracepoint captured syscall distribution (predominantly epoll_wait and write)
  • 3-second capture volume: 1873 uprobe + 2 tcp + 621 syscall = 2496 total events
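Per-function ranges like the ones quoted above can be recomputed from the captured text. The sketch below parses `[uprobe]` lines in the format shown in this report and keeps a count, minimum and maximum per function. The function names `parse_uprobe_line` and `summarize` are hypothetical, and the line format is taken only from the sample output, so it may not match later versions of the tool.

```rust
use std::collections::BTreeMap;

/// Extracts (function name, latency in μs) from one `[uprobe]` line of the
/// output above; other lines ([syscall], [tcp], blanks) yield None.
pub fn parse_uprobe_line(line: &str) -> Option<(String, f64)> {
    let line = line.trim();
    if !line.starts_with("[uprobe]") {
        return None;
    }
    let mut func = None;
    let mut latency = None;
    for field in line.split_whitespace() {
        if let Some(v) = field.strip_prefix("func=") {
            func = Some(v.to_string());
        } else if let Some(v) = field.strip_prefix("latency=") {
            latency = v.trim_end_matches("μs").parse::<f64>().ok();
        }
    }
    Some((func?, latency?))
}

/// Per-function (count, min, max) latency summary in μs.
pub fn summarize(lines: &[&str]) -> BTreeMap<String, (usize, f64, f64)> {
    let mut out = BTreeMap::new();
    for (func, us) in lines.iter().filter_map(|l| parse_uprobe_line(l)) {
        let e = out.entry(func).or_insert((0usize, f64::MAX, f64::MIN));
        e.0 += 1;
        e.1 = e.1.min(us);
        e.2 = e.2.max(us);
    }
    out
}
```

Only the lines printed in the report are available here, so a summary built from them covers a small sample of the 1873 uprobe events captured in the 3-second window.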

4. Validation Results Summary

| # | Probe Type | Target | Attachment Result | Event Capture |
| --- | --- | --- | --- | --- |
| 1 | uprobe/uretprobe | RocksDB latency measurement (entry/return pairs) | 4/6 pairs :white_check_mark: | 1873 events in 3s |
| 2 | uprobe symbol scan | All 19 Tier 1 targets tested individually | 15/19 :white_check_mark: | Attachable |
| 3 | kprobe/kretprobe | tcp_sendmsg / tcp_recvmsg | 4/4 :white_check_mark: | 2 events in 3s |
| 4 | tracepoint | raw_syscalls/sys_enter | 1/1 :white_check_mark: | 621 events in 3s |

All three BPF program types (uprobe / kprobe / tracepoint) validated successfully.


5. Currently Supported Commands

$ ckb-probe --help

Commands:
  check     Check environment and validate eBPF probes
  symbols   Analyse a CKB binary for uprobe-attachable symbols

| Usage | Description |
| --- | --- |
| ckb-probe check | Environment detection (8 items) |
| ckb-probe check --binary ./ckb | + CKB symbol check |
| ckb-probe check --binary ./ckb --pid 1234 | + eBPF three-probe-type validation |
| ckb-probe check --binary ./ckb --pid 1234 --probe uprobe | Validate uprobe only |
| ckb-probe symbols ./ckb | Symbol analysis (Tier 1/2/3) |
| ckb-probe symbols ./ckb --json | JSON format output |
| ckb-probe symbols ./ckb --tier 1 --filter get | Filter by tier and keyword |

6. Next Week’s Plan

Entering Phase 2: RocksDB Core Probe Development.

Per the project plan, Week 4 will build on the 15 Tier 1 symbols validated this week to implement full RocksDB storage layer deep tracing:

  1. uprobe/uretprobe for five core operations: get_pinned_cf (primary read), transaction_put_cf (primary write), write (batch write), create_iterator_cf (iterator creation), transaction_commit (transaction commit)
  2. Three types of BPF Maps — operation count statistics, latency distribution histograms (log2 buckets), slow-operation events exceeding thresholds
  3. RocksDbCollector — user-space data collector that periodically polls Map data and formats output

The goal is to reach Milestone 2 by end of Week 5: ckb-probe rocksdb running on a testnet node, outputting meaningful RocksDB performance data with basic anomaly detection.

2 Likes

Hello, sorry to bother you. May I ask whether my monthly summary can also be posted in text form in this thread?

Hi @clair,

Thanks for the latest update — great to see Milestone 1 reached, with the feasibility validations completed and the ckb-probe check subcommand shipped (including real outputs and a clear “what’s next” plan).

Funding / milestone-based disbursement (optional)
As noted in the approval post, the remaining 80% follows a flexible model — you can request funds on-demand during weekly syncs, or receive the balance at project closure.

Now that Milestone 1 is achieved, please let us know whether you’d like to request a milestone-based disbursement at this point. If yes, please include:

  • the requested amount (in USD equivalent),
  • the intended use,
  • and how it maps to upcoming deliverables (e.g. entering Phase 2: RocksDB core probe development, Milestone 2 targets, demo evidence, etc.).

About a text-based monthly summary
We appreciate your high standards and discipline. That said, we generally don’t recommend spending too much time on heavy mid-term summaries, as it can take time away from core delivery.
If you’d like to publish a monthly recap in text, we suggest keeping it lightweight. A more comprehensive summary can be consolidated in the final closure report.

Looking forward to the next update.

Best,
xingtian
On behalf of the Spark Program Committee

cc @zz_tovarishch , @Hanssen , @yixiu.ckbfans.bit

Hi XingTian,

Thank you for your reply, and also for the committee’s recognition of the current progress.

At this point, Milestone 1 (feasibility validation) has been completed. I would like to request a partial payment of 400 USD (to be paid in equivalent CKB) to support the project’s continued progress toward Milestone 2, while also helping prepare for the later development, testing, and final delivery work required in Milestone 3.

Requested Amount

400 USD

Intended Use of Funds

This partial payment will mainly support the following work:

  1. Ongoing development and testing environment costs
    This includes maintaining the Linux eBPF development environment, the CKB testnet node environment, and related resources needed for continued probe development and validation, as well as a partial developer subsidy.
  2. Phase 2 / Milestone 2: core RocksDB probe development
    The main focus of this stage is completing the core functionality of ckb-probe rocksdb, including:
    • uprobe/uretprobe implementation for five categories of RocksDB operations
    • latency, throughput, and bytes statistics
    • log2 latency histogram
    • slow operation event collection and threshold-based filtering
    • user-space RocksDbCollector
    • CLI output support, including --slow, --histogram, and --json
    • basic anomaly detection (EWMA baseline + threshold alert)
  3. Preparation for Milestone 3 development and validation
    After Milestone 2, the project will still require substantial work for Milestone 3, including:
    • building a reproducible Docker-based validation environment
    • improving demo scripts and acceptance smoke tests
    • running 48-hour stability tests
    • evaluating overhead and impact, including CPU, RSS, event loss, and sync degradation
    • robustness handling for SIGINT/SIGTERM, CKB exit cases, and resource cleanup
    • preparing documentation, demo videos, and the final release package

So this 400 USD request is not only for the Milestone 2 core implementation, but also to ensure a smoother transition into the Milestone 3 testing and final delivery phase.

Relation to Upcoming Deliverables

This partial payment will directly support the following outcomes:

For Milestone 2

  • ckb-probe rocksdb becomes functional
  • meaningful RocksDB performance metrics can be collected from a real CKB node
  • basic anomaly detection is available
  • a demonstrable intermediate-stage demo and supporting evidence can be provided

In preparation for Milestone 3

  • early investment into the reproducible Docker environment, 48-hour stability testing, performance evaluation, and final delivery materials
  • reduced pressure during the final testing and release stage
  • better alignment with the acceptance criteria defined in the original proposal

Plan for the Remaining Funds

My current plan is:

  • to request 400 USD at this stage
  • and to request the remaining 400 USD only after the final development work is completed and Milestone 3 is fully delivered

In other words, I would prefer to submit the request for the remaining amount only once the core functionality, testing, documentation, and final deliverables are all completed, so that the final payment request can directly correspond to the completed results, demo evidence, and closeout materials.

Current Stage

For now, I will continue prioritizing core development and deliverables. As you suggested, monthly written updates will remain lightweight, while more complete summaries will be included in the later milestone report and the final project report.

Thank you again for the flexible partial payment arrangement, and for your support and suggestions on the project pacing.

Best regards,
Clair

2 Likes

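Week 4 Report: RocksDB Core Probes Shipped + ckb-probe rocksdb Subcommand

Period: 2026-04-06 ~ 2026-04-12
Author: clair
Project: ckb-probe, an eBPF-based deep observability tool for CKB full nodes


1. Goals for This Week

This week begins Phase 2 (RocksDB core probe development). Building on the 15 Tier 1 symbols validated in Week 3, the goal was to complete deep tracing of the RocksDB storage layer and to deliver the EWMA anomaly detection module, originally planned for Week 5, ahead of schedule, so that the core code work for Milestone 2 is concentrated in Week 4:

  1. uprobe/uretprobe pairs for five core RocksDB operations (GET / PUT / WRITE / ITER_NEW / TXN_COMMIT)
  2. Three types of BPF Maps: count aggregation, log2 latency histogram, and slow-operation events exceeding a threshold
  3. RocksDbCollector: user-space collector that reads data from the Maps and supports four display modes
  4. Output format aligned with the proposal: the visual specifications of all four modes (table / histogram / slow operations / JSON) implemented as described in the proposal
  5. EWMA anomaly detection (originally a Week 5 goal, completed in Week 4): rolling baseline + 5-minute warm-up + three triggers (AVG / P99 / absolute P99 cap) + Compaction storm attribution

2. Deliverables This Week

| Deliverable | Status | Notes |
| --- | --- | --- |
| Five RocksDB uprobe/uretprobe pairs | :white_check_mark: | GET / PUT / WRITE / ITER_NEW / TXN_COMMIT all attached successfully |
| OP_STATS count aggregation Map | :white_check_mark: | PerCpuArray; lock-free aggregation of count/total/min/max/bytes per op |
| LATENCY_HIST latency histogram Map | :white_check_mark: | log2 buckets; 64 buckets covering 1ns – 2^63 ns |
| SLOW_EVENTS slow-operation event Map | :white_check_mark: | PerfEventArray; events are emitted only above the threshold, zero overhead in the normal case |
| RocksDbCollector user-space collector | :white_check_mark: | Periodic polling, merges per-CPU data, computes QPS / percentiles / Bytes/s |
| ckb-probe rocksdb subcommand | :white_check_mark: | Four modes: table / --histogram / --slow / --json |
| Bytes/s byte throughput statistics (GET / PUT / TXN_COMMIT) | :white_check_mark: | GET reads the PinnableSlice at an 8-byte offset; PUT uses ctx.arg(5); TXN_COMMIT uses a snapshot of the per-tid PUT accumulator |
| EWMA anomaly detection + status bar | :white_check_mark: | Rolling baseline, 5 min warm-up, three triggers (AVG×5 / P99×3 / hard cap), sliding window for low QPS, baseline not updated during jitter periods |
| Anomaly attribution hints | :white_check_mark: | On trigger, the bottom of the table prints "⚠️ ANOMALY DETECTED" and "→ Probable cause: Compaction storm", and directs the user to run --slow |
| CKB node version auto-detection | :white_check_mark: | Calls <binary> --version at startup and writes the result into the Node: field of the table header |
| BPF verifier tuning | :white_check_mark: | Fully unrolled log2, HashMap rate limiting, saturating_sub to guard against overflow |
| Real testnet node validation | :white_check_mark: | Four modes + anomaly detection run stably on a CKB v0.204.0 testnet node |

All Week 4 core goals (RocksDB probes + four display modes + Bytes/s byte throughput + EWMA anomaly detection) were achieved. The EWMA module originally planned for Week 5 was also delivered this week.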
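The anomaly detection described above (rolling EWMA baseline, 5-minute warm-up, AVG×5 / P99×3 / hard-cap triggers, baseline frozen while anomalous) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the type and method names are hypothetical, `alpha` and `p99_cap_us` are illustrative parameters whose real values the report does not give, and the low-QPS sliding window is left out.

```rust
/// Outcome of feeding one sampling interval to the detector.
#[derive(Debug, PartialEq)]
pub enum Status {
    WarmingUp,
    Normal,
    Anomaly,
}

/// Minimal EWMA baseline with the three triggers named in the report.
/// `alpha` and `p99_cap_us` are illustrative parameters, not project values.
pub struct EwmaDetector {
    alpha: f64,
    warmup_secs: u64,
    p99_cap_us: f64,
    elapsed_secs: u64,
    base_avg: Option<f64>,
    base_p99: Option<f64>,
}

impl EwmaDetector {
    pub fn new(alpha: f64, warmup_secs: u64, p99_cap_us: f64) -> Self {
        Self { alpha, warmup_secs, p99_cap_us, elapsed_secs: 0, base_avg: None, base_p99: None }
    }

    fn blend(alpha: f64, base: Option<f64>, x: f64) -> Option<f64> {
        // The first sample seeds the baseline; later samples are smoothed in.
        Some(match base {
            None => x,
            Some(b) => alpha * x + (1.0 - alpha) * b,
        })
    }

    /// Feed one interval (`dt_secs` long) with its average and P99 latency.
    pub fn observe(&mut self, dt_secs: u64, avg_us: f64, p99_us: f64) -> Status {
        let warming = self.elapsed_secs < self.warmup_secs;
        self.elapsed_secs += dt_secs;
        if !warming {
            let avg_hit = self.base_avg.map_or(false, |b| avg_us > 5.0 * b);
            let p99_hit = self.base_p99.map_or(false, |b| p99_us > 3.0 * b);
            if avg_hit || p99_hit || p99_us > self.p99_cap_us {
                // Anomalous samples are not folded into the baseline.
                return Status::Anomaly;
            }
        }
        self.base_avg = Self::blend(self.alpha, self.base_avg, avg_us);
        self.base_p99 = Self::blend(self.alpha, self.base_p99, p99_us);
        if warming { Status::WarmingUp } else { Status::Normal }
    }
}
```

Freezing the baseline during an anomaly keeps a long spike from gradually being learned as normal; the trade-off is that a permanent shift in workload would stay flagged until the detector is reset.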


3. Feature Demos

3.1 Default Mode: Real-Time Statistics Table

$ sudo ckb-probe rocksdb --binary /root/ckb --pid 3310428

  ✅ attached rocksdb_get_pinned_cf
  ✅ attached rocksdb_transaction_put_cf
  ✅ attached rocksdb_write
  ✅ attached rocksdb_create_iterator_cf
  ✅ attached rocksdb_transaction_commit

  Monitoring 5 operations on PID 3310428 (threshold: 1000μs, interval: 1s)
  Press Ctrl+C to stop.

╭─────────────── CKB RocksDB Monitor (PID: 3310428) ───────────────╮
│ Uptime: 00:00:04   Sampling: 2s   Node: CKB v0.204.0             │
├────────────┬───────┬─────────┬─────────┬─────────┬───────────────┤
│ Operation  │  QPS  │ Avg(μs) │ P50(μs) │ P99(μs) │    Bytes/s    │
├────────────┼───────┼─────────┼─────────┼─────────┼───────────────┤
│ GET        │   573 │  1407.2 │    49.2 │ 12582.9 │  357.4 KB/s   │
│ PUT        │    87 │     9.0 │     6.1 │    49.2 │   3.2 KB/s    │
│ WRITE      │     2 │    43.9 │    49.2 │    49.2 │       —       │
│ ITER_NEW   │    30 │    32.4 │    24.6 │    49.2 │       —       │
│ TXN_COMMIT │    27 │   114.9 │    98.3 │   196.6 │   3.2 KB/s    │
╰────────────┴───────┴─────────┴─────────┴─────────┴───────────────╯
  Status: ⏳ Warming up — Collecting baseline (296s remaining).

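The table refreshes once per second. The header shows the auto-detected node version, CKB v0.204.0, and the Status: line below the table shows one of three states in real time: baseline learning / normal / alert (see §3.5). Three of the five ops have real byte throughput statistics:

  • GET ~357 KB/s: the uretprobe uses bpf_probe_read_user to peek directly at the size_ field, located at an 8-byte offset in the returned PinnableSlice. Each GET averages ~624 B (the actual size of CKB Block / Header / Cell data)
  • PUT ~3.2 KB/s: the entry probe reads the vlen register value from ctx.arg(5)
  • TXN_COMMIT ~3.2 KB/s: the per-tid PUT_PENDING_BYTES accumulator is snapshotted at commit entry and then reset to zero. The value exactly equals the 3.2 KB/s on the PUT row, which is the expected behavior when all PUTs of a transaction are settled together at commit, and a strong signal of end-to-end correctness
  • WRITE / ITER_NEW: no value is reported. For rocksdb_write, the offset of the std::string rep_ inside WriteBatch depends on the RocksDB version and the libstdc++ layout, which is too fragile, so it is skipped; ITER_NEW has no notion of a payload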
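The per-tid accounting behind the TXN_COMMIT row can be modelled in user space as below. This is a sketch of the bookkeeping only: the `ByteAccounting` type and its fields are hypothetical, and in the real tool the pending bytes would live in a BPF map (the report calls it PUT_PENDING_BYTES) updated from the probes.

```rust
use std::collections::HashMap;

/// User-space model of the per-thread pending-bytes accounting described
/// above. In the real tool this state would live in a BPF map.
#[derive(Default)]
pub struct ByteAccounting {
    pending_by_tid: HashMap<u32, u64>, // bytes PUT since the last commit, per thread
    pub put_bytes: u64,                // running total attributed to PUT
    pub commit_bytes: u64,             // running total attributed to TXN_COMMIT
}

impl ByteAccounting {
    /// PUT entry: count the value length and remember it for this thread.
    pub fn on_put(&mut self, tid: u32, value_len: u64) {
        self.put_bytes += value_len;
        *self.pending_by_tid.entry(tid).or_insert(0) += value_len;
    }

    /// COMMIT entry: snapshot this thread's pending bytes, then clear them.
    pub fn on_commit(&mut self, tid: u32) -> u64 {
        let settled = self.pending_by_tid.remove(&tid).unwrap_or(0);
        self.commit_bytes += settled;
        settled
    }
}
```

Once every thread has committed, `put_bytes` and `commit_bytes` are equal, which mirrors the matching 3.2 KB/s figures on the PUT and TXN_COMMIT rows.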

3.2 --histogram Mode: log2-Bucketed Latency Distribution

$ sudo ckb-probe rocksdb --binary /root/ckb --pid 3310428 --histogram

╭─────────────── CKB RocksDB Monitor (PID: 3310428) ───────────────╮
│ Uptime: 00:00:02   Sampling: 2s   Node: CKB v0.204.0             │
├────────────┬───────┬─────────┬─────────┬─────────┬───────────────┤
│ Operation  │  QPS  │ Avg(μs) │ P50(μs) │ P99(μs) │    Bytes/s    │
├────────────┼───────┼─────────┼─────────┼─────────┼───────────────┤
│ GET        │   524 │  1503.4 │    49.2 │ 12582.9 │       —       │
│ PUT        │    76 │     9.5 │     6.1 │    49.2 │   2.5 KB/s    │
│ WRITE      │     2 │    55.2 │    49.2 │    49.2 │       —       │
│ ITER_NEW   │    28 │    30.0 │    24.6 │    98.3 │       —       │
│ TXN_COMMIT │    25 │    98.8 │    98.3 │   196.6 │       —       │
╰────────────┴───────┴─────────┴─────────┴─────────┴───────────────╯

  GET latency distribution:
         2μs │██                                       16
         4μs │█████████                                54
         8μs │███████████████                          92
        16μs │████████████████████████████████████████ 239
        32μs │███████████████████████████████████████  235
        65μs │█████████████████████████                155
       131μs │█                                        9
       262μs │                                         4
       524μs │
         1ms │                                         4
         2ms │███████                                  46
         4ms │█████████████████████████                152
         8ms │██████                                   41
        16ms │                                         1

  PUT latency distribution:
         4μs │████████████████████████████████████████ 92
         8μs │█████████████████████                    50
        16μs │███                                      9
        32μs │                                         2

  WRITE latency distribution:
        32μs │████████████████████████████████████████ 4

  ITER_NEW latency distribution:
        16μs │████████████████████████████████████████ 47
        32μs │██████                                   8
        65μs │                                         1

  TXN_COMMIT latency distribution:
        65μs │████████████████████████████████████████ 50
       131μs │                                         1

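The GET latency distribution is clearly bimodal:

  • The first peak sits at ~16-65 μs (Block Cache hits or the Bloom Filter fast path)
  • The second peak sits at ~2-8 ms (likely Block Cache miss + disk read / lookup in a large SST file)

The aggregate Avg = 1503 μs cannot reveal this bimodal structure at all; histogram mode exists precisely to capture the true shape of the long tail. Later Compaction storm diagnosis and Block Cache capacity tuning will both rely on this view.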

3.3 --slow Mode: Slow-Operation Box Table

$ sudo ckb-probe rocksdb --binary /root/ckb --pid 3310428 --slow --threshold 5000

╭───────────────── Slow Operations (threshold: 5000μs) ──────────────────╮
│ Timestamp     │ Op         │   Latency │     Size │ Note               │
├───────────────┼────────────┼───────────┼──────────┼────────────────────┤
│ 34:58.824     │ GET        │   6,046μs │   1.1 KB │                    │
│ 34:58.832     │ GET        │   7,641μs │     52 B │                    │
│ 34:58.839     │ GET        │   7,264μs │     52 B │                    │
│ 34:58.850     │ GET        │  10,692μs │     52 B │                    │
│ 34:58.864     │ GET        │   8,715μs │     52 B │                    │
│ 34:58.872     │ GET        │   7,291μs │     32 B │                    │
│ 34:58.877     │ GET        │   5,476μs │    240 B │                    │
│ 34:58.900     │ GET        │   7,280μs │     52 B │                    │
╰───────────────┴────────────┴───────────┴──────────┴────────────────────╯
  Showing 8 of 268 slow operations in last 3s.

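--threshold is configurable (default 1000 μs). Only events whose latency exceeds the threshold are pushed to user space via PerfEventArray, so no events are emitted in the normal case. The table keeps the 8 most recent entries, and the footer line "Showing N of M in last Ks" gives the cumulative count within the window.

The Size column directly exposes the actual data volume behind each slow operation. The run of slow GETs above consists almost entirely of small 32-52 B reads (CKB header hash / cell metadata), yet their latency is generally in the 5-10 ms range. This indicates that the time goes to Block Cache miss → on-disk SST lookup, not to moving large objects. Aggregate statistics alone cannot yield this conclusion. All of the slow requests are also concentrated on the same chain-service worker thread, which can serve as the entry point for later lock-contention / I/O bottleneck analysis.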

3.4 --json Mode: Machine-Readable Output

$ sudo ckb-probe rocksdb --binary /root/ckb --pid 3310428 --json

{
  "operations": {
    "GET": {
      "avg_us": 1684.68,
      "bytes_per_sec": 309017,
      "p50_us": 49.15,
      "p99_us": 12582.91,
      "qps": 496
    },
    "PUT": {
      "avg_us": 9.53,
      "bytes_per_sec": 2542,
      "p50_us": 6.14,
      "p99_us": 24.58,
      "qps": 78
    },
    "WRITE": {
      "avg_us": 44.55,
      "bytes_per_sec": null,
      "p50_us": 49.15,
      "p99_us": 98.3,
      "qps": 4
    },
    "ITER_NEW": {
      "avg_us": 29.62,
      "bytes_per_sec": null,
      "p50_us": 24.58,
      "p99_us": 49.15,
      "qps": 30
    },
    "TXN_COMMIT": {
      "avg_us": 108.7,
      "bytes_per_sec": 2542,
      "p50_us": 98.3,
      "p99_us": 196.61,
      "qps": 26
    }
  },
  "anomalies": [],
  "pid": 3310428,
  "timestamp": "2026-04-10T13:07:23Z",
  "uptime_secs": 2
}

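One line of JSON is emitted per cycle; it can be piped straight into jq or fed to downstream pipelines such as Prometheus / ELK / Grafana.
Key fields:

  • timestamp: ISO8601 UTC timestamp
  • bytes_per_sec: byte throughput; explicitly null for ops whose payload is not tracked
  • operations{}: aggregated QPS / Avg / P50 / P99 / Bytes/s for the five ops
  • anomalies[]: latency-spike events detected by EWMA in the current cycle; an empty array during stable periods. When triggered, each entry contains operation / trigger / current_avg_us / baseline_avg_us / multiplier / current_p99_us / baseline_p99_us, so downstream alerting pipelines can consume it directly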

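3.5 EWMA Anomaly Detection: Three Status States + Compaction Storm Attribution

Anomaly detection is always on: in the default table, every sampling cycle prints a Status: line at the bottom, in one of three states:

  Status: ⏳ Warming up — Collecting baseline (174s remaining).

For the first 5 minutes after startup the tool is in its warm-up period: it collects a per-op EWMA rolling baseline (α=0.05) and raises no alerts, avoiding cold-start false positives.

  Status: ✅ Normal — All latencies within baseline.

Stable period: the avg/P99 of all five ops are within the baseline multipliers and the hard cap.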

⚠️  ANOMALY DETECTED [13:42:08]
  → GET [P99+CAP]  avg 1842.3μs (base 312.4μs, ×5.9)  p99 68341.2μs (base 12480.0μs)
  → Probable cause: Compaction storm (WRITE P99 = 4.7ms)
  → Run `ckb-probe rocksdb --slow` for slow operation details.

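When a spike is detected, each anomalous op is printed with its trigger reason (any combination of AVG / P99 / CAP). If WRITE P99 > 1 ms, a Compaction storm attribution is attached, and the user is prompted to switch to --slow mode to inspect the specific slow requests.

The trigger has four properties at once: no false alarms on cold start (5-minute warm-up), no false alarms on transient jitter (absolute floor of 50 μs + no baseline update during jitter), no missed sustained degradation (absolute P99 hard cap as a backstop), and low-QPS ops are not silenced (fallback to a 5-second sliding window). For algorithm parameters and implementation details, see the technical deep-dive report docs/technical-deep-dive.md.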


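4. The Five Monitored Operations

| Op | RocksDB function | CKB call path | Bytes/s | Source |
|---|---|---|---|---|
| GET | rocksdb_get_pinned_cf | Main read path (Block / Header / Cell lookups) | :white_check_mark: | uretprobe: bpf_probe_read_user(ret + 8) reads the size_ of the PinnableSlice |
| PUT | rocksdb_transaction_put_cf | Single write inside a transaction | :white_check_mark: | entry probe: ctx.arg::<usize>(5) reads the vlen register |
| WRITE | rocksdb_write | Atomic WriteBatch commit | — | Skipped: the std::string offset of WriteBatch::rep_ depends on the RocksDB version + libstdc++ ABI |
| ITER_NEW | rocksdb_create_iterator_cf | Range-scan entry point | — | No payload concept |
| TXN_COMMIT | rocksdb_transaction_commit | Transaction commit | :white_check_mark: | per-tid PUT_PENDING_BYTES HashMap accumulator, snapshotted at commit entry and then reset |

These five operations cover all the main RocksDB I/O categories of a CKB node, and 3 of the 5 come with real byte-throughput statistics, supporting operational diagnosis scenarios such as slow sync, Compaction storms, and Block Cache degradation.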


5. Currently Supported Commands

$ ckb-probe --help

Commands:
  check     Check environment and validate eBPF probes
  symbols   Analyse a CKB binary for uprobe-attachable symbols
  rocksdb   Monitor RocksDB operations on a live CKB node via eBPF
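
| Usage | Description |
|---|---|
| ckb-probe rocksdb --binary ./ckb --pid 1234 | Default real-time statistics table |
| ... --histogram | Table + log2 latency histogram |
| ... --slow --threshold 5000 | Slow-operation box table (>5 ms) |
| ... --json | One line of JSON per cycle, for pipeline integration |
| ... --interval 5 | Custom sampling interval |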

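6. Plan for Next Week (Week 5-6 Combined Planning)

Why Milestone 2 was completed early this week: After assessing the workload of the tasks originally scheduled for Week 6, I judged that they could not converge within a single week. The deliverables are a Docker dual-container topology (CKB testnet node + ckb-probe sidecar, including --privileged / /sys/kernel/debug mounts, BTF sharing, and automatic PID discovery), env-check.sh, three demo scripts, a 48h long run, quantified performance impact (CPU / memory / event loss / sync speed), and two diagnostic case studies (IBD write pattern + Compaction latency spikes). Together with the time needed for image debugging and testnet sync itself, the real effort clearly exceeds one week. I therefore also finished the EWMA anomaly detection originally planned for Week 5 this week, moving the Milestone 2 coding work forward so that it closes in Week 4 and freeing two full weeks (Week 5 + Week 6) for Phase 3.

Week 5 plan:

  1. Midterm report preparation and submission: a documentation roll-up of all Week 2-4 deliverables, covering the architecture diagram, Map/probe design decisions, descriptions of the three data structures OpStats/SlowEvent/EWMA, reproducible validation steps, and case analyses of real --slow/--histogram samples
  2. CLI output polish (based on clap): refactor argument definitions with clap's derive API; fill in #[command(about / long_about / after_help)] help text, #[arg(value_name / help / default_value)] metadata, and subcommand examples; also handle screen reset on Ctrl+C exit, degraded rendering on narrow terminals, and normalized error exit codes
  3. Docker reproducible environment setup (Week 6 task moved forward): docker-compose.yml dual-container topology (CKB testnet node + ckb-probe sidecar), env-check.sh one-command self-check, three end-to-end demo scripts (default table / histogram / slow operations)
  4. Implement the collection logic for the 48h report: add a long-run collection module (tentatively ckb-probe rocksdb --record <dir> or a standalone collector binary) that periodically persists OP_STATS / LATENCY_HIST / SLOW_EVENTS / EWMA anomalies[] to time-series files (JSONL or Parquet, one of the two), while sampling four metrics: host-side CPU / memory (/proc/<pid>/stat + status), event loss rate (PerfEventArray lost counter), and CKB sync speed (differences of RPC get_tip_block_number); plus a simple aggregation script that produces the time-series dataset needed by the stability report
  5. Start the 48h stability test: launch the long run as soon as the Docker environment and the collection module are ready; all data is persisted automatically by the collection logic above, giving the Week 6 report work raw data that can be fed directly to plotting scripts

Week 6 plan (originally planned tasks + Week 5 overflow):

  1. Wrap up the 48h stability test and organize the data: time-series charts, resource consumption summary, event fidelity report, latency distribution charts
  2. Two RocksDB diagnostic case studies: IBD write-pattern analysis + Compaction latency spike capture, as the core cases for the Week 7 stability report
  3. Targeted optimization and robustness hardening: optimize CPU / memory / event loss based on the stability test results

All Milestone 2 goals have been met: ckb-probe rocksdb runs stably on a testnet node and outputs meaningful RocksDB performance data, including EWMA anomaly detection and Compaction storm attribution. With the early close in Week 4, Week 5 and Week 6 merge into two full weeks to absorb the complex deliverables originally scheduled for Week 6, avoiding the risk of a failed single-week schedule.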


Week 4 Progress Report: RocksDB Core Probes Live + ckb-probe rocksdb Subcommand

Period: 2026-04-06 ~ 2026-04-12
Author: Clair
Project: ckb-probe — an eBPF-based deep observability tool for CKB full nodes


1. Goals for This Week

This week marked the start of Phase 2 — RocksDB core probe development.
Based on the 15 Tier 1 symbols validated in Week 3, the goal was to complete deep tracing for the RocksDB storage layer and also bring forward the originally planned Week 5 EWMA anomaly detection module, so that the core implementation work for Milestone 2 could be largely completed within Week 4.

The target scope for this week included:

  • Paired uprobe / uretprobe instrumentation for five core RocksDB operations
    (GET / PUT / WRITE / ITER_NEW / TXN_COMMIT)
  • Three BPF map types:
    • counter aggregation
    • log2 latency histograms
    • threshold-triggered slow-operation events
  • RocksDbCollector user-space collector, reading from BPF maps and supporting four display modes
  • Output formats aligned with the original proposal:
    • table
    • histogram
    • slow-operation view
    • JSON
  • EWMA anomaly detection
    (originally planned for Week 5, completed early in Week 4):
    • rolling baseline
    • 5-minute warm-up
    • triple-trigger logic based on AVG / P99 / absolute P99 cap
    • compaction storm attribution

2. Deliverables Completed This Week

| Deliverable | Status | Notes |
|---|---|---|
| Five paired RocksDB uprobe/uretprobe hooks | :white_check_mark: | Successfully attached for GET / PUT / WRITE / ITER_NEW / TXN_COMMIT |
| OP_STATS counter aggregation map | :white_check_mark: | PerCpuArray for lock-free aggregation of per-op count / total / min / max / bytes |
| LATENCY_HIST latency histogram map | :white_check_mark: | log2 buckets, 64 buckets covering 1ns – 2^63 ns |
| SLOW_EVENTS slow-operation event map | :white_check_mark: | PerfEventArray; only emits events above threshold, zero overhead in the normal case |
| RocksDbCollector user-space collector | :white_check_mark: | Periodic polling, per-CPU merge, QPS / percentile / bytes-per-second calculation |
| ckb-probe rocksdb subcommand | :white_check_mark: | Supports table / --histogram / --slow / --json |
| Bytes/s throughput stats (GET / PUT / TXN_COMMIT) | :white_check_mark: | GET: reads PinnableSlice size at offset 8; PUT: reads ctx.arg(5); TXN_COMMIT: snapshots per-TID PUT accumulator |
| EWMA anomaly detection + status line | :white_check_mark: | Rolling baseline, 5-minute warm-up, AVG×5 / P99×3 / hard cap triggers, low-QPS downgrade window, no baseline update during jitter |
| Anomaly attribution hints | :white_check_mark: | On trigger, prints ⚠️ ANOMALY DETECTED, → Probable cause: Compaction storm, and suggests running --slow |
| Automatic CKB node version detection | :white_check_mark: | Calls <binary> --version on startup and renders it in the header |
| BPF verifier tuning | :white_check_mark: | Fully unrolled log2 logic, HashMap rate limiting, saturating_sub for overflow safety |
| Validation on a real testnet node | :white_check_mark: | All four display modes plus anomaly detection ran stably on a CKB v0.204.0 testnet node |

All Week 4 core goals were completed: RocksDB probes, four display modes, bytes-per-second throughput accounting, and EWMA anomaly detection.
The EWMA module that had originally been planned for Week 5 was also delivered this week.
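The collector's "per-CPU merge" and rate calculation can be pictured with a small user-space model. This is only a sketch: the OpStats field names, the EMPTY constant, and the rates helper are assumptions made for illustration, not the actual ckb-probe types. It folds one operation's per-CPU slots into a total and derives QPS, average latency, and bytes per second from two cumulative snapshots.

```rust
/// One CPU's counters for a single operation (illustrative field names).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct OpStats {
    pub count: u64,
    pub total_ns: u64,
    pub min_ns: u64,
    pub max_ns: u64,
    pub bytes: u64,
}

impl OpStats {
    /// Identity element for merging: no samples seen yet.
    pub const EMPTY: OpStats =
        OpStats { count: 0, total_ns: 0, min_ns: u64::MAX, max_ns: 0, bytes: 0 };
}

/// Fold the per-CPU slots of one operation into a single total.
/// Slots that never saw a sample are skipped so a zeroed min does not win.
pub fn merge_per_cpu(slots: &[OpStats]) -> OpStats {
    slots
        .iter()
        .filter(|s| s.count > 0)
        .fold(OpStats::EMPTY, |acc, s| OpStats {
            count: acc.count + s.count,
            total_ns: acc.total_ns + s.total_ns,
            min_ns: acc.min_ns.min(s.min_ns),
            max_ns: acc.max_ns.max(s.max_ns),
            bytes: acc.bytes + s.bytes,
        })
}

/// Rates for one sampling interval.
#[derive(Debug, PartialEq)]
pub struct Rates {
    pub qps: f64,
    pub avg_us: f64,
    pub bytes_per_sec: f64,
}

/// Derive interval rates from two cumulative snapshots.
pub fn rates(prev: &OpStats, cur: &OpStats, interval_secs: f64) -> Rates {
    let ops = cur.count.saturating_sub(prev.count);
    let ns = cur.total_ns.saturating_sub(prev.total_ns);
    let bytes = cur.bytes.saturating_sub(prev.bytes);
    Rates {
        qps: ops as f64 / interval_secs,
        avg_us: if ops == 0 { 0.0 } else { ns as f64 / ops as f64 / 1000.0 },
        bytes_per_sec: bytes as f64 / interval_secs,
    }
}
```

Because each CPU writes only its own slot, the kernel side needs no locking; the cost of combining the slots is paid once per sampling cycle in user space.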


3. Feature Demonstration

3.1 Default Mode — Real-Time Summary Table

$ sudo ckb-probe rocksdb --binary /root/ckb --pid 3310428
  ✅ attached rocksdb_get_pinned_cf
  ✅ attached rocksdb_transaction_put_cf
  ✅ attached rocksdb_write
  ✅ attached rocksdb_create_iterator_cf
  ✅ attached rocksdb_transaction_commit

  Monitoring 5 operations on PID 3310428 (threshold: 1000μs, interval: 1s)
  Press Ctrl+C to stop.

╭─────────────── CKB RocksDB Monitor (PID: 3310428) ───────────────╮
│ Uptime: 00:00:04   Sampling: 2s   Node: CKB v0.204.0             │
├────────────┬───────┬─────────┬─────────┬─────────┬───────────────┤
│ Operation  │  QPS  │ Avg(μs) │ P50(μs) │ P99(μs) │    Bytes/s    │
├────────────┼───────┼─────────┼─────────┼─────────┼───────────────┤
│ GET        │   573 │  1407.2 │    49.2 │ 12582.9 │  357.4 KB/s   │
│ PUT        │    87 │     9.0 │     6.1 │    49.2 │   3.2 KB/s    │
│ WRITE      │     2 │    43.9 │    49.2 │    49.2 │       —       │
│ ITER_NEW   │    30 │    32.4 │    24.6 │    49.2 │       —       │
│ TXN_COMMIT │    27 │   114.9 │    98.3 │   196.6 │   3.2 KB/s    │
╰────────────┴───────┴─────────┴─────────┴─────────┴───────────────╯
  Status: ⏳ Warming up — Collecting baseline (296s remaining).

The table refreshes once per second. The header automatically detects and displays the node version (CKB v0.204.0), while the Status: line below the table shows the current anomaly-detection state in real time (baseline learning / normal / alert; see §3.5).

Among the five monitored operations, three have real bytes-per-second accounting:

  • GET ~357 KB/s
    In the uretprobe, bpf_probe_read_user peeks at the size_ field of the returned PinnableSlice at offset 8.
    Each GET returns about 624 B on average, which is consistent with real CKB payload sizes such as blocks, headers, and cells.
  • PUT ~3.2 KB/s
    The entry probe reads the vlen register value via ctx.arg(5).
  • TXN_COMMIT ~3.2 KB/s
    A per-TID PUT_PENDING_BYTES accumulator is snapshotted and reset at commit entry.
    Its throughput matches the PUT row exactly, which is the expected behavior: all PUTs within a transaction are “settled” together at commit time. This is a strong end-to-end correctness signal.

For the remaining two operations:

  • WRITE / ITER_NEW
    rocksdb_write internally stores batch payload in WriteBatch::rep_, but the std::string layout offset depends on the RocksDB version and the libstdc++ ABI, making it too fragile to trace reliably, so bytes accounting is intentionally skipped.
    ITER_NEW does not have a meaningful payload concept.
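The TXN_COMMIT accounting described above can be modeled in user space as follows. This is a sketch under assumptions: the real accumulator lives in a BPF HashMap keyed by thread ID, while the PendingBytes type and its method names here are hypothetical.

```rust
use std::collections::HashMap;

/// User-space model of a per-thread "pending PUT bytes" accumulator.
#[derive(Default)]
pub struct PendingBytes {
    by_tid: HashMap<u32, u64>,
}

impl PendingBytes {
    /// PUT entry: add the value length to the calling thread's pending total.
    pub fn on_put(&mut self, tid: u32, vlen: u64) {
        let slot = self.by_tid.entry(tid).or_insert(0);
        *slot = slot.saturating_add(vlen);
    }

    /// Commit entry: return the thread's pending total and reset it to zero.
    pub fn on_commit(&mut self, tid: u32) -> u64 {
        self.by_tid.remove(&tid).unwrap_or(0)
    }
}
```

Keying by thread ID relies on all PUTs of a transaction and its commit running on the same thread. Under that assumption, every byte counted at PUT time is counted exactly once at commit time, which is why the two Bytes/s rows in the table agree.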

3.2 --histogram Mode — log2 Latency Distribution

$ sudo ckb-probe rocksdb --binary /root/ckb --pid 3310428 --histogram
╭─────────────── CKB RocksDB Monitor (PID: 3310428) ───────────────╮
│ Uptime: 00:00:02   Sampling: 2s   Node: CKB v0.204.0             │
├────────────┬───────┬─────────┬─────────┬─────────┬───────────────┤
│ Operation  │  QPS  │ Avg(μs) │ P50(μs) │ P99(μs) │    Bytes/s    │
├────────────┼───────┼─────────┼─────────┼─────────┼───────────────┤
│ GET        │   524 │  1503.4 │    49.2 │ 12582.9 │       —       │
│ PUT        │    76 │     9.5 │     6.1 │    49.2 │   2.5 KB/s    │
│ WRITE      │     2 │    55.2 │    49.2 │    49.2 │       —       │
│ ITER_NEW   │    28 │    30.0 │    24.6 │    98.3 │       —       │
│ TXN_COMMIT │    25 │    98.8 │    98.3 │   196.6 │       —       │
╰────────────┴───────┴─────────┴─────────┴─────────┴───────────────╯

  GET latency distribution:
         2μs │██                                       16
         4μs │█████████                                54
         8μs │███████████████                          92
        16μs │████████████████████████████████████████ 239
        32μs │███████████████████████████████████████  235
        65μs │█████████████████████████                155
       131μs │█                                        9
       262μs │                                         4
       524μs │
         1ms │                                         4
         2ms │███████                                  46
         4ms │█████████████████████████                152
         8ms │██████                                   41
        16ms │                                         1

  PUT latency distribution:
         4μs │████████████████████████████████████████ 92
         8μs │█████████████████████                    50
        16μs │███                                      9
        32μs │                                         2

  WRITE latency distribution:
        32μs │████████████████████████████████████████ 4

  ITER_NEW latency distribution:
        16μs │████████████████████████████████████████ 47
        32μs │██████                                   8
        65μs │                                         1

  TXN_COMMIT latency distribution:
        65μs │████████████████████████████████████████ 50
       131μs │                                         1

The GET latency distribution is clearly bimodal:

  • The first peak is around 16–65 μs, likely corresponding to a fast path such as block-cache or Bloom-filter hits.
  • The second peak is around 2–8 ms, likely indicating block-cache misses followed by disk reads and/or larger SST lookups.

The aggregate statistic (Avg = 1503 μs) cannot reveal this structure at all.
This is precisely why the histogram mode exists: to expose the true shape of the latency tail.

This view will be especially important for later diagnosis of:

  • compaction storms
  • block-cache sizing issues
  • read-path degradation
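The histogram labels in the sample output (2μs, 65μs, 131μs, 1ms, ...) match the lower bounds of power-of-two nanosecond buckets, and the bar lengths match a floor-scaled 40-column bar. A minimal sketch of that bucketing and rendering, with hypothetical function names and assuming the bucket index is floor(log2(ns)):

```rust
/// Index of the log2 bucket for a latency in nanoseconds: floor(log2(ns)).
/// A latency of 0 ns is placed in bucket 0.
pub fn log2_bucket(ns: u64) -> usize {
    if ns == 0 { 0 } else { (63 - ns.leading_zeros()) as usize }
}

/// Label for bucket `k`: its lower bound 2^k ns, truncated to a whole unit.
/// `k` must be below 64.
pub fn bucket_label(k: usize) -> String {
    let ns = 1u64 << k;
    if ns >= 1_000_000 {
        format!("{}ms", ns / 1_000_000)
    } else if ns >= 1_000 {
        format!("{}μs", ns / 1_000)
    } else {
        format!("{}ns", ns)
    }
}

/// Bar for one bucket, scaled so the fullest bucket spans `width` columns.
pub fn bar(count: u64, max: u64, width: u64) -> String {
    if max == 0 {
        return String::new();
    }
    "█".repeat((count * width / max) as usize)
}
```

With the GET counts above, the 235-sample bucket renders as 39 columns next to the 239-sample peak at 40, and a 4-sample bucket rounds down to an empty bar, as in the sample output.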

3.3 --slow Mode — Slow Operation Table

$ sudo ckb-probe rocksdb --binary /root/ckb --pid 3310428 --slow --threshold 5000
╭───────────────── Slow Operations (threshold: 5000μs) ──────────────────╮
│ Timestamp     │ Op         │   Latency │     Size │ Note               │
├───────────────┼────────────┼───────────┼──────────┼────────────────────┤
│ 34:58.824     │ GET        │   6,046μs │   1.1 KB │                    │
│ 34:58.832     │ GET        │   7,641μs │     52 B │                    │
│ 34:58.839     │ GET        │   7,264μs │     52 B │                    │
│ 34:58.850     │ GET        │  10,692μs │     52 B │                    │
│ 34:58.864     │ GET        │   8,715μs │     52 B │                    │
│ 34:58.872     │ GET        │   7,291μs │     32 B │                    │
│ 34:58.877     │ GET        │   5,476μs │    240 B │                    │
│ 34:58.900     │ GET        │   7,280μs │     52 B │                    │
╰───────────────┴────────────┴───────────┴──────────┴────────────────────╯
  Showing 8 of 268 slow operations in last 3s.

The threshold is configurable (--threshold, default 1000μs).
Only operations whose latency exceeds the threshold are pushed to user space via PerfEventArray, so operations below the threshold generate no event traffic at all.

The table retains the latest 8 entries, while the footer shows the total count accumulated within the recent time window.

The Size column is especially useful because it directly reveals the payload behind a slow operation. In the example above, most of the slow GETs are very small reads (32–52 B), yet they take 5–10 ms. That strongly suggests the latency comes from block-cache miss → SST lookup / disk access, not from moving a large object.

This conclusion would not be possible from aggregate statistics alone.

In addition, these slow requests are concentrated on the same chain-service worker thread, which provides an entry point for follow-up diagnosis of lock contention or I/O bottlenecks.
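The "keep the latest 8, count them all" behavior of the slow-operation view can be sketched in user space as below. In ckb-probe the threshold filter runs in the kernel before anything reaches the PerfEventArray; here the filter, the SlowEvent fields, and the SlowLog type are all simplified assumptions, and the time window shown in the footer is omitted.

```rust
use std::collections::VecDeque;

/// One slow operation as user space might see it (illustrative fields).
#[derive(Clone, Debug, PartialEq)]
pub struct SlowEvent {
    pub op: &'static str,
    pub latency_us: u64,
    pub size_bytes: u64,
}

/// Keeps the most recent `capacity` slow events plus a running total.
pub struct SlowLog {
    threshold_us: u64,
    capacity: usize,
    recent: VecDeque<SlowEvent>,
    total: u64,
}

impl SlowLog {
    pub fn new(threshold_us: u64, capacity: usize) -> Self {
        SlowLog { threshold_us, capacity, recent: VecDeque::new(), total: 0 }
    }

    /// Record an operation; returns true if it exceeded the threshold.
    pub fn observe(&mut self, ev: SlowEvent) -> bool {
        if ev.latency_us <= self.threshold_us {
            return false;
        }
        self.total += 1;
        if self.capacity > 0 {
            if self.recent.len() == self.capacity {
                self.recent.pop_front(); // drop the oldest entry
            }
            self.recent.push_back(ev);
        }
        true
    }

    /// The retained events, oldest first.
    pub fn recent(&self) -> &VecDeque<SlowEvent> {
        &self.recent
    }

    pub fn footer(&self) -> String {
        format!("Showing {} of {} slow operations", self.recent.len(), self.total)
    }
}
```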


3.4 --json Mode — Machine-Readable Output

$ sudo ckb-probe rocksdb --binary /root/ckb --pid 3310428 --json
{
  "operations": {
    "GET": {
      "avg_us": 1684.68,
      "bytes_per_sec": 309017,
      "p50_us": 49.15,
      "p99_us": 12582.91,
      "qps": 496
    },
    "PUT": {
      "avg_us": 9.53,
      "bytes_per_sec": 2542,
      "p50_us": 6.14,
      "p99_us": 24.58,
      "qps": 78
    },
    "WRITE": {
      "avg_us": 44.55,
      "bytes_per_sec": null,
      "p50_us": 49.15,
      "p99_us": 98.3,
      "qps": 4
    },
    "ITER_NEW": {
      "avg_us": 29.62,
      "bytes_per_sec": null,
      "p50_us": 24.58,
      "p99_us": 49.15,
      "qps": 30
    },
    "TXN_COMMIT": {
      "avg_us": 108.7,
      "bytes_per_sec": 2542,
      "p50_us": 98.3,
      "p99_us": 196.61,
      "qps": 26
    }
  },
  "anomalies": [],
  "pid": 3310428,
  "timestamp": "2026-04-10T13:07:23Z",
  "uptime_secs": 2
}

One JSON object is emitted per sampling cycle, making it easy to pipe into tools such as jq or downstream systems like Prometheus, ELK, or Grafana.

Key fields:

  • timestamp — ISO8601 UTC timestamp
  • bytes_per_sec — throughput in bytes/sec; explicitly null for operations that are not instrumented for payload size
  • operations{} — aggregated metrics for all five operations: QPS / Avg / P50 / P99 / Bytes/s
  • anomalies[] — EWMA-detected latency spikes in the current cycle; empty during stable periods.
    When triggered, each anomaly entry includes:
    • operation
    • trigger
    • current_avg_us
    • baseline_avg_us
    • multiplier
    • current_p99_us
    • baseline_p99_us

This structure can be fed directly into alerting or observability pipelines.
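A note on reading p50_us and p99_us: the values in the sample are consistent with log2 bucket midpoints (49.15 μs is 1.5 × 2^15 ns, and 12582.91 μs is 1.5 × 2^23 ns), so they carry bucket-level rather than exact precision. The sketch below assumes, without confirmation from the report, that the collector returns the midpoint of the bucket containing the target rank; percentile_us is a hypothetical name.

```rust
/// Estimate a percentile from log2 bucket counts, where `hist[k]` counts
/// samples with latency in [2^k, 2^(k+1)) ns. Returns the midpoint of the
/// bucket that contains the target rank, in microseconds.
/// Only the first 64 buckets are considered.
pub fn percentile_us(hist: &[u64], p: f64) -> Option<f64> {
    let total: u64 = hist.iter().take(64).sum();
    if total == 0 {
        return None;
    }
    // Rank of the target sample, 1-based.
    let rank = ((p * total as f64).ceil() as u64).max(1);
    let mut seen = 0u64;
    for (k, &n) in hist.iter().take(64).enumerate() {
        seen += n;
        if seen >= rank {
            let mid_ns = 1.5 * (1u64 << k) as f64;
            return Some(mid_ns / 1000.0);
        }
    }
    None
}
```

Fed with the GET bucket counts from the --histogram sample (16, 54, 92, 239, ... starting at the 2μs bucket), this returns 49.152 for P50 and 12582.912 for P99, matching the table. Downstream consumers should therefore treat each percentile as accurate only to within its bucket, roughly a factor of two.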


3.5 EWMA Anomaly Detection — Three Status States + Compaction Storm Attribution

Anomaly detection is always enabled.
In default table mode, each sampling cycle prints a Status: line at the bottom. There are three possible states:

Warm-up

Status: ⏳ Warming up — Collecting baseline (174s remaining).

For the first 5 minutes after startup, the tool stays in a warm-up period and builds a per-operation EWMA rolling baseline (α = 0.05). No alerts are emitted during this stage, avoiding cold-start false positives.

Normal

Status: ✅ Normal — All latencies within baseline.

This indicates that all five operations remain within both the baseline-derived thresholds and the absolute caps.

Alert

⚠️  ANOMALY DETECTED [13:42:08]
  → GET [P99+CAP]  avg 1842.3μs (base 312.4μs, ×5.9)  p99 68341.2μs (base 12480.0μs)
  → Probable cause: Compaction storm (WRITE P99 = 4.7ms)
  → Run `ckb-probe rocksdb --slow` for slow operation details.

When a spike is detected, the tool prints:

  • the affected operation
  • the trigger reason (AVG / P99 / CAP, or any combination)
  • current vs baseline values
  • a probable root cause hint

If WRITE P99 > 1 ms, the alert additionally reports:

  • → Probable cause: Compaction storm

and suggests switching to --slow mode for detailed slow-request evidence.

The detector is designed to satisfy four practical goals simultaneously:

  • No false positives at startup
    due to the 5-minute warm-up
  • No false positives from transient jitter
    due to the absolute floor (50 μs) and the rule that the baseline is not updated during jitter spikes
  • No missed sustained degradation
    due to the absolute P99 hard-cap trigger
  • Low-QPS operations are still observable
    due to a fallback to a 5-second sliding window

Algorithm parameters and implementation details are documented in docs/technical-deep-dive.md.
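As a rough illustration of how these rules fit together, here is a minimal single-operation detector. The α = 0.05, 5-minute warm-up, AVG×5 / P99×3 multipliers, 50 μs floor, and WRITE P99 > 1 ms attribution rule come from this report; the hard-cap value, the way the floor is applied, and all type and function names are assumptions, and the low-QPS window fallback is left out. The authoritative version is the one in docs/technical-deep-dive.md.

```rust
#[derive(Debug, PartialEq)]
pub enum Status {
    WarmingUp { remaining_secs: u64 },
    Normal,
    Anomaly { avg: bool, p99: bool, cap: bool },
}

pub struct Detector {
    alpha: f64,
    warmup_secs: u64,
    avg_mult: f64,
    p99_mult: f64,
    floor_us: f64,
    p99_cap_us: f64,               // assumed to be configurable
    baseline: Option<(f64, f64)>,  // (avg_us, p99_us)
}

impl Detector {
    pub fn new(p99_cap_us: f64) -> Self {
        Detector {
            alpha: 0.05,
            warmup_secs: 300,
            avg_mult: 5.0,
            p99_mult: 3.0,
            floor_us: 50.0,
            p99_cap_us,
            baseline: None,
        }
    }

    /// EWMA update: baseline += alpha * (sample - baseline).
    fn learn(&mut self, avg_us: f64, p99_us: f64) {
        self.baseline = Some(match self.baseline {
            None => (avg_us, p99_us),
            Some((a, p)) => (a + self.alpha * (avg_us - a), p + self.alpha * (p99_us - p)),
        });
    }

    pub fn observe(&mut self, uptime_secs: u64, avg_us: f64, p99_us: f64) -> Status {
        if uptime_secs < self.warmup_secs {
            self.learn(avg_us, p99_us); // learn only, never alert
            return Status::WarmingUp { remaining_secs: self.warmup_secs - uptime_secs };
        }
        let (base_avg, base_p99) = match self.baseline {
            Some(b) => b,
            None => {
                self.learn(avg_us, p99_us);
                return Status::Normal;
            }
        };
        // Relative triggers are ignored below the absolute floor.
        let avg = avg_us > self.floor_us && avg_us > base_avg * self.avg_mult;
        let p99 = p99_us > self.floor_us && p99_us > base_p99 * self.p99_mult;
        let cap = p99_us > self.p99_cap_us;
        if avg || p99 || cap {
            Status::Anomaly { avg, p99, cap } // baseline frozen during a spike
        } else {
            self.learn(avg_us, p99_us);
            Status::Normal
        }
    }
}

/// Attribution hint: a WRITE P99 above 1 ms points at a compaction storm.
pub fn probable_cause(write_p99_us: f64) -> Option<&'static str> {
    if write_p99_us > 1000.0 { Some("Compaction storm") } else { None }
}
```

Freezing the baseline while an anomaly is active keeps a spike from teaching the detector that the spike is normal, and the absolute cap covers the opposite failure, where slow drift has already inflated the baseline.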


4. The Five Monitored Operations

| Op | RocksDB Function | CKB Call Path | Bytes/s | Source |
|---|---|---|---|---|
| GET | rocksdb_get_pinned_cf | Main read path (block / header / cell lookup) | :white_check_mark: | uretprobe reads PinnableSlice::size_ via bpf_probe_read_user(ret + 8) |
| PUT | rocksdb_transaction_put_cf | Single write inside a transaction | :white_check_mark: | entry probe reads vlen from ctx.arg::<usize>(5) |
| WRITE | rocksdb_write | Atomic WriteBatch commit | — | skipped because WriteBatch::rep_ depends on RocksDB version and libstdc++ ABI layout |
| ITER_NEW | rocksdb_create_iterator_cf | Range-scan entry point | — | no meaningful payload concept |
| TXN_COMMIT | rocksdb_transaction_commit | Transaction commit | :white_check_mark: | per-TID PUT_PENDING_BYTES accumulator snapshotted and reset on commit entry |

These five operations cover the main RocksDB I/O categories in a CKB node.
Three out of five also provide real bytes-per-second accounting, which is already sufficient to support operational diagnosis for scenarios such as:

  • slow synchronization
  • compaction storms
  • block-cache regressions

5. Currently Supported Commands

$ ckb-probe --help
Commands:
  check     Check environment and validate eBPF probes
  symbols   Analyse a CKB binary for uprobe-attachable symbols
  rocksdb   Monitor RocksDB operations on a live CKB node via eBPF

| Usage | Description |
|---|---|
| ckb-probe rocksdb --binary ./ckb --pid 1234 | Default real-time summary table |
| ... --histogram | Table + log2 latency histogram |
| ... --slow --threshold 5000 | Slow-operation table for operations above 5 ms |
| ... --json | One JSON object per cycle for pipeline integration |
| ... --interval 5 | Custom sampling interval |

6. Plan for Next Week (Merged Week 5–6 Planning)

Why Milestone 2 Was Effectively Completed Early This Week

After evaluating the amount of work originally scheduled for Week 6, I concluded that it would be difficult to finish within a single week.

The planned deliverables include:

  • a Docker dual-container topology
    (CKB testnet node + ckb-probe sidecar, including --privileged, /sys/kernel/debug mount, shared BTF, and automatic PID discovery)
  • env-check.sh
  • three demo scripts
  • a 48-hour endurance run
  • performance impact measurements across four dimensions:
    • CPU
    • memory
    • event loss
    • synchronization speed
  • two diagnostic case studies:
    • IBD write-pattern analysis
    • compaction latency spike analysis

Once Docker image debugging and testnet synchronization time are also factored in, the real effort is clearly more than one week.

For that reason, I moved the originally planned Week 5 EWMA anomaly detection task forward and completed it in Week 4, effectively closing out the Milestone 2 coding work one week earlier. This frees up two full weeks (Week 5 + Week 6) for the more complex Phase 3 deliverables.


Week 5 Plan

  • Midterm report preparation and submission
    Consolidate all deliverables from Weeks 2–4, including:

    • architecture diagrams
    • probe/map design decisions
    • explanations of the OpStats, SlowEvent, and EWMA data structures
    • reproducible validation steps
    • real sample analyses from --slow and --histogram
  • CLI polish (based on clap)
    Refactor parameter definitions using the clap derive API and complete:

    • #[command(about / long_about / after_help)]
    • #[arg(value_name / help / default_value)]
    • subcommand examples

    Also improve:

    • Ctrl+C terminal reset behavior
    • degraded rendering for narrow terminals
    • standardized error exit codes
  • Docker reproducible environment setup
    (moved forward from the original Week 6 plan)
    Build:

    • docker-compose.yml dual-container topology
      (CKB testnet node + ckb-probe sidecar)
    • one-click env-check.sh
    • three end-to-end demo scripts
      (default table / histogram / slow-op mode)
  • Implement 48h recording/report collection logic
    Add a long-run recording mode, tentatively ckb-probe rocksdb --record <dir> or a separate collector binary.

    It will periodically persist the following into time-series files (JSONL or Parquet, TBD):

    • OP_STATS
    • LATENCY_HIST
    • SLOW_EVENTS
    • EWMA anomalies[]

    In parallel, it will also sample:

    • host-side CPU and memory
      (/proc/<pid>/stat + status)
    • event loss rate
      (PerfEventArray lost counter)
    • CKB synchronization speed
      (differential get_tip_block_number via RPC)

    A lightweight aggregation script will also be provided to prepare the time-series dataset needed for the stability report.

  • Launch the 48-hour endurance test
    As soon as the Docker environment and recording logic are ready, start the long run immediately and persist all data automatically, so that Week 6 can focus on analysis and reporting rather than data collection.
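Two of the host-side samples in the recording plan are simple enough to sketch already. The helpers below are hypothetical and not part of ckb-probe: one extracts utime and stime (fields 14 and 15, in clock ticks) from the text of /proc/<pid>/stat, and the other turns two tip-block-number samples into a sync speed. Fetching the tip over RPC is left out.

```rust
/// Extract (utime, stime) in clock ticks from the contents of /proc/<pid>/stat.
pub fn parse_cpu_ticks(stat: &str) -> Option<(u64, u64)> {
    // The comm field (field 2) is parenthesised and may itself contain spaces
    // or parentheses, so parsing starts after the last ')'.
    let rest = &stat[stat.rfind(')')? + 1..];
    let fields: Vec<&str> = rest.split_whitespace().collect();
    // fields[0] is field 3 (state), so field N sits at index N - 3.
    let utime = fields.get(11)?.parse().ok()?;
    let stime = fields.get(12)?.parse().ok()?;
    Some((utime, stime))
}

/// Sync speed in blocks per second from two tip-number samples.
pub fn blocks_per_sec(prev_tip: u64, cur_tip: u64, elapsed_secs: f64) -> f64 {
    if elapsed_secs <= 0.0 {
        return 0.0;
    }
    cur_tip.saturating_sub(prev_tip) as f64 / elapsed_secs
}
```

CPU usage over an interval then follows from the difference between two (utime + stime) samples divided by the clock-tick rate and the elapsed time.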


Week 6 Plan

(original tasks plus any overflow from Week 5)

  • Complete the 48-hour endurance run and organize results
    • time-series charts
    • resource consumption summary
    • event fidelity report
    • latency distribution visualizations
  • Two RocksDB diagnostic case studies
    • IBD write-pattern analysis
    • compaction latency spike capture
  • Targeted optimization and hardening
    • tune CPU overhead
    • reduce memory overhead
    • reduce event loss
    • strengthen robustness based on endurance-test findings

7. Current Status Summary

All Milestone 2 objectives have now been achieved:

  • ckb-probe rocksdb runs stably on a testnet node
  • it outputs meaningful RocksDB performance data
  • it includes EWMA-based anomaly detection
  • it provides compaction storm attribution

By closing out the core coding work in Week 4, Weeks 5 and 6 can now be treated as a dedicated two-week window for the heavier Phase 3 deliverables, reducing the risk of schedule failure from trying to compress them into a single week.
