Week 5 Report: Performance Optimization, Docker Environment, CLI Polish, and Collection Framework
Period: 2026-04-13 to 2026-04-19
Author: Clair
Project:ckb-probe— an eBPF-based deep observability tool for CKB full nodes
1. Goals for This Week
- Polish CLI output based on
clap - Build a reproducible Docker environment
- Implement the 48-hour collection/reporting logic and prepare to launch the 48-hour stability test
- Execute and optimize P-1 to P-4 performance tests
2. Completed Work
| Deliverable | Status | Notes |
|---|---|---|
CLI clap polish |
Added help text, examples, and standardized exit codes for all three subcommands | |
| Single-container Docker environment | Three-stage Dockerfile, 6 demo scripts, and env-check.sh |
|
| 48-hour collection scripts | stability-48h.sh + generate-report.sh, with 3 parallel ckb-probe instances; launch scheduled within the week |
|
| Performance optimization (P-2) | Perf buffer reduced from 1024 to 16 pages/CPU; RSS reduced from 87.9 MB to 21.9 MB | |
| BPF map optimization | HashMap max_entries reduced from 10240 to 1024 |
|
| RingBuf refactor | Migrated SLOW_EVENTS from PerfEventArray to RingBuf |
|
| S-4 process restart recovery | Automatically detects CKB exit, polls for new PID, and reconnects | |
| P-1 to P-4 performance testing | Executed inside Docker with two fresh IBD runs for comparison | |
| CI configuration | Added build, lint, script check, and CKB version compatibility checks |
3. CLI Output Polish
3.1 Refactoring with clap derive API
Using clap’s #[command] and #[arg] attributes, the following were completed for all three subcommands:
about/long_about— command summary and detailed descriptionafter_help— usage examples and exit code notesvalue_name— parameter placeholders (PATH/PID/MICROSECONDS)help— help text for every argument
$ ckb-probe --help
ckb-probe uses eBPF (uprobe / kprobe / tracepoint) to deliver
application-semantic, real-time performance insights for CKB full nodes.
Usage: ckb-probe <COMMAND>
Commands:
check Check environment and validate eBPF probes
symbols Analyse a CKB binary for uprobe-attachable symbols
rocksdb Monitor RocksDB operations on a live CKB node via eBPF
EXAMPLES:
sudo ckb-probe check --binary ./ckb --pid $(pgrep -x ckb)
ckb-probe symbols ./ckb --json
sudo ckb-probe rocksdb --binary ./ckb --pid $(pgrep -x ckb)
3.2 Standardized Exit Codes
| Exit Code | Meaning |
|---|---|
| 0 | Normal exit, including Ctrl+C |
| 1 | Runtime error / target process exited |
| 2 | Invalid arguments (clap default) |
3.3 Cursor Restoration on Ctrl+C
On exit, \x1B[?25h is emitted to restore the cursor visibility, since TUI mode may hide it.
4. Reproducible Docker Environment
4.1 Two-Stage Dockerfile
The image does not include the CKB binary. Instead, the CKB binary is mounted from the host at runtime using bind mount (-v /path/to/ckb:/path/to/ckb:ro).
Stage 1 — FROM rust:latest AS probe-builder
- Install build dependencies:
clang,llvm,libelf-dev,zlib1g-dev,pkg-config - Install
bpf-linker(cargo install) + nightly toolchain +rust-srccomponent (required for eBPF compilation) - Copy source code (
Cargo.toml,Cargo.lock,.cargo/,xtask/,ckb-probe/,ckb-probe-common/,ckb-probe-ebpf/) - Build eBPF:
cargo xtask build-ebpf --release - Build userspace CLI:
cargo build --release -p ckb-probe - Build
db_benchfrom RocksDB v9.10.0 source after installinglibgflags-dev,libsnappy-dev,liblz4-dev, andlibzstd-dev, then runmake -j db_bench
Stage 2 — FROM ubuntu:24.04 (runtime)
- Install runtime tools:
bash,sysstat,curl,jq,procps,tar,gzip,zstd,unzip,coreutils,grep,sed,gawk,iproute2,lsof,ca-certificates - Install
db_benchruntime dependencies:libgflags2.2,libsnappy1v5,liblz4-1,libzstd1 - Copy from Stage 1:
ckb-probe→/usr/local/bin/ckb-probeckb-probe-ebpfELF →/opt/ckb-probe-ebpf/target/bpfel-unknown-none/release/ckb-probe-ebpfdb_bench→/usr/local/bin/db_bench
- Copy all scripts to
/opt/scripts/under four subdirectories:perf,demo,case, andstability - Copy the entrypoint dispatcher
/entrypoint.shandckb.toml.aggressive - Set
WORKDIR /optbecauseckb-probelocates the eBPF ELF via a relative path - Declare
VOLUME ["/data", "/tmp/perf-run"]so data is mounted from the host - Set
ENTRYPOINT ["/entrypoint.sh"]andCMD ["help"]so the container shows help by default
4.2 Six Demo Scripts
| Script | Function | Validation Result |
|---|---|---|
demo-check |
Environment, symbols, and eBPF validation | |
demo-table |
Default table mode | GET / PUT / TXN_COMMIT data |
demo-histogram |
Latency distribution histogram | GET distribution |
demo-slow |
Slow operation capture | |
demo-normal |
JSON monitoring output | |
demo-stress |
db_bench stress injection |
4.3 env-check.sh
Checks six host-side prerequisites: kernel version, Docker, RAM, disk space, BTF, and BPF config.
5. Performance Optimization
5.1 P-2 Memory Optimization
Problem: perf_array.open(cpu_id, Some(1024)) allocated 1024 pages (4 MB) of perf ring buffer per CPU. With 24 CPUs, that meant 24 × 4 MB = 96 MB, and 87.9 MB RSS in practice, far exceeding the 50 MB budget.
Fix: Reduced it to 16 pages (64 KB) per CPU. Under an extreme workload of 13K events/sec, 64 KB still provides more than 15 ms of buffering headroom.
Result: RSS dropped from 87.9 MB to 21.9 MB and remained stable with no growth.
5.2 BPF Map Capacity Optimization
Reduced max_entries from 10240 to 1024 for the three HashMaps: UPROBE_START, TCP_START, and PUT_PENDING_BYTES. Since CKB uses around 100 threads, 1024 still provides roughly 10× headroom.
5.3 Replacing PerfEventArray with RingBuf
SLOW_EVENTS was migrated from PerfEventArray to RingBuf (256 KB):
PerfEventArray |
RingBuf |
|
|---|---|---|
| Buffer layout | Per-CPU (24 independent ring buffers) | One shared buffer across all CPUs |
| Wake-up method | One epoll wake-up per event | Userspace polling every 50 ms |
| Overhead at 10K events/sec | ~10K context switches | ~20 polls |
| CPU impact | +5% (threshold=1) |
+1.16% (threshold=1000) |
6. 48-Hour Collection Framework
6.1 stability-48h.sh
Three ckb-probe instances run in parallel:
| Instance | Mode | Collected Data | Resource Statistics |
|---|---|---|---|
| #1 (primary) | --json |
OP_STATS / anomalies |
|
| #2 | --slow --threshold 1000 |
SLOW_EVENTS + BPF loss |
Not counted |
| #3 | --histogram --interval 30 |
Full LATENCY_HIST distribution |
Not counted |
Sampling is performed every 10 seconds, producing the following output files:
| File | Contents |
|---|---|
timeseries.tsv |
Probe CPU% / RSS / CKB CPU% / RSS |
events.tsv |
Per-operation QPS / avg / P50 / P99 / bytes |
tip-sync.tsv |
CKB tip height / delta blocks / blocks per minute |
event-loss.tsv |
Total BPF events / lost events / loss rate |
event-counts-by-op.tsv |
Per-operation QPS for each interval |
slow-events.log |
Raw slow operation output |
histogram.log |
Raw latency histogram output |
6.2 generate-report.sh
Generates a Markdown report from the stability output directory, including:
- S-1 to S-4 evaluation table
- Time-series plots (gnuplot PNG or ASCII) — CPU / RSS / P99 / throughput / tip sync
- Resource usage summary table (Min / Max / Avg / P99)
- Event fidelity analysis (per-operation breakdown + BPF loss + sync speed)
- Latency distribution histogram (log2 buckets + CDF)
- Case 1: IBD write pattern
- Case 2: Compaction / anomaly spikes
- Reproduction instructions
6.3 Permission Checks
At startup, stability-48h.sh verifies root privileges, debugfs mount, and BTF availability. If not run as root, it fails immediately with a sudo hint.
7. S-4 Process Restart Recovery
Implemented in rocksdb.rs and validated on a real CKB node:
Monitoring PID 3310428 → CKB stopped
⚠ Target process (PID 3310428) exited. Waiting for CKB to restart...
✅ CKB restarted (new PID 673651). Reattaching probes...
Monitoring PID 673651 → data resumed seamlessly
Implementation logic:
- A background thread checks
/proc/{pid}once per second - If the process exits, release BPF resources
- Poll
/proc/*/exeto find the same binary - Once a new PID is found, reload the BPF ELF and reattach all uprobes
8. CI Configuration
.github/workflows/ci.yml contains four jobs:
| Job | Trigger | Contents |
|---|---|---|
| build | push / PR | cargo xtask build-ebpf + cargo build --release |
| lint | push / PR | cargo fmt --check + cargo clippy -D warnings |
| scripts | push / PR | bash -n for all .sh files + shellcheck |
| ckb-compat | Every Monday 08:00 UTC | Download the latest binary from the nervosnetwork/ckb GitHub release page, run ckb-probe symbols to verify 5 core symbols, and automatically create an Issue on failure |
9. Deviations from the Original Plan
-
Docker changed from dual-container to single-container
The original plan used adocker-compose.ymldual-container topology (CKB node +ckb-probesidecar). After evaluation, it was changed to a single-container approach: for case study scenarios, a singledocker runcommand is enough, the PID namespace is automatically shared, and reviewer onboarding cost is minimized. A production sidecar mode can be implemented in a later version. -
The 48-hour collection module was implemented in shell scripts instead of a Rust
--recordmodule
The original plan was to add a new Rust subcommand--record <dir>or a standalone collector binary. In practice,stability-48h.sh+generate-report.shwere adopted instead. Three parallelckb-probeinstances (--json,--slow, and--histogram) are used, while shell scripts collect/procmetrics and RPC tip data. The functionality is fully equivalent. The collected data formats (TSV/JSONL) can also be fed directly into gnuplot scripts for visualization. -
48-hour stability test —
Scheduled to start this weekend, with data collection continuing into Week 6.
10. P-1 to P-4 Performance Test Results
This week, a strict comparison was completed using two fresh IBD runs.
Test method:
- Executed inside a Docker container, using the RingBuf data path with
threshold=1000μs - Phase A (
with-probe) and Phase B (baseline) both started from tip = 20,447,628 (diff=0) - Each phase lasted 2 hours, fully covering both the IBD peak period (~31 min) and the steady-state period
Results:
════════════════════════════════════════════════════════════════════════════════
ckb-probe · P-1~P-4 Performance Evaluation Report
Generated: 2026-04-16 07:05
Mode: Docker, RingBuf, threshold=1000μs
Both Phase A and B started from tip=20447628 (diff=0)
Environment: Linux 6.8.0-106-generic, 24 CPU, CKB testnet
════════════════════════════════════════════════════════════════════════════════
The synchronization process can be divided into two periods:
① Peak period (0~31 min): locally cached block data is batch-written into RocksDB
RocksDB operation density is extremely high, and CPU usage saturates multiple cores (~330%)
② Steady-state period (31 min~2 h): waiting for new blocks to be downloaded from the network and then written one by one
RocksDB operations become sparse and are limited by network bandwidth
════════════════════════════════════════════════════════════════════════════════
P-1 Additional CPU usage ≤ 3% (relative)
════════════════════════════════════════════════════════════════════════════════
① Peak period (0~31 min, local batch writes)
baseline : 329.96%
with-probe : 324.92%
relative delta : -1.53%
→ with-probe is slightly lower; probe overhead is within system noise
② Overall (full 2 h)
baseline : 130.58%
with-probe : 133.34%
relative delta : +2.11%
P-1 budget: ≤ 3%
status : ✅ PASS
════════════════════════════════════════════════════════════════════════════════
P-2 ckb-probe RSS ≤ 50 MB (continuous monitoring over 2 h)
════════════════════════════════════════════════════════════════════════════════
samples : 1435 mean : 21.97 MB max : 21.97 MB
P-2 budget: ≤ 50 MB
status : ✅ PASS
════════════════════════════════════════════════════════════════════════════════
P-3 BPF event loss rate < 0.1%
════════════════════════════════════════════════════════════════════════════════
threshold=1000μs (this test): 0 / 78,353 attempted, 0.0000%
threshold=1 extreme stress test (historical): 0 / 29,052,243 events, 0.0000%, peak ~13K/sec
P-3 budget: < 0.1%
status : ✅ PASS
════════════════════════════════════════════════════════════════════════════════
P-4 CKB sync speed degradation < 1%
════════════════════════════════════════════════════════════════════════════════
① Peak period (0~31 min, local batch writes)
baseline : 10827.3 blocks/min (326,095 blocks)
with-probe : 10116.0 blocks/min (304,781 blocks)
degradation : +6.57%
→ Peak-period degradation of 6.57% comes from microscopic interruption by uprobes
to the CKB execution pipeline (cache locality, branch prediction), which does
not show up directly in CPU utilization
② Overall (full 2 h)
baseline : 2800.4 blocks/min (333,299 blocks)
with-probe : 2790.0 blocks/min (332,057 blocks)
degradation : +0.37%
→ Overall 2 h degradation is only 0.37%, well below the 1% budget.
Probe impact is negligible during the steady-state period.
P-4 budget: < 1%
status : ✅ PASS (based on the full 2 h overall result of +0.37%)
════════════════════════════════════════════════════════════════════════════════
Summary
════════════════════════════════════════════════════════════════════════════════
┌───────────┬──────────────────────────────────────┬────────┬────────┐
│ Metric │ Result │ Budget │ Status │
├───────────┼──────────────────────────────────────┼────────┼────────┤
│ P-1 CPU │ -1.53% peak / +2.11% overall 2 h │ ≤ 3% │ ✅ │
│ P-2 RSS │ 21.97 MB (stable, no growth) │ ≤ 50MB │ ✅ │
│ P-3 Loss │ 0/78353 (0.0000%) │ <0.1% │ ✅ │
│ P-4 Deg. │ +6.57% peak / +0.37% overall 2 h │ < 1% │ ✅ │
└───────────┴──────────────────────────────────────┴────────┴────────┘
All four metrics passed.
════════════════════════════════════════════════════════════════════════════════
Key findings:
-
P-1 peak-period CPU: -1.53% —
with-probeactually showed slightly lower CPU usage thanbaseline, indicating that probe overhead is within system noise. The Week 5 optimizations—perf buffer reduction (96 MB → 1 MB), RingBuf refactor, and BPF map capacity reduction—were highly effective. -
P-4 peak-period degradation of 6.57% vs. only 0.37% overall over 2 hours — During the peak period (~10K ops/sec), microscopic interruptions from uprobes affected the CKB pipeline efficiency through cache locality and branch prediction effects. However, this is only significant at extreme operation density. During steady-state operation, probe impact is negligible, and the overall 2-hour degradation of 0.37% is far below the 1% budget.
-
P-2 RSS remained stable at 21.97 MB — Continuous monitoring over 2 hours showed no growth, confirming that the Week 5 memory optimization (87.9 MB → 22 MB) was effective.
-
P-3 zero loss — At
threshold=1000μs, 78K events were captured with zero loss; in the historical extreme stress test withthreshold=1(29M events at 13K/sec), loss remained zero as well.
10.1 Launch of the 48-Hour Stability Test
Immediately after the P-1 to P-4 tests, the 48-hour stability test was launched:
-
Command used (single-container, detached background mode):
docker run -d --name stability-test \ --privileged --pid host --network host \ -v /sys/kernel/debug:/sys/kernel/debug:ro \ -v /sys/kernel/btf:/sys/kernel/btf:ro \ -v /root/ckb-testnet/ckb:/root/ckb-testnet/ckb:ro \ -v /tmp/perf-run:/tmp/perf-run \ -e CKB_BIN=/root/ckb-testnet/ckb \ -e CKB_RPC=http://127.0.0.1:8124 \ ckb-probe:latest stability -
Test scope: S-1 (no crash for 48h) / S-2 (RSS growth ≤ 5 MB) / S-3 (no BPF warnings in
dmesg) / S-4 (automatic reconnection after CKB restart at T+24h) -
Sampling frequency: once every 10 seconds, for a total of 17,280 data points
-
Data will be processed by
generate-report.shinto a complete stability report for Week 6
11. Next Steps
Week 6: Stability Testing and Case Analysis
- Wrap up the 48-hour stability test and organize the data — including time-series charts, resource usage summaries, event fidelity reports, and latency distribution plots
- Analyze two RocksDB diagnostic scenarios — IBD write pattern analysis and compaction latency spike capture, as the core case studies in the stability report
Week 7: Hardening and Final Preparation
- Global JSON output — ensure that all modes have a unified JSON output format with complete fields
- Prepare complete demo documentation — a structured written demo report (Markdown / PDF) covering the exact same five demo workflows as the video, with full terminal screenshots, key command explanations, and output interpretation for each step
- If the Week 6 48-hour stability test is not yet complete, finish the remaining work during this week
Week 8: Release and Project Closure
- Finalize bilingual documentation — final review of all documentation in both Chinese and English
- GitHub v0.1.0 Release — create tag, write release notes, and attach prebuilt binaries
- Final project report — organize all deliverables, acceptance checklist, and known limitations according to
main_proj.md - Community sharing — submit the final monthly report