Spark Program | ckb-probe: Deep Observability Tool for CKB Nodes Based on Aya Kernel eBPF

ckb-probe: Deep Observability Tool for CKB Nodes Based on Aya Kernel eBPF

Project Proposal


I. Project Name and Summary

Project Name: ckb-probe

One-line Summary:
A deep diagnostics tool for CKB full nodes built on the Aya pure-Rust eBPF framework. Leveraging uprobe, kprobe, and tracepoint mechanisms, it traces CKB’s RocksDB storage, P2P networking, and syscall behavior directly in the kernel, delivering application-semantic, real-time performance insights.


II. Team / Individual Introduction

Applicant: Clair

2025 graduate in Computer Science, with nearly two years of commercial development experience.

Core Competencies:

  • Participated in eBPF system-observability projects at PLCT Lab (Institute of Software, Chinese Academy of Sciences), writing uprobe/kprobe probes with the Aya framework and practicing the complete workflow of writing, loading, and managing BPF programs, Map-based data exchange, and user-space data collection. Direct experience with BPF development in #![no_std] environments, verifier constraints, and kernel/user-space interaction.
  • Concurrently participated in a JAX non-intrusive debugging project focused on zero-intrusion observability and performance-overhead control.
  • Proficient in Rust systems programming (ownership, lifetimes, the trait system); familiar with the full Aya stack (aya-bpf + aya + aya-tool + aya-log) and the tokio async runtime.
  • Systematically studied CS:APP, with a complete knowledge framework covering processor architecture, virtual memory, ELF linking, and system-level I/O.
  • Worked as a frontend development engineer at a startup for nearly two years (2023.08–2025.09), contributing to the delivery of an OCR product’s frontend; also independently developed an OJ platform, demonstrating complete engineering delivery capability.


III. Problem Description

The CKB full node is a complex Rust program containing subsystems including a P2P network layer (tentacle), a consensus engine (NC-Max), a storage engine (RocksDB), a transaction pool, and CKB-VM. When nodes experience slow synchronization, abnormal memory growth, unstable peer connections, or RocksDB compaction storms, operators face multiple challenges:

Insufficient application-layer observability. Built-in metrics only cover high-level indicators (block height, peer count, etc.) and cannot tell you “RocksDB get average latency spiked from 5μs to 500μs.”

Semantic gap with generic tools. perf and strace output raw OS-level data (syscall numbers, file descriptors, addresses) and cannot distinguish whether a pwrite64 originates from RocksDB compaction or log writing.

No dedicated diagnostic tooling. Ethereum has deep Prometheus + Grafana integration and Solana has built-in validator metrics; the CKB ecosystem currently lacks comparable tools. Operators must rely on experience or seek help on Discord.

No verifiable sharing mechanism for diagnostic data. Operators can only share screenshots or verbal descriptions when seeking help (unverifiable). Mining pools lack cryptographically verifiable performance attestations to prove node quality.

Real-world impact: Undetected RocksDB degradation at mining pools can cause block production delays and economic losses. Core developers lacking fine-grained performance data must optimize by guesswork. New operators unable to self-diagnose increase community support burden.

Why eBPF: Zero intrusion (no CKB source modification), extremely low overhead (kernel-native speed + verifier safety checks), flexible precision (attach to any function entry/exit, enable/disable on demand).


IV. Solution

4.1 Core Approach

ckb-probe implements a pure-Rust full-stack eBPF application via the Aya framework, performing three-layer deep tracing of CKB nodes. The fundamental difference from generic tools: BPF programs understand CKB application semantics, outputting “RocksDB put operation took 23μs, wrote 512 bytes” rather than “pwrite64 syscall.”

4.2 Why Aya

  • Pure Rust full stack — both kernel-side BPF programs and user-space control programs are written in Rust, unified with CKB’s technology stack.
  • No C toolchain dependency — no clang/llvm/libelf required; compiles directly via rustc → LLVM → BPF target.
  • Type safety — Map key/value types are checked at compile time.
  • CO-RE support — BTF relocations enable cross-kernel-version execution.
  • Mature community — adopted by production projects such as Linkerd2-proxy.

4.3 Technical Architecture

The architecture is divided into a kernel-side BPF program layer and a user-space control program layer, communicating via eBPF Maps.

Kernel-side (aya-bpf, #![no_std] Rust): Three probe groups — RocksDB uprobe group (attached to rocksdb_get/put/write/delete/iter_seek C API functions), network kprobe group (attached to tcp_sendmsg/recvmsg/connect/close kernel functions), and syscall raw tracepoint group (attached to sys_enter/sys_exit).

Data channels: BPF HashMaps aggregate statistical data (user-space polls periodically); PerfEventArray/RingBuf transmit detailed events (user-space consumes asynchronously); BPF Arrays carry configuration parameters (target PID, thresholds, etc.).

User-space (aya + tokio + ratatui): A BPF lifecycle manager handles load/attach/detach; three independent Collectors (RocksDB/Network/Syscall) read Maps and perform secondary aggregation; an analysis engine runs anomaly detection and correlation analysis; the presentation layer provides CLI tables, a TUI dashboard, and an optional Prometheus Exporter.

4.4 Three-Layer Observability Model

Layer 1: RocksDB Storage (uprobe) — the project’s highest-value module. RocksDB C API symbols cross the FFI boundary, unaffected by Rust name mangling or inlining, making them stably available. Traces rocksdb_get (point query latency and hit rate), rocksdb_put (write latency and size), rocksdb_write (batch writes), rocksdb_delete (deletion frequency), and rocksdb_iter_seek (range queries).
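The entry/exit pairing that Layer 1 relies on can be sketched in plain user-space Rust. This is a minimal model, not the BPF program itself: in the kernel the state lives in a BPF HashMap and the key comes from bpf_get_current_pid_tgid; all names here are illustrative.

```rust
use std::collections::HashMap;

/// Simplified model of how rocksdb_get is timed: the uprobe records a
/// timestamp keyed by thread id, the uretprobe looks it up and computes
/// the latency. `tid` stands in for the kernel-side thread identity.
struct LatencyTracker {
    entry_ts: HashMap<u32, u64>, // tid -> entry timestamp (ns)
}

impl LatencyTracker {
    fn new() -> Self {
        Self { entry_ts: HashMap::new() }
    }

    /// Called from the uprobe (function entry).
    fn on_entry(&mut self, tid: u32, now_ns: u64) {
        self.entry_ts.insert(tid, now_ns);
    }

    /// Called from the uretprobe (function return); returns latency in ns,
    /// or None if there was no matching entry (e.g. probe attached mid-call).
    fn on_exit(&mut self, tid: u32, now_ns: u64) -> Option<u64> {
        self.entry_ts.remove(&tid).map(|t0| now_ns.saturating_sub(t0))
    }
}

fn main() {
    let mut t = LatencyTracker::new();
    t.on_entry(42, 1_000);
    let lat = t.on_exit(42, 24_000).unwrap();
    println!("rocksdb_get latency: {} ns", lat); // prints "rocksdb_get latency: 23000 ns"
    assert!(t.on_exit(42, 25_000).is_none()); // unmatched exits are dropped
}
```

The same pattern backs the ENTRY_TIMESTAMPS map described in the Phase 2 plan.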

Layer 2: P2P Network (kprobe primary + uprobe optional) — kprobes attach to kernel TCP stack functions, reading five-tuple information from struct sock and filtering by PID for the CKB process. Symbols are absolutely stable (kernel ABI). Optional tentacle framework uprobe for protocol-level information.

Layer 3: System Calls (raw tracepoint) — attached to sys_enter/sys_exit, counts frequency and latency by syscall number, with focus on I/O, network, synchronization, and memory categories.

4.5 Constraints and Mitigations

Requires root or CAP_BPF+CAP_PERFMON (minimum-privilege configuration guide provided); requires Linux 5.8+ (ckb-probe check auto-detects); no non-Linux support (CKB nodes run almost exclusively on Linux); uprobe context-switch overhead is negligible for low-frequency calls (~10K/sec).

4.6 Web5 Decentralized Diagnostic Data Sovereignty (opt-in)

Borrows three core Web5 concepts to address controlled sharing of diagnostic data:

DID Identity — based on the did:key method (Ed25519 key pair, purely local generation, no on-chain registration), providing operators with a pseudonymous identity. Reports from the same operator at different times can be confirmed as originating from the same entity.

Signed Diagnostic Reports — stores structured diagnostic snapshots following DWN data model principles. When sharing, generates a DID-private-key-signed report. Recipients can verify integrity but cannot forge it. Operators can choose to expose only partial data.

Verifiable Credentials (VC) — generates W3C VC-format node health attestations (e.g., “RocksDB P99 latency < 1ms over the past 24h”), usable for mining pools proving infrastructure quality or future peer reputation systems.

All Web5 features are strictly opt-in. Core eBPF functionality does not depend on any Web5 component.


V. Detailed Technical Implementation Plan

5.1 Phase 0: CKB Binary Symbol Reconnaissance

A Rust tool (goblin + rustc-demangle) comprehensively scans CKB binary symbols and grades them at three levels: Tier 1 (highly reliable) — RocksDB C API functions that cross the FFI boundary, unaffected by the Rust compiler, making them ideal uprobe targets; Tier 2 (possibly available) — Rust cross-crate public functions (e.g., tentacle connection management), name-mangled and potentially varying across versions; Tier 3 (uncertain) — crate-internal functions, not relied upon. Also confirms whether RocksDB is dynamically linked (.so symbols independently available — ideal) or statically linked.
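The grading rule can be sketched as a simple classifier over extracted symbol names. This is a hypothetical sketch: the real tool walks the ELF symbol table with goblin and demangles with rustc-demangle; the patterns, enum names, and the mangled example symbol below are assumptions.

```rust
/// Three-tier grading applied to each symbol name found in the CKB binary.
#[derive(Debug, PartialEq)]
enum Tier {
    T1Reliable,  // RocksDB C API: crosses FFI, stable across compiler versions
    T2Possible,  // mangled Rust cross-crate symbols (e.g. tentacle), may vary
    T3Uncertain, // crate-internal functions: not relied upon
}

fn grade(symbol: &str) -> Tier {
    if symbol.starts_with("rocksdb_") {
        Tier::T1Reliable
    } else if symbol.starts_with("_ZN") && symbol.contains("tentacle") {
        Tier::T2Possible
    } else {
        Tier::T3Uncertain
    }
}

fn main() {
    assert_eq!(grade("rocksdb_get"), Tier::T1Reliable);
    // Invented mangled name, for illustration only:
    assert_eq!(
        grade("_ZN8tentacle7service7Service6listen17h9f2d4a6b8c0e1f23E"),
        Tier::T2Possible
    );
    assert_eq!(grade("main"), Tier::T3Uncertain);
}
```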

5.2 Phase 1: Environment Setup and Baseline Verification

Set up the Aya development environment (Rust nightly + bpf-linker + bpftool + aya-tool) and complete four key verifications: (1) rocksdb_get uprobe + uretprobe latency measurement; (2) multiple RocksDB functions simultaneously uprobed; (3) tcp_sendmsg kprobe reading peer IP:Port; (4) sys_enter raw tracepoint counting syscall Top-N. All four verifications must be completed within the first two weeks to confirm the technical route is viable.

5.3 Phase 2: RocksDB Storage Layer Deep Tracing (Core Module)

BPF programs: Uprobe/uretprobe pairs for five operations, collecting latency, data size, and hit rate. Map architecture: ENTRY_TIMESTAMPS (HashMap, pairing uprobe/uretprobe), OP_STATS (PerCpuHashMap, aggregate statistics), LATENCY_HIST (PerCpuHashMap, log2-bucketed histogram), SLOW_EVENTS (PerfEventArray, sends only when exceeding threshold), CONFIG (Array, user-space configuration). User-space Collector: Periodically polls and merges per-CPU data, computes instantaneous rates, approximates P50/P90/P99 from histograms, and asynchronously consumes slow operation events.
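The log2-bucketed histogram and the percentile approximation described above can be sketched as follows. Illustrative only: the real LATENCY_HIST is a PerCpuHashMap that is merged across CPUs before this step; here bucket i holds samples with latency in [2^i, 2^(i+1)) microseconds.

```rust
const BUCKETS: usize = 27; // covers latencies up to ~2^27 us

/// floor(log2(latency)); latency 0 is clamped into bucket 0.
fn bucket(latency_us: u64) -> usize {
    (63 - latency_us.max(1).leading_zeros()) as usize
}

/// Approximate the p-th percentile (0.0..=1.0): walk the histogram and
/// report the upper edge of the bucket where the cumulative count
/// crosses p * total. Error is bounded by the bucket width.
fn percentile(hist: &[u64; BUCKETS], p: f64) -> u64 {
    let total: u64 = hist.iter().sum();
    let target = (p * total as f64).ceil() as u64;
    let mut cum = 0;
    for (i, &c) in hist.iter().enumerate() {
        cum += c;
        if cum >= target {
            return 1u64 << (i + 1); // upper bound of bucket i, in us
        }
    }
    0
}

fn main() {
    let mut hist = [0u64; BUCKETS];
    // 90 fast ops (~4us, bucket 2) and 10 slow ops (~600us, bucket 9):
    for _ in 0..90 { hist[bucket(4)] += 1; }
    for _ in 0..10 { hist[bucket(600)] += 1; }
    assert_eq!(percentile(&hist, 0.50), 8);    // P50 bounded by 8us
    assert_eq!(percentile(&hist, 0.99), 1024); // P99 lands in the slow bucket
}
```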

5.4 Phase 3: P2P Network Layer Tracing

kprobe (primary approach): tcp_sendmsg/recvmsg (traffic statistics), tcp_connect/inet_csk_accept (connection events), tcp_close (disconnection events), reading remote addresses from struct sock with PID filtering. Map architecture: PEER_STATS (LruHashMap, capacity 256, per-peer statistics), NET_GLOBAL (PerCpuArray, global traffic), CONN_EVENTS (PerfEventArray, connection/disconnection events). uprobe (optional enhancement): If tentacle symbols are available, attach to obtain protocol-level information.

5.5 Phase 4: Syscall Layer Tracing

sys_enter/sys_exit raw tracepoints with PID filtering, counting frequency and latency by syscall number. Map architecture: SYSCALL_ENTRY (HashMap, enter/exit pairing), SYSCALL_STATS (PerCpuHashMap, per-syscall aggregation), SYSCALL_LATENCY_HIST (PerCpuHashMap, latency histogram). The Collector computes characteristic metrics such as futex wait ratio, I/O efficiency, and epoll wake frequency.
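As one example of these characteristic metrics, the futex wait ratio can be derived from the per-syscall aggregates roughly like this (a sketch; the aggregate layout and names are assumptions based on the SYSCALL_STATS description, with per-CPU maps already merged).

```rust
use std::collections::HashMap;

/// Fraction of total syscall time spent blocked in futex, computed from
/// aggregates of the form: syscall name -> (call count, total time in ns).
fn futex_wait_ratio(stats: &HashMap<&str, (u64, u64)>) -> f64 {
    let total_ns: u64 = stats.values().map(|&(_, ns)| ns).sum();
    let futex_ns = stats.get("futex").map(|&(_, ns)| ns).unwrap_or(0);
    if total_ns == 0 { 0.0 } else { futex_ns as f64 / total_ns as f64 }
}

fn main() {
    let mut stats = HashMap::new();
    stats.insert("futex", (1_000u64, 700_000_000u64));     // 0.7 s blocked
    stats.insert("pwrite64", (5_000u64, 200_000_000u64));
    stats.insert("epoll_wait", (3_000u64, 100_000_000u64));
    let r = futex_wait_ratio(&stats);
    assert!((r - 0.7).abs() < 1e-9);
    // A ratio above ~0.7, combined with a futex-frequency spike, is what
    // the lock-contention heuristic in Phase 6 keys on.
}
```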

5.6 Phase 5: User-Space Control Program and TUI Dashboard

Project structure: ckb-probe-ebpf/ (BPF programs, #![no_std]), ckb-probe-common/ (shared type definitions), ckb-probe/ (user-space control program with collectors, analysis, ui, identity, and report modules), xtask/ (build helpers).

CLI subcommands (clap v4): check (environment check), symbols (symbol analysis), rocksdb (RocksDB monitoring), net (P2P monitoring), syscall (syscall analysis), overview (TUI dashboard), identity (DID management), report (signed reports), verify (report verification), export (Prometheus, optional).

TUI dashboard (ratatui + crossterm): Elm-like architecture with four-tab switching (Overview/RocksDB/Network/Syscall). Overview uses a four-quadrant layout (P2P Network / RocksDB / Syscall Top-5 / Event Log); RocksDB panel includes stats table + histogram + slow operation log; Network panel includes peer list + connection events; Syscall panel includes ranking table + latency histogram.

5.7 Phase 6: Anomaly Detection and Correlation Analysis

RocksDB latency spike detection — EWMA baseline (α=0.3); alerts when last 10s average latency exceeds 5× baseline. Peer connection stability detection — alerts when disconnections exceed 10/min in a 1-minute window; tracks frequently reconnecting peers. Lock contention inference — alerts when futex frequency exceeds 3× baseline and WAIT proportion > 70%. I/O bottleneck inference — correlates pwrite64/fdatasync latency spikes with simultaneous RocksDB write latency spikes. First 5 minutes after startup serve as a baseline collection period with no alerts.
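The EWMA baseline detector described above can be sketched like this: a minimal model assuming one sample per 10-second window, with the warm-up count standing in for the 5-minute baseline collection period. Names are illustrative.

```rust
/// EWMA-baseline spike detector: alert when a window's average latency
/// exceeds `multiplier` times the learned baseline.
struct SpikeDetector {
    baseline: f64,
    alpha: f64,      // EWMA smoothing factor (0.3 in the plan above)
    multiplier: f64, // alert threshold (5x in the plan above)
    warmup_left: u32,
}

impl SpikeDetector {
    fn new(alpha: f64, multiplier: f64, warmup: u32) -> Self {
        Self { baseline: 0.0, alpha, multiplier, warmup_left: warmup }
    }

    /// Feed one window's average latency; returns true if this sample
    /// should raise an alert. Alerting samples do NOT update the
    /// baseline, so a sustained spike cannot poison it.
    fn observe(&mut self, avg_latency_us: f64) -> bool {
        if self.warmup_left > 0 {
            self.warmup_left -= 1;
            self.baseline = if self.baseline == 0.0 {
                avg_latency_us // seed the baseline with the first sample
            } else {
                self.alpha * avg_latency_us + (1.0 - self.alpha) * self.baseline
            };
            return false; // no alerts during the baseline period
        }
        if avg_latency_us > self.multiplier * self.baseline {
            return true;
        }
        self.baseline = self.alpha * avg_latency_us + (1.0 - self.alpha) * self.baseline;
        false
    }
}

fn main() {
    let mut d = SpikeDetector::new(0.3, 5.0, 3);
    assert!(!d.observe(4.7)); // warm-up
    assert!(!d.observe(5.1));
    assert!(!d.observe(4.5));
    assert!(!d.observe(6.0));  // within 5x baseline: no alert
    assert!(d.observe(487.3)); // compaction storm: well over 5x baseline
}
```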

5.8 Phase 6b: Web5 Identity and Signed Reports

identity generate creates an Ed25519 key pair and did:key DID, stored locally (file permissions 0600). report extracts statistical summaries from Collector historical data, builds a JSON report signed with the Ed25519 private key (JWS format), with optional W3C VC format output. verify parses did:key to obtain the public key and verifies signature integrity. Depends only on ed25519-dalek and multibase — two lightweight crates.
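Under this design, a signed report might look like the following JSON sketch. Every field name is an illustrative assumption rather than a final schema; the DID is the well-known Ed25519 example value from the did:key specification, and the JWS string is a placeholder.

```
{
  "report_version": "0.1",
  "issuer_did": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
  "window": { "from": "2026-04-14T02:00:00Z", "to": "2026-04-15T02:00:00Z" },
  "claims": {
    "rocksdb_get_p99_us": 18.5,
    "rocksdb_put_p99_us": 45.2,
    "uptime_ratio": 0.999
  },
  "signature": {
    "alg": "EdDSA",
    "jws": "eyJhbGciOiJFZERTQSJ9..placeholderDetachedSignature"
  }
}
```

verify would resolve the public key from issuer_did and check the JWS over the canonicalized report body.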

5.9 Phase 7: Prometheus Exporter (Bonus)

ckb-probe export starts an HTTP server (default port 9190) exposing RocksDB/network/syscall/alert metrics in standard Prometheus text format, with a pre-configured Grafana Dashboard JSON template.
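For orientation, the exporter's output might look like the fragment below. The metric names are illustrative assumptions; the HELP/TYPE/label structure is the standard Prometheus text exposition format.

```
# HELP ckb_probe_rocksdb_latency_us RocksDB operation latency in microseconds
# TYPE ckb_probe_rocksdb_latency_us summary
ckb_probe_rocksdb_latency_us{op="get",quantile="0.5"} 3.2
ckb_probe_rocksdb_latency_us{op="get",quantile="0.99"} 18.5
# HELP ckb_probe_rocksdb_ops_total RocksDB operations observed
# TYPE ckb_probe_rocksdb_ops_total counter
ckb_probe_rocksdb_ops_total{op="get"} 1074213
ckb_probe_rocksdb_ops_total{op="put"} 283911
```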


VI. Expected Deliverables

6.1 Core Deliverables

  1. ckb-probe CLI tool v0.1.0 — subcommands: check (environment check), symbols (symbol analysis), rocksdb (RocksDB monitoring with --histogram/--slow/--json). Built-in basic RocksDB latency spike anomaly detection (EWMA baseline + threshold alerting).

  2. ckb-probe-ebpf BPF probe collection — RocksDB uprobe/uretprobe (5 pairs), CO-RE cross-kernel compatible, BPF ELF embedded in the user-space binary.

  3. CKB symbol analysis tool and report — complete tiered symbol availability report; ckb-probe symbols is reusable.

  4. Bilingual documentation (Chinese/English) — README, INSTALL, USAGE, and ARCHITECTURE each in both languages.

  5. 48h stability test report — see §6.5 for details.

  6. Docker-based reproducible test environment — see §6.3 for details.

  7. Pre-recorded full demo video — see §6.3 for details.

6.2 Concrete Output Examples

To make the tool’s output tangible for reviewers, the following shows what the terminal output of ckb-probe rocksdb looks like under representative scenarios: normal operation, a latency spike, and machine-readable JSON output.

Scenario A — Normal Operation

$ sudo ckb-probe rocksdb --pid 18920

╭──────────────── CKB RocksDB Monitor (PID: 18920) ────────────────╮
│ Uptime: 00:05:32   Sampling: 1s   Node: CKB v0.119.0            │
├────────────┬───────┬─────────┬─────────┬─────────┬───────────────┤
│ Operation  │  QPS  │ Avg(μs) │ P50(μs) │ P99(μs) │    Bytes/s    │
├────────────┼───────┼─────────┼─────────┼─────────┼───────────────┤
│ GET        │ 3,241 │     4.7 │     3.2 │    18.5 │     1.2 MB/s  │
│ PUT        │   856 │    12.3 │     9.1 │    45.2 │   420.0 KB/s  │
│ WRITE      │   128 │    38.7 │    28.4 │   112.0 │     1.8 MB/s  │
│ DELETE     │    42 │     5.1 │     4.0 │    15.3 │         —     │
│ ITER_SEEK  │   215 │     8.9 │     6.5 │    32.1 │         —     │
╰────────────┴───────┴─────────┴─────────┴─────────┴───────────────╯
  Status: ✅ Normal — All latencies within baseline.

All five operation types are reported with per-second throughput, average latency, and percentile distribution. No anomaly is detected; the status line confirms healthy operation.

Scenario B — RocksDB Latency Spike (Compaction Storm)

$ sudo ckb-probe rocksdb --pid 18920

╭──────────────── CKB RocksDB Monitor (PID: 18920) ────────────────╮
│ Uptime: 02:17:45   Sampling: 1s   Node: CKB v0.119.0            │
├────────────┬───────┬─────────┬─────────┬──────────┬──────────────┤
│ Operation  │  QPS  │ Avg(μs) │ P50(μs) │ P99(μs)  │    Bytes/s   │
├────────────┼───────┼─────────┼─────────┼──────────┼──────────────┤
│ GET        │ 1,102 │   487.3 │   312.5 │  2,841.0 │   389.0 KB/s │
│ PUT        │   312 │   892.1 │   645.0 │  5,120.0 │   156.0 KB/s │
│ WRITE      │    45 │ 2,341.5 │ 1,890.0 │  8,900.0 │   890.0 KB/s │
│ DELETE     │    18 │    52.3 │    38.0 │    210.5 │         —    │
│ ITER_SEEK  │    67 │   125.4 │    89.0 │    560.3 │         —    │
╰────────────┴───────┴─────────┴─────────┴──────────┴──────────────╯

⚠️  ANOMALY DETECTED [02:17:43]
  → GET avg latency 487μs exceeds 5× baseline (4.7μs → 487μs)
  → PUT avg latency 892μs exceeds 5× baseline (12.3μs → 892μs)
  → Probable cause: Compaction storm (WRITE P99 = 8.9ms)
  → Run `ckb-probe rocksdb --slow` for slow operation details.

The EWMA-based anomaly detection fires automatically when the 10-second moving average exceeds 5× the learned baseline, and provides actionable next steps.

Scenario B (continued) — Slow Operation Log

$ sudo ckb-probe rocksdb --pid 18920 --slow --threshold 100

╭────────────── Slow Operations (threshold: 100μs) ──────────────╮
│ Timestamp       │ Op    │ Latency │ Size   │ Note              │
├─────────────────┼───────┼─────────┼────────┼───────────────────┤
│ 02:17:41.023    │ WRITE │ 8,912μs │ 4.2 MB │ batch write       │
│ 02:17:41.891    │ PUT   │ 5,120μs │ 2.1 KB │                   │
│ 02:17:42.103    │ GET   │ 2,841μs │  512 B │                   │
│ 02:17:42.445    │ WRITE │ 7,230μs │ 3.8 MB │ batch write       │
│ 02:17:43.012    │ GET   │ 1,923μs │  256 B │                   │
╰─────────────────┴───────┴─────────┴────────┴───────────────────╯
  Showing 5 of 847 slow operations in last 60s.

Scenario C — JSON Machine-Readable Output

$ sudo ckb-probe rocksdb --pid 18920 --json

{
  "timestamp": "2026-04-15T02:17:45Z",
  "pid": 18920,
  "uptime_secs": 8265,
  "operations": {
    "GET":       { "qps": 1102, "avg_us": 487.3, "p50_us": 312.5, "p99_us": 2841.0, "bytes_per_sec": 398336 },
    "PUT":       { "qps": 312,  "avg_us": 892.1, "p50_us": 645.0, "p99_us": 5120.0, "bytes_per_sec": 159744 },
    "WRITE":     { "qps": 45,   "avg_us": 2341.5,"p50_us": 1890.0,"p99_us": 8900.0, "bytes_per_sec": 911360 },
    "DELETE":    { "qps": 18,   "avg_us": 52.3,  "p50_us": 38.0,  "p99_us": 210.5,  "bytes_per_sec": null   },
    "ITER_SEEK": { "qps": 67,   "avg_us": 125.4, "p50_us": 89.0,  "p99_us": 560.3,  "bytes_per_sec": null   }
  },
  "anomalies": [
    { "time": "02:17:43", "type": "latency_spike", "operation": "GET", "current_avg_us": 487.3, "baseline_avg_us": 4.7, "multiplier": 103.7 },
    { "time": "02:17:43", "type": "latency_spike", "operation": "PUT", "current_avg_us": 892.1, "baseline_avg_us": 12.3, "multiplier": 72.5 }
  ]
}

JSON output enables piping into downstream tools (jq, monitoring pipelines, automated alerting scripts).

6.3 Reproducible Verification Environment

To allow reviewers to verify all deliverables without prior eBPF expertise, the following verification assets are provided:

Docker-Based Reproducible Environment. A docker-compose.yml launches a complete test environment with a single command:

git clone https://github.com/xxxx/ckb-probe.git
cd ckb-probe
docker compose up --build

The compose file provisions two containers: (a) a CKB testnet full node (official image, pre-configured for testnet sync), and (b) a ckb-probe sidecar (built from source, with --privileged and /sys/kernel/debug mounted for BPF access). On startup, the sidecar auto-detects the CKB PID and begins RocksDB tracing. Scripted demo scenarios are included:

  • demo-normal.sh — lets the node sync for 5 minutes, captures normal-state output, and writes a JSON snapshot.
  • demo-stress.sh — injects synthetic RocksDB load via db_bench running in the same container (writing 100K entries in a burst), triggering latency spike detection and slow operation logging.
  • demo-check.sh — runs ckb-probe check and ckb-probe symbols to display environment and symbol reports.

All demo scripts print expected vs. actual output and return exit code 0 on success, making them usable as acceptance smoke tests.

Minimum host requirements: Linux host with kernel ≥ 5.8, Docker ≥ 20.10, docker-compose ≥ 2.0, 4 GB free RAM, 20 GB free disk. An env-check.sh script validates these prerequisites before launching.

Pre-Recorded Full Demo Video. A 10–15 minute narrated screen recording (uploaded to YouTube and mirrored to a CDN-hosted MP4) walks through: (1) environment check, (2) symbol analysis, (3) live RocksDB monitoring during normal sync, (4) anomaly detection triggering during a synthetic compaction burst, (5) slow operation log and JSON export. This serves as a fallback for reviewers who cannot run Docker locally.

6.4 Acceptance Criteria

The project defines three categories of acceptance criteria. All criteria are objectively measurable.

Functional Verification Checklist:

  • F-1: ckb-probe check correctly reports kernel version, BTF availability, BPF capability status, and provides actionable messages for any missing prerequisites.
  • F-2: ckb-probe symbols <ckb-binary> produces a tiered symbol availability report covering Tier 1/2/3 classification and RocksDB linking method detection.
  • F-3: ckb-probe rocksdb --pid <PID> attaches to a running CKB process and outputs a real-time metrics table refreshing at 1-second intervals.
  • F-4: All 5 RocksDB operations (GET/PUT/WRITE/DELETE/ITER_SEEK) are traced with per-operation QPS, average latency, P50/P99 latency, and throughput.
  • F-5: --slow --threshold <N> mode captures and displays individual operations exceeding the specified latency threshold in microseconds.
  • F-6: --histogram mode displays a log2-bucketed latency distribution.
  • F-7: EWMA-based anomaly detection triggers an alert within 15 seconds of a synthetic latency spike (verified via demo-stress.sh).
  • F-8: --json mode produces valid JSON parseable by jq with no errors.
  • F-9: Graceful shutdown on SIGINT/SIGTERM: BPF programs are detached cleanly, no orphaned probes remain (verified via bpftool prog list after exit).
  • F-10: Graceful handling of CKB process exit during monitoring: ckb-probe emits an informative message and exits without panic.

Performance Overhead Thresholds:

  • P-1: Additional CPU usage ≤ 3% relative to baseline CKB node operation (measured as the difference in average CPU% over a 1-hour window with and without ckb-probe attached).
  • P-2: ckb-probe process RSS memory usage ≤ 50 MB under sustained monitoring.
  • P-3: BPF event loss rate < 0.1% under sustained 10K events/sec (reported by PerfEventArray lost event counter).
  • P-4: CKB block sync speed degradation < 1% (measured as blocks-synced-per-minute over a 2-hour IBD window, with vs. without ckb-probe).

Stability Benchmarks:

  • S-1: 48-hour continuous operation without crash, panic, or restart.
  • S-2: ckb-probe RSS memory growth over 48 hours ≤ 5 MB (no memory leak).
  • S-3: Zero kernel warnings or BPF-related dmesg errors during the 48-hour run.
  • S-4: Successful recovery when the monitored CKB process is restarted during monitoring (ckb-probe detects the exit and can be relaunched on the new PID).

6.5 48-Hour Stability Test Report Content

The 48-hour test report is a self-contained PDF/Markdown document that allows external reviewers to fully assess results without running the tool themselves. It includes:

Time-series metric charts (sampled every 10 seconds for 48 hours): ckb-probe CPU usage (%), ckb-probe RSS memory (MB), CKB node CPU usage (with vs. without probe), RocksDB GET/PUT/WRITE P99 latency over time, and BPF event throughput (events/sec).

Resource consumption summary table: Min/Max/Avg/P99 for CPU%, memory, and event rate, with explicit comparison against the P-1 through P-4 thresholds defined above.

Event capture fidelity report: Total events generated vs. total events captured, per-operation breakdown, lost event count and percentage.

Write latency distribution charts: Full log2-bucketed histograms for each of the 5 operations, aggregated over the entire 48-hour window, presented both as bar charts and cumulative distribution functions.

Two annotated diagnostic case studies: (1) IBD write pattern analysis — captures the RocksDB write amplification pattern during Initial Block Download, showing how PUT/WRITE throughput and latency evolve as the chain grows. (2) Compaction latency spike capture — either a naturally occurring spike during the 48-hour run or a synthetically induced one (via demo-stress.sh), with before/during/after latency charts and the corresponding anomaly alert output annotated with timestamps.

Reproduction instructions: Exact kernel version, CKB version, hardware specs, Docker image tags, and the single command to reproduce the full 48-hour test.

6.6 Bonus Deliverables (Planned for Subsequent Versions)

P2P network layer probes + ckb-probe net; syscall layer probes + ckb-probe syscall; TUI interactive dashboard (ckb-probe overview); anomaly correlation analysis (lock contention / I/O bottleneck inference); Web5 diagnostic data sovereignty toolkit (identity/report/verify); Prometheus exporter + Grafana Dashboard JSON template. All of the above have been fully designed in this proposal and are planned for subsequent versions.


VII. Funding Request and Usage

Total Requested: 1,000 USD

Payment Method: 100% CKB

Category | Amount | Description
Cloud Server | $350 | 1 VPS (Linux 5.15+ kernel, ≥4 cores / 8 GB RAM), serving both as development/build machine and as a CKB testnet full node; 8 weeks of usage.
Developer Stipend | $450 | Core development work; estimated 20–30 hours per week, 8 weeks total.
Docs & Community | $200 | Bilingual documentation, architecture diagrams, materials for 2 monthly demo sessions, and the final project report.

VIII. Estimated Completion Timeline

Total Duration: 8 weeks (approximately 2 months)

Stage 1: Research and Feasibility Verification (Week 1–3)

Week 1: CKB source code architecture study (P2P/storage/sync layer call chain analysis) + Aya framework deep learning (implement 2–3 official examples) + development environment setup + CKB testnet node deployment.

Week 2: Comprehensive CKB binary symbol reconnaissance — scan official Release and self-compiled versions, analyze RocksDB linking method, generate tiered report. Implement ckb-probe symbols subcommand.

Week 3: Four eBPF feasibility verifications (RocksDB uprobe latency measurement / multi-function uprobe / TCP kprobe / sys_enter tracepoint). Implement ckb-probe check subcommand.

:bookmark: Milestone 1 (end of Week 3): Feasibility verification complete. All three BPF program types verified on the CKB process. check and symbols subcommands delivered.

Stage 2: RocksDB Core Probe Development (Week 4–5)

Week 4: Complete RocksDB BPF probe implementation (5 operation uprobe/uretprobe pairs + OP_STATS/LATENCY_HIST/SLOW_EVENTS Maps + verifier tuning) + RocksDbCollector.

Week 5: Complete ckb-probe rocksdb subcommand (table/histogram/slow log/JSON modes) + basic RocksDB latency spike anomaly detection (EWMA baseline + N× threshold alerting).

:bookmark: Milestone 2 (end of Week 5): ckb-probe rocksdb operational on a CKB testnet node, producing meaningful RocksDB performance data with basic anomaly detection. Mid-term report submitted.

Stage 3: Testing, Optimization, and Reproducible Environment (Week 6–7)

Week 6: Build Docker-based reproducible test environment (docker-compose.yml + three demo scripts + env-check.sh). Begin 48-hour stability test. Quantitative performance overhead assessment (CPU/memory/event loss/sync speed). Two RocksDB diagnostic scenario case analyses (IBD write pattern analysis + compaction latency spike capture).

Week 7: Complete 48-hour stability test report (time-series charts, resource consumption summary, event fidelity report, latency distribution charts, annotated case studies). Targeted optimization (CPU/memory/event loss) + robustness hardening (process exit handling / permission prompts / signal handling) + global JSON output + CI configuration. Record full demo video (10–15 min narrated walkthrough).

Stage 4: Release and Project Closure (Week 8)

Week 8: Bilingual documentation + GitHub v0.1.0 Release + community presentation + final project report.

:bookmark: Milestone 3 (end of Week 8): All deliverables submitted — including CLI tool, BPF probes, symbol analysis tool, Docker test environment, demo video, 48h test report, and bilingual documentation. Project closed.

P2P network layer probes (original Week 6), syscall layer probes (original Week 7), TUI dashboard (original Week 8), Web5 identity and signed reports (original Week 9) are moved to subsequent version plans.

Timeline Overview

Stage | Weeks | Focus | Milestone
Stage 1: Research & Verification | Week 1–3 | CKB/Aya deep research, symbol analysis, 4 eBPF feasibility verifications | :white_check_mark: Milestone 1
Stage 2: RocksDB Core Probes | Week 4–5 | RocksDB uprobe + ckb-probe rocksdb CLI + basic anomaly detection | :white_check_mark: Milestone 2
Stage 3: Testing, Optimization & Reproducible Environment | Week 6–7 | Docker environment, 48h stability test, optimization, demo video |
Stage 4: Release & Closure | Week 8 | Docs, release, community, final report | :white_check_mark: Milestone 3

IX. Relevance to the CKB Ecosystem

Fills a tooling gap — the first system-level diagnostic tool in the CKB ecosystem that understands application semantics, with output directly corresponding to the issues operators care about.

Serves multiple ecosystem roles — mining pool operators monitor RocksDB and network health; core developers obtain fine-grained performance data to guide optimization; new operators self-diagnose via check and overview.

Pure Rust stack alignment — community Rust developers can contribute without friction, with no need to learn C/Python to modify BPF logic.

Embodies self-sovereignty — all analysis is performed locally with no dependency on centralized monitoring services. Web5 DID + VC extends the “don’t trust, verify” ethos into the operational domain, laying groundwork for a decentralized peer reputation system.

Technical frontier — building a blockchain node diagnostic tool with Aya pure-Rust eBPF is a frontier endeavor. The architecture is extensible to other Rust-based blockchain nodes (Reth, Substrate, etc.).

Open-source and extensible — clear extension interfaces are reserved for future additions such as CKB-VM tracing, tentacle protocol decoding, Column Family-level tracing, predictive alerting, and on-chain report anchoring.


X. Technical Risks and Mitigations

Risk | Impact | Probability | Mitigation
CKB binary stripped, no RocksDB uprobe symbols | High | Medium | Confirm in Week 2; check dynamic linking first (.so symbols independently available); provide a symbol-preserving compilation guide; worst case, degrade to kprobe + tracepoint
BPF verifier rejects probes | Medium | Medium | Aya’s safety abstractions greatly reduce verifier issues; common problems have known solutions; simplify BPF logic if necessary
Kernel version < 5.8 | Medium | Low | Fall back from RingBuf to PerfEventArray (available since 4.15+); check auto-detects
TCP stack function parameter layout changes across kernels | Medium | Low | CO-RE + BTF type relocations auto-adapt
Tentacle uprobe symbols unpredictable | Medium | High | Not a dependency by design; P2P uses kprobes on the kernel TCP stack as the primary path, tentacle uprobes are an optional enhancement only
Probe performance impact exceeds expectations | Medium | Low | Quantitative assessment in Week 6; reduce hook points, raise thresholds, or implement a --lightweight mode if needed
Web5 features add complexity | Low | Medium | Fully opt-in and decoupled; uses only two lightweight crates; can downgrade to signed reports only if time is tight
Docker privileged mode required for BPF | Low | Low | Documented in the INSTALL guide; env-check.sh validates prerequisites; the demo video serves as a fallback for reviewers who cannot run Docker

XI. Transparency Commitments

  • Fully open-source code — public development on GitHub from Day 1, MIT OR Apache-2.0 dual license.
  • Weekly public updates — progress updates posted weekly in the Discord Spark Program channel.
  • Monthly demo sessions — 2 sessions (Weeks 4 and 8), including a live demo and Q&A at the final session.
  • Authentic test data — all data from actual CKB testnet node operation, with complete reproduction steps.
  • Honest limitation reporting — the final report explicitly documents limitations and unimplemented features.
  • Reproducible verification — the Docker test environment and demo video ensure any reviewer can independently verify deliverables.



ckb-probe:基于 Aya 内核 eBPF 的 CKB 节点深度可观测性工具

项目提案书


一、项目名称与简介

项目名称: ckb-probe

一句话简介:
基于 Aya 纯 Rust eBPF 框架的 CKB 全节点深度诊断工具。利用 uprobe、kprobe、tracepoint 三重机制,在内核中高效追踪 CKB 的 RocksDB 存储、P2P 网络与系统调用行为,提供应用语义级的实时性能洞察。


二、团队/个人介绍

申请人: Clair

计算机科学与技术 2025 届毕业生,近两年商业开发经验。

核心能力: 在 PLCT Lab(中科院软件所)参与 eBPF 系统观测项目,使用 Aya 框架编写 uprobe/kprobe 探针,实践了 BPF 程序编写、加载、Map 数据交换及用户态采集的完整流程,对 #![no_std] 环境下的 BPF 开发、verifier 约束、内核态/用户态交互有直接经验。同期参与 JAX 非侵入式调试项目,关注零侵入观测与性能开销控制。具备 Rust 系统编程能力(所有权、生命周期、trait 系统),熟悉 Aya 全栈开发(aya-bpf + aya + aya-tool + aya-log)及 tokio 异步运行时。系统学习过 CS:APP,对处理器架构、虚拟内存、ELF 链接、系统级 I/O 有完整知识框架。在一家初创科技公司任前端工程师近两年(2023.08–2025.09),参与交付一个 OCR 识别项目的前端;另独立开发了一个 OJ Platform,先后参与多个商业产品,具备完整工程交付能力。


三、问题描述

CKB 全节点是包含 P2P 网络层(tentacle)、共识引擎(NC-Max)、存储引擎(RocksDB)、交易池、CKB-VM 等子系统的复杂 Rust 程序。当节点出现同步缓慢、内存异常、peer 连接不稳定、RocksDB compaction 风暴时,运营者面临多重困境:

应用层可观测性不足。 内置 metrics 只覆盖高层指标(区块高度、peer 数等),无法告诉你"RocksDB get 平均延迟从 5μs 飙升到 500μs"。

通用工具语义鸿沟。 perf、strace 输出原始 OS 数据(syscall 编号、fd、地址),无法区分一个 pwrite64 来自 RocksDB compaction 还是日志写入。

无专用诊断工具。 以太坊有 Prometheus + Grafana 深度集成,Solana 有内置 validator metrics,CKB 生态目前无此类工具。运营者只能凭经验或在 Discord 求助。

诊断数据缺乏可验证共享机制。 运营者求助时只能截图或口头描述(不可验证),矿池证明节点质量缺乏密码学可验证的性能证明。

实际影响: 矿池未及时发现 RocksDB 退化可致出块延迟和经济损失;核心开发者缺少精细性能数据只能凭猜测优化;新运营者无法自诊断导致社区支持负担加重。

为什么 eBPF: 零侵入(不改 CKB 源码)、极低开销(内核原生速度 + verifier 安全校验)、灵活精准(任意函数入口/出口挂载,按需开启)。


四、解决方案

4.1 核心思路

ckb-probe 通过 Aya 框架实现纯 Rust 全栈 eBPF 应用,对 CKB 节点进行三层深度追踪。与通用工具的根本区别:BPF 程序理解 CKB 应用语义,输出"RocksDB put 操作耗时 23μs,写入 512 字节"而非"pwrite64 系统调用"。

4.2 为什么选择 Aya

纯 Rust 全栈——BPF 内核态程序和用户态控制程序均用 Rust,与 CKB 技术栈统一。无 C 工具链依赖——不需要 clang/llvm/libelf,通过 rustc → LLVM → BPF target 直接编译。类型安全——Map key/value 编译期检查。CO-RE 支持——BTF 重定位实现跨内核版本运行。社区成熟——已被 Linkerd2-proxy 等生产项目采用。

4.3 技术架构

架构分为内核态 BPF 程序层和用户态控制程序层,通过 eBPF Maps 高效数据交换。

内核态(aya-bpf,#![no_std] Rust): 三组探针——RocksDB uprobe 探针组(挂载 rocksdb_get/put/write/delete/iter_seek 等 C API 函数)、网络 kprobe 探针组(挂载 tcp_sendmsg/recvmsg/connect/close 等内核函数)、Syscall raw tracepoint 探针组(挂载 sys_enter/sys_exit)。

数据通道: BPF HashMap 聚合统计数据(用户态定时轮询);PerfEventArray/RingBuf 传递详细事件(用户态异步消费);BPF Array 传递配置参数(目标 PID、阈值等)。

用户态(aya + tokio + ratatui): BPF 生命周期管理器负责加载/attach/detach;三个独立 Collector(RocksDB/Network/Syscall)读取 Map 并二次聚合;分析引擎执行异常检测和关联分析;展示层提供 CLI 表格、TUI 仪表盘和可选 Prometheus Exporter。

4.4 三层观测模型

Layer 1:RocksDB 存储层(uprobe) ——项目最高价值模块。RocksDB C API 符号跨 FFI 边界,不受 Rust name mangling 和内联影响,稳定可用。追踪 rocksdb_get(点查询延迟与命中率)、rocksdb_put(写入延迟与大小)、rocksdb_write(批量写入)、rocksdb_delete(删除频率)、rocksdb_iter_seek(范围查询)。

Layer 2:P2P 网络层(kprobe 为主 + uprobe 可选) ——kprobe 挂载内核 TCP 栈函数,通过 struct sock 获取五元组信息,按 PID 过滤 CKB 进程。符号绝对稳定(内核 ABI)。可选挂载 tentacle 框架 uprobe 获取协议级信息。

Layer 3:系统调用层(raw tracepoint) ——挂载 sys_enter/sys_exit,按 syscall number 统计调用频率和延迟,重点关注 I/O、网络、同步、内存四类。

4.5 约束与应对

需要 root 或 CAP_BPF+CAP_PERFMON(提供最小权限配置指南);需要 Linux 5.8+(ckb-probe check 自动检测);不支持非 Linux(CKB 节点几乎全在 Linux);uprobe 上下文切换开销对低频调用(万级/秒)可忽略。
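下面用一段最小的 Rust 草图说明 ckb-probe check 的内核版本门槛判断思路(parse_kernel_version 等函数名为示意性假设,并非实际实现):

```rust
// 示意性草图:解析 uname release 字符串(如 "5.15.0-101-generic"),
// 判断是否满足 RingBuf 所需的 Linux 5.8+ 门槛。

fn parse_kernel_version(release: &str) -> Option<(u32, u32)> {
    // 以任意非数字字符为分隔符,取前两段作为 major.minor
    let mut parts = release.split(|c: char| !c.is_ascii_digit());
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// RingBuf 需要 5.8+;低于该版本时回退到 PerfEventArray(4.15+ 可用)
fn supports_ringbuf(release: &str) -> bool {
    matches!(parse_kernel_version(release), Some(v) if v >= (5, 8))
}

fn main() {
    assert!(supports_ringbuf("5.15.0-101-generic"));
    assert!(!supports_ringbuf("4.15.0-20-generic"));
    println!("kernel version check sketch ok");
}
```

低于 5.8 的内核按上述约束回退到 PerfEventArray 事件通道。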

4.6 Web5 去中心化诊断数据主权(opt-in)

借鉴 Web5 三个核心概念解决诊断数据可控共享问题:

DID 身份 ——基于 did:key 方法(Ed25519 密钥对,纯本地生成,无需链上注册)为运营者提供假名身份,同一运营者不同时间的报告可被确认来自同一实体。

签名诊断报告 ——遵循 DWN 数据模型思路存储结构化诊断快照,分享时生成 DID 私钥签名的报告,接收方可验证完整性但无法伪造。运营者可选择只暴露部分数据。

可验证凭证(VC) ——生成 W3C VC 格式的节点健康证明(如"过去 24h RocksDB P99 延迟 < 1ms"),可用于矿池证明基础设施质量或未来 peer 信誉系统。

所有 Web5 特性严格 opt-in,核心 eBPF 功能不依赖任何 Web5 组件。


五、详细技术实现计划

5.1 Phase 0:CKB 二进制符号侦察

编写 Rust 工具(goblin + rustc-demangle)全面扫描 CKB 二进制符号,分三级评估:Tier 1(高度可靠)为 RocksDB C API 函数——跨 FFI 边界,不受 Rust 编译器影响,是 uprobe 最佳目标;Tier 2(可能可用)为 Rust 跨 crate 公开函数(如 tentacle 连接管理),经 name mangling 可能跨版本变化;Tier 3(不确定)为 crate 内部函数,不作为依赖。同时确认 RocksDB 是动态链接(.so 符号独立可用,最理想)还是静态链接。
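作为补充,下面给出符号分级逻辑的一个示意性草图(纯字符串启发式,规则与函数名均为假设;实际工具会基于 goblin 解析 ELF 符号表,并用 rustc-demangle 还原 Rust 符号后再分级):

```rust
// 示意性草图:按提案的三级标准对符号名做启发式分级。

/// 符号可靠性分级
#[derive(Debug, PartialEq, Eq)]
enum Tier {
    /// Tier 1:RocksDB C API,跨 FFI 边界,uprobe 最佳目标
    Tier1,
    /// Tier 2:Rust 跨 crate 公开函数,受 name mangling 影响
    Tier2,
    /// Tier 3:crate 内部函数,不作为依赖
    Tier3,
}

fn classify_symbol(name: &str) -> Tier {
    if name.starts_with("rocksdb_") {
        // C API 符号不经过 Rust name mangling,跨版本稳定
        Tier::Tier1
    } else if name.starts_with("_ZN") && name.contains("tentacle") {
        // mangled 的 tentacle 公开函数:可能可用,但跨版本不保证
        Tier::Tier2
    } else {
        Tier::Tier3
    }
}

fn main() {
    assert_eq!(classify_symbol("rocksdb_get"), Tier::Tier1);
    assert_eq!(classify_symbol("_ZN3ckb4sync6foobar"), Tier::Tier3);
    println!("symbol tier sketch ok");
}
```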

5.2 Phase 1:环境搭建与基础验证

搭建 Aya 开发环境(Rust nightly + bpf-linker + bpftool + aya-tool),完成四项关键验证:(1) rocksdb_get uprobe + uretprobe 延迟测量;(2) 多个 RocksDB 函数同时 uprobe;(3) tcp_sendmsg kprobe 读取 peer IP:Port;(4) sys_enter raw tracepoint 统计 syscall Top-N。四项验证必须在项目前两周内完成,确认技术路线可行。

5.3 Phase 2:RocksDB 存储层深度追踪(核心模块)

BPF 程序: 五种操作的 uprobe/uretprobe 对,采集延迟、数据大小、命中率。Map 架构: ENTRY_TIMESTAMPS(HashMap,配对 uprobe/uretprobe)、OP_STATS(PerCpuHashMap,聚合统计)、LATENCY_HIST(PerCpuHashMap,log2 分桶直方图)、SLOW_EVENTS(PerfEventArray,仅超阈值发送)、CONFIG(Array,用户态配置)。用户态 Collector: 定时轮询合并 per-CPU 数据,计算瞬时速率,从直方图近似 P50/P90/P99,异步消费慢操作事件。
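下面的 Rust 草图演示 Collector 侧的两个关键步骤——合并 per-CPU 直方图、从 log2 分桶近似百分位(函数名与分桶参数均为示意性假设,非实际实现):

```rust
// 示意性草图:BPF 侧按 bucket = 64 - lat.leading_zeros() 分桶;
// 用户态读出 PerCpuHashMap 后逐桶求和,再按累计计数定位
// 目标百分位所在桶,取桶上界 2^i ns 作为保守估计。

const BUCKETS: usize = 27; // 覆盖 1ns ~ 约 67ms(2^26 ns)

/// 延迟值(纳秒)→ log2 桶下标
fn bucket_of(lat_ns: u64) -> usize {
    (64 - lat_ns.leading_zeros() as usize).min(BUCKETS - 1)
}

/// 合并各 CPU 的直方图(per-CPU 数据逐桶相加)
fn merge(per_cpu: &[[u64; BUCKETS]]) -> [u64; BUCKETS] {
    let mut total = [0u64; BUCKETS];
    for cpu in per_cpu {
        for (t, v) in total.iter_mut().zip(cpu) {
            *t += v;
        }
    }
    total
}

/// 从直方图近似百分位:返回目标桶的延迟上界(ns)
fn percentile(hist: &[u64; BUCKETS], p: f64) -> u64 {
    let total: u64 = hist.iter().sum();
    let target = (total as f64 * p).ceil() as u64;
    let mut seen = 0;
    for (i, &count) in hist.iter().enumerate() {
        seen += count;
        if seen >= target {
            return 1u64 << i;
        }
    }
    1u64 << (BUCKETS - 1)
}

fn main() {
    let mut cpu0 = [0u64; BUCKETS];
    cpu0[bucket_of(4_700)] += 1; // 一次 4.7μs 的 GET
    let total = merge(&[cpu0]);
    println!("P99 ≈ {} ns", percentile(&total, 0.99));
}
```

近似精度受分桶粒度限制(返回 2 的幂次上界),对 P50/P90/P99 级别的趋势观测足够。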

5.4 Phase 3:P2P 网络层追踪

kprobe(主方案): tcp_sendmsg/recvmsg(流量统计)、tcp_connect/inet_csk_accept(连接事件)、tcp_close(断开事件),从 struct sock 读取远端地址,PID 过滤。Map 架构: PEER_STATS(LruHashMap,容量 256,per-peer 统计)、NET_GLOBAL(PerCpuArray,全局流量)、CONN_EVENTS(PerfEventArray,连接/断开事件)。uprobe(可选增强): 若 tentacle 符号可用则挂载获取协议级信息。
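为说明 PEER_STATS(LruHashMap,容量 256)的语义,下面用标准库在用户态模拟其聚合与淘汰行为(数据结构与字段命名均为示意性假设;真实场景中 LRU 淘汰由内核 BPF LruHashMap 自动完成):

```rust
// 示意性草图:按远端地址聚合流量,超出容量时淘汰最久未更新的 peer。
use std::collections::HashMap;
use std::net::SocketAddr;

const CAPACITY: usize = 256;

#[derive(Default)]
struct PeerStat {
    tx_bytes: u64,
    rx_bytes: u64,
    last_seen: u64, // 逻辑时钟,仅为演示淘汰顺序
}

struct PeerTable {
    clock: u64,
    stats: HashMap<SocketAddr, PeerStat>,
}

impl PeerTable {
    fn new() -> Self {
        Self { clock: 0, stats: HashMap::new() }
    }

    fn record(&mut self, peer: SocketAddr, tx: u64, rx: u64) {
        self.clock += 1;
        if !self.stats.contains_key(&peer) && self.stats.len() == CAPACITY {
            // 淘汰 last_seen 最小(最久未活动)的 peer
            if let Some(oldest) = self
                .stats
                .iter()
                .min_by_key(|(_, s)| s.last_seen)
                .map(|(k, _)| *k)
            {
                self.stats.remove(&oldest);
            }
        }
        let e = self.stats.entry(peer).or_default();
        e.tx_bytes += tx;
        e.rx_bytes += rx;
        e.last_seen = self.clock;
    }
}

fn main() {
    let mut table = PeerTable::new();
    let peer: SocketAddr = "203.0.113.7:8115".parse().unwrap();
    table.record(peer, 1024, 4096);
    println!("peer tx = {} bytes", table.stats[&peer].tx_bytes);
}
```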

5.5 Phase 4:Syscall 层追踪

sys_enter/sys_exit raw tracepoint,PID 过滤,按 syscall number 统计频率和延迟。Map 架构: SYSCALL_ENTRY(HashMap,enter/exit 配对)、SYSCALL_STATS(PerCpuHashMap,per-syscall 聚合)、SYSCALL_LATENCY_HIST(PerCpuHashMap,延迟直方图)。Collector 计算 futex wait 比率、I/O 效率、epoll 唤醒频率等特征指标。
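下面是特征指标计算的一个最小示例(以 futex 调用占比为例;结构体与字段命名为示意性假设,syscall 编号基于 x86_64。实际实现还会进一步区分 FUTEX_WAIT 等操作码以得到精确的 wait 比率):

```rust
// 示意性草图:由 per-syscall 聚合统计计算锁竞争推断的输入信号之一。

/// 单个 syscall 的聚合统计(对应 SYSCALL_STATS 的 value,字段为假设)
struct SyscallStat {
    nr: u32,       // syscall number
    count: u64,    // 调用次数
    total_ns: u64, // 累计耗时
}

const SYS_FUTEX: u32 = 202; // x86_64 上的 futex syscall number

/// futex 调用次数占全部 syscall 的比率
fn futex_ratio(stats: &[SyscallStat]) -> f64 {
    let total: u64 = stats.iter().map(|s| s.count).sum();
    if total == 0 {
        return 0.0;
    }
    let futex: u64 = stats
        .iter()
        .filter(|s| s.nr == SYS_FUTEX)
        .map(|s| s.count)
        .sum();
    futex as f64 / total as f64
}

fn main() {
    let stats = [
        SyscallStat { nr: SYS_FUTEX, count: 70, total_ns: 1_400_000 },
        SyscallStat { nr: 232, count: 30, total_ns: 90_000 }, // epoll_wait
    ];
    let avg_ns = stats[0].total_ns / stats[0].count;
    println!("futex ratio = {:.0}%, avg = {} ns", futex_ratio(&stats) * 100.0, avg_ns);
}
```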

5.6 Phase 5:用户态控制程序与 TUI 仪表盘

项目结构: ckb-probe-ebpf/(BPF 程序,#![no_std])、ckb-probe-common/(共享类型定义)、ckb-probe/(用户态控制程序,含 collectors、analysis、ui、identity、report 模块)、xtask/(构建辅助)。

CLI 子命令(clap v4): check(环境检查)、symbols(符号分析)、rocksdb(RocksDB 监控)、net(P2P 监控)、syscall(syscall 分析)、overview(TUI 仪表盘)、identity(DID 管理)、report(签名报告)、verify(报告验证)、export(Prometheus,可选)。

TUI 仪表盘(ratatui + crossterm): Elm-like 架构,四 Tab 切换(Overview/RocksDB/Network/Syscall)。Overview 四象限布局(P2P 网络/RocksDB/Syscall Top-5/事件日志);RocksDB 面板含统计表+直方图+慢操作日志;Network 面板含 peer 列表+连接事件;Syscall 面板含排行表+延迟直方图。

5.7 Phase 6:异常检测与关联分析

RocksDB 延迟飙升检测——EWMA 基线(α=0.3),最近 10s 平均延迟超基线 5 倍触发告警。Peer 连接稳定性检测——1 分钟窗口断开超 10 次触发告警,追踪频繁重连 peer。锁竞争推断——futex 频率超基线 3 倍且 WAIT 占比 > 70%。I/O 瓶颈推断——pwrite64/fdatasync 延迟飙升与 RocksDB write 延迟同时飙升时关联推断。启动后前 5 分钟为基线收集期不触发告警。
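上述检测逻辑可以用如下 Rust 草图表达(EwmaDetector 为示意性命名,非实际代码;参数与提案一致:α = 0.3、超基线 5 倍触发、基线收集期内不告警):

```rust
// 示意性草图:EWMA 基线 + N 倍阈值的延迟飙升检测。

struct EwmaDetector {
    alpha: f64,
    multiplier: f64,
    baseline: Option<f64>,
    warmup_left: u32, // 剩余基线收集采样数
}

impl EwmaDetector {
    fn new(alpha: f64, multiplier: f64, warmup: u32) -> Self {
        Self { alpha, multiplier, baseline: None, warmup_left: warmup }
    }

    /// 输入一个窗口的平均延迟(μs),返回是否触发告警
    fn observe(&mut self, avg_latency_us: f64) -> bool {
        let alarm = match self.baseline {
            None => {
                self.baseline = Some(avg_latency_us);
                false
            }
            Some(b) => {
                let spike =
                    self.warmup_left == 0 && avg_latency_us > b * self.multiplier;
                if !spike {
                    // 只用正常样本更新 EWMA,避免基线被异常值污染
                    self.baseline =
                        Some(self.alpha * avg_latency_us + (1.0 - self.alpha) * b);
                }
                spike
            }
        };
        self.warmup_left = self.warmup_left.saturating_sub(1);
        alarm
    }
}

fn main() {
    let mut det = EwmaDetector::new(0.3, 5.0, 3);
    for _ in 0..4 {
        assert!(!det.observe(4.7)); // 正常 GET 平均延迟,不告警
    }
    assert!(det.observe(487.3)); // 延迟飙升,触发告警
    println!("ewma anomaly detection sketch ok");
}
```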

5.8 Phase 6b:Web5 身份与签名报告

identity generate 生成 Ed25519 密钥对和 did:key DID,存储到本地文件(权限 0600)。report 从 Collector 历史数据提取统计摘要,构建 JSON 报告并用 Ed25519 私钥签名(JWS 格式),可选输出 W3C VC 格式。verify 解析 did:key 获取公钥验证签名完整性。仅依赖 ed25519-dalek 和 multibase 两个轻量 crate。

5.9 Phase 7:Prometheus Exporter(加分项)

ckb-probe export 启动 HTTP 服务器(默认端口 9190),以标准 Prometheus text 格式暴露 RocksDB/网络/Syscall/告警指标,配套预配置 Grafana Dashboard JSON 模板。


六、预期交付成果

6.1 核心交付物

  1. ckb-probe CLI 工具 v0.1.0 —— 子命令集:check(环境检查)、symbols(符号分析)、rocksdb(RocksDB 监控,支持 --histogram/--slow/--json)。内置基础 RocksDB 延迟飙升异常检测(EWMA 基线 + 阈值告警)。

  2. ckb-probe-ebpf BPF 探针程序集 —— RocksDB uprobe/uretprobe(5 对),CO-RE 跨内核兼容,BPF ELF 嵌入用户态二进制。

  3. CKB 符号分析工具与报告 —— 完整符号可用性分级报告,ckb-probe symbols 可复用。

  4. 中英双语文档 —— README、INSTALL、USAGE、ARCHITECTURE 各含中英文版本。

  5. 48 小时稳定性测试报告 —— 详见 §6.5。

  6. 基于 Docker 的可复现测试环境 —— 详见 §6.3。

  7. 预录制完整演示视频 —— 详见 §6.3。

6.2 具体输出示例

为帮助评审者直观理解工具的实际产出,以下展示 ckb-probe rocksdb 在两种典型场景下的终端输出样本。

场景 A —— 正常运行

$ sudo ckb-probe rocksdb --pid 18920

╭──────────────── CKB RocksDB Monitor (PID: 18920) ────────────────╮
│ Uptime: 00:05:32   Sampling: 1s   Node: CKB v0.119.0            │
├────────────┬───────┬─────────┬─────────┬─────────┬───────────────┤
│ Operation  │  QPS  │ Avg(μs) │ P50(μs) │ P99(μs) │    Bytes/s    │
├────────────┼───────┼─────────┼─────────┼─────────┼───────────────┤
│ GET        │ 3,241 │     4.7 │     3.2 │    18.5 │     1.2 MB/s  │
│ PUT        │   856 │    12.3 │     9.1 │    45.2 │   420.0 KB/s  │
│ WRITE      │   128 │    38.7 │    28.4 │   112.0 │     1.8 MB/s  │
│ DELETE     │    42 │     5.1 │     4.0 │    15.3 │         —     │
│ ITER_SEEK  │   215 │     8.9 │     6.5 │    32.1 │         —     │
╰────────────┴───────┴─────────┴─────────┴─────────┴───────────────╯
  Status: ✅ Normal — All latencies within baseline.

五种操作类型均报告每秒吞吐量、平均延迟和百分位分布。未检测到异常,状态行确认运行健康。

场景 B —— RocksDB 延迟飙升(Compaction 风暴)

$ sudo ckb-probe rocksdb --pid 18920

╭──────────────── CKB RocksDB Monitor (PID: 18920) ────────────────╮
│ Uptime: 02:17:45   Sampling: 1s   Node: CKB v0.119.0            │
├────────────┬───────┬─────────┬─────────┬──────────┬──────────────┤
│ Operation  │  QPS  │ Avg(μs) │ P50(μs) │ P99(μs)  │    Bytes/s   │
├────────────┼───────┼─────────┼─────────┼──────────┼──────────────┤
│ GET        │ 1,102 │   487.3 │   312.5 │  2,841.0 │   389.0 KB/s │
│ PUT        │   312 │   892.1 │   645.0 │  5,120.0 │   156.0 KB/s │
│ WRITE      │    45 │ 2,341.5 │ 1,890.0 │  8,900.0 │   890.0 KB/s │
│ DELETE     │    18 │    52.3 │    38.0 │    210.5 │         —    │
│ ITER_SEEK  │    67 │   125.4 │    89.0 │    560.3 │         —    │
╰────────────┴───────┴─────────┴─────────┴──────────┴──────────────╯

⚠️  ANOMALY DETECTED [02:17:43]
  → GET avg latency 487μs exceeds 5× baseline (4.7μs → 487μs)
  → PUT avg latency 892μs exceeds 5× baseline (12.3μs → 892μs)
  → Probable cause: Compaction storm (WRITE P99 = 8.9ms)
  → Run `ckb-probe rocksdb --slow` for slow operation details.

EWMA 异常检测在最近 10 秒移动平均超过学习基线 5 倍时自动触发,并提供下一步操作建议。

场景 B(续)—— 慢操作日志

$ sudo ckb-probe rocksdb --pid 18920 --slow --threshold 100

╭────────────── Slow Operations (threshold: 100μs) ──────────────╮
│ Timestamp       │ Op    │ Latency │ Size   │ Note              │
├─────────────────┼───────┼─────────┼────────┼───────────────────┤
│ 02:17:41.023    │ WRITE │ 8,912μs │ 4.2 MB │ batch write       │
│ 02:17:41.891    │ PUT   │ 5,120μs │ 2.1 KB │                   │
│ 02:17:42.103    │ GET   │ 2,841μs │  512 B │                   │
│ 02:17:42.445    │ WRITE │ 7,230μs │ 3.8 MB │ batch write       │
│ 02:17:43.012    │ GET   │ 1,923μs │  256 B │                   │
╰─────────────────┴───────┴─────────┴────────┴───────────────────╯
  Showing 5 of 847 slow operations in last 60s.

场景 C —— JSON 机器可读输出

$ sudo ckb-probe rocksdb --pid 18920 --json

{
  "timestamp": "2026-04-15T02:17:45Z",
  "pid": 18920,
  "uptime_secs": 8265,
  "operations": {
    "GET":       { "qps": 1102, "avg_us": 487.3, "p50_us": 312.5, "p99_us": 2841.0, "bytes_per_sec": 398336 },
    "PUT":       { "qps": 312,  "avg_us": 892.1, "p50_us": 645.0, "p99_us": 5120.0, "bytes_per_sec": 159744 },
    "WRITE":     { "qps": 45,   "avg_us": 2341.5,"p50_us": 1890.0,"p99_us": 8900.0, "bytes_per_sec": 911360 },
    "DELETE":    { "qps": 18,   "avg_us": 52.3,  "p50_us": 38.0,  "p99_us": 210.5,  "bytes_per_sec": null   },
    "ITER_SEEK": { "qps": 67,   "avg_us": 125.4, "p50_us": 89.0,  "p99_us": 560.3,  "bytes_per_sec": null   }
  },
  "anomalies": [
    { "time": "02:17:43", "type": "latency_spike", "operation": "GET", "current_avg_us": 487.3, "baseline_avg_us": 4.7, "multiplier": 103.7 },
    { "time": "02:17:43", "type": "latency_spike", "operation": "PUT", "current_avg_us": 892.1, "baseline_avg_us": 12.3, "multiplier": 72.5 }
  ]
}

JSON 输出支持管道传递给下游工具(jq、监控管线、自动告警脚本)。

6.3 可复现验证环境

为使评审者无需具备 eBPF 专业知识即可验证所有交付物,提供以下验证资产:

基于 Docker 的可复现环境。 通过 docker-compose.yml 一键启动完整测试环境:

git clone https://github.com/xxxx/ckb-probe.git
cd ckb-probe
docker compose up --build

compose 文件配置两个容器:(a) CKB 测试网全节点(官方镜像,预配置测试网同步),(b) ckb-probe sidecar(从源码构建,以 --privileged 模式运行并挂载 /sys/kernel/debug 以支持 BPF 访问)。启动后 sidecar 自动检测 CKB PID 并开始 RocksDB 追踪。附带脚本化演示场景:

  • demo-normal.sh —— 让节点同步 5 分钟,捕获正常状态输出,写入 JSON 快照。
  • demo-stress.sh —— 在同一容器内通过 db_bench 注入合成 RocksDB 负载(突发写入 100K 条目),触发延迟飙升检测和慢操作日志。
  • demo-check.sh —— 运行 ckb-probe check 与 ckb-probe symbols,展示环境和符号报告。

所有演示脚本输出预期值与实际值对比,成功时返回退出码 0,可直接作为验收冒烟测试使用。

最低宿主机要求: Linux 内核 ≥ 5.8,Docker ≥ 20.10,docker-compose ≥ 2.0,4 GB 可用内存,20 GB 可用磁盘。env-check.sh 脚本在启动前验证这些前提条件。

预录制完整演示视频。 一段 10–15 分钟的带旁白屏幕录制(上传 YouTube 并镜像到 CDN 托管的 MP4),演示流程包括:(1) 环境检查,(2) 符号分析,(3) 正常同步期间的实时 RocksDB 监控,(4) 合成 compaction 突发期间的异常检测触发,(5) 慢操作日志和 JSON 导出。此视频作为无法在本地运行 Docker 的评审者的备选验证方式。

6.4 验收标准

项目定义三类验收标准,所有标准均可客观度量。

功能验证清单:

  • F-1:ckb-probe check 正确报告内核版本、BTF 可用性、BPF 能力状态,并为缺失的前提条件提供可操作的提示信息。
  • F-2:ckb-probe symbols <ckb-binary> 生成涵盖 Tier 1/2/3 分级和 RocksDB 链接方式检测的符号可用性报告。
  • F-3:ckb-probe rocksdb --pid <PID> 挂载到运行中的 CKB 进程,以 1 秒间隔输出实时指标表格。
  • F-4:五种 RocksDB 操作(GET/PUT/WRITE/DELETE/ITER_SEEK)均被追踪,报告各操作 QPS、平均延迟、P50/P99 延迟和吞吐量。
  • F-5:--slow --threshold <N> 模式捕获并显示超过指定微秒延迟阈值的单个操作。
  • F-6:--histogram 模式显示 log2 分桶延迟分布。
  • F-7:EWMA 异常检测在合成延迟飙升后 15 秒内触发告警(通过 demo-stress.sh 验证)。
  • F-8:--json 模式输出有效 JSON,可被 jq 无错解析。
  • F-9:收到 SIGINT/SIGTERM 后优雅关闭:BPF 程序干净卸载,退出后无残留探针(通过 bpftool prog list 验证)。
  • F-10:监控期间 CKB 进程退出时优雅处理:ckb-probe 输出提示信息并正常退出,无 panic。

性能开销阈值:

  • P-1:附加 CPU 使用率 ≤ 3%(以 1 小时窗口内有/无 ckb-probe 挂载的平均 CPU% 差值衡量)。
  • P-2:ckb-probe 进程 RSS 内存使用 ≤ 50 MB(持续监控状态)。
  • P-3:BPF 事件丢失率 < 0.1%(持续 10K events/sec 负载下,由 PerfEventArray 丢失事件计数器报告)。
  • P-4:CKB 区块同步速度退化 < 1%(以 2 小时 IBD 窗口内每分钟同步区块数,有/无 ckb-probe 对比衡量)。

稳定性基准:

  • S-1:48 小时连续运行无崩溃、panic 或重启。
  • S-2:48 小时内 ckb-probe RSS 内存增长 ≤ 5 MB(无内存泄漏)。
  • S-3:48 小时运行期间内核日志中无 BPF 相关警告或 dmesg 错误。
  • S-4:监控期间被监控 CKB 进程重启时成功恢复(ckb-probe 检测到进程退出,可在新 PID 上重新启动)。

6.5 48 小时稳定性测试报告内容

48 小时测试报告为自包含的 PDF/Markdown 文档,使外部评审者无需运行工具即可全面评估结果。包含以下内容:

时序指标图表(每 10 秒采样一次,持续 48 小时): ckb-probe CPU 使用率 %、ckb-probe RSS 内存(MB)、CKB 节点 CPU 使用率 %(有/无探针对比)、RocksDB GET/PUT/WRITE P99 延迟随时间变化、BPF 事件吞吐量(events/sec)。

资源消耗汇总表: CPU%、内存、事件率的 Min/Max/Avg/P99,并与上述 P-1 至 P-4 阈值显式对比。

事件捕获保真度报告: 总生成事件数 vs. 总捕获事件数,按操作类型分解,丢失事件计数和百分比。

写入延迟分布图表: 五种操作各自的完整 log2 分桶直方图,汇总 48 小时全窗口数据,以柱状图和累积分布函数两种形式呈现。

两个带注释的诊断案例分析: (1) IBD 写入模式分析——捕获初始区块下载期间的 RocksDB 写放大模式,展示随链增长 PUT/WRITE 吞吐量和延迟的演变。(2) Compaction 延迟尖峰捕获——48 小时运行中自然发生的尖峰或通过 demo-stress.sh 合成触发的尖峰,附带 before/during/after 延迟图表及对应的异常告警输出,标注时间戳。

复现说明: 精确的内核版本、CKB 版本、硬件配置、Docker 镜像标签,以及复现完整 48 小时测试的单条命令。

6.6 加分交付物(计划在后续版本实现)

P2P 网络层探针 + ckb-probe net;Syscall 层探针 + ckb-probe syscall;TUI 交互仪表盘(ckb-probe overview);异常关联分析(锁竞争/I/O 瓶颈推断);Web5 诊断数据主权工具集(identity/report/verify);Prometheus exporter + Grafana Dashboard JSON 模板。以上功能均已在本提案中完成设计,计划在后续版本中实现。


七、所需资金及用途说明

申请总额: 1,000 USD

支付方式: 100% CKB

类别 金额 说明
云服务器 $350 USD 1 台 VPS(Linux 5.15+ 内核,≥4 核 8GB),同时承担开发编译和运行 CKB 测试网全节点。8 周使用。
开发者补贴 $450 USD 核心开发工作。预计每周 20–30 小时,共 8 周。
文档与社区 $200 USD 中英双语文档编写、架构图制作、2 次月度分享会材料、结项报告。

八、预计完成时间

总周期: 8 周(约 2 个月)

第一阶段:调研与可行性验证(Week 1–3)

Week 1: CKB 源码架构调研(P2P/存储/同步层调用链梳理)+ Aya 框架深度学习(实现 2–3 个官方 example)+ 开发环境搭建 + CKB 测试网节点部署。

Week 2: CKB 二进制符号全面侦察——扫描官方 Release 和自编译版本,分析 RocksDB 链接方式,生成分级报告。实现 ckb-probe symbols 子命令。

Week 3: 四项 eBPF 可行性验证(RocksDB uprobe 延迟测量 / 多函数 uprobe / TCP kprobe / sys_enter tracepoint)。实现 ckb-probe check 子命令。

:bookmark: 里程碑 1(Week 3 末): 可行性验证完成,三种 BPF 程序类型在 CKB 进程上验证通过。check 和 symbols 子命令交付。

第二阶段:RocksDB 核心探针开发(Week 4–5)

Week 4: RocksDB BPF 探针完整实现(五种操作 uprobe/uretprobe + OP_STATS/LATENCY_HIST/SLOW_EVENTS Map + verifier 调优)+ RocksDbCollector。

Week 5: 完成 ckb-probe rocksdb 子命令(表格/直方图/慢操作日志/JSON 模式)+ 基础 RocksDB 延迟飙升异常检测(EWMA 基线 + N 倍阈值告警)。

:bookmark: 里程碑 2(Week 5 末): ckb-probe rocksdb 在测试网节点上可用,输出有意义的 RocksDB 性能数据并含基础异常检测。提交中期报告。

第三阶段:测试、优化与可复现环境(Week 6–7)

Week 6: 构建基于 Docker 的可复现测试环境(docker-compose.yml + 三个演示脚本 + env-check.sh)。启动 48 小时稳定性测试。性能影响量化评估(CPU/内存/事件丢失/同步速度)。两个 RocksDB 诊断场景案例分析(IBD 写入模式分析 + compaction 延迟尖峰捕获)。

Week 7: 完成 48 小时稳定性测试报告(时序图表、资源消耗汇总、事件保真度报告、延迟分布图表、带注释的案例分析)。针对性优化(CPU/内存/事件丢失)+ 鲁棒性加固(进程退出处理/权限提示/信号处理)+ JSON 全局输出 + CI 配置。录制完整演示视频(10–15 分钟带旁白演示)。

第四阶段:发布与结项(Week 8)

Week 8: 中英双语文档 + GitHub v0.1.0 Release + 社区分享 + 结项报告。

:bookmark: 里程碑 3(Week 8 末): 全部交付物提交——包括 CLI 工具、BPF 探针、符号分析工具、Docker 测试环境、演示视频、48h 测试报告和中英双语文档。项目结项。

P2P 网络层探针(原 Week 6)、Syscall 层探针(原 Week 7)、TUI 仪表盘(原 Week 8)、Web5 身份与签名报告(原 Week 9)移至后续版本计划。

时间线总览

阶段 周次 重点 里程碑
第一阶段:调研与验证 Week 1–3 CKB/Aya 深度调研、符号分析、四项 eBPF 可行性验证 :white_check_mark: 里程碑 1
第二阶段:RocksDB 核心探针 Week 4–5 RocksDB uprobe + ckb-probe rocksdb CLI + 基础异常检测 :white_check_mark: 里程碑 2(Week 5)
第三阶段:测试、优化与可复现环境 Week 6–7 Docker 环境、48h 稳定性测试、优化、演示视频
第四阶段:发布与结项 Week 8 文档、发布、社区分享、结项报告 :white_check_mark: 里程碑 3

九、与 CKB 生态的关联性

填补工具空白 ——CKB 生态首个理解应用语义的系统级诊断工具,输出直接对应运营者关心的问题。

服务多类角色 ——矿池运营者监控 RocksDB 和网络健康;核心开发者获取精细性能数据指导优化;新运营者通过 check 和 overview 自主诊断。

纯 Rust 技术栈统一 ——社区 Rust 开发者可无摩擦参与贡献,无需学习 C/Python 修改 BPF 逻辑。

体现自主权理念 ——所有分析本地完成,不依赖中心化监控服务。Web5 DID + VC 将"不信任,去验证"精神延伸到运维领域,为去中心化 peer 信誉系统奠定基础。

技术前沿性 ——Aya 纯 Rust eBPF 构建区块链节点诊断工具在行业内属前沿尝试,架构可推广至 Reth、Substrate 等其他 Rust 区块链节点。

开源可扩展 ——预留清晰扩展接口,后续可扩展 CKB-VM 追踪、tentacle 协议解码、Column Family 级追踪、预测性告警、链上报告锚定等。


十、技术风险与应对

风险 影响 概率 应对
CKB 二进制被 strip,RocksDB 无符号 高 中 Week 2 确认。优先检查动态链接(.so 符号独立可用);提供保留符号的编译指南;最坏情况退化为 kprobe+tracepoint
BPF verifier 拒绝探针 中 中 Aya 安全抽象大幅减少 verifier 问题,常见问题有已知解法,必要时简化 BPF 逻辑
内核版本 < 5.8 中 低 PerfEventArray 作为 RingBuf 的 fallback(4.15+ 可用),check 自动检测
TCP 栈函数跨内核版本参数变化 中 低 CO-RE + BTF 类型重定位自动适配
tentacle uprobe 符号不可预测 中 高 设计上不依赖——P2P 以 kprobe TCP 栈为主,tentacle uprobe 仅为可选增强
探针性能影响超预期 中 低 Week 6 量化评估,必要时减少 hook 点/提高阈值/实现 --lightweight 模式
Web5 特性增加复杂度 低 中 完全 opt-in 且解耦,仅用两个轻量 crate,时间紧可降级为仅签名报告
Docker privileged 模式需求 低 低 在 INSTALL 指南中文档化;env-check.sh 验证前提条件;演示视频作为无法运行 Docker 的评审者备选方案

十一、透明度承诺

代码完全开源 ——Day 1 起 GitHub 公开开发,MIT OR Apache-2.0 双许可。周报公开 ——每周在 NERVOS TALK 论坛发布进度更新。月度分享会 ——共 2 次(第 4 周和第 8 周各一次),最后一次含 Demo 和 Q&A。测试数据真实 ——所有数据来自实际 CKB 测试网节点,提供完整复现步骤。如实报告限制 ——结项报告明确记录局限性和未实现功能。可复现验证 ——Docker 测试环境和演示视频确保任何评审者均可独立验证交付物。


Hi @clair, thanks for the proposal. The technical approach is solid. A few things to address before Committee review:

您好 Jiayao,感谢提交提案,技术方案很扎实。进入 Committee 评审前,请补充和调整以下几点:

1. Missing required fields / 缺少必填项

Please add: To-do list (week-by-week), funding amount & breakdown, and payment preference (100% CKB or 100% USDI).

请补充:按周拆分的 To-do list、申请金额及资金分配明细、支付方式选择(100% CKB 或 100% USDI)。

2. Scope too broad / 范围偏大

12 weeks exceeds the Spark Program’s 1–2 month timeframe. You’ve identified the RocksDB uprobe layer as highest-value, so consider making that the core deliverable within 4–6 weeks, with the P2P and syscall layers as stretch goals or future work.

12周超出了星火计划1-2个月的建议周期。你已经将 RocksDB uprobe 层标记为最高价值,建议以此作为4-6周内的核心交付物,P2P 层和系统调用层作为延伸目标或后续计划。

3. Web5 alignment / Web5 契合度

DID/DWN/VC features are all marked opt-in, so they can’t serve as the primary Web5 argument. If some of these are part of the core design, then specific development and validation plans need to be reflected in the to-do list and milestones.

DID/DWN/VC 都标注为可选功能,因此无法作为与 Web5 契合度的主要论据。如果其中部分功能是核心设计的一部分,请在 To-do list 和里程碑中体现具体的开发和验证计划。

4. Funding reference / 资金参考

Pure tech projects: up to $1,000. Comprehensive: up to $1,500. Special/high-difficulty: up to $2,000.

纯技术开发:up to $1,000。综合类:通常最高 $1,500。特殊高难度:最高 $2,000。

5. Format / 格式

Please front-load key info (deliverables, timeline, budget). Bilingual (Chinese + English) is also required.

核心信息(交付物、时间线、预算)请突出。也请提供中英文双语版本。

Looking forward to the revised version!

期待修改后的版本!


Hi Jiayao, the updated proposal is much improved. In terms of format, two items need to be addressed:

1. Timeline

12 weeks still exceeds the Spark Program’s 1–2 month guideline. Could you either trim the scope to fit within ~8 weeks (for example, focusing on the RocksDB layer + basic CLI as the core deliverable and moving P2P/Syscall/TUI/Web5 to future work), or let us know why 12 weeks is essential and can’t be compressed?

2. Payment preference

Please specify whether you’d like to receive funding in 100% CKB or 100% USDI.

Also, I’ve looped in @Hanssen and @yixiu.ckbfans.bit to take a closer look at the technical side. They may follow up with additional questions.

Looking forward to your response!

Hi Jiayao,更新后的提案改进很大。格式上,还有两点需要请您确认:

1. 项目周期

12周仍超出星火计划1-2个月的建议周期。能否将范围缩减到8周左右(比如以 RocksDB 层 + 基础 CLI 为核心交付物,P2P/Syscall/TUI/Web5 移至后续计划),或者说明一下12周不可压缩的原因?

2. 支付方式

请注明选择 100% CKB 还是 100% USDI。

另外,技术方面请 @hanssen 和 @yixiu.ckbfans.bit 来看,他们可能会有进一步的问题。

期待您的回复!

CC @xingtianchunyan


已修改,请您审阅。


I think the project is meaningful. Moreover, we can add USDT (User Statically-Defined Tracing) to enable better usage, e.g., monitoring on-chain script execution.


Thanks for the feedback! Adding USDT for tracing on-chain script execution sounds like a great idea. I’ll keep it in mind and consider incorporating it into the proposal.

Hi @clair, here’s the feedback from the Spark Committee on the ckb-probe proposal.

The committee acknowledges that this project addresses a real gap in CKB’s infrastructure tooling. The eBPF-based approach to node observability is technically ambitious, and frankly goes beyond what Spark micro-grants typically cover in terms of complexity. That said, the committee sees clear value in this kind of foundational work for the CKB and Web5 ecosystem, and is keen to support it.

After discussion, the proposal is currently Pending, with the following feedback for revision:

1. Deliverables need clearer description

The current deliverables list CLI subcommands and flags, but don’t show what a successful outcome looks like in practice. Please add:

  • A concrete example of ckb-probe output when monitoring a CKB node (e.g. what does rocksdb actually print during normal operation vs. a latency spike?)
  • What criteria you would use to determine the tool is “working correctly” at project closure?

2. Explain how the Committee and community can verify and test

This is a system-level tool requiring Linux + root/CAP_BPF + a running CKB node, so the testing barrier is higher than typical Spark projects. Please describe:

  • A lightweight, reproducible way for reviewers to verify the deliverables (e.g. a Docker-based test environment, a pre-recorded demo, or step-by-step instructions for a reviewer with a Linux VPS)
  • Whether the 48h stability test report will include enough detail for an external reviewer to assess results without running the tool themselves

Please revise your proposal accordingly and resubmit.

Hi Jiayao,以下是 Spark Committee 对 ckb-probe 提案的讨论结果。

委员会认可该项目解决了 CKB 基础设施工具层面的真实缺口。基于 eBPF 的节点可观测性方案技术门槛较高,坦率地说超出了 Spark 微资助通常覆盖的项目复杂度。但委员会认为这类面向 CKB 和 Web5 生态的基础工具建设有明确价值,很愿意支持。

经讨论,提案目前状态为 Pending,需要根据以下反馈进行修改:

1. 交付物需要更清晰地描述

当前交付物列出了 CLI 子命令及其参数,但还不够直观地展示实际产出。请补充:

  • ckb-probe 监控 CKB 节点时的具体输出示例(比如 rocksdb 子命令在正常运行和延迟飙升时分别输出什么?)
  • 项目结项时,你会用什么标准来判定工具"正常工作"?

2. 说明 Committee 和社区如何验证和测试

这是一个需要 Linux + root/CAP_BPF + 运行中 CKB 节点的系统级工具,测试门槛比一般 Spark 项目高。请描述:

  • 一种轻量、可复现的方式供评审者验证交付物(比如基于 Docker 的测试环境、预录制的演示视频,或评审者在 Linux VPS 上可跟着操作的步骤说明)
  • 48小时稳定性测试报告是否会包含足够细节,让外部评审者无需亲自运行工具就能评估结果

请据此修改提案。

元宵快乐!

Zhouzhou
On behalf of the Spark Program Committee

cc: @Hanssen @yixiu.ckbfans.bit @xingtianchunyan


Hi Zhouzhou,

Thank you and the Committee for the detailed review and recognition. Your supportive attitude toward this kind of infrastructure tooling is very encouraging.

I’ll revise the proposal to address both points of feedback:

On Deliverable Clarity: I’ll add concrete output examples for ckb-probe rocksdb, showing what the terminal output looks like under normal operation versus a latency spike scenario, so reviewers can intuitively understand the tool’s actual output. I’ll also define explicit acceptance criteria for project closure, including a functional verification checklist, performance overhead thresholds (e.g., additional CPU usage not exceeding X%), and stability benchmarks.

On Verification and Testing: I plan to provide a Docker-based reproducible test environment, bundling a CKB testnet node with ckb-probe and scripted demo scenarios, so reviewers can verify deliverables on any Linux machine with a single command. A pre-recorded full demo video will also be provided as a fallback. The 48-hour stability test report will include time-series metric charts, resource consumption data, event capture rates, write latency distributions, and annotated diagnostic case studies — ensuring external reviewers can fully assess results without running the tool themselves.

One additional question I’d like to raise with the Committee: As noted in the review feedback, the eBPF-based node observability approach has a relatively high technical barrier, exceeding the complexity typically covered by Spark micro-grants. In practice, the development involves writing and debugging eBPF probes, kernel compatibility adaptation, interfacing with RocksDB/CKB internals, and building a reproducible Docker test environment — all of which require significant technical investment. Would the Committee consider adjusting the grant amount for this project to 1,500 USDI? If there are budget constraints within the micro-grant framework, I completely understand and will plan the deliverable scope accordingly within the current budget.

I’ll resubmit the revised proposal shortly. Thanks again!

感谢委员会的详细评审和认可,你们对这类基础设施工具建设的支持态度让我很受鼓舞。

我会针对两点反馈修改提案:

关于交付物描述清晰度: 我会在提案中补充 ckb-probe rocksdb 的具体输出示例,分别展示正常运行和延迟飙升两种场景下的终端输出样本,让评审者能直观理解工具的实际产出。同时会定义明确的结项验收标准,包括功能验证清单、性能开销阈值(如 CPU 额外占用不超过 X%)以及稳定性基准。

关于验证与测试方式: 我计划提供一个基于 Docker 的可复现测试环境,打包 CKB 测试网节点与 ckb-probe,附带脚本化的演示场景,评审者在 Linux 机器上一键即可验证。同时会提供预录制的完整演示视频作为备选。48 小时稳定性测试报告将包含时序指标图表、资源消耗数据、事件捕获率、写入延迟分布以及带注释的诊断案例分析,确保外部评审者无需亲自运行工具即可全面评估结果。

另外有一个问题想和委员会沟通: 正如评审意见中提到的,基于 eBPF 的节点可观测性方案技术门槛较高,超出了 Spark 微资助通常覆盖的项目复杂度。实际开发中,eBPF 探针的编写与调试、内核兼容性适配、与 RocksDB/CKB 内部结构的对接,以及构建可复现的 Docker 测试环境,这些工作的技术投入确实比较大。请问委员会是否考虑将本项目的资助额度调整为 1500 USDI?如果微资助框架下有额度限制,我也完全理解,会在现有预算内合理规划交付范围。

我会尽快提交修改后的提案。再次感谢!

Best regards,
Jiayao


提案已更新,请您审阅


Appreciate the detailed update, we’ll provide feedback soon!


Hi Zhouzhou,

Hello, I would like to raise a few questions concerning the USDI token used in the project, and I hope to receive a detailed response.

First, could you provide some background information on USDI? Specifically, who is the issuer, what is the pegging mechanism, what are the underlying reserve assets, and has it undergone any third-party audits? As a project participant, I need to have a thorough understanding of the background and security of the token being used for payments.

Second, I have certain concerns regarding the current liquidity of USDI. In particular: On which exchanges or DEXs is USDI currently available for trading? What are the average daily trading volume and order book depth? Are the slippage and settlement times acceptable when converting to mainstream stablecoins or fiat currencies? If holding a significant amount of USDI, is it possible to liquidate smoothly and in a timely manner?

If, upon evaluation, USDI is found to have significant liquidity issues — such as insufficient trading depth, limited exchange availability, or de-pegging risks — I would like to request that the project’s payment method be switched to 100% CKB payments. As the native token of the Nervos Network, CKB is listed on major exchanges, offers relatively stronger liquidity, and is more convenient for participants in terms of asset management and liquidation.

I look forward to your reply. Thank you.

你好,我想就项目中涉及的 USDI 代币提出几个问题,希望能得到详细的解答。

首先,能否介绍一下 USDI 的基本情况?包括它的发行方是谁、锚定机制是什么、底层资产储备如何、是否经过第三方审计等。作为项目参与者,我需要充分了解所使用支付代币的背景和安全性。

其次,我对 USDI 目前的流动性状况存在一定担忧。具体来说:USDI 目前在哪些交易所或 DEX 上可以交易?日均交易量和交易深度如何?兑换为主流稳定币或法币的滑点和时效是否有保障?如果持有大量 USDI,能否顺利、及时地完成变现?

如果经过评估后发现 USDI 的流动性确实存在较大问题——例如交易深度不足、兑换渠道有限、或存在脱锚风险——我希望将项目的支付方式调整为 100% CKB 支付。CKB 作为 Nervos 网络的原生代币,在主流交易所均有上线,流动性相对更有保障,也更便于参与者进行资产管理和变现。

期待您的回复,谢谢。

Hi @clair, here’s the information on USDI:

Issuer and reserves: USDI is issued by Interpaystellar (IPN). The reserve backing is 100% USDC at a 1:1 ratio (initially backed by STBT issued by Matrixport, since transitioned to USDC). IPN provides stablecoin exchange services within the IPN network for KYC-verified institutional users, with a minimum exchange amount of $10,000 equivalent. Asset reserve proof can be provided upon request. For more details, feel free to contact IPN directly.

Liquidity and conversion: USDI currently serves primarily as a CKB-native stablecoin. It does not offer direct fiat off-ramp. You can convert USDI to USDT via app.destbridge.com, then exchange to fiat through the exchange of your choice.

If you’d prefer to switch to 100% CKB payment given these conditions, that’s completely fine. Just let us know and update your proposal accordingly.

Hi Jiayao,以下是关于 USDI 的信息:

发行方与储备: USDI 由 Interpaystellar(IPN)发行,底层储备为 100% USDC,比例 1:1(初期由 Matrixport 发行的 STBT 支持,目前已转为 USDC)。IPN 平台面向经过 KYC 的机构用户提供 IPN 网络内的稳定币兑换服务,单次兑换金额不少于 1 万美元等值稳定币。应兑换方需求,项目方可提供对应资产储备证明。如需了解更多详情,可以直接联系IPN。

流动性与兑换: USDI 目前主要作为 CKB 原生稳定币使用,不提供直接兑换法币的通道。可以通过 app.destbridge.com 将 USDI 兑换为 USDT,再到对应交易所进行法币兑换。

如果基于以上情况希望改为 100% CKB 支付,没有问题,告知我们 + 更新proposal 即可。


感谢您的回复,我考虑改用 100% CKB 支付,稍后会修改 proposal。


感谢舟舟的回复。
Hi,我是来自 Interpaystellar 的 ZOE。
接舟舟的帖子补充信息:

USDI 是储备支持型(reserve-backed)的稳定币,目前100%使用USDC进行储备,对于机构兑换用户可以由持牌信托机构提供100%资产储备证明;

  1. 机构可通过邮件方式提出申请,邮箱 [email protected] ,后续可按照KYC的信息提供相应材料与文件;
  2. USDI的发行量取决于机构兑换量,例如上述所提到的 destbridge 为生态伙伴帮助构建的跨链桥,可以提供 USDT - USDI 的双向兑换。

USDI在诞生之日就以提供未来的支付场景基础设施为目标,特别是面向NERVOS的FIBER 网络。Agent 逐渐成为经济行为主体的当下,希望和社区伙伴共同建设。


Hi Zhouzhou,

希望一切顺利!我已于此前提交了根据委员会反馈修改后的提案,包括补充的交付物输出示例、验收标准、Docker 可复现测试方案及稳定性测试报告的详细规划。

想跟您简单确认一下:修改后的提案是否已进入委员会的审阅流程?目前大概处于哪个阶段?如果有预计的审阅完成时间,也烦请告知,我好相应安排开发节奏。

另外,之前邮件中提到的关于资助额度调整至 1,500 USD 的沟通,不知委员会是否已有初步意见?无论结果如何,我都会尽快推进项目,只是希望提前了解以便合理规划交付范围。

如有任何需要我补充的材料或需要进一步说明的地方,请随时告诉我。

谢谢!


Hi Clair,预计 committee 下周二会给到反馈。

感谢您的耐心


Hi @clair / Jiayao,

Thank you for your quick revision and detailed supplementary explanation of the proposal!

The Spark Committee has completed the new round of review for the ckb-probe project.

We are very pleased to see that the revised proposal has fully addressed the feedback from the previous round: the specific output examples you added (normal operation vs. latency spike scenarios), the clear acceptance criteria (functional checklist, performance thresholds, stability benchmarks), the Docker-based reproducible test environment, and the detailed structure of the 48-hour stability test report are all very clear and highly actionable. The core technical solution, deliverables description, verification methods, and other content of the current proposal have been unanimously approved by the committee.

However, the proposal is still in Pending status and requires one more revision targeting the following two points:

  1. Regarding the request to adjust the grant amount to 1,500 USDI

    Under the current Spark Program rules, the maximum grant amount for pure technical development projects is 1,000 USD (for comprehensive projects under special/high-difficulty circumstances, the amount may be increased up to 2,000 USD, but requires unanimous approval from the entire committee). Your request for 1,500 USDI has no precedent.

    Please provide a detailed explanation in the revised proposal to justify the need for 1,500 USDI, including:

    • A breakdown of the specific uses for the additional 500 USD;

    • Why the original 1,000 USD budget is insufficient to cover core work such as eBPF probe development, CO-RE adaptation, 48h stability testing, etc.;

    • If the increased amount is approved, how you will ensure that the use of funds strictly aligns with the project objectives.

  2. Post-project maintenance plan

    As an infrastructure tool for CKB nodes, ckb-probe will need to continue supporting new CKB versions, binary symbol changes, kernel updates, and more in the future. Please add an explanation of your long-term maintenance plan for the tool after project completion, for example:

    • Will you plan for ongoing updates (version compatibility strategy)?

    • Will you open community contribution mechanisms (GitHub issue/PR process, maintainer handover)?

    • How will bug fixes and security updates be guaranteed?

    This will help the committee and the community better assess the long-term value of the project.

Please revise the proposal based on the above two points (we recommend replying directly in a new post in this thread). We will arrange the next round of review as soon as possible.

The committee continues to have high expectations for this eBPF observability project and sincerely thanks you for your efforts in building CKB infrastructure! Feel free to communicate any questions directly in this thread.

Best regards,
xingtian
On behalf of the Spark Program Committee


cc: @zz_tovarishch , @Hanssen , @yixiu.ckbfans.bit


Hi Xingtian,

Regarding the 1,500 USD request discussed above, I have decided to keep the application at the original 1,000 USD and complete the development work as previously planned.

I do plan to keep the tool updated, and I will open community contribution mechanisms (a GitHub issue/PR process and maintainer handover).

On guaranteeing bug fixes and security updates: CI will run tests against each new CKB release, and I will promptly fix any bugs or compatibility issues that surface.
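To make the CI idea above concrete, here is a minimal, self-contained Rust sketch of the kind of version-gating check such a job could run. The version string, function names, and supported range are all illustrative assumptions, not ckb-probe's actual policy:

```rust
// Hypothetical sketch of a CI compatibility gate: parse the version string
// reported by `ckb --version` and check it against a supported range.
// The sample output, helper names, and cutoff version are illustrative only.

fn parse_version(output: &str) -> Option<(u32, u32, u32)> {
    // Expected shape: "ckb 0.121.0 (...)" — take the second whitespace token.
    let ver = output.split_whitespace().nth(1)?;
    let mut parts = ver.split('.').map(|p| p.parse::<u32>().ok());
    Some((parts.next()??, parts.next()??, parts.next()??))
}

fn is_supported(version: (u32, u32, u32)) -> bool {
    // Illustrative policy: support 0.110.0 and newer (tuple comparison is
    // lexicographic, so this is a plain semver ordering check).
    version >= (0, 110, 0)
}

fn main() {
    let reported = "ckb 0.121.0 (abc1234 2025-01-01)"; // stand-in CLI output
    match parse_version(reported) {
        Some(v) if is_supported(v) => println!("compatible: {:?}", v),
        Some(v) => println!("unsupported ckb version: {:?}", v),
        None => println!("could not parse version output"),
    }
}
```

A real CI job would feed this check with the output of the ckb binaries it downloads per release, failing the pipeline when a new release falls outside the supported range so the incompatibility is noticed immediately.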


Hi @Clair, I’m glad to share that the Spark Program Committee has approved your ckb-probe proposal for a grant of $1,000 USD (100% CKB, at 0.001502 USD/CKB, 665,779 CKB).

The committee recognizes a clear and practical gap in the ecosystem: CKB node operators and core developers lack application-semantic, fine-grained observability. When sync slows down, memory grows unexpectedly, peer connections become unstable, or RocksDB compaction storms occur, current generic tools and built-in metrics often cannot pinpoint the real cause. Your proposal addresses this directly by using Aya’s pure-Rust eBPF stack to provide deep, low-overhead tracing across RocksDB storage, P2P networking, and syscalls, and by defining measurable acceptance criteria and a reproducible verification approach.
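As an aside on what "application-semantic, fine-grained observability" typically looks like in practice, the sketch below shows the user-space half of such a pipeline in plain Rust: bucketing per-call latencies into a log2 histogram, the standard presentation in the bcc/bpftrace tool family. The sample values are made up; in ckb-probe they would arrive from kernel-side probes via BPF maps:

```rust
// Sketch (not ckb-probe's actual code): log2 latency histogramming, the
// usual aggregation step in eBPF latency tools. Sample values are invented.

/// Bucket i (for i >= 1) covers [2^(i-1), 2^i) ns; bucket 0 holds zeros.
fn bucket_index(latency_ns: u64) -> usize {
    (64 - latency_ns.leading_zeros()) as usize
}

fn histogram(latencies_ns: &[u64]) -> [u64; 65] {
    let mut buckets = [0u64; 65];
    for &ns in latencies_ns {
        buckets[bucket_index(ns)] += 1;
    }
    buckets
}

fn main() {
    // Stand-in samples (ns) for e.g. RocksDB read latencies.
    let samples = [800, 1_200, 1_500, 3_000, 65_000];
    for (i, &count) in histogram(&samples).iter().enumerate() {
        if count == 0 {
            continue;
        }
        let lo = if i == 0 { 0 } else { 1u64 << (i - 1) };
        let hi = if i == 64 { u64::MAX } else { 1u64 << i };
        println!("[{:>10}, {:>10}) ns: {}", lo, hi, count);
    }
}
```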

We also appreciate the proposal’s engineering clarity: a phased plan with early feasibility verification, concrete CLI outputs (normal vs. latency spike), a Docker-based reproducible test environment, and a 48-hour stability report. These make the deliverables verifiable and reviewer-friendly.

Here are the next steps:

1. Funding & Wallet Address

The total grant is $1,000 USD (paid in CKB). The first installment (20%) will be disbursed shortly.

Please provide a CKB wallet address to receive the funds. The remaining 80% follows a flexible model: you can request funds on-demand during weekly syncs, or receive the balance upon project completion.

2. Weekly Sync

We’d like to set up a regular weekly check-in. Two options:

  • Text-based updates on this thread at a fixed time each week, with committee feedback.
  • A brief live video call.

Please let us know your preference and a time that works for you.

3. Proposal Content Lock

Now that the proposal is approved, we’ll lock the current version of your proposal post as the reference baseline for deliverable verification. This is standard practice for all approved Spark projects. If any adjustments are needed during development, we can discuss and document them in our weekly syncs.

Congratulations, and looking forward to the collaboration!

Best,

xingtian

On behalf of the Spark Program Committee


cc: @zz_tovarishch , @yixiu.ckbfans.bit , @Hanssen


Hi xingtian,

Many thanks for the committee's recognition and approval, and for the detailed explanation. The good news is very encouraging, and I will push ahead with delivery at full effort.

Replying to the next steps one by one:

1. Wallet Address

My CKB receiving address is:

ckb1qrgqep8saj8agswr30pls73hra28ry8jlnlc3ejzh3dl2ju7xxpjxqgqqynmx484fhf5v04kjn9yar0hjshwfuk4v5383emd

I will confirm receipt once the first 20% installment is disbursed. For the remainder, I prefer to request funds at the weekly syncs based on milestone progress, which also helps both sides stay aligned on the delivery cadence.

2. Weekly Sync

I prefer text-based updates in this thread, which also makes the discussion easy to look back on later.

As for timing, Mondays work best for me: each Monday I will post the previous week's progress, any issues encountered, and the plan for the coming week. I am happy to adjust if the committee prefers another time. I plan to start work on March 16, next Monday, with the first report on March 23, the Monday after.

3. Proposal Lock

Understood, no problem. I fully accept the current version as the acceptance baseline. If anything needs adjusting during development (for example, eBPF compatibility issues on certain kernel versions forcing a minor change of technical approach), I will raise it in advance at the weekly sync.

Thanks again.
