Spark Program | Federated Wallet Behaviour Intelligence for Nervos CKB

A privacy-preserving, decentralised machine learning system for identifying non-human wallet activity and suspicious wallets at scale

Spark Grant Request: $1,500
Duration: 1 month + 2-week buffer (6 weeks total)

1. Overview

We are building a collaborative, federated machine learning system that monitors and classifies on-chain wallet behaviour on the Nervos CKB blockchain. Its core purpose is to distinguish wallets operated by genuine human users from those driven by automated bots, DEX trading agents, or exchange-related non-human activity, and later to label wallets used in fraudulent activity. It does this without requiring any organisation to expose its raw user data to a central party.

Current status: We have already started research and product development, and a small working prototype exists. This Spark grant will accelerate the work: enhancing the model, onboarding partners, building reliable dashboards, collecting labeled data, and releasing a production-ready system that any CKB user, wallet provider, or team can access.

Timeline: 1 month of focused work with a 2-week buffer for testing and iteration (6 weeks total).

2. Current Status & What We Have Built

Completed work

  • Research & architecture design
    Federated learning architecture for wallet classification

  • Flower federated server
    FedProx aggregation, model distribution

  • Flower client implementation
    Local training, weight update submission

  • FedProx strategy integration
    Proximal term for heterogeneous data

  • Local training pipeline
    PyTorch-based classifier training

  • Model inference
    Query endpoint for wallet classification

  • Initial feature research
    Temporal and graph feature definitions, datasets
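
The proximal term listed above can be illustrated in a few lines of plain Python. This is a minimal sketch, not the project's actual PyTorch training loop: the function name `fedprox_objective`, the flat-list weight representation, and the default `mu` are assumptions made for illustration. FedProx adds (mu / 2) * ||w - w_global||^2 to each client's local loss, which keeps local updates close to the global model when client data is heterogeneous.

```python
from typing import Sequence


def fedprox_objective(
    task_loss: float,
    local_weights: Sequence[float],
    global_weights: Sequence[float],
    mu: float = 0.01,  # hypothetical proximal coefficient, tuned per deployment
) -> float:
    """Return the local task loss plus the FedProx proximal penalty."""
    if len(local_weights) != len(global_weights):
        raise ValueError("weight vectors must have the same length")
    # Squared Euclidean distance between local and global weights.
    squared_distance = sum(
        (w - g) ** 2 for w, g in zip(local_weights, global_weights)
    )
    return task_loss + (mu / 2.0) * squared_distance
```

With `mu = 0` the penalty vanishes and the client optimises its task loss alone; larger values pull local training harder towards the global model.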

Milestone + Budget

Week 1 - $200 | CKB-CCC + Explorer API integration
Connect to CKB mainnet/testnet RPC via CKB-CCC, fetch transaction history for any lock hash, and pull enriched data from the Explorer API (timestamps, counterparties, Cell consumption patterns).

Week 2 - $250 | Feature extraction pipeline + continued data curation
Extract temporal features (inter-transaction intervals, burst detection, regularity scores) and graph features (centrality, clustering, flow ratios). Continue curating towards a goal of 500 human + 500 non-human labeled wallets; the model has already been trained on 100 non-human wallets.

Week 3 - $300 | Complete labeled dataset + global model training
Finalize the 500 human + 500 non-human wallet dataset. Train the FedProx global model on CKB data, validate accuracy (>75% target), and produce ~100 MB of model weights.

Week 4 - $350 | Production web UI + finetuner dashboard + partner integrations
Launch complete web UI with wallet classification, whitelisted finetuner dashboard for training submissions, Mosaic Africa integration, API endpoint for any wallet provider (JoyID, Neuron, etc.)

Week 5 - $250 | Testing + iteration + onboarding
Cross-browser QA, mobile responsiveness, accuracy improvements, bug fixes, user feedback collection, onboard 2+ additional projects

Week 6 - $150 | Public release + documentation + completion
Open source launch (MIT license), API documentation, live demo page, demo video, completion report

3. Problem Statement

Blockchain networks are inherently transparent, yet identifying whether a wallet is controlled by a real human trader, an automated trading bot, or an exchange wallet remains a difficult challenge.

Organisations building on CKB currently lack a shared, privacy-preserving mechanism to detect such behaviour. A DEX cannot easily tell if a trader is human or bot. An airdrop platform cannot easily filter Sybil wallets. A wallet provider cannot easily warn users about suspicious counterparties.

Training a centralised detection model would require pooling sensitive user data across organisations, which is a privacy and compliance risk. At the same time, any single organisation’s data is unlikely to capture the full breadth of abnormal human behaviour seen across the entire network.

4. Proposed Solution

We introduce a federated learning framework, built on Flower (flwr), in which each participating organisation helps improve the shared model without exposing its raw user data.

Key innovation for user experience: Participating organisations do NOT need to download the model or run complex infrastructure. Everything is abstracted behind a simple web UI.

5. How it works for participating organisations (Partner organisations finetuning)

The organisation only needs to:

  1. Get whitelisted as a finetuner

  2. Log into the web UI

  3. Submit wallet addresses with labels (human/non-human)

  4. Click “Start Training”

Everything else is handled by an internal workflow and abstracted away.

6. How it works for inference-only users (no training)

No model download required. No infrastructure setup. Just use the UI or API.

7. Example integration: A random wallet

8. Inference output: Binary classification with confidence

The model generates a binary output along with a confidence score. An output of 1, for example with 60% confidence, indicates that the behaviour is consistent with genuine human trading. Conversely, an output of 0, for example with 30% confidence, signifies that the behaviour matches patterns typical of bots, DEX agents, or exchanges.
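
A client consuming this output might turn the label and confidence pair into a readable verdict. The sketch below is illustrative only: the `Prediction` type, the `describe` helper, and the `review_below` threshold are hypothetical names, not part of the project's API. It follows the convention stated above, where 1 means human-like and 0 means non-human-like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    label: int         # 1 = human-like, 0 = non-human-like
    confidence: float  # model confidence in the label, in [0, 1]


def describe(prediction: Prediction, review_below: float = 0.7) -> str:
    """Render a prediction as text, flagging low-confidence results."""
    if prediction.label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    if not 0.0 <= prediction.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    kind = "human" if prediction.label == 1 else "non-human"
    # review_below is a hypothetical cut-off for manual review.
    note = " (low confidence, review manually)" if prediction.confidence < review_below else ""
    return f"{kind} at {prediction.confidence:.0%} confidence{note}"
```

Under the assumed threshold, both examples in the paragraph above (60% and 30%) would be flagged for manual review rather than acted on automatically.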

9. Continuous improvement through federated learning

Once the global model is deployed, whitelisted finetuners can:

  1. Submit additional labeled wallet addresses

  2. Trigger local training rounds

  3. Contribute weight updates to improve the global model

All organisations and wallet providers benefit from improvements, even if they never train.
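
The aggregation step behind this loop can be sketched as a weighted average of client weight vectors, which is how FedAvg-style strategies (including FedProx on the server side) combine updates. This is a simplified illustration using plain lists; in the real system the Flower strategy performs this step, and the name `aggregate_updates` is hypothetical.

```python
from typing import List, Sequence, Tuple


def aggregate_updates(
    updates: Sequence[Tuple[Sequence[float], int]],
) -> List[float]:
    """Average client weight vectors, weighted by each client's example count.

    Each update is a (weights, num_examples) pair.
    """
    if not updates:
        raise ValueError("at least one client update is required")
    total_examples = sum(num_examples for _, num_examples in updates)
    if total_examples <= 0:
        raise ValueError("total example count must be positive")
    dim = len(updates[0][0])
    aggregated = [0.0] * dim
    for weights, num_examples in updates:
        if len(weights) != dim:
            raise ValueError("all weight vectors must have the same length")
        for i, w in enumerate(weights):
            aggregated[i] += w * num_examples / total_examples
    return aggregated
```

Only weight vectors and example counts reach the server in this scheme; the labeled wallet data used for local training stays with the finetuner.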

10. Input Features

The model learns from two primary categories of on-chain signals derived from CKB transaction records.

Transaction frequency and timing

Human patterns are characterized by irregular, variable inter-transaction intervals, occasional and context-driven burst patterns, time-of-day distribution that aligns with human hours (which varies by region), and a random transaction cadence. In contrast, non-human patterns feature regular, deterministic inter-transaction intervals, frequent and sustained burst patterns, 24/7 activity regardless of time of day, and a periodic transaction cadence, such as every 60 seconds.
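
The timing signals described above can be approximated with a few summary statistics. The following is a minimal sketch under stated assumptions: timestamps are in seconds, the feature names are illustrative, and `burst_gap_seconds` is a hypothetical threshold rather than a value taken from the project.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def temporal_features(
    timestamps: Sequence[float],
    burst_gap_seconds: float = 10.0,  # hypothetical burst threshold
) -> Dict[str, float]:
    """Compute simple timing features from transaction timestamps (seconds)."""
    ordered = sorted(timestamps)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    if not intervals:
        raise ValueError("need at least two transactions")
    avg = float(mean(intervals))
    # Coefficient of variation: near 0 for a clock-like cadence,
    # larger for irregular, human-like activity.
    cv = pstdev(intervals) / avg if avg > 0 else 0.0
    # Share of gaps short enough to count as part of a burst.
    burst_fraction = sum(1 for gap in intervals if gap <= burst_gap_seconds) / len(intervals)
    return {"mean_interval": avg, "interval_cv": cv, "burst_fraction": burst_fraction}
```

A wallet transacting exactly every 60 seconds yields an `interval_cv` of 0, while irregular gaps push the value up; time-of-day features would need timezone handling that this sketch leaves out.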

Wallet graph relationships and address patterns

Human patterns show a diverse and irregular interaction graph, with mixed send and receive flow directionality, and low graph centrality. Non-human patterns, on the other hand, are characterized by star topologies, chains, or tight clusters in the interaction graph, primarily one-directional flow, and high graph centrality, meaning a wallet acts as a hub for many other wallets.
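
Two of these graph signals, centrality and flow direction, can be sketched from a list of transfers. This example assumes transfers have already been reduced to hypothetical `(sender, receiver)` pairs; on CKB that reduction has to be derived from Cell inputs and outputs, which is not shown here. Degree centrality stands in for whatever centrality measure the project actually uses.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (sender, receiver), an assumed simplification


def graph_features(wallet: str, edges: Iterable[Edge]) -> Dict[str, float]:
    """Compute degree centrality and outgoing-flow ratio for one wallet."""
    nodes: Set[str] = set()
    neighbours: Set[str] = set()
    sent = received = 0
    for sender, receiver in edges:
        nodes.update((sender, receiver))
        if sender == wallet:
            sent += 1
            neighbours.add(receiver)
        if receiver == wallet:
            received += 1
            neighbours.add(sender)
    neighbours.discard(wallet)  # ignore self-transfers as neighbours
    others = len(nodes - {wallet})
    total = sent + received
    return {
        # Fraction of other wallets in the graph this wallet touches directly.
        "degree_centrality": len(neighbours) / others if others else 0.0,
        # 1.0 = only sends, 0.0 = only receives, around 0.5 = mixed flow.
        "out_flow_ratio": sent / total if total else 0.0,
    }
```

In a star topology the hub scores 1.0 on both features, matching the non-human pattern described above, while each leaf scores low centrality.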

11. Technical Implementation

Data extraction on CKB

We use two primary data sources for feature extraction:

CKB-CCC: historical data endpoint

CKB Explorer API: transactions and wallet endpoints
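
For orientation, the kind of request that sits underneath a transaction-history lookup can be sketched as a raw JSON-RPC body. This assumes the target node has the indexer RPC module enabled and exposes `get_transactions`; the project itself uses CKB-CCC, which wraps calls of this kind, so the helper name and its parameters below are hypothetical and the script fields are left for the caller to supply.

```python
import json
from typing import Any, Dict


def build_get_transactions_request(
    code_hash: str,
    hash_type: str,
    args: str,
    limit: int = 100,
    order: str = "asc",
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC body that asks for transactions touching a lock script."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {
        "id": request_id,
        "jsonrpc": "2.0",
        "method": "get_transactions",
        "params": [
            {
                "script": {"code_hash": code_hash, "hash_type": hash_type, "args": args},
                "script_type": "lock",
            },
            order,
            hex(limit),  # CKB RPC encodes integers as hex strings
        ],
    }


def to_json(request: Dict[str, Any]) -> str:
    """Serialise the request body for sending with any HTTP client."""
    return json.dumps(request)
```

The body would be sent as an HTTP POST to the node's RPC URL; paging through long histories needs the cursor returned by the previous response, which is omitted here.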

Model size and infrastructure

The global model size is approximately 100 MB. Inference does not require any download, as the user interface or API abstracts the model away. Training is performed on backend server hardware, not on user devices. As a result, user requirements are minimal: only a web browser is needed.

12. User Interface & Experience

Training UI (for whitelisted finetuners only)

13. Justification for $1,500:

  • Two specialized developers (Backend Developer + ML Engineer)

  • Existing work already started (research + prototype)

  • Dataset curation effort (500 human + 500 non-human wallets)

  • API development for any wallet provider to integrate

  • Post-grant commitment to onboarding 3+ projects

  • 1 month of rapid execution + 2-week buffer for quality

14. Team

We are two developers with complementary expertise in blockchain infrastructure, applied machine learning, and production systems. We’ve already built a working prototype, validated the approach, and are now moving toward a full production release.

Fadhil Mulinya - Backend & Blockchain Developer

  • Experienced in CKB as a Nervos Catalyst program participant, including CKB-CCC, RPC interfaces, and transaction data modelling.
  • Built the current node data extraction pipeline and FL client-server integration.
  • Focuses on reliable, production-grade infrastructure that works for real partners.

Paul Wako - Machine Learning Engineer

  • Specialises in privacy-preserving ML, federated learning (Flower / PyTorch), and feature engineering from sequential data.
  • Designed the current model architecture, FedProx training loop, and feature extraction logic.
  • Ensures the model is lightweight, explainable, and practical for real-world wallet classification.

Project GitHub

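Hi @mulinya, welcome, and thank you for submitting a proposal to the Spark Program!

Below are some personal observations ahead of submitting the proposal for committee review. They are for your reference only and do not represent the committee's position.
After reading the proposal in full, I have identified several issues that may affect the committee's review. I suggest addressing them before the formal review:

1. The proposal's sections are out of order, which hurts readability.

Current section order: Overview → Completed work (milestones + budget) → Problem statement → Solution → Workflow → Technical implementation → UI design → Budget → Team

This clearly does not follow the standard format of the Spark Program application template. Please consolidate related content into the same section.

2. There are no standalone "Deliverables" and "How to Verify" sections.

The proposal currently lacks standalone, complete Deliverables and How to Verify sections. Community and committee members cannot easily see what the project's deliverables are, or how to verify your work directly and effectively without a code review.

Suggestion: To address both issues above, reorganise the sections according to the Spark Program Mini-Grant template.

3. The $1,500 budget exceeds the standard $1,000 cap for purely technical projects.

Under the Spark Program funding scheme, the cap for regular projects is $1,000; special projects may be raised to $2,000 after committee deliberation. Your proposal requests $1,500 but does not explain why this purely technical project needs to exceed the standard $1,000 line.

Suggestion: Reduce the budget to $1,000 or less.

4. The proposal does not address its core premise: does a Sybil attack problem actually exist on CKB?

In the Problem Statement, the proposal assumes scenarios such as "an airdrop platform cannot easily filter Sybil wallets" and "a DEX cannot easily tell if a trader is human or bot". It is worth pointing out, however, that CKB is a chain whose purpose is to record important data, and anyone storing important data on CKB must pay the corresponding Cell occupation cost (State Rent). This is fundamentally different from public chains that rely on low-cost account models: on CKB, "creating low-cost wallets in bulk" itself requires paying ongoing storage costs, which naturally raises the bar for Sybil attacks. Unless Sybil wallets are able to disrupt transaction routes in the Fiber Network, the reality and urgency of the need to "distinguish human from non-human wallets" in the CKB ecosystem still appear to need justification.

Suggestion: In the problem statement, explain clearly why the CKB ecosystem needs this tool. Who are your target users (wallet developers? DEXs? audit teams?), and in what specific scenarios do they face the pain point of being unable to tell humans from bots? Is this pain point worth solving with an external tool, rather than something that can be judged simply from existing data?

Those are the key issues I identified, for your reference. Once you have revised the proposal, I will ask the committee to move to formal review as soon as possible. I believe the revised proposal will make the review go much more smoothly.

Looking forward to your updated version.
Best regards,
行天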