Spark Program | Nervos Brain - A Global Developer Onboarding Engine and Cross-Language Hub Powered by Agentic RAG

IrisNeko · April 20, 2026, 11:25pm

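Week 6 Report (translated from Chinese)

1. Weekly Goals

Building on Week 5's "live and usable" baseline, this week's focus shifted from "it runs" to "it is stable and usable". The core goals were to improve context carry-over in multi-turn conversations, reduce repeated follow-up questions, and give the graph engine a clearer decision path in uncertain scenarios.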

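2. Completed This Week

Multi-turn parameter completion and thread-resume loop completed
The main path AskUser -> checkpoint suspend -> user supplement -> resume execution now works end to end. Retrieval and answering can resume with the preserved context inside the same thread, which reduces the "ask everything again from scratch" experience.
General-purpose reflection decision module landed
The pre/post two-stage reflection logic is now unified under one decision framework. It supports four actions (continue_retrieval / ask_user / revise_answer / accept_answer) and uses an uncertainty threshold, a hop cap, and a reflection-round cap to control convergence.
Answer generation stability and fallback strengthened
The answer generation flow now has retries and exception fallbacks. When the answer stage fails or the reflection-round cap is exceeded, the system returns an explainable degraded result instead of failing silently or returning an empty response.
Telegram / Discord online runtime consistency improved
Both gateways now run the full graph main flow, and gaps in filtering, exception fallback, message format adaptation, and segmented sending have been filled, so the online entry points behave more consistently.
Testing and regression verification
All existing non-integration tests pass in regression: pytest -m 'not integration' reports 260 passed, 2 deselected, covering key modules including the graph engine, memory service, platform adapters, retrieval, and the tool runtime.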
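
To make the AskUser -> checkpoint suspend -> user supplement -> resume path concrete, here is a minimal sketch of how such a loop could be structured. It is not the project's actual implementation: Checkpoint, ThreadStore, handle_message, and the "network" parameter are all hypothetical names, and the in-memory dict stands in for whatever persistent checkpoint store the graph engine really uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical snapshot of a suspended thread (not the project's schema)."""
    question: str
    missing: list[str]
    params: dict[str, str] = field(default_factory=dict)


class ThreadStore:
    """In-memory stand-in for a persistent checkpoint store, keyed by thread id."""

    def __init__(self) -> None:
        self._pending: dict[str, Checkpoint] = {}

    def suspend(self, thread_id: str, cp: Checkpoint) -> str:
        # Save the state and return the clarification prompt for the user.
        self._pending[thread_id] = cp
        return f"Please provide: {cp.missing[0]}"

    def resume(self, thread_id: str, reply: str) -> Checkpoint | None:
        cp = self._pending.get(thread_id)
        if cp is None:
            return None  # nothing suspended: the message is a new question
        # Attach the reply to the first parameter that was still missing.
        cp.params[cp.missing.pop(0)] = reply
        if not cp.missing:
            del self._pending[thread_id]  # fully specified, ready to continue
        return cp


def handle_message(store: ThreadStore, thread_id: str, text: str,
                   required: tuple[str, ...] = ("network",)) -> str:
    cp = store.resume(thread_id, text)
    if cp is None:
        cp = Checkpoint(question=text, missing=list(required))
    if cp.missing:
        return store.suspend(thread_id, cp)
    # A real system would continue retrieval and answering at this point.
    return f"Answering '{cp.question}' with {cp.params}"
```

With this sketch, a first message in thread "t1" yields a clarification prompt, and the user's next message in the same thread resumes the original question rather than starting a new one.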

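3. Milestone Outcomes

With this week's work, the system has moved from "usable for a single turn" to "sustainable multi-turn interaction":

After a clarification question, execution resumes without the user having to restate the full question.
The graph engine now has a more explicit automatic routing strategy when evidence is insufficient, evidence conflicts, or the answer is unstable.
The online paths on both platforms behave much more consistently with the offline demo path, which simplifies later evaluation and iteration.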

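4. Current Issues

Runtime logs are still noisy: third-party libraries occasionally emit cleanup logging errors at the end of a test run (this does not affect the pass rate, but it hurts readability).
The tool protocol layer already defines multiple tools (such as discourse_query and github_search), but the runtime main path still relies mainly on qdrant_search / memory_fetch, so the tool loop still needs to be completed.
Multi-turn quality is backed by mechanisms, but there is no quantitative metrics dashboard broken down by task type (for example context continuation rate, clarification hit rate, and average clarification rounds).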
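
As an illustration of the missing metrics panel, the sketch below aggregates the three example metrics per task type. The report does not define these metrics, so the definitions here are assumptions: continuation rate is the share of sessions whose resumed turn reused earlier context, hit rate is the share of clarification questions whose answer was actually used, and the Session record and summarize function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-conversation record; field names are assumptions."""
    task_type: str               # e.g. "recommendation", "guidance", "troubleshooting"
    clarifications: int          # clarification questions asked
    useful_clarifications: int   # clarifications whose answer was used
    context_kept: bool           # resumed turn reused the earlier context


def summarize(sessions):
    """Group sessions by task type and compute the three example metrics."""
    groups = defaultdict(list)
    for s in sessions:
        groups[s.task_type].append(s)

    report = {}
    for task, items in groups.items():
        asked = sum(s.clarifications for s in items)
        useful = sum(s.useful_clarifications for s in items)
        report[task] = {
            "context_continuation_rate": sum(s.context_kept for s in items) / len(items),
            "clarification_hit_rate": useful / asked if asked else 0.0,
            "avg_clarification_rounds": asked / len(items),
        }
    return report
```

Keeping the aggregation keyed by task type means the same function can feed a dashboard once the evaluation set exists.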

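5. Plan for Next Week (Week 7)

Log governance and observability: reduce noisy logs and unify key error codes and diagnostic views.
Complete the tool execution loop: move discourse_query / github_search toward standardized runtime integration, with aligned timeout and idempotency policies.
Build a multi-turn evaluation set: collect samples for three task types (solution recommendation, development guidance, and troubleshooting) and track them quantitatively.
Strengthen online stability verification: add end-to-end long-conversation tests and failure-path stress tests for Telegram/Discord.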
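
One way the planned timeout and idempotency alignment for tools could look is sketched below. This is only an assumption about the shape of such a wrapper: run_tool, idempotency_key, and the error payload are hypothetical, the cache is an in-memory dict, and the tool function is passed in as a plain callable rather than through the project's real tool protocol.

```python
import concurrent.futures
import hashlib
import json

_results = {}  # idempotency cache; a real runtime would use a shared store
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def idempotency_key(tool, args):
    """Derive a stable key from the tool name and its JSON-serializable args."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_tool(tool, fn, args, timeout_s=5.0):
    """Run fn(**args) at most once per key, giving up after timeout_s seconds."""
    key = idempotency_key(tool, args)
    if key in _results:
        return _results[key]  # repeated call: reuse the earlier result

    future = _pool.submit(fn, **args)
    try:
        result = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The worker thread is not killed; the caller just stops waiting.
        # Timeouts are not cached, so a later retry can still succeed.
        return {"tool": tool, "error": "timeout"}

    _results[key] = result
    return result
```

Applying the same wrapper to every tool would give discourse_query and github_search the same timeout and retry semantics as the existing qdrant_search / memory_fetch path.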

Week 6 Report (English)

1. Weekly Goal

Following Week 5’s online availability milestone, this week focused on stability and multi-turn quality: preserving context across clarification turns, reducing repeated follow-up questions, and making graph decisions more deterministic under uncertainty.

2. Completed Work

Clarification-resume loop improved
Completed the AskUser -> checkpoint suspend -> user supplement -> resume flow, enabling in-thread continuation instead of restarting from scratch.
Unified reflection decision framework
Implemented a unified pre/post reflection decision set (continue_retrieval / ask_user / revise_answer / accept_answer), bounded by an uncertainty threshold, a hop cap, and a reflection-round limit.
Answer generation resilience upgrades
Added retry and fallback behavior for answer composition. When composition fails or post-reflection rounds are exhausted, the system returns an explainable degraded result instead of failing silently or returning an empty response.
Telegram/Discord runtime consistency improvements
Both gateways now align on full-graph execution behavior, including filtering, fallback handling, output formatting, and segmented delivery.
Regression verification
Non-integration regression tests passed: pytest -m 'not integration' → 260 passed, 2 deselected.
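
The unified reflection decision described above can be pictured as a single function that maps the current state to one of the four actions. The sketch below is an assumption about how such a policy might be ordered, not the project's real logic: ReflectState, its fields, the default threshold of 0.4, and the caps of 3 hops and 2 rounds are all invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(str, Enum):
    CONTINUE_RETRIEVAL = "continue_retrieval"
    ASK_USER = "ask_user"
    REVISE_ANSWER = "revise_answer"
    ACCEPT_ANSWER = "accept_answer"


@dataclass
class ReflectState:
    """Hypothetical inputs to a reflection step; the scale is an assumption."""
    uncertainty: float               # 0.0 (confident) to 1.0 (very unsure)
    hops: int                        # retrieval hops used so far
    rounds: int                      # reflection rounds used so far
    missing_user_info: bool = False  # a required detail only the user can give
    has_draft: bool = False          # False in the pre phase, True in the post phase


def decide(s, threshold=0.4, max_hops=3, max_rounds=2):
    """Pick one action; the caps guarantee the loop eventually stops."""
    if s.rounds >= max_rounds:
        # Budget exhausted: stop reflecting and let the caller degrade gracefully.
        return Action.ACCEPT_ANSWER
    if s.uncertainty <= threshold:
        return Action.ACCEPT_ANSWER
    if s.missing_user_info:
        return Action.ASK_USER
    if s.hops < max_hops:
        return Action.CONTINUE_RETRIEVAL
    # Hop cap reached: revise an existing draft, otherwise fall back to the user.
    return Action.REVISE_ANSWER if s.has_draft else Action.ASK_USER
```

Checking the round cap first is what makes the policy converge: however uncertain the state remains, the loop ends after a fixed number of reflection rounds.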

3. Milestone Outcomes

The system moved from “single-turn usable” to a more reliable multi-turn stage:

Clarification can resume execution with preserved thread context.
Cases with insufficient evidence, conflicting evidence, or unstable answers now have clearer automatic routing behavior.
The online runtime paths on both platforms now behave much more consistently with the offline demo path, which simplifies future evaluation and iteration.
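
Part of that cross-platform consistency is the segmented delivery mentioned under Completed Work. A minimal sketch of a shared splitter is shown below. The function name and the PLATFORM_LIMITS table are hypothetical, and the limits (4096 characters for Telegram, 2000 for Discord) are the commonly documented per-message caps, which should be verified against each platform's current API documentation.

```python
# Assumed per-message character limits; verify against the platform docs.
PLATFORM_LIMITS = {"telegram": 4096, "discord": 2000}


def split_message(text, limit):
    """Split text into chunks of at most `limit` characters.

    Prefers to cut at a newline, then at a space, and only falls back to a
    hard cut when a single run of text has no whitespace at all.
    """
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        cut = rest.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = rest.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no whitespace available: hard cut
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Both gateways could call the same function with their own limit, for example split_message(answer, PLATFORM_LIMITS["discord"]), so that chunk boundaries are chosen by identical rules on each platform.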

4. Current Issues

Log noise still exists: third-party libraries occasionally emit cleanup logging errors after test completion. This does not affect the pass rate, but it hurts readability.
Tool schema supports more tools (discourse_query, github_search), while runtime execution is still centered on qdrant_search / memory_fetch.
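Multi-turn mechanisms are in place, but quantitative dashboards broken down by task type are still missing (for example context continuation rate, clarification hit rate, and average clarification rounds).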

5. Plan for Week 7

Logging and observability cleanup with clearer diagnostics.
Complete runtime tool-loop coverage for discourse_query / github_search with timeout/idempotency alignment.
Build a multi-turn evaluation set across recommendation/guidance/troubleshooting scenarios.
Add longer end-to-end stability and failure-path tests for Telegram/Discord.