Multi-Agent Runtime & White-Box Observability
Built a three-stage runtime
across data preparation, collaborative analysis, and report generation on top of PocketFlow, and
standardized statistical-analysis tools through MCP to reduce fragmented tool invocation, poor
extensibility, and limited auditability in agent workflows. Also developed a Streamlit white-box
console exposing step-level traces, tool I/O, and node scheduling paths, reducing exception
diagnosis time for complex tasks from hours to 10 minutes.
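The staged runtime and its step-level tracing can be sketched as follows. This is a minimal stand-in, not PocketFlow's actual API: the `Node` and `Flow` classes and the trace fields are hypothetical, illustrating only how a three-stage flow can record per-node timing and outputs for a white-box console.

```python
import time

class Node:
    """Minimal stand-in for a PocketFlow-style node (hypothetical API)."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

class Flow:
    """Runs nodes in sequence, recording a step-level trace per node."""
    def __init__(self, nodes):
        self.nodes, self.trace = nodes, []

    def run(self, state):
        for node in self.nodes:
            t0 = time.perf_counter()
            state = node.fn(state)
            # Each trace entry is one row the observability console can render.
            self.trace.append({
                "node": node.name,
                "output_keys": sorted(state),
                "ms": round((time.perf_counter() - t0) * 1000, 2),
            })
        return state

# Three stages mirroring data preparation -> analysis -> report generation.
flow = Flow([
    Node("data_prep", lambda s: {**s, "clean": s["raw"].strip()}),
    Node("analysis", lambda s: {**s, "stats": len(s["clean"])}),
    Node("report", lambda s: {**s, "report": f"len={s['stats']}"}),
])
result = flow.run({"raw": "  hello  "})
```

A console such as the Streamlit one described above would render `flow.trace` directly, giving the step-level scheduling path without reading logs.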
Collaborative Orchestration & Conflict Resolution
Designed ForumHost as the explicit
orchestration hub for open-domain public-opinion analysis, coordinating DataAgent for quantitative
analysis over labeled data and tools, and SearchAgent for real-time facts and context via Tavily.
Through iterative evidence supplementation, cross-challenge, conflict detection, and stopping
conditions, the workflow constrained multi-agent collaboration from free-form dialogue into a
controllable process and mitigated context pollution and objective drift. In representative
head-to-head validation, key factual error rate dropped from about 20% to 0 versus a single-agent
baseline using only DataAgent, and the factual portions of the final report required no further
manual correction.
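The orchestration loop can be sketched as below. All names and the agreement-based stopping rule are hypothetical simplifications of the ForumHost design: each round the two agents exchange claims, the host detects conflicts between them, and the loop stops when claims agree or the round budget runs out.

```python
def forum_host(data_agent, search_agent, max_rounds=3):
    """Hypothetical sketch of the ForumHost loop: exchange claims,
    detect conflicts, and stop on agreement or round exhaustion."""
    claims = {}
    for round_no in range(max_rounds):
        claims["data"] = data_agent(claims.get("search"))
        claims["search"] = search_agent(claims.get("data"))
        # Conflict detection: any key where the two agents disagree.
        conflicts = {k for k in claims["data"]
                     if claims["search"].get(k) != claims["data"][k]}
        if not conflicts:  # stopping condition: agents agree
            return claims, round_no + 1
    return claims, max_rounds

# Stub agents: DataAgent starts with a wrong sentiment claim but adopts
# peer corrections; SearchAgent always reports the verified facts.
truth = {"sentiment": "negative", "volume": 1200}

def data_agent(peer):
    base = {"sentiment": "positive", "volume": 1200}
    if peer:
        base.update(peer)  # evidence supplementation from the peer agent
    return base

def search_agent(peer):
    return dict(truth)

claims, rounds = forum_host(data_agent, search_agent)
```

In this toy run the conflict over `sentiment` is detected in round one and resolved in round two, after which the stopping condition fires.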
Structured Reporting & Citation Constraints
To address snowball
hallucinations, unconstrained elaboration, and citation drift in long-horizon report generation,
embedded structured output schemas, inline citations, source backtracking, and reliability
annotations into chapter-by-chapter generation. Each core claim was required to bind to evidence,
a source, and a confidence score, with a verifier agent performing secondary checks on critical
conclusions and their cited evidence. Final reports reached 95% citation coverage, while the
incorrect-citation rate fell to 3%.
reduced to 3%.
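The claim binding can be sketched with a small schema. The `Claim` fields and the `verify` threshold are illustrative assumptions, not the project's actual schema: the point is that every claim carries evidence, a source, and a confidence score, and a verifier pass flags anything missing a source or below threshold for secondary review.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    """One core claim bound to its evidence, source, and confidence."""
    text: str
    evidence: str
    source: str
    confidence: float

def verify(claims, threshold=0.8):
    """Hypothetical verifier pass: flag claims with no source binding
    or confidence below the threshold for secondary checking."""
    return [c for c in claims if not c.source or c.confidence < threshold]

report = [
    Claim("Volume peaked on day 3", "post-count table", "dataset:posts.csv", 0.95),
    Claim("Sentiment shifted after the statement", "sampled quotes", "", 0.70),
]
flagged = verify(report)
```

Claims that pass can be emitted with inline citations; flagged ones are routed back to the verifier agent instead of being printed as fact.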
High-Concurrency Data Processing Pipeline
Built an asyncio-based high-concurrency
cleaning and semantic annotation pipeline for massive unordered raw posts. Few-shot prompted LLM
nodes handled text cleaning, label completion, and semantic normalization, while retry logic,
resumable execution, and concurrency control mitigated API rate limits and stabilized the data
foundation for downstream analysis. Under a 40-60 concurrency budget, processing time for 20,000
records was reduced to under 12 hours, and input-token cost dropped by about 60% by caching
few-shot examples and system prompts.