About

hyacehila is my long-term online ID. It comes from hyacinth, my favorite plant. I later reshaped it into a lighter, more name-like form: hyacehila. The ending -hila / -ila gives it a small, airy, fictional texture, so to me it is not only a username, but also a hyacinth sprite living inside complex structures.

This ID is close to how I understand technology: moving through dense systems, toolchains, workflows, and uncertain environments to find a natural, explainable path that actually solves the problem. I care about general problem-solving patterns and about whether technology can transfer and generalize across real scenarios.

Today I mainly focus on AI agent deployment and evaluation, which I see as two of the most important areas for bringing large models into engineering practice and industrial pipelines. I study both single-agent and multi-agent systems: how they are designed, evaluated, constrained, and eventually embedded in real business workflows rather than left at the benchmark level. Writing commercial fiction is a side interest.

You can call me Julian or Jules.

Current Work & Interests

  • Data Mining

    Discovering hidden patterns in large-scale data and turning them into practical insights that guide better decisions.

  • LLM Agents

    Building LLM-powered agents that plan, use tools, and automate complex workflows with reliable task execution.

  • LLM Evaluation

    Measuring how model outputs align with human expectations and turning feedback into signals for better training.

  • Client-Side Dev

    Building client-side apps that turn user needs into clear, responsive interfaces and practical product experiences.

Latest Blog Posts

Friends

Resume

Education

  1. Wuhan University

    Sep 2025 — Jul 2027

    Currently pursuing a Master's degree in Applied Statistics, with research focused on LLM agents, evaluation, and agentic reinforcement learning. Aiming to build capable LLM agents that solve real-world problems.

  2. Xidian University

    Sep 2021 — Jul 2025

    Bachelor's degree in Statistics. Built a solid theoretical foundation in statistical analysis, machine learning, and data mining. Participated in national data modeling competitions to develop practical implementation skills.

Experience

  1. Algorithm Researcher (Intern) | NSFOCUS Technology (Wuhan)

    Dec 2025 — Mar 2026
    Security Data Pipeline
    • Automated Data Pipeline: Built a security data preprocessing and feature extraction pipeline over GitHub repositories and internal databases; extracted and validated 8,000+ CVE instances, distilled about 4,000 high-quality samples, and produced multi-dimensional CWE/OWASP labels.
    • Sample Mixing for Training Targets: Mixed samples to align vulnerability-type and programming-language distributions with training targets.
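The sample-mixing step can be pictured with a small resampling sketch. This is an illustration under assumptions, not the pipeline's actual code: `mix_to_target`, its parameters, and the example CWE labels are hypothetical, and it balances a single attribute, whereas the work described above aligned both vulnerability type and programming language.

```python
import random
from collections import defaultdict


def mix_to_target(samples, key, target, total, seed=0):
    """Draw about `total` samples so that `key` follows the `target` shares.

    `target` maps each category to its desired share (shares sum to 1).
    Categories with too few samples contribute everything they have.
    """
    rng = random.Random(seed)

    # Group the samples by the attribute we want to balance.
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)

    mixed = []
    for category, share in target.items():
        pool = groups.get(category, [])
        quota = min(len(pool), round(total * share))
        mixed.extend(rng.sample(pool, quota))  # sample without replacement
    rng.shuffle(mixed)
    return mixed
```

To balance two attributes at once, one option is to store a combined value such as `(cwe, language)` under a single key and give target shares for each pair.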
    CodeQL Review Agent
    • Single-Agent Review Loop: Built a single-agent workflow for CodeQL-based code review, covering information retrieval, repository navigation, taint analysis, and engine-backed validation. Added reflection from real execution feedback to improve self-correction and long-horizon decision-making.
    • SFT & RL Data Synthesis: Analyzed 500 raw CVEs to distill 2,500 structured SFT trajectories with high-confidence tool-use patterns and 300 verified taint-flow examples for cold start and downstream RL training.
    RL Baseline & Alignment Roadmap
    • RL Baseline: Completed a rule-based reward design and the initial RL training pipeline using high-quality taint-flow supervision.
    • Advanced Alignment Roadmap: Proposed a next-stage alignment roadmap that combines Faiss+LSH-based vector sampling, curriculum-style difficulty control, and an LLM-as-Judge reward system for long-horizon taint analysis.

Research

  1. Trustworthy Public Opinion Analysis Multi-Agent System

    Led multi-agent architecture design, focusing on black-box observability, long-horizon generation drift, and evidence traceability in open-domain agent systems.
    Multi-Agent Runtime & White-Box Observability: Built a three-stage runtime across data preparation, collaborative analysis, and report generation on top of PocketFlow, and standardized statistical-analysis tools through MCP to reduce fragmented tool invocation, poor extensibility, and limited auditability in agent workflows. Also developed a Streamlit white-box console exposing step-level traces, tool I/O, and node scheduling paths, reducing exception diagnosis time for complex tasks from hours to 10 minutes.
    Collaborative Orchestration & Conflict Resolution: Designed ForumHost as the explicit orchestration hub for open-domain public-opinion analysis, coordinating DataAgent for quantitative analysis over labeled data and tools, and SearchAgent for real-time facts and context via Tavily. Through iterative evidence supplementation, cross-challenge, conflict detection, and stopping conditions, the workflow constrained multi-agent collaboration from free-form dialogue into a controllable process and mitigated context pollution and objective drift. In representative head-to-head validation, key factual error rate dropped from about 20% to 0 versus a single-agent baseline using only DataAgent, and the factual portions of the final report required no further manual correction.
    Structured Reporting & Citation Constraints: To address snowball hallucinations, unconstrained elaboration, and citation drift in long-horizon report generation, embedded structured output schemas, inline citations, source backtracking, and reliability annotations into chapter-by-chapter generation. Each core claim had to bind to evidence, source, and confidence score, with a verifier agent performing secondary checks on critical conclusions and cited evidence. Final reports reached 95% citation coverage, while incorrect citation rate was reduced to 3%.
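One way to picture the claim-to-evidence binding is a small schema plus a coverage check. This is only a sketch under assumptions: the `Claim` and `Evidence` classes, their field names, and the 0 to 1 confidence range are hypothetical and not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    source: str   # where the evidence came from (hypothetical field)
    excerpt: str  # quoted supporting text


@dataclass
class Claim:
    text: str
    confidence: float  # assumed to lie in [0, 1]
    evidence: list[Evidence] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject malformed claims early, as a schema validator would.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    @property
    def is_cited(self) -> bool:
        return len(self.evidence) > 0


def citation_coverage(claims: list[Claim]) -> float:
    """Fraction of claims that bind to at least one piece of evidence."""
    if not claims:
        return 0.0
    return sum(c.is_cited for c in claims) / len(claims)
```

A verifier step could then iterate over the claims where `is_cited` is false, or where confidence is low, and send them back for evidence supplementation.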
    High-Concurrency Data Processing Pipeline: Built an Async-based high-concurrency cleaning and semantic annotation pipeline for massive unordered raw posts. Few-shot prompted LLM nodes handled text cleaning, label completion, and semantic normalization, while retry logic, resumable execution, and concurrency control mitigated API rate limits and stabilized the data foundation for downstream analysis. Under a 40-60 concurrency budget, processing time for 20,000 records was reduced to under 12 hours, and input-token cost dropped by about 60% by caching few-shot examples and system prompts.
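The concurrency cap, retry logic, and resumable execution described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `annotate_post` is a hypothetical stand-in for the LLM call, and the retry count and backoff values are assumptions (the default cap of 40 mirrors the lower end of the stated budget).

```python
import asyncio
import random


async def annotate_post(post: str) -> dict:
    """Hypothetical stand-in for an LLM cleaning/annotation call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"text": post.strip(), "label": "unlabeled"}


async def with_retry(func, arg, retries: int = 3, base_delay: float = 0.01):
    """Retry a coroutine function with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return await func(arg)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            await asyncio.sleep(delay)


async def run_pipeline(posts, worker=annotate_post, max_concurrency: int = 40, done=None):
    """Process posts under a concurrency cap, skipping ones already in `done`."""
    done = {} if done is None else done  # index -> result, enables resuming
    semaphore = asyncio.Semaphore(max_concurrency)

    async def handle(index: int, post: str) -> None:
        if index in done:  # resumable: skip finished records
            return
        async with semaphore:  # at most `max_concurrency` calls in flight
            done[index] = await with_retry(worker, post)

    await asyncio.gather(*(handle(i, p) for i, p in enumerate(posts)))
    return [done[i] for i in range(len(posts))]
```

Passing a previously saved `done` mapping back into `run_pipeline` lets an interrupted run continue without repeating completed records.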
  2. Unveiling the Drivers of PTSD: An Interpretable Machine Learning Approach with SHAP

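    International Conference on Intelligent Computing and Data Analysis 2025; EI
    Research and Technical Approach: To analyze the factors influencing PTSD among 903 emergency responders, adopted an interpretable machine learning framework. Built a random forest prediction model (an ensemble of decision trees) and applied SHAP (SHapley Additive exPlanations) and PDP (Partial Dependence Plots) to open up the black-box model, with Friedman's H-statistic used to verify complex nonlinear interaction effects between variables.
    Key Findings and Conclusions: Quantified the buffering effect of psychological resilience on PTSD (SHAP value of 0.5061) and identified a U-shaped nonlinear pattern across age. The study verified the interaction patterns among resilience, age, and trauma exposure, and provides a data-driven baseline for differentiated interventions targeting high-risk occupational groups in different age ranges.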

Awards & Certificates

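  1. National College Student Statistical Modeling Competition, First Prize (Shaanxi Province)

  2. China University SAS Data Analysis Competition, National Third Prize

  3. Mathematical Contest in Modeling (MCM/ICM), Second Prize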

  4. CET4: 510 | CET6: 513

Projects

  • Project

    Novel Evaluation: A Structured Novel Evaluation System Based on LLM Rubrics

    Personal research project · LLM as Judge / Evaluation workflow
    Completed
    View Project
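    Core Problem:

    Evaluating novels depends heavily on editorial experience, platform intuition, genre judgment, and market judgment. The standards are scattered and hard to reuse, and different works are difficult to compare consistently. This project asks whether LLM rubrics can break the process of "judging like an editor" into clear, explainable, and iterable evaluation criteria, rather than producing only a black-box overall score.

    System Design:

    I built a local, single-user evaluation system around LLM rubrics. It first screens the input and selects an analysis mode, then performs genre identification, 8-axis rubric scoring, genre-lens evaluation, consistency checks, aggregation, and final result projection in sequence. The output is a structured result containing an overall verdict, per-axis scores, and an optional genre evaluation. Beyond the rubric design itself, the project also covers the surrounding engineering: a frontend for task creation and result display, a FastAPI backend for task execution, SQLite persistence for history and results, prompt asset management, schema validation, and eval/batch regression tools.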

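  • Project

    A Mental Health Detection Approach Based on Wearable Device Technology

    National College Student Innovation and Entrepreneurship Training Program project
    Completed
    Background:

    Traditional mental health assessment relies on subjective questionnaires, which makes assessments intermittent and prone to subjective bias. As wearable devices become widespread, continuous monitoring of physiological indicators is now possible. The project aims to build a mental health assessment model on wearable-device data, enabling objective and continuous monitoring of mental state and offering a new technical route for mental health assessment.

    Technical Implementation:

    Selected 43 key variables from four years of observations in the NetHealth dataset and built a multi-timescale dataset of about 30,000 rows. Handled missing values with Multiple Imputation by Chained Equations (MICE), detected and removed outliers with box plots, applied a Box-Cox transformation to skewed data, and used PCA to reduce the dimensionality of the highly multicollinear raw time-series data. Built three base models (ElasticNet, RandomForest, and XGBoost), combined them with a decision-tree-based stacking ensemble strategy, and finally tuned the overall hyperparameters with grid search.

    Results:

    The ensemble model reached an MSE of 0.032, significantly outperforming all base models, and delivered end-to-end prediction from wearable-device data to mental health state. The project was approved for support under the national-level Innovation and Entrepreneurship Training Program and offers a new technical route for mental health monitoring.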

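  • Competition

    Measuring the Health Status of the Chinese People and Its Influencing Factors in the Context of China's Modernization

    National College Student Statistical Modeling Competition
    Completed
    Background:

    As modernization accelerates, traditional single health indicators can no longer fully capture the complex changes in the health status of Chinese residents. This project aims to build a comprehensive health evaluation system that assesses population health and its influencing factors across multiple dimensions, providing data support for public health policy in the context of modernization.

    Technical Implementation:

    Integrated multi-source health indicators from the National Bureau of Statistics database and commercial databases, and used web crawling to collect trends in the shifting focus of medical research from CNKI. Applied principal component analysis to the health indicators obtained from the databases and, based on the interpretability of the loadings, selected the first two principal components with practical meaning as the health measures. Built a LASSO regression model and used grid search to find the best balance between coefficient sparsity and model R², preserving both interpretability and predictive power.

    Results:

    The final model reached an R² of about 0.89 and identified the key factors affecting the health status of Chinese residents. The resulting health measurement system offers a new perspective on health assessment during modernization and ultimately won first prize in the regional division.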