About me

Hyacehila

Current Work & Interests

  • Data Science

    Extracting insights from data to drive informed decisions.

  • LLM Agents

    Developing intelligent agents that leverage large language models for complex task automation.

  • LLM Evaluation

    Developing comprehensive frameworks for assessing language model performance and safety.

  • App Development

    Development of applications for iOS and other mobile platforms.

Resume

Education

  1. Wuhan University

    Sep 2025 — Jul 2027

    Currently pursuing a Master's degree in Applied Statistics, with research focused on LLM agents, evaluation, and agentic reinforcement learning. Aiming to build capable LLM agents that solve real-world problems.

  2. Xidian University

    Sep 2021 — Jul 2025

    Bachelor's degree in Statistics. Built a solid theoretical foundation in statistical analysis, machine learning, and data mining. Participated in national data modeling competitions to develop practical implementation skills.

Experience

  1. Algorithm Researcher (Intern) | NSFOCUS Technology (Wuhan)

    Dec 2025 — Mar 2026
    Security Data Pipeline
    • Automated Data Pipeline: Built a security data preprocessing and feature extraction pipeline over GitHub repositories and internal databases; extracted and validated 8,000+ CVE instances, distilled about 4,000 high-quality samples, and produced multi-dimensional CWE/OWASP labels.
    • Sample Mixing for Training Targets: Mixed samples to align vulnerability-type and programming-language distributions with training targets.
    CodeQL Review Agent
    • Single-Agent Review Loop: Built a single-agent workflow for CodeQL-based code review, covering information retrieval, repository navigation, taint analysis, and engine-backed validation. Added reflection from real execution feedback to improve self-correction and long-horizon decision-making.
    • SFT & RL Data Synthesis: Analyzed 500 raw CVEs to distill 2,500 structured SFT trajectories with high-confidence tool-use patterns and 300 verified taint-flow examples for cold start and downstream RL training.
    RL Baseline & Alignment Roadmap
    • RL Baseline: Completed a rule-based reward design and the initial RL training pipeline using high-quality taint-flow supervision.
    • Advanced Alignment Roadmap: Proposed a next-stage alignment roadmap that combines Faiss+LSH-based vector sampling, curriculum-style difficulty control, and an LLM-as-Judge reward system for long-horizon taint analysis.

Research

  1. Trustworthy Public Opinion Analysis Multi-Agent System

    Led multi-agent architecture design, focusing on black-box observability, long-horizon generation drift, and evidence traceability in open-domain agent systems.
    Multi-Agent Runtime & White-Box Observability: Built a three-stage runtime across data preparation, collaborative analysis, and report generation on top of PocketFlow, and standardized statistical-analysis tools through MCP to reduce fragmented tool invocation, poor extensibility, and limited auditability in agent workflows. Also developed a Streamlit white-box console exposing step-level traces, tool I/O, and node scheduling paths, reducing exception diagnosis time for complex tasks from hours to 10 minutes.
    Collaborative Orchestration & Conflict Resolution: Designed ForumHost as the explicit orchestration hub for open-domain public-opinion analysis, coordinating DataAgent for quantitative analysis over labeled data and tools, and SearchAgent for real-time facts and context via Tavily. Through iterative evidence supplementation, cross-challenge, conflict detection, and stopping conditions, the workflow constrained multi-agent collaboration from free-form dialogue into a controllable process and mitigated context pollution and objective drift. In representative head-to-head validation, key factual error rate dropped from about 20% to 0 versus a single-agent baseline using only DataAgent, and the factual portions of the final report required no further manual correction.
    Structured Reporting & Citation Constraints: To address snowball hallucinations, unconstrained elaboration, and citation drift in long-horizon report generation, embedded structured output schemas, inline citations, source backtracking, and reliability annotations into chapter-by-chapter generation. Each core claim had to bind to evidence, source, and confidence score, with a verifier agent performing secondary checks on critical conclusions and cited evidence. Final reports reached 95% citation coverage, while incorrect citation rate was reduced to 3%.
    High-Concurrency Data Processing Pipeline: Built an Async-based high-concurrency cleaning and semantic annotation pipeline for massive unordered raw posts. Few-shot prompted LLM nodes handled text cleaning, label completion, and semantic normalization, while retry logic, resumable execution, and concurrency control mitigated API rate limits and stabilized the data foundation for downstream analysis. Under a 40-60 concurrency budget, processing time for 20,000 records was reduced to under 12 hours, and input-token cost dropped by about 60% by caching few-shot examples and system prompts.
  2. Unveiling the Drivers of PTSD: An Interpretable Machine Learning Approach with SHAP

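    International Conference on Intelligent Computing and Data Analysis 2025; EI
    Research and Technical Approach: To analyze the factors influencing PTSD among 903 emergency personnel, adopted an interpretable machine learning architecture. Built a random forest prediction model based on decision-tree ensembles, and introduced SHAP (SHapley Additive exPlanations) and PDP (Partial Dependence Plots) to dissect the black-box model, combined with Friedman's H-statistic to verify the complex nonlinear interaction effects among variables.
    Key Findings and Conclusions: Quantified the buffering effect of psychological resilience on PTSD (SHAP value of 0.5061) and captured a U-shaped nonlinear pattern across the age distribution. The study validated the interaction patterns among resilience, age, and trauma exposure, and provided a data-driven baseline for differentiated interventions for high-risk occupational groups in different age ranges.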
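The High-Concurrency Data Processing Pipeline in item 1 above combines concurrency control, retry logic, and resumable execution. The following is only a minimal sketch of how those three pieces can fit together with Python's standard `asyncio`; it is not the project's actual code. The names `annotate`, `with_retry`, and `run_pipeline` are hypothetical, the LLM call is replaced by a stand-in coroutine, and the in-memory `done` dictionary stands in for whatever checkpoint store the real pipeline used.

```python
import asyncio


async def annotate(record: str) -> str:
    """Hypothetical stand-in for an LLM cleaning/annotation call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return record.strip().lower()


async def with_retry(worker, record, retries=3, base_delay=0.01):
    """Call worker(record), retrying with exponential backoff on any error."""
    for attempt in range(retries):
        try:
            return await worker(record)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            await asyncio.sleep(base_delay * 2 ** attempt)


async def run_pipeline(records, done, worker=annotate, concurrency=40):
    """Process records with at most `concurrency` calls in flight.

    `done` maps record index -> result and acts as the checkpoint:
    indices already present are skipped, so a rerun resumes the job.
    """
    sem = asyncio.Semaphore(concurrency)

    async def process(idx, record):
        if idx in done:  # already processed in an earlier run
            return
        async with sem:  # enforce the concurrency budget
            done[idx] = await with_retry(worker, record)

    await asyncio.gather(*(process(i, r) for i, r in enumerate(records)))
    return done


if __name__ == "__main__":
    checkpoint = {}
    print(asyncio.run(run_pipeline([" Hello ", "WORLD"], checkpoint)))
```

In this sketch a record that still fails after the last retry raises out of `run_pipeline`, but results already written to `done` are kept, so calling it again with the same dictionary only reprocesses the records that are missing.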

Awards & Certificates

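  1. National College Student Statistical Modeling Competition, First Prize (Shaanxi Province)

  2. China University SAS Data Analysis Competition, National Third Prize

  3. Mathematical Contest in Modeling (MCM/ICM), Second Prize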

  4. CET4: 510 | CET6: 513

Projects

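  • Project

    Novel Evaluation: A Structured Novel Evaluation Tool Based on LLM Rubrics

    Personal research project · LLM as Judge / LLM Evaluation
    Completed
    Background:

    Novel quality evaluation is highly subjective and hard to standardize. This project introduces LLM Rubric / LLM as Judge to break down text judgments that once relied on experience and intuition into multiple interpretable dimensions, evaluating them in layers and expressing them quantitatively. The goal is to make the evaluation process more systematic and iteratively improvable while preserving the flexibility of literary judgment.

    Technical Implementation:

    The project is built around a staged evaluation workflow: input pre-checks, 8-axis LLM Rubric scoring, consistency reconciliation, aggregation, and final result projection. The 8 axes cover key dimensions such as platform fit, commercial potential, core strengths, and core weaknesses, and strict JSON and schema constraints ensure that the LLM output is usable downstream. At the system level, the project includes a front-end interface, back-end task execution, and data storage.

    Outcomes:

    Delivered an evaluation tool with a complete front-end interface, back-end task execution, and a database, supporting task creation, status tracking, result display, and history review. It outputs 8-axis results such as overall judgment, platform candidates, and market assessment, together with an overall evaluation. The project validated the feasibility of applying the LLM Rubric method to literary evaluation, a highly subjective domain with scattered evaluation angles.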

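  • Project

    Mental Health Detection Based on Wearable Device Technology

    National College Student Innovation and Entrepreneurship Training Program project
    Completed
    Background:

    Traditional mental health assessment relies on subjective questionnaires and suffers from intermittent assessment and subjective bias. As wearable devices become widespread, continuous monitoring of physiological indicators becomes possible. The project aims to build a mental health assessment model based on wearable device data, enabling objective, continuous monitoring of mental state and offering a new technical approach to mental health assessment.

    Technical Implementation:

    Selected 43 key variables from four years of observations in the NetHealth dataset and built a multi-timescale dataset of about 30,000 rows; handled missing values with Multiple Imputation by Chained Equations (MICE), detected and removed outliers with boxplots, applied Box-Cox transformations to skewed data, and used PCA to reduce the dimensionality of the highly multicollinear raw time-series data; built three base models (ElasticNet, RandomForest, XGBoost), combined them with a decision-tree stacking ensemble strategy, and finally tuned the overall hyperparameters through grid search.

    Outcomes:

    The ensemble model reached an MSE of 0.032, significantly outperforming all base models; achieved end-to-end prediction from wearable device data to mental health state; the project was approved and supported under the national Innovation and Entrepreneurship Training Program, offering a new technical path for mental health monitoring.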

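  • Competition

    Measuring the Health Status of the Chinese People and Its Influencing Factors under China's Modernization

    National College Student Statistical Modeling Competition
    Completed
    Background:

    As modernization accelerates, traditional single health indicators can no longer fully reflect the complex changes in the health status of Chinese residents. This project aims to build a comprehensive health evaluation system that assesses population health and its influencing factors along multiple dimensions, providing data support for public health policy in the context of modernization.

    Technical Implementation:

    Integrated multi-source health indicators from the National Bureau of Statistics database and commercial databases, and used web crawling to obtain trends in the shifting focus of medical research from CNKI; applied principal component analysis to the health indicators obtained from the databases and, based on the interpretability of the coefficients, selected the first two principal components with practical meaning as the health measure; built a LASSO regression model and used grid search to find the best balance between coefficient sparsity and model R², ensuring both interpretability and predictive power.

    Outcomes:

    The final model reached a strong fit with an R² of about 0.89 and identified the key factors affecting the health status of Chinese residents; the resulting health measurement system offers a new perspective for health assessment during modernization and ultimately won first prize in the regional division.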
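The Novel Evaluation project above relies on strict JSON and schema constraints so that rubric scores from the LLM can be consumed downstream. The following is a rough sketch of what such a check could look like using only the standard library; it is not the project's implementation. The axis names cover only the four dimensions named in the project description (the real tool uses 8 axes), and the 1-5 score scale, the `score`/`rationale` fields, and the mean-based `aggregate` function are all assumptions made for illustration.

```python
import json

# Hypothetical axis names: only the four dimensions named in the project text.
AXES = ("platform_fit", "commercial_potential", "core_strengths", "core_weaknesses")


def parse_rubric(raw: str, axes=AXES, lo=1, hi=5) -> dict:
    """Parse LLM output and reject anything that violates the assumed schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if not JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [a for a in axes if a not in data]
    extra = [k for k in data if k not in axes]
    if missing or extra:
        raise ValueError(f"missing axes {missing}, unexpected keys {extra}")
    for axis in axes:
        entry = data[axis]
        if not isinstance(entry, dict):
            raise ValueError(f"{axis}: expected an object")
        score = entry.get("score")
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(score, bool) or not isinstance(score, int) or not lo <= score <= hi:
            raise ValueError(f"{axis}: score must be an integer in [{lo}, {hi}]")
        rationale = entry.get("rationale")
        if not isinstance(rationale, str) or not rationale.strip():
            raise ValueError(f"{axis}: rationale must be a non-empty string")
    return data


def aggregate(data: dict) -> float:
    """Assumed aggregation: unweighted mean of the per-axis scores."""
    return sum(entry["score"] for entry in data.values()) / len(data)


if __name__ == "__main__":
    sample = json.dumps({a: {"score": 4, "rationale": "example"} for a in AXES})
    print(aggregate(parse_rubric(sample)))
```

Rejecting malformed output at this boundary means a failed parse can trigger a re-prompt instead of letting a partial result reach the aggregation stage.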