数苑经纬讲坛 (Shuyuan Jingwei Lecture Series), No. 11: A Statistical Hypothesis Testing Framework for Data Misappropriation Detection in Large Language Models

Published: 2025-09-01

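Time: Friday, June 20, 2025, 10:00 - 11:00 a.m.

Venue: Conference Room 403, New Liberal Arts Building (新文科楼)

Speaker: Linjun Zhang (张林俊), Professor, Rutgers University, USA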

Abstract: Large Language Models (LLMs) have rapidly gained enormous popularity in recent years. However, the training of LLMs has raised significant privacy and legal concerns, particularly regarding the inclusion of copyrighted materials in their training data without proper attribution or licensing, which falls under the broader issue of data misappropriation. In this article, we focus on a specific problem of data misappropriation detection, namely, determining whether a given LLM has incorporated data generated by another LLM. To address this issue, we propose embedding watermarks into the copyrighted training data and formulating the detection of data misappropriation as a hypothesis testing problem. We develop a general statistical testing framework, construct a pivotal statistic, determine the optimal rejection threshold, and explicitly control the type I and type II errors. Furthermore, we establish the asymptotic optimality properties of the proposed tests, and demonstrate its empir...

