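In addition, the human-generated data should be more recent than the LLM's training data (otherwise it may already be part of that training data). When distinguishing human-written text from AIGC content by eye, one can observe that: (1) LLM-generated text is less emotional and more objective; (2) human authors often use exclamation marks, question marks, and ellipses to express their emotions, whereas LLM-generated answers are more formal and structured; (3) human-written text is more coherent than LLM-generated text, and the latter tends, within a paragraph, to ...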
Awesome LLM-generated Text Detection. The powerful ability of large language models (LLMs) to understand, follow, and generate complex languages has enabled LLM-generated texts to flood many areas of our daily lives at an incredible rate, with potentially negative impacts and risks on society and ...
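(1) DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. Detects whether a given passage was generated by an LLM. The basic idea is to perturb the text and have the LLM assign probabilities to it. This looks like the model checking its own output, though it is not quite the same thing, and it may not be suitable for ChatGPT. (2) GLTR: Statistical Detection and Visualization of Generated Text. The paper above detects whether a text comes specifically from one particular ...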
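The perturb-and-score idea described for DetectGPT can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_log_prob` and `make_toy_perturb` are hypothetical stand-ins for a real LLM's log-probability and for a mask-filling perturbation model, and the unigram table is invented for the example. Only the comparison itself (the log-probability of the original text minus the average log-probability of its perturbed versions) reflects the technique.

```python
import math
import random
from typing import Callable, Dict


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    n_perturbations: int = 20,
) -> float:
    """Return log p(text) minus the mean log p of perturbed copies of text.

    A clearly positive value means the text sits near a local maximum of the
    scoring model's probability, which is the signal DetectGPT looks for.
    """
    perturbed_scores = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return log_prob(text) - sum(perturbed_scores) / len(perturbed_scores)


# Hypothetical toy scoring model: an invented unigram table, NOT a real LLM.
TOY_UNIGRAM: Dict[str, float] = {
    "the": 0.5, "cat": 0.2, "sat": 0.2, "zebra": 0.05, "pondered": 0.05,
}


def toy_log_prob(text: str) -> float:
    """Sum of unigram log-probabilities; unknown words get a small floor."""
    return sum(math.log(TOY_UNIGRAM.get(word, 0.01)) for word in text.split())


def make_toy_perturb(rng: random.Random) -> Callable[[str], str]:
    """Build a perturbation that replaces one random word (assumed stand-in)."""
    vocab = list(TOY_UNIGRAM)

    def perturb(text: str) -> str:
        words = text.split()
        words[rng.randrange(len(words))] = rng.choice(vocab)
        return " ".join(words)

    return perturb


perturb = make_toy_perturb(random.Random(0))
# Text the toy model rates as maximally likely: perturbing can only lower it.
likely = perturbation_discrepancy("the the the the", toy_log_prob, perturb)
# Text the toy model rates as unlikely: perturbing can only raise it.
unlikely = perturbation_discrepancy("zebra pondered zebra pondered", toy_log_prob, perturb)
```

In a real setting the two callables would be backed by the model under test and a separate perturbation model, and the discrepancy would be compared against a threshold chosen on held-out data.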
These information-theoretic results rely on a key quantity called Chernoff information, which may guide the design of watermarks for LLMs. We derive sample complexity bounds that characterize the possibility of AI-generated text detection. Empirical Demonstrations ...
Detecting text generated by large language models (LLMs) is of great recent interest. With zero-shot methods like DetectGPT, detection capabilities have reached impressive levels. However, the reliability of existing detectors in real-world applications remains underexplored. In this study, we present...
Understanding the Effects of Human-written Paraphrases in LLM-generated Text Detection. Hiu Ting Lau, Arkaitz Zubiaga. School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS. Keywords: LLM-generated text detection, human-written paraphrases, large language...
based on a simple idea: most decoder-only, causal language models have a huge overlap in pretraining datasets, e.g. Common Crawl, Pile, etc. More details about the method and results can be found in our paper, Spotting LLMs with Binoculars: Zero-Shot Detection of Machine-Generated Text...
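1. Weight averaging and model merging can combine multiple LLMs into a single, better model, and the new model avoids the typical drawbacks of traditional ensemble methods, such as higher resource requirements. 2. Proxy-tuning can improve the performance of an existing large LLM by using two small LLMs, without changing the large model's weights. 3. By combining multiple small modules into a mixture-of-experts model, the resulting LLM can ...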
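As a rough illustration of the weight averaging mentioned in item 1, the sketch below merges several parameter dictionaries by taking a per-parameter weighted mean. It assumes all models share the same architecture (identical parameter names and shapes) and represents each parameter as a flat list of floats. `average_weights`, the `StateDict` alias, and the example parameter names are hypothetical; real merges operate on framework tensors and often use more elaborate schemes.

```python
from typing import Dict, List, Optional, Sequence

# Assumed representation: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def average_weights(
    models: Sequence[StateDict],
    coeffs: Optional[Sequence[float]] = None,
) -> StateDict:
    """Merge models by a per-parameter weighted mean (uniform by default)."""
    if not models:
        raise ValueError("need at least one model")
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    if len(coeffs) != len(models):
        raise ValueError("one coefficient per model is required")

    reference = models[0]
    merged: StateDict = {}
    for name, ref_values in reference.items():
        for model in models:
            # Averaging only makes sense for matching architectures.
            if name not in model or len(model[name]) != len(ref_values):
                raise ValueError(f"parameter {name!r} does not match across models")
        merged[name] = [
            sum(c * model[name][i] for c, model in zip(coeffs, models))
            for i in range(len(ref_values))
        ]
    return merged


# Two tiny hypothetical "models" with the same parameter layout.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = average_weights([model_a, model_b])
```

The merged result is a single set of weights, so inference costs the same as for one model, which is the contrast with ensembling drawn in item 1.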
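This study offers important insights for the field of natural language processing. In particular, for future applications such as optimizing prompt engineering, improving RAG (Retrieval-Augmented Generation) systems, and hallucination detection, it provides a solid reference for improving AI tools. The study also raises questions about the ethical and practical issues that AI may introduce into the design of question-answering systems: how to ensure that the generated questions help improve ...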