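For training we used the train_v2_drcat_02.csv dataset, which consolidates public datasets from the Kaggle platform; its generated essays come from many different LLMs, as shown in the figure below. The dataset contains 30k rows.
Overview
Early on we took a classical machine-learning approach: across the 7 prompts we sampled 1,500 rows per prompt as a validation set and used TF-IDF with an n-gram range of (1, 3) to vectorize the text. Local CV AUC was 0.99, but the submitted LB...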
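The validation scheme above (hold out 1,500 rows per prompt, TF-IDF with n-gram range (1, 3), score by AUC) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions: the column names `text`, `label` and `prompt_name`, the word-level n-grams, and the `LogisticRegression` classifier are assumptions (the source does not name the model), the helpers `per_prompt_holdout` and `tfidf_auc` are hypothetical, and a toy DataFrame with 10 held-out rows per prompt stands in for the real CSV.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def per_prompt_holdout(df: pd.DataFrame, n_per_prompt: int, seed: int = 42):
    """Hold out up to n_per_prompt random rows from every prompt as validation."""
    shuffled = df.sample(frac=1.0, random_state=seed)
    # cumcount numbers rows 0, 1, 2, ... within each prompt, in shuffled order
    is_val = shuffled.groupby("prompt_name").cumcount() < n_per_prompt
    return shuffled[~is_val], shuffled[is_val]


def tfidf_auc(train_df: pd.DataFrame, val_df: pd.DataFrame) -> float:
    """Fit word 1-3-gram TF-IDF + logistic regression; return validation ROC AUC."""
    vec = TfidfVectorizer(ngram_range=(1, 3))
    X_train = vec.fit_transform(train_df["text"])
    X_val = vec.transform(val_df["text"])  # reuse the training vocabulary and idf
    clf = LogisticRegression(max_iter=1000)  # assumed classifier
    clf.fit(X_train, train_df["label"])
    return roc_auc_score(val_df["label"], clf.predict_proba(X_val)[:, 1])


# Toy stand-in for train_v2_drcat_02.csv (label 1 = AI-generated).
rows = []
for prompt in ["Car-free cities", "Summer projects"]:
    for i in range(20):
        rows.append({"text": f"honestly i think {prompt.lower()} are kinda good {i}",
                     "label": 0, "prompt_name": prompt})
        rows.append({"text": f"In conclusion, {prompt} offer numerous significant benefits {i}.",
                     "label": 1, "prompt_name": prompt})
toy = pd.DataFrame(rows)

train_df, val_df = per_prompt_holdout(toy, n_per_prompt=10)  # 1,500 in the real setup
auc = tfidf_auc(train_df, val_df)
print(f"validation AUC: {auc:.3f}")
```

Holding out whole slices per prompt keeps every prompt represented in validation; on the toy data the two classes are trivially separable, so the AUC says nothing about real performance.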
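The first-place solution did extensive data cleaning and ensembled models including mistral-7b and deberta. Its best single model was mistral-7b, which reached an impressive private LB score of 0.966, outperforming deberta and a range of classical machine-learning algorithms. This LLM notebook, https://www.kaggle.com/code/minhsienweng/infer-mistral-7b-v0-llama-2-7b-deberta-v3, fine-tunes mistral-7b on a dataset similar to the one used in this solution, p...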
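The source does not say how the first-place team combined mistral-7b and deberta. One common way to ensemble binary classifiers scored by AUC is rank averaging, sketched below as an assumption rather than their method: each model's scores are converted to normalized ranks and averaged with weights. The helper `rank_average`, the probability arrays, and the 0.7/0.3 weights are all hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def rank_average(preds, weights=None):
    """Blend per-model scores by averaging their normalized ranks.

    Only the ordering of each model's scores matters, which suits an AUC metric
    and sidesteps calibration differences between models.
    """
    preds = [np.asarray(p, dtype=float) for p in preds]
    w = np.ones(len(preds)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize weights to sum to 1
    ranks = np.stack([rankdata(p) / len(p) for p in preds])  # shape (n_models, n_rows)
    return w @ ranks


# Hypothetical test-set probabilities from two models (not real outputs).
mistral_probs = [0.90, 0.20, 0.65, 0.10]
deberta_probs = [0.70, 0.40, 0.80, 0.05]
blend = rank_average([mistral_probs, deberta_probs], weights=[0.7, 0.3])
print(blend)  # ≈ [0.925, 0.5, 0.825, 0.25]
```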
'prompt_name', 'label']]
train = standardize_categories(train)
train_old = pd.read_csv("/kaggle/input/llm-detect-ai-generated-text/train_essays.csv")
train_old.rename(columns={'generated': 'label'}, inplace=True)
train_old['prompt_...
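The snippet renames the original competition file's `generated` column to `label` so its rows can be stacked with the DAIGT data. A minimal sketch of that alignment follows, assuming the old file has `prompt_id`, `text` and `generated` columns and the new data has `text`, `label` and `prompt_name`. The `PROMPT_NAMES` mapping, the helpers `align_old_train` and `combine`, and the de-duplication on `text` are hypothetical; `standardize_categories` from the original is not reproduced, and toy DataFrames stand in for the CSVs.

```python
import pandas as pd

# Hypothetical prompt_id -> prompt_name mapping; verify against train_prompts.csv.
PROMPT_NAMES = {0: "Car-free cities", 1: "Does the electoral college work?"}


def align_old_train(train_old: pd.DataFrame) -> pd.DataFrame:
    """Bring train_essays.csv rows into the (text, label, prompt_name) layout."""
    out = train_old.rename(columns={"generated": "label"})  # returns a copy
    out["prompt_name"] = out["prompt_id"].map(PROMPT_NAMES)
    return out[["text", "label", "prompt_name"]]


def combine(train: pd.DataFrame, train_old: pd.DataFrame) -> pd.DataFrame:
    """Stack both sources and drop exact duplicate essays (assumed desirable)."""
    merged = pd.concat(
        [train[["text", "label", "prompt_name"]], align_old_train(train_old)],
        ignore_index=True,
    )
    return merged.drop_duplicates(subset="text").reset_index(drop=True)


# Toy frames standing in for the two CSV files.
train = pd.DataFrame({"text": ["essay a", "essay b"], "label": [0, 1],
                      "prompt_name": ["Car-free cities", "Car-free cities"]})
train_old = pd.DataFrame({"id": ["x1", "x2"], "prompt_id": [0, 1],
                          "text": ["essay a", "essay c"], "generated": [0, 0]})
combined = combine(train, train_old)
print(combined)
```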
1. Set up the Kaggle API
   export KAGGLE_USERNAME="your_kaggle_username"
   export KAGGLE_KEY="your_api_key"
2. Download the large dataset (only needed if you want to train a language model for this task)
   cd large_dataset
   sudo apt install unzip
   kaggle datasets download -d lizhecheng/llm-detect-ai-generated-text-data...
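`kaggle datasets download` saves the dataset as a .zip archive, and the steps above install `unzip` to extract it. Where `sudo` is unavailable, Python's standard-library `zipfile` can do the same job. Because the archive name is truncated above, this sketch (with a hypothetical helper `extract_all_zips`) simply extracts every `.zip` in a directory; the demo runs on a throwaway archive, and in practice you would call `extract_all_zips("large_dataset")`.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_all_zips(directory: str) -> list[str]:
    """Extract every .zip archive in `directory` into that same directory."""
    extracted = []
    for archive in sorted(Path(directory).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
            extracted.extend(zf.namelist())  # record member names
    return extracted


# Demo on a throwaway archive instead of the real download.
with tempfile.TemporaryDirectory() as tmp:
    with zipfile.ZipFile(Path(tmp) / "demo.zip", "w") as zf:
        zf.writestr("train.csv", "text,label\nhello,0\n")
    names = extract_all_zips(tmp)
    demo_ok = (Path(tmp) / "train.csv").exists()
print(names, demo_ok)
```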
This repository contains the source code and resources for a binary classification project that detects AI-generated text. The project is based on the Kaggle competition and uses a variety of classical machine-learning models as well as a fine-tuned DistilRoBERTa model to achieve its goa...
LLM - Detect AI Generated Text
Identify which essay was written by a large language model
Notebook run: successfully ran in 542.7 s; accelerator: none; environment: latest container image; output: 388.6 kB.
Notebook by V1o1_oo (10mo ago; version 3 of 3), copied from Denis Ding (+4, -47). Runtime: 9m 3s. Language: Python. Competition notebook for LLM - Detect AI Generated Text ...
Detect whether a text is AI-generated, either by training a new tokenizer and combining it with tree-based classification models, or by training language models on a large dataset of human- and AI-generated texts. - wordpiece tokenizer · Lizhecheng02/Kaggle-LLM-Detect_
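The tokenizer-plus-trees approach can be sketched as follows: train a WordPiece tokenizer from scratch on the essays, feed its subword tokens to a TF-IDF vectorizer, and fit a tree-based classifier. This is a minimal sketch under assumptions, not the repository's code. It uses the Hugging Face `tokenizers` library, scikit-learn's `RandomForestClassifier` stands in for whatever tree models the repo actually uses, `vocab_size=500` and the (1, 2) n-gram range are illustrative, the helpers are hypothetical, and a toy corpus replaces the real data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, trainers


def train_wordpiece(texts, vocab_size=500):
    """Train a fresh WordPiece tokenizer on the given essays."""
    tok = Tokenizer(models.WordPiece(unk_token="[UNK]"))
    tok.normalizer = normalizers.Lowercase()
    tok.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.WordPieceTrainer(vocab_size=vocab_size, special_tokens=["[UNK]"])
    tok.train_from_iterator(texts, trainer=trainer)
    return tok


def identity(tokens):
    """Pass pre-tokenized input through TfidfVectorizer untouched."""
    return tokens


def fit_tree_model(texts, labels, tok):
    """TF-IDF over WordPiece tokens, then a random forest (stand-in tree model)."""
    tokenized = [tok.encode(t).tokens for t in texts]
    vec = TfidfVectorizer(tokenizer=identity, preprocessor=identity,
                          token_pattern=None, lowercase=False, ngram_range=(1, 2))
    X = vec.fit_transform(tokenized)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return vec, clf


def predict_ai_proba(texts, tok, vec, clf):
    """Probability that each text is AI-generated (label 1)."""
    X = vec.transform([tok.encode(t).tokens for t in texts])
    return clf.predict_proba(X)[:, 1]


# Toy corpus (label 1 = AI-generated); real runs would use the competition essays.
human = [f"honestly i think phones in class are kinda fine {i}" for i in range(20)]
ai = [f"In conclusion, mobile devices offer numerous educational benefits {i}." for i in range(20)]
texts, labels = human + ai, [0] * 20 + [1] * 20

tok = train_wordpiece(texts)
vec, clf = fit_tree_model(texts, labels, tok)
probs = predict_ai_proba(["honestly kinda fine", "numerous educational benefits"], tok, vec, clf)
print(probs)
```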