| Dataset | file | notes |
| --- | --- | --- |
| alpaca-chinese | alpaca-chinese-52k.json | the full 52k set of English and Chinese data |
| alpaca-chinese | ./data/alpaca_chinese_part*.json | split data files |

Case 1, idioms: some samples need a second round of rewriting after a literal translation, for example idiom-type samples such as:

{ "en_instruction": "What is the meaning of the following idiom?", "instruction": "以下成语...
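As a rough illustration of how such idiom samples might be collected from the split files for rewriting, here is a minimal Python sketch; the field names follow the sample above, and the 成语 keyword check is only an assumed heuristic, not the project's actual selection rule.

```python
# Sketch: gather idiom-type samples from the split files for manual rewriting.
# Field names ("en_instruction", "instruction") follow the sample above; the
# keyword check is an illustrative assumption, not the project's actual rule.
import glob
import json

idiom_samples = []
for path in glob.glob("./data/alpaca_chinese_part*.json"):
    with open(path, encoding="utf-8") as f:
        for sample in json.load(f):
            # Flag samples whose Chinese instruction mentions 成语 (idiom).
            if "成语" in sample.get("instruction", ""):
                idiom_samples.append(sample)

print(f"{len(idiom_samples)} idiom samples may need a second rewrite")
```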
Clean everything up:

    docker-compose down --volumes --rmi all

Notes

We can likely improve our model performance significantly if we had a better dataset. Consider supporting the LAION Open Assistant effort to produce a high-quality dataset for supervised fine-tuning (or bugging them to release their...
This is what the Alpaca dataset can give us. Beyond that, ideally we’d like the model to be able to hold a conversation by remembering what transpired previously. For example, if you say “what did I ask you in my previous sentence”, the model should answer that you asked about the...
Assistant: Yes, certainly. To clean your screen, you first need to use a microfiber cloth or ...
Alpaca is a new model fine-tuned from Meta's LLaMA 7B. It used only 52k training samples, yet its performance is roughly on par with GPT-3.5. The key point is that the training cost is remarkably low, …
This component can help clean the data in your dataset.

LLM-N-Gram Repetition Filter (MaxCompute)

Filters text samples in the text field based on the character-level N-gram repetition rate. The component moves an N-character window across the text to generate contiguous sequences of N ...
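A minimal sketch of what such a character-level repetition filter might compute, assuming a per-sample text field; the window size N and the drop threshold below are illustrative, not the component's actual defaults:

```python
# Minimal sketch of a character-level N-gram repetition filter.
# N and the threshold are illustrative; the MaxCompute component's
# actual parameters and formula may differ.
from collections import Counter

def char_ngram_repetition_rate(text: str, n: int = 10) -> float:
    """Fraction of N-gram positions whose N-gram occurs more than once."""
    if len(text) < n:
        return 0.0
    # Slide an N-character window across the text to collect N-grams.
    ngrams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

def keep_sample(sample: dict, n: int = 10, max_rate: float = 0.5) -> bool:
    """Drop samples whose text field is dominated by repeated N-grams."""
    return char_ngram_repetition_rate(sample.get("text", ""), n) <= max_rate
```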
Preprocess the cleaned Alpaca training dataset:

    python data_loading.py preprocess_alpaca \
      --path_in data/alpaca_clean.json \
      --path_out data/train.json

If you want to use GPT4All data, you can use this command:

    python data_loading.py preprocess_gpt4all --path_out data/train.json

...
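For orientation, a preprocessing step like this typically flattens Alpaca-style `instruction`/`input`/`output` records into prompt/response pairs. The sketch below is only an assumption about what such a step does; the actual data_loading.py may use a different prompt template and output schema.

```python
# Hypothetical preprocessing sketch; not the actual data_loading.py logic.
import json

def to_pair(rec: dict) -> dict:
    prompt = rec["instruction"]
    if rec.get("input"):
        # Fold the optional input into the prompt (template is illustrative).
        prompt += "\n\n" + rec["input"]
    return {"prompt": prompt, "response": rec["output"]}

with open("data/alpaca_clean.json", encoding="utf-8") as f:
    records = json.load(f)

with open("data/train.json", "w", encoding="utf-8") as f:
    json.dump([to_pair(r) for r in records], f, ensure_ascii=False, indent=2)
```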
AlpacaEval dataset: a simplification of AlpacaFarm's evaluation set, where "instructions" and "inputs" are merged into one field, and reference outputs are longer. Details here.

When to use and not use AlpacaEval?

When to use AlpacaEval? Our automatic evaluator is a quick and cheap proxy fo...
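To make the "merged into one field" point concrete, here is an illustrative (not official) sketch of collapsing an AlpacaFarm-style instruction/input pair into a single instruction field; AlpacaEval's actual merge rule may differ.

```python
# Illustrative only: one plausible way to merge "instruction" and "input"
# into a single field, mirroring the simplification described above.
def merge_instruction(example: dict) -> dict:
    instruction = example["instruction"]
    if example.get("input"):
        instruction = f"{instruction}\n\n{example['input']}"
    return {"instruction": instruction}

print(merge_instruction({
    "instruction": "Summarize the following text.",
    "input": "Alpaca is a model fine-tuned from LLaMA 7B on 52k instructions.",
}))
```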
- [ ] clean training code
- [ ] write the second phase plan for Luotuo

We plan to use this Luotuo project as the git repository for the entire Chinese LLM project. After the completion of the original Luotuo: LLaMA-LoRA, it will be migrated to Luotuo-vanilla. The CamelBell, Loulan, Silk-Ro...
* rm weighted lb
* compute all leaderboard
* compute all leaderboard
* 18 -> 21 price human
* add all the annotations
* jsonify annotations
* jsonify annotations
* [CLEAN] move all annotations to be annotator dependent
* update weighted lb
* format sample sheet
* format sample sheet
* ...