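The M-AILABS Speech Dataset is the first large, free dataset we provide; it can be used freely as training data for speech recognition and speech synthesis. The data is based mainly on LibriVox and Project Gutenberg and contains nearly a thousand hours of audio together with prepared text files. A transcription is provided for every clip; clip lengths range from 1 to 20 seconds, and the total length is shown in the list. The texts were published between 1884 and 1964 and are in the public domain. The audio was recorded by the LibriVox project and is also...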
Size: 2.8 GiB Source: https://www.caito.de/2019/01/03/the-m-ailabs-speech-dataset/ Description: German phrases pronounced by native speakers, mainly from Librivox.org. The data is ready to be used on GoldenDict PC (not Android) and the “Search Bar” needs to be visibl...
I replaced the broken link with the updated one that I found on the same website here: http://www.caito.de/2019/01/the-m-ailabs-speech-dataset/ master (mozilla/DeepSpeech#3703) Daniel Tinazzi authored Nov 17, 2021 1 parent 73e1e4f commit 4fa8dd3 Showing 1 changed file with 1 additi...
The validation split consists of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus. The dataset might contain offensive content, as parts of it are a subset of public Common Crawl data, along with a subset of public Reddit data,...
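The proportional sampling described above can be illustrated with a minimal sketch. The component names and sizes below are hypothetical placeholders, not figures from the source, and the function only shows the arithmetic of splitting a fixed budget in proportion to component size; it is not the authors' sampling code.

```python
def proportional_budget(sizes_mb, budget_mb):
    """Split budget_mb across datasets in proportion to their sizes."""
    total = sum(sizes_mb.values())
    if total <= 0:
        raise ValueError("total corpus size must be positive")
    # Each dataset receives the same fraction of the budget as its
    # fraction of the whole corpus.
    return {name: budget_mb * size / total for name, size in sizes_mb.items()}


# Hypothetical component sizes in MB, for illustration only.
example_sizes = {"corpus_a": 600_000, "corpus_b": 300_000, "corpus_c": 100_000}
shares = proportional_budget(example_sizes, budget_mb=200)
print(shares)
```

With these made-up sizes, the largest component contributes 60% of the 200MB validation budget, matching its 60% share of the corpus.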
Dataset Viewer - Browse and analyze Hugging Face datasets with features like search, filtering, statistics, and data export DeepSeek MCP Server - Model Context Protocol server integrating DeepSeek's advanced language models, in addition to other useful API endpoints Deepseek_R1 - A Model Context ...
ai in spreadsheets formula bot (datasetmatch llc) +1 applicable to: excel revolutionize your analytics workflow, from formula generation to data automation. 16 out of 60 adobe acrobat for microsoft teams and outlook adobe inc. +3 applicable to: office app outlook teams gain insights, edit, ...
The raw dataset which has been used in the Supplementary Information, Supplementary Fig. S4 is available from https://doi.org/10.5281/zenodo.7548492. Mouse Brain Atlas http://labs.gaidi.ca/mouse-brain-atlas has been used to navigate the imaging instrument into the desired location. Figure 1b...
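Thus, by introducing the MiniCPM-V series of efficient MLLMs, we not only demonstrate how advanced the technology is but also open up new possibilities for the future development of AI, allowing AI technology to be applied more widely in the real world... Paper: https://arxiv.org/pdf/2408.01800 8. VidGen-1M: A Large-Scale Dataset for Text-to-video Generation. The quality of video-text pairs fundamentally determines text-to...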
For a dataset that is preprocessed for instruction purposes:

    {"input": "...", "output": "..."}

You can use this example in your YAML config:

    datasets:
      - path: repo
        type:
          system_prompt: ""
          field_system: system
          field_instruction: input
          field_output: output
          format: "[INST] {instruction}...
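To make the field mapping concrete, here is a small sketch of how a `format` template of this kind could be applied to one record. It is not the training framework's own code. The helper `render_example` is hypothetical, and because the template in the snippet is truncated, the closing `[/INST]` used below is an assumption made only so the example runs.

```python
def render_example(record, template,
                   field_instruction="input", field_output="output"):
    """Build a (prompt, completion) pair from one instruction record.

    Hypothetical helper: it mirrors the field_instruction / field_output
    mapping from the config, not any library's real implementation.
    """
    prompt = template.format(instruction=record[field_instruction])
    return prompt, record[field_output]


# Assumed template: the source shows only '[INST] {instruction}...'.
TEMPLATE = "[INST] {instruction} [/INST]"

record = {"input": "Translate 'Hallo' to English.", "output": "Hello"}
prompt, completion = render_example(record, TEMPLATE)
print(prompt)
print(completion)
```

The `input` key feeds the `{instruction}` placeholder and the `output` key becomes the target text, which is the role the `field_instruction` and `field_output` settings play in the config.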
Copyright: 2009, Lloyd Hilaiel, 2008, Igor Pavlov License: public-domain Comment: All the cruft you find here is public domain. You don't have to credit anyone to use this code, but my personal request is that you mention Igor Pavlov for his hard, high quality work. ...