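High-quality corpus: https://hf.co/collections/gair-prox/prox-dataset-66e81c9d560911b836bb3704 Using language models to improve the quality of language models' "own" data. Traditional methods for cleaning and optimizing pre-training data rely mainly on hand-designed rules. Although these rules can effectively filter out low-quality data, ...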
Together Computer. RedPajama: An Open Dataset for Training Large Language Models, 2023. Penedo, Guilherme, et al. The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale, arXiv:2406.17557, 2024.
DS-Prox: Dataset Proximity Mining for Governing the Data Lake. With the arrival of Data Lakes (DL) there is an increasing need for efficient dataset classification to support data analysis and information retrieval. Our goal is to use meta-features describing ...
I have a dataset with categorical data with 31 levels. I want to show their distribution in a scatterplot with ggplot, but I want to place special emphasis on some of the datapoints, like the red circ... Macro Vim - expand multiple Verilog Bus ...
Qualitative PROX dataset: Dataset of 100K RGB-D frames with pseudo ground truth. The dataset captures dynamic RGB-D sequences of 20 subjects in 12 scenes and is described in Section 4.1.2 of the PROX paper. Both datasets have a very similar structure, which is explained next. After extracting the...
python evaluate.py --model <model> --dataset_name <dataset_name> --lambda_ 0.0001 --ckpt <ckpt>

An example is:

python evaluate.py --model resnet18 --dataset_name cifar10 --lambda_ 0.0001 --ckpt checkpoints/obproxsg_plus_resnet18_cifar10_1.000000E-04.pt

...
It is a Django web app that displays the detailed patterns of lymphatic metastases in head & neck cancer and allows one to explore the underlying dataset(s) in detail. It is hosted at https://lyprox.org. Motivation: HNSCC spreads through the lymphatic system of the neck and form...