internet. With our models, we aim to identify toxic comments in a dataset with at least a 70% success rate. Additionally, we want to classify both new and previously existing comments as toxic or non-toxic in multiple languages that others have ...
Low-resource languages (LRLs) with complex morphology are known to be more difficult to translate automatically. Some LRLs are more difficult to translate than others because of a lack of research interest or collaboration. In this article, we experiment with a specific LRL, ...
Resources for conservation, development, and documentation of low-resource (human) languages. According to some estimates, half of the roughly 7,000 currently spoken languages are expected to become extinct this century. However, there is a lot of work by academics, independent scholars, organizations, co...
Content preview: Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models. Abteen Ebrahimi ♦, Arya D. McCarthy ∇, Arturo Oncevay ♥, Luis Chiruzzo 4, John E. Ortega Ω, Gustavo A. Giménez-Lugo ♣, Rolando Coto-Solano φ, Katharina Kann ♦. ♦ University ...
The most challenging issue with low-resource languages is the difficulty of obtaining enough language resources. In this paper, we propose a language service framework for low-resource languages that enables the automatic creation and customization of new resources from existing ones. To achieve...
Crowdsourcing Speech Data for Low-Resource Languages from Low-Income Workers. Basil Abraham, Danish Goel, Divya Siddarth, Kalika Bali, Manu Chopra, Monojit Choudhury, Pratik Joshi, Preethi Jyothi, Sunayana Sitaram, Vivek Seshadri. Language Resources and Evaluation Conference (LREC), May 2020 ...
Recently, very large language models (LLMs) have shown exceptional performance on several English NLP tasks with just in-context learning (ICL), but their utility in other languages is still underexplored. We investigate their effectiveness for NLP tasks in low-resource languages (LRLs), especially...
-lingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher res...