This corpus is intended for analyzing or studying Go code, for example to estimate how much existing code would need to be adapted when changing the Go language. For now, this repository simply contains a table with module information, including where to find the source code...
Low cardinality (a few distinct values that repeat throughout documents in your search corpus). Short descriptive values (one or two words) that render nicely in a navigation tree. The values within a field, and not the field name itself, produce the facets in a faceted navigation structure....
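To make the point about low cardinality concrete, here is a minimal sketch of how the values within a field, rather than the field name, become facets. The documents and the field name `category` are hypothetical, and the counting is plain Python rather than any search service's API.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count how often each distinct value of `field` occurs across documents."""
    counts = Counter()
    for doc in documents:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    return counts


# Hypothetical documents; "category" is an assumed field name.
documents = [
    {"id": 1, "category": "Hotel"},
    {"id": 2, "category": "Motel"},
    {"id": 3, "category": "Hotel"},
    {"id": 4, "category": "Resort"},
]

# Each distinct value becomes one facet entry with its document count.
facets = facet_counts(documents, "category")
for value, count in facets.most_common():
    print(f"{value} ({count})")
```

A field with only a few short, repeating values yields a compact navigation tree; a field with a distinct value per document would produce one facet per document and be of little use for navigation.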
we assume that the MD&A corpus contains V words, and the initial word vector dimension of each seed word is V. The word2vec model reduces the dimension of each word vector to ensure that the word vector dimension is set in such a way that it can summarise the meaning...
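The dimension reduction described here can be sketched as a one-hot vector of length V multiplied by a V x N weight matrix, which yields a dense vector of length N. The sketch below assumes a tiny hypothetical vocabulary, an arbitrary N = 2, and random untrained weights; it shows only the projection step, not word2vec training.

```python
import random


def one_hot(index, vocab_size):
    """Return a V-dimensional vector with a single 1.0 at `index`."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


def project(vec, matrix):
    """Multiply a 1 x V row vector by a V x N matrix, giving a 1 x N vector."""
    n_cols = len(matrix[0])
    return [
        sum(vec[i] * matrix[i][j] for i in range(len(vec)))
        for j in range(n_cols)
    ]


# Hypothetical vocabulary standing in for the V words of the corpus.
vocab = ["risk", "revenue", "growth", "loss", "cash"]
V = len(vocab)
N = 2  # assumed reduced dimension, chosen only for illustration

# Untrained random weights; a real model would learn these from the corpus.
rng = random.Random(0)
W = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]

x = one_hot(vocab.index("growth"), V)  # initial V-dimensional vector
dense = project(x, W)                  # reduced N-dimensional vector
```

Because the input is one-hot, the product simply selects the row of `W` that belongs to the word, which is why the learned weight matrix can be read as a table of word vectors.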
Text: contains the raw text of the news. Link: contains the URL of the source. Note that some instances have an empty header intentionally because the source omitted it. 📝 How to cite: If you use the corpus, please cite the following articles: Gómez-Adorno, H., Posadas-Durán, J. P...
https://github.com/roneysco/Fake.BR-Corpus We make this news collection available in the Github repository: https://github.com/marianacaravanti/PU-LP-for-fake-news-detection https://aosfatos.org/noticias/ https://piaui.folha.uol.com.br/lupa/ https://g1.globo.com/fato-ou-fake/ https:...
To date, this approach has identified a hierarchy of over 700,000 topics within the Microsoft Academic Knowledge corpus. In our dataset of 166,356 COVID-19 research articles, the average paper is associated with 9 FoS from different levels in this hierarchy, and in total, 65,427 unique ...
The Moodle platform has been used for the Algorithm Design course for more than 12 years. Before the COVID-19 pandemic, it was used as a repository for course slides and other support documents, for laboratory documentation, and for announcements and discussions on a forum. The latter was ...
For example, the boundary of entities in a corpus is usually labeled through a sequence of word tags. The hidden Markov model (HMM) is widely used for this type of crowdsourcing labeling task. In the sequence tagging task, the position of the to-be-tagged element influences its type, ...
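As an illustration of HMM-based sequence tagging, the sketch below runs plain Viterbi decoding over a toy sentence with B/I/O boundary tags. All probabilities, the tag set, and the sentence are invented for the example, and the code covers only single-annotator decoding, not the aggregation of crowdsourced labels.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{
        s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
        for s in states
    }]
    for obs in observations[1:]:
        row = {}
        for s in states:
            row[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(row)

    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for row in reversed(best[1:]):
        path.append(row[path[-1]][1])
    path.reverse()
    return path


# Toy model: every number below is an assumption made for illustration.
states = ["B", "I", "O"]
start_p = {"B": 0.3, "I": 0.0, "O": 0.7}
trans_p = {
    "B": {"B": 0.05, "I": 0.60, "O": 0.35},
    "I": {"B": 0.05, "I": 0.40, "O": 0.55},
    "O": {"B": 0.30, "I": 0.00, "O": 0.70},
}
emit_p = {
    "B": {"She": 0.30, "New": 0.60, "York": 0.10},
    "I": {"She": 0.10, "New": 0.10, "York": 0.80},
    "O": {"She": 0.45, "visited": 0.45, "New": 0.05, "York": 0.05},
}

tags = viterbi(["She", "visited", "New", "York"], states, start_p, trans_p, emit_p)
print(tags)  # ['O', 'O', 'B', 'I']
```

The transition table is where position matters: an "I" tag is only reachable after "B" or "I", so the tag chosen for one word constrains the tag of the next.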
This repository provides a schema that is based on STIX2, and contains MITRE ATT&CK as an example dataset to start exploring this threat intelligence platform. More in this blog post. VirusBay: VirusBay is a web-based collaboration platform that connects security operations center (SOC) ...
This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds tokenization rules on top of spaCy's rule-based tokenizer, a POS tagger and syntactic parser trained on biomedical data and an entity span detectio...