The SUBTLEX-US corpus has been parsed with the CLAWS tagger, so that researchers have information about the possible word classes (parts‐of‐speech, or PoSs) of the entries. Five new columns have been added to the SUBTLEX-US word frequency list: the dominant ...
1 public repository matches this topic: a list of words from the SUBTLEX movie subtitles corpus, sorted by frequency. Topics: count, word, english, american, en, frequence, en-us, subtlex, subtlexus. Updated Feb 13, 2020. JavaScript.
SUBTLEX-CAT is a word frequency and contextual diversity database for Catalan, obtained from a 278-million-word corpus based on subtitles supplied from broadcast Catalan television. Like all previous SUBTLEX corpora, it comprises subtitles from films and TV series. In addition, it includes a wider...
We constructed several phonological similarity networks (neighbors differ in exactly one consonant or vowel phoneme) using words from a lexicon based on the SUBTLEX-US English corpus, distinguishing networks by size and word representation (i.e., lemma vs. word form). The resulting networks are ...
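The neighbor rule in the snippet above (two words are linked when they differ in exactly one phoneme) can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes that "differ in exactly one phoneme" covers a single substitution, insertion, or deletion; the snippet does not say whether insertions and deletions count. The function names, the toy lexicon, and its ARPAbet-style transcriptions are all hypothetical.

```python
from itertools import combinations


def differ_by_one_phoneme(a, b):
    """Return True if phoneme tuples a and b are exactly one edit apart.

    ASSUMPTION: one edit = one substitution, insertion, or deletion.
    """
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: removing one phoneme from the longer
    # sequence must yield the shorter one (insertion/deletion).
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def build_network(lexicon):
    """Map each word to the set of its phonological neighbors.

    lexicon: dict of word -> tuple of phoneme symbols.
    Pairwise comparison is quadratic, which is fine for a toy example
    but would need indexing for a full lexicon.
    """
    network = {word: set() for word in lexicon}
    for w1, w2 in combinations(lexicon, 2):
        if differ_by_one_phoneme(lexicon[w1], lexicon[w2]):
            network[w1].add(w2)
            network[w2].add(w1)
    return network


# Hypothetical toy lexicon; transcriptions are illustrative only.
toy_lexicon = {
    "cat": ("K", "AE", "T"),
    "bat": ("B", "AE", "T"),
    "cut": ("K", "AH", "T"),
    "at": ("AE", "T"),
    "dog": ("D", "AO", "G"),
}
network = build_network(toy_lexicon)
```

In this toy lexicon, "cat" is linked to "bat", "cut", and "at", while "dog" has no neighbors. Building the network over lemmas rather than word forms would only change which entries go into the lexicon, not the neighbor rule itself.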
The last advantage of our corpus is the availability of metadata provided for most files, including production type, year, title, duration, and original language, among other information, which has allowed us to examine the corpus in more depth. Moreover, it has allowed us to construct ...