Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other ...
• The Language Corpus System of Modern Chinese Study (LCSMCS) word frequencies, based on a corpus of 20 million characters, of which 2 million have been segmented into words and assigned their parts-of-speech (PoS) ([8]; available at http://.dwhyyjzx.com/cgi-bin/yuliao/, checked on September 24, 2009).
• The Center for Chinese Ling...