After installing the latest corpkit (2.1.1), I tried to parse my corpus (which had worked in the previous version, at least for approximately 40% of the texts) and received the message NameError: global name 'Corpus' is not defined. What did I do wrong? Below you will find the log...
I believe the bug occurs because another module imports numpy, which then breaks the code that decides which Shannon implementation to use. The first time, numpy is not available and everything works. The second time (probably because I've plotted something and imported numpy as a side effect), the broke...
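corpkit's actual selection code is not shown here, so the following is only a hypothetical sketch of how such a failure can arise. It assumes the Shannon implementation computes entropy in bits. The fragile dispatcher branches on whether numpy is already loaded (checking sys.modules) instead of importing it, so the numpy branch refers to a name np that this module never bound. The result is a NameError only after some other module has imported numpy. The function names and the loaded parameter are illustrative inventions, not corpkit's API.

```python
import math
import sys
from collections import Counter


def shannon_pure(tokens):
    """Shannon entropy in bits, using only the standard library."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def shannon_fragile(tokens, loaded=sys.modules):
    """Hypothetical buggy dispatcher: branches on whether numpy is *already loaded*.

    `loaded` defaults to sys.modules; it is a parameter only so the
    behaviour can be demonstrated without touching the real module table.
    """
    if "numpy" in loaded:
        # Bug: this module never imports numpy itself, so `np` is unbound and
        # this line raises NameError once any other module (e.g. a plotting
        # library) has imported numpy.
        counts = np.array(list(Counter(tokens).values()), dtype=float)  # noqa: F821
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())  # noqa: F821
    return shannon_pure(tokens)


def shannon_robust(tokens):
    """Decide by actually attempting the import, independent of earlier imports."""
    try:
        import numpy as np
    except ImportError:
        return shannon_pure(tokens)
    counts = np.array(list(Counter(tokens).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


print(shannon_fragile(["a", "a", "b", "b"], loaded={}))  # 1.0
```

Trying the import directly makes the choice depend on whether numpy is installed, not on whatever happens to have been imported earlier in the session, so the result is the same on the first and second call.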
The nltk module runs alongside other libraries in the corpus folder. I've already tried putting 'import nltk' at the top, but the problem persists, and I've also tried 'from nltk.tokenize import PunktSentenceTokenizer'. I don't know why the Python shell can't find the definit...
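The two attempts above can fail for different reasons, and the following minimal sketch separates them using only the standard library, so it runs without nltk installed. It checks the import statements with ast.parse, which only parses source and never executes it. It also uses collections.Counter as a stand-in for PunktSentenceTokenizer to show that 'import pkg' binds only the name pkg, so a bare reference to a class inside it raises NameError. The helpers is_valid_python and name_is_bound are hypothetical illustrations, not part of nltk.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python (nothing is executed)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def name_is_bound(setup: str, name: str) -> bool:
    """Run `setup` in a fresh namespace and report whether `name` is defined."""
    namespace: dict = {}
    exec(setup, namespace)
    return name in namespace


# A quoted name after `import` is not valid Python at all.
print(is_valid_python("from nltk.tokenize import 'PunktSentenceTokenizer'"))  # False
print(is_valid_python("from nltk.tokenize import PunktSentenceTokenizer"))    # True

# Stdlib analogue: `import collections` binds `collections`, not `Counter`.
print(name_is_bound("import collections", "Counter"))              # False
print(name_is_bound("from collections import Counter", "Counter"))  # True
```

By the same logic, the class is reachable either through the unquoted from-import or, after 'import nltk.tokenize', as nltk.tokenize.PunktSentenceTokenizer.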
However, according to our preliminary investigation of the reference corpus created by Franzén et al. (2002), which is annotated with 1,745 protein names, most protein name fragments are fundamentally nouns (85%), so POS taggers are unlikely to be helpful in distinguishing protein names...