Terms extraction error

admin Staff asked 12 years ago


I am trying to run a terms extraction script on a database I have uploaded to the manager, but the jobs fail: the output starts with a deprecation warning and ends with a KeyError. I’ve pasted the error text below. The log can be accessed here: http://manager.cortext.net/logs/40332.

Could you please help me run the script or explain the cause of the error? Is there something I can do on my end to work around the problem?

Thank you,

Error text:
lib/treetaggerwrapper.py:852: DeprecationWarning: os.popen2 is deprecated. Use the subprocess module.
  self.taginput,self.tagoutput = os.popen2(tagcmd)
Traceback (most recent call last):
  File "/srv/local/web/cortext/manager/scripts/terms_extractor/terms_extractor.py", line 2086, in <module>
    data_extracted = map(extract_terms_year, years)
  File "/srv/local/web/cortext/manager/scripts/terms_extractor/terms_extractor.py", line 1801, in extract_terms_year
    cle_occ,cle_formes = stem_words_detailed(nlemmes_corpus,formes_nlemmes,language,stem_dict)#5.stemming
  File "/srv/local/web/cortext/manager/scripts/terms_extractor/terms_extractor.py", line 967, in stem_words_detailed
    cle = ' '.join(list(map(lambda x: stem_dict[x],txt)))
  File "/srv/local/web/cortext/manager/scripts/terms_extractor/terms_extractor.py", line 967, in <lambda>
    cle = ' '.join(list(map(lambda x: stem_dict[x],txt)))
KeyError: u'\u05d4\u05d7\u05d9\u05d9\u05dd'
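
Note on reading the trace: the DeprecationWarning on the first line is only a warning, and the job actually stops at the final KeyError. The missing key is made of Hebrew letters (U+05D4, U+05D7, U+05D9, U+05D9, U+05DD), so it looks as though a token in the corpus has no entry in stem_dict. The sketch below reproduces that kind of failure and shows one possible fallback. It is not the extractor's actual code: the dictionary contents, the token list, and the two helper functions are hypothetical stand-ins, and it is written for Python 3 although the script in the trace runs on Python 2.

```python
# Hypothetical stand-ins for the extractor's stem_dict and token list (txt).
stem_dict = {"running": "run", "cats": "cat"}
txt = ["running", "cats", "\u05d4\u05d7\u05d9\u05d9\u05dd"]  # last token has no stem


def join_stems_strict(tokens, stems):
    """Mirror the lookup on line 967: raises KeyError for any unknown token."""
    return " ".join(stems[t] for t in tokens)


def join_stems_lenient(tokens, stems):
    """Assumed workaround: keep the token unchanged when no stem is known."""
    return " ".join(stems.get(t, t) for t in tokens)


# The strict lookup fails on the token that is missing from the dictionary.
missing = None
try:
    join_stems_strict(txt, stem_dict)
except KeyError as exc:
    missing = exc.args[0]

# The lenient lookup completes and passes the unknown token through as-is.
cle = join_stems_lenient(txt, stem_dict)
print(repr(missing))
print(cle)
```

Whether passing unknown tokens through unchanged is acceptable depends on how the extractor uses the stemmed keys later on. From the user side, a possible workaround is to check whether the language selected for the job matches the language of the text in the database.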