For a research project, I need to analyse and compare two text corpora, one in French and the other in German.
I know that CorTexT offers both languages as features, but does it really work with both? Would the comparison be reliable?
CorTexT Manager uses an endogenous approach: corpora are grammatically tagged in order to extract a specific type of term, noun phrases (by default, bigrams and longer). The same library (TreeTagger) is used for German and French, so the results should be comparable.
You should keep in mind that:
- French and German grammars are different;
- The size of each corpus strongly influences which noun phrases are selected from it.
You have to perform two separate lexical extractions, one per language, and build your own dictionary of equivalent concepts (using the main form column) for the concepts you want to follow across the two corpora. You may then want to reindex the corpora using this dictionary of equivalent concepts.
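To make the idea of an equivalence dictionary concrete, here is a minimal sketch in plain Python. It is not CorTexT's API or file format: the dictionary layout, the function names and the sample terms are all hypothetical, and it assumes you have exported the main forms found in each document. It maps French and German main forms to a shared concept label, reindexes each corpus with it, and reports the share of documents per concept so that corpora of different sizes can be compared.

```python
from collections import Counter

# Hypothetical equivalence dictionary: shared concept -> main forms per language.
EQUIVALENCES = {
    "climate change": {
        "fr": ["changement climatique"],
        "de": ["klimawandel"],
    },
    "renewable energy": {
        "fr": ["énergie renouvelable", "énergies renouvelables"],
        "de": ["erneuerbare energie", "erneuerbare energien"],
    },
}


def build_lookup(equivalences, lang):
    """Invert the dictionary: lowercased main form -> shared concept label."""
    lookup = {}
    for concept, forms_by_lang in equivalences.items():
        for form in forms_by_lang.get(lang, []):
            lookup[form.lower()] = concept
    return lookup


def reindex(documents, lookup):
    """Replace each term by its shared concept; drop terms not in the dictionary."""
    return [
        [lookup[term.lower()] for term in doc if term.lower() in lookup]
        for doc in documents
    ]


def concept_shares(indexed_docs):
    """Share of documents mentioning each concept (comparable across corpus sizes)."""
    n = len(indexed_docs)
    if n == 0:
        return {}
    counts = Counter(concept for doc in indexed_docs for concept in set(doc))
    return {concept: count / n for concept, count in counts.items()}


# Made-up sample data: one list of extracted terms per document.
fr_docs = [
    ["changement climatique", "politique publique"],
    ["Énergies renouvelables", "changement climatique"],
]
de_docs = [["Klimawandel"], ["Erneuerbare Energien"], ["Bundesregierung"]]

fr_indexed = reindex(fr_docs, build_lookup(EQUIVALENCES, "fr"))
de_indexed = reindex(de_docs, build_lookup(EQUIVALENCES, "de"))
fr_shares = concept_shares(fr_indexed)
de_shares = concept_shares(de_indexed)
```

Terms that are missing from the dictionary are simply dropped here, which is one possible choice; you could also keep them under their original form if you want to inspect what was left unmatched.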
I hope this helps,