Semantic annotation of Czech texts by using NLP tools

Jan Dědek, Peter Vojtáš


Czsem Mining Suite is mainly a GATE plugin that allows users to run TectoMT and Treex tools inside GATE. Besides that, it is also an Information Extraction tool based on dependency linguistics: it can learn tree queries (dependency-based extraction rules) using Inductive Logic Programming (ILP).
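To give a rough idea of what a tree query is, the Python sketch below matches a small hand-written pattern against a dependency tree and returns the lemmas of the marked nodes. Everything here is a hypothetical illustration: the `Node` and `Query` classes, the functions `match_at` and `find_all`, and the attribute names are assumptions, not Czsem's actual data structures or rule format. The `afun` values are loosely modelled on the analytical layer of the Prague Dependency Treebank. In Czsem the rules are learned by ILP rather than written by hand. The matcher is deliberately simplified: each sub-query takes the first child that satisfies it and does not backtrack.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One token of a dependency tree (hypothetical minimal representation)."""
    lemma: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


@dataclass
class Query:
    """A tree query: constraints on one node plus sub-queries for its children."""
    constraints: Dict[str, str] = field(default_factory=dict)
    children: List["Query"] = field(default_factory=list)
    extract: bool = False  # if True, the matched node's lemma is extracted


def node_matches(node: Node, query: Query) -> bool:
    """Check the node's own lemma/attributes against the query constraints."""
    for key, value in query.constraints.items():
        actual = node.lemma if key == "lemma" else node.attrs.get(key)
        if actual != value:
            return False
    return True


def match_at(node: Node, query: Query) -> Optional[List[str]]:
    """Match `query` rooted exactly at `node`; return extracted lemmas or None."""
    if not node_matches(node, query):
        return None
    extracted = [node.lemma] if query.extract else []
    for sub in query.children:
        # Each sub-query must be satisfied by some child; first match wins (no backtracking).
        for child in node.children:
            result = match_at(child, sub)
            if result is not None:
                extracted.extend(result)
                break
        else:
            return None
    return extracted


def find_all(root: Node, query: Query) -> List[List[str]]:
    """Try the query at every node of the tree (pre-order) and collect all matches."""
    results = []
    stack = [root]
    while stack:
        node = stack.pop()
        hit = match_at(node, query)
        if hit is not None:
            results.append(hit)
        stack.extend(reversed(node.children))
    return results


# "Při nehodě zemřeli dva lidé" ("Two people died in the accident"), simplified tree.
tree = Node("zemřít", {"pos": "V"}, [
    Node("při", {"afun": "AuxP"}, [Node("nehoda", {"afun": "Adv"})]),
    Node("člověk", {"afun": "Sb"}, [Node("dva", {"afun": "Atr"})]),
])

# Hypothetical query: which attribute modifies the subject of the verb "zemřít" (to die)?
query = Query({"lemma": "zemřít"}, [
    Query({"afun": "Sb"}, [Query({"afun": "Atr"}, extract=True)]),
])

print(find_all(tree, query))  # [['dva']]
```

A learned rule would play the role of `query` here; the point of the sketch is only that extraction rules are patterns over the dependency structure rather than over the linear word order.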
The package also contains a prototype implementation of our Fuzzy ILP Classifier and an interface between the ILP methods and the Weka data mining software. The interface makes it possible to use the ILP methods as an ordinary Weka classifier for any classification task inside Weka. For the fuzzy ILP method, the target (class) attribute must be monotonizable (e.g. numeric).
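One common way to read the "monotonizable" requirement is an ordinal-to-binary decomposition: every ordered class value t yields a binary target "the class is at least t", and a prediction is read back as the highest threshold whose target still holds. The Python sketch below shows only this transformation and decoding, under that assumption. The function names, the `class_atl` predicate used to render Prolog-style facts, and the example values are hypothetical and are not taken from the Fuzzy ILP Classifier itself.

```python
from typing import Callable, Dict, List, Sequence


def monotonize(values: Sequence[float]) -> List[float]:
    """Sorted distinct class values; each one serves as an 'at least' threshold."""
    return sorted(set(values))


def to_binary_targets(value: float, thresholds: Sequence[float]) -> Dict[float, bool]:
    """Encode one class value as the binary targets 'value >= t' for every threshold t."""
    return {t: value >= t for t in thresholds}


def atl_facts(example_id: str, value: float, thresholds: Sequence[float]) -> List[str]:
    """Render the positive targets as Prolog-style facts (hypothetical predicate name)."""
    targets = to_binary_targets(value, thresholds)
    return [f"class_atl({example_id}, {t})" for t, holds in targets.items() if holds]


def decode(predicts_at_least: Callable[[float], bool], thresholds: Sequence[float]) -> float:
    """Return the highest threshold reached by scanning upward until a model fails.

    The lowest threshold always holds (every example is at least the minimum),
    so it is the fallback. `thresholds` must be non-empty and sorted.
    """
    best = thresholds[0]
    for t in thresholds[1:]:
        if not predicts_at_least(t):
            break  # under a monotone model every higher threshold would fail too
        best = t
    return best


# Hypothetical ordinal class values (e.g. a seriousness ranking from 1 to 4).
labels = [2, 1, 4, 2, 3]
thresholds = monotonize(labels)
print(thresholds)                         # [1, 2, 3, 4]
print(to_binary_targets(3, thresholds))   # {1: True, 2: True, 3: True, 4: False}
print(atl_facts("r5", 3, thresholds))     # ['class_atl(r5, 1)', 'class_atl(r5, 2)', 'class_atl(r5, 3)']

# Stand-in for the learned 'at least t' models: pretend they fire for t <= 2.
print(decode(lambda t: t <= 2, thresholds))  # 2
```

The decomposition only makes sense when the class values are ordered, which is why a plain nominal class attribute cannot be used this way.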

Where to get it

Open source software hosted at SourceForge.


Contact email:



Research group at the department:

Web Semantization Research Group

Supporting research projects and grants:

GACR 201/09/H057, GAUK 31009, MSMT MSM0021620838, GACR P202/10/0761, TACR TA02010182

Publications:

  • Dědek J., Vojtáš P., Vomlelová M.: Fuzzy ILP Classification of web reports after linguistic text mining, in Information Processing and Management, Vol. 48, Num. 3, ISSN: 0306-4573, pp. 438-450, 2012
  • Dědek J.: Towards semantic annotation supported by dependency linguistics and ILP, in Lecture Notes in Computer Science, Vol. 6497, ISSN: 0302-9743, pp. 297-304, 2010
