Framework for Data Scraping and Semantization

Miloslav Beňo, Jakub Míšek, Filip Zavoral


AgentMat is designed for efficient extraction of large amounts of data from web pages. Its processing is based on an XML-based language that describes a given extraction task declaratively. A task description consists of system components which, when connected together, perform the desired functionality on a general web page. With this scraping system, raw content from irregularly updated, unstructured web pages can be kept categorized and accessed together with its semantic metadata.
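To illustrate the idea of a declarative task built from connected components, here is a minimal sketch in Python. It is not AgentMat's actual language. The XML vocabulary (`task`, `component`, the `id`, `type`, `input`, `pattern` and `property` attributes) and the three component kinds (`source`, `regex`, `annotate`) are hypothetical, chosen only to show how a task description can wire components into a pipeline. The `annotate` step stands in for attaching semantic metadata to extracted values.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical task description: element and attribute names are illustrative
# assumptions, not AgentMat's real XML format. Each component names its
# upstream component via the "input" attribute.
TASK_XML = r"""
<task name="prices">
  <component id="fetch" type="source" />
  <component id="extract" type="regex" pattern="Price: (\d+)" input="fetch" />
  <component id="tag" type="annotate" property="ex:price" input="extract" />
</task>
"""


def run_task(xml_text: str, page: str) -> dict[str, object]:
    """Evaluate every component of a (hypothetical) task on one page."""
    root = ET.fromstring(xml_text.strip())
    specs = {c.get("id"): c for c in root.findall("component")}
    results: dict[str, object] = {}

    def evaluate(cid: str) -> object:
        # Memoized recursive evaluation along "input" links.
        # This sketch does no cycle detection.
        if cid in results:
            return results[cid]
        spec = specs[cid]
        kind = spec.get("type")
        upstream = spec.get("input")
        data = evaluate(upstream) if upstream else None
        if kind == "source":
            # In a real scraper this would download the page; here it is passed in.
            out = page
        elif kind == "regex":
            # With one capture group, re.findall returns the captured strings.
            out = re.findall(spec.get("pattern"), data)
        elif kind == "annotate":
            # Pair each extracted value with a metadata property name.
            out = [(spec.get("property"), value) for value in data]
        else:
            raise ValueError(f"unknown component type: {kind}")
        results[cid] = out
        return out

    for cid in specs:
        evaluate(cid)
    return results


if __name__ == "__main__":
    page = "<p>Price: 120</p><p>Price: 95</p>"
    print(run_task(TASK_XML, page)["tag"])
```

Running it on the sample page pairs each extracted price with the hypothetical `ex:price` property. Keeping the wiring in the XML rather than in the code means a new extraction task can, in principle, be expressed by recombining existing components instead of writing a new program.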

Where to get it

Development of AgentMat has been discontinued. The current project addressing the same problem is LinqToWeb.

Contact email:



Research group at the department:

Web Semantization Research Group

Supporting research projects and grants:

GACR 201/09/0990, GAUK 28910, GACR P202/10/0761


  • Beňo M., Míšek J., Zavoral F.: AgentMat: Framework for Data Scraping and Semantization, in 3rd International Conference on Research Challenges in Information Science, Fez, Morocco, IEEE Computer Society Press, ISBN: 978-1-4244-2864-9, pp. 253-264, 2009
The content of this website is licensed under Creative Commons Attribution-NonCommercial 3.0 Czech Republic.