Presentation
Archiving the Greek Web
Proc. of the 4th International Web Archiving Workshop (2004)
  • C. Lampos
  • Magdalini Eirinaki, San Jose State University
  • D. Jevtuchova
  • M. Vazirgiannis, Athens University of Economics and Business
Abstract
Web sites have become an increasingly important part of every country’s information and cultural heritage. For this reason, Web archiving has become a concern for many national libraries. In this paper, we present a first attempt to archive the Greek Web. The project is divided into two parts: the first concerns the collection of the majority of Greek Web pages; the second focuses on extracting knowledge from this archive in order to classify it into semantically coherent clusters. We discuss the criteria that should be set in order to characterize a Web page as “Greek”. A combination of IR and content mining techniques is applied to semantically characterize the collected content. We especially address the bilingualism issue that arises because the content is written in both Greek and English. Finally, the collected Web pages are classified into meaningful clusters, which facilitates searching the archive.
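The abstract does not spell out the criteria used to characterize a page as “Greek”, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes two plausible signals: a host name under the `.gr` top-level domain, or a sufficiently high share of Greek letters in the page text (which also lets bilingual Greek/English pages qualify). The function names and the `min_ratio` threshold are hypothetical.

```python
from urllib.parse import urlparse


def greek_letter_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters in the Greek Unicode blocks."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # U+0370..U+03FF: Greek and Coptic; U+1F00..U+1FFF: Greek Extended.
    greek = sum(
        1
        for ch in letters
        if "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff"
    )
    return greek / len(letters)


def looks_greek(url: str, text: str, min_ratio: float = 0.3) -> bool:
    """Hypothetical criterion: .gr host, or enough Greek letters in the text.

    min_ratio is an assumed threshold; a value well below 1.0 tolerates
    pages that mix Greek and English content.
    """
    host = urlparse(url).hostname or ""
    return host.endswith(".gr") or greek_letter_ratio(text) >= min_ratio
```

A crawler could call `looks_greek(url, page_text)` on each fetched page to decide whether to keep it. The threshold would need tuning on real data, and the Greek and Coptic block also contains a few Coptic letters, which this sketch does not filter out.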
Publication Date
September, 2004
Citation Information
C. Lampos, Magdalini Eirinaki, D. Jevtuchova and M. Vazirgiannis. "Archiving the Greek Web." Proc. of the 4th International Web Archiving Workshop (2004).
Available at: http://works.bepress.com/magdalini_eirinaki/26/