Archiving the Greek Web
Proc. of the 4th International Web Archiving Workshop (2004)
Abstract
Web sites have become an increasingly important part of every country’s information and cultural heritage. For this reason, Web archiving has become an issue for many national libraries. In this paper, we present a first attempt to archive the Greek Web. The project is divided into two parts: the first concerns the collection of the majority of Greek Web pages, and the second focuses on extracting knowledge from this archive in order to classify it into semantically coherent clusters. We discuss the criteria that should be set in order to characterize a Web page as “Greek”. A combination of IR and content mining techniques is applied to semantically characterize the collected content. We especially address the bilingualism issue that arises because the content is written in both Greek and English. Finally, the collected Web pages are classified into meaningful clusters, which facilitates searching the archive.
Publication Date
September, 2004
Citation Information
C. Lampos, Magdalini Eirinaki, D. Jevtuchova and M. Vazirgiannis. "Archiving the Greek Web" Proc. of the 4th International Web Archiving Workshop (2004)
Available at: http://works.bepress.com/magdalini_eirinaki/26/