"Archiving the Greek Web" by C. Lampos

Selected Works of Magdalini Eirinaki

Presentation

Archiving the Greek Web

Proc. of the 4th International Web Archiving Workshop (2004)

C. Lampos
Magdalini Eirinaki, San Jose State University
D. Jevtuchova
M. Vazirgiannis, Athens University of Economics and Business

Abstract

Web sites have become an increasingly important part of every country’s information and cultural heritage. For this reason, Web archiving has become a concern for many national libraries. In this paper, we present a first attempt to archive the Greek Web. The project is divided into two parts: the first concerns collecting the majority of Greek Web pages; the second focuses on extracting knowledge from this archive in order to classify it into semantically coherent clusters. We discuss the criteria that should be set in order to characterize a Web page as “Greek”. A combination of IR and content mining techniques is applied to semantically characterize the collected content. We especially address the bilingualism issue that arises because the content is written in both Greek and English. Finally, the collected Web pages are classified into meaningful clusters, which facilitates searching the archive.
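
The abstract does not state which criteria the authors settled on for calling a page "Greek". Purely as an illustration of what such a criterion could look like, the sketch below accepts a page if its host is under the .gr top-level domain or if a sufficient share of its letters are in Greek script. The function names, the two tests, and the 0.3 threshold are all assumptions made for this example and are not taken from the paper.

```python
from urllib.parse import urlparse

# Hypothetical threshold (not from the paper): minimum share of letters
# that must be Greek for a page outside .gr to count as "Greek".
GREEK_RATIO_THRESHOLD = 0.3


def is_greek_letter(ch: str) -> bool:
    """True for letters in the Greek or Greek Extended Unicode blocks."""
    return ch.isalpha() and (
        "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff"
    )


def greek_ratio(text: str) -> float:
    """Fraction of alphabetic characters in `text` that are Greek letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_greek_letter(ch) for ch in letters) / len(letters)


def looks_greek(url: str, text: str) -> bool:
    """Assumed heuristic: .gr host, or enough Greek-script content."""
    host = urlparse(url).hostname or ""
    return host.endswith(".gr") or greek_ratio(text) >= GREEK_RATIO_THRESHOLD
```

Measuring the ratio over letters only keeps digits and punctuation from diluting the score, and a threshold below 0.5 lets bilingual Greek and English pages still qualify, which is the situation the abstract highlights.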

Disciplines

Computer Engineering

Publication Date

September, 2004

Citation Information

C. Lampos, Magdalini Eirinaki, D. Jevtuchova and M. Vazirgiannis. "Archiving the Greek Web." Proc. of the 4th International Web Archiving Workshop (2004).
Available at: http://works.bepress.com/magdalini_eirinaki/26/