Article
Cross-Language Retrieval for Arabic Texts: The Creation of an English-Arabic Cross-Language Information Retrieval Environment
Center for Natural Language Processing, School of Information Studies (2005)
  • Robert N. Oddy, Syracuse University
  • Anne R. Diekema, Syracuse University
  • Jean Hannouche, Syracuse University
  • Grant Ingersoll, Syracuse University
  • Elizabeth D. Liddy, Syracuse University
Abstract
An English-Arabic Cross-Language Information Retrieval environment was created in which the user can query an Arabic database in English and retrieve a set of relevant Arabic documents. The retrieved Arabic documents are automatically translated into English so that the English-language user can read them. Proper names of people, places, and organizations are extracted from the retrieved documents and transliterated from Arabic into English; presented to the user, they serve as a brief summary of each retrieved document. Another feature of the AIR design is the user’s ability to group searches and search results into what we call Topics, which persist between sessions and can be managed by the individual user. The guiding principle of the AIR system is to get away from the English query as soon as possible and rely on relevance feedback to refine the Arabic version of the query, thereby providing the user with helpful information as quickly as possible. High-precision query translation comes from combining different lexical resources, both to improve the translation probabilities of initial query terms and to provide high-quality data for the interactive sense-disambiguation tool. The combined lexical resource includes machine-readable dictionaries, ontologies, machine translation lexicons, encyclopedias, and comparable corpora.
Keywords
  • cross-language information retrieval
  • CLIR
  • translation
  • transliteration
  • Arabic language
  • evaluation
  • lexical resource creation
  • natural language processing
Publication Date
2005
Publisher Statement
“Permission is granted by Center for Natural Language Processing, School of Information Studies for SUrface to distribute this article. All rights reserved to Center for Natural Language Processing, School of Information Studies. Please refer to the journal's copyright policy for more information.”
Citation Information
Robert N. Oddy, Anne R. Diekema, Jean Hannouche, Grant Ingersoll, et al. "Cross-Language Retrieval for Arabic Texts: The Creation of an English-Arabic Cross-Language Information Retrieval Environment" Center for Natural Language Processing, School of Information Studies (2005)
Available at: http://works.bepress.com/anne_diekema/58/