At the University at Albany, the M.E. Grenander Department of Special Collections and Archives has been digitizing two unique collections: the Albany Student Newspaper (1916 – present) and New York’s Civil Service Employees Association newsletter (1932 – present). To make the content of these two collections searchable, the Library Systems Department has been experimenting with Google Mini, an enterprise search appliance, to crawl, normalize, and index the collections’ 1,000-plus PDF files, and to develop a customized search interface and result presentation using XSLT (Extensible Stylesheet Language Transformations) scripts. In the process, we have identified several factors that affect the quality of the search results and relevancy ranking, including the accuracy of the optical character recognition, the granularity of the file structure, and the metadata of the records. We were also able to further improve and influence the search quality by employing Mini’s distinctive features, including KeyMatch, Related/Suggested Queries, the Self-Learning (Heuristic) Spell Checker, and Result Biasing. In our presentation, we would like to share with the audience our three-month experience of deploying Google Mini.
- Google Mini
- Digital collection
Available at: http://works.bepress.com/win_shih/18/