Article
Automated Classification to Improve the Efficiency of Weeding Library Collections
The Journal of Academic Librarianship (2018)
  • Kiri Lou Wagstaff
  • Geoffrey Liu, San Jose State University
Abstract
Previous studies have shown that weeding a library collection benefits patrons and increases circulation rates.
However, the time required to review the collection and make weeding decisions presents a formidable obstacle.
This study empirically evaluated methods for automatically classifying weeding candidates. A data set containing
80,346 items from a large-scale weeding project running from 2011 to 2014 at Wesleyan University was
used to train six machine learning classifiers to predict a weeding decision of either ‘Keep’ or ‘Weed’ for each
candidate. The study found statistically significant agreement (p=0.001) between classifier predictions and
librarian judgments for all classifier types. The naive Bayes and linear support vector machine classifiers had the
highest recall (fraction of items weeded by librarians that were identified by the algorithm), while the k-nearest
neighbor classifier had the highest precision (fraction of recommended candidates that librarians had chosen to
weed). The variables found to be most relevant were: librarian and faculty votes for retention, item age, and the
presence of copies in other libraries.
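
The two evaluation measures defined in the abstract can be illustrated with a short sketch. This is not the authors' code: the function name `precision_recall` and the six toy labels are hypothetical, chosen only to show how recall and precision are computed when 'Weed' is treated as the positive class and librarian decisions are treated as ground truth.

```python
def precision_recall(librarian, predicted, positive="Weed"):
    """Return (precision, recall) for the positive label.

    librarian: decisions made by librarians (treated as ground truth)
    predicted: decisions recommended by a classifier
    """
    pairs = list(zip(librarian, predicted))
    # Items the classifier recommended weeding that librarians also weeded
    tp = sum(1 for truth, guess in pairs if truth == positive and guess == positive)
    # Items the classifier recommended weeding that librarians kept
    fp = sum(1 for truth, guess in pairs if truth != positive and guess == positive)
    # Items librarians weeded that the classifier missed
    fn = sum(1 for truth, guess in pairs if truth == positive and guess != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the study
librarian = ["Weed", "Weed", "Keep", "Weed", "Keep", "Keep"]
predicted = ["Weed", "Keep", "Keep", "Weed", "Weed", "Weed"]

precision, recall = precision_recall(librarian, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the classifier recommends four items, of which librarians weeded two, and it finds two of the three items librarians weeded. A high-recall classifier misses few weeding candidates, while a high-precision one wastes less reviewer time on items that end up being kept.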
Keywords
  • Data mining
  • Automated classifiers
  • Academic library
  • Collection weeding
Publication Date
February 14, 2018
DOI
https://doi.org/10.1016/j.acalib.2018.02.001
Citation Information
Kiri Lou Wagstaff and Geoffrey Liu. "Automated Classification to Improve the Efficiency of Weeding Library Collections" The Journal of Academic Librarianship Vol. 44 (2018) p. 238 - 247 ISSN: 0099-1333
Available at: http://works.bepress.com/geoffrey-liu/22/