John Mark Ockerbloom
Copyright (c) 2008 All rights reserved.
http://works.bepress.com/john_mark_ockerbloom
Recent documents in John Mark Ockerbloom
Sun, 30 Nov 2008 16:33:43 PST

Watching Our Backs: Community Verification of Digital Preservation Systems
http://works.bepress.com/john_mark_ockerbloom/8
Fri, 14 Nov 2008 08:55:08 PST
Librarians and faculty agree that information preservation is one of the essential roles of libraries. Yet, as the information we manage increasingly becomes digital, we have to rely on new methods of preserving this information that have not been fully tested. While developing and auditing for best practices is important, we must also verify that preservation systems actually perform as we hope they will, preferably long before we have to fall back on them.
In this talk, I will show ways in which this verification can be done now, by the community, with reasonable cost and demonstrable efficacy. Specifically, I will describe Penn's failure recovery tests of LOCKSS, which uncovered issues with the system's performance and reliability, and helped lead to improvements addressing these issues. I will also discuss initiatives being organized through CRL to assess distributed auditing and community knowledge sharing to test and improve LOCKSS, Portico, and other shared preservation systems.
John Mark Ockerbloom
Preservation

Promoting discovery and use of repository content: An architectural perspective
http://works.bepress.com/john_mark_ockerbloom/7
Wed, 08 Oct 2008 18:25:46 PDT
Slides and notes for a talk I gave at a NARA/UMD conference. (The notes include a full script, though it differs slightly from the talk as delivered.)
In this talk, I stress the importance of effective discovery as an essential component of (and aid to) preservation.
I advocate the importance of opening up information, system, and social architectures to do so, with examples that include subject maps, the DLF ILS-DI work, VCat, and PennTags. Some of the material in the talk was adapted from the "High Quality Discovery in a Web 2.0 World" talk I gave for Palinet.
John Mark Ockerbloom
Information discovery, Preservation

High Quality Discovery in a Web 2.0 World: Architectures for Next Generation Catalogs
http://works.bepress.com/john_mark_ockerbloom/6
Wed, 28 May 2008 11:58:51 PDT
Issues of information and systems architecture underlie many of the current debates over the future of cataloging. This talk discusses some ways in which the architecture of the catalog is being redesigned to combine the rich information architecture of library metadata with the robust systems architecture of many Web-based discovery systems. I will show "subject map" discovery systems that better exploit the relationships in complex ontologies like LCSH, and discuss a Digital Library Federation initiative to promote standards supporting interoperability between discovery systems and ILS data and services. I will also touch on the role of networked architectures in improving the quality and efficiency of library cataloging.
John Mark Ockerbloom
Information discovery

The Next Mother Lode for Large-scale Digitization? Historic Serials, Copyrights, and Shared Knowledge
http://works.bepress.com/john_mark_ockerbloom/5
Wed, 30 Apr 2008 05:53:26 PDT
Much of the publicity around recent mass-digitization projects focuses on the millions of books they promise to make freely readable online. Because of copyright, though, most of the books provided in full will be of mainly historical interest.
But much of the richest historical text content is not in books at all, but in the newspapers, magazines, newsletters, and scholarly journals where events are reported firsthand, stories and essays make their debut, research findings are announced and critiqued, and issues of the day are debated. Back runs of many of these serials are available in major research institutions but often in few other places. They have the potential for much more intensive use, by a much wider community, if they are digitized and made generally accessible.
In this talk, we will discuss an inventory of periodical copyright renewals that we have conducted at Penn. We found that copyrights of the vast majority of mid-20th-century American serials of historical interest were not renewed to their fullest possible extent. The inventory reveals a rich trove of copyright-free digitizable serial content from major periodicals as late as the 1960s. Drawing on our experience with this inventory's production and previous registry development, we will also show how low-cost, scalable knowledge bases could be built from this inventory to help libraries more easily identify freely digitizable serial content, and collaborate in making it digitally available to the world. Our initial raw inventory can be found at http://onlinebooks.library.upenn.edu/cce/firstperiod.html
John Mark Ockerbloom
Copyright

Archiving and Preserving PDF Files
http://works.bepress.com/john_mark_ockerbloom/4
Wed, 02 Apr 2008 12:45:46 PDT
Since its release in mid-1993, Adobe Portable Document Format (PDF) has become a widely used standard for electronic document distribution worldwide in many institutional settings. Much of its popularity comes from its ability to faithfully encode both the text and the visual appearance of source documents, preserving their fonts, formatting, colors, and graphics.
PDF files can be viewed, navigated, and printed with the free Adobe Acrobat Reader, available on all major computing platforms. PDF has many applications and is commonly used to publish government, public, and academic documents. Many of the electronic journals and other digital resources acquired by libraries are published in PDF format.
As libraries grow more dependent on electronic resources, they need to consider how they can preserve these resources for the long term. Many libraries retain back runs of print journals that are over 100 years old and are still consulted by researchers. No digital technology has lasted nearly that long, and many data formats have already become obsolete and hard to read in a much shorter time period. This document discusses ways that libraries can plan for the preservation of electronic journals and other digital resources in PDF format. After a brief discussion of the file specifications and the future plans for PDF, the article focuses on issues related to preservation of PDF files.
John Mark Ockerbloom
Preservation

Mapping the library future: Subject navigation for today's and tomorrow's library catalogs
http://works.bepress.com/john_mark_ockerbloom/3
Wed, 23 Jan 2008 08:56:54 PST
My ALA Midwinter 2008 presentation slides on subject maps. For more details on how subject maps are created, see the New Maps of the Library white paper from 2006.
John Mark Ockerbloom
Information discovery

New Maps of the Library: Building Better Subject Discovery Tools Using Library of Congress Subject Headings
http://works.bepress.com/john_mark_ockerbloom/2
Wed, 23 Jan 2008 08:26:33 PST
We describe tools in development at the University of Pennsylvania to generate and display interactive "subject maps" for exploring library collections.
Based on the Library of Congress Subject Headings (LCSH), these maps are automatically built from existing authority records, a collection's bibliographic records, and optional local "tweaks" for local interests and search patterns. Users can explore these maps via ordinary text-based web browsing, and browse clusters of related research resources. We now provide these maps for small collections like The Online Books Page, and are experimenting with maps for the entire Penn Library catalog. We hope to enable users to take full advantage of the rich conceptual relationships in LCSH-based library collections, and effectively browse increasingly diverse and dispersed library collections.
John Mark Ockerbloom
Information discovery

Copyright and Provenance: Some Practical Problems
http://works.bepress.com/john_mark_ockerbloom/1
Wed, 23 Jan 2008 08:26:31 PST
Copyright clearance is an increasingly complex and expensive impediment to the digitization and reuse of information. Clearing copyright issues in a reliable and cost-effective manner for works created in the last 100 years can involve establishing complex provenance chains for the works, their copyrights, and their licenses. This paper gives an overview of some of the practical provenance-related issues and challenges in clearing copyrights at large scale, and discusses efforts to more efficiently gather and share information and its copyright provenance.
John Mark Ockerbloom
Copyright