Caring for Metadata: Taking a Strategic Approach

Richard Gartner explores curating digital library metadata with the same care and attention as the objects it supports.
As digital collections have been created and have grown into the large corpus that they form today, libraries, archives and other repositories have (hopefully) taken great care to look after them. The disciplines of digital curation and digital preservation have established an extensive set of protocols and techniques as a means of ensuring that these collections are accessible now and will remain so well into the future. The etymology of the word ‘curation’ says it all: it comes from the Latin curare, ‘to care for’, and this is what we aim to do in the management of these valuable resources, on which so much time, effort and expense have been expended.

The Link Between Data and Metadata

But what about the metadata that forms such an essential component of the processes of digital curation? Metadata is far from easy to produce and often far from cheap: it may, in fact, cost more to create than the digital resources that it documents. It is easy to forget just how extensive an array of metadata is needed for a digital object. Descriptive metadata, equivalent to the bibliographic information in a standard catalogue record, is only part of the story. Often much more extensive is the ‘administrative metadata’ needed behind the scenes to support the object: this includes the technical information needed to decipher its component files, the preservation metadata needed to ensure its longevity and the rights metadata needed to protect the intellectual property rights surrounding it. To these we usually have to add ‘structural metadata’, the type needed to link the component files of a digital object into a coherent whole; in the case of a digitised book, for instance, this is what orders its pages into sequence and divides them into chapters.
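
As a purely illustrative sketch of how these three kinds of metadata sit side by side for a digitised book, the Python below models them as simple data classes. It is not taken from any standard or from the book discussed later: the class names, field names, file names and sample values are all hypothetical, chosen only to show structural metadata putting page images into reading order and grouping them into chapters.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    order: int        # position of the page in the reading sequence
    image_file: str   # hypothetical file name of the page image


@dataclass
class Chapter:
    title: str
    pages: list[Page] = field(default_factory=list)


@dataclass
class DigitisedBook:
    descriptive: dict[str, str]     # catalogue-style information
    administrative: dict[str, str]  # technical, preservation and rights details
    chapters: list[Chapter] = field(default_factory=list)  # structural metadata

    def page_sequence(self) -> list[str]:
        """Return the image files in reading order across all chapters."""
        pages = [page for chapter in self.chapters for page in chapter.pages]
        return [p.image_file for p in sorted(pages, key=lambda p: p.order)]


# Sample values are invented for illustration only.
book = DigitisedBook(
    descriptive={"title": "An Example Book", "creator": "A. N. Author"},
    administrative={"format": "image/tiff", "rights": "Public domain"},
    chapters=[
        Chapter("Chapter 2", [Page(3, "0003.tif")]),
        Chapter("Chapter 1", [Page(2, "0002.tif"), Page(1, "0001.tif")]),
    ],
)
print(book.page_sequence())
```

Even though the chapters and pages are supplied out of order here, the structural information is enough to recover the correct sequence of images, which the files themselves could not provide.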

It is easy to see metadata as an adjunct to the data in a collection and so somehow less important, but the two are symbiotically linked and should be treated as equal partners in making such a collection a digital library. Metadata is a valuable commodity in its own right and should be handled with the same care and attention as the digital objects that it supports. This is why we need to care for, literally to curate, our metadata. But how is this best to be done?

Eight Basic Principles for Curating Metadata

I hope that I’ve answered this question at least in part in my recent book for Facet, Metadata in the Digital Library: Building an Integrated Strategy with XML. Here I lay out eight basic principles for curating metadata with the due care and attention that it deserves. They range from the most basic, including acknowledging its equal importance to data in the first place, to the most specific, such as ensuring that it supports all stages of the digital curation lifecycle and is designed to enable its own preservation, and that of the data it refers to, in the long term. To these can be added employing standards whenever possible, ensuring that every component of the metadata and data is readily and unambiguously identified, and controlling its content when this is feasible. Above all, metadata should be implemented with a coherent overall strategy, one that is independent of any platform on which a digital library is hosted.

Producing coherence from the array of metadata needed to support an object in a digital library is far from simple. Multiple standards are generally needed to cover the requirements mentioned earlier and these must be bound together into something with a clear internal logic. An initial step in the right direction is the adoption of a suitable syntax or encoding mechanism: in my book I recommend XML (eXtensible Markup Language), which can handle metadata of the greatest complexity within a logical, highly interoperable format which is also easy to preserve in the long term.

XML for Metadata

One of the great advantages of XML for metadata is its ability to bring metadata in multiple schemas together into a single, coherent whole. XML schemas define a set of metadata elements and their interrelationships, all of which can potentially be highly complex. Metadata encoded in multiple schemas can be brought together into a single file in what is known as a ‘packaging schema’: one of the most widely-used of these is METS (Metadata Encoding and Transmission Standard). All of the metadata for a potentially complex digital object (for instance a digitised book of several hundred images) can be brought together in this way into a single, discrete file which can be curated as readily as the data that it references.
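
To make the idea of a packaging schema more concrete, here is a minimal sketch in Python that assembles a small METS-style file with the standard library's xml.etree.ElementTree module. It wraps one Dublin Core title as descriptive metadata, lists the page images in a file section and orders them in a structural map. The function name build_mets, the identifiers (DMD1, FILE0001 and so on) and the sample file names are assumptions made for illustration. The output has not been validated against the METS schema, and a real package would normally carry administrative metadata as well.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for METS, Dublin Core and XLink
METS = "http://www.loc.gov/METS/"
DC = "http://purl.org/dc/elements/1.1/"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("dc", DC), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def build_mets(title: str, image_files: list[str]) -> ET.Element:
    """Hypothetical helper: package one title and a list of page images."""
    root = ET.Element(f"{{{METS}}}mets")

    # Descriptive metadata: a Dublin Core record wrapped inside METS
    dmd = ET.SubElement(root, f"{{{METS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="DC")
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC}}}title").text = title

    # File section: an inventory of the component files
    file_sec = ET.SubElement(root, f"{{{METS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="master")

    # Structural map: puts the pages into sequence within the book
    struct = ET.SubElement(root, f"{{{METS}}}structMap", TYPE="physical")
    book = ET.SubElement(struct, f"{{{METS}}}div", TYPE="book", DMDID="DMD1")

    for order, name in enumerate(image_files, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(group, f"{{{METS}}}file", ID=file_id)
        ET.SubElement(
            file_el,
            f"{{{METS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK}}}href": name},
        )
        page = ET.SubElement(book, f"{{{METS}}}div", TYPE="page", ORDER=str(order))
        ET.SubElement(page, f"{{{METS}}}fptr", FILEID=file_id)

    return root


# Sample values are invented for illustration only.
mets = build_mets("An Example Book", ["0001.tif", "0002.tif"])
ET.indent(mets)  # pretty-printing; requires Python 3.9 or later
print(ET.tostring(mets, encoding="unicode"))
```

The point of the sketch is that metadata from two different schemas (Dublin Core and METS itself) ends up in one file with clear boundaries, and that the structural map refers to files only through their identifiers, so the package stays coherent wherever the images are stored.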

This is a big step forward in curating and preserving metadata and data. Much of digital preservation and curation is centred on the use of packages of data and metadata with clear and discrete boundaries on which their processes and procedures can be based. The widely-adopted OAIS (Open Archival Information System) model for digital preservation, for instance, relies heavily on packages of this kind. Adopting this approach, which is centred on the creation of these packages, should go a long way towards treating metadata (and, of course, the data it refers to) with the care and attention it deserves.

About the Author

Richard Gartner is a librarian and academic whose primary area of research is the theory and practice of metadata. He is currently the Digital Librarian at the Warburg Institute in the University of London where he established and is responsible for the Institute's digital library. He previously worked at the Bodleian Library, Oxford where he instigated the Library's first digitisation programmes and devised the metadata strategy for the Oxford Digital Library. More recently he was a lecturer in the Department of Digital Humanities at King's College London where he taught and researched metadata theory and practice and digital curation. Richard has written over 50 publications on metadata in the academic literature and is the author of the widely-read book Metadata: Shaping Knowledge from Antiquity to the Semantic Web (Springer, 2016).

