I’m a metadata nerd and proud of it. I admit that to the uninitiated (and, let’s face it, the initiated) metadata and cataloguing might seem a bit dull. It’s a strange feeling to find something so supposedly boring very exciting. But here’s the thing: if you don’t describe a resource properly nobody will be able to find it, and what’s the point in that? And if you describe the resource using standardised tools then that’s even more useful again. I can see beauty in authority control and subject access. Yes, beauty. Everything is linked, and everything fits together. A catalogue is a bit like a giant brain, where one record connects to another related record.
For Organisation of Information, Cataloguing and Metadata (IS40520) we were tasked, in groups, with creating a collection, real or hypothetical, together with a comprehensive consultation document to illustrate how we would manage our collection. Below are extracts from the final assignment my group submitted. I took charge of the metadata section (I’m a metadata nerd dontcha know) so I will focus on that section for this post.
In Dublin’s Fair City: Images of Dublin through the ages
This consultation document will outline the various considerations in relation to the management of metadata for a collection of digital images entitled, ‘In Dublin’s Fair City.’ For the most part this is a theoretical or proposed collection. With the anniversary of the 1916 Easter Rising upcoming in 2016, there is huge interest in Dublin city from both historians and tourists. This is a collection of images of Dublin city through the ages. The collection is originally based on historical images of Dublin found on the National Library of Ireland’s Flickr site but other images of the city will be accepted into the collection and it will include born digital images of Dublin up to the present day. The collection is a digital collection and only includes digitised images or born digital images; physical photographs are not part of the collection. The collection is aimed at documenting the changing face of Dublin through the years and the intended audience would be anyone with an interest in the evolution of the city, e.g. tourists, people with cultural interests, historians and perhaps those with an interest in the changing architecture of the city.
Metadata Scheme
Good-quality, robust metadata is essential both for efficiency within the library and for the findability of items from the user’s point of view (Capell & Ginn, 2009; Jung-ran and Tosaka, 2010). Jung-ran and Tosaka (2010) state that MARC is the most widely used metadata schema across digital repositories and collections, with Dublin Core the second most widely used, followed by VRA Core among others. Wendler (2004) also explains that “images present a different descriptive challenge than mass-produced published works such as books and serials”. With these considerations in mind we evaluated three metadata standards for our collection: MARC, Dublin Core and VRA Core, deciding finally on Dublin Core.
Cole (2007) sets out his principles for good metadata. These include the appropriateness of the metadata to the materials and users, interoperability, standard controlled vocabularies and a clear statement on the conditions and terms of use. The controlled vocabularies used are discussed in Subject Access, below. We kept these principles in mind when considering which standard to use.
MARC
MARC is recognised as the most suitable metadata encoding format in the library world; however, there are often complaints about the number of data elements and their complexity (Liu, 2007). In fact the National Library have used MARC records to catalogue the original images that we have used in our collection.
Based on the principles above we declined to use MARC as our metadata standard: although it is highly suitable for cataloguing more traditional library items, we felt it was not the most suitable standard for digital images. We also felt that, once the library is set up, it may be run not by a trained librarian but by a lay person who may struggle with the complexity of MARC.
We also felt that MARC would struggle with interoperability. Hider (2012) explains that MARC was developed well before the internet was conceived, let alone the web in its current incarnation. Although MARC allows for rich resource description, it is not easily interpretable by modern computers.
Finally, Zeng and Qin (2008) refer to MARC as highly structured and semantically rich metadata. However, they note that MARC does not fare well with regard to management needs, including intellectual property and preservation. As we would need to record the provenance of the digital images as well as any licences for use associated with them, we again regarded MARC as not the most suitable metadata standard for our purposes.
VRA Core
If MARC was deemed unsuitable for our digital image collection, perhaps VRA Core would better suit our needs. VRA Core was developed in response to the needs of visual resource collections and recognizes that most works of art do not include the title, creator, imprint and series information found in the bibliographic world (Zeng and Qin, 2008). Attig, Copeland and Pelikan (2004) explain succinctly: “The VRA Core consists of a single element set, with elements that may be repeated as many times as necessary to describe works of visual culture as well as the images that document those works.” Miller (2012) explains that, while mostly used within the museum community, VRA Core is sometimes used for visual resources outside museums.
Preservation efforts often lead to the creation of digital images depicting the item being preserved. This led to the development of VRA Core by the Visual Resources Association; it can be used to describe both the image and the content of the image (Liu, 2007).
The potential benefit of VRA Core for our collection would be the ability to catalogue both the work, or actual scene, and the digital image itself. However, as VRA Core is not as widely adopted as Dublin Core, we felt that interoperability could be an issue. We also felt that for our collection VRA Core would be unnecessarily complicated for management of the collection after initial set-up. Finally, as we were choosing our LMS in conjunction with compatible metadata schemas, we felt the combination of Omeka together with Dublin Core would be more appropriate for our collection than VRA Core would be.
Dublin Core
Dublin Core is a generic metadata standard that was developed primarily for the web environment (Hider, 2012). It was designed to describe a wide range of networked resources, specifically to address the difficulty in finding items on the web (Liu, 2007). The Dublin Core schema was established in the late 1990s; it comprises a set of 15 elements and was designed to be applicable across various domains. These 15 ‘core’ elements are broad and generic enough to be suitable for describing a wide range of resources (Miller, 2012). While Dublin Core can’t describe objects in as rich a manner as MARC, it was designed to be a minimum that can be extended (Hider, 2012). It was also designed to be computer readable.
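To make “computer readable” a little more concrete, the following is a minimal sketch in Python of how a simple Dublin Core description could be serialised as XML. The 15 element names and the namespace URI are those of the Dublin Core Metadata Element Set; the wrapper element name `record`, the function name `to_dc_xml` and the sample values are illustrative assumptions of ours and do not come from the submitted assignment.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 core elements; each is optional and may be repeated.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dc_xml(values):
    """Serialise a dict of {element: [values]} as simple Dublin Core XML.

    The wrapper element name "record" is an arbitrary choice for this sketch.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for element, entries in values.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for entry in entries:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = entry
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample values, for illustration only.
sample = {
    "title": ["View of a Dublin street"],
    "subject": ["Streets", "Dublin (Ireland)"],
    "type": ["StillImage"],
}
xml_text = to_dc_xml(sample)
```

Because every element is optional and repeatable, a record can carry as little or as much as is known about an image, which is part of what makes the scheme manageable for a non-specialist.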
Zeng and Qin (2008) note that Dublin Core is the most well-known and widely used metadata element set for data structures. They also state that Dublin Core has the most mapped element sets among and across domain-specific and community orientated metadata standards. Weagley et al. (2010), Greenberg (2005), Tedd and Large (2005), Liu (2007) and Arms (2001) (as cited in Shirley & Liew, 2011) judge Dublin Core to be an ideal metadata scheme for accomplishing interoperability.
For these reasons we feel that Dublin Core is the most useful metadata standard for our collection: it is appropriate to the type of resource chosen for our collection, it is interoperable, and it gives us the opportunity to record rights management metadata.
Zeng and Qin (2008) divide the Dublin Core elements into three main categories: content, intellectual property and instantiation or manifestation. This is useful for the purposes of our collection as it allows us to catalogue the content of the image, for example the creator and description of the original physical image, the intellectual property for the image, for example who published it and the licensing rights and finally the manifestation of the digital image itself, for example the file name and the format of the digital image.
Metadata Fields
The fields we are using contain a combination of descriptive or content fields, administrative or intellectual property and instantiation metadata as described in Zeng and Qin (2008).
The descriptive or content fields include Title, Creator, Description, Subject, Date Accepted, Coverage Temporal and Coverage Spatial. We recognize that for some older images the information for some of these fields may not be available; however, all fields should be recorded where possible. For each image within the collection a Title, Description, Subject and Date Accepted will be mandatory, as the content for these fields can be created by the manager of the library as described below.
If the image is taken from an existing collection the existing formally assigned title should be used. In the case of images from the National Library of Ireland, the title can differ between the main catalogue and the Flickr account. In this case the title of the digital image as recorded on Flickr will be used, as we are cataloguing the digital image rather than the physical image. However, if the image has no formally assigned title the librarian or manager will need to create one. The title should be distinctive and informative. The Creator represents the original photographer of the image. Subject is covered in more detail in Subject Access below, while Description is a free text description of the image, with the aim of encompassing as wide a range of information as possible.
For the date, we record the date the item was accepted into the library using the DC qualifier Date Accepted. Coverage is used to record both the temporal and spatial coverage of the item. Coverage with the refinement Temporal is used to record the time period when the photo was taken, because for some older images only a date range or approximate date may be available. This will also aid users looking for images covering a specific period in time. Coverage with the refinement Spatial is used to record the location where the image was taken. This is recorded using geolocation to allow users to plot the exact location on a map, which is particularly useful because, for older photos in the collection, the location depicted in the image may have changed quite dramatically.
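As an illustration of how the two Coverage refinements might be captured, here is a minimal sketch in Python. The assignment does not specify an encoding, so the `key=value;` strings below, loosely modelled on the DCMI Period and DCMI Point encoding schemes, are an assumption, as are the class names and the sample coordinates (an approximate point in central Dublin).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporalCoverage:
    """When the photo was taken: a single year or a range of years."""
    start_year: int
    end_year: Optional[int] = None  # None means a single known year

    def encode(self) -> str:
        end = self.start_year if self.end_year is None else self.end_year
        if end < self.start_year:
            raise ValueError("end_year must not precede start_year")
        return f"start={self.start_year}; end={end};"


@dataclass(frozen=True)
class SpatialCoverage:
    """Where the photo was taken, as decimal degrees plus an optional name."""
    latitude: float
    longitude: float
    name: str = ""

    def encode(self) -> str:
        if not -90 <= self.latitude <= 90:
            raise ValueError("latitude must be between -90 and 90")
        if not -180 <= self.longitude <= 180:
            raise ValueError("longitude must be between -180 and 180")
        parts = [f"east={self.longitude}", f"north={self.latitude}"]
        if self.name:
            parts.append(f"name={self.name}")
        return "; ".join(parts) + ";"


# Hypothetical values: a photo dated only to a decade, taken in central Dublin.
when = TemporalCoverage(1900, 1910)
where = SpatialCoverage(53.35, -6.26, "Dublin city centre")
```

Storing the period as an explicit start and end, rather than as free text such as "early 1900s", is what lets a user search for every image that falls within a given span of years.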
The intellectual property fields include Publisher, Identifier and Source, as well as Rights and Rights Holder. The Publisher of the item recognises the provenance of the item. For example, for images from the National Library of Ireland, the NLI is the publisher. The Identifier will be the file name which is assigned based on the criteria discussed in File Management. Similarly Source records the source of the original item, for example the call number for the original item in the original institution.
We, as humans, have long been concerned with property and intellectual rights. Copyright serves to make clear who owns what works; however, the issue of copyright and rights management is further complicated in the digital arena (Liu, 2007). Rights management metadata attempts to address rights management issues when dealing with digital resources, so it will be mandatory to record the rights for the images in our collection. We record both the Rights, in the form of a copyright statement or a restriction on use statement, and the Rights Holder.
As well as describing the contents of the image and the intellectual property values associated with the image we also record the instantiation of the image. Type and Format are also recorded for the items in our collection.
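The field groupings and mandatory fields described in this section can be summarised in a short Python sketch. The field names follow the text above; treating Rights (but not Rights Holder) as mandatory is our reading of the rights paragraph, and the function names and sample record are hypothetical.

```python
# Field groupings as described in this section.
FIELD_GROUPS = {
    "descriptive": [
        "Title", "Creator", "Description", "Subject",
        "Date Accepted", "Coverage Temporal", "Coverage Spatial",
    ],
    "intellectual property": [
        "Publisher", "Identifier", "Source", "Rights", "Rights Holder",
    ],
    "instantiation": ["Type", "Format"],
}

# Fields that must be present for every image (assumption: Rights is
# included because the text makes recording rights mandatory).
MANDATORY = {"Title", "Description", "Subject", "Date Accepted", "Rights"}


def missing_mandatory(record):
    """Return the mandatory fields that are absent or blank, sorted by name."""
    return sorted(f for f in MANDATORY if not str(record.get(f, "")).strip())


def unknown_fields(record):
    """Return any field names that are not part of the scheme, sorted by name."""
    known = {f for fields in FIELD_GROUPS.values() for f in fields}
    return sorted(set(record) - known)


# Hypothetical, incomplete record used only to exercise the checks.
draft = {
    "Title": "View of a Dublin street",
    "Description": "Street scene with trams and pedestrians.",
    "Subject": "Streets",
    "Date Accepted": "2013-04-30",
    "Photographer": "Unknown",  # not a field in our scheme
}
problems = missing_mandatory(draft)  # the draft has no Rights statement
extras = unknown_fields(draft)
```

A check of this kind could be run before images are ingested, so that a lay manager is prompted for anything essential that has been left out.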
Conclusion
In this metadata consultation document we have outlined our considerations and subsequent choices for the library management system, metadata schemas, content standard and subject access. Our theoretical collection, built by Fáilte Ireland, will form part of the centenary celebrations for the 1916 Rising and consists of a number of images of Dublin through the years. The initial stimulus for the collection came from the National Library of Ireland’s Flickr site and the collection contains a combination of digitised and born digital images.
For the library management system, our selection criteria were value for money, flexibility and adequate support, as well as metadata viewing, editing tools, publishing tools, easy “ingest” tools, exporting/data sharing tools and geo-tagging tools. We considered Extensis Portfolio, a proprietary digital asset management system; VuFind, an open source resource discovery system; and finally Omeka, an open source web publishing platform for the display of library, museum, archives and scholarly collections and exhibitions. Omeka was chosen as our LMS because it offers extensive metadata capturing tools, CSV importation of objects, data harvesting and sharing through OAI compatibility, and the possibility of using geo-tagging and social tagging.
Our metadata schema was chosen based on the appropriateness of the metadata to our materials and users, interoperability, standard controlled vocabularies and a clear statement on the conditions and terms of use. We evaluated three metadata schemas, MARC, Dublin Core and VRA Core, deciding finally on Dublin Core. DC was designed with the web in mind and the DC metadata elements are broad enough to describe a wide range of objects, including digital images. It is also considered to be highly interoperable and gives us the ability to record rights management metadata.
As a result of choosing Dublin Core as our metadata standard the need for a content standard was less essential. However, we chose to use content standards for a number of elements within the metadata, for example the format and date fields.
A number of different standards were also considered for subject access, including Getty thesauri, Library of Congress Subject Headings (LCSH) and the use of folksonomies. We finally decided on a combination of LCSH and the LC Thesaurus for Graphic Materials together with folksonomies or social tagging, which would encourage our users to engage with the collection.
We finally considered the file management and workflow for the library together with the cost implications of our choices.
We feel our choices, as outlined above, will present a polished and engaging user experience for our potential patrons, while also being manageable for a single librarian to run. The combination of Omeka together with Dublin Core allows us to present an aesthetically interesting collection to our users. The use of geo-tagging will also allow our users to trace the exact location of some of the older images from the collection. The combination of Omeka and Dublin Core will also be user friendly for the librarian to manage, while the use of selected content standards and LCSH controlled vocabularies will result in richly described images that are easily findable and interoperable with other library systems.
References
Arms, C.R. (2001), “Some observations on metadata and digital libraries”, Conference on Bibliographic Control New Millennium (Library of Congress), available at: http://www.loc.gov/catdir/bibcontrol/arms_paper.html (accessed 30 April 2013).
Attig, J., Copeland, A., & Pelikan, M. (2004). Context and Meaning: The Challenges of Metadata for a Digital Image Library within the University. College & Research Libraries, 65(3), 251-261.
Capell, L., & Ginn, L. (2009). Digital Collections: Design and Practice. Mississippi Libraries, 73(1), 3-7.
Cole, T.W. (2007). Principles for good digital collections. In Kresh D. (Ed.) The whole digital library handbook (pp. 302- 304). Chicago: American Library Association.
Greenberg, J. (2005), “Understanding metadata and metadata schemes”, in Smiraglia, R.P. (Ed.), Metadata: A Cataloger’s Primer, Haworth Information Press, New York, pp. 17-56.
Hider, P. (2012). Information resource description. London: Facet; Washington, DC: ALA Editions.
Jung-ran, P., & Tosaka, Y. (2010). Metadata Creation Practices in Digital Repositories and Collections: Schemata, Selection Criteria, and Interoperability. Information Technology & Libraries, 29(3), 104-116.
Liu, J. (2007). Metadata and Its Applications in the Digital Library. Westport, CT: Libraries Unlimited.
Miller, S. (2012). Metadata for Digital Collections. London, New York: Neal-Schuman.
Shirley, L., & Liew, C. (2011). Metadata quality and interoperability of GLAM digital images. Aslib Proceedings, 63(5), 484-498.
Tedd, L.A. and Large, A. (2005), “Standards and interoperability”, in Tedd, L.A. and Large, A. (Eds), Digital Libraries, K.G. Saur Verlag GmbH, Munchen, pp. 85-107.
Weagley, J., Gelches, E. and Park, J.R. (2010), “Interoperability and metadata quality in digital video repositories: a study of Dublin Core”, Journal of Library Metadata, 10(1), pp. 37-57.
Wendler, R. (2004). The Eye of the beholder: challenges of image description and access at Harvard. In Hillman, D.I. & Westbrooks E.L. (Eds.) Metadata in Practice (pp. 51-69). Chicago: American Library Association.
Zeng, M.L., & Qin, J. (2008). Metadata. New York, NY: Neal-Schuman.