“Metadata” and “the future of the humanities” require each other; the same division of labor produces them.
Our institutions and professional specializations have worked to alienate us from this division of labor. Even today, we might imagine the work of describing, indexing, and relating as essentially separate from the work of researching, analyzing, and interpreting. Catalogers and historians, for example, typically train separately, work out of sight of one another, and–apart from rare gatherings like the Orphan Film Symposium–attend different conferences. Yet although they live divergent professional lives, neither could do their work without the other. There are not first “histories,” then catalogs. Historiography presupposes the catalog and vice versa.
The point could be extended, mutatis mutandis, to the full Orphans’ triumvirate of archivists-academics-artists, that is, beyond catalogers and historians to any standardized descriptive practice–the archival finding aid, for example–and to any intellectual undertaking involving the interpretation of precedents–that is, to anything we might call “research.” The division of labor I aim to evoke is broad, elaborate, and well established. Small wonder that we would have difficulty grasping it whole.
Although it is well established, this division of labor is not stable. Precisely because “metadata” names a field of activity more expansive than cataloging, it supplements and usefully estranges customary job descriptions. Some metadata, for example, is made by machines, which means that engineers and coders help create it. This is an important difference, but it is not the only one. “Metadata” and its associated processes also invite us to reimagine earlier forms of work and to rethink the hierarchy of discourses that encouraged the historian to sign her name as author, but not the cataloger. I’d like to develop this thought through three examples. Each will be shorter than the last.
1. A Global “Going to the Show”?
The University of Amsterdam’s fantastic Cinema Context project, led by Karel Dibbets, inspired the University of North Carolina’s “Going to the Show” project. Both projects demonstrate how new ways of organizing evidence can encourage new forms of interpretation. “Going to the Show” begins by asking where, historically, moviegoing happened in North Carolina. There’s nothing newfangled about this question, and familiar types of sources allow an answer. As with past generations of research projects, city directories, newspapers, and fire insurance maps tell us where the theaters were. The project does something novel, however, when it brings those sources online and allows other evidence and interpretations to accumulate around them. Names of proprietors, newspaper clippings, photographs, linked video and audio, even financial records, all accrete around the theater site.
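To make that organizing idea concrete, here is a minimal sketch of the kind of record such a project might keep: the theater site is the node, and heterogeneous evidence accumulates around it. The class names, fields, and sample entry are my own illustrations, not the project’s actual schema.

```python
# A minimal sketch, not "Going to the Show's" actual data model: the theater site
# is the node, and heterogeneous evidence accretes around it. All names, fields,
# and the sample entry below are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class EvidenceItem:
    kind: str        # "city directory", "newspaper clipping", "photograph", ...
    source: str      # citation or URL for the underlying document
    note: str = ""   # a researcher's gloss on what the item shows

@dataclass
class TheaterSite:
    name: str
    address: str
    proprietors: list = field(default_factory=list)
    evidence: list = field(default_factory=list)

bijou = TheaterSite(name="Bijou (hypothetical entry)",
                    address="Front Street, Wilmington, NC")
bijou.proprietors.append("J. Doe (hypothetical proprietor)")
bijou.evidence.append(EvidenceItem(
    kind="fire insurance map",
    source="hypothetical citation to a 1915 fire insurance map of Wilmington",
    note="locates the theater within its block of shops",
))
```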
A method informs this accumulation, one that the project’s lead historian Bobby Allen asserts as an argument. The answer to the question “What is film?,” Allen contends, ought to be “an experience” (see “Getting to Going to the Show,” New Review of Film & Television Studies 8.3 (2010): 264–276). Accordingly, the project documents filmgoing experience along at least two different axes. “Going to the Show” reconstructs the experience of filmgoing by means of a series of temporal slices. These slices illuminate, for example, where one might have gone to the show in racially segregated Wilmington and also encourage us to think about the entire program of entertainment in which films would have been shown alongside illustrated songs.
In addition to the temporal slice or snapshot of what moviegoing may have been like at particular places and times, it is also possible to slide spatially across the map, and in this way situate moviegoing in relation to habits of daily life: to place moviegoing, for example, in relation to the shop next door. “Going to the Show’s” spin-off project “Main Street” takes this direction.
This Orphanista audience is doubtlessly already thinking about the need to fill in the map with the churches, schools, homes, and other historic venues where we know that films have also been shown. Just as we can imagine adding all manner of sources to the map, so too can we imagine objects of investigation other than “experience.” One could map distribution or marketing strategies, for instance, and in fact “Cinema Context’s” dataset makes that easier than “Going to the Show’s.”
These projects are good to think with because they adapt themselves so readily to different types of questions and to different scales of analysis. Once we start thinking in this way, the possibilities of evidentiary accumulation seem potentially limitless, and we are forced to confront the two-fold problem of abundance limned a decade ago by Roy Rosenzweig, who urged attention both to the unprecedented boom in the availability of historical resources and to the fragility of digital evidence.
Although, as Allen also underscores, “Going to the Show” encourages us to imagine a project of potentially limitless accumulation, it is absolutely certain that we will never achieve anything like comprehensive coverage. Even if our preservation efforts, starting now, were generously funded and flawlessly executed, we could never hope to succeed in creating a dense evidentiary bubble around every screening of every print of every film. It is highly likely, moreover, that some of the digital evidence currently available will not survive for future researchers–because we may not succeed in migrating it to new formats. Additionally, these projects underscore that documentary evidence of film screenings will always overwhelm the number of surviving film titles. Instances where it is possible to provide intriguingly rich and dense information will therefore become notable examples. This kind of exemplarity is the consolation prize for preservation failures. In other words, the fact that the accumulation will be partial should certainly inform how we go about it, but it is not a reason to eschew the enterprise.
So why not make a global “Going to the Show”? Both Amsterdam’s and North Carolina’s projects envision scaling up in this way, and Allen’s classes are workshopping the idea using an innovative plugin for the popular WordPress software. The pilot projects make clear, I think, that the obstacles to this global scaling up lie not in the availability of data, nor in the technical details, nor even in the financial requirements–although these would each be significant issues. The main problems, rather, are organizational; they are simultaneously metadata problems and division of labor problems.
Google already shows us what a global “Going to the Show” map might look like and, in so doing, also points to a first type of metadata challenge. Where Google’s search algorithm returns three separate sets of results for “cinema,” we might want a single map that returns all of these. That is, we would need our maps to group comparable exhibition venues that might have different names. This is a basic controlled vocabulary problem, made more complicated by linguistic variety, but nonetheless the kind of problem catalogers have been solving for centuries. New tools like seminum.org, used by Europeana, and Onomy.org, used by the Media Ecology Project (MEP), help in different ways to solve it (Dartmouth scholar Mark Williams presented on MEP at the Symposium). While building and using controlled vocabularies requires skill and training, how to go about it is not fundamentally mysterious.
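To make the point concrete, here is a toy sketch of what a controlled vocabulary does in code. The term list and function are my own illustrations, not drawn from seminum.org, Onomy.org, or either project’s actual vocabularies.

```python
# A toy controlled vocabulary: variant and multilingual venue terms map onto a
# single preferred term, so one query can gather records catalogued under any of
# them. The vocabulary below is invented for illustration only.
PREFERRED_TERM = "motion picture theater"

VARIANTS = {
    "cinema": PREFERRED_TERM,
    "movie theater": PREFERRED_TERM,
    "picture house": PREFERRED_TERM,
    "nickelodeon": PREFERRED_TERM,
    "kino": PREFERRED_TERM,        # German usage
    "bioscoop": PREFERRED_TERM,    # Dutch usage
}

def normalize_venue_type(term: str) -> str:
    """Collapse a local or historical label onto the preferred term, if known."""
    return VARIANTS.get(term.strip().lower(), term)

# A single map query for the preferred term now finds venues catalogued
# under any of the variant labels.
assert normalize_venue_type("Kino") == "motion picture theater"
```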
The same goes for a second metadata challenge we would confront in building a truly global “Going to the Show.” Since we could not reasonably expect a single enterprise to conduct this global project, we would need a way to pool resources such that the contributions of far-flung archives and researchers could be searched all at once. Several examples of ambitious federated access platforms already exist, including Europeana and the Digital Public Library of America.
It is possible for diverse institutions to pool their resources in this way thanks to another kind of metadata standardization, one important version of which is the Open Archives Initiative’s Protocol for Metadata Harvesting (OAI-PMH). With OAI-PMH, a repository converts, if necessary, records formatted in its local standard into records that employ a standard Dublin Core field set. Each record must include a unique identifier that links back to the asset in the original repository. Typically this URI can be used to display the asset. A service provider then harvests the records in order to make them available for federated search and access, as the DPLA has done, for example, with the same Wilmington fire insurance maps one might find on the “Going to the Show” site (try some searches, you’ll see).
The protocol provides a mechanism for the service provider to periodically refresh its harvest from contributing repositories. This architecture allows any number of repositories to contribute records and any number of service providers to harvest them. If we imagine a global “Going to the Show” as an OAI-PMH service provider, it would harvest Dublin Core records describing and pointing to maps, newspaper stories, photos, ledgers, film titles, and so on contributed by libraries and archives worldwide.
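For readers who want to see the mechanics, here is a minimal sketch of such a harvester, written against the protocol’s standard ListRecords request and oai_dc response format. The endpoint URL and the map-plotting function are hypothetical placeholders, and a production harvester (like the DPLA’s) would be considerably more robust.

```python
# A minimal OAI-PMH harvesting sketch using only the Python standard library.
# The repository endpoint and the downstream map function are hypothetical.
import xml.etree.ElementTree as ET
import urllib.parse
import urllib.request

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield (identifier, title, coverage) for every record in a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            root = ET.fromstring(response.read())
        for record in root.iter(OAI + "record"):
            identifier = record.findtext(f"{OAI}header/{OAI}identifier")
            title = record.findtext(f".//{DC}title")
            coverage = record.findtext(f".//{DC}coverage")  # e.g. a place name
            yield identifier, title, coverage
        # The protocol paginates with a resumptionToken; stop when it is absent.
        token = root.findtext(f"{OAI}ListRecords/{OAI}resumptionToken")
        if not token:
            break
        params = {"verb": "ListRecords", "resumptionToken": token}

# Hypothetical usage: place every record with spatial coverage on the project map.
# for identifier, title, coverage in harvest("https://example.org/oai"):
#     if coverage:
#         place_icon_on_map(coverage, label=title, link=identifier)
```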
For example, the harvester might scoop up a Dublin Core record derived from a PBCore record at the University of South Carolina’s Moving Image Research Collections (MIRC) and use the coverage information there to place an icon on “Going to the Show’s” Charlotte map (NB: this is a hypothetical example),
which would then link to this Movietone footage of the 1929 Confederate Reunion.
In the next couple of years, MIRC will be bringing some 14,000 Fox newsfilm titles online, thanks to an NEH-funded project led by Director Heather Heckman. And this is only one, hardly the largest, of a number of repositories working to make their materials available in this way. So the potential here is great. Although it takes cleverness, thoughtfulness, and time to make these things work, it seems clear that the problems of federated searching of, and access to, rich media materials are now more on the order of improving the wheel than inventing it.
The same goes, too, for an alternative to the “harvest” approach, the one adopted by Columbia University’s MediaThread tool, which is part of the Media Ecology Project (MEP) tool set. In this approach, rather than rely on a service provider to “harvest,” researchers can “graze” repositories’ websites and, by means of a JavaScript bookmarklet, add resources of interest to collections they create. (Many thanks to MEP developer John Bell for the “harvest”-“graze” distinction.) They can also annotate materials they accumulate there, a capability that calls attention to a limitation of the OAI-PMH model with respect to our global “Going to the Show” project.
Accumulating materials for federated searching does not yet achieve the core innovation of that project, which lies not in “finding” or serving up bits of evidence but in the ability to establish new relationships among them. Federated search will allow researchers to discover relationships among a particular theater, newspaper story, or film title, but does not in itself allow them to share the relationships they discover or add detail to the records–in the way that “Going to the Show” makes each theater a node around which evidence and interpretation can accumulate. This problem of accumulating relationships and details has also been thought through, although practical applications in the world of rich media archives and humanities research are, so far as I know, not nearly as advanced.
The Resource Description Framework (RDF) offers one important way to solve this kind of problem. There are many flavors, but the basic idea is elegantly simple. Where a conventional hyperlink has a source, which you click on, and a target, where it takes you, an RDF “triple” has a source (or subject), a target (or object), and a description of the nature of the relationship between the two (or predicate). RDF triples lend themselves to diagrams and can also be rendered as sentences. The idea can be extended to express relationships of considerable scope and complexity.
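A toy example, using the rdflib library and invented identifiers rather than anything from the projects named above, shows how little machinery the idea requires.

```python
# A toy RDF triple built with rdflib; the namespaces, identifiers, and predicate
# names are hypothetical placeholders, not drawn from any actual repository.
from rdflib import Graph, Literal, Namespace

EX = Namespace("https://example.org/going-to-the-show/")   # hypothetical resources
REL = Namespace("https://example.org/relations/")          # hypothetical predicates

g = Graph()
theater = EX["bijou-theatre-wilmington"]       # subject
clipping = EX["morning-star-1913-clipping"]    # object
g.add((theater, REL["documentedBy"], clipping))             # predicate names the relationship
g.add((theater, REL["locatedIn"], Literal("Wilmington, NC")))

# Rendered as sentences: "the theater is documented by this clipping";
# "the theater is located in Wilmington, NC."
print(g.serialize(format="turtle"))
```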
The OpenAnnotation flavor of RDF, to pick one germane example, attaches notes to anything that can be treated as a digital object: a text or passage in a text, an image or section of an image, a film clip, a segment of a film clip, or another note. Among its key features is the provision for recording the provenance of the annotation (who made it and when) and including a “motivation” for making it. Even if you didn’t already know about this framework, I think you can see how it makes it easy to attach interpretations of evidence, or bundles of evidence, to a wide variety of objects. If we ever imagined that creating metadata was fundamentally separate from research activity, OpenAnnotation makes such an assumption seem quaint indeed.
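Sketched as a Python dictionary shaped like JSON-LD, an annotation in the spirit of this model might look like the following. The property names reflect the published vocabulary as I understand it, and the URIs and values are invented for illustration rather than taken from any actual repository.

```python
# A sketch in the spirit of the OpenAnnotation model, expressed as a Python dict
# shaped like JSON-LD. Treat the exact property spellings, the context URL, and
# all identifiers as illustrative assumptions, not a definitive serialization.
import json

annotation = {
    "@context": "http://www.w3.org/ns/oa.jsonld",   # assumed context document
    "@type": "oa:Annotation",
    "motivatedBy": "oa:describing",                  # the "motivation" for the note
    "annotatedBy": {"name": "A. Historian"},         # provenance: who made it
    "annotatedAt": "2015-04-01T12:00:00Z",           # provenance: when
    "hasBody": {
        "@type": "dctypes:Text",
        "chars": "This clipping advertises the theater's segregated balcony seating."
    },
    # The target can be a whole object or a fragment: a clip segment, a map detail.
    "hasTarget": "https://example.org/going-to-the-show/morning-star-1913-clipping"
}

print(json.dumps(annotation, indent=2))
```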
Let’s say, then, that global “Going to the Show” will use OpenAnnotation, or some other RDF flavor, to build relationships among its objects. These notes and relationships could then be harvested through a process like OAI-PMH. (In fact, OAI-ORE moved in this direction, although I’m not sure how it is progressing.) All manner of researchers worldwide could then contribute to a vast, flexible, and extensible global “Going to the Show” platform. The Media Ecology Project is currently developing a Metadata Server that will use RDF to allow this kind of collaborative work.
The collective labor of global “Going to the Show” would involve many important and interesting questions about how the varieties of expertise held by historians, theorists, geographers, critics, archivists, librarians, software developers, and so on, would all come to bear. In addition to providing a pilot project for global “Going to the Show,” North Carolina’s site indicates how such teams will be acknowledged. Its credits page should remind us of a longstanding division of labor, even as it also calls attention to an emerging one. Those of us who write books that use archival resources often thank the archives and archivists who help us, but we do not often credit the catalogers and programmers who may have made it possible to find those archives in the first place–because we do not know who they are. In contrast to the typical acknowledgements page, digital projects like “Going to the Show” typically provide credits more appropriate for the kind of recognition “metadata” demands, which is not recognition of the magnanimous author and her generous helpers, but rather of teams of variously skilled collaborators that, together, make it possible for humanities research to have a future.
2. The Filmstrip
If global “Going to the Show” focuses our attention on the scene of consumption, shifting our focus to the scene of production and preservation brings a somewhat different set of metadata problems and possibilities into view. Our archive at the University of South Carolina has been interested in producing digital surrogates of archival film elements that capture the filmstrip edge-to-edge. We’re not alone in this, but one could not call it a standard practice. The advantages of preserving an edge-to-edge scan can be readily perceived in these few seconds of a hand-colored Cines Italia travelog film from 1911. Colorlab produced a color preservation print of the film, so some version of the image content within the framelines will be available for generations yet unborn. Once the original print decays, however, this digital copy will be the best surviving evidence of the work process that produced this particular print. And I do mean particular—as we not only get a sense of how the stenciling was applied, but also of the particular individual who applied her fingerprint to the edge of the frame. (Thanks to MIRC Newsfilm Curator Greg Wilsbacher for his help with these illustrations.)
By design, photochemical duplication and celluloid projection were meant to eliminate such traces, but we can learn something about past practice when and where they survive.
I am tempted to call that fingerprint “metadata.” But there is a far clearer instance of metadata encoded on the edges of the frame, one with which every archivist is familiar. This is the edge code indicating when the stock was manufactured, as we see on the work print for A Frontier Post.
Much like a contemporary datestamp, Kodak’s square and circle mark this stock as fabricated in 1925. The edge code example, I think, makes clear why we might not want to call the fingerprint “metadata.” That case offers, as yet, no known “key” allowing us to see the fingerprint as the stamp of a particular worker. Thus the fingerprint presents a puzzle; it offers the possibility of identifying who precisely colored the print without telling us, yet, who she was.
There’s something crucial to be said here too about the relationship between symbolic and iconographic information. While a picture of the Kodak square and circle can be rendered in Unicode characters (■ ●) without losing its ability to mean “1925,” a comparable transposition cannot occur with the messy stencil edges. A great many of the metadata challenges we currently face, it seems to me, require us to think through when and how we can, or should, transcode images into more readily searched symbols. Those symbols won’t only be texts, but also computer-generated numerical representations of patterns found in the sounds and images, as attempts to automate facial or gesture recognition, for example, are demonstrating.
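The edge code is the easy case, and a toy sketch shows why: once the symbols have been recognized, turning them into a searchable date is a simple lookup. The only entry below comes from the work print discussed above; the rest of the table, and the assumption that Kodak reused its symbols on a roughly twenty-year cycle, would need to be checked against the published edge-code charts.

```python
# A toy illustration of transcoding an already-recognized edge-code symbol pair
# into a searchable date. Only square+circle = 1925 comes from this essay; the
# repeat cycle and any further entries are assumptions a real tool would replace
# with Kodak's published charts.
EDGE_CODE_YEARS = {
    ("square", "circle"): 1925,   # from the A Frontier Post work print discussed above
    # ...remaining symbol pairs would be filled in from the published charts...
}

def edge_code_to_year(symbols, cycle_years=20, earliest_plausible=None):
    """Map a recognized symbol pair to candidate manufacture years.

    Because the symbols recur on a cycle (assumed here to be twenty years), the
    lookup returns several candidates; other evidence narrows the choice.
    """
    base = EDGE_CODE_YEARS.get(tuple(symbols))
    if base is None:
        return []
    candidates = [base + n * cycle_years for n in range(4)]
    if earliest_plausible is not None:
        candidates = [year for year in candidates if year >= earliest_plausible]
    return candidates

print(edge_code_to_year(["square", "circle"]))   # [1925, 1945, 1965, 1985]
```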
A final case will, I think, illuminate the potential scope and interest of this problem set. This image shows a Fox Movietone negative with its variable density optical sound track.
Unlike the edge code, the track is not “information about” the film stock. More like the stenciling, it provides evidence of a technical process which might then become a research focus–how and when were such tracks produced, etc. Like the edge code datestamp, however, the track is consistently “encoded” information, albeit of a much more complicated sort.
I recently had the privilege of working with a team of archivists, mathematicians, programmers, and technicians as part of a project to develop an open-source software program (AEO-Light) that can transcode this pattern of unevenly spaced light and dark lines into the kinds of 1s and 0s that contemporary digital devices can interpret as sound.
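To give a sense of the basic idea, and emphatically not of AEO-Light’s actual pipeline, here is a minimal sketch of how a scanned variable-density track can be turned into samples: because the track encodes amplitude as overall density along the length of the film, averaging each scanline across the width of the track yields one sample per line.

```python
# A naive sketch of variable-density track extraction, offered as an illustration
# of the principle rather than as AEO-Light's method. Assumes `scan` is a 2-D
# NumPy array of one frame's soundtrack region: rows run along the length of the
# film, columns across the width of the track.
import numpy as np

def naive_track_to_samples(scan):
    """Average each scanline across the track width to get one sample per line."""
    samples = scan.astype(np.float64).mean(axis=1)   # overall density varies line by line
    samples -= samples.mean()                        # remove the DC bias
    peak = np.abs(samples).max()
    return samples / peak if peak else samples       # normalize to roughly [-1, 1]

# Hypothetical arithmetic: a scan 2,000 pixels tall per frame at 24 frames per
# second yields 2,000 * 24 = 48,000 raw samples per second, before resampling
# or filtering to a standard audio rate.
```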
In doing this work, we worried a good deal about fidelity. Could the software render sound quality comparable to traditional photoelectric processes? This comparison itself opened up a historical can of worms that held interest quite apart from the question of quality control. For example, we spent some time discussing whether the algorithm should account for the process of photochemical duplication. Negative tracks like this one were never meant to be photoelectrically “read” as sounds. Rather, they were meant to be transcoded into positive images that, when run through a projector, would generate electrical impulses that could be processed into sounds. Although the negative-to-positive flip may strike us as a simple inversion, photographic emulsions do not respond to exposure in a uniformly linear way. Thus, very bright whites on the negative do not turn into deep blacks on the positive at the same rate as values in the grays. The laboratory processes of photochemical duplication must account for this curve, which would affect the pattern of light and dark on the soundtrack every bit as much as it does the image area. The question for us, then, was: should our algorithm attempt to “reverse engineer” this process and build the H&D curve into the sound reconstruction algorithm? And, if we did this, where would it all end? Would we attempt to take into account the waveform transformations enacted by tube amplifying equipment and 1930s speaker horns as well?
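For readers who want the shape of that worry, the standard sensitometric shorthand is worth recalling (as a schematic reminder, not as the model our team implemented). In its straight-line region, the characteristic, or H&D, curve relates a stock’s developed density $D$ to its exposure $E$ as

$$ D = \gamma \log_{10} E + D_0 , $$

with the slope $\gamma$ differing from stock to stock. Printing a positive exposes the print stock through the negative, so the print’s exposure is proportional to the negative’s transmittance $10^{-D_{\mathrm{neg}}}$, and on the straight-line portions the densities compose linearly:

$$ D_{\mathrm{pos}} = \gamma_{\mathrm{pos}} \log_{10}\!\left( E_0 \, 10^{-D_{\mathrm{neg}}} \right) + D_{0} = -\gamma_{\mathrm{pos}} \, D_{\mathrm{neg}} + \text{constant}. $$

In the toe and shoulder of each curve, however, that tidy linear flip bends, which is why the brightest and darkest values do not invert at the same rate as the middle grays.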
For good reasons, we decided to defer this problem set to a future round of software development, and to focus on capturing as much of the sound information encoded in the negative as possible—without adding anything—rather than on recreating the sound quality that early-1930s processes might have produced. AEO-Light software development thus brought to my attention a history of practical, technical, and aesthetic choices involved in sound encoding and transcoding. And it forced me to consider our project as a continuation of that history. For quite some time, it turns out, mathematicians have been as important to the process of making and preserving film sound as filmmakers, lab techs, and archivists. If global “Going to the Show” envisions exhibition sites as the nodal points around which the evidence of film history accretes, we might similarly imagine the filmstrip, including its edge information, as offering points of attachment for accumulating metadata about production. I think we would imagine a history that would include not only work processes and legacy metadata forms, but also the very history of information transcoding in which new digital initiatives also participate.
3. Participatory Cataloging
Last fall I joined WGBH’s participatory cataloging project with a co-author, John Marx (WGBH archivist Karen Cariani discussed this project during the Symposium). We contributed metadata for two shows hosted by I. A. Richards during WGBH’s second season on air in 1957–58. John and I had become interested in Richards as part of our project to explore the history of the American research university as a media institution. Richards, the “father of New Criticism,” was interesting to us precisely because he expressed prolific scorn for mass media as the antithesis of poetry, on the one hand, while, on the other, he worked steadily to make and promote alternative media, sometimes in service of poetry. This first episode of Sense of Poetry gives you a flavor of the results.
You can see more at OpenVault, where you can also read the metadata we wrote, and you can read more about our thoughts on Richards on our blog. To conclude this talk, I want to underscore that the success of WGBH’s project is attributable neither to cutting-edge technical development, nor to lavish funding, but rather to excellent organization. This is not to say that the technology or the funding is easy. The genius of WGBH’s project, however, lies in the challenging and ongoing people-work of identifying researchers and encouraging them to do something slightly different than they typically do–that is, to conceive catalog description as one of the results of research as opposed to the starting point for it.
Of course, archivists have long known that good descriptions of rich media require research. What’s new here is not the work, but the division of it. And the reflection on it, which encourages us to think about how to reproduce such labor models in the future. Metadata creation is rapidly becoming part of every humanist’s job description, and it’s a good thing too. I am not proposing that there will cease to be a difference between, say, the job of historian and that of cataloger or coder. We will continue to need specializations–probably more of them–but we can, it seems to me, make some of the more alienating features of the current division of labor obsolete as new digital research and publishing platforms shift our sense of both the processes and the products of humanist inquiry.