For quite a while now, I’ve been thinking about the value of aggregating content and information in museum collections. I think it is now generally accepted that museums and collections of many kinds need to make larger portions of their holdings available online to the public, and efforts to digitize collections and put collections data on the web are producing wonderful results. At the same time, aside from good public relations, what’s in it for the museums and for scholarship in general? What new information or new research directions might emerge from aggregating museum content? Not surprisingly, natural history and biodiversity research recognized the power of numbers, of sheer volume, a long time ago. Single specimens are valuable as types, but to learn something about ecological systems and evolution, you need statistically meaningful numbers. In cultural heritage collections, the possibilities are less clear. Some recent work in England has been interesting, though perhaps more from the perspective of studying the history of museums and even of colonialism. Museum studies remain especially interested in the individual object or the subcollection. Rather than focusing on the unique individual object or specimen, what can we learn by unlocking and aggregating the content in collections? What research questions emerge? What are the limitations and the opportunities?
Here’s one idea I’ve been thinking about that would pertain to anthropology and archaeology collections. We could look at combinations of material and technique across cultures, time, and space. We’d expect to see certain combinations, but I suspect we would be surprised on numerous occasions. The semantic index that supports the Phoebe A. Hearst Museum of Anthropology’s Delphi system could be an excellent source for this project. I might even revisit some of my dissertation materials. Yikes…
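To make the idea a little more concrete, here is a minimal sketch of the counting step, assuming each catalogue record can be reduced to culture, material, and technique fields. The field names, the helper functions, and the records are all hypothetical placeholders for illustration; they are not drawn from Delphi or any real collection. The sketch tallies each (culture, material, technique) combination and lists the rarest ones, which would be the candidates for "surprises" worth a closer look.

```python
from collections import Counter

# Hypothetical placeholder records; real input would come from a collections
# database or semantic index, and these field names are assumptions.
records = [
    {"culture": "Culture A", "material": "clay", "technique": "coiled"},
    {"culture": "Culture A", "material": "clay", "technique": "coiled"},
    {"culture": "Culture A", "material": "shell", "technique": "incised"},
    {"culture": "Culture B", "material": "clay", "technique": "coiled"},
    {"culture": "Culture B", "material": "bone", "technique": "coiled"},
    {"culture": "Culture B", "material": "stone", "technique": None},  # undocumented technique
]


def combination_counts(records, group_by="culture"):
    """Count (group, material, technique) combinations.

    Records missing a material or technique are skipped, which is itself a
    source of bias: poorly documented objects simply drop out of the tally.
    """
    counts = Counter()
    for r in records:
        material, technique = r.get("material"), r.get("technique")
        if material and technique:
            counts[(r.get(group_by), material, technique)] += 1
    return counts


def rare_combinations(counts, max_count=1):
    """Return combinations seen at most max_count times, sorted for stable output."""
    return sorted(combo for combo, n in counts.items() if n <= max_count)


counts = combination_counts(records)
for combo in rare_combinations(counts):
    print(combo)
```

Raw counts like these reflect how thoroughly each collection was catalogued as much as what people actually made, so any real analysis would need to weigh the documentation biases before reading much into a rare combination.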
What would be problematic about such a study? Data quality within and across collections would be an important consideration. Would we know which objects were documented in sufficient detail? Would we know which parts of the collection were studied more closely? Would we know which museum specialists were “good” at their jobs? Would we know which objects or collections had been reviewed by multiple museum specialists? The biases would be numerous and hard to untangle. But hey, I’m an archaeologist by training. I’m used to working with messy data sets and making a large number of assumptions.
What can be done to address these biases? We could select only data sets that had been carefully studied, but that selection would itself introduce bias. We could try to enrich the data ourselves, but the sheer scope of that effort is terrifying. That’s where crowdsourcing, tagging, and annotation could come in. By getting our collections online and allowing other experts (including the public) to enrich our content, we can incrementally improve the quality of our information. Other projects are showing how this can be done productively. However, how much work has been done on assessing the quality of tagging and annotation in a setting such as this? Interestingly, the CalPhotos system has allowed reviewers to annotate and re-identify species for many years. CalPhotos might therefore provide a good context in which to study the results of annotation and review.