Archive for the ‘Data Repository Management’ Category

 

 

MVP Spotlight- August 2009

Monday, August 10th, 2009 by Elizabeth Ha
Each month, we highlight news relating to digital scholarship, access and preservation at Berkeley and around the world. To contribute, email Lizzy Ha.

On Campus

Opencast Matterhorn Project Awarded Funding from Mellon and Hewlett Foundations
http://www.opencastproject.org/content/opencast_matterhorn_project_awarded_funding_mellon_and_hewlett_foundations
http://www.berkeley.edu/news/media/releases/2009/07/28_matterhorn.shtml

The Opencast Matterhorn project recently received 1.3 million dollars from the Andrew W. Mellon and William and Flora Hewlett foundations. Scheduled to be launched [...]

CollectionSpace 0.1 Hello World release

Wednesday, August 5th, 2009 by Chris Hoffman

The CollectionSpace project team has released version 0.1 of its open-source collection management system for museums. This Hello World release focuses on tying the technology layers together around the function of basic object entry. Those interested in collections are encouraged to experiment with the Hello World release and provide feedback to the project team. CollectionSpace is funded by the Andrew W. Mellon Foundation.  Read the iNews article for more information:

http://inews.berkeley.edu/articles/Aug2009/CollectionSpace

Also worth noting: Carl Goodman and Megan Forbes (of the Museum of the Moving Image) recently visited the Getty Research Institute (Los Angeles) and the San Francisco Museum of Modern Art.  Their presentations are hosted on the CollectionSpace wiki and provide a great overview of where CollectionSpace is right now.  Here’s a link to the Getty presentation:

http://wiki.collectionspace.org/display/collectionspace/Getty+Presentation+July+2009

It’s great to see this first version which focuses on technical integration.  IST will work closely with the CollectionSpace project team, as well as with campus museums, to ensure that this solution will be one that helps us manage, study, and share the world-class collections for which UC Berkeley is responsible.

MVP Accomplishments, as of July 2009

Tuesday, July 28th, 2009 by Elizabeth Ha
Engagement with Partners
• We are in the middle of a process to re-engage with current partners in order to ensure that our current services are meeting their needs.  Made a few small changes to the services to enhance the utility for current partners.
• Engaged new partners including the Berkeley Art Museum/Pacific Film Archive [...]

MVP Spotlight- July 2009

Friday, July 10th, 2009 by Elizabeth Ha
Each month, we highlight news relating to digital scholarship, access and preservation at Berkeley and around the world. To contribute, email Lizzy Ha.

On Campus

Visual Resources Collection History of Art, UC Berkeley
Image news and tech tips from the Visual Resources Collection
http://havrc.blogspot.com/

The History of Art Visual Resource Center (HAVRC), a partner of the MVP, recently launched a blog [...]

Big Data issue of Nature: uneven, but worth reading

Thursday, September 11th, 2008 by Patrick Schmitz

Big Data and the research trends that come with it are part of our future here at DS. The recent issue of Nature looks at issues and trends around the topic and, while uneven, has some good material in it that folks should check out. Here’s my blow-by-blow on the sections:

The opening editorial calls for a push to make data annotation a major component of research and of grants. Sound familiar? Let’s hope funders listen.

The section on the next Google trots out a lot of familiar and frankly pretty dull options. Skip it.

Big data: Data wrangling poses an important question about data collection. We might have the sense that there is so much data that it is just a matter of managing it. However, David Goldston notes that there are also huge holes in the dataverse, and that these are a result of political policy. Further, if a political entity controls the data, politics can (and will) shape and filter the data in far-reaching ways.

Cory Doctorow’s Gee whiz piece is irritating (unless you’re into technoporn), and is easy to skip.

A piece on wikiomics is an excellent description of how community can make a difference, and the social dynamics of a collaboratory.

Cliff Lynch has a good piece on what data production projects must do to rationalize their data management, and what services groups like IST/DS must provide to support these projects.

Frankel & Reid present an interesting discussion of mining and visualization, and include a compelling, cautionary note:

“The ingrained habits of highly trained scientists make them rarely as adventurous as these young minds. We think we are on the path to insight when shading reveals contours in 3D renderings, or when bursts of red appear on heat maps, for example. But the algorithms used to produce the graphics may create illusions or embed assumptions. The human visual system creates in the brain an apparent understanding of what a picture represents, not necessarily a picture of the underlying science. Unless we know all the steps from hypothesis to understanding — by conversing with theorists, experimentalists, instrument and software developers, visualization scientists, graphic artists and cognitive psychologists — we cannot be sure whether a display is accurate or misleading.”

The closing essay is human interest and could be skipped in the interest of time. However, it is short, and like the best human interest stories, is surprising and inspiring.

Indianapolis Museum of Art Dashboard

Thursday, September 4th, 2008 by Chris Hoffman

I was talking this morning with Peter Cava (Data Warehouse Services Manager here at UC Berkeley) about the (potential) intersection between business intelligence systems used for administrative systems here and the kinds of data aggregation and analysis performed by research scientists and faculty working with museum collections. Afterward, I was looking at the preliminary program for the Museum Computer Network conference this fall and saw that Rob Stein at Indianapolis Museum of Art is giving a talk about a dashboard that they have developed to help measure various aspects of the museum’s performance. I can’t resist when these kinds of connections happen; they always lead in interesting directions. The IMA Dashboard is up on the web, and they should be applauded for their emphasis on transparency. I also enjoyed reading the blog post at the Powerhouse Museum in Sydney featuring an interview with Rob about the project, and this pointed me to a report written by Maxwell L. Anderson for the Getty, titled “Metrics of Success in Art Museums.”

Now it’s time to get in touch with Rob!

Research directions using aggregated museum data sets

Sunday, August 10th, 2008 by Chris Hoffman

For quite a while now, I’ve been thinking about the value of aggregating content and information in museum collections. I think it is generally accepted now that museums and collections of many kinds need to make larger portions of their collections available online to the public, and efforts to digitize collections and webify collections data are producing wonderful results. At the same time, aside from good public relations, what’s in it for the museums and for scholarship in general? What new information or new research directions might emerge from aggregations of museum content? Not surprisingly, in natural history and biodiversity research, the power of numbers, of volume, has been recognized for a long time. Single specimens are nice as types, but in order to learn something about ecological systems and evolution, you need statistically valid numbers. In cultural heritage collections, the possibilities are less clear. Some recent work in England has been interesting, though perhaps more from the perspective of studying the history of museums and even of colonialism. Museum studies are still especially interested in the individual object or the subcollection. Rather than focusing on the unique individual object or specimen, what can we learn by unlocking and aggregating content in collections? What research questions emerge? What are the limitations and the opportunities?

Here’s one idea I’ve been thinking about that would pertain to Anthropology and Archaeology collections. We could look at the combinations of material and technique across culture, time and space. We’d expect certain combinations to be visible, but I suspect we would be surprised on numerous occasions. The semantic index that supports the Phoebe A. Hearst Museum of Anthropology’s Delphi system could be an excellent source for this project. I might even revisit some of my dissertation materials. Yikes…
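As a rough sketch of what counting material and technique combinations could look like, the Python below tallies pairs across a handful of invented catalog records. The field names (`culture`, `material`, `technique`), the sample values, and the helper names are all assumptions made for illustration; none of them come from the Delphi system or any real collection.

```python
from collections import Counter

# Invented sample records; real catalog data would be far messier.
RECORDS = [
    {"culture": "Culture A", "material": "clay", "technique": "coiled"},
    {"culture": "Culture A", "material": "clay", "technique": "coiled"},
    {"culture": "Culture B", "material": "clay", "technique": "molded"},
    {"culture": "Culture B", "material": "plant fiber", "technique": "coiled"},
    {"culture": "Culture B", "material": "plant fiber"},  # technique not recorded
]


def combination_counts(records, keys=("material", "technique")):
    """Count how often each combination of the given fields occurs.

    Records missing any of the fields are skipped, which is itself a
    source of bias worth reporting alongside the counts.
    """
    counts = Counter()
    for record in records:
        combo = tuple(record.get(key) for key in keys)
        if None not in combo:
            counts[combo] += 1
    return counts


def rare_combinations(counts, max_count=1):
    """Return combinations seen at most max_count times, sorted for stable output."""
    return sorted(combo for combo, n in counts.items() if n <= max_count)
```

Passing a different `keys` tuple, such as `("culture", "material")`, reuses the same tally along another axis. The rare combinations are the ones most likely to be either genuine surprises or documentation errors, so they would need a second look either way.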

What would be problematic about such a study? Data quality within and across collections would be an important consideration. Would we know which objects were documented at a sufficient level of detail? Would we know which parts of the collection were studied more closely? Would we know which museum specialists were “good” at their jobs? Would we know which objects or collections had been reviewed by multiple museum specialists? The number of biases would be large and problematic. But hey, I’m an archaeologist by training. I’m used to studying a messy data set and making a large number of assumptions.

What kinds of things can be done to address these biases? We could select only sets of data that had been carefully studied, but that in and of itself will create bias. We could try to enrich the data ourselves, but the sheer scope of that effort is terrifying. That’s where crowd sourcing, tagging and annotation could come in. By getting our collections online and allowing other experts (including the public) to enrich our content, we can incrementally improve the quality of our information. Other projects are showing how this can be productively done. However, how much work has been done on assessing the quality of tagging and annotation in a setting such as this? Interestingly, the CalPhotos system has been allowing reviewers to annotate and re-identify species for many years. CalPhotos, then, might provide a good context in which to study the results of annotation and review.
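One conventional way to put a number on annotation quality is to measure how often two reviewers agree beyond what chance alone would produce. The sketch below computes Cohen's kappa for two annotators in Python. It is offered as an illustration of the kind of measure such a study might use, not as anything CalPhotos or the other projects mentioned actually do, and the species labels in the usage note are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order.

    1.0 means perfect agreement, 0.0 means agreement no better than chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label for everything.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, `cohens_kappa(["oak", "oak", "pine", "pine"], ["oak", "pine", "pine", "pine"])` gives 0.5: the reviewers agree on three of four items, but half of that agreement would be expected by chance given how often each of them uses each label.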

BECHAMEL project at NCSA combines preservation and semantic services

Friday, August 8th, 2008 by Patrick Schmitz

U. of Illinois is getting a chunk of NDIIPP money to develop their BECHAMEL framework, which identifies semantic vulnerabilities in metadata as a means of supporting digital preservation services. What does this mean? Here’s a good quote:

“For example, the meta-data for a digital file—a photo or map or document—might include a field called “creator.” Putting a name like “John Smith” in this field might seem sufficient, but does that really identify the creator of the information? In 50 years will a future researcher be able to pinpoint which of the world’s many “John Smiths” created the information?

BECHAMEL flags risks like that one, or such as numerical values that aren’t accompanied by error ranges.”

There’s only a little more info in the article, but there are some papers on a research page at the UIUC site. David Dubin’s recent paper provides some better details. He describes their earlier BECHAMEL work as “a research environment for proposing and testing theories of the meaning of markup.” It is a Prolog app connected to an RDF store (Kowari, losing favor to Mulgara).

It sounds like part of what they’re doing is recognizing that lots of so-called structured markup (including, in my opinion, lots of RDF) is actually semantic-free and amounts to free text annotations with some weak hints (e.g., “dc:creator”). The question is whether the project will yield useful tools or more guidelines that are unrealistic in deployment. Their near-term goal seems to be the conversion of entity references in free text (e.g., in a dc:creator element) to RDF references to vocabularies. Is a reference to the concept of “San Francisco, CA” in a gazetteer more useful than the same free text? Probably. But will an RDF pointer to a FOAF description of “John Smith” be much more useful than the free text? I doubt it.  Nevertheless, a project worth watching.
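To make the two risks from the quote concrete, here is a small Python sketch that flags a free-text creator with no identifier and numeric values with no accompanying error range. It is not BECHAMEL (which the post describes as a Prolog app over an RDF store) and does not reflect its actual rules. The record layout, the `creator_id` field, and the `_error` suffix convention are all assumptions invented for this example.

```python
def flag_risks(record):
    """Return a list of human-readable warnings for one metadata record.

    Assumed conventions (hypothetical, for illustration only):
      - "creator" holds a free-text name, "creator_id" an optional identifier
      - a numeric field "x" may carry its error range in a field "x_error"
    """
    risks = []
    if record.get("creator") and not record.get("creator_id"):
        risks.append("creator is free text with no identifier")
    for field, value in record.items():
        if field.endswith("_error"):
            continue  # error ranges themselves are not checked
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and f"{field}_error" not in record:
            risks.append(f"{field} has no error range")
    return risks
```

A record such as `{"creator": "John Smith", "elevation_m": 120.5}` trips both checks, while adding `creator_id` and `elevation_m_error` fields clears them. Whether the identifier actually disambiguates anything is, of course, the harder question raised above.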

Keck Hydrowatch sensor network project featured on KQED Quest

Tuesday, July 29th, 2008 by Chris Hoffman

As reported on the UCB home page, the Keck Hydrowatch sensor network project was featured on the July 22nd edition of KQED’s science program, Quest. Collin Bode, a programmer who occasionally sits in the BSCIT office when he’s not working in the field stations, makes a few appearances in the video. Collin is working with Ginger Ogle and John Deck to develop a system for the retrieval, storage, and display of automated time-series data from the sensor network deployed for Keck Hydrowatch. They are implementing a data architecture designed by CUAHSI called the “Observations Data Model” (moving it to the open source MySQL database and extending it to handle logging and other capabilities) and developing the necessary scripts and libraries to input data from multiple sources. In the second phase of the project, a web-based system for viewing and downloading the sensor network data will be developed.
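For readers unfamiliar with how sensor time series are typically stored, the Python sketch below keeps sites, variables, and timestamped values in separate tables and retrieves one series in time order. It uses the standard-library `sqlite3` module as a stand-in for MySQL, and the table and column names are simplified assumptions for illustration. It is not the CUAHSI Observations Data Model schema, nor the project's actual code.

```python
import sqlite3


def create_store(conn):
    """Create a minimal, hypothetical layout for time-series observations."""
    conn.executescript(
        """
        CREATE TABLE sites (
            site_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE variables (
            variable_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            units TEXT NOT NULL
        );
        CREATE TABLE data_values (
            site_id INTEGER NOT NULL REFERENCES sites(site_id),
            variable_id INTEGER NOT NULL REFERENCES variables(variable_id),
            observed_at TEXT NOT NULL,  -- ISO 8601 timestamp, sorts correctly as text
            value REAL NOT NULL,
            PRIMARY KEY (site_id, variable_id, observed_at)
        );
        """
    )


def load_readings(conn, site, variable, units, readings):
    """Insert (timestamp, value) pairs; a repeated timestamp overwrites the old value."""
    conn.execute("INSERT OR IGNORE INTO sites (name) VALUES (?)", (site,))
    conn.execute(
        "INSERT OR IGNORE INTO variables (name, units) VALUES (?, ?)",
        (variable, units),
    )
    site_id = conn.execute(
        "SELECT site_id FROM sites WHERE name = ?", (site,)
    ).fetchone()[0]
    variable_id = conn.execute(
        "SELECT variable_id FROM variables WHERE name = ?", (variable,)
    ).fetchone()[0]
    conn.executemany(
        "INSERT OR REPLACE INTO data_values "
        "(site_id, variable_id, observed_at, value) VALUES (?, ?, ?, ?)",
        [(site_id, variable_id, ts, val) for ts, val in readings],
    )
    conn.commit()


def series(conn, site, variable):
    """Return [(timestamp, value), ...] for one site and variable, oldest first."""
    return conn.execute(
        """
        SELECT d.observed_at, d.value
        FROM data_values AS d
        JOIN sites AS s ON s.site_id = d.site_id
        JOIN variables AS v ON v.variable_id = d.variable_id
        WHERE s.name = ? AND v.name = ?
        ORDER BY d.observed_at
        """,
        (site, variable),
    ).fetchall()
```

Keying each value on site, variable, and timestamp means the same logger file can be loaded more than once without creating duplicates, which is convenient when data arrive from multiple sources.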

Mashed Museum wiki and group in UK

Monday, June 30th, 2008 by Chris Hoffman

Mike Ellis, a frequent presenter at Museums and the Web, recently posted a note on the MCN list describing a Mashed Museum day held in the UK.

————–

Dear MCN

I thought you might be interested to see a brief(ish) video I hacked
together following the MCG “Mashed Museum” day which happened on the
18th June, the day before the UK Museums on the Web Conference.

See http://blip.tv/file/1029060

Further coverage continues at www.mashedmuseum.org.uk

Cheers!

Mike

————

They’ve also set up a Google Group at http://groups.google.com/group/mashedmuseum. From here, there are several other links that might take you in some interesting directions! For instance, check out the hoard.it prototype at http://feeds.boxuk.com/museums/.

